# Notebook for profiling di-f Correlation experiments

## Experiment name: mxretailsalary1

## Experiment General Data
### Team roles:
* **PipeMaster**: jag.pascoe
* **BizEngineer**: 
* **DataEngineer**:
* **MLEngineer**:
* **SWEngineer**:

### Description (Use case):
Predict the estimated base salary per day for working in the retail sector in any state of Mexico.
Suppose you are looking to be hired by a retail business in a Mexican state of your choice. You want to predict the base salary per day you might earn as an attendant at that business. This salary does not include commissions, taxes, or any other items.

### Type of experiment: Correlation
### Independent Variables (inputs):
1) State of Mexico where you expect to be hired (CAT).
2) Number of employees, including yourself, currently working at that business (NUMBER)
3) Average sales per day, in pesos, that you estimate you will bring to that business (FLOAT)

### Dependent Variables (outputs):
1) Estimated base salary per day (FLOAT)

## Experiment preparation, imports and config.yaml

In [96]:
%load_ext autoreload
%autoreload 2
# The %load_ext autoreload and %autoreload 2 magic commands are used to automatically 
# reload modules when they are changed. This can be useful when you are developing code 
# in an interactive environment, as it allows you to see the changes you make to your modules 
# without having to restart the kernel.
import os

import numpy as np
import pandas as pd
from hydra import compose, initialize, initialize_config_dir, initialize_config_module
from omegaconf import OmegaConf


# for global initialization: NOT RECOMMENDED
#initialize(version_base=None, config_path="../src/conf")
#compose(config_name='config')

with initialize(version_base=None, config_path="../src/conf"):
    cfg = compose(config_name='config')
    print(cfg)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
{'general_ml': {'seed': 123, 'encoding': 'iso-8859-1', 'cloud': 'AWS'}, 'paths': {'project_dir': '/home/jagpascoe/democlient-sklearn/dif-s-mxretailsalary1', 'raw_data': '${paths.project_dir}/data/raw', 'interim_data': '${paths.project_dir}/data/interim', 'processed_data': '${paths.project_dir}/data/processed', 'reports': '${paths.project_dir}/reports', 'models': '${paths.project_dir}/models', 'api': '${paths.project_dir}/API'}, 'cloud_paths': {'bucket_path': 'dif-b-democlient-sklearn', 'experiment_path': '${cloud_paths.bucket_path}/mxretailsalary1', 'mlflow_path': '${cloud_paths.experiment_path}/mlflow', 'reports_path': '${cloud_paths.experiment_path}/reports', 'rawdata_path': '${cloud_paths.experiment_path}/raw-data', 'dvc_path': '${cloud_paths.experiment_path}/dvc-store'}, 'file_names': {'raw_file': 'raw-data.csv', 'processed_data': 'processed_data.csv', 'processed_unseen_data': 'processed_unseen_

In [97]:
# Choose the ML model type to apply: regression, classification, time_series, clustering, or NLP
from pycaret.regression import *
from pycaret import version_
version_

'3.0.3'

## Loading the model and unseen_data

### First, the model
`load_model` loads a previously saved pipeline.

In [98]:
selected_model = load_model(os.path.join(cfg.paths.models, cfg.file_names.ml_profiling_best))
selected_model

Transformation Pipeline and Model Successfully Loaded


### Then, unseen_data

In [99]:
unseen_data = pd.read_csv(os.path.join(cfg.paths.processed_data, cfg.file_names.processed_unseen_data), 
                   #encoding=cfg.general_ml.encoding,
                   )
unseen_data.head()

|   | state | income_employee_day | employees_business | salary_employee_day |
|---|---|---|---|---|
| 0 | Jalisco | 7748.607601 | 8 | 335.00455 |
| 1 | BCS | 17169.646707 | 10 | 427.141343 |
| 2 | Michoacan | 4440.25043 | 4 | 114.076594 |
| 3 | Oaxaca | 7157.242554 | 5 | 116.098021 |
| 4 | Veracruz | 8658.960204 | 6 | 252.697749 |


### Now predict on unseen_data
`predict_model` generates the label using a trained model. When `data` is None, it predicts the label and score on the holdout set; here `unseen_data` is passed explicitly.

In [100]:
predict_model(selected_model, data=unseen_data) 

|   | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|---|
| 0 | Gradient Boosting Regressor | 68.4946 | 10684.3755 | 103.3653 | 0.8544 | 0.3527 | 0.3086 |


|   | state | income_employee_day | employees_business | salary_employee_day | prediction_label |
|---|---|---|---|---|---|
| 0 | Jalisco | 7748.607422 | 8 | 335.004547 | 296.176613 |
| 1 | BCS | 17169.646484 | 10 | 427.141357 | 537.048199 |
| 2 | Michoacan | 4440.250488 | 4 | 114.076591 | 138.007091 |
| 3 | Oaxaca | 7157.242676 | 5 | 116.098022 | 209.174732 |
| 4 | Veracruz | 8658.959961 | 6 | 252.697754 | 310.186561 |
| ... | ... | ... | ... | ... | ... |
| 745 | Sinaloa | 24660.839844 | 12 | 708.758545 | 700.681651 |
| 746 | Campeche | 17880.515625 | 7 | 461.477997 | 556.023704 |
| 747 | Chiapas | 9882.582031 | 4 | 123.586868 | 271.182639 |
| 748 | Jalisco | 8075.348633 | 6 | 130.168106 | 250.560429 |
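
To make the reported metrics concrete, the following is a minimal sketch that computes MAE, MSE, RMSE, R2, RMSLE, and MAPE from pairs of `salary_employee_day` and `prediction_label`. It uses only the nine rows visible in the output above, so its numbers will not match the metrics table, which covers all 749 rows. The helper name `regression_metrics` is hypothetical, and the formulas are the common textbook definitions; PyCaret's own implementations may differ in detail.

```python
import math

# (actual, predicted) pairs copied from the rows visible in the output above.
visible_rows = [
    (335.004547, 296.176613),
    (427.141357, 537.048199),
    (114.076591, 138.007091),
    (116.098022, 209.174732),
    (252.697754, 310.186561),
    (708.758545, 700.681651),
    (461.477997, 556.023704),
    (123.586868, 271.182639),
    (130.168106, 250.560429),
]


def regression_metrics(pairs):
    """Hypothetical helper: textbook regression metrics for (actual, predicted) pairs."""
    n = len(pairs)
    errors = [pred - actual for actual, pred in pairs]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is expressed as a fraction (0.30 means 30%), as in the table above.
    mape = sum(abs(e) / abs(actual) for (actual, _), e in zip(pairs, errors)) / n
    # RMSLE compares log(1 + value); it assumes non-negative actuals and predictions.
    rmsle = math.sqrt(
        sum((math.log1p(pred) - math.log1p(actual)) ** 2 for actual, pred in pairs) / n
    )
    mean_actual = sum(actual for actual, _ in pairs) / n
    ss_tot = sum((actual - mean_actual) ** 2 for actual, _ in pairs)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "RMSLE": rmsle, "MAPE": mape}


if __name__ == "__main__":
    for name, value in regression_metrics(visible_rows).items():
        print(f"{name}: {value:.4f}")
```

Because MAPE divides each error by the actual value, low-salary rows such as Chiapas (index 747) weigh heavily on it even when their absolute error is comparable to that of other rows.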


## Deployment

### As an API with FastAPI
`create_api` takes an input model and creates a POST API for inference. It only creates the API and doesn't run it automatically. 
Once you start the API with the `!python` command, you can see the server docs at localhost:8000/docs.

In [101]:
create_api(selected_model, os.path.join(cfg.paths.api, cfg.file_names.api_ml_profiling))

API successfully created. This function only creates a POST API, it doesn't run it automatically. To run your API, please run this command --> !python /home/jagpascoe/democlient-sklearn/dif-s-mxretailsalary1/API/ml_profiling_best_API.py
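
Once the generated script is running, a client can send one record as JSON. The sketch below uses only the standard library and rests on several assumptions that should be checked against the generated `ml_profiling_best_API.py` and the `/docs` page: that the endpoint is `POST /predict` on `127.0.0.1:8000`, and that the request body uses the three feature columns seen in `unseen_data`. The helpers `build_payload` and `predict_salary` are hypothetical names, not part of PyCaret.

```python
import json
from urllib import request

# Assumption: default host and port, and a /predict route; verify on /docs.
API_URL = "http://127.0.0.1:8000/predict"


def build_payload(state, income_employee_day, employees_business):
    """Hypothetical helper: build one input record using the feature column names."""
    if employees_business < 1:
        raise ValueError("employees_business must count at least yourself")
    if income_employee_day < 0:
        raise ValueError("income_employee_day must be non-negative")
    return {
        "state": str(state),
        "income_employee_day": float(income_employee_day),
        "employees_business": int(employees_business),
    }


def predict_salary(payload, url=API_URL, timeout=10):
    """Hypothetical helper: POST the record as JSON and return the decoded response."""
    body = json.dumps(payload).encode("utf-8")
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_payload("Jalisco", 7748.61, 8)
    print(json.dumps(payload))
    # Uncomment once the API server is running:
    # print(predict_salary(payload))
```

Validating the record before sending it keeps obviously invalid inputs, such as a business with zero employees, from reaching the model.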


## Creating Docker config files
`create_docker` creates a Dockerfile and requirements.txt for productionizing the API endpoint.

In [102]:

create_docker(os.path.join(cfg.paths.api, cfg.file_names.api_ml_profiling))

Writing requirements.txt
Writing Dockerfile
Dockerfile and requirements.txt successfully created.
    To build image you have to run --> !docker image build -f "Dockerfile" -t IMAGE_NAME:IMAGE_TAG .
            


## Create an app
`create_app` creates a basic Gradio app for inference. It will later be expanded to other app types such as Streamlit.

In [103]:
create_app(selected_model)

Running on local URL:  http://127.0.0.1:7863

To create a public link, set `share=True` in `launch()`.


