# MLOps Practical Session

This session covers the following:

1. Git
2. Experiment Tracking
3. Model Registry
4. CI/CD Pipelines
5. Streamlit App

<img src="arch.jpeg" alt="Alternative text" />


## Prerequisites 

Make sure that you have:

1. Git and conda or Python installed
2. GitHub account
3. Google Colab account
4. Streamlit account


## Step 01

1. Sign up at https://ngrok.com/. This lets you expose your MLflow instance to the web.
2. Create a copy of the following notebook, replace the key with your own from ngrok, and run the full notebook on Colab:

https://colab.research.google.com/drive/1voZySe48KMO8A6Gn9R2wVQOzffAQkFis#scrollTo=5er5I9lwW-9H

## Step 02

Update the path below so that it points to your project root, relative to this notebook.
This lets Python resolve imports from the project, such as `data.load_data` and `common.mlflow`.

In [None]:
import sys
sys.path.append(r"<your relative path to project>")
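
As a hedged sketch of the step above, the path can also be resolved with `pathlib` and added only once, which avoids duplicate entries when the cell is re-run. The helper name `add_project_to_path` and the `".."` argument are illustrative assumptions, not part of the session's code.

```python
import sys
from pathlib import Path


def add_project_to_path(relative_path):
    """Resolve relative_path against the current working directory and
    append it to sys.path (only once) so project packages can be imported."""
    project_root = str(Path(relative_path).resolve())
    if project_root not in sys.path:
        sys.path.append(project_root)
    return project_root


# Hypothetical usage: the notebook lives one level below the project root.
root = add_project_to_path("..")
print(root)
```

Printing the resolved path is a quick way to confirm that it really is the folder containing the `data` and `common` packages.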

## Step 03 - Let's Do Some Coding!

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import mlflow
import xgboost as xgb
from data.load_data import load_cali_house_data, get_features_and_labels
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from mlflow.tracking.client import MlflowClient

In [None]:
# import os, ssl
# if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
#     getattr(ssl, '_create_unverified_context', None)): 
#     ssl._create_default_https_context = ssl._create_unverified_context

In [None]:
# Loads default California Dataset
data = load_cali_house_data()

In [None]:
data.head()

In [None]:
data.count()

In [None]:
# Plot the feature correlations as a heatmap
plt.figure(figsize=(15, 8))
corr = data.corr()
# Mask the diagonal and upper triangle so each feature pair is shown only once
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
sns.heatmap(corr, linewidths=0.5, annot=True, mask=mask, cmap="coolwarm")
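
To see what the mask in the cell above does: `np.triu_indices_from(mask)` selects the diagonal and everything above it, and seaborn hides the cells where the mask is `True`, so only the lower triangle is drawn. The same pattern for a 3x3 matrix can be sketched in plain Python (no NumPy; the function name is illustrative):

```python
def upper_triangle_mask(n):
    """Return an n x n grid that is True on and above the diagonal."""
    return [[col >= row for col in range(n)] for row in range(n)]


# Each printed row marks the cells the heatmap would hide.
for row in upper_triangle_mask(3):
    print(row)
```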


### Let's Do Some Simple Modeling

Play with the model parameters and try to find the best combination for the XGBoost model.

#### XGBoost Model

In [None]:
X, y = get_features_and_labels(data)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

xg_reg = xgb.XGBRegressor(
    objective="reg:squarederror",
    colsample_bytree=0.3,
    learning_rate=0.1,
    max_depth=5,
    alpha=10,
    n_estimators=10,
)

xg_reg.fit(X_train, y_train)

preds = xg_reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, preds))

print(rmse)
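
The value printed above is the root mean squared error: the square root of the average squared difference between predictions and true labels, expressed in the same units as the target. A minimal standard-library sketch with made-up numbers (not the California housing data) shows the arithmetic; the function name is an illustrative choice.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean of the squared prediction errors."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up labels and predictions: the errors are 1, 0, -2 and 0.
y_true_example = [3.0, 5.0, 2.0, 7.0]
y_pred_example = [2.0, 5.0, 4.0, 7.0]
print(root_mean_squared_error(y_true_example, y_pred_example))
```

Because the errors are squared before averaging, the single error of 2 contributes four times as much as the error of 1.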

#### XGBoost with Cross-Validation

In [None]:
X, y = get_features_and_labels(data)
data_dmatrix = xgb.DMatrix(data=X, label=y)

params = {
    "objective": "reg:squarederror",
    "colsample_bytree": 0.3,
    "learning_rate": 0.1,
    "max_depth": 6,
    "alpha": 10,
}

cv_results = xgb.cv(
    dtrain=data_dmatrix,
    params=params,
    nfold=3,
    num_boost_round=50,
    early_stopping_rounds=10,
    metrics="rmse",
    as_pandas=True,
    seed=123,
)

# Take the last round's mean test RMSE as a plain float (not a one-row Series)
test_rmse = cv_results["test-rmse-mean"].iloc[-1]
xg_reg = xgb.train(params=params, dtrain=data_dmatrix, num_boost_round=10)
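
With `nfold=3`, cross-validation evaluates the model three times, each time holding out a different third of the data, and `cv_results` reports the mean and standard deviation of the metric per boosting round. The index bookkeeping behind that idea can be sketched in plain Python. The fold layout here (contiguous folds after a seeded shuffle) is an illustrative assumption, not XGBoost's internal implementation.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=123):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if fold < n_samples % n_folds else 0)
        for fold in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Every sample lands in exactly one test fold.
for train, test in k_fold_indices(n_samples=10, n_folds=3):
    print(len(train), len(test))
```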

### Introducing Experiment Tracking

What if we could track our experiments automatically, without having to save and compare results by hand every time?

This is where MLflow comes into play!

In [None]:
from common.mlflow import setup_mlflow_experiment

# Replace the server URL and the experiment ID with your own
setup_mlflow_experiment('https://b710-35-237-37-228.ngrok-free.app/', 959816713686281380)

XGB

In [None]:
mlflow.autolog(exclusive=False)

with mlflow.start_run():
   
    # Paste the XGBoost model code from the previous section here; it defines rmse

    mlflow.log_metric("root_mean_squared_error", rmse)


XGB CV

In [None]:
mlflow.autolog(exclusive=False)

with mlflow.start_run():

    # Paste the cross-validation code from the previous section here; it defines test_rmse
    
    mlflow.log_metric("root_mean_squared_error", test_rmse)

### More on MLflow

In [None]:
MODEL_NAME = "xgb_california"
STAGE = "Staging"


loaded_model = mlflow.pyfunc.load_model(model_uri=f"models:/{MODEL_NAME}/{STAGE}")

In [5]:
from mlflow.tracking.client import MlflowClient

client = MlflowClient('https://35a0-35-222-156-22.ngrok-free.app/')


client.get_latest_versions("xgb_california")

[<ModelVersion: aliases=[], creation_timestamp=1698343556311, current_stage='None', description='', last_updated_timestamp=1698343556311, name='xgb_california', run_id='9610e71fe0b04e7b9e674dd60cabb7d6', run_link='', source='mlflow-artifacts:/790539312356347373/9610e71fe0b04e7b9e674dd60cabb7d6/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>]

In [6]:
models = client.get_latest_versions("xgb_california")

In [8]:
for model in models:
    if model.current_stage == 'Staging':
        model_version = model.version
        print(model_version)
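
The loop above can be wrapped in a small helper that returns every version in a given stage. The sketch below uses `SimpleNamespace` objects as stand-ins for MLflow's `ModelVersion` and relies only on the `current_stage` and `version` attributes visible in the output above. The helper name `versions_in_stage` and the fake entries are hypothetical.

```python
from types import SimpleNamespace


def versions_in_stage(model_versions, stage):
    """Return the version strings of all entries whose current_stage matches."""
    return [mv.version for mv in model_versions if mv.current_stage == stage]


# Stand-in objects for illustration only; real entries come from
# client.get_latest_versions("xgb_california").
fake_models = [
    SimpleNamespace(version="1", current_stage="None"),
    SimpleNamespace(version="2", current_stage="Staging"),
]
print(versions_in_stage(fake_models, "Staging"))
```

An empty list means no version is in that stage yet, which matches the output shown earlier, where the only registered version has `current_stage='None'`.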
