<a href="https://colab.research.google.com/github/ShridharBagalkote/python/blob/main/MLOps_workflow_shri.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Project Overview: Wine Quality Prediction using MLOps**

- Objective: Implement an end-to-end MLOps pipeline that predicts wine quality from physicochemical properties.

- Dataset: The Wine Quality dataset (red wine) from the UCI Machine Learning Repository.

**Key Steps**

- Data Preprocessing: Cleaning the data and engineering features before training.

- Model Development: Training and evaluating multiple models to select the best one.

- Pipeline Automation: Integrating data preprocessing, model training, and evaluation into a CI/CD pipeline.

- Deployment: Deploying the final model as a web service using MLOps practices.

$Step-1$: **Install required packages**

In [4]:
!mlflow

/bin/bash: line 1: mlflow: command not found


In [5]:
!pip install  mlflow

Collecting mlflow
  Downloading mlflow-2.19.0-py3-none-any.whl.metadata (30 kB)
Collecting mlflow-skinny==2.19.0 (from mlflow)
  Downloading mlflow_skinny-2.19.0-py3-none-any.whl.metadata (31 kB)
Collecting alembic!=1.10.0,<2 (from mlflow)
  Downloading alembic-1.14.0-py3-none-any.whl.metadata (7.4 kB)
Collecting docker<8,>=4.0.0 (from mlflow)
  Downloading docker-7.1.0-py3-none-any.whl.metadata (3.8 kB)
Collecting graphene<4 (from mlflow)
  Downloading graphene-3.4.3-py2.py3-none-any.whl.metadata (6.9 kB)
Collecting gunicorn<24 (from mlflow)
  Downloading gunicorn-23.0.0-py3-none-any.whl.metadata (4.4 kB)
Collecting databricks-sdk<1,>=0.20.0 (from mlflow-skinny==2.19.0->mlflow)
  Downloading databricks_sdk-0.40.0-py3-none-any.whl.metadata (38 kB)
Collecting Mako (from alembic!=1.10.0,<2->mlflow)
  Downloading Mako-1.3.8-py3-none-any.whl.metadata (2.9 kB)
Collecting graphql-core<3.3,>=3.1 (from graphene<4->mlflow)
  Downloading graphql_core-3.2.5-py3-none-any.whl.metadata (10 kB)
Colle

In [6]:
!mlflow

Usage: mlflow [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  artifacts    Upload, list, and download artifacts from an MLflow...
  db           Commands for managing an MLflow tracking database.
  deployments  Deploy MLflow models to custom targets.
  doctor       Prints out useful information for debugging issues with MLflow.
  experiments  Manage experiments.
  gc           Permanently delete runs in the `deleted` lifecycle stage.
  models       Deploy MLflow models locally.
  recipes      Run MLflow Recipes and inspect recipe results.
  run          Run an MLflow project from the given URI.
  runs         Manage runs.
  sagemaker    Serve models on SageMaker.
  server       Run the MLflow tracking server.


In [7]:
!mlflow --version

mlflow, version 2.19.0


$Step-2$: **Import required packages**

In [13]:
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score
import mlflow
import mlflow.sklearn

In [12]:
# Model choice: ElasticNet, a regression model

# This is supervised learning: the data has both inputs and an output

# The output is numerical, so this is a regression problem

# Name of the model: ElasticNet
# Data: wine quality data
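
The two hyperparameters passed to ElasticNet later in this notebook, `alpha` and `l1_ratio`, control how strongly the coefficients are penalized and how that penalty is divided between an L1 and an L2 term. As a rough sketch, the objective that scikit-learn documents for `ElasticNet` can be written out in NumPy. The helper name `elastic_net_objective` and the toy arrays are illustrative assumptions, not part of the notebook, and the intercept is left out for brevity.

```python
import numpy as np

def elastic_net_objective(X, y, w, alpha, l1_ratio):
    """Penalized loss as documented for sklearn.linear_model.ElasticNet (no intercept)."""
    n_samples = X.shape[0]
    residual = y - X @ w
    # Squared-error term, scaled by 1 / (2 * n_samples)
    squared_error = (residual @ residual) / (2 * n_samples)
    # L1 part of the penalty: alpha * l1_ratio * ||w||_1
    l1_penalty = alpha * l1_ratio * np.sum(np.abs(w))
    # L2 part of the penalty: 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    l2_penalty = 0.5 * alpha * (1 - l1_ratio) * np.sum(w ** 2)
    return squared_error + l1_penalty + l2_penalty

# Toy inputs, made up purely for illustration
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 1.0])

loss = elastic_net_objective(X, y, w, alpha=0.4, l1_ratio=0.5)
lasso_like = elastic_net_objective(X, y, w, alpha=0.4, l1_ratio=1.0)  # pure L1 penalty
ridge_like = elastic_net_objective(X, y, w, alpha=0.4, l1_ratio=0.0)  # pure L2 penalty
```

With `l1_ratio=1.0` only the L1 term remains, and with `l1_ratio=0.0` only the L2 term remains, so `l1_ratio` slides the model between those two extremes while `alpha` scales the whole penalty.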

$Step-3$: **Read the data**

In [15]:
data= pd.read_csv("/content/winequality_red.csv")
data.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


In [16]:
# This is supervised learning
# The output column, quality, holds numerical data
# So this is a regression problem

In [17]:
data.shape

# This dataset has 1599 rows and 12 columns
# 11 columns are input columns (X)
# 1 column, quality, is the output/target (y)

(1599, 12)

In [18]:
data.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')

In [19]:
# The goal is model development and deployment
# Before developing a model, confirm that every column holds numerical data
# because the model's math works only on numbers
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


In [20]:
data.isnull().sum()

Unnamed: 0,0
fixed acidity,0
volatile acidity,0
citric acid,0
residual sugar,0
chlorides,0
free sulfur dioxide,0
total sulfur dioxide,0
density,0
pH,0
sulphates,0
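
The two checks above (every column numeric, no missing values) can also be bundled into one helper that reports problem columns instead of requiring a visual scan of the output. This is a minimal sketch: the function name `check_model_ready` and the small stand-in frame are assumptions for illustration. In the notebook the helper would be called on `data`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def check_model_ready(df):
    """Return (names of non-numeric columns, names of columns with missing values)."""
    non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
    missing_counts = df.isnull().sum()
    with_missing = missing_counts[missing_counts > 0].index.tolist()
    return non_numeric, with_missing

# Stand-in frame with made-up values: one missing number and one text column
sample = pd.DataFrame({
    "alcohol": [9.4, 9.8, None],
    "quality": [5, 5, 6],
    "label": ["a", "b", "c"],
})

non_numeric, with_missing = check_model_ready(sample)
print("non-numeric columns:", non_numeric)
print("columns with missing values:", with_missing)
```

For the wine data, both lists would be expected to come back empty, given the `data.info()` and `data.isnull().sum()` outputs shown above.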


$Step-4$: **Model development using MLflow**

- We divide the data into two parts: train_data and test_data.

- We divide train_data into X_train and y_train.

- We divide test_data into X_test and y_test.

- The model is developed on train_data.

- The model predicts on X_test, which gives y_predictions.

- Finally, the predictions are compared with y_test.

- The data is split in a 70:30 ratio:

- 70% of the 1599 rows (1119 rows) ========== train_data

- 30% of the 1599 rows (480 rows) ========== test_data



In [21]:
train,test=train_test_split(data,test_size=0.3,random_state=1234)
train.shape,test.shape

((1119, 12), (480, 12))

In [22]:
train.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
642,9.9,0.54,0.45,2.3,0.071,16.0,40.0,0.9991,3.39,0.62,9.4,5
678,8.3,0.78,0.1,2.6,0.081,45.0,87.0,0.9983,3.48,0.53,10.0,5
412,7.1,0.735,0.16,1.9,0.1,15.0,77.0,0.9966,3.27,0.64,9.3,5
73,8.3,0.675,0.26,2.1,0.084,11.0,43.0,0.9976,3.31,0.53,9.2,4
985,7.4,0.58,0.0,2.0,0.064,7.0,11.0,0.99562,3.45,0.58,11.3,6


In [23]:
# Now split train and test into inputs (x) and target (y)
train_x=train.drop(['quality'],axis=1)
test_x=test.drop(['quality'],axis=1)

train_y=train[['quality']]
test_y=test[['quality']]


In [24]:
train_x.shape,train_y.shape

((1119, 11), (1119, 1))

In [26]:
test_x.shape,test_y.shape

((480, 11), (480, 1))

$Step-5$: **Start MLflow**

In [27]:
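# MLflow is an open-source platform for tracking experiments and managing models
# I want to develop a model and deploy it
# Each team member logs their models to MLflow, for example:
# Omkar creates model1 ----- logged to mlflow
# Abhishek creates model1 ========= logged to mlflow

# mlflow groups these runs in a workspace
# called an experiment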

In [28]:
mlflow.set_experiment("/mlflow/mlops_workspace_shridhar")

2024/12/21 11:29:08 INFO mlflow.tracking.fluent: Experiment with name '/mlflow/mlops_workspace_shridhar' does not exist. Creating a new experiment.


<Experiment: artifact_location='file:///content/mlruns/807102866266784869', creation_time=1734780548470, experiment_id='807102866266784869', last_update_time=1734780548470, lifecycle_stage='active', name='/mlflow/mlops_workspace_shridhar', tags={}>

In [29]:
# Call set_experiment once and reuse the returned Experiment object
experiment=mlflow.set_experiment("/mlflow/mlops_workspace_shridhar")
print(experiment.experiment_id)
print(experiment.lifecycle_stage)
print(experiment.name)

807102866266784869
active
/mlflow/mlops_workspace_shridhar


In [30]:
def train_model(alpha,l1_ratio):
    #=========Develop train test split=================

    train,test=train_test_split(data,test_size=0.3,random_state=1234)
    train_x=train.drop(['quality'],axis=1)
    test_x=test.drop(['quality'],axis=1)
    train_y=train[['quality']]
    test_y=test[['quality']]

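    # ================ Initiate mlflow==================
    # Look up the experiment by name instead of hardcoding its ID (MLflow expects the ID as a string)
    experiment=mlflow.set_experiment("/mlflow/mlops_workspace_shridhar")
    with mlflow.start_run(experiment_id=experiment.experiment_id,run_name='Regression',description='Performing a regression and analysis'):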
      #=================== call ml model====================
      lr=ElasticNet(alpha=alpha,l1_ratio=l1_ratio)
      lr.fit(train_x,train_y)

      #===============Model prediction===================
      predicted_data=lr.predict(test_x)

      #==================Model evaluation================

      rmse=np.sqrt(mean_squared_error(test_y,predicted_data))
      mae=mean_absolute_error(test_y,predicted_data)
      r2=r2_score(test_y,predicted_data)
      print("rmse:", rmse)
      print("mae:",mae)
      print("r2:",r2)

      #=====================Log the metrics, parameters and model on mlflow=====
      mlflow.log_param("alpha",alpha)
      mlflow.log_param("l1_ratio",l1_ratio)

      mlflow.log_metric("rmse",rmse)
      mlflow.log_metric("mae",mae)
      mlflow.log_metric("r2",r2)

      mlflow.sklearn.log_model(lr,"model",registered_model_name='ElasticNet')


In [33]:
train_model(0.4,0.5)

rmse: 0.6940832047245566
mae: 0.5466837260017379
r2: 0.1552271885700255


Registered model 'ElasticNet' already exists. Creating a new version of this model...
Created version '3' of model 'ElasticNet'.
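
To make the three logged numbers easier to interpret, the sketch below recomputes RMSE, MAE, and R² by hand with NumPy. The helper name `regression_metrics` and the sample scores are made up for illustration; the notebook itself uses the scikit-learn functions imported in Step-2.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R^2 from scratch."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_true - y_pred
    rmse = np.sqrt(np.mean(errors ** 2))   # typical error size, penalizes large misses
    mae = np.mean(np.abs(errors))          # average absolute miss
    ss_res = np.sum(errors ** 2)           # variation the model fails to explain
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

# Made-up quality scores and predictions, for illustration only
y_true = [5, 6, 5, 7]
y_pred = [5.5, 5.5, 5.5, 6.5]

rmse, mae, r2 = regression_metrics(y_true, y_pred)
print("rmse:", rmse, "mae:", mae, "r2:", r2)
```

Read this way, the run above with `r2` of about 0.155 explains roughly 15.5% of the variation in test-set quality, and its `rmse` of about 0.69 is measured on the same scale as the quality score.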


In [36]:
# Next, view these runs in the MLflow user interface (UI)

In [38]:
# To open the MLflow UI from a Google Colab notebook,
# we need a tunnel that exposes the local port

# Tunnel tool ======== ngrok

!pip install pyngrok

Collecting pyngrok
  Downloading pyngrok-7.2.2-py3-none-any.whl.metadata (8.4 kB)
Downloading pyngrok-7.2.2-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.2


In [39]:
# Create a tunnel
from pyngrok import ngrok
ngrok_tunnel=ngrok.connect(addr='5000',proto='http')
print("Tracking uri:",ngrok_tunnel.public_url)



ERROR:pyngrok.process.ngrok:t=2024-12-21T12:26:29+0000 lvl=eror msg="failed to reconnect session" obj=tunnels.session err="authentication failed: Usage of ngrok requires a verified account and authtoken.\n\nSign up for an account: https://dashboard.ngrok.com/signup\nInstall your authtoken: https://dashboard.ngrok.com/get-started/your-authtoken\r\n\r\nERR_NGROK_4018\r\n"
ERROR:pyngrok.process.ngrok:t=2024-12-21T12:26:29+0000 lvl=eror msg="session closing" obj=tunnels.session err="authentication failed: Usage of ngrok requires a verified account and authtoken.\n\nSign up for an account: https://dashboard.ngrok.com/signup\nInstall your authtoken: https://dashboard.ngrok.com/get-started/your-authtoken\r\n\r\nERR_NGROK_4018\r\n"
ERROR:pyngrok.process.ngrok:t=2024-12-21T12:26:29+0000 lvl=eror msg="terminating with error" obj=app err="authentication failed: Usage of ngrok requires a verified account and authtoken.\n\nSign up for an account: https://dashboard.ngrok.com/signup\nInstall your aut

PyngrokNgrokError: The ngrok process errored on start: authentication failed: Usage of ngrok requires a verified account and authtoken.\n\nSign up for an account: https://dashboard.ngrok.com/signup\nInstall your authtoken: https://dashboard.ngrok.com/get-started/your-authtoken\r\n\r\nERR_NGROK_4018\r\n.

In [46]:
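import os
from pyngrok import ngrok
#================Close any tunnels that are already open===============
ngrok.kill()
# Read the authtoken from an environment variable (NGROK_AUTHTOKEN is an assumed name); never hardcode it in a shared notebook
auth_token=os.environ["NGROK_AUTHTOKEN"]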

ngrok.set_auth_token(auth_token)

ngrok_tunnel=ngrok.connect(addr="5000",proto="http")
print("Tracking uri:",ngrok_tunnel.public_url)

Tracking uri: https://adca-34-125-33-65.ngrok-free.app


In [None]:
!mlflow ui

[2024-12-21 13:32:35 +0000] [46132] [INFO] Starting gunicorn 23.0.0
[2024-12-21 13:32:35 +0000] [46132] [INFO] Listening at: http://127.0.0.1:5000 (46132)
[2024-12-21 13:32:35 +0000] [46132] [INFO] Using worker: sync
[2024-12-21 13:32:35 +0000] [46133] [INFO] Booting worker with pid: 46133
[2024-12-21 13:32:35 +0000] [46134] [INFO] Booting worker with pid: 46134
[2024-12-21 13:32:35 +0000] [46139] [INFO] Booting worker with pid: 46139
[2024-12-21 13:32:35 +0000] [46140] [INFO] Booting worker with pid: 46140
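
Running `!mlflow ui` keeps the cell busy for as long as the server is up, so no later cell can run in the meantime. One possible workaround is to start the server as a background process. This is a minimal sketch: the helper names `mlflow_ui_command` and `start_mlflow_ui` are assumptions for illustration, and port 5000 is chosen to match the ngrok tunnel opened earlier.

```python
import subprocess

def mlflow_ui_command(port=5000, host="127.0.0.1"):
    """Build the command line for the MLflow UI server."""
    return ["mlflow", "ui", "--host", host, "--port", str(port)]

def start_mlflow_ui(port=5000):
    """Launch the UI in the background and return the process handle."""
    return subprocess.Popen(
        mlflow_ui_command(port),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

# Usage (not executed here):
#   server = start_mlflow_ui()   # returns immediately, the notebook stays usable
#   ...                          # open the ngrok URL while the server is running
#   server.terminate()           # stop the server when finished
command = mlflow_ui_command()
print(" ".join(command))
```

Keeping the process handle makes it possible to shut the server down cleanly with `terminate()` instead of interrupting the cell.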


In [44]:
import mlflow
logged_model = 'runs:/89ac8089decb4a89a5089b9ee653bf83/model'

# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)

# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(test_x))

array([5.62819943, 5.58487094, 5.71948348, 5.78147552, 5.86244829,
       5.551803  , 5.80027927, 5.93912114, 5.66008195, 6.0809841 ,
       5.29031265, 5.4462568 , 5.79314019, 5.60654462, 5.67310506,
       5.4361623 , 5.14025254, 5.54448201, 5.6992845 , 5.15313546,
       5.06937031, 5.60378506, 5.45639998, 5.68895735, 5.78155307,
       5.80620707, 5.51415174, 5.25874731, 5.63431516, 5.68024811,
       5.23255392, 5.89280838, 5.29464238, 5.70638679, 5.51420344,
       5.40741314, 5.98105344, 5.54027456, 5.60952795, 5.9159041 ,
       5.35666724, 5.74096924, 5.71074839, 5.54881475, 5.49560063,
       5.36693777, 5.05203634, 5.87256071, 5.63886675, 5.48099831,
       5.33070158, 5.69470817, 5.92753102, 5.77870201, 5.67450823,
       5.61958661, 5.91157533, 5.58630789, 5.88130278, 5.56326086,
       5.74829323, 5.16332733, 5.29041605, 5.62537723, 5.32071049,
       5.79454637, 5.58196421, 5.93482123, 5.37708588, 5.79740536,
       5.8855808 , 5.79890987, 5.51268497, 5.9072724 , 5.25874