<a href="https://akademie.datamics.com/kursliste/">![title](../screenshots/bg_datamics_top.png)</a>

<center><em>© Datamics</em></center><br><center><em>Check out our courses on <a href='https://akademie.datamics.com/kursliste/'>www.akademie.datamics.com</a></em></center>

<div class="alert alert-info">
    <h1>  Dataset: Wine Quality prediction </h1>
</div>

*This dataset is related to the red variant of the Portuguese "Vinho Verde" wine. Due to privacy and logistic issues, only physicochemical (input) and sensory (output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).*

*The dataset can be viewed as a classification or a regression task. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).*

**Input Features:** 
- Fixed acidity 
- Volatile acidity
- Citric acid
- Residual sugar
- Chlorides
- Free sulfur dioxide
- Total sulfur dioxide
- Density
- pH
- Sulphates
- Alcohol

**Output column:** 
- Quality (score between 0 and 10)


Data is available at: https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009?resource=download

<div class="alert alert-info"> 
<h2>Quick Recap</h2>
</div>

**We learned how to ...**

1. Run the MLflow server
2. Train the regression model with MLflow
    - Add the MLflow loggers
    - Create experiments
    - Add run names and tags to each run
    - Add an artifact store to log the model binaries in a standard format

<div class="alert alert-info"> 
<h2>What we will learn now</h2>
</div>

<div class="alert alert-warning">
    <h3> 1. Adding backend store</h3>
</div>

- Check the MLflow dashboard and try to register models
- Create a SQLite database to back the MLflow model registry
- Create and run the MLflow server with a backend store
- Set the model registry using `mlflow.set_registry_uri()`
- Optional, for a cloud solution:
    - Set the tracking URI to a particular host using `mlflow.set_tracking_uri()`

### 1.0 Install SQLite (the `sqlite3` module has shipped with Python since version 2.5)

In [1]:
# No pip install is needed: the sqlite3 module is part of the Python standard library
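
A quick way to confirm that SQLite support is present is to import the standard-library `sqlite3` module and print the version of the SQLite engine it is linked against. This is a minimal check and not part of the original notebook.

```python
# sqlite3 ships with the Python standard library, so a plain import is enough
import sqlite3

# Version of the SQLite engine that this Python build is linked against
print("SQLite engine version:", sqlite3.sqlite_version)
```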

### 1.1 Create a SQLite database
1. Create a new directory `sqlite_backend_store` in the current MLflow folder
2. Open a terminal, go to the newly created folder, and run `sqlite3 backend_store.db`
3. Exit the SQLite shell with `.quit` (or Ctrl+D)

![title](../screenshots/sqlite3_db_creation.png)
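
If the `sqlite3` command-line shell is not available, the same database file can probably be created from Python instead. The sketch below is one possible way to do it. The helper name `create_backend_store` and the temporary demo directory are assumptions made for illustration, so replace the directory with your own MLflow folder.

```python
import sqlite3
import tempfile
from pathlib import Path


def create_backend_store(directory, filename="backend_store.db"):
    """Create an empty SQLite database file and return its path."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    db_file = folder / filename
    # Opening a connection creates the database file if it does not exist yet
    conn = sqlite3.connect(str(db_file))
    conn.close()
    return db_file


# Demo in a temporary directory (hypothetical location, replace with your own)
demo_dir = Path(tempfile.mkdtemp()) / "sqlite_backend_store"
db_file = create_backend_store(demo_dir)
print("Database file:", db_file)
```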

### 1.2 SQLite URI
`sqlite:///path_to_db.db`
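
The number of slashes depends on whether the path is relative or absolute: the prefix is always `sqlite:///`, and an absolute POSIX path adds its own leading `/`, which gives four slashes in total. The helper `sqlite_uri` and both example paths below are hypothetical and only illustrate this.

```python
from pathlib import PurePosixPath


def sqlite_uri(db_path):
    """Return a SQLite URI of the form sqlite:///<path> for a POSIX-style path."""
    # Relative path -> three slashes; absolute path (starts with "/") -> four
    return "sqlite:///" + str(PurePosixPath(db_path))


print(sqlite_uri("backend_store.db"))
print(sqlite_uri("/tmp/sqlite_backend_store/backend_store.db"))
```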

<div class="alert alert-warning"> 
    <p> Start the mlflow server (from Terminal) </p> 
    Command: <code>mlflow ui</code>
</div>

In [9]:
# Create the DB URI and add it in the tracking and registry URI

# Import the required libraries
import mlflow

import pandas as pd
import numpy as np 

from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Evaluation method
def eval_metrics(ground_truth, pred):
    
    rmse = np.sqrt(metrics.mean_squared_error(ground_truth, pred))
    mae = metrics.mean_absolute_error(ground_truth, pred)
    r2 = metrics.r2_score(ground_truth, pred)
    
    return rmse, mae, r2

# Read the wine-quality csv file
df = pd.read_csv("../data/winequality-red.csv")
np.random.seed(40)

# Split the data into training and test sets. (0.75, 0.25) split.
X = df.drop('quality',axis=1)
y = df['quality']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=43)

### Train the model ###

#######################################################
################### MLflow code #######################
#######################################################

# setting the experiment details
experiment_name = "Experiment-7"
current_run_name = "With new Backend store"
location = "/Users/saumyagoyal/JupyterNotebook/Datamics/MLCon_Berlin/local_artifact_store"
db_path = "/Users/saumyagoyal/JupyterNotebook/Datamics/MLCon_Berlin/sqlite_backend_store/backend_store.db"
db_uri = "sqlite:///"+db_path

# adding tags for each run
tags = {"Demo": "True",
        "created-by": "dev team ID"}

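# Set the tracking and registry URI first, so that the experiment is created in the SQLite backend store
mlflow.set_tracking_uri(db_uri)
mlflow.set_registry_uri(db_uri)

# Create the experiment only if it does not exist yet (so the cell can be re-run), then activate it
if mlflow.get_experiment_by_name(experiment_name) is None:
    mlflow.create_experiment(experiment_name, artifact_location=location)
mlflow.set_experiment(experiment_name)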

with mlflow.start_run(run_name=current_run_name):
    
    alpha = 0.8
    l1 = 0.4
    
    lr = ElasticNet(alpha=alpha, l1_ratio=l1)
    lr.fit(X_train,y_train)

    # Get predictions on the training dataset
    y_pred = lr.predict(X_train)

    # Check model performance on the training data
    (rmse, mae, r2) = eval_metrics(y_train, y_pred)
    print(f"Dataset: Training \nRMSE:{rmse}\nMAE: {mae}\nR2:{r2}")

    ###################### Logging code ######################
    
    #log parameters of the model
    mlflow.log_param("alpha", alpha)
    mlflow.log_param("l1_ratio", l1)
    
    # log metrics 
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2", r2)
    mlflow.log_metric("mae", mae)
    
    # log tags
    mlflow.set_tags(tags)

    # logging the model in artifact store
    mlflow.sklearn.log_model(lr, "Linear regression model 1")
    
    # check the storage location
    print("\n\nTracking URI: {}".format(mlflow.get_tracking_uri()))
    print("Artifact Location: {}".format(mlflow.get_artifact_uri()))
    print("--- Model logged successfully ---")

Dataset: Training 
RMSE:0.7800026855881376
MAE: 0.6367433839218013
R2:0.06818906420201365


Tracking URI: sqlite:////Users/saumyagoyal/JupyterNotebook/Datamics/MLCon_Berlin/sqlite_backend_store/backend_store.db
Artifact Location: /Users/saumyagoyal/JupyterNotebook/Datamics/MLCon_Berlin/local_artifact_store/90d9544bfc5f48a89d530bceea0404b9/artifacts
--- Model logged successfully ---


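<font color=#FF0000>**Note:** The run above is not logged to the previous MLflow dashboard</font><a href='http://127.0.0.1:5000'> http://127.0.0.1:5000 </a>, because the server started with `mlflow ui` still reads from its default store and not from the new SQLite database.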
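
To confirm that the run really ended up in the SQLite file, you can list the tables in the database. The sketch below uses only the standard library. The helper `list_tables` and the demo table `demo_runs` are hypothetical. On a real backend store you would expect to see the tables MLflow created, and their exact names depend on the MLflow version.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the sorted names of all tables in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demo on a throwaway database with one hypothetical table
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE demo_runs (id INTEGER PRIMARY KEY)")
conn.commit()
conn.close()

print(list_tables(demo_path))
```

For the notebook's database, call `list_tables(db_path)` with the `db_path` variable defined in the training cell.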

### 1.3 Create and run mlflow server

The command below starts a tracking server that uses the non-default artifact store and backend store created above:

`mlflow server --default-artifact-root /Users/saumyagoyal/JupyterNotebook/Datamics/MLCon_Berlin/local_artifact_store  --backend-store-uri sqlite:////Users/saumyagoyal/JupyterNotebook/Datamics/MLCon_Berlin/sqlite_backend_store/backend_store.db --host 127.0.0.1 --port 5500 >> mlflow_server_log.txt`

<font color=#FF0000>**Note:** You can also run the MLflow server on a remote machine with the above command by pointing the backend store (for example, an AWS RDS instance) and the artifact store (for example, an AWS S3 bucket) to the corresponding locations.</font>

![title](../screenshots/run_mlflow_server.png)
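
Because the server command is long, it can help to assemble it from variables. The sketch below only builds and prints the command, using the same flags as above. The helper `build_server_command` and the `/path/to/...` locations are placeholders, not the notebook's real paths.

```python
import shlex


def build_server_command(artifact_root, backend_uri, host="127.0.0.1", port=5500):
    """Return the `mlflow server` command as a list of arguments."""
    return [
        "mlflow", "server",
        "--default-artifact-root", artifact_root,
        "--backend-store-uri", backend_uri,
        "--host", host,
        "--port", str(port),
    ]


# Placeholder locations, replace with your own artifact store and database
cmd = build_server_command(
    "/path/to/local_artifact_store",
    "sqlite:////path/to/sqlite_backend_store/backend_store.db",
)

# Print a shell-safe version that can be pasted into a terminal
print(shlex.join(cmd))
```

The resulting list could also be passed to `subprocess.Popen(cmd)` to start the server from Python, with its output redirected to a log file if desired.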

### 1.4 Check the logs in the new tracking server
- Run the above code again
- Clear your browser history for the last hour, so that a cached copy of the old dashboard is not shown
- Go to <a href='http://127.0.0.1:5500'> http://127.0.0.1:5500 </a>

![title](../screenshots/mlflow_dashboard_backend_store.png)
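
If the dashboard does not load, a first check is whether anything is answering on the server's address at all. The sketch below is a generic HTTP reachability check using only the standard library. The helper `server_is_up` is hypothetical and does not verify that the responding process is actually MLflow.

```python
import urllib.error
import urllib.request


def server_is_up(url, timeout=2.0):
    """Return True if an HTTP GET to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, ...
        return False


# Address of the tracking server started in section 1.3
print("Tracking server reachable:", server_is_up("http://127.0.0.1:5500"))
```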

### 1.5 Explore Model Registry
- Register models in the model registry
- Create multiple versions
- Move models to various environments

![title](../screenshots/mlflow_model_registry.png)

![title](../screenshots/mlflow_registry_environments.png)
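
The same registry steps can also be scripted instead of clicked through in the UI. The sketch below assumes that the tracking and registry URI are already set as in the training cell. It uses `mlflow.register_model` and `MlflowClient.transition_model_version_stage`. The helper names and the model name in the usage comment are hypothetical. Newer MLflow releases deprecate stages in favour of aliases, so treat this as matching the stage-based UI shown in the screenshots.

```python
def build_model_uri(run_id, artifact_path):
    """Build a runs:/ URI pointing at a model logged under a given run."""
    return f"runs:/{run_id}/{artifact_path}"


def register_and_stage(run_id, artifact_path, model_name, stage="Staging"):
    """Register a logged model as a new version and move it to a stage."""
    # Imported lazily so this sketch can be loaded without MLflow installed
    import mlflow
    from mlflow.tracking import MlflowClient

    # Each call creates a new version under the registered model name
    model_version = mlflow.register_model(
        build_model_uri(run_id, artifact_path), model_name
    )
    # Move the new version to the requested stage (for example "Staging")
    MlflowClient().transition_model_version_stage(
        name=model_name, version=model_version.version, stage=stage
    )
    return model_version.version


# Example call (hypothetical model name, use the run ID of your own run):
# register_and_stage("<run_id>", "Linear regression model 1", "wine-quality-model")
```

Calling the function again for another run should create the next version of the same registered model, which corresponds to "Create multiple versions" above.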

<div class="alert alert-success">
    <h3> END </h3>
</div> 