# A SIMPLIFIED MLOPS GUIDE USING MLFLOW

MLOps is a crucial phase of the machine learning lifecycle: it keeps models maintained and performing well. [MLflow](https://mlflow.org) is an open source MLOps tool that is easy to use, offers a comprehensive ML library, and manages end-to-end ML workflows from development to production. This guide walks through a basic MLflow implementation so you can learn by doing. Let's dive into the notebook.

# 1. Install MLflow (Including libraries and dependencies)

Run this in your IDE terminal:

`pip install mlflow`
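
To confirm that the install succeeded, a quick check from Python can help. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report whether mlflow is available in the current environment.
found = installed_version("mlflow")
print(f"mlflow version: {found}" if found else "mlflow is not installed")
```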

# 2. Launch the MLflow Tracking Server

Run this in your IDE terminal:

`mlflow server --host 127.0.0.1 --port 8080`

Keep the MLflow tracking server running for the whole tutorial; closing or killing the terminal shuts the server down. The terminal output should look like [this](../MLOps/Screenshots/ss1.png) (`ss1.png` in the Screenshots folder).
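
Before moving on, it can be useful to check from Python that the server is reachable. The sketch below assumes the server exposes a `/health` endpoint at the address used above; the helper name `server_is_up` is hypothetical and only the standard library is used.

```python
from urllib.request import urlopen


def server_is_up(base_url: str = "http://127.0.0.1:8080", timeout: float = 2.0) -> bool:
    """Return True if the tracking server answers its health endpoint."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False
```

Calling `server_is_up()` with no arguments checks the host and port from the launch command above.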

# 3. Using the MLflow Client API with `MlflowClient`
The client can be used to:
1. Initiate a new Experiment.
2. Start Runs within an Experiment.
3. Document parameters, metrics, and tags for your Runs.
4. Log artifacts linked to runs, such as models, tables, plots, and more.
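
The four capabilities listed above can be sketched with the client's run-level methods. This is a minimal illustration, not a recommended workflow: the function name `log_demo_run` and every logged key and value (`n_estimators`, `rmse`, `stage`, `notes.txt`) are placeholders invented for the example. The function accepts any client object, so the `mlflow` import happens where you call it.

```python
def log_demo_run(client, experiment_id: str = "0") -> str:
    """Create a run, log a parameter, metric, tag, and text artifact, then close it."""
    run = client.create_run(experiment_id)  # start a run inside the experiment
    run_id = run.info.run_id

    client.log_param(run_id, "n_estimators", 100)         # placeholder parameter
    client.log_metric(run_id, "rmse", 0.42)               # placeholder metric
    client.set_tag(run_id, "stage", "demo")               # placeholder tag
    client.log_text(run_id, "demo notes", "notes.txt")    # small text artifact

    client.set_terminated(run_id)  # mark the run as finished
    return run_id
```

With the tracking server running, `log_demo_run(MlflowClient(tracking_uri="http://127.0.0.1:8080"))` should add one run to the Default experiment (ID `"0"`).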



## 3.1 Import Dependencies

In [6]:
from mlflow import MlflowClient
from pprint import pprint
from sklearn.ensemble import RandomForestRegressor

#### Configuring the MLflow Tracking Client

In [2]:
client = MlflowClient(tracking_uri="http://127.0.0.1:8080")

#### Default Experiment
From the moment the MLflow Tracking Server starts, any run data logged without an explicitly created experiment is automatically stored in the “Default” experiment so that it isn’t lost.

## 3.2 Searching Experiments

[mlflow.client.MlflowClient.search_experiments()](https://mlflow.org/docs/latest/python_api/mlflow.client.html#mlflow.client.MlflowClient.search_experiments)

In [7]:
all_experiments = client.search_experiments()
print(all_experiments)

The output is a list of `Experiment` objects:

[<Experiment: artifact_location='mlflow-artifacts:/0', creation_time=1736852050555, experiment_id='0', last_update_time=1736852050555, lifecycle_stage='active', name='Default', tags={}>]


To get familiar with accessing elements of the collections that MLflow APIs return, extract the `name` and `lifecycle_stage` attributes from the `search_experiments()` result into a dict.

In [8]:
default_experiment = [
    {
        "name": experiment.name, "lifecycle_stage": experiment.lifecycle_stage
    }
    for experiment in all_experiments
    if experiment.name == "Default"
][0]

pprint(default_experiment)

{'lifecycle_stage': 'active', 'name': 'Default'}
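
When only one experiment is needed, a lookup by name avoids filtering the whole list. The sketch below uses the client's `get_experiment_by_name` method, which returns `None` when no experiment has that name; the helper name `summarize_experiment` is hypothetical.

```python
def summarize_experiment(client, name: str = "Default"):
    """Look up one experiment by name and return a small summary dict."""
    experiment = client.get_experiment_by_name(name)
    if experiment is None:  # no experiment with that name exists
        return None
    return {
        "name": experiment.name,
        "lifecycle_stage": experiment.lifecycle_stage,
    }
```

Passing the `client` configured earlier, `summarize_experiment(client)` should produce the same dict as the list comprehension above.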


# 4. Creating experiments

## Viewing the MLflow UI
You can see the Default experiment, with no run data yet, at [http://127.0.0.1:8080](http://127.0.0.1:8080)

## Notes on Tags vs Experiments
While MLflow does provide a default experiment, it primarily serves as a ‘catch-all’ safety net for runs initiated without a specified active experiment. However, it’s not recommended for regular use. Instead, creating unique experiments for specific collections of runs offers numerous advantages, as we’ll explore below.

**Benefits of Defining Unique Experiments**:

1. **Enhanced Organization**: Experiments allow you to group related runs, making it easier to track and compare them. This is especially helpful when managing numerous runs, as in large-scale projects.
2. **Metadata Annotation**: Experiments can carry metadata that aids in organizing and associating runs with larger projects.

Consider the scenario below: we’re simulating participation in a large demand forecasting project. This project involves building forecasting models for various departments in a chain of grocery stores, each housing numerous products. Our focus here is the ‘produce’ department, which has several distinct items, each requiring its own forecast model. Organizing these models becomes paramount to ensure easy navigation and comparison.

**When Should You Define an Experiment?**

The guiding principle for creating an experiment is the consistency of the input data. If multiple runs use the same input dataset (even if they utilize different portions of it), they logically belong to the same experiment. For other hierarchical categorizations, using tags is advisable.


In [9]:
# Provide an Experiment description that will appear in the UI
experiment_description = (
    "This is the grocery forecasting project. "
    "This experiment contains the produce models for apples."
)

# Provide searchable tags that define characteristics of the Runs that
# will be in this Experiment
experiment_tags = {
    "project_name": "grocery-forecasting",
    "store_dept": "produce",
    "team": "stores-ml",
    "project_quarter": "Q3-2023",
    "mlflow.note.content": experiment_description,
}

# Create the Experiment, providing a unique name
produce_apples_experiment = client.create_experiment(
    name="Apple_Models1", tags=experiment_tags
)


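If the chosen name is held by a deleted experiment, `create_experiment` fails. The output below appears to come from an earlier run of this cell that used the name `Apple_Models`:

RestException: RESOURCE_ALREADY_EXISTS: Experiment 'Apple_Models' already exists in deleted state. You can restore the experiment, or permanently delete the experiment from the .trash folder (under tracking server's root folder) in order to use this experiment name again.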
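
The error message names two ways out: restore the experiment or permanently delete it. A rough sketch of the restore route follows. It assumes that `get_experiment_by_name` also returns experiments in the deleted state, so their `lifecycle_stage` can be inspected; the helper name `create_or_restore_experiment` is made up for this example.

```python
from typing import Optional


def create_or_restore_experiment(client, name: str, tags: Optional[dict] = None) -> str:
    """Return the ID of the experiment called `name`, creating or restoring it."""
    existing = client.get_experiment_by_name(name)
    if existing is None:
        # The name is free: create_experiment returns the new experiment ID.
        return client.create_experiment(name=name, tags=tags)
    if existing.lifecycle_stage == "deleted":
        # The name is held by a deleted experiment: bring it back.
        client.restore_experiment(existing.experiment_id)
    return existing.experiment_id
```

Note that a restored experiment keeps the tags it already had; the `tags` argument only applies when a new experiment is created.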

## Search based on tags
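
Experiment tags such as `project_name` can be used as search filters. The sketch below builds a filter string for `search_experiments` and wraps the tag key in backticks, as the MLflow filter syntax expects for tag names; the helper name `find_experiments_by_tag` is hypothetical.

```python
def find_experiments_by_tag(client, key: str, value: str):
    """Return the experiments whose tag `key` equals `value`."""
    # Backticks around the key allow tag names with dots or other special characters.
    filter_string = f"tags.`{key}` = '{value}'"
    return client.search_experiments(filter_string=filter_string)
```

For the tags defined in section 4, `find_experiments_by_tag(client, "project_name", "grocery-forecasting")` should return the apple models experiment once it has been created.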
