# ✍️ Exercise: Intro to MLflow - Part II

Now that we've learned how to log metrics, parameters, and artifacts, we can use MLflow to track our experiments and compare different models. In this exercise, we'll generate some fake data and train a linear regression model on it. We'll then use MLflow to record the model's performance along with relevant information about the training process.

In this part, we will cover the following topics:

- Create some fake data.
- Plot the data using Matplotlib.
- Split the data into training and testing sets.
- Train a linear regression model.
- Compute the error of the model.
- Log the model using MLflow.
- Log the error of the model using MLflow.
- Log the plotted data using MLflow.

First, we need some data to work with. Let's generate some fake data.

In [1]:
import numpy as np


# Mocked data
X = np.random.rand(100, 1)  # Independent variable
y = 2 * X + np.random.randn(100, 1)  # Dependent variable with some noise

## Exercise I: Plot the Data using Matplotlib

Do you remember how to plot data using `Matplotlib`? Let's do it! 🚀

1. 👉 We have X (our input) and y (our output). So we can simply plot the data using [`plt.scatter`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html).
2. 👉 Then we can save the plot using [`plt.savefig`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html).
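The two steps above can be sketched as follows. This is only one possible solution, not the reference one: the file name `data_plot.png`, the axis labels, and the seeded generator are assumptions made so the example runs on its own.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

# Recreate mocked data like the one above (seeded here only for reproducibility)
rng = np.random.default_rng(0)
X = rng.random((100, 1))  # Independent variable
y = 2 * X + rng.standard_normal((100, 1))  # Dependent variable with some noise

PLOT_PATH = Path("data_plot.png")  # hypothetical file name

# Step 1: scatter plot of the input against the output
plt.figure()
plt.scatter(X, y)
plt.xlabel("X")
plt.ylabel("y")
plt.title("Mocked data")

# Step 2: save the figure to disk so it can be logged as an artifact later
plt.savefig(PLOT_PATH)
plt.close()
```

Saving the figure to a file matters here because a file on disk is what gets logged as an artifact in the last exercise.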

In [4]:
import matplotlib.pyplot as plt


# 👇 Add the relevant code below to plot the data





## Exercise II: Split the Data into Train and Test Sets

💡 Remember that we need to split our data into train and test sets. We can use the [`train_test_split` function](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) from `sklearn.model_selection` to do this. Store the result in `X_train`, `X_test`, `y_train`, and `y_test`; note that `train_test_split(X, y)` returns the splits in exactly that order.
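A minimal sketch of the split could look like this. The `test_size=0.2` and `random_state=42` values are illustrative assumptions, not requirements of the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Recreate mocked data like the one above (seeded here only for reproducibility)
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 2 * X + rng.standard_normal((100, 1))

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```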

In [None]:
from sklearn.model_selection import train_test_split

# 👇 Add the relevant code below to split the data into training and testing sets





## Exercise III: Train a Linear Regression Model

Then, train a [**linear regression model** using the scikit-learn library](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html).

1. 👉 Initialize the model calling the `LinearRegression` class.
2. 👉 Train the model using the `fit` method.
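The two steps above might be written as in the sketch below. The data generation and split are repeated (with assumed seed and split values) only to keep the example self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Recreate the mocked data and the split (values chosen for illustration)
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 2 * X + rng.standard_normal((100, 1))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 1: initialize the model with its default settings
model = LinearRegression()

# Step 2: fit the model on the training split only
model.fit(X_train, y_train)

# The learned slope should land near 2, up to the noise in the data
print("slope:", model.coef_, "intercept:", model.intercept_)
```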

In [3]:
from sklearn.linear_model import LinearRegression

# Add code to train the model 👇





## Exercise IV: Compute the Error of the Model

Finally, measure how well the model performs using the [`mean_squared_error` function](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html) from the `sklearn.metrics` module. Note that the mean squared error (MSE) is an error metric rather than an accuracy: lower values are better.

1. 👉 Compute the predictions by passing `X_test` to the `predict` method of the model.
2. 👉 Compute the MSE by calling the `mean_squared_error` function with `y_test` and the `predictions` as arguments.
3. 👉 Print the MSE.
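One way to carry out these three steps is sketched below. As before, the data, seed, and split values are assumptions repeated only so the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Recreate the mocked data, the split, and the trained model
rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 2 * X + rng.standard_normal((100, 1))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Step 1: predict on the held-out test inputs
predictions = model.predict(X_test)

# Step 2: compare true and predicted values (lower is better)
mse = mean_squared_error(y_test, predictions)

# Step 3: print the result
print(f"Mean squared error: {mse:.4f}")
```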

In [9]:
from sklearn.metrics import mean_squared_error

# Add code to calculate the mean squared error 👇





## Exercise V: Create a Run and Log the Model and Metrics

1. 👉 Think. We've computed the MSE of the model. Would you log it as a parameter or as a metric?
2. 👉 Think. We've created a plot. What kind of data is it? How would you log it?
3. 👉 Log the model using the `mlflow.sklearn.log_model` function.
4. 👉 Extra: Log the signature of the model.

In [None]:
import mlflow


EXPERIMENT_NAME = "intro-to-mlflow"


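# set_experiment creates the experiment if it does not exist yet, whereas
# get_experiment_by_name would return None and break the lookup below
experiment = mlflow.set_experiment(EXPERIMENT_NAME)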


with mlflow.start_run(
    experiment_id=experiment.experiment_id,
) as run:
    
    # Add code to log the model, the mean squared error, the plot, and the model parameters 👇

    pass