# 1st Lesson - [Experiment tracking intro](https://www.youtube.com/watch?v=MiA7LQin9c8&list=PL3MmuxUbc_hIUISrluw_A7wDSmfOhErJK)

Important concepts:
1. ML experiment: the process of building an ML model;
2. Experiment run: each trial in an ML experiment;
3. Run Artifact: any file that is associated with an ML run;
4. Experiment metadata

### 1. ML experiment

- Keep track of relevant information from an ML experiment, which includes:
    1. Source code;
    2. Environment;
    3. Data;
    4. Model;
    5. Hyperparameters;
    6. Metrics;
    ...

In general, we track experiments for three main reasons:
- Reproducibility;
- Organization;
- Optimization.

**MLflow**

In practice, it's just a Python package that can be installed with pip, and it contains four main modules:
1. Tracking; 
2. Models;
3. Model Registry;
4. Projects (out of scope).


MLflow Tracking organizes your experiments into runs and keeps track of:
- Parameters (can include custom parameters, such as the path to your training data or the preprocessing used);
- Metrics;
- Metadata (you can add tags to easily search for previous runs);
- Artifacts (e.g. visualizations; you can also log the dataset, but that does not scale very well);
- Models (logging every model makes sense in some situations, but when you are running hyperparameter tuning you don't want to save all the models, just the parameters of the best one).

Along with this information, MLflow automatically logs extra information about the run:
- Source code;
- Version of the code (git commit);
- Start and end time;
- Author

### Quick demo

`! mlflow ui` - Runs the experiment tracking UI locally; open it in your browser (in the output below it listens at http://127.0.0.1:5000)

In [4]:
! mlflow ui

[2022-05-23 16:42:48 +0100] [27468] [INFO] Starting gunicorn 20.1.0
[2022-05-23 16:42:48 +0100] [27468] [INFO] Listening at: http://127.0.0.1:5000 (27468)
[2022-05-23 16:42:48 +0100] [27468] [INFO] Using worker: sync
[2022-05-23 16:42:48 +0100] [27469] [INFO] Booting worker with pid: 27469
^C
[2022-05-23 16:45:31 +0100] [27468] [INFO] Handling signal: int
[2022-05-23 16:45:31 +0100] [27469] [INFO] Worker exiting (pid: 27469)


# 2nd Lesson - [Getting started with MLflow](https://www.youtube.com/watch?v=cESCQE9J3ZE&list=PL3MmuxUbc_hIUISrluw_A7wDSmfOhErJK&index=10)

1. Create your virtual environment: `python3.9 -m venv ~/.virtualenvs/mlcourse02`
2. Update pip: `pip install -U pip`
3. Install requirements: `pip install -r requirements.txt`

Requirements:
- mlflow, 
- jupyter, 
- scikit-learn,
- pandas,
- seaborn, 
- hyperopt,
- xgboost.

4. We can now run `mlflow ui`. Instead of the plain command, we will tell MLflow which backend store to use for the experiment data (runs, parameters, metrics) on our local computer, by running one of the following:
- `mlflow ui --backend-store-uri sqlite:///mlflow.db`, which tells MLflow to store that data locally in a SQLite database

OR

- `mlflow ui --backend-store-uri sqlite:///mlflow.db --default-artifact-root file:/home/<user>/mlruns -h 0.0.0.0 -p 8000`

    4.1 If you are unable to find the SQLite database, then you need to create one. Follow the first two instructions in this [link](https://github.com/FDelca/mlflow_guidelines/blob/master/MLflow%20Guidelines.ipynb) for that.

Go to [Week2-LearningExercises]() to see the exercise for this lesson.

In [2]:
!python -V

Python 3.9.12


**mlflow ui**

![Screenshot%20from%202022-05-23%2018-18-45.png](attachment:Screenshot%20from%202022-05-23%2018-18-45.png)

- Source: MLflow was not able to get the name of the Jupyter notebook; one must use [this]() workaround to obtain it.

# 3rd Lesson - [Experiment tracking with MLflow](https://www.youtube.com/watch?v=iaJz-T7VWec&list=PL3MmuxUbc_hIUISrluw_A7wDSmfOhErJK&index=11)

We will be using Hyperopt for hyperparameter tuning. Hyperopt is a Python library for serial and parallel optimization over awkward search spaces, which may include real-valued, discrete, and conditional dimensions.

Objects from **[hyperopt](https://github.com/hyperopt/hyperopt)** used:
- `fmin`: minimizes the objective function, in this case the RMSE;
- `tpe`: the algorithm chosen to control the minimization logic (Tree-structured Parzen Estimator);
- `hp`: a module that contains a bunch of functions to define the search space (the range of values for each hyperparameter);
- `STATUS_OK`: a constant returned by the objective function to tell Hyperopt that the evaluation ran successfully;
- `Trials`: keeps track of the information for each run;
- `scope`: used to define parameters of type integer.

To help you define the search space, use this [link](http://hyperopt.github.io/hyperopt/getting-started/search_spaces/), since there are a lot of options that can be used depending on the hyperparameter you want to optimize and the scope of the problem.

Questions:
1. What is the difference between `max_evals` of `fmin` and `num_boost_round = 1000` of `xgb.train()`?
- `num_boost_round`: is the same as `n_estimators`, meaning that both are the number of boosting rounds (decision trees) used. The only reason they have different names is that `n_estimators` is the parameter name in the `sklearn`-style API, used when creating an `xgb.XGBRegressor`, while `num_boost_round` is an argument of `xgb.train`.


- `max_evals`: defines the number of attempts to optimize the hyperparameters, i.e. how many hyperparameter combinations `fmin` evaluates while the chosen algorithm (in our case `tpe`) minimizes the objective function. It is normally combined with an early-stopping argument to reduce computation time.

[**Autolog mlflow**](https://www.mlflow.org/docs/latest/tracking.html#automatic-logging): allows you to log metrics, parameters, and models without the need for explicit log statements. It only works with certain libraries, such as sklearn, tensorflow/keras, xgboost, and spark. It also saves the dependencies (package versions).

**NOTE:** After using `mlflow.autolog()`, one must disable it, or all subsequent runs with that library will try to autolog everything.

One can disable it using: `mlflow.xgboost.autolog(disable=True)`

# 4th Lesson - [Model management](https://www.youtube.com/watch?v=OVUPIX88q88&list=PL3MmuxUbc_hIUISrluw_A7wDSmfOhErJK&index=13)

![Screenshot%20from%202022-05-24%2011-01-21.png](attachment:Screenshot%20from%202022-05-24%2011-01-21.png)

https://neptune.ai/experiment-tracking

When we finish the **experiment tracking** stage, we can start thinking about the **model management** stage, which includes:
1. Model Versioning;
2. Model Deployment.

We can use **mlflow** to manage our models for us.

Ways to save the model:
- Log model as an artifact

`mlflow.log_artifact(local_path="my_model", artifact_path="models")`

- Log model using the method **log_model**

`mlflow.<framework>.log_model(model, artifact_path="models")`

![Screenshot%20from%202022-05-24%2012-25-20.png](attachment:Screenshot%20from%202022-05-24%2012-25-20.png)

Questions:

2. Unable to save the model as an artifact, due to this error:

`MlflowException: Invalid artifact path: 'models_pickle/'. Names may be treated as files in certain cases, and must not resolve to other names when treated as such. This name would resolve to 'models_pickle'`

Solution:

- The model must first be saved to a local path; `log_artifact()` then takes the path of the locally saved model so that it is stored in the MLflow system. The error was solved by giving the full path to the folder where the model was stored, and not adding a folder to save our new model in the MLflow system. It was a workaround; still searching for a better solution.

# 5th Lesson - [Model Registry](https://www.youtube.com/watch?v=TKHU7HAvGH8&list=PL3MmuxUbc_hIUISrluw_A7wDSmfOhErJK&index=14)

The `MlflowClient` object allows us to interact with...
- an MLflow Tracking Server that creates and manages experiments and runs.
- an MLflow Registry Server that creates and manages registered models and model versions. 

To instantiate it, we need to pass a tracking URI and/or a registry URI.


![Screenshot%20from%202022-05-24%2014-36-10.png](attachment:Screenshot%20from%202022-05-24%2014-36-10.png)

Models can have three different stages:
1. Staging (the candidate model; it is moved to production when the current production model is archived);
2. Production (the model that is being used);
3. Archived (after production).

Three important factors to promote one model over the other:
1. Evaluation metrics;
2. Time to train;
3. Size of the model.

**NOTE:** ``If there is any problem with sklearn, please use versions lower than or equal to 1.1.0.``

**Note: the model registry doesn't actually deploy the model to production when you transition a model to the "Production" stage; it just assigns a label to that model version. You should complement the registry with some CI/CD code that does the actual deployment.**

![Screenshot%20from%202022-05-24%2017-14-39.png](attachment:Screenshot%20from%202022-05-24%2017-14-39.png)

![Screenshot%20from%202022-05-24%2017-16-04.png](attachment:Screenshot%20from%202022-05-24%2017-16-04.png)