# Notes: Overview
MLflow terminology:
- ML experiment: the process of building an ML model (trying out different models and hyperparameters).
- Experiment run: each trial in an ML experiment.
- Run artifact: any file that is associated with an ML run.
- Experiment metadata: information related to the experiment.
- Experiment tracking: keeping track of everything relevant to an experiment, such as source code, environment, data, model, hyperparameters, and metrics.

# Notes: MLflow
MLflow modules: Tracking (experiment tracking), MLflow Models (a special model format), Model Registry, MLflow Projects (not in scope).
Information Manually Logged (MLflow tracks the following per run):
- Parameters: the model's hyperparameters, plus anything else that could affect the model (e.g. the path to the training dataset or the preprocessing applied to the input data). Logging the data path, for example, lets the experiment show that you were trying different versions of the data.
- Metrics: evaluation metrics (accuracy, F1, precision, etc). Metrics can come from the training, validation, or test datasets.
- Metadata: any information related to the experiment (e.g. tags, so that you can search and filter runs).
- Artifacts: any supplementary files that are logged (e.g. a visualization). Logging the dataset itself as an artifact is not recommended, since it leads to duplicate copies of the data.
- Models: the model produced by the experiment.
Information Automatically Logged (by MLflow):
- Source code (name of the file)
- Version of code (git commit)
- Start/End Time
- Author 

# MLflow Demo:
- The MLflow CLI becomes available once MLflow is installed.
# MLflow syntax:
mlflow [OPTIONS] COMMAND [ARGS]...
- Options = --version, --help
- Command = artifacts, azureml, db, deployments, experiments, gc, models, run, runs, sagemaker, server, ui
- Run mlflow with no arguments to see a short description of each command.
MLflow UI:
- Starting the UI (mlflow ui) prints a local address. Open that address in a browser to access the experiments.
- All experiments are listed on the left-hand side.
- When creating an experiment, the UI asks for an optional artifact location, where you can specify where all of that experiment's artifacts are stored.
- Models tab = Model Registry. It returns an error if the UI is running without a database backend, so a SQL database must be specified as the backend store.

# Tells mlflow to store run metadata (parameters, metrics, tags) in a SQLite database (one option for the backend store).
# Artifacts are still written to the artifact location, not to the database.
# A database backend is necessary to use the model registry, or else we will run into the error mentioned above.
mlflow ui --backend-store-uri sqlite:///mlflow.db

# Module 2.2 Notes:
Achieved:
- Preparing the local environment, installing mlflow client
- Starting up mlflow experiment
- Navigating mlflow ui (looking at results, etc).
- Adding mlflow to notebook, logging information (parameters and metrics)

# Module 2.3 Notes:
- Adding hyperparameter tuning to the notebook and exploring the results of the hyperparameter search in the mlflow ui
- Discussing how to select the best model based on the results
- Autologging (enables logging with fewer lines of code)
- Selecting multiple runs allows us to compare the results between runs.
- To filter by tags, type tags.key = 'value' in the search box.
- Parallel coordinates plots show which combinations of hyperparameters perform the best and worst.
- Scatter plots show the relationship between a single hyperparameter and the metric (e.g. RMSE).
- Contour plots show how 2 hyperparameters relate to each other and to the metric at the same time (the metric being the elevation).
- Since we want to minimize error in this example, we reverse the color scale so that the minimal point (the minimal error) is easy to find.

Automatic Logging: works only with certain frameworks (listed below and on the MLflow autologging page).
The following libraries support autologging:
- Scikit-learn
- TensorFlow and Keras
- Gluon
- XGBoost
- LightGBM
- Statsmodels
- Spark
- Fastai
- PyTorch

Autologging using XGBoost:
- saves a plot of the model's feature importances
- saves the feature importance values themselves
- saves the model as an MLflow model (model.xgb in this case).
- saves dependencies (libraries and library versions), conda environment, pip and python versions, etc.
- saves information on how to run the model (it can be loaded as a python function or as an XGBoost model).
- Clicking on the model in the UI shows example code for loading it.

# Module 2.4 Notes: Model Management
- Source for MLOps lifecycle diagram: https://neptune.ai/blog/ml-experiment-tracking
- Experiment tracking is only one part of the MLOps life cycle. It belongs to a larger stage called Model Management.
- Model Management covers: Experiment tracking, Model versioning, Model deployment, Scaling hardware
- Once experiment tracking has led to a model we are happy with, we need to consider ways to save and version the model. Then we want to deploy the model, and potentially update it later in order to scale it.
- The last stage of the MLOps life cycle (after model management) is Prediction Monitoring.
- The most basic way of saving a model in MLflow is to log the saved model file as an artifact:
- mlflow.log_artifact(local_path=path of the file on disk, artifact_path=folder inside the run's artifacts where the file will be stored)
- mlflow.log_params(dictionary) is another way of logging parameters. Notice that params is plural: pass in a dictionary of parameters and it logs them all at once.
- The second way of logging models with MLflow: mlflow.framework.log_model(model, artifact_path=folder inside the run's artifacts where the model will be saved). Here framework is a placeholder for the framework you used to train the model, e.g. mlflow.xgboost for XGBoost in this case, and model is the variable holding the trained model (booster).
- MLmodel = the MLflow model file. It stores information about the model itself: artifact_path = where the model is stored, flavors = the ways in which the model can be loaded.
- Save the preprocessor as an artifact too, so that it can be loaded later to preprocess prediction data before passing it to the model.
To make predictions using the model:
- The first step is to pass MLflow the model URI (uniform resource identifier). Basically, it points to a run -> run ID -> model folder inside the run's artifacts.
- mlflow.pyfunc.load_model('runs:/run_ID/model_name') then loads the model as a generic python function that accepts, e.g., a pandas DataFrame; mlflow.pyfunc.spark_udf(spark, 'runs:/run_ID/model_name') wraps it as a Spark UDF instead. Both are examples of loading the model as a python function.
- The second method is to load it into the framework (creating an object from that framework, as if you had loaded the model with that library): mlflow.framework.load_model('runs:/run_ID/model_name')
- In this example, it was: xgboost_model = mlflow.xgboost.load_model('runs:/088f9cd07a9f49f6b23b141076902c53/models_mlflow')
- By loading the model into the framework, you create an object of the framework's own type (e.g. an xgboost object). This is important because it gives you access to the methods of that framework object type (e.g. xgboost methods).
- After the model is loaded, you can make predictions by just calling the .predict() method as usual.
- So the order goes: load model -> pick load type (python function or framework) -> deploy (e.g. to a cloud, Spark, kubernetes clusters, docker, jupyter notebooks, etc).

# Module 2.5 Notes:
- 