# MLFlow Projects

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/mlflow-project.png" width="600" />

## What you will learn in this course 🧐🧐

It is very useful to have a standard way of organizing your ML projects so that you can run trainings easily. MLFlow Projects lets you do exactly that. In this course, you will learn:

* What MLFlow Projects are
* How an MLFlow project is organized
* How to read the configuration files of an MLFlow project

## What is MLFlow Projects? 🤔

MLFlow Projects is a way to standardize how you package your code so that you can run it with different technologies and train your models remotely.

## How an MLFlow Project is structured 🗂️

Now that you have logged your metrics and your model, check out your working directory on your local machine. You should see a structure that looks like this:

```text
├── mlruns
│   └── 0
│       ├── 0a2b502f674949b4acb8dfce6549a7fb
│       │   ├── artifacts
│       │   │   └── model
│       │   │       ├── MLmodel
│       │   │       ├── conda.yaml
│       │   │       └── model.pkl
│       │   ├── meta.yaml
│       │   ├── metrics
│       │   │   └── Accuracy
│       │   ├── params
│       │   │   └── C
│       │   └── tags
│       │       ├── mlflow.log-model.history
│       │       ├── mlflow.source.name
│       │       ├── mlflow.source.type
│       │       └── mlflow.user
│       └── meta.yaml
├── train.py
```

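In this structure, the important parts to understand are:

- the `artifacts` folder: where the files needed to reload and deploy your model are stored.
- the `meta.yaml` file: where the metadata of your run is stored (run ID, experiment ID, status, start and end times, artifact location). The `meta.yaml` directly under the experiment folder `0` describes the experiment itself.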
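To make the local layout concrete, here is a minimal sketch that reads the params and metrics of one run straight from the folders shown in the tree. It assumes the local file-based store format: one file per param containing its value, and one file per metric with one `timestamp value step` line per logged value. This is an internal MLflow detail that may change between versions; in real code you would rather query the run through `mlflow.tracking.MlflowClient`. The `read_run` helper and the values written in the demo (`C = 1.0`, `Accuracy = 0.95`) are made up for illustration.

```python
import tempfile
from pathlib import Path


def read_run(run_dir: Path) -> dict:
    """Hypothetical helper: collect params and metric histories from one run folder."""
    params = {
        p.name: p.read_text().strip()
        for p in (run_dir / "params").iterdir()
        if p.is_file()
    }
    metrics = {}
    for m in (run_dir / "metrics").iterdir():
        if not m.is_file():
            continue
        history = []
        for line in m.read_text().splitlines():
            fields = line.split()
            if len(fields) >= 2:
                # fields[0] = timestamp (ms), fields[1] = value, fields[2] = step (if present)
                step = int(fields[2]) if len(fields) > 2 else 0
                history.append((step, float(fields[1])))
        metrics[m.name] = history
    return {"params": params, "metrics": metrics}


# Build a fake run folder mirroring the tree above (made-up values), then read it back.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "mlruns" / "0" / "0a2b502f674949b4acb8dfce6549a7fb"
    (run_dir / "params").mkdir(parents=True)
    (run_dir / "metrics").mkdir()
    (run_dir / "params" / "C").write_text("1.0")
    (run_dir / "metrics" / "Accuracy").write_text("1592155419122 0.95 0\n")
    run = read_run(run_dir)

print(run)  # {'params': {'C': '1.0'}, 'metrics': {'Accuracy': [(0, 0.95)]}}
```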

You can also find all that information on your **MLFlow tracking server**:

![MLFlow tracking server UI](https://full-stack-assets.s3.eu-west-3.amazonaws.com/Deployment/Mlflow_project_presentation.gif)


> 👋 You might not see exactly the structure above in your local directory (for instance, if your runs are logged to a remote tracking server). That's okay: simply check that the information is present on your remote server.

You already know what most of these folders contain (check out our previous course on MLFlow if that's not the case). The main one we haven't covered yet is the `artifacts` folder, so let's go through the purpose of each file it contains.

## Artifacts 🏛️

The `artifacts` folder stores your model together with information about the environment in which it was trained. In particular, its `model` subfolder contains three files:

* `MLmodel`
* `conda.yaml`
* `model.pkl`, if you logged a scikit-learn model. If you logged a TensorFlow, PyTorch or other type of model, you will find other files instead.

Note that if you set up your MLFlow tracking server remotely, you most likely chose Amazon S3 as your artifact store instead of a local directory, so these files live in your S3 bucket.

### `MLmodel` 

An `MLmodel` file looks something like this: 

```yaml
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.7.3
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.23.1
run_id: 0a2b502f674949b4acb8dfce6549a7fb
utc_time_created: '2020-06-14 17:23:39.122114'
```

It gives all the information needed to load and run your model. In particular, pay attention to:

- `env`: the file used to recreate your model's dependencies. By default it is a `conda` environment, but you can also package and serve your model in a `Docker` container,
- `sklearn_version`: be careful with the version recorded here, as it might not be available on your servers.
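The version warning above can be checked before deploying. The following is a minimal sketch using only the standard library: it scans the `MLmodel` text for `*_version` fields with a regular expression (a simplification, not a real YAML parser; PyYAML would be more robust) and compares them with the running Python and the installed scikit-learn. The helper names are hypothetical.

```python
import platform
import re
from importlib import metadata

# Excerpt of the MLmodel file shown above.
MLMODEL = """\
artifact_path: model
flavors:
  python_function:
    data: model.pkl
    env: conda.yaml
    loader_module: mlflow.sklearn
    python_version: 3.7.3
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.23.1
"""

# Matches lines such as "    sklearn_version: 0.23.1" (simplified: not a YAML parser).
VERSION_LINE = re.compile(r"^[ \t]*(\w+_version):[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def recorded_versions(mlmodel_text: str) -> dict:
    """Return every '<something>_version' field found in the MLmodel text."""
    return {key: value.strip("'\"") for key, value in VERSION_LINE.findall(mlmodel_text)}


def installed_version(dist_name: str):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_report(mlmodel_text: str) -> list:
    """List (package, recorded, installed) tuples for Python and scikit-learn."""
    recorded = recorded_versions(mlmodel_text)
    report = []
    if "python_version" in recorded:
        report.append(("python", recorded["python_version"], platform.python_version()))
    if "sklearn_version" in recorded:
        report.append(
            ("scikit-learn", recorded["sklearn_version"], installed_version("scikit-learn"))
        )
    return report


for name, recorded, installed in version_report(MLMODEL):
    status = "OK" if recorded == installed else "MISMATCH"
    print(f"{name}: recorded={recorded}, installed={installed} -> {status}")
```

A `MISMATCH` line does not necessarily mean the model will fail to load, but it is a signal to pin the recorded version on your server.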

> 👋 If you are interested in the format of this file, `MLmodel` is written in YAML, a data serialization language widely used for configuration files. Check out this tutorial if you want to learn more: https://www.cloudbees.com/blog/yaml-tutorial-everything-you-need-get-started

### `conda.yaml`

Conda is a package and environment management system widely used in data science. Like `Docker`, it lets you reproduce an environment on another machine. MLFlow uses it to record your model's dependencies so that you can run your project on any server. A `conda.yaml` file looks like this: 

```yaml
channels:
- defaults
dependencies:
- python=3.7.3
- scikit-learn=0.23.1
- pip
- pip:
  - mlflow
  - cloudpickle==1.4.1
name: mlflow-env
```

As you can see, all the dependencies are listed here. Again, be careful with the versions pinned in this file, as some servers might not be able to install them.
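To see at a glance which versions a `conda.yaml` pins, here is a minimal standard-library sketch. It assumes the simple layout shown above (conda entries as `name=version`, pip entries as `name==version`) and scans lines instead of parsing YAML, so build strings and version ranges such as `>=` are ignored. The `pinned_dependencies` helper is hypothetical.

```python
import re

CONDA_YAML = """\
channels:
- defaults
dependencies:
- python=3.7.3
- scikit-learn=0.23.1
- pip
- pip:
  - mlflow
  - cloudpickle==1.4.1
name: mlflow-env
"""

# "name=version" (conda) or "name==version" (pip); build strings and ranges are ignored.
PIN = re.compile(r"^([A-Za-z0-9_.\-]+)[ \t]*(==|=)[ \t]*([^\s=]+)$")


def pinned_dependencies(conda_yaml_text: str) -> dict:
    """Map package name -> pinned version for entries under 'dependencies:'."""
    pins = {}
    in_dependencies = False
    for raw in conda_yaml_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not raw.startswith((" ", "-")):
            # A top-level key such as "channels:", "dependencies:" or "name: ...".
            in_dependencies = line.startswith("dependencies:")
            continue
        if not in_dependencies or not line.startswith("- "):
            continue
        match = PIN.match(line[2:].strip())
        if match:
            name, _, version = match.groups()
            pins[name] = version
    return pins


for name, version in pinned_dependencies(CONDA_YAML).items():
    print(f"{name} is pinned to {version}")
```

Unpinned entries such as `mlflow` are skipped, since whatever version the server resolves will be installed for them.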

> 👋 Most likely you won't have to touch anything in this file. 

### `model.pkl`

When you log a model with MLFlow, a file containing the serialized model is created. For a scikit-learn model it is a `.pkl` file (serialized with `cloudpickle`, as the `serialization_format` field of `MLmodel` shows); for TensorFlow, PyTorch or other types of models, you will find other file formats.
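A `.pkl` file is simply a pickled Python object. The sketch below illustrates the round trip with the standard `pickle` module; the `ThresholdModel` class is a hypothetical stand-in for a trained scikit-learn estimator so that the example runs without scikit-learn. MLflow itself uses `cloudpickle` by default, and in practice you would reload a logged model with `mlflow.sklearn.load_model` or `mlflow.pyfunc.load_model` rather than unpickling the file yourself.

```python
import pickle
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a trained scikit-learn estimator."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def predict(self, values):
        return [1 if v >= self.threshold else 0 for v in values]


model = ThresholdModel(threshold=0.5)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    # Serialize the model object to disk, as happens when a model is logged.
    with path.open("wb") as f:
        pickle.dump(model, f)
    # Only unpickle files you trust: loading a pickle can execute arbitrary code.
    with path.open("rb") as f:
        restored = pickle.load(f)

print(restored.predict([0.2, 0.7]))  # [0, 1]
```

Because unpickling needs the same classes and compatible library versions, this is exactly why the versions recorded in `MLmodel` and `conda.yaml` matter.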

## Resources 📚📚

* <a href="https://mlflow.org/docs/latest/projects.html" target="_blank">MLflow Projects</a>
* <a href="https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html" target="_blank">MLflow Tutorial</a>
* <a href="https://www.cloudbees.com/blog/yaml-tutorial-everything-you-need-get-started" target="_blank">YAML Tutorial: Everything You Need to Get Started in Minutes</a>
* <a href="https://docs.conda.io/en/latest/" target="_blank">Conda</a>