## MLFlow Installation 

Since `mlflow` is written in Python, we can install it with `pip`, `conda` or another package manager.

If you have `pytorch` or another supported library installed, `mlflow` will pick it up automatically (no need to install extras this time).

In [None]:
!pip install mlflow
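To confirm the installation from Python, and to see which commonly used ML libraries are importable in the current environment, a small check like the one below can help. It is a minimal sketch: the names in `CANDIDATES` are only an illustrative assumption, not MLflow's list of supported integrations.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Illustrative subset of libraries MLflow integrates with (assumption, not exhaustive)
CANDIDATES = ["torch", "sklearn", "tensorflow", "xgboost", "lightgbm"]


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def available_modules(names):
    """Return the module names that can be found (without importing them)."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    print("mlflow:", installed_version("mlflow") or "not installed")
    print("importable libraries:", available_modules(CANDIDATES))
```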

## Projects

> __MLFlow Projects are mainly a CONVENTION for organizing and describing your code so that others (people, automation pipelines) can easily run it__

Projects are usually `git` repositories and let you specify (at varying levels of detail) the required environment (`conda` or `docker`; `system` can also be used, but this is discouraged) via:
- directory structure
- an `MLproject` file in the repository's root directory

Note:
- the `MLproject` file is a YAML file, but it must have no extension
  - save it as `MLproject`, not `MLproject.yaml`
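Because the extension mistake is easy to make, a quick check of the project root can catch it. `find_mlproject` below is a hypothetical helper (not part of MLflow): a minimal sketch that returns the `MLproject` path if present and complains about `.yaml`/`.yml` variants.

```python
from pathlib import Path


def find_mlproject(root):
    """Return the MLproject path in `root`, None if absent; raise on misnamed variants.

    Hypothetical helper for illustration, not an MLflow API.
    """
    root = Path(root)
    expected = root / "MLproject"
    if expected.is_file():
        return expected
    # Common mistake: giving the file a YAML extension
    for wrong in ("MLproject.yaml", "MLproject.yml"):
        if (root / wrong).is_file():
            raise FileNotFoundError(f"Found {wrong}; rename it to 'MLproject' (no extension)")
    return None
```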

### Directories

> Structuring your code in directories is enough to create a basic `MLFlow` project, __but providing an `MLproject` file is the better option__

If there is no `MLproject` file, the following conventions apply:
- __Name of the project__ - the name of the project's root directory (e.g. the git repository root)
- __Conda environment__ - taken from `conda.yaml`, if it is present in the root
- __Any `.py`/`.sh` file in the project can be an entry point__ (more about running projects later)
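The three conventions above can be mimicked in a few lines to see what a given directory would yield. This is a simplified illustration with a hypothetical helper name, not MLflow's implementation.

```python
from pathlib import Path


def infer_project_layout(root):
    """Summarise what the directory conventions give for a project without MLproject.

    Simplified illustration only; not how MLflow is implemented.
    """
    root = Path(root).resolve()
    conda_file = root / "conda.yaml"
    entry_points = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in {".py", ".sh"}
    )
    return {
        "name": root.name,  # project name = root directory name
        "conda_env": str(conda_file) if conda_file.is_file() else None,
        "entry_points": entry_points,  # any .py/.sh file can be run as an entry point
    }
```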

You can generate a `conda.yaml` file with a single command (run it with the conda environment activated):

```bash
conda env export [--from-history] > conda.yaml
```

`--from-history` exports only the packages you have explicitly installed. This has two effects:
- Better portability across operating systems (OS-specific dependencies are not pinned, so the appropriate ones are resolved on each platform)
- Weaker reproducibility (transitive dependencies may resolve to different versions)

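__In general it is safe to use the `--from-history` flag to make projects more portable__, but note that it omits packages installed with `pip`, so add those to the `pip:` section of `conda.yaml` by hand.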

### Using the MLproject file

> A better option is to specify entry points, parameters, environment, etc. explicitly in the `MLproject` file

Here is an example `MLproject`:


```yml
---
name: My Project

conda_env: my_env.yaml

entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
    command: "python train.py -r {regularization} {data_file}"
  validate:
    parameters:
      data_file: path
    command: "python validate.py {data_file}"
```

More examples are available in the MLflow repository [here](https://github.com/mlflow/mlflow/tree/master/examples)

As you can see, you can:

1. __Specify the environment explicitly:__
    - `conda` (simply a file listing the dependencies)
    - `docker` environment:
        - specify an image available on the local system
        - if the image is not available locally, MLflow tries to pull it from `DockerHub`
        - if the image name includes a registry, MLflow pulls it from that registry (unless it's already available on the system)


For a `docker` environment you can also specify:
- volumes to mount while the project runs
- environment variables to pass to the container

See an example of `docker_env` below:

```yml
---
name: My Project

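docker_env:
  image: mlflow-docker-example
  # optional host_path:container_path mounts (hypothetical paths)
  volumes: ["/local/path:/container/mount/path"]
  # optional: [name, value] pairs define new variables, bare names are copied from the host
  environment: [["NEW_ENV_VAR", "new_var_value"], "VAR_TO_COPY_FROM_HOST_ENVIRONMENT"]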

entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
      p: float
    command: "python train.py -r {regularization} {data_file}"
  validate:
    parameters:
      data_file: path
    command: "python validate.py {data_file}"
```
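The `environment` list of a `docker_env` accepts two kinds of entries: a bare variable name, whose value is copied from the host, or a `[name, value]` pair that defines a new variable. The sketch below resolves such a list into the mapping a container would receive. It only illustrates those semantics; the function name is hypothetical, and raising on a variable missing from the host is this sketch's own choice, not documented MLflow behaviour.

```python
import os


def resolve_docker_environment(entries, host_env=None):
    """Turn a docker_env `environment` list into a {name: value} mapping.

    Illustration of the semantics, not MLflow's code:
    - "NAME"          -> copy NAME from the host environment
    - ["NAME", "val"] -> set NAME to "val" in the container
    """
    host_env = os.environ if host_env is None else host_env
    resolved = {}
    for entry in entries:
        if isinstance(entry, str):
            if entry not in host_env:
                raise KeyError(f"{entry} is not set on the host")
            resolved[entry] = host_env[entry]
        elif isinstance(entry, (list, tuple)) and len(entry) == 2:
            name, value = entry
            resolved[name] = str(value)
        else:
            raise ValueError(f"Unsupported environment entry: {entry!r}")
    return resolved
```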

[See more here](https://www.mlflow.org/docs/latest/projects.html#mlproject-file)

2. __Specify parameters and entry points__

You can specify:
- the name of the parameter
- the type of the parameter (default is `string`; the others are `float`, `path` and `uri`)
- a default value


These values are later substituted into the `command` field (relative `path` values are converted to absolute paths first).
__If you pass a parameter that is not declared in the entry point's `parameters`, it is appended to the command using `--key value` syntax__

The `MLproject` examples above show all of these ways of specifying parameters.
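The substitution rules can be sketched in plain Python. This is a simplified model of the behaviour described above, not MLflow's implementation: `build_command` is a hypothetical name, `declared` mirrors the `parameters` block (the shorthand `data_file: path` becomes `{"type": "path"}`), and `user_params` stands for values given with `-P key=value`. Downloading remote `path` URIs and other details are left out.

```python
import os


def build_command(template, declared, user_params):
    """Assemble an entry point command from its template and parameters (sketch).

    template:    the `command` string, e.g. "python train.py -r {regularization} {data_file}"
    declared:    {name: {"type": ..., "default": ...}} from the `parameters` block
    user_params: {name: value} passed on the command line with -P
    """
    values = {}
    for name, spec in declared.items():
        if name in user_params:
            raw = user_params[name]
        elif "default" in spec:
            raw = spec["default"]
        else:
            raise ValueError(f"No value given for required parameter '{name}'")
        kind = spec.get("type", "string")
        if kind == "float":
            float(raw)  # validation only: raises ValueError for non-numeric input
        elif kind == "path":
            raw = os.path.abspath(raw)  # relative paths become absolute
        values[name] = raw
    command = template.format(**values)
    # Parameters that were not declared are appended as --key value
    extras = [f"--{key} {value}" for key, value in user_params.items() if key not in declared]
    return " ".join([command, *extras])
```

With the `main` entry point from the first example, `build_command("python train.py -r {regularization} {data_file}", {"data_file": {"type": "path"}, "regularization": {"type": "float", "default": 0.1}}, {"data_file": "data.csv", "epochs": "5"})` fills in the default regularization, makes `data.csv` absolute and appends `--epochs 5`.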

### Running projects

> `MLFlow` provides the `mlflow` command-line program, whose `run` subcommand runs a project


Usage is really simple:

```bash
mlflow run <directory-or-git-uri>
```

but it also accepts a number of useful options, which you can list with:

In [None]:
!mlflow run --help

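For example, the following runs the `mlflow-example` project straight from GitHub and sets its `alpha` parameter with `-P` (run it in a notebook cell, or drop the leading `!` in your terminal):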

In [None]:
!mlflow run https://github.com/mlflow/mlflow-example -P alpha=0.5
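The same invocation can be scripted from Python, e.g. in an automation pipeline. The sketch below only builds the argument list for the real `mlflow run` flags `-e` (entry point) and `-P` (parameter); `mlflow_run_command` and `launch` are hypothetical helper names, and running the command requires `mlflow` on the `PATH`.

```python
import subprocess


def mlflow_run_command(uri, params=None, entry_point=None):
    """Build the argument list for `mlflow run` (hypothetical helper, not part of MLflow)."""
    cmd = ["mlflow", "run", uri]
    if entry_point is not None:
        cmd += ["-e", entry_point]  # choose an entry point other than the default "main"
    for key, value in (params or {}).items():
        cmd += ["-P", f"{key}={value}"]  # one -P flag per parameter
    return cmd


def launch(cmd):
    """Run the command, raising CalledProcessError if the project run fails."""
    return subprocess.run(cmd, check=True)
```

Calling `launch(mlflow_run_command("https://github.com/mlflow/mlflow-example", {"alpha": 0.5}))` mirrors the cell above; passing a list to `subprocess.run` avoids shell-quoting issues. MLflow also offers a Python API for this, `mlflow.projects.run`, which takes `uri`, `entry_point` and `parameters` arguments.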
