# MLFlow Installation 

## Introduction
>`MLFlow` is written in Python; thus, it can be installed using `pip`/`conda` or any other package manager.

If `pytorch` or another supported framework is already installed, `mlflow` picks up the corresponding integration automatically.

```bash
pip install mlflow
```
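To confirm that the installation succeeded, one option is to query the installed package metadata. The snippet below is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical, and `torch` is checked purely as an example of an optional integration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "torch" is the distribution name of PyTorch; both names are only examples.
for name in ("mlflow", "torch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```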

## Projects

> __MLFlow projects are a CONVENTION (or standard) for organising and describing code so that others (people, automation pipelines) can easily run it.__

Conventionally, projects are `git` repositories that allow you to specify, in varying levels of detail, the required environment (either `conda` or `docker`; note that `system`-specified environments are discouraged) via the
- directory structure.
- `MLproject` file in the repository's root directory.

*Note*
- `MLproject` should be a `yaml` file without an extension (i.e. save it as `MLproject`, not `MLproject.yaml`).

### Directories

> Structuring code via directories is sufficient for creating a basic `MLFlow` project; however, __specifying `MLproject` is a better option.__

If there is no `MLproject` file, the following defaults apply:
- __Name of the project__: name of the project's root directory (e.g. git's root).
- __Conda environment__: used if `conda.yaml` is present in the root directory.
- __Any `.py`/`.sh` file in the project can be an entry point__ (the topic of running projects will be discussed in more detail later).
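The defaults listed above can be approximated in a few lines of Python. This is only an illustrative sketch, not how `MLFlow` implements the lookup: the function `describe_project` and the temporary `demo_project` directory are hypothetical names introduced for the example.

```python
import tempfile
from pathlib import Path


def describe_project(root: str) -> dict:
    """Approximate the defaults used when a project has no MLproject file."""
    root_path = Path(root).resolve()
    conda_file = root_path / "conda.yaml"
    # Any .py or .sh file below the root is a candidate entry point.
    scripts = sorted(
        p.relative_to(root_path).as_posix()
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix in {".py", ".sh"}
    )
    return {
        "name": root_path.name,  # project name = root directory name
        "conda_env": str(conda_file) if conda_file.is_file() else None,
        "entry_points": scripts,
    }


# Build a tiny throwaway project to demonstrate the lookup.
demo_root = Path(tempfile.mkdtemp()) / "demo_project"
demo_root.mkdir()
(demo_root / "train.py").write_text("print('training')\n", encoding="utf-8")
(demo_root / "conda.yaml").write_text("name: demo\n", encoding="utf-8")
print(describe_project(str(demo_root)))
```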

One can obtain the `conda.yaml` file via a simple command, provided the command is run from within the conda environment:

```bash
# Omit --from-history to export the full environment, including resolved dependencies.
conda env export --from-history > conda.yaml
```

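`--from-history` exports only the packages that were explicitly requested, leaving out the dependencies that conda resolved for them. This has two effects:
- Portability across OSs (OS-specific dependencies are not written to the file; they are resolved on the target machine instead).
- Only partial reproducibility (the resolved dependencies, and their versions, may differ between machines).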

__Generally, it is safe to use the `--from-history` flag to improve the portability of projects.__

### Using the MLproject file

> As a better alternative, the entry points, structure, parameters, etc. can be explicitly specified via the `MLproject` file.

Consider the example `MLproject` below:


```yml
---
name: My Project

conda_env: my_env.yaml

entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
    command: "python train.py -r {regularization} {data_file}"
  validate:
    parameters:
      data_file: path
    command: "python validate.py {data_file}"
```

For more examples, see the [examples directory](https://github.com/mlflow/mlflow/tree/master/examples) of the MLFlow repository.

As the example above suggests, the `MLproject` file lets the user do the following:

1. __Specify the environment explicitly:__
    - `conda` (simply a file with dependencies).
    - `docker`:
        - specify an image that is already available locally.
        - if the image is unavailable locally, `MLFlow` attempts to pull it from `DockerHub`.
        - if the image name includes a registry, `MLFlow` attempts to pull it from that registry, unless the image is already available on the system.


For the `docker` environment, one can also specify the
- volumes to be mounted while the project runs.
- environment variables passed to the container.

Here is an example of `docker_env`:

```yml
---
name: My Project

docker_env:
  image: mlflow-docker-example

entry_points:
  main:
    parameters:
      data_file: path
      regularization: {type: float, default: 0.1}
      p: float
    command: "python train.py -r {regularization} {data_file}"
  validate:
    parameters:
      data_file: path
    command: "python validate.py {data_file}"
```

More examples are provided [here](https://www.mlflow.org/docs/latest/projects.html#mlproject-file).
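The volumes and environment variables mentioned above are declared under `docker_env` with the `volumes` and `environment` keys. The sketch below writes such an `MLproject` file from Python so that it can be inspected; the paths, variable names, and the helper `write_mlproject` are placeholders chosen for illustration, not values from a real project.

```python
import tempfile
from pathlib import Path

MLPROJECT_TEXT = """\
name: My Project

docker_env:
  image: mlflow-docker-example
  # host_path:container_path pairs (both paths are placeholders)
  volumes: ["/local/path:/container/mount/path"]
  # a [name, value] pair sets a variable; a bare name copies it from the host
  environment: [["NEW_ENV_VAR", "new_var_value"], "VAR_TO_COPY_FROM_HOST"]

entry_points:
  main:
    parameters:
      data_file: path
    command: "python train.py {data_file}"
"""


def write_mlproject(project_dir: str) -> Path:
    """Write the example MLproject file (no extension) into project_dir."""
    target = Path(project_dir) / "MLproject"
    target.write_text(MLPROJECT_TEXT, encoding="utf-8")
    return target


project_dir = tempfile.mkdtemp()
print(write_mlproject(project_dir).read_text(encoding="utf-8"))
```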

2. __Specify parameters and entry points__

One can specify the
- name of the parameter.
- type of the parameter (the default is `string`; others are `float`, `path` and `uri`).
- default value(s).

These values are substituted into the placeholders of the `command` field (e.g. `{regularization}`) when the project runs.
__Parameters that are supplied at run time but not declared in the `parameters` field are appended to the command using the `--key value` syntax.__

Both `MLproject` examples above show these parameter specifications in use.
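To make the substitution rules concrete, here is a simplified sketch of how a command string could be assembled from declared parameters, defaults, and extra run-time values. It is an illustration of the behaviour described above, not `MLFlow`'s actual implementation: the function `render_command` is hypothetical, and type checking (`float`, `path`, `uri`) is left out.

```python
import shlex


def render_command(template: str, declared: dict, supplied: dict) -> str:
    """Fill `template` with parameter values.

    declared: parameter name -> default value (None means "required").
    supplied: parameter name -> value given at run time.
    """
    values = {}
    for name, default in declared.items():
        if name in supplied:
            values[name] = supplied[name]
        elif default is not None:
            values[name] = default
        else:
            raise ValueError(f"missing required parameter: {name}")

    # Declared parameters replace the {placeholders} in the command.
    quoted = {k: shlex.quote(str(v)) for k, v in values.items()}
    command = template.format(**quoted)

    # Undeclared parameters are appended as "--key value".
    extras = [
        f"--{k} {shlex.quote(str(v))}"
        for k, v in supplied.items()
        if k not in declared
    ]
    return " ".join([command, *extras])


print(
    render_command(
        "python train.py -r {regularization} {data_file}",
        declared={"data_file": None, "regularization": 0.1},
        supplied={"data_file": "data.csv", "epochs": 5},
    )
)
# prints: python train.py -r 0.1 data.csv --epochs 5
```

Here `regularization` falls back to its default, `data_file` is taken from the supplied values, and the undeclared `epochs` is appended at the end.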

### Running projects

> `MLFlow` provides a terminal command, `mlflow`, which has a subcommand, `run`, that allows us to run the project.


The syntax is quite simple:

```bash
mlflow run <directory>
```

The `run` subcommand also accepts a number of options that make running projects more convenient; list them with:

```bash
mlflow run --help
```

For example, we could run the following in a terminal (or in a notebook cell, prefixed with `!`):

```bash
mlflow run https://github.com/mlflow/mlflow-example -P alpha=0.5
```
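When the same project has to be launched repeatedly with different parameters, it can help to assemble the `mlflow run` command programmatically. The sketch below only builds the argument list, using the `-P name=value` and `-e entry_point` options of `mlflow run`; the helpers `build_run_command` and `run_project` are hypothetical names, and actually launching the run assumes that `mlflow` is installed and on the `PATH`.

```python
import shlex
import subprocess
from typing import Optional


def build_run_command(
    uri: str,
    parameters: Optional[dict] = None,
    entry_point: Optional[str] = None,
) -> list:
    """Build the argument list for `mlflow run`."""
    cmd = ["mlflow", "run", uri]
    if entry_point is not None:
        cmd += ["-e", entry_point]
    for name, value in (parameters or {}).items():
        cmd += ["-P", f"{name}={value}"]
    return cmd


def run_project(uri: str, **kwargs) -> None:
    """Launch the project; raises CalledProcessError if the run fails."""
    subprocess.run(build_run_command(uri, **kwargs), check=True)


cmd = build_run_command(
    "https://github.com/mlflow/mlflow-example", parameters={"alpha": 0.5}
)
print(shlex.join(cmd))
# To start the run, call:
# run_project("https://github.com/mlflow/mlflow-example", parameters={"alpha": 0.5})
```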


## Conclusion
At this point, you should have a good understanding of 
- how to install MLFlow.
- MLFlow projects.