## Deploying models

Once our model is saved, we can deploy it to several targets:
- [locally](https://www.mlflow.org/docs/latest/models.html#deploy-mlflow-models) with a REST API (either inside a `docker` container or in a `conda` environment)
- [Microsoft's Azure ML](https://www.mlflow.org/docs/latest/models.html#deploy-a-python-function-model-on-microsoft-azure-ml)
- [Amazon SageMaker](https://www.mlflow.org/docs/latest/models.html#deploy-a-python-function-model-on-amazon-sagemaker)
- [Apache Spark UDF](https://www.mlflow.org/docs/latest/models.html#export-a-python-function-model-as-an-apache-spark-udf)
- Other targets supported by community-maintained deployment plugins (for example `torchserve`); see the [list of deployment plugins](https://www.mlflow.org/docs/latest/plugins.html#deployment-plugins)

Let's look at the `mlflow models` command:

In [None]:
!mlflow models --help

### models build-docker

> This subcommand builds a `docker` image with our model packaged inside it

Afterwards we can serve the model by running the created image (port `8080` is exposed by default, so we can easily map it to a port on the host).

Let's look at this command in more detail:

In [None]:
!mlflow models build-docker --help

__The `python_function` flavor is the default, and every specific integration is compatible with it__ (see more details [here](https://www.mlflow.org/docs/latest/python_api/mlflow.pyfunc.html))

### models serve

> Runs a basic webserver (created via `flask`) which we can query (e.g. using `curl`)

We can specify (amongst other things):
- `--model-uri` - URI of the model to serve (mandatory)
- `--workers` - number of parallel workers handling requests
- `--port` - on which port the server will listen for requests

In [None]:
!mlflow models serve --help
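
To make the three options listed above concrete, here is a minimal sketch that only assembles the command line and does not start a server. The helper name `serve_command` and the model URI `models:/my-model/1` are hypothetical placeholders, and the port and worker count are illustrative values rather than recommendations.

```python
import shlex


def serve_command(model_uri, port, workers):
    """Assemble the argument list for `mlflow models serve` (hypothetical helper)."""
    return [
        "mlflow", "models", "serve",
        "--model-uri", model_uri,    # model resource (mandatory)
        "--port", str(port),         # port the server listens on
        "--workers", str(workers),   # number of parallel workers
    ]


# "models:/my-model/1" is a placeholder URI; substitute the URI of your own model.
command = serve_command("models:/my-model/1", port=5000, workers=2)

# Print the command in a form that can be pasted into a shell or a `!` cell.
print(shlex.join(command))
```

The resulting list could also be passed to `subprocess.run(command)` to launch the server from Python, assuming `mlflow` is installed and the URI points at a real model.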

### models predict

> Allows us to query the model with a file (`.csv` or `.json`) (__useful for testing!__)

Let's see the available options:

In [None]:
!mlflow models predict --help

## Querying a deployed model

Once the model is deployed (via `docker` or the `flask` webserver), we can query it, either from other machines or from `localhost`.

Requests are made by sending `json` text strings to the `/invocations` endpoint. The data can be sent in several formats:
- JSON-serialized pandas DataFrames in the split orientation (`data = pandas_df.to_json(orient='split')`)
- JSON-serialized pandas DataFrames in the records orientation (discouraged)
- CSV-serialized pandas DataFrames (`data = pandas_df.to_csv()`)
- Tensor input formatted as described in TF Serving’s API docs; the provided inputs will be cast to NumPy arrays
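
The pandas serialization calls named in the list can be sketched as follows. The DataFrame contents are made up for illustration, and passing `index=False` to `to_csv` is an assumption made here so that the row index is not written as an extra column.

```python
import json

import pandas as pd

# Illustrative data: two rows with columns "a", "b" and "c".
pandas_df = pd.DataFrame({"a": [1, 4], "b": [2, 5], "c": [3, 6]})

# Split orientation: column names and row values are stored separately.
split_body = pandas_df.to_json(orient="split")

# Records orientation: one JSON object per row.
records_body = pandas_df.to_json(orient="records")

# CSV: index=False (an assumption, see above) keeps only the data columns.
csv_body = pandas_df.to_csv(index=False)

print(json.dumps(json.loads(split_body), indent=4))
print(json.dumps(json.loads(records_body), indent=4))
print(csv_body)
```

Each of these strings would be sent as the request body; the CSV variant would presumably need a `text/csv` content type instead of `application/json`.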

Each of these formats is shown below as a `curl` request (note the `Content-Type` header used for each variant):

In [None]:
# split-oriented DataFrame input
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{
    "columns": ["a", "b", "c"],
    "data": [[1, 2, 3], [4, 5, 6]]
}'

# record-oriented DataFrame input (fine for vector rows, loses ordering for JSON records)
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json; format=pandas-records' -d '[
    {"a": 1,"b": 2,"c": 3},
    {"a": 4,"b": 5,"c": 6}
]'

# numpy/tensor input using TF serving's "instances" format
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{
    "instances": [
        {"a": "s1", "b": 1, "c": [1, 2, 3]},
        {"a": "s2", "b": 2, "c": [4, 5, 6]},
        {"a": "s3", "b": 3, "c": [7, 8, 9]}
    ]
}'
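
The same requests can be prepared from Python. The sketch below uses only the standard library and mirrors the first two `curl` calls; the helper names `build_request` and `send` are hypothetical, and the URL is assumed to be the same local address used above. Nothing is sent until `send` is called.

```python
import json
import urllib.request

# Assumed address: the same host and port as in the curl examples.
INVOCATIONS_URL = "http://127.0.0.1:5000/invocations"


def build_request(payload, content_type="application/json", url=INVOCATIONS_URL):
    """Build (but do not send) a POST request for the /invocations endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Split-oriented DataFrame input.
split_request = build_request(
    {"columns": ["a", "b", "c"], "data": [[1, 2, 3], [4, 5, 6]]}
)

# Record-oriented DataFrame input, selected through the Content-Type header.
records_request = build_request(
    [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}],
    content_type="application/json; format=pandas-records",
)
```

With a model being served at that address, `send(split_request)` would return the decoded predictions.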

More complex data can also be encoded before sending the request (e.g. images can be encoded using `base64` and are decoded automatically by MLflow):

In [None]:
# record-oriented DataFrame input with binary column "b"
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json; format=pandas-records' -d '[
    {"a": 0, "b": "dGVzdCBiaW5hcnkgZGF0YSAw"},
    {"a": 1, "b": "dGVzdCBiaW5hcnkgZGF0YSAx"},
    {"a": 2, "b": "dGVzdCBiaW5hcnkgZGF0YSAy"}
]'

# record-oriented DataFrame input with datetime column "b"
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json; format=pandas-records' -d '[
    {"a": 0, "b": "2020-01-01T00:00:00Z"},
    {"a": 1, "b": "2020-02-01T12:34:56Z"},
    {"a": 2, "b": "2021-03-01T00:00:00Z"}
]'
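
For reference, the encoded values in these requests can be produced with the Python standard library. This is a minimal sketch: the helper names `encode_binary` and `encode_datetime` are hypothetical, and the raw byte strings are the ones the `base64` values above decode to.

```python
import base64
import json
from datetime import datetime, timezone


def encode_binary(raw):
    """Base64-encode raw bytes into an ASCII string that fits in JSON."""
    return base64.b64encode(raw).decode("ascii")


def encode_datetime(moment):
    """Format a timezone-aware datetime as a UTC ISO 8601 string ending in 'Z'."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Records with a binary column "b".
binary_records = [
    {"a": i, "b": encode_binary(f"test binary data {i}".encode("utf-8"))}
    for i in range(3)
]

# Records with a datetime column "b".
datetime_records = [
    {"a": 0, "b": encode_datetime(datetime(2020, 1, 1, tzinfo=timezone.utc))},
    {"a": 1, "b": encode_datetime(datetime(2020, 2, 1, 12, 34, 56, tzinfo=timezone.utc))},
    {"a": 2, "b": encode_datetime(datetime(2021, 3, 1, tzinfo=timezone.utc))},
]

print(json.dumps(binary_records, indent=4))
print(json.dumps(datetime_records, indent=4))
```

For an image, the bytes read from the file would take the place of the test strings.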

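In summary, we've seen how MLflow can deploy a saved model (as a `docker` image or a local webserver) and how to query the deployed model through the `/invocations` endpoint.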