# Deploying ML models with AML

This notebook covers how to deploy models in Azure Machine Learning (AML).

In ML, we normally use a trained model to predict labels for new data that the model has not been trained on. This is called `inferencing`. A model is often deployed so that apps can request real-time predictions for a single observation or a small number of new observations.

In AML, we create real-time inferencing solutions by deploying a trained model as a service and hosting it on a containerised platform such as Azure Kubernetes Service (AKS).

## Deploying a model for real-time inferencing

You can deploy a trained model to one of the following compute targets:
- local compute
- AML compute instance
- Azure Container Instance (ACI)
- Azure Kubernetes Service (AKS) cluster
- Azure function
- IoT module

AML uses containers for deployments. The model and the code that enables its use are packaged in an image, which can then be deployed to a container on your chosen compute.

For testing, good targets are:
- local compute
- AML compute instance
- ACI

For production:
- deploy to a target that meets the specific performance, scalability and security needs of your app architecture.

To deploy a model for real-time consumption, you have to:  

1. Register a trained model  
2. Define an inference configuration  
3. Define a deployment configuration  
4. Deploy the model  

### 1. Register a Trained Model

To register a model, you can use:
- `register` method of the **Model** object (registers from a local file)
- `register_model` method of the **Run** object (registers from the outputs of a training run)

#### Example code
Option 1 - register via the Model object

```python
from azureml.core import Model

model = Model.register(
    workspace=ws,
    model_name='classification_model',
    model_path='model.pkl', # path to where you have stored the model locally
    description='Classify something'
)
```

Option 2 - register via the Run object

```python
run.register_model(
    model_name='classification_model',
    model_path='outputs/model.pkl', # path relative to the run's outputs
    description='It classifies something'
)
```

### 2. Define an Inference Configuration

To deploy the model as a service, you need an `InferenceConfig`, which combines two things:

- environment config: the environment in which the script runs
- entry/scoring script: a script that loads the model and returns predictions for the input data

#### 2.1 Entry/Scoring Script  
Let's create the script that runs the model and returns the prediction.

This script must contain 2 functions:
- init(): 
    - called when the service is initialised
    - loads the model from the model registry
- run(raw_data): 
    - called when new data is submitted to the service
    - uses the model to generate predictions
Example:
```python
import json
import joblib
import numpy as np 
from azureml.core.model import Model

# init() - loads the model from the model registry
def init():
    global model
    # get path to registered model file and load it
    model_path = Model.get_model_path('classification_model')
    model = joblib.load(model_path)

# run(raw_data) - called when a request is received
def run(raw_data):
    # convert to numpy array
    data = np.array(json.loads(raw_data)['data'])
    # get prediction from model
    predictions = model.predict(data)
    # return prediction as JSON serialisable format
    return predictions.tolist()
```
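Before packaging the script, it can help to exercise the `init()`/`run()` contract locally. The sketch below is not the AML runtime: `ThresholdModel` is a hypothetical stand-in for the registered model, and plain lists replace the NumPy array so that the example runs without extra dependencies. It only illustrates that `run` receives a JSON string with a `data` key and has to return something JSON-serialisable.

```python
import json


class ThresholdModel:
    """Hypothetical stand-in for a trained classifier (not part of AML)."""

    def predict(self, rows):
        # Label a row 1 when its features sum to more than 10, otherwise 0
        return [1 if sum(row) > 10 else 0 for row in rows]


model = None


def init():
    # The real script loads the registered model here; the stub avoids that
    global model
    model = ThresholdModel()


def run(raw_data):
    # Same shape as the scoring script: parse the JSON, predict, return a list
    data = json.loads(raw_data)['data']
    return model.predict(data)


init()
payload = json.dumps({'data': [[0.1, 2.3, 4.1, 2.0], [5.0, 4.0, 3.0, 2.0]]})
predictions = run(payload)
print(json.dumps(predictions))
```

Swapping the stub for the real `joblib.load` call gives the scoring script above; the request and response shapes stay the same.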

#### 2.2 Creating & Defining Environment

For the model to be called and run, you need a Python environment in which to run the entry script above.

You can configure this environment with a conda config file, created using the `CondaDependencies` class.

Example:

```python
import os
from azureml.core.conda_dependencies import CondaDependencies

# add the dependencies the model needs
myenv = CondaDependencies()
myenv.add_conda_package('scikit-learn')

# save the environment config as a .yml file
os.makedirs('service_files', exist_ok=True)
env_file = 'service_files/env.yml'
with open(env_file, 'w') as f:
    f.write(myenv.serialize_to_string())
print('saved dependency info')
```

#### 2.3 Combining in an InferenceConfig

Here, you combine the entry script and the environment config file.

Example:
```python
from azureml.core.model import InferenceConfig

classifier_inference_config = InferenceConfig(
    runtime='python',
    source_directory='service_files',
    entry_script='score.py',
    conda_file='env.yml'
)
```

### 3. Define a Deployment Config

Now, where will the script and its dependency file be deployed to? You need some compute!

Here is an example of deploying to `AKS`. For AKS, you need to create the cluster (the compute target) and then define the deployment config:

1. Create compute target
```python
from azureml.core.compute import ComputeTarget, AksCompute

cluster_name='aks-cluster'
compute_config = AksCompute.provisioning_configuration(location='westeurope')
production_cluster = ComputeTarget.create(ws, cluster_name, compute_config)
```

2. Define deployment config

Here we set the target-specific compute specification for the container deployment.
```python
from azureml.core.webservice import AksWebservice

classifier_deploy_config = AksWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=1
)
```

Above, we defined the steps for AKS (cluster and deployment config). For deploying to ACI:
- you do not need to create an ACI compute target
- use the `deploy_configuration` method of the `AciWebservice` class in `azureml.core.webservice`

### 4. Deploy the Model

Now we just need to deploy it!

To do this, call the `deploy` method of the `Model` class.

```python
from azureml.core.model import Model

model = ws.models['classification_model']

service = Model.deploy(
    workspace=ws,
    name='classifier-service', # name of the deployed service
    models=[model],
    inference_config=classifier_inference_config,
    deployment_config=classifier_deploy_config,
    deployment_target=production_cluster
)

service.wait_for_deployment(show_output=True)
```

<hr>

## Consuming a real-time inferencing service

You can consume the service via a REST endpoint or via the AML SDK.

### Using AML SDK

- This is mostly for testing
- Call the web service using the `run` method of a `Webservice` object that references the deployed service

Example:
```python
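import json

# two new observations
x_new = [[0.1,2.3,4.1,2.0],[0.2,1.8,3.9,2.1]]

# convert the input to a JSON document
json_data = json.dumps({
    'data':x_new
})

# call the web service, passing the input data
response = service.run(input_data=json_data)

# decode the response only if the scoring script returned a JSON string
predictions = json.loads(response) if isinstance(response, str) else response

for i in range(len(x_new)):
    print(x_new[i], predictions[i])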
```

### Using the REST endpoint

In production, most apps won't use the AML SDK. Instead, they will call the REST endpoint.

How to determine the REST endpoint:
- in AML Studio
- using the `scoring_uri` property of the `Webservice` object

Example:
```python
endpoint = service.scoring_uri
print(endpoint)
```

What about authentication?

You may want to restrict access to the endpoint by requiring apps to authenticate. You can use either:
- Key: requests are authenticated by specifying a key
- Token: requests are authenticated by providing a JSON Web Token (JWT)

By default, authentication is disabled for ACI services and set to key-based for AKS services.

If you have an authenticated session with the workspace, you can retrieve the keys for a service by using the **get_keys** method of the **Webservice** object.

For token-based auth, your client app has to use service-principal authentication to verify its identity through Azure AD, and then call the **get_token** method of the service to retrieve a time-limited token.
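A client that does not use the SDK sends the same JSON payload over HTTP. The sketch below builds such a request with the standard library and stops short of sending it. `build_scoring_request`, the endpoint URL and the key are placeholders for illustration; in practice the URL would come from `service.scoring_uri` and the credential from `get_keys` or `get_token`. The key or token is passed as a bearer value in the `Authorization` header.

```python
import json
import urllib.request


def build_scoring_request(endpoint, rows, auth_value=None):
    """Build a POST request for a scoring endpoint (hypothetical helper)."""
    headers = {'Content-Type': 'application/json'}
    if auth_value is not None:
        # Key-based and token-based auth both send a bearer value
        headers['Authorization'] = f'Bearer {auth_value}'
    body = json.dumps({'data': rows}).encode('utf-8')
    return urllib.request.Request(endpoint, data=body, headers=headers, method='POST')


# Placeholder values, not a real service
endpoint = 'http://localhost:8890/score'
request = build_scoring_request(endpoint, [[0.1, 2.3, 4.1, 2.0]], auth_value='my-key')

print(request.get_method(), request.full_url)
print(request.get_header('Authorization'))

# To send it against a live service:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())
```

For an ACI service with authentication left disabled, `auth_value` can simply be omitted.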

## Troubleshooting service deployment

A real-time deployment has a lot of moving parts:
- trained model
- runtime environment config
- scoring script
- container image
- container host

So how can we troubleshoot this?
There are a few options:
- check the service state
- review service logs
- deploy to a local container 

### 1. Check the service state
You can check the status of a service by examining its **state**.

To view the state, you must use the compute-specific service type (not the generic `Webservice`, but something like **Aks**Webservice).

Here is an example:
```python
from azureml.core.webservice import AksWebservice

service = AksWebservice(name='classifier-service', workspace=ws)
print(service.state)
```

If you get ***Healthy*** back, you're all set!
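A freshly deployed service may not report its final state straight away, so a single check can be misleading. The sketch below polls until a final state appears. It is an illustration only: `wait_for_final_state` is a hypothetical helper, and the state names other than Healthy (`Transitioning`, `Failed`) are assumptions recalled from the AML documentation rather than taken from this notebook. A fixed sequence of states stands in for a real service.

```python
import time

# Assumed final states; only 'Healthy' is confirmed by the text above
FINAL_STATES = {'Healthy', 'Failed'}


def wait_for_final_state(get_state, poll_seconds=5.0, max_polls=60):
    """Call get_state() until it returns a final state or polls run out."""
    state = get_state()
    for _ in range(max_polls - 1):
        if state in FINAL_STATES:
            break
        time.sleep(poll_seconds)
        state = get_state()
    return state


# Stand-in for a real service: replays a fixed sequence of states
states = iter(['Transitioning', 'Transitioning', 'Healthy'])
final = wait_for_final_state(lambda: next(states), poll_seconds=0)
print(final)
```

With a real deployment, `get_state` could be a function that re-creates the `AksWebservice` object and returns its `state`.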

### 2. Review Service logs
Call the **get_logs()** method of a service object. The logs include detailed information about the provisioning of the service.

### 3. Deploy to local container
Deployment and runtime errors are easier to diagnose in a local container, because you can change the scoring files referenced in the inference config and reload the service without redeploying it. This is only possible with a local deployment.
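A minimal sketch of the local workflow, assuming `azureml-core` is installed and Docker is running on the machine. `deploy_locally`, the service name and the port are illustrative choices, not part of the SDK; `LocalWebservice.deploy_configuration`, `Model.deploy`, `reload` and `get_logs` are the SDK calls it relies on.

```python
def deploy_locally(ws, model, inference_config,
                   service_name='classifier-local', port=8890):
    """Deploy a registered model to a local Docker container (sketch)."""
    # SDK imports are kept inside the function so the sketch can be loaded
    # on a machine that does not have azureml-core installed
    from azureml.core.model import Model
    from azureml.core.webservice import LocalWebservice

    # A local deployment only needs a port, not a compute target
    local_config = LocalWebservice.deploy_configuration(port=port)

    service = Model.deploy(
        workspace=ws,
        name=service_name,
        models=[model],
        inference_config=inference_config,
        deployment_config=local_config,
    )
    service.wait_for_deployment(show_output=True)
    return service


# Typical loop while debugging (requires a workspace and a registered model):
# service = deploy_locally(ws, model, classifier_inference_config)
# ... edit service_files/score.py ...
# service.reload()            # pick up the changed scoring files
# print(service.get_logs())   # inspect errors raised by init() or run()
```

Once the service behaves locally, the same inference config can be deployed to ACI or AKS with the matching deployment config.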