This project shows how to use Azure ML to train a model and deploy it as a web service. The data is mostly unstructured, so we use NLP techniques to extract features from the text columns. The goal is to predict salary from the job description, title, and other fields.
High-level steps:
- Start with jupyter notebook as a prototype.
- Refactor notebook to scripts.
- Clean non-essential code.
- Use functions.
- Add logging.
- Create all necessary Azure resources (resource group, workspace, compute cluster, etc.)
- Create and run the pipeline.
- Register the model.
- Create the endpoint.
- Test the endpoint.
Low-level steps:
Data science:
- Split the data into train, validation, and test sets. This is done first to avoid data leakage.
- Featurize the categorical columns with one-hot encoding.
- Tokenize and vectorize text columns.
- Transform the data into tensors.
- Create a simple neural network.
- Combine the one-hot encoded categorical features with the featurized text columns (bag of words).
- Train the model.
- Evaluate the model.
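The featurization steps above can be sketched with scikit-learn. This is only an illustration: the column names (`title`, `description`, `contract`) and the toy data are assumptions, and the project itself uses a neural network on tensors rather than the linear model shown here.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# Toy data standing in for the real job-postings dataset (hypothetical columns).
df = pd.DataFrame({
    "title": ["Data Scientist", "ML Engineer", "Data Analyst", "ML Engineer"],
    "description": ["python and statistics", "pytorch and azure",
                    "sql and excel", "python and pytorch"],
    "contract": ["full-time", "full-time", "part-time", "full-time"],
    "salary": [95_000, 110_000, 60_000, 105_000],
})

# Split first to avoid data leakage.
train, test = train_test_split(df, test_size=0.25, random_state=42)

# One-hot encode the categorical column, bag-of-words for the text columns,
# then concatenate everything into a single feature matrix.
features = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["contract"]),
    ("title_bow", CountVectorizer(), "title"),
    ("desc_bow", CountVectorizer(), "description"),
])

model = Pipeline([("features", features), ("regressor", Ridge())])
model.fit(train, train["salary"])
preds = model.predict(test)
print(preds.shape)  # one prediction per test row
```

The key idea is that `ColumnTransformer` handles the "combine" step: each branch produces its own block of columns, and the blocks are concatenated horizontally before being fed to the model.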
Moving to Azure ML:
- Create a resource group.
- Create a workspace.
- Create a compute cluster.
- Load the data to Azure ML Datastore.
- Create the components.
- Create and run the pipeline.
Set up the environment
Install and activate the conda environment by executing the following commands:
```shell
conda env create -f environment.yml
conda activate azure_ml_sandbox
```
![Screenshot 2023-10-02 at 15 20 37](https://private-user-images.githubusercontent.com/74664634/272091928-dc615a70-dcd0-46b2-bd9f-04146004731a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEyNDkxMjksIm5iZiI6MTcyMTI0ODgyOSwicGF0aCI6Ii83NDY2NDYzNC8yNzIwOTE5MjgtZGM2MTVhNzAtZGNkMC00NmIyLWJkOWYtMDQxNDYwMDQ3MzFhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDIwNDAyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTAzNDkzMzQxMzIwMTBjYThlM2Y0ZjI1YzlkZGNjZmZkOWJhMjBkMWI2NmUwZmQzNzgwNjBiMjNkOTQwNDRhM2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.M93BVOXAqmI8pIS2vOlGZCWj_lL1CwL5fm4xv6an0D4)
Upload the data to the Azure ML Datastore. In my case the dataset is a single file, so `type: uri_file` is used in `data.yml`.
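For reference, a minimal `data.yml` for a single-file data asset might look like the following; the asset name and local path are placeholders, not the project's actual values:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: job-postings        # placeholder asset name
version: 1
type: uri_file            # a single file, not a folder
path: ../data/jobs.csv    # placeholder local path to upload
description: Raw job postings with salary labels.
```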
Create the compute cluster:
```shell
az ml compute create -f cloud/cluster-cpu.yml -g <resource-group-name> -w <workspace-name>
```
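A basic `cluster-cpu.yml` could look like this sketch; the VM size and instance limits are illustrative, not the project's actual settings:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cluster-cpu
type: amlcompute
size: Standard_DS3_v2          # illustrative CPU VM size
min_instances: 0               # scale to zero when idle to save cost
max_instances: 4
idle_time_before_scale_down: 120
```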
Create the data asset we'll use to train the model:
```shell
az ml data create -f cloud/data.yml -g <resource-group-name> -w <workspace-name>
```
NOTE: For files larger than 100 MB, compress them first or upload with `azcopy`.
Create the components:
```shell
az ml component create -f cloud/01_split.yml
az ml component create -f cloud/02_preprocess.yml
az ml component create -f cloud/03_train.yml
az ml component create -f cloud/04_test.yml
```
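As an illustration, a split component spec along the lines of `01_split.yml` might look like this; the script name, I/O names, and environment are assumptions, not the project's actual files:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: split
version: 1
type: command
inputs:
  data_dir:
    type: uri_file
outputs:
  train_dir:
    type: uri_folder
  test_dir:
    type: uri_folder
code: ../src                     # placeholder path to the scripts
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest  # assumed curated env
command: >-
  python split.py
  --data_dir ${{inputs.data_dir}}
  --train_dir ${{outputs.train_dir}}
  --test_dir ${{outputs.test_dir}}
```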
Create and run the pipeline, capturing the run ID for the follow-up commands:
```shell
run_id=$(az ml job create -f cloud/pipeline-job.yml --query name -o tsv)
```
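A skeleton of a `pipeline-job.yml` that wires such components together might look like this; the component and data asset names are assumptions, and the preprocess and test steps are omitted for brevity:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
experiment_name: salary-prediction   # placeholder name
compute: azureml:cluster-cpu
jobs:
  split:
    type: command
    component: azureml:split@latest
    inputs:
      data_dir:
        type: uri_file
        path: azureml:job-postings@latest   # assumed data asset name
  train:
    type: command
    component: azureml:train@latest
    inputs:
      train_dir: ${{parent.jobs.split.outputs.train_dir}}
    outputs:
      model_dir:                            # downloaded later via --output-name
```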
Download the trained model:
```shell
az ml job download --name $run_id --output-name "model_dir"
```
Create the Azure ML model from the pipeline output:
```shell
az ml model create --name model-pipeline-cli --version 1 --path "azureml://jobs/$run_id/outputs/model_dir" --type mlflow_model
```
![Screenshot 2023-10-02 at 15 41 21](https://private-user-images.githubusercontent.com/74664634/272095386-40852c97-5dad-4781-b7cd-30e844f060e7.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjEyNDkxMjksIm5iZiI6MTcyMTI0ODgyOSwicGF0aCI6Ii83NDY2NDYzNC8yNzIwOTUzODYtNDA4NTJjOTctNWRhZC00NzgxLWI3Y2QtMzBlODQ0ZjA2MGU3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDIwNDAyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTExZTRhODIyMGEyNjkzODI3M2QxN2EzOWQ1NjU2ZDhlOGQwOWZkNTUwN2ZiMzk4Y2VhNWIzNjE4NzI0MWU0YTcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.xwCpH1CYui0GgSSIF-mld9R0zJCGBcsFxCuFVJ-e1LY)
Create the endpoint and its deployment:
```shell
az ml online-endpoint create -f cloud/endpoint.yml
az ml online-deployment create -f cloud/deployment.yml --all-traffic
```
Test the endpoint:
```shell
az ml online-endpoint invoke --name endpoint-pipeline-cli --request-file test_data/images_azureml.json
```
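Since the registered model is an MLflow model, the request file passed to `--request-file` follows the MLflow serving payload shape. A sketch of what it might contain, with placeholder feature columns rather than the project's actual ones:

```json
{
  "input_data": {
    "columns": ["title", "description", "contract"],
    "data": [["Data Scientist", "python and statistics", "full-time"]]
  }
}
```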
Clean up the endpoint to avoid being charged:
```shell
az ml online-endpoint delete --name endpoint-pipeline-cli -y
```
Useful commands:
Check the installed Azure CLI version:
```shell
az version
```
Example output:
```json
{
  "azure-cli": "2.53.0",
  "azure-cli-core": "2.53.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {}
}
```
If the ML extension is not installed, add it:
```shell
az extension add -n ml
```
Running `az version` again now lists the extension:
```json
{
  "azure-cli": "2.53.0",
  "azure-cli-core": "2.53.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {
    "ml": "2.20.0"
  }
}
```
Run the help command to verify the installation and list the available subcommands:
```shell
az ml -h
```