Predicting job salary using Neural Network model on Azure ML (NLP)

This project shows how to use Azure ML to train a model and deploy it as a web service. The data is mostly unstructured, so we'll use NLP to extract features from the text columns. The goal is to predict salary from the job description, title, etc.

High-level steps:

  • Start with a Jupyter notebook as a prototype.
  • Refactor notebook to scripts.
    • Clean non-essential code.
    • Use functions.
    • Add logging.
  • Create all necessary Azure resources (resource group, workspace, compute cluster, etc.).
  • Create and run the pipeline.
  • Register the model.
  • Create the endpoint.
  • Test the endpoint.
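To illustrate the refactoring step above, a notebook cell typically becomes a small, testable function with logging. The helper below is hypothetical, not code from this repo:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def clean_titles(titles):
    """Normalize job titles: strip whitespace, lower-case (hypothetical helper)."""
    cleaned = [t.strip().lower() for t in titles]
    logger.info("Cleaned %d titles", len(cleaned))
    return cleaned


cleaned = clean_titles(["  Data Scientist ", "ML Engineer"])
```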

Low-level steps:

Data science:

  • Split the data into train, validation, and test sets. Splitting first avoids data leakage.
  • Featurize the input columns.
    • One-hot encode the categorical columns.
    • Tokenize and vectorize the text columns.
  • Transform the data into tensors.
  • Create a simple neural network.
    • Combine the one-hot encoded categorical features with the bag-of-words text features.
  • Train the model.
  • Evaluate the model.
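A minimal sketch of the featurization and model steps, using scikit-learn's `CountVectorizer` for the bag-of-words text features, `OneHotEncoder` for the categorical column, and a small `MLPRegressor` standing in for the neural network. The column names and toy data are made up for illustration:

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import OneHotEncoder

# Toy data: free-text job descriptions plus one categorical column.
descriptions = ["senior python developer", "junior data analyst",
                "python data engineer", "senior analyst"]
categories = np.array([["IT"], ["Finance"], ["IT"], ["Finance"]])
salaries = np.array([120_000, 55_000, 100_000, 70_000])

# Bag-of-words features for the text column.
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(descriptions)

# One-hot features for the categorical column.
encoder = OneHotEncoder()
X_cat = encoder.fit_transform(categories)

# Combine both feature blocks into a single matrix.
X = hstack([X_text, X_cat])

# A small feed-forward network as the regressor.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X.toarray(), salaries)
preds = model.predict(X.toarray())
```

In the real pipeline the fitted vectorizer and encoder must be saved alongside the model so the endpoint can apply the same transforms at inference time.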

![Model architecture](https://github.com/yandexdataschool/nlp_course/raw/master/resources/w2_conv_arch.png)

Moving to Azure ML:

  • Create a resource group.
  • Create a workspace.
  • Create a compute cluster.
  • Load the data to Azure ML Datastore.
  • Create the components.
  • Create and run the pipeline.

Setup the environment

Install and activate the conda environment by executing the following commands:

conda env create -f environment.yml
conda activate azure_ml_sandbox

Training and deploying in the cloud


Upload the data to the Azure ML Datastore. In this case the data is a single file, so type: uri_file is used in data.yml.
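The data asset definition might look roughly like the following. This is a sketch: the asset name and path are assumptions, and note that the CLI v2 YAML schema spells the type `uri_file`:

```yaml
# cloud/data.yml (hypothetical contents)
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: salary-data
version: 1
type: uri_file
path: ./data/train.csv
description: Raw job postings with salary labels.
```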

Create compute cluster:

az ml compute create -f cloud/cluster-cpu.yml -g <resource-group-name> -w <workspace-name>

Create the dataset we'll use to train the model:

az ml data create -f cloud/data.yml -g <resource-group-name> -w <workspace-name>

NOTE: For files larger than 100 MB, compress the file first or upload it with azcopy.

Create the components:

az ml component create -f cloud/01_split.yml
az ml component create -f cloud/02_preprocess.yml
az ml component create -f cloud/03_train.yml
az ml component create -f cloud/04_test.yml

Create and run the pipeline:

run_id=$(az ml job create -f cloud/pipeline-job.yml --query name -o tsv)
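For reference, a pipeline job YAML wiring two of the components together might look like this. It is a sketch under assumed names (compute cluster, data asset, input/output ports); the actual cloud/pipeline-job.yml will differ:

```yaml
# cloud/pipeline-job.yml (hypothetical sketch)
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: salary-nlp-pipeline
settings:
  default_compute: azureml:cluster-cpu
jobs:
  split:
    component: file:01_split.yml
    inputs:
      raw_data: azureml:salary-data@latest
    outputs:
      train_dir:
      test_dir:
  train:
    component: file:03_train.yml
    inputs:
      train_dir: ${{parent.jobs.split.outputs.train_dir}}
    outputs:
      model_dir:
```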

Download the trained model:

az ml job download --name $run_id --output-name "model_dir"

Create the Azure ML model from the pipeline output:

az ml model create --name model-pipeline-cli --version 1 --path "azureml://jobs/$run_id/outputs/model_dir" --type mlflow_model

Create the endpoint:

az ml online-endpoint create -f cloud/endpoint.yml
az ml online-deployment create -f cloud/deployment.yml --all-traffic

Test the endpoint:

az ml online-endpoint invoke --name endpoint-pipeline-cli --request-file test_data/images_azureml.json
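The endpoint can also be called over plain HTTPS. The sketch below builds a request body and shows the corresponding `requests` call; the scoring URI, key, and input schema are all assumptions for illustration, so take the real values from the endpoint's Consume tab in the studio:

```python
import json

# Assumed values; replace with the real scoring URI and key.
SCORING_URI = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<endpoint-key>"


def build_request(title, description):
    """Build a JSON body for one job posting (hypothetical input schema)."""
    return json.dumps({"input_data": [{"title": title, "description": description}]})


body = build_request("Data Scientist", "Build NLP models on Azure ML.")

# Uncomment to actually call the deployed endpoint:
# import requests
# resp = requests.post(
#     SCORING_URI,
#     data=body,
#     headers={"Content-Type": "application/json",
#              "Authorization": f"Bearer {API_KEY}"},
# )
# print(resp.json())
```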

Clean up the endpoint when you're done, to avoid ongoing charges:

az ml online-endpoint delete --name endpoint-pipeline-cli -y

Useful commands:

az version

# Output:
{
  "azure-cli": "2.53.0",
  "azure-cli-core": "2.53.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {}
}

If the ml extension is not installed, add it:

az extension add -n ml

# Output:

{
  "azure-cli": "2.53.0",
  "azure-cli-core": "2.53.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": 
  {
    "ml": "2.20.0"
  }
}

Run the help command to verify your installation and see available subcommands:

az ml -h
