# Train a Model Using a Script

This Notebook shows how to train a model using a Python script rather than an interactive Notebook (cf. `2_train_notebook.ipynb`). The training script, `scripts/train.py`, performs roughly the same steps as the Notebook version, but it registers the model only if a certain performance threshold is reached.
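The threshold check in `scripts/train.py` is not reproduced in this Notebook. As a rough sketch only, a gate of this kind might look like the following. The helper name `register_if_good_enough`, the metric name `accuracy`, and the model name and path are all hypothetical; `run` is assumed to be an AzureML `Run` object (for example, the one returned by `Run.get_context()` inside a script).

```python
def register_if_good_enough(run, accuracy, threshold, model_name, model_path):
    """Log the metric and register the model only if it meets the threshold.

    Hypothetical helper, not the actual contents of scripts/train.py.
    `run` is assumed to expose AzureML's Run.log and Run.register_model.
    """
    # Always record the metric so that rejected runs can still be inspected.
    run.log("accuracy", accuracy)
    if accuracy < threshold:
        # Below the bar: skip registration and signal that to the caller.
        return None
    # At or above the bar: register the saved model file with the run.
    return run.register_model(model_name=model_name, model_path=model_path)
```

Logging the metric before the check means that runs which fail the threshold still leave a record explaining why no model was registered.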

This Notebook contains only the code needed to submit the training script on the local machine. The example also shows how to specify a Python version and packages using Anaconda. AzureML will create a new Anaconda environment if no existing local environment matches the specification.

You can use `conda env list` at the terminal to check whether a new environment was created. Environments created by AzureML will show up as `azureml_<hash>` in the list, where the hash is based on the Environment specification in AzureML.
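The same check can be done from Python. The sketch below is one possible approach, assuming `conda` is on the `PATH` and that `conda env list` prints one environment per line with the name in the first column; the function names `find_azureml_envs` and `list_azureml_envs` are made up for this example.

```python
import subprocess
from typing import List


def find_azureml_envs(conda_env_list_output: str) -> List[str]:
    """Return environment names starting with 'azureml_' from `conda env list` text."""
    names = []
    for line in conda_env_list_output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that conda prints.
        if not line or line.startswith("#"):
            continue
        # The first whitespace-separated token is the environment name.
        name = line.split()[0]
        if name.startswith("azureml_"):
            names.append(name)
    return names


def list_azureml_envs() -> List[str]:
    """Run `conda env list` and return the AzureML-created environment names."""
    result = subprocess.run(
        ["conda", "env", "list"], capture_output=True, text=True, check=True
    )
    return find_azureml_envs(result.stdout)
```

Calling `list_azureml_envs()` before and after submitting a run shows whether a new `azureml_<hash>` environment was created.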

In [1]:
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core import (
    Workspace,
    Experiment,
    Environment,
    ScriptRunConfig,
)

In [2]:
# Set up the Anaconda dependencies, including the Python version and packages.
deps = CondaDependencies()
deps.set_python_version("3.8")
deps.add_conda_package("numpy")
deps.add_conda_package("pandas")
deps.add_conda_package("scikit-learn>=0.24")

# Prints an Anaconda environment file in YAML format from the above specifications
print(deps.serialize_to_string())

# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.8

- pip:
    # Required packages for AzureML execution, history, and data preparation.
  - azureml-defaults

- numpy
- pandas
- scikit-learn>=0.24
channels:
- anaconda
- conda-forge



In [3]:
# Create a new AzureML Environment
env = Environment("custom_anaconda")

# Plug in the Anaconda dependencies
env.python.conda_dependencies = deps

env

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20210301.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": "2g"
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "custom_anaconda",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-for

In [4]:
# Create training run configuration for the local machine
run_cfg = ScriptRunConfig(
    source_directory="scripts/",
    script="train.py",
    compute_target="local",
    environment=env,
)

In [5]:
# Connect to the Workspace and submit the run to the Experiment
ws = Workspace.from_config()

exp = Experiment(ws, "azureml_demo")
script_run = exp.submit(run_cfg)
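
`exp.submit` returns a run handle that can be used to follow the run. A minimal sketch for blocking until the run finishes and collecting its results, assuming the object passed in is the AzureML `Run` returned above; the helper name `wait_and_summarize` is hypothetical:

```python
def wait_and_summarize(run, show_output=False):
    """Block until the run finishes, then return its status and logged metrics.

    Hypothetical helper; `run` is assumed to expose AzureML's
    Run.wait_for_completion, Run.get_status and Run.get_metrics.
    """
    # Optionally stream the run's log output while waiting.
    run.wait_for_completion(show_output=show_output)
    return {"status": run.get_status(), "metrics": run.get_metrics()}
```

For example, `wait_and_summarize(script_run, show_output=True)` would stream the training log and then return the final status together with any metrics the script logged.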