# Train a prediction model in R using the glm() function

**Requirements** - In order to benefit from this tutorial, you will need:
- A basic understanding of Machine Learning
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)
- An Azure ML workspace. [Check this notebook for creating a workspace](/sdk/resources/workspace/workspace.ipynb) 
- A Compute Cluster. [Check this notebook to create a compute cluster](/sdk/resources/compute/compute.ipynb)
- A Python environment
- Installed Azure Machine Learning Python SDK v2 - [install instructions](/sdk/README.md#getting-started)

**Learning Objectives** - By the end of this tutorial, you should be able to:
- Connect to your AML workspace from the Python SDK
- Create and run a `CommandJob` which executes an R command to run a training script
- Use a local file as an `input` to the CommandJob
- Create a custom `environment` from a docker context to run R commands

**Motivations** - This notebook explains how to set up and run a CommandJob. The CommandJob is a fundamental construct of Azure Machine Learning. It runs a task on a specified compute, either local or in the cloud. The CommandJob accepts an `environment` and a `compute` to set up the required infrastructure, and you define a `command` to run on this infrastructure with `inputs`. In this notebook we will create an environment for R and run an R script using the CommandJob.

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

## 1.1. Import the required libraries

In [None]:
#import required libraries
from azure.ml import MLClient
from azure.ml.entities import CommandJob, JobInput, Environment, Dataset
from azure.identity import InteractiveBrowserCredential
from azure.ml.entities._assets.environment import BuildContext

## 1.2. Configure workspace details and get a handle to the workspace
To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ml` to get a handle to the required Azure Machine Learning workspace. We use the default [interactive authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.interactivebrowsercredential?view=azure-python) for this tutorial. More advanced connection methods can be found [here](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python).

In [None]:
#Enter details of your AML workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace = '<AML_WORKSPACE_NAME>'

In [None]:
#get a handle to the workspace
ml_client = MLClient(InteractiveBrowserCredential(), subscription_id, resource_group, workspace)
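
If you would rather not hardcode the workspace details in the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the variable names `AML_SUBSCRIPTION_ID`, `AML_RESOURCE_GROUP` and `AML_WORKSPACE_NAME` and the helper `load_workspace_details` are hypothetical and not part of the SDK.

```python
import os

# Hypothetical environment variable names; pick whatever fits your setup.
ENV_VARS = {
    "subscription_id": "AML_SUBSCRIPTION_ID",
    "resource_group": "AML_RESOURCE_GROUP",
    "workspace": "AML_WORKSPACE_NAME",
}


def load_workspace_details(environ=None):
    """Return the three workspace identifiers, failing early if any is missing."""
    environ = os.environ if environ is None else environ
    # Treat unset and blank variables the same way.
    details = {key: environ.get(var, "").strip() for key, var in ENV_VARS.items()}
    missing = [ENV_VARS[key] for key, value in details.items() if not value]
    if missing:
        raise ValueError("Missing workspace settings: " + ", ".join(missing))
    return details
```

The returned values can then be passed to `MLClient` in the same order as above: `details["subscription_id"]`, `details["resource_group"]`, `details["workspace"]`.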

# 2. Configure and run the CommandJob
In this section we will configure and run the CommandJob.

## 2.1 Configure the CommandJob
The command job needs the following to be set up:
- `code_local_path` - This is the path where the code to run the command is located
- `command` - This is the command that needs to be run
- `inputs` - This is the dictionary of inputs to the CommandJob. To send it files or folders, we can use the `JobInput` class, which can be used to configure inputs of 3 types: `file`, `folder` or `dataset`.
    - `file` - This can be used for local files or remote files. For remote files, http/https and wasb are supported.
    - `folder` - This can be used for local folders or remote folders. For remote folders, http/https and wasb are supported.
    - `dataset` - To use datasets as input, you can use a registered dataset in the workspace using the format `'<dataset_name>:<version>'`, or you can use a local file or folder as a dataset. For example, `JobInput(dataset='my_dataset:1')` or `JobInput(dataset=Dataset(local_path="./data"))`. In this example, we use a dataset which is created from a local folder.
- `environment` - This is the environment needed for the command to run. You can use a curated or custom environment from the workspace, or create a new custom environment. Check out the [environment](/sdk/assets/environment/environment.ipynb) notebook for more examples. In this example we create a custom environment from a docker context defined on the local machine.
- `compute` - The compute on which the CommandJob will run. In this example we are using a compute called `cpu-cluster` present in the workspace. You can replace it with any other compute in the workspace. You can also run the job on the local machine by using `local` for the compute. In that case the CommandJob runs locally, and all the run details and output of the job are uploaded to the Azure ML workspace.
- `display_name` - The display name of the job
- `experiment_name` - The name of the experiment under which the job is tracked in the workspace
- `description` - The description of the job
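
To make the relationship between `command` and `inputs` concrete, the sketch below mimics how a `${{inputs.<name>}}` placeholder could be replaced by the location of the matching input. Azure ML performs this substitution itself when the job runs; `render_command`, `referenced_inputs` and the mount path are hypothetical and only illustrate the idea.

```python
import re

# Matches placeholders of the form ${{inputs.<name>}}.
PLACEHOLDER = re.compile(r"\$\{\{\s*inputs\.(\w+)\s*\}\}")


def referenced_inputs(command):
    """List the input names that a command string refers to."""
    return PLACEHOLDER.findall(command)


def render_command(command, resolved_inputs):
    """Replace each placeholder with the resolved location of that input."""
    def substitute(match):
        name = match.group(1)
        if name not in resolved_inputs:
            raise KeyError(f"command references undefined input '{name}'")
        return resolved_inputs[name]

    return PLACEHOLDER.sub(substitute, command)


command = "Rscript accidents.R --data ${{inputs.training_data}}"
# Hypothetical path standing in for wherever the input is made available.
rendered = render_command(command, {"training_data": "/mnt/hypothetical/training_data"})
print(rendered)  # Rscript accidents.R --data /mnt/hypothetical/training_data
```

A helper like `referenced_inputs` can also serve as a quick local check that every placeholder in the command has a matching key in the `inputs` dictionary before the job is submitted.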


In [None]:
job = CommandJob(
    code_local_path='./src',
    command= 'Rscript accidents.R --data ${{inputs.training_data}}',
    inputs={'training_data': JobInput(dataset=Dataset(local_path="./data"), mode='ro_mount')},
    environment=Environment(build=BuildContext(local_path='./docker-context')),
    compute='cpu-cluster',
    display_name='r-accidents-example',
    experiment_name = 'r-accidents-example',
    description='Train a GLM using R on the accidents dataset.'
)
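
The job above refers to three local paths: `./src`, `./data` and `./docker-context`. A small pre-flight check, sketched below, can catch a missing folder before anything is uploaded. The helper `missing_job_paths` is hypothetical, and the check assumes that the docker context contains a file named `Dockerfile`, which is the usual Docker convention but is not stated in this notebook.

```python
from pathlib import Path


def missing_job_paths(base_dir="."):
    """Return a description of every local path the job expects but cannot find."""
    base = Path(base_dir)
    required = {
        # The command runs `Rscript accidents.R` from the code folder.
        "code": base / "src" / "accidents.R",
        "data": base / "data",
        # Assumption: the docker context holds a file named Dockerfile.
        "docker context": base / "docker-context" / "Dockerfile",
    }
    return [f"{label}: {path}" for label, path in required.items() if not path.exists()]
```

Calling `missing_job_paths()` from the notebook folder returns an empty list when everything is in place.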

## 2.2 Run the CommandJob
Using the `MLClient` created earlier, we will now run this CommandJob in the workspace.

In [None]:
#submit the command job
returned_job = ml_client.create_or_update(job)
#get a URL for the status of the job
returned_job.services["Studio"].endpoint
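
Besides following the Studio URL, you may want the notebook to wait until the job finishes. The sketch below is a generic polling loop and deliberately does not call the SDK: `fetch_status` is a hypothetical callable that you supply to return the current status string of the job, and the terminal status names are an assumption to verify against your SDK version.

```python
import time

# Assumed terminal status names; check the values your SDK version reports.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_job(fetch_status, poll_seconds=30.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal status or the timeout passes."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still '{status}' after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists so that the loop can be exercised without real delays, for example by passing `sleep=lambda seconds: None`.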

# Next Steps
You can find further examples of running a job [here](/sdk/jobs/single-step/).