Licensed under the MIT License.
# Predict automobile price with the designer

Azure Machine Learning pipelines organize multiple machine learning and data processing steps into a single resource. Pipelines let you organize, manage, and reuse complex machine learning workflows across projects and users.

You will learn how to:

- Create a new pipeline.
- Import data.
- Prepare data.
- Train a machine learning model.
- Evaluate a machine learning model.

Sign in to https://ml.azure.com and select the workspace you want to work with.

## Create and load dataset

Note: 

Download the datasets to the computer that you use to run the web browser.

Datasets are available at https://github.com/MaheshSQL/AzureMLWorkshop -> Datasets

    Click the filename -> Raw -> Save as (Right click)

This is required only when using a Compute Instance.

Before you configure your experiment, upload your data file to your workspace in the form of an Azure Machine Learning dataset. Doing so lets you ensure that your data is formatted appropriately for your experiment.

1. Create a new dataset by selecting Datasets -> Create dataset -> From local files

    a. On the Basic info form, give your dataset a unique name (e.g. Automobile price data) and provide an optional description. The automated ML interface currently only supports TabularDatasets, so the dataset type should default to Tabular.

    b. Select Next on the bottom left.

    c. On the Datastore and file selection form, select the default datastore that was automatically set up during your workspace creation, workspaceblobstore (Azure Blob Storage). This is where you'll upload your data file to make it available to your workspace.

    d. Select Browse.

    e. Choose the automobile_price file on your local computer. This file is available in the 'C:\Azure ML Labs\Datasets' folder.

    f. Select Next on the bottom left, to upload it to the default container that was automatically set up during your workspace creation.

    g. When the upload is complete, the Settings and preview form is pre-populated based on the file type.

    h. Verify that the Settings and preview form is populated as follows and select Next.

![settings-image](.\.\Images\1.png "Settings and preview")

   i. Select Next.
   
   j. Ensure that the Type for the price column is Decimal.

![schema-image](.\.\Images\13.png "Schema")


k. Select Create to complete the creation of your dataset.
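
The Decimal type check on the price column can also be done locally before uploading. The following is a minimal sketch using only the Python standard library; the three-row CSV excerpt and the `non_decimal_values` helper are hypothetical and are not part of Azure Machine Learning.

```python
import csv
import io

# Hypothetical excerpt; the real file has many more rows and columns.
SAMPLE_CSV = """make,horsepower,price
audi,102,13950
bmw,101,16430
mazda,68,
"""


def non_decimal_values(csv_text, column):
    """Return the non-empty values in `column` that cannot be parsed as a number."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[column].strip()
        if not value:
            continue  # empty cells are missing values, handled later in the pipeline
        try:
            float(value)
        except ValueError:
            bad.append(value)
    return bad


bad_prices = non_decimal_values(SAMPLE_CSV, "price")
print(bad_prices)
```

An empty list suggests that every non-empty price can be read as a decimal number.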

  

## Create the pipeline



1. Select Designer and select the + icon to create a new pipeline.

![launch-designer-Url](https://docs.microsoft.com/en-us/azure/machine-learning/media/tutorial-designer-automobile-price-train-score/launch-designer.png "launch-designer")

2. Select Easy-to-use prebuilt modules.

3. At the top of the canvas, select the default pipeline name Pipeline-Created-on-XX-XX-XXXX. Rename it to Lab-02-Pipeline. The name doesn't need to be unique.

## Set the default compute target

A pipeline runs on a compute target, which is a compute resource that's attached to your workspace.

You can set a Default compute target for the entire pipeline, which tells every module to use the same compute target by default. However, you can also specify compute targets on a per-module basis.

1. Next to the pipeline name, select the Gear icon at the top of the canvas to open the Settings pane.

2. In the Settings pane to the right of the canvas, select Select compute target.



![select-compute-image](.\.\Images\9.png "Select Compute for Pipeline")

3. Select Save.

(If you already have an available compute target, you can select it to run this pipeline as shown above)

## Import data

There are several sample datasets included in the designer for you to experiment with. For this tutorial, use the <i>Automobile price data</i> dataset that you created in the previous section.

1. To the left of the pipeline canvas is a palette of datasets and modules. Select Datasets to view the registered datasets.
<br><br>
2. Select the Automobile price data dataset that you just created, and drag it onto the canvas.

![select-dataset-image](.\.\Images\12.png "Select dataset")

## Visualize the data

You can visualize the data to understand the dataset that you'll use.

1. Right-click the Automobile price data and select Visualize -> Data output

2. Select the different columns in the data window to view information about each one.

Each row represents an automobile, and the variables associated with each automobile appear as columns. There are 205 rows and 26 columns in this dataset.

![visualise-data-image](.\.\Images\10.png "Visualise Data")

Click Close once you have completed the data exploration.
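
The same kind of exploration can be sketched in plain Python. The rows below are hypothetical stand-ins for the dataset, with `None` marking a missing value, and `missing_counts` is an illustrative helper rather than a designer feature.

```python
from collections import Counter

# Hypothetical rows standing in for the dataset; None marks a missing value.
rows = [
    {"make": "audi", "normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "horsepower": 101, "price": 16430},
    {"make": "mazda", "normalized-losses": None, "horsepower": 68, "price": None},
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            counts[column] += int(value is None)
    return dict(counts)


summary = missing_counts(rows)
print(len(rows), "rows,", len(rows[0]), "columns")
print(summary)
```

Columns with a high missing count are candidates for removal, which is what the next section does with normalized-losses.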

## Prepare data

Datasets typically require some preprocessing before analysis. You might have noticed some missing values when you inspected the dataset. These missing values must be cleaned so that the model can analyze the data correctly.

### Remove a column

When you train a model, you have to do something about the data that's missing. In this dataset, the normalized-losses column is missing many values, so you will exclude that column from the model altogether.

1. In the module palette to the left of the canvas, expand the Data Transformation section and find the Select Columns in Dataset module.

2. Drag the Select Columns in Dataset module onto the canvas. Drop the module below the dataset module.

3. Connect the Automobile price data dataset to the Select Columns in Dataset module. Drag from the dataset's output port, which is the small circle at the bottom of the dataset on the canvas, to the input port of Select Columns in Dataset, which is the small circle at the top of the module.

Tip: You create a flow of data through your pipeline when you connect the output port of one module to an input port of another.

![connect-modules-Url](https://docs.microsoft.com/en-us/azure/machine-learning/media/tutorial-designer-automobile-price-train-score/connect-modules.gif "connect-modules")

4. Select the Select Columns in Dataset module.

5. In the module details pane to the right of the canvas, select Edit column.

6. Expand the Column names drop down next to Include, and select 'All columns'.

7. Select the + to add a new rule.

8. From the drop-down menus, select 'Exclude' and 'Column names'.

9. Enter normalized-losses in the text box.



![exclude-column-Url](https://docs.microsoft.com/en-us/azure/machine-learning/media/tutorial-designer-automobile-price-train-score/exclude-column.png "exclude-column")

10. In the lower right, select Save to close the column selector.

11. Select the Select Columns in Dataset module.

12. In the module details pane to the right of the canvas, select the 'Comment' text box and enter <i>Exclude normalized losses</i>

Comments will appear on the graph to help you organize your pipeline.
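
Conceptually, the two rules (include all columns, then exclude normalized-losses) amount to dropping one key from every row. This is a minimal sketch under that assumption; the sample rows and the `exclude_columns` helper are hypothetical.

```python
def exclude_columns(rows, excluded):
    """Return copies of the rows without the excluded column names."""
    excluded = set(excluded)
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


# Hypothetical rows; None marks a missing value.
rows = [
    {"make": "audi", "normalized-losses": 164, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "price": 16430},
]

trimmed = exclude_columns(rows, ["normalized-losses"])
print(trimmed)
```

The original rows are left unchanged, which mirrors how a designer module passes a new dataset to its output port.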

### Clean missing data

Your dataset still has missing values after you remove the normalized-losses column. You can remove the remaining missing data by using the Clean Missing Data module.

Tip: Cleaning the missing values from input data is a prerequisite for using most of the modules in the designer.

1. In the module palette to the left of the canvas, expand the section Data Transformation, and find the 'Clean Missing Data' module.

2. Drag the Clean Missing Data module to the pipeline canvas. Connect it to the Select Columns in Dataset module.

3. Select the Clean Missing Data module.

4. In the module details pane to the right of the canvas, select Edit Column.

5. In the Columns to be cleaned window that appears, expand the drop-down menu next to Include. Select 'All columns'.

6. Select Save

7. In the module details pane to the right of the canvas, select 'Remove entire row' under 'Cleaning mode'.

8. In the module details pane to the right of the canvas, select the Comment box, and enter <i>Remove missing value rows</i>

Your pipeline should now look something like this:

![pipeline-clean-Url](https://docs.microsoft.com/en-us/azure/machine-learning/media/tutorial-designer-automobile-price-train-score/pipeline-clean.png "pipeline-clean")
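
The 'Remove entire row' cleaning mode applied to all columns can be pictured as keeping only rows with no missing values. The sketch below assumes that missing values are represented as `None`; the rows and the `remove_rows_with_missing` helper are hypothetical.

```python
def remove_rows_with_missing(rows):
    """Keep only the rows in which every column has a value."""
    return [row for row in rows if all(value is not None for value in row.values())]


# Hypothetical rows after the normalized-losses column was excluded.
rows = [
    {"make": "audi", "horsepower": 102, "price": 13950},
    {"make": "bmw", "horsepower": None, "price": 16430},
    {"make": "mazda", "horsepower": 68, "price": None},
]

cleaned = remove_rows_with_missing(rows)
print(len(rows), "->", len(cleaned), "rows")
```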

### Train a machine learning model

Now that you have the modules in place to process the data, you can set up the training modules.

Because you want to predict price, which is a number, you can use a regression algorithm. For this example, you use a linear regression model.

#### Split the data

Splitting data is a common task in machine learning. You will split your data into two separate datasets. One dataset will train the model and the other will test how well the model performed.

1. In the module palette, expand the section Data Transformation and find the 'Split Data' module.

2. Drag the Split Data module to the pipeline canvas.

3. Connect the <i>left</i> port of the Clean Missing Data module to the Split Data module.

Important: Be sure that the left output port of Clean Missing Data connects to Split Data. The left port contains the cleaned data. The right port contains the discarded data.

4. Select the Split Data module.

5. In the module details pane to the right of the canvas, set the Fraction of rows in the first output dataset to 0.7.

This option splits 70 percent of the data to train the model and 30 percent for testing it. The 70 percent dataset will be accessible through the left output port. The remaining data will be available through the right output port.

6. In the module details pane to the right of the canvas, select the Comment box, and enter <i>Split the dataset into training set (0.7) and test set (0.3)</i>
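
A 0.7 split can be sketched as follows, assuming the rows are shuffled before they are divided. The `split_rows` helper and its `seed` parameter are hypothetical and only illustrate the idea; they do not describe how the Split Data module is implemented.

```python
import random


def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into (first, second) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten data rows
train, test = split_rows(rows, fraction=0.7)
print(len(train), "training rows,", len(test), "test rows")
```

The first part corresponds to the left output port (training set) and the second part to the right output port (test set).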

#### Train the model

Train the model by giving it a dataset that includes the price. The algorithm constructs a model that explains the relationship between the features and the price as presented by the training data.

1. In the module palette, expand Machine Learning Algorithms.

This option displays several categories of modules that you can use to initialize learning algorithms.

2. Select Regression > Linear Regression, and drag it to the pipeline canvas.

3. In the module palette, expand the section Module training, and drag the Train Model module to the canvas.

4. Connect the output of the Linear Regression module to the left input of the Train Model module.

5. Connect the training data output (left port) of the Split Data module to the right input of the Train Model module. (See screenshot below)

Important: Be sure that the left output port of Split Data connects to Train Model. The left port contains the training set. The right port contains the test set.

![pipeline-train-model-Url](https://docs.microsoft.com/en-us/azure/machine-learning/media/tutorial-designer-automobile-price-train-score/pipeline-train-model.png "pipeline-train-model")

6. Select the Train Model module.

7. In the module details pane to the right of the canvas, select Edit column selector.

8. In the Label column dialog box, expand the drop-down menu and select 'Column names'.

9. In the text box, select 'price' to specify the value that your model is going to predict.

10. Click Save

Important: Make sure you enter the column name exactly. Do not capitalize price.



![label-colum-image](.\.\Images\11.png "Label Column")

Your pipeline should look like this:

![pipeline-train-graph-Url](https://docs.microsoft.com/en-us/azure/machine-learning/media/tutorial-designer-automobile-price-train-score/pipeline-train-graph.png "pipeline-train-graph")
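
To make the idea of training concrete, here is a deliberately simplified sketch: ordinary least squares with a single feature. The designer's Linear Regression module works with all feature columns and its own solver, so this only illustrates the principle. The horsepower and price values are made up, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: one feature (horsepower) and the label (price).
horsepower = [60, 100, 140]
price = [8000, 12000, 16000]

slope, intercept = fit_line(horsepower, price)
print(slope, intercept)
```

The label column (price) plays the role of `ys`, and the fitted parameters are what the trained model carries forward to scoring.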

#### Add the Score Model module

After you train your model by using 70 percent of the data, you can use it to score the other 30 percent to see how well your model functions.

Enter score model in the search box to find the 'Score Model' module. Drag the module to the pipeline canvas.

Connect the output of the Train Model module to the left input port of Score Model. 

Connect the test data output (right port) of the Split Data module to the right input port of Score Model.
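
Scoring means applying the trained model to rows it has not seen and recording the prediction next to the actual value. The sketch below assumes a one-feature linear model with hypothetical parameters; the 'Scored Labels' key is used here as an assumed name for the prediction column.

```python
def score_rows(rows, slope, intercept, feature="horsepower"):
    """Return copies of the rows with a predicted price added."""
    return [
        {**row, "Scored Labels": slope * row[feature] + intercept}
        for row in rows
    ]


# Hypothetical test rows and model parameters.
test_rows = [
    {"horsepower": 80, "price": 10500},
    {"horsepower": 120, "price": 13500},
]

scored = score_rows(test_rows, slope=100.0, intercept=2000.0)
for row in scored:
    print(row["price"], row["Scored Labels"])
```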

#### Add the Evaluate Model module

Use the Evaluate Model module to evaluate how well your model scored the test dataset.

1. Enter evaluate in the search box to find the Evaluate Model module. Drag the module to the pipeline canvas.

2. Connect the output of the Score Model module to the left input of Evaluate Model.

The final pipeline should look something like this:

![pipeline-final-graph-Url](https://docs.microsoft.com/en-us/azure/machine-learning/media/tutorial-designer-automobile-price-train-score/pipeline-final-graph.png "pipeline-final-graph")

#### Submit the pipeline

Now that your pipeline is all set up, you can submit a pipeline run to train your machine learning model. You can submit a valid pipeline run at any point, which can be used to review changes to your pipeline during development.

1. At the top of the canvas, select Submit.

2. In the Set up pipeline run dialog box, select 'Create new' experiment.

    Note: Experiments group similar pipeline runs together. If you run a pipeline multiple times, you can select the same experiment for successive runs.

    a. Enter Lab-02-Regression as the New experiment name.

    b. Select Submit.

You can view run status and details at the top right of the canvas.

If this is the first run, it may take up to 20 minutes for your pipeline to finish running. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline runs will take less time since the compute resources are already allocated. Additionally, the designer uses cached results for each module to further improve efficiency.

#### View scored labels

After the run completes, you can view the results of the pipeline run. First, look at the predictions generated by the regression model.

1. Right-click the Score Model module, and select Visualize to view its output.

Here you can see the predicted prices and the actual prices from the testing data.

![score-result-Url](https://docs.microsoft.com/en-us/azure/machine-learning/media/tutorial-designer-automobile-price-train-score/score-result.png "score-result")

#### Evaluate models

Use the Evaluate Model module to see how well the trained model performed on the test dataset.

1. Right-click the Evaluate Model module and select Visualize to view its output.

The following statistics are shown for your model:

- Mean Absolute Error (MAE): The average of absolute errors. An error is the difference between the predicted value and the actual value.

- Root Mean Squared Error (RMSE): The square root of the average of squared errors of predictions made on the test dataset.

- Relative Absolute Error: The average of absolute errors relative to the absolute difference between actual values and the average of all actual values.

- Relative Squared Error: The average of squared errors relative to the squared difference between the actual values and the average of all actual values.

- Coefficient of Determination: Also known as the R squared value, this statistical metric indicates how well a model fits the data.


For each of the error statistics, smaller is better. A smaller value indicates that the predictions are closer to the actual values. For the coefficient of determination, the closer its value is to one (1.0), the better the predictions.
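
The statistics above can be written out directly from their definitions. This is a minimal sketch using the standard formulas, with the coefficient of determination taken as one minus the relative squared error; the actual and predicted prices are made up, and `regression_metrics` is a hypothetical helper, not the Evaluate Model implementation.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression error statistics from paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    abs_deviation = sum(abs(a - mean_actual) for a in actual)
    sq_deviation = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_errors) / sq_deviation
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / abs_deviation,
        "rse": rse,
        "r2": 1 - rse,
    }


# Made-up actual and predicted prices.
actual = [10000, 12000, 14000]
predicted = [11000, 12000, 13000]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(name, round(value, 3))
```

For this toy data the relative absolute error is 0.5 and the coefficient of determination is 0.75: the predictions are closer to the actual values than the mean of the actual values would be, but they are not a perfect fit.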

![evaluate-image](.\.\Images\14.png "Evaluate")

### --- End ---

```python
# Increase the width of the notebook container.
from IPython.display import display, HTML

display(HTML("<style>.container { width:80% !important; }</style>"))
```