# Visualizing Amazon SageMaker machine learning predictions with Amazon QuickSight

An updated guide for the related [blog post](https://aws.amazon.com/blogs/machine-learning/making-machine-learning-predictions-in-amazon-quicksight-and-amazon-sagemaker).

## Step 0: Prepare a SageMaker inference pipeline and a test dataset

First, run the `customer_churn.ipynb` notebook. This notebook will:

- Train a SageMaker inference pipeline (preprocessing + model)
- Create a test dataset, and upload it to your S3 bucket
- Generate a manifest file, which is needed when importing the dataset into QuickSight
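
The notebook generates the manifest for you, so nothing below is required. For orientation only, here is a minimal sketch of what a QuickSight S3 manifest for a single CSV file can look like. The bucket name and object key are hypothetical placeholders, and the manifest produced by `customer_churn.ipynb` may use different settings.

```python
import json


def build_manifest(s3_uri, delimiter=",", contains_header=True):
    """Return a QuickSight S3 manifest (as a dict) pointing at one CSV file."""
    return {
        # Where QuickSight should read the data from.
        "fileLocations": [{"URIs": [s3_uri]}],
        # How QuickSight should parse the file.
        "globalUploadSettings": {
            "format": "CSV",
            "delimiter": delimiter,
            "containsHeader": "true" if contains_header else "false",
        },
    }


# Hypothetical location; replace it with the S3 URI printed by the notebook.
manifest = build_manifest("s3://my-example-bucket/quicksight/test_data.csv")

# Serialize the manifest; saving this string as manifest.json gives the file to upload.
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The `fileLocations` entry tells QuickSight where the data lives, and `globalUploadSettings` tells it how to parse the file.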

## Step 1: Set up a QuickSight **Enterprise Edition** account

Open the AWS Console and navigate to Amazon QuickSight. Select **Sign up for QuickSight**.

![example](images/1.png)

Select the **Enterprise Edition** and click **Continue**.

![example](images/2.png)


You will have to choose a **unique** username. Leave all the other selections at their defaults. In particular, **don't change the AWS Region**.

![example](images/3.png)

You can also give QuickSight access to additional resources, namely S3 (where the dataset resides) and Amazon SageMaker (for getting ML predictions on the imported dataset). **Check the S3 and SageMaker options**. Alternatively, you can do this at a later stage from within QuickSight, by clicking the **Manage QuickSight** option in the upper right corner.

![example](images/5.png)

When you check the S3 option, you must explicitly select the S3 bucket that you created, so that QuickSight can access the new dataset.

![example](images/6.png)

After that, click **Go to Amazon QuickSight**.

![example](images/7.png)

## Step 2: Create a new dataset

Once inside QuickSight, choose **Datasets** and then **New dataset**.

![example](images/8.png)

Select **S3** as an option for the location of the dataset that you will import.

![example](images/9.png)

In "New S3 data source", enter the S3 URI where the dataset is located. You can copy and paste this URI from the second cell from the end of `customer_churn.ipynb`.

You also have to upload a manifest JSON file that describes the location and format of this dataset. This file is generated for you in the final cell of `customer_churn.ipynb`. Just download the `manifest.json` file locally, and then upload it to QuickSight.

![example](images/10.png)

After that, you will receive a confirmation that the dataset was created. You can now either click **Edit/Preview data** to explore the original dataset, or directly select **Augment with SageMaker** to import the outputs of an inference pipeline.

![example](images/11.png)

## Step 3: Augment the dataset with the outputs of a SageMaker model

Either directly from the dataset confirmation window, or from the dataset exploration page, click **Augment with SageMaker**.

![example](images/12.png)

After that, you will have to select the SageMaker model that you will use, as well as a schema JSON file that describes the inputs, outputs, and additional properties of the model. In our case, the model name will start with `QS-inference-pipeline-xxx`. The schema JSON file is already included in the repository that you have cloned. Just download the `Churn_schema.json` file locally, and then upload it to QuickSight. After that, you will see a preview of the inputs and outputs of the model.

![example](images/13.png)

QuickSight attempts to match the fields between the dataset and the model schemas. If there is no direct match for some fields, you can manually select the appropriate one. Here, just select `Int'l Plan` for the field `Intl Plan`. Then click **Next**.

![example](images/14.png)
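
If you want to spot such mismatches before opening the mapping screen, a small local check can help. The sketch below is only an illustration and is not how QuickSight matches fields internally: it compares field names exactly first, and then falls back to a comparison that ignores case, spaces, and punctuation. The field lists are illustrative subsets, not the full churn schema.

```python
import re


def normalize(name):
    """Lowercase a field name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_fields(schema_fields, dataset_columns):
    """Map each schema field to a dataset column.

    Exact name matches win; otherwise fall back to a normalized comparison.
    Fields with no candidate map to None and need manual selection.
    """
    by_normalized = {normalize(col): col for col in dataset_columns}
    mapping = {}
    for field in schema_fields:
        if field in dataset_columns:
            mapping[field] = field
        else:
            mapping[field] = by_normalized.get(normalize(field))
    return mapping


# Illustrative subsets only (assumed for this example).
schema_fields = ["State", "Account Length", "Intl Plan"]
dataset_columns = ["State", "Account Length", "Int'l Plan"]

mapping = match_fields(schema_fields, dataset_columns)

# Fields whose dataset column differs from the schema name deserve a manual look.
needs_review = [field for field, col in mapping.items() if col != field]
print(mapping)
print(needs_review)
```

With these inputs, `Intl Plan` is the only field flagged for review, which corresponds to the manual selection described above.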

After that you will be asked to review the model's output name, and add a description. You can finish the importing process by clicking **Prepare data**.

![example](images/15.png)


This adds a new column (called *Churn*) to your dataset, populated with the outputs of the SageMaker model.
If you now click **SAVE & PUBLISH** in the upper right corner of the screen, QuickSight will save the whole dataset and trigger a SageMaker **Batch Transform** job to get the predictions for your dataset. Depending on the size of the dataset and the model pipeline, this can take from a few minutes to several hours. In our case, it takes approximately 7 minutes.

![example](images/16.png)
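
While you wait, you can follow the job from the SageMaker console under Batch transform jobs, or programmatically. The helper below is a minimal sketch, assuming you pass it a boto3 SageMaker client created in the same region as QuickSight. The function name is hypothetical, and the names of the jobs that QuickSight creates are not specified here, so it simply lists the most recent ones.

```python
def latest_transform_jobs(sagemaker_client, max_results=5):
    """Return (name, status) pairs for the most recently created transform jobs.

    `sagemaker_client` is expected to be a boto3 SageMaker client, for example
    boto3.client("sagemaker"), or any object with the same method.
    """
    response = sagemaker_client.list_transform_jobs(
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=max_results,
    )
    return [
        (job["TransformJobName"], job["TransformJobStatus"])
        for job in response["TransformJobSummaries"]
    ]
```

To use it, install `boto3`, configure AWS credentials, and call `latest_transform_jobs(boto3.client("sagemaker"))`. A status of `Completed` means the predictions are ready, while `InProgress` means the job is still running.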

Clicking **PUBLISH & VISUALIZE** will take you to the visualization window. There you will be able to use the predicted values of the model and include them in graphs and dashboards. 

![example](images/17.png)

For more guidance on creating visualizations for this example, follow the steps in this [blog post](https://aws.amazon.com/blogs/machine-learning/making-machine-learning-predictions-in-amazon-quicksight-and-amazon-sagemaker).