In this code pattern, you will build time-series machine learning models and visualize the results using IBM Cloud Pak for Data Jupyter Notebooks, AutoAI and Embedded Dashboard on Amazon Web Services (AWS) Cloud. Developers will learn both the Code and No Code approaches to building models and visualizing the results.
When you have completed this code pattern, you will understand how to:
- Build classification and time-series models in IBM Cloud Pak for Data Watson Studio Jupyter notebooks.
- Visualize data in IBM Cloud Pak for Data Cognos Dashboard Embedded.
- Build and compare different classification models in IBM Cloud Pak for Data AutoAI Experiments.
- Pre-processed datasets are loaded into an Amazon S3 bucket.
- The datasets from the S3 bucket are read in Jupyter Notebooks.
- Different models are built and evaluated in Jupyter Notebooks, and the final prediction data is stored back into the S3 bucket.
- The datasets from the S3 bucket are copied into a Watson Studio project and loaded into AutoAI. Different models are built and compared in AutoAI with no code.
- The prediction data produced by the Jupyter Notebook models and stored in the S3 bucket is read by Cognos Dashboard Embedded to visualize the data in the form of an interactive dashboard.
- Set up a project in Cloud Pak for Data
  - 2.1. Create a Project
  - 2.2. Create a Connection to S3
- Code Approach: Build Prediction Models with Watson Studio
  - 3.1. About the Notebooks
  - 3.2. Run LSTM Notebook 1
  - 3.3. Run LSTM Notebook 2
  - 3.4. Run Decision Tree Notebook
- No Code Approach: Build Prediction Models with IBM Cloud Pak for Data using AutoAI
- Visualize the Predictions in IBM Cloud Pak for Data Cognos Embedded Dashboard
Create an S3 bucket in AWS by referring to the AWS documentation:
- Creating a bucket
- Click on Create Bucket and enter a name for the bucket (for example, 'lab4').
- Keep Block all public access enabled and then click on the Create bucket button.
- Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/
- In the Buckets list, choose the name of the bucket that you created (for example, 'lab4').
- Click on Upload, select Add files and upload the following files:
- Click on Upload.
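If you prefer the command line over the console, the upload steps above can also be scripted with boto3 (the AWS SDK for Python). This is a minimal sketch, assuming your AWS credentials are already configured; the bucket and file names are placeholders.

```python
import os

def s3_key_for(path):
    """Derive the S3 object key from a local file path (file name only)."""
    return os.path.basename(path)

def upload_files(bucket_name, file_paths):
    """Upload each local file to the given S3 bucket, keyed by file name."""
    import boto3  # AWS SDK for Python; credentials come from the environment
    s3 = boto3.client("s3")
    for path in file_paths:
        s3.upload_file(path, bucket_name, s3_key_for(path))

# Example (hypothetical local paths):
# upload_files("lab4", ["data/ts-brussels-grouped.csv", "data/RI-data-ML.csv"])
```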
In this step, you will learn how to create a project and set up a connection to the Amazon S3 bucket in your IBM Cloud Pak for Data. This is essential because all the datasets reside in the Amazon S3 bucket.
- Create a project in IBM Cloud Pak for Data. Click on the Hamburger menu, and select All Projects.
- Click on New Project.
  - Select project type as Analytics project.
  - Click on Create a project from file.
  - Upload the cpd-project.zip file.
  - Enter a project name and click on Create.
- Once the project is created, click on View Project. You should see the overview of the project as shown below.
- Click on the Assets tab and you will see Data and Notebooks.
- Click on Add to Project and select Connection.
- Select Connection type as Amazon S3.
  - Enter the credentials to connect to your S3 bucket.
  - Click on Test connection and you will see a connection successful message if you have entered the correct credentials.
  - Click on Create.
- Once the connection is created, you will see the connection in the Assets tab under Data assets. With this connection you can access all the datasets present in your S3 bucket from your Cloud Pak for Data project.
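Under the hood, the connection gives the notebooks programmatic access to the bucket. A rough sketch of what reading a dataset through such a connection looks like (hypothetical helper, roughly mirroring the code the "Insert to code" option generates):

```python
import io

import pandas as pd

def read_csv_from_s3(bucket, key, access_key, secret_key):
    """Read a CSV object from S3 into a pandas DataFrame."""
    import boto3  # AWS SDK for Python
    client = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return pd.read_csv(io.BytesIO(body))

# Example with placeholder credentials:
# data_df_1 = read_csv_from_s3("lab4", "ts-brussels-grouped.csv",
#                              "<ACCESS_KEY>", "<SECRET_KEY>")
```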
In the Code Approach, you will learn how to build two types of prediction models in Watson Studio Jupyter Notebooks. As a developer, you will have full control over the models' hyperparameters and the training data in this section.
What are hyperparameters? A hyperparameter is a parameter whose value controls the learning process of a machine learning algorithm. The same kind of machine learning model can require different constraints, weights, or learning rates to generalize to different data patterns. Hence, we tune or optimize the hyperparameters so that the model can optimally solve the machine learning problem. The section is divided into the following sub-sections:
- 3.1. About the Notebooks
- 3.2. Run LSTM Notebook 1
- 3.3. Run LSTM Notebook 2
- 3.4. Run Decision Tree Notebook
- Click on the Assets tab and you will see the following Notebooks:
  - Region-Brussels-LSTM.ipynb
  - Region-Wallonia-LSTM.ipynb
  - Region-All-Decision-Trees.ipynb
- The LSTM notebooks are used to build prediction models that forecast future COVID-19 cases for the Brussels and Wallonia regions respectively. The LSTM models are built using the data from the datasets in the S3 bucket. Both models are built with different hyperparameters.
- The LSTM model for the Brussels region is built with the following hyperparameters:
  - train_test_split: 0.70
  - lookback: 30
  - hidden_layers: 2
  - units: 55, 100
  - dropouts: 0.15, 0.15
  - optimizer: adam
  - learning_rate: 0.001 (default)
  - epochs: 25
  - batch_size: 32
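The train_test_split and lookback hyperparameters govern how the time series is turned into supervised samples before the LSTM sees it. A minimal sketch of that windowing (hypothetical helpers, not code from the notebooks):

```python
import numpy as np

def make_windows(series, lookback=30):
    """Turn a 1-D series into (samples, lookback) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])   # the previous `lookback` days
        y.append(series[i + lookback])     # the day to predict
    return np.array(X), np.array(y)

def train_test_split_ordered(X, y, ratio=0.70):
    """Chronological split: the first `ratio` of samples train, the rest test."""
    n = int(len(X) * ratio)
    return X[:n], X[n:], y[:n], y[n:]

series = np.arange(100, dtype=float)       # stand-in for daily case counts
X, y = make_windows(series, lookback=30)   # 70 samples, each 30 days long
X_tr, X_te, y_tr, y_te = train_test_split_ordered(X, y, 0.70)
```

Keeping the split chronological (rather than shuffled) matters for time series, since the model must be evaluated on dates it has not seen.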
- The LSTM model for the Wallonia region is built with the following hyperparameters:
  - train_test_split: 0.70
  - lookback: 30
  - hidden_layers: 2
  - units: 60, 100
  - dropouts: 0.15, 0.15
  - optimizer: adam
  - learning_rate: 0.001 (default)
  - epochs: 25
  - batch_size: 32
- Additionally, the Decision Tree notebook is used to build a model to predict the Risk Index for the Brussels, Flanders and Wallonia regions.
- The Decision Tree models are built with the following hyperparameters:
  - train_test_split: 0.70
  - max_depth: 4
  - min_samples_split: 2
  - min_samples_leaf: 1
  - criterion: entropy
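These hyperparameter names match scikit-learn's DecisionTreeClassifier, so the model construction can be sketched as follows. The training data here is a toy stand-in, not the real RI-data-ML.csv, and the label encoding (1 = Low, etc.) is illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for RI-data-ML.csv: [region_code, total_cases] -> risk index.
X = np.array([[0, 100], [0, 900], [1, 50], [1, 1200], [2, 300], [2, 2000]])
y = np.array([1, 3, 1, 3, 2, 3])  # e.g. 1 = Low, 2 = Medium, 3 = High

# Hyperparameters matching the list above.
clf = DecisionTreeClassifier(
    max_depth=4,
    min_samples_split=2,
    min_samples_leaf=1,
    criterion="entropy",
    random_state=42,
)
clf.fit(X, y)
pred = clf.predict([[0, 100]])  # region 0 with 100 cases on a given day
```

The entropy criterion chooses splits that maximize information gain, while max_depth=4 caps the tree's complexity to limit overfitting.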
3.2. Notebook 1 : Predict future COVID-19 cases for Brussels region with Long Short-Term Memory (LSTM) Model
In this lab exercise, you will learn about a popular open-source machine learning algorithm, Long Short-Term Memory (LSTM). You will use this time-series algorithm to build a model from historical data of total COVID-19 cases. Then you will use the trained model to predict future COVID-19 cases.
- You will refer to the Region-Brussels-LSTM.ipynb notebook.
- Click on the edit button to open the notebook in edit mode.
- The notebook should look something like the one shown below.
- You need to add the S3 connection to the notebook:
  - Click on the empty second code cell in the notebook.
  - Click on the find and add data button on the top right.
  - Click on the Connections tab.
  - You will see your connection variable. Click on Insert to code and select pandas DataFrame.
  - Select the ts-brussels-grouped.csv dataset from the connection variable.
- Verify that the dataframe name is data_df_1 in the generated code snippet.
- Click on Cell and select Run All to run the notebook.
- Running the notebook will take some time, so please be patient.
- Once the notebook has completed, you can observe the following in the notebook:
  - Current Trend of COVID-19 cases in Brussels
  - LSTM Model Accuracy
  - LSTM Model Loss
  - LSTM Model Prediction
- Current Trend of COVID-19 cases in Brussels: The current trend of COVID-19 cases in Brussels is shown in the graph.
- LSTM Model Accuracy: You can observe that the Root Mean Squared Error (RMSE) values are almost identical for the training and test data, which confirms the accuracy of the model without overfitting or underfitting.
- LSTM Model Loss: There is no vanishing gradient problem, as the LSTM model with an optimal configuration takes care of it.
- LSTM Model Prediction: You can observe that the model is able to capture the pattern in the data.
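The RMSE comparison described above is simple to compute directly. A small sketch with illustrative numbers (not the notebook's actual outputs):

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error between two sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Illustrative values only.
train_rmse = rmse([100, 110, 120], [101, 108, 123])
test_rmse = rmse([130, 140, 150], [128, 143, 149])

# Similar train and test RMSE suggests the model neither overfits nor underfits.
similar = abs(train_rmse - test_rmse) / max(train_rmse, test_rmse) < 0.5
```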
- The following CSV files are generated from the notebook:
  - Brussels.csv: The dataframe containing the historical COVID-19 cases in Brussels.
  - brussels-actualVsPredicted.csv: The dataframe containing the actual and predicted COVID-19 cases in Brussels.
  - brussels-errorEvaluation.csv: The dataframe containing the error evaluation of the model.
  - brussels-next7Prediction.csv: The dataframe containing the next 7 days' prediction of COVID-19 cases in Brussels.
- These CSV files will be stored in your S3 bucket and in the Data Assets of your Cloud Pak for Data project.
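Writing a result dataframe back to the bucket can be sketched as below. This is a hypothetical helper (the notebooks may use a different mechanism), assuming boto3 and valid credentials.

```python
import io

import pandas as pd

def save_df_to_s3(df, bucket, key, s3_client=None):
    """Serialize a DataFrame to CSV in memory and upload it to S3."""
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    if s3_client is None:
        import boto3  # AWS SDK for Python
        s3_client = boto3.client("s3")
    s3_client.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())

# Example (placeholder dataframe and bucket):
# save_df_to_s3(next7_df, "lab4", "brussels-next7Prediction.csv")
```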
Note: These CSV files will be used to Visualize the Data in Watson Cognos Dashboard Embedded
You have successfully completed this lab exercise. You can continue to the next lab exercise.
3.3. Notebook 2 : Predict future COVID-19 cases for Wallonia region with Long Short-Term Memory (LSTM) Model
In this lab exercise, you will again use the Long Short-Term Memory (LSTM) time-series algorithm to build a model from historical data of total COVID-19 cases, this time for the Wallonia region. Then you will use the trained model to predict future COVID-19 cases.
- You will refer to the Region-Wallonia-LSTM.ipynb notebook.
- Click on the edit button to open the notebook in edit mode.
- The notebook should look something like the one shown below.
- You need to add the S3 connection to the notebook:
  - Click on the empty second code cell in the notebook.
  - Click on the find and add data button on the top right.
  - Click on the Connections tab.
  - You will see your connection variable. Click on Insert to code and select pandas DataFrame.
  - Select the ts-wallonia-grouped.csv dataset from the connection variable.
- Verify that the dataframe name is data_df_1 in the generated code snippet.
- Click on Cell and select Run All to run the notebook.
- Running the notebook will take some time, so please be patient.
- Once the notebook has completed, you can observe the following in the notebook:
  - Current Trend of COVID-19 cases in Wallonia
  - LSTM Model Accuracy
  - LSTM Model Loss
  - LSTM Model Prediction
- Current Trend of COVID-19 cases in Wallonia: The current trend of COVID-19 cases in Wallonia is shown in the graph.
- LSTM Model Accuracy: You can observe that the Root Mean Squared Error (RMSE) values are almost identical for the training and test data, which confirms the accuracy of the model without overfitting or underfitting.
- LSTM Model Loss: There is no vanishing gradient problem, as the LSTM model with an optimal configuration takes care of it.
- LSTM Model Prediction: You can observe that the model is able to capture the pattern in the data.
- The following CSV files are generated from the notebook:
  - Wallonia.csv: The dataframe containing the historical COVID-19 cases in Wallonia.
  - wallonia-actualVsPredicted.csv: The dataframe containing the actual and predicted COVID-19 cases in Wallonia.
  - wallonia-errorEvaluation.csv: The dataframe containing the error evaluation of the model.
  - wallonia-next7Prediction.csv: The dataframe containing the next 7 days' prediction of COVID-19 cases in Wallonia.
- These CSV files will be stored in your S3 bucket and in the Data Assets of your Cloud Pak for Data project.
Note: These CSV files will be used to Visualize the Data in Watson Cognos Dashboard Embedded
You have successfully completed this lab exercise. You can continue to the next lab exercise.
In this lab exercise, you will learn about a popular machine learning algorithm, the Decision Tree. You will use this classification algorithm to build a model from historical data of regions and their total cases. Then you will use the trained decision tree to predict the Risk Index of a region.
- You will refer to the Region-All-Decision-Trees.ipynb notebook.
- Click on the edit button to open the notebook in edit mode.
- The notebook should look something like the one shown below.
- Before running the notebook, you need to add the S3 connection to the notebook:
  - Click on the third code cell in the notebook.
  - Click on the find and add data button on the top right.
  - Click on the Connections tab.
  - You will see your connection variable. Click on Insert to code and select pandas DataFrame.
  - Select the RI-data-ML.csv dataset from the connection variable.
- Verify that the dataframe name is data_df_1 in the generated code snippet.
- Click on Cell and select Run All to run the notebook.
- Running the notebook will take some time, so please be patient.
- Once the notebook has completed, you can observe the following in the notebook:
  - Decision Tree Model Accuracy
  - Decision Tree Visualization
- Decision Tree Model Accuracy: You can observe that the accuracy of the model is 86.63%.
- Decision Tree Visualization: You can observe the decision tree in the notebook.
You have successfully completed this lab exercise.
Note: Next steps - Visualize the Data in Cognos Embedded Dashboard
In this section, you will build and train high-quality predictive models quickly with no code and simplify AI lifecycle management using Watson Studio's AutoAI. AutoAI automates tasks for data scientists, such as feature engineering and selection, choosing the type of machine learning algorithm, building an analytical model based on that algorithm, hyperparameter optimization, training the model on tested data sets, and running the model to generate scores and findings.
- Navigate to the project, click on the Add to project option on the top right and select AutoAI experiment as the asset type.
- Add the data file: click on Select from project and select the RI-data-ML.csv file from the project's data assets.
- Click on No for creating a time series forecast, since we are building a multi-class classifier. Select Risk_Index as the column to predict and hit Run experiment.
- It will take a couple of minutes to complete the experiment. You will see Experiment completed on the right side of the canvas.
- Click on the first pipeline (Rank 1) and choose the Save as option on the top right side.
- You should see the message Model saved successfully as below. Click on the View in project option.
- Select the option Go to the model in the space after promoting it and click on Promote.
- Click on New deployment. Select the Deployment type as Online, give a name to the deployment and hit Create.
- It will take a couple of minutes for the deployment. The status should be Deployed as below.
- Click on model-deploy and you should see the Endpoint and Code Snippets as below.
- Let's do some predictions. Click on the Test tab and input the data using the form or JSON format.
- Enter the input data using single or multiple samples (JSON). We will try with a single sample by giving Brussels as the input for Region and 100 for Total_cases, then click on Add to list.
- You should see the Input list updated with the sample values. Hit Predict to generate predictions.
- You can see that the predicted value is 1 under the Result section, which means the risk index is predicted as Low for the input data of the Brussels region with 100 cases on a given day.
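The same prediction can be requested programmatically against the deployment's Endpoint. Below is a hedged sketch of building the scoring payload; the endpoint URL and bearer token are placeholders, and the payload shape shown is the fields/values format that Watson Machine Learning online deployments accept.

```python
import json

def build_scoring_payload(fields, rows):
    """Build a fields/values scoring payload for an online deployment."""
    return {"input_data": [{"fields": fields, "values": rows}]}

payload = build_scoring_payload(["Region", "Total_cases"], [["Brussels", 100]])
print(json.dumps(payload))

# To actually score (placeholders; requires a valid IAM token):
# import requests
# r = requests.post(
#     "<DEPLOYMENT_ENDPOINT_URL>",
#     headers={"Authorization": "Bearer <TOKEN>",
#              "Content-Type": "application/json"},
#     json=payload,
# )
# print(r.json())
```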
You have learned how to build AI predictive models without any code, deploy the model and generate predictions. Feel free to play around to get comfortable using AutoAI for generating accurate predictions.
In this section, you will learn how to build responsive data visualizations in Watson Studio's Cognos Embedded Dashboard. You can build interactive charts, tables, graphs, and more in the Cognos Embedded Dashboard. You will use the data generated in the previous lab exercises to build the following visualizations:
- Current trends and future predictions of COVID-19 cases, region-wise.
- Model evaluation metrics such as actual vs predicted cases and model loss, region-wise.
The section is divided into the following sub-sections:
You need to create a new Dashboard in your Cloud Pak for Data project in order to visualize the analytical results. In this section you will learn how to set up a new Cognos Embedded Dashboard in your project.
- Before you get started, download the Covid-19-predictions-dashboard.zip dashboard file and extract the zip file.
- In the Cloud Pak for Data project, click on Add to Project and select the asset type as Dashboard.
  - Select create a new dashboard from Local file.
  - Upload the extracted Covid-19-Predictions-Dashboard.json file.
  - Enter a name for the dashboard.
  - Click on Create.
- Once the dashboard is created, you will see a message saying Missing data asset (1/8).
- To relink the missing data assets, do the following:
  - Click on Relink.
  - Select Data Assets and select the dataset.
  - Link the following data assets:
    - Brussels.csv
    - brussels-next7Prediction.csv
    - wallonia-next7Prediction.csv
    - brussels-actualVsPredicted.csv
    - brussels-errorEvaluation.csv
    - wallonia-actualVsPredicted.csv
    - wallonia-errorEvaluation.csv
    - Wallonia.csv
Once all the assets are relinked, you will see the dashboard view as shown.
More about the dashboard is explained in the next section.
There are two tabs in the Dashboard: Trends and Model Evaluation.
- The Trends tab has the following widgets for the Brussels and Wallonia regions:
  - Total Cases: Shows the total number of cases for the region.
  - Region Map: Shows the map of the region.
  - Current Trends: Shows the current trends for the region.
  - 7 Days Prediction: Shows the 7-day prediction for the region.
- The Model Evaluation tab has the following widgets for the Brussels and Wallonia regions:
  - Actual vs Predicted: Shows the actual vs predicted values for the model of the particular region.
  - Model Loss: Shows the model loss for the model of the particular region.
The Dashboard is interactive; you can click on any data point on the dashboard to see the details change in real time.
In this code pattern, you learned how to build time-series and decision-tree machine learning models in IBM Cloud Pak for Data Jupyter Notebooks and visualize the results in the IBM Cloud Pak for Data Embedded Dashboard on Amazon Web Services (AWS) Cloud with the Code Approach. You also learned how to build models and deploy them with AutoAI under the No Code Approach.
This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.