IBM/ibm-aws-pandemic-management-system-asset-6

Build machine learning models with code/no code in a collaborative data science environment

In this code pattern, you will build time-series machine learning models and visualize the results using IBM Cloud Pak for Data Jupyter Notebooks, AutoAI, and Embedded Dashboard on Amazon Web Services (AWS) Cloud. Developers will learn both the Code and No Code approaches to building models and visualizing the results.

When you have completed this code pattern, you will understand how to:

  • Build classification and time-series models in IBM Cloud Pak for Data Watson Studio Jupyter notebooks.
  • Visualize data in IBM Cloud Pak for Data Cognos Dashboard Embedded.
  • Build and compare different classification models in IBM Cloud Pak for Data AutoAI Experiments.

architecture

Flow

  1. Pre-processed datasets are loaded into an Amazon S3 bucket.
  2. The datasets from the S3 bucket are read in Jupyter Notebooks.
  3. Different models are built and evaluated in Jupyter Notebooks, and the final prediction data is stored back into the S3 bucket.
  4. The datasets from the S3 bucket are copied into the Watson Studio project and loaded into AutoAI. Different models are built and compared in AutoAI with no code.
  5. The prediction data produced by the Jupyter Notebook models and stored in the S3 bucket is read by Cognos Dashboard Embedded to visualize the data as an interactive dashboard.

Steps

  1. Set up an S3 Bucket

  2. Set up a project in Cloud Pak for Data

  3. Code Approach: Build Prediction Models with Watson Studio

  4. No Code Approach: Build Prediction Models with IBM Cloud Pak for Data using AutoAI

  5. Visualize the Predictions in IBM Cloud Pak for Data Cognos Embedded Dashboard

1. Set up an S3 Bucket

1.1. Create an S3 Bucket

Create an S3 bucket in AWS by referring to the AWS documentation.

  • Creating a bucket
  • Click on Create Bucket and enter a name for the bucket (for example, 'lab4').
  • Keep Block all public access enabled and then click on the Create bucket button.

1.2. Upload data to the S3 bucket

Upload the pre-processed datasets to the bucket you created so that they can be accessed from Cloud Pak for Data.

2. Set up a project in Cloud Pak for Data

In this step, you will create a project and set up a connection to the Amazon S3 bucket in your IBM Cloud Pak for Data instance. This is essential because all the datasets reside in the Amazon S3 bucket.

2.1. Create a Project

  • Create a project in IBM Cloud Pak for Data. Click on the Hamburger menu, and select All Projects.

    cpd-projects

  • Click on New Project.

    • Select project type as Analytics project.
    • Click on Create a project from file.
    • Upload the cpd-project.zip file.
    • Enter a project name and click on Create.
  • Once the project is created, click on View Project. You should see the overview of the project as shown below.

    cpd-dashboard

  • Click on the Assets tab and you will see Data and Notebooks.

2.2. Create a Connection to S3

  • Click on Add to Project and select Connection.

  • Select Connection type as Amazon S3.

    • Enter the credentials to connect to your S3 bucket.
    • Click on Test connection; you will see a connection successful message if you have entered the correct credentials.
    • Click on Create.

    successful-connection

  • Once the connection is created, you will see the connection in Assets tab under Data assets. With this connection you can access all the datasets present in your S3 bucket from your Cloud Pak for Data project.

3. Code Approach: Build Prediction Models with Watson Studio

In the Code Approach, you will learn how to build two types of prediction models in Watson Studio Jupyter Notebooks. As a developer, you will have full control over the model's hyperparameters and the training data in this section.

What are Hyperparameters? A hyperparameter is a parameter whose value controls the learning process of a machine learning algorithm. The same kind of machine learning model can require different constraints, weights, or learning rates to generalize different data patterns. Hence we tune or optimize the hyperparameters so that the model can optimally solve the machine learning problem. The section is divided into the following sub-sections:

3.1. About the Notebooks

  • Click on the Assets tab and you will see the following Notebooks:

    • Region-Brussels-LSTM.ipynb
    • Region-Wallonia-LSTM.ipynb
    • Region-All-Decision-Trees.ipynb
  • The LSTM notebooks are used to build prediction models that predict future COVID-19 cases for the Brussels and Wallonia regions, respectively. The LSTM models are built using the data from the datasets in the S3 bucket, and the two models use different hyperparameters.

  • LSTM Model for Brussels region is built with the following hyperparameters:

    • train_test_split: 0.70
    • lookback: 30
    • hidden_layers: 2
    • units: 55, 100
    • dropouts: 0.15, 0.15
    • optimizer: adam
    • learning_rate: 0.001 (default)
    • epochs: 25
    • batch_size: 32
  • LSTM Model for Wallonia region is built with the following hyperparameters:

    • train_test_split: 0.70
    • lookback: 30
    • hidden_layers: 2
    • units: 60, 100
    • dropouts: 0.15, 0.15
    • optimizer: adam
    • learning_rate: 0.001 (default)
    • epochs: 25
    • batch_size: 32
  • Additionally, the Decision Tree notebook is used to build a model to predict the Risk Index for the Brussels, Flanders, and Wallonia regions.

  • Decision Tree models are built with the following hyperparameters:

    • train_test_split: 0.70
    • max_depth: 4
    • min_samples_split: 2
    • min_samples_leaf: 1
    • criterion: entropy
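To illustrate how the LSTM hyperparameters above fit together, the Brussels configuration roughly maps onto a Keras model as sketched below. This is an assumed reconstruction from the listed values, not the notebook's exact code; the input shape (30, 1) follows from the lookback of 30 over a single case-count feature.

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

lookback = 30  # each sample is a 30-day window of daily case counts

# 2 hidden LSTM layers (55 and 100 units), 0.15 dropout after each,
# adam optimizer with its default learning rate of 0.001.
model = Sequential([
    Input(shape=(lookback, 1)),
    LSTM(55, return_sequences=True),
    Dropout(0.15),
    LSTM(100),
    Dropout(0.15),
    Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

# Training would then use the remaining hyperparameters:
# model.fit(X_train, y_train, epochs=25, batch_size=32)
```

The Wallonia model differs only in the first layer's units (60 instead of 55).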

3.2. Notebook 1 : Predict future COVID-19 cases for Brussels region with Long Short-Term Memory (LSTM) Model

In this lab exercise, you will learn a popular machine learning algorithm, Long Short-Term Memory (LSTM). You will use this time-series algorithm to build a model from historical data of total COVID-19 cases. You will then use the trained model to predict future COVID-19 cases.
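Concretely, the lookback hyperparameter means each training sample is a sliding window of the previous 30 days of totals, with the next day's total as the target. A minimal numpy sketch of that windowing (the notebook's own implementation may differ in detail):

```python
import numpy as np

def make_windows(series, lookback=30):
    """Split a 1-D series into (samples, lookback, 1) windows and next-step targets."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback])
        y.append(series[i + lookback])
    return np.array(X).reshape(-1, lookback, 1), np.array(y)

# Toy example: 40 "days" of data yields 10 windows of 30 days each.
X, y = make_windows(np.arange(40), lookback=30)
print(X.shape, y.shape)  # (10, 30, 1) (10,)
```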

  • You will refer to the Region-Brussels-LSTM.ipynb notebook.

  • Click on the edit button to open the notebook in edit mode.

    brussels-edit

  • The notebook should look similar to the one shown below.

    notebook-preview

  • You need to add the S3 connection to the notebook.

    • Click on the empty second code cell in the notebook.
    • Click on find and add data button on top right.
    • Click on Connections tab.
    • You will see your connection variable. Click on Insert to code and select pandas DataFrame.
    • Select the ts-brussels-grouped.csv dataset from the connection variable.

    add-data-connection

  • Verify that the dataframe name in the generated code snippet is data_df_1.

  • Click on Cell and select Run All to run the notebook.

    run-notebook

  • This will run the whole notebook. It will take some time, so please be patient.

  • Once the notebook is completed you can observe the following in the notebook:

    • Current Trend of COVID-19 cases in Brussels
    • LSTM Model Accuracy
    • LSTM Model Loss
    • LSTM Model Prediction
  • Current Trend of COVID-19 cases in Brussels: The current trend of COVID-19 cases in Brussels is shown in the graph.

    nb1-current-trend

  • LSTM Model Accuracy: You can observe that the Root Mean Squared Error (RMSE) values are similar for the training and test data, which indicates the model fits well without overfitting or underfitting.

    nb1-lstm-accuracy

  • LSTM Model Loss: The loss curve shows no vanishing gradient, as the LSTM architecture with this configuration mitigates the vanishing gradient problem.

    nb1-lstm-loss

  • LSTM Model Prediction: You can observe that the model is able to capture the pattern in the data.

    nb1-lstm-prediction

  • The following CSV files are generated from the notebook:

    • Brussels.csv: This is the dataframe containing the historical COVID-19 cases in Brussels.
    • brussels-actualVsPredicted.csv: This is the dataframe containing the actual and predicted COVID-19 cases in Brussels.
    • brussels-errorEvaluation.csv: This is the dataframe containing the error evaluation of the model.
    • brussels-next7Prediction.csv: This is the dataframe containing the next 7 days prediction of COVID-19 cases in Brussels.
  • These CSV files will be stored in your S3 bucket and in the Data Assets of your Cloud Pak for Data project.
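The error evaluation behind brussels-errorEvaluation.csv boils down to comparing RMSE on the training and test splits. A small sketch with toy numbers standing in for the notebook's real case counts:

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy stand-ins for the notebook's actual vs predicted daily case counts.
train_rmse = rmse([100, 120, 140], [102, 118, 143])
test_rmse = rmse([150, 160, 170], [147, 163, 168])

# Train and test RMSE of similar magnitude is the "no overfitting or
# underfitting" check described above.
print(round(train_rmse, 2), round(test_rmse, 2))
```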

Note: These CSV files will be used to Visualize the Data in Watson Cognos Dashboard Embedded

You have successfully completed this lab exercise. You can continue to the next lab exercise.

3.3. Notebook 2 : Predict future COVID-19 cases for Wallonia region with Long Short-Term Memory (LSTM) Model

In this lab exercise, you will learn a popular machine learning algorithm, Long Short-Term Memory (LSTM). You will use this time-series algorithm to build a model from historical data of total COVID-19 cases. You will then use the trained model to predict future COVID-19 cases.

  • You will refer to the Region-Wallonia-LSTM.ipynb notebook.

  • Click on the edit button to open the notebook in edit mode.

    brussels-edit

  • The notebook should look similar to the one shown below.

    notebook-preview

  • You need to add the S3 connection to the notebook.

    • Click on the empty second code cell in the notebook.
    • Click on find and add data button on top right.
    • Click on Connections tab.
    • You will see your connection variable. Click on Insert to code and select pandas DataFrame.
    • Select the ts-wallonia-grouped.csv dataset from the connection variable.

    add-data-connection

  • Verify that the dataframe name in the generated code snippet is data_df_1.

  • Click on Cell and select Run All to run the notebook.

    run-notebook

  • This will run the whole notebook. It will take some time, so please be patient.

  • Once the notebook is completed you can observe the following in the notebook:

    • Current Trend of COVID-19 cases in Wallonia
    • LSTM Model Accuracy
    • LSTM Model Loss
    • LSTM Model Prediction
  • Current Trend of COVID-19 cases in Wallonia: The current trend of COVID-19 cases in Wallonia is shown in the graph.

    nb2-current-trend

  • LSTM Model Accuracy: You can observe that the Root Mean Squared Error (RMSE) values are similar for the training and test data, which indicates the model fits well without overfitting or underfitting.

    nb2-lstm-accuracy

  • LSTM Model Loss: The loss curve shows no vanishing gradient, as the LSTM architecture with this configuration mitigates the vanishing gradient problem.

    nb2-lstm-loss

  • LSTM Model Prediction: You can observe that the model is able to capture the pattern in the data.

    nb2-lstm-prediction

  • The following CSV files are generated from the notebook:

    • Wallonia.csv: This is the dataframe containing the historical COVID-19 cases in Wallonia.
    • wallonia-actualVsPredicted.csv: This is the dataframe containing the actual and predicted COVID-19 cases in Wallonia.
    • wallonia-errorEvaluation.csv: This is the dataframe containing the error evaluation of the model.
    • wallonia-next7Prediction.csv: This is the dataframe containing the next 7 days prediction of COVID-19 cases in Wallonia.
  • These CSV files will be stored in your S3 bucket and in the Data Assets of your Cloud Pak for Data project.

Note: These CSV files will be used to Visualize the Data in Watson Cognos Dashboard Embedded

You have successfully completed this lab exercise. You can continue to the next lab exercise.

3.4. Notebook 3 : Risk Index Prediction with Decision Tree

In this lab exercise, you will learn a popular machine learning algorithm, the Decision Tree. You will use this classification algorithm to build a model from historical data of regions and their total case counts. You will then use the trained decision tree to predict the Risk Index of a region.
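Using the hyperparameters listed in section 3.1, the core of such a notebook looks roughly like the sketch below. The toy (region code, total cases) data here is illustrative only; the real notebook trains on RI-data-ML.csv.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy features: (region code, total cases) -> risk index label.
X = [[0, 50], [0, 500], [1, 40], [1, 800], [2, 60], [2, 900]] * 10
y = [1, 3, 1, 3, 1, 3] * 10

# train_test_split: 0.70, as listed in section 3.1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, random_state=42)

clf = DecisionTreeClassifier(
    criterion="entropy", max_depth=4, min_samples_split=2, min_samples_leaf=1)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```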

  • You will refer to the Region-All-Decision-Trees.ipynb notebook.

  • Click on the edit button to open the notebook in edit mode.

    brussels-edit

  • The notebook should look similar to the one shown below.

    notebook-preview

  • Before running the notebook, you need to add the S3 connection to the notebook.

    • Click on the third code cell in the notebook.
    • Click on find and add data button on top right.
    • Click on Connections tab.
    • You will see your connection variable. Click on Insert to code and select pandas DataFrame.
    • Select the RI-data-ML.csv dataset from the connection variable.

    add-data-connection

  • Verify that the dataframe name in the generated code snippet is data_df_1.

  • Click on Cell and select Run All to run the notebook.

    run-notebook

  • This will run the whole notebook. It will take some time, so please be patient.

  • Once the notebook is completed you can observe the following in the notebook:

    • Decision Tree Model Accuracy
    • Decision Tree Visualization
  • Decision Tree Model Accuracy: You can observe that the accuracy of the model is 86.63%.

    nb2-current-trend

  • Decision Tree Visualization: You can observe the decision tree in the notebook.

    nb2-lstm-accuracy

You have successfully completed this lab exercise.

Note: Next steps - Visualize the Data in Cognos Embedded Dashboard

4. No Code Approach: Build Prediction Models with IBM Cloud Pak for Data using AutoAI

In this section, you will build and train high-quality predictive models quickly with no code and simplify AI lifecycle management using Watson Studio's AutoAI. AutoAI automates tasks for data scientists such as feature engineering and selection, choosing the type of machine learning algorithm, building an analytical model based on the algorithm, hyperparameter optimization, training the model on test data sets, and running the model to generate scores and findings.

  • Navigate to the project, click on Add to project option on top right and select AutoAI experiment as asset type.

  • Create an AutoAI experiment by giving it a name. alt-text

  • Add the data file, click on Select from project and select the RI-data-ML.csv file from the project's data assets. alt-text

  • Click on No for creating a time series forecast, since we are building a multi-class classifier. Select Risk_Index as the prediction column and hit Run experiment. alt-text

  • It will take a couple of minutes to complete the experiment. You will see Experiment completed on the right side of the canvas. alt-text

  • Review the eight pipelines generated, as shown below. alt-text

  • Click on the first pipeline (Rank 1) and choose the Save as option on the top right side. alt-text

  • Click on create. alt-text

  • You should see the message Model saved successfully, as shown below. Click on the View in project option. alt-text

  • Click on Promote to deployment space. alt-text

  • Under Target space, create a new deployment space. alt-text

  • Give a name to the deployment and hit create. alt-text

  • The deployment space gets created in a minute. alt-text

  • Select the option Go to the model in the space after promoting it and click on Promote. alt-text

  • You will be redirected to the deployment space. alt-text

  • Click on New deployment. Select Deployment type as Online, give a name to the deployment and hit Create. alt-text

  • It will take a couple of minutes for the deployment. The status should be Deployed, as shown below. alt-text

  • Click on model-deploy and you should see the Endpoint and Code Snippets, as shown below. alt-text

  • Let's make some predictions. Click on the Test tab and input the data using the form or JSON format. alt-text

  • Enter the input data using single or multiple samples (JSON). We will try a single sample by giving Brussels as the Region and 100 as Total_cases, then click on Add to list. alt-text

  • You should see the Input list updated with the sample values. Hit Predict to generate predictions. alt-text

  • You can see the predicted value is 1 under the Result section, which means the Risk Index is predicted as Low for the input of the Brussels region with 100 cases on a given day.
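The same test can also be driven programmatically against the deployment's Endpoint. This is a hedged sketch assuming the standard Watson Machine Learning online-scoring payload shape; the URL and token are placeholders you would copy from the deployment's Endpoint and Code Snippets tabs.

```python
import json
import urllib.request

# Placeholders -- substitute the real values from your deployment.
SCORING_URL = "https://<your-cpd-host>/ml/v4/deployments/<deployment-id>/predictions"
TOKEN = "<access-token>"

# One sample matching the form input: Region "Brussels", 100 total cases.
payload = {
    "input_data": [{
        "fields": ["Region", "Total_cases"],
        "values": [["Brussels", 100]],
    }]
}

def score(url, token, body):
    """POST the scoring payload and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# result = score(SCORING_URL, TOKEN, payload)
```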

You have learned how to build AI predictive models without any code, deploy the model, and generate predictions. Feel free to play around to get comfortable using AutoAI for generating accurate predictions.

5. Visualize the Predictions in IBM Cloud Pak for Data Cognos Embedded Dashboard

In this section, you will learn how to build responsive data visualizations in Watson Studio's Cognos Embedded Dashboard. You can build interactive charts, tables, graphs, etc. in the Cognos Embedded Dashboard. You will use the data generated in the previous lab exercises to build the following visualizations:

  • Current trends and future predictions of COVID-19 cases, region-wise.
  • Model evaluation metrics such as actual vs predicted cases and model loss, region-wise.

The section is divided into the following sub-sections:

5.1. Set up the Cognos Embedded Dashboard

You need to create a new dashboard in your Cloud Pak for Data project in order to visualize the analytical results. In this section, you will learn how to set up a new Cognos Embedded Dashboard in your project.

Create a new Cognos Embedded Dashboard

  • Before you get started, download the Covid-19-predictions-dashboard.zip dashboard file and extract the zip file.

  • In the Cloud Pak for Data project, click on Add to Project and select asset type as Dashboard.

add-dashboard

  • Select Create a new dashboard from Local file.
    • Upload the extracted Covid-19-Predictions-Dashboard.json file.
    • Enter a name for the dashboard.
    • Click on Create.

Relink Data Assets to the Dashboard

  • Once the dashboard is created, you will see a message saying Missing data asset (1/8).

relink

  • To relink the missing data assets, do the following:
    • Click on Relink.
    • Select Data Assets and select the dataset.
    • Link the following data assets:
      • Brussels.csv
      • brussels-next7Prediction.csv
      • wallonia-next7Prediction.csv
      • brussels-actualVsPredicted.csv
      • brussels-errorEvaluation.csv
      • wallonia-actualVsPredicted.csv
      • wallonia-errorEvaluation.csv
      • Wallonia.csv

re-link-data-asset2

Once all the assets are relinked, you will see the dashboard view as shown.

cognos-dashboard

More about the dashboard is explained in the next section.

5.2. Analyze Cognos Embedded Dashboard

There are two tabs in the dashboard: Trends and Model Evaluation.

  • Trends Tab has the following widgets for the Brussels and Wallonia regions:

    • Total Cases: Shows the total number of cases for the Region.
    • Region Map: Shows the map of the Region.
    • Current Trends: Shows the current trends for the Region.
    • 7 Days Prediction: Shows the 7 days prediction for the Region.

cognos-dashboard

  • Model Evaluation Tab has the following widgets for the Brussels and Wallonia regions:

    • Actual vs Predicted: Shows the actual vs predicted values for the model of the particular region.
    • Model Loss: Shows the model loss for the model of the particular region.

cognos-dashboard

The dashboard is interactive; you can click on any data point to see the details change in real time.

Summary

In this code pattern, you learned how to build time-series and decision tree machine learning models in IBM Cloud Pak for Data Jupyter Notebooks and visualize the results in the IBM Cloud Pak for Data Embedded Dashboard on Amazon Web Services (AWS) Cloud using the Code Approach. You also learned how to build models and deploy them with AutoAI under the No Code Approach.

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ
