# DSX Local v1.1.3, Short Hands On Lab - Version 1.2
## IBM Data Science Elite Team 

In this hands-on lab exercise, you will play the role of a data scientist at Cognitive Bank. You will use IBM Data Science Experience (DSX) Local to work through a data science and machine learning use case. In this use case you will explore the full end-to-end machine learning process and the range of advanced functionality provided by IBM DSX Local, such as:
* Open platform environment with support for multiple languages and GUI tools
* Collaboration across projects and teams, integration with Git repositories  
* Deployment and model management capabilities to operationalize machine learning

In this short scenario, you will work with data science assets that have already been made available within DSX. You will explore historical data related to customer churn using a Jupyter notebook and the Python language (*customer "churn" is when a customer stops doing business with your company*). After visualizing and exploring historical data, you will build two machine learning models that predict the probability that a given customer will churn. One model is built using a programmatic approach; the other is built using a wizard called the Model Builder. In the programmatic approach you will use the Scala programming language to train, save, and test a model from within a Jupyter notebook. Using the DSX Local Model Builder, you will train a similar machine learning model without writing any code. As part of the development process, you will test and evaluate the performance of the model. Once satisfied, you will publish the model and use DSX model management features to deploy the model and set up ongoing evaluation of model performance. These lab exercises will take approximately 45 minutes to complete.

###  This document is also available at the following links:
1. [Shared DSX Cloud notebook](https://ibm.co/2EMneTu)
2. [Shared DSX Cloud notebook, alternate link](https://dataplatform.ibm.com/analytics/notebooks/v2/3c695152-191d-42cd-b533-d6355490287c/view?access_token=3b5c448fe9dcbae971dd6d65ae784facb957920efd240d87042dc3d6bc32b640)
3. [GitHub](https://github.com/mwalli/DSXLAB)

# 1. Preparation steps
## 1.1. __Lab Environment Overview__
You will be assigned to a specific DSX Local environment and provided a user ID for that environment. Use only the login ID provided to your team and, when instructed, pay close attention to how you name projects, models, and other resources so that you can easily identify your assets.

>__Required Web browser__:  Use either the __Chrome or Firefox__ browser on your personal workstation to complete these lab exercises.

> __Certificate warnings__:  When you first connect to the DSX Local application, you will receive an "insecure connection" (Firefox) or "connection not private" (Chrome) warning because of the untrusted certificate that is used during the DSX Local installation. You can safely ignore these warnings and proceed to the site: Firefox users click "Advanced" and then "Add Exception"; Chrome users click "Advanced" and then "Proceed to URL".

## 1.2. __Make note of the URL and credentials provided to your team for accessing the DSX Local system__
The lab instructors will provide you with the following information:

1. **DSX Local URL:**

2. **DSX Local username/password:**

> **NOTE:**  **Use only the URLs and login information provided to you by the lab instructors.  These lab exercises will not function properly if multiple people use the same URL and user ID.**  Please follow lab procedures exactly as documented.  If you need assistance with the procedures or if you encounter problems or errors with the system, please inform a lab instructor and they will help address the issue.

## 1.3. Login and export the CustomerChurnLabMaster project
In this step you will export an existing project in DSX Local that is shared with your user ID.  You will then create a new project from the file that you exported. 
1. Login to DSX Local as your assigned team user using the URL, username, and password provided to you by the lab instructor.  
2. In the "Recently updated projects" list, click on the shared project called "CustomerChurnLabMaster".  *Note:  Do not perform your lab exercises in this project; you only have "viewer" privileges for it.  You will export this project and then create a new project from the downloaded export file.*
3. From the Export menu (the 3rd icon from top right of screen), select either "Export as Zip" or "Export as TAR.gz" and save the export file to your local computer.   

## 1.4 Create a new DSX Local project from the export file
All work in DSX is performed within one or more projects.  In this step you will create a new project using the exported project file created in the previous step.  The imported project will already contain 2 Jupyter notebooks and 5 CSV datasets.
1. Return to your "All Projects" list either by clicking "View all Projects" from the 3 horizontal bar menu at the top-left corner of the screen, or by clicking "Projects" in the navigation link area of the screen.
2. Click "New Project", select "From File" and do the following:
    * In the "Project File" area click "Browse".  
    * In the "File upload" window locate the previously exported file on your computer and click Open
    * In the "Name" field enter "__[USERNAME]\_CustomerChurn__" (e.g. ATeam_CustomerChurn, BTeam_CustomerChurn, etc.)  *Note:  Project names in DSX Local must be unique across the entire cluster, so be sure to give a unique name to your project or you will encounter errors.*
    * Verify you have entered a unique project name and click the "Create" button.  
    * The message "Creating project" will be displayed and a new project will be created from the export file.
<BR>
>Note: As you progress through the lab exercises, you will see an asterisk \* next to your project's name and the message *Changes made -- You have local changes that you can commit* will be displayed. DSX Local internally uses git to manage project changes and coordination between collaborators.  You can commit changes if you would like to, but it is not necessary for the exercises in this lab.  If other users were added as collaborators to your project, they would not see any additions or changes you made until you committed your changes to the project.


# 2. Run the Python customer churn visualization notebook
In this step you will use Jupyter and Python to explore and visualize historical customer churn data.  The visualizations will reveal that customer churn is influenced by factors such as income, age, and state of residence.  When using Jupyter notebooks, DSX Local allows programming in the Python, Scala, or R language.  In this lab section, you will execute a Python visualization notebook that was imported into your project.  As you follow the procedures below, be sure to review the output of each cell as it is executed.

### 2.1 Execute the cells in the visualization notebook
1. From within your new project, click the notebook "Churn Visualization Python".  
2. A "Launching Jupyter" message will be displayed.  __This message may last for a couple of minutes, please be patient__
*NOTE:  If after waiting only a blank screen is displayed or if you receive an "NGINX Timeout" error,  open the notebook again by clicking on your project's name and then clicking on the notebook again*
3. Once the notebook opens, you will see a series of cells.  The notebook has not yet been executed.  Perform the following steps in order:
    1. Click on and run the first cell.  To do this, click inside the first cell at the top of the notebook and then click the "Run" icon in the toolbar (__>|__).  Within the brackets to the left of the cell, you should see an \* appear and then change to the number "1".  This indicates the cell was executed successfully.
    2. Now click in the 2nd cell where you will see a comment about the "churn_rate_visualizations.csv" dataset.  The code below this comment was automatically inserted using the "Insert to code" function in DSX.  The automatically generated code creates a pandas dataframe and displays the first 5 lines.  After reviewing the code, execute the 2nd cell by clicking the "Run" icon in the toolbar (__>|__).  You should see the tabular output of the dataframe.head() statement displayed in the cell output.  
    3. Click in the 3rd cell, where the Brunel visualization library is imported.  At this point you will execute this and all remaining cells in the notebook.  To do this, while keeping the 3rd cell selected go to the "Cell" dropdown on the menu bar and select "Run All Below" (this runs the selected cell and all cells below it).  All remaining cells will display "\*" and begin executing sequentially.
    4.  Scroll down through the notebook to ensure that all remaining visualizations of the customer churn data ran and are displayed.
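
The load-and-preview step performed by the generated code in the 2nd cell can be approximated outside the notebook. The notebook itself uses pandas; the sketch below swaps in Python's standard `csv` module so it runs without extra packages. The sample rows and column names are hypothetical stand-ins, not the actual schema of "churn_rate_visualizations.csv", and `preview_rows` is an illustrative helper.

```python
import csv
import io
from itertools import islice

# Hypothetical sample standing in for churn_rate_visualizations.csv;
# the column names are illustrative, not the lab file's real schema.
SAMPLE_CSV = """ID,AGE,STATE,INCOME,CHURN
1,34,NY,82000,0
2,51,CA,45000,1
3,28,TX,61000,0
4,45,NY,150000,0
5,39,FL,38000,1
6,62,CA,72000,1
"""


def preview_rows(csv_text, n=5):
    """Return the first n data rows as dictionaries keyed by column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(islice(reader, n))


# Roughly what a dataframe head() call shows: the first 5 rows only.
first_rows = preview_rows(SAMPLE_CSV)
for row in first_rows:
    print(row)
```

All values come back as strings here, whereas pandas would infer numeric column types.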

### 2.2 Review the cell output and explore Brunel visualizations
1. When the notebook has executed successfully, review the interactive Brunel dashboards.  Visualizations can be zoomed in/out using the middle mouse button while your mouse is hovering over the visualization.  Charts can be filtered by changing dropdown menu options and clicking on specific columns of interest.
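
The dashboards summarize how churn varies with attributes such as state of residence. As a rough illustration of the kind of aggregation behind such a chart, the sketch below computes a churn rate per state. The records are made up for the example, and `churn_rate_by_group` is a hypothetical helper, not part of Brunel or DSX.

```python
from collections import defaultdict

# Hypothetical (state, churned) records; not taken from the lab datasets.
RECORDS = [
    ("NY", 0), ("NY", 1), ("NY", 0), ("NY", 0),
    ("CA", 1), ("CA", 1), ("CA", 0),
    ("TX", 0), ("TX", 0),
]


def churn_rate_by_group(records):
    """Return {group: fraction of customers in that group who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for group, label in records:
        totals[group] += 1
        churned[group] += label  # label is 1 for churn, 0 otherwise
    return {group: churned[group] / totals[group] for group in totals}


rates = churn_rate_by_group(RECORDS)
for state, rate in sorted(rates.items()):
    print(f"{state}: {rate:.2f}")
```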

### 2.3 Save the notebook, stop the kernel
1. Save the notebook by selecting `File -> Save and Checkpoint`.  The message "Checkpoint created" will appear in the toolbar (the message will disappear quickly).
2. Close the notebook by clicking on your project's name.
3. To conserve resources, before continuing, __Stop the Kernel__ that was started for your notebook by selecting "Stop Kernel" from the 3-dot menu to the right of the notebook name.

# 3. Run the Scala customer churn notebook 
In this section you will programmatically create a machine learning model using the Scala language.  The model will predict the probability that a given customer will churn from the business.  You will open the Jupyter Scala notebook that was created when you imported the project  and incrementally execute the cells in the notebook.  This procedure demonstrates using a programmatic approach to train, evaluate, save, and test a machine learning model from within a Jupyter notebook.   As you follow the procedures below, be sure to review the output of each cell as it is executed.  

### 3.1 - Execute the notebook, 1 cell at a time
1. From within your project, click the "Churn ML Training Notebook Scala LR" notebook to open it. A "Launching Jupyter" message will be displayed.  __This message may last for a couple of minutes, please be patient__
2. Once the notebook opens, you will see a series of cells.  The notebook has not yet been executed.  Perform the following steps in order:
    1. Incrementally run all cells in the notebook.  Do this by first clicking inside the first code cell (where the %AddJar magic downloads the Brunel library) then click the "Run" icon in the toolbar (__>|__).  Within the brackets to the left of the cell, you should see an \* appear and then change to the number "1".  This indicates the cell was run successfully.  Repeat for all cells in the notebook, reviewing the code and output after each cell executes.  Pay special attention to the following cells near the end of the notebook:
    * "Save locally: Save trained model to DSX Local Project"  - This cell saves the new trained model into your DSX project.  Each time a model is saved with the same name, a new version of the model is stored in your project.  The cell outputs the response from the system after the model is saved.
    * "Test Locally: Test model in DSX Local Project" - This cell tests the model by invoking the unpublished model directly in your project using the DSX Local API.  This cell outputs both the scoringURL used to invoke the model and the response from invoking the model with test values.

### 3.2 - Execute the notebook again to create a new version, using "Run All"    
1. Once you have successfully run all cells in the notebook, perform the following steps to execute the notebook again so version 2 of the model is created:
    1. First clear all of the output from the first execution of the notebook by selecting "Cell" -> "All Output" -> "Clear" from the menu bar.
    2. Re-run the entire notebook by selecting "Cell" -> "Run All" from the menu bar.  Monitor progress of notebook execution and verify that it completes.  Be sure to notice that the output from the save operation shows that version 2 of the model has now been created (each version is stored in a separate directory in the filesystem, e.g. BTeam_CustomerChurn/models/BankingChurnMLNotebookModelLR/**2**)
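
The versioning behavior described above, where each save under the same name lands in the next numbered directory, can be mimicked with a small sketch. This is not how DSX Local implements it; `save_model_version` and the `model.json` file name are hypothetical, and only the `<project>/models/<name>/<version>` layout is taken from the example path above.

```python
import json
import tempfile
from pathlib import Path


def save_model_version(project_dir, model_name, model_params):
    """Write params to <project>/models/<name>/<N>, where N is the next version."""
    model_dir = Path(project_dir) / "models" / model_name
    model_dir.mkdir(parents=True, exist_ok=True)
    # Existing versions are the numerically named subdirectories.
    existing = [int(p.name) for p in model_dir.iterdir()
                if p.is_dir() and p.name.isdigit()]
    version = max(existing, default=0) + 1
    version_dir = model_dir / str(version)
    version_dir.mkdir()
    # "model.json" is an illustrative file name, not a DSX convention.
    (version_dir / "model.json").write_text(json.dumps(model_params))
    return version_dir


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "BTeam_CustomerChurn"
    name = "BankingChurnMLNotebookModelLR"
    first = save_model_version(project, name, {"run": 1})
    second = save_model_version(project, name, {"run": 2})
    saved_versions = [first.name, second.name]
    print(saved_versions)
```

Saving twice with the same name leaves both versions on disk, which matches the version history you will see for the model in the project.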

### 3.3 - Save the notebook and stop the kernel    
1. Save the notebook by selecting `File -> Save and Checkpoint`.  The message "Checkpoint created" will appear in the toolbar.
2. Close the notebook by clicking on your project's name.
3. To conserve resources, before continuing, __Stop the Kernel__ that was started for your notebook by selecting "Stop Kernel" from the 3-dot menu to the right of the notebook name.
4. This notebook programmatically trained, saved, and tested a machine learning model.  Click on your project name, then click on Models in the navigation bar.  You will see the model that was created programmatically and that it has 2 versions.  Click on the model to view its details.  Within the model overview tab, click on the version number icon to view the model version history (you can switch which version is open in the project by clicking on a specific version). 

# 4. Use model development features to test and evaluate your new model prior to publishing.  Use DSX Model Management to deploy your published model into production.

In this lab section, you will use the in-project model development features to interactively test and evaluate your new model.  After testing and evaluating the model, you will publish it so it can be deployed and managed in production.  After publishing, you will use the DSX Model Management interface to deploy your model into production.

### 4.1 Test online invocation of the model.
Interactively invoke and test your model and manually review the output.
1. Click on the "Test" tab of your new model and scroll down so the Input and Result sections are visible.
2. Enter the following values in the Input form then click "Submit".  **Note:  Text fields are CASE SENSITIVE.** A pie chart will be displayed with the model's prediction for this customer's probability to churn (1=churn, 0=not churn).
    1. Age: 30
    2. Activity:  1
    3. Education: **D**octorate  (*Case sensitive*)
    4. Gender: F (*Case sensitive*)
    5. State: NY (*Case sensitive*)
    6. Negtweets: 2
    7. Income: 150000

3. Increase the number of Negtweets to 8 and click "Submit" to invoke the model again.  The probability that the customer will churn should increase.
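
Why the probability rises can be seen from how a logistic regression model turns a weighted sum into a probability. The sketch below holds every other input fixed and varies only the number of negative tweets. The intercept and coefficient are hypothetical values chosen for illustration (the coefficient is assumed positive, in line with the behavior described above); the model you trained will have its own values.

```python
import math

# Hypothetical parameters; the lab's trained model will differ.
INTERCEPT = -3.0       # combined effect of all other (fixed) inputs
NEGTWEETS_COEF = 0.4   # assumed positive weight on negative tweets


def churn_probability(negtweets):
    """Logistic function applied to a one-feature linear score."""
    score = INTERCEPT + NEGTWEETS_COEF * negtweets
    return 1.0 / (1.0 + math.exp(-score))


p_low = churn_probability(2)
p_high = churn_probability(8)
print(f"Negtweets=2: {p_low:.3f}")
print(f"Negtweets=8: {p_high:.3f}")
```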

### 4.2 Generate and run a script to evaluate model performance using an input dataset.
Generate a model evaluation script and run it as a job.  The script calculates and records the performance of the model relative to configurable metrics using a specified input file.  
1. Click on the "Overview" tab of your model.  Scroll down and notice that there are no entries listed in the "Evaluation results" section.
2. Now scroll back up and click on the "Evaluate" tab of your model.  Scroll down until the "Schedule evaluation script inputs" section is visible.
3. Select the following as evaluation script inputs:
    1. Input data set:  cust_summary_notebook_evaluation.csv
    2. Evaluator:  Binary
    3. Threshold Metric: Accuracy Score
    4. Threshold: Min=.75 Mid=.90
4. Click "Generate evaluation script" - The "Result" section shows the Python script that was generated to evaluate your model.  Review the Python script.  *Note:  The generated evaluation script will often need to be modified to function correctly for a particular model. In this exercise, the script does not need to be modified.*
5. Click "Advanced Settings".  Note the settings that can be modified for the job, such as whether the script will execute as a Python script or as a Jupyter notebook and whether the job should run on-demand or on a scheduled basis.  Click "Cancel" to exit advanced settings.
6. Click "Run now" - the details for the job will be displayed, scroll to the bottom until the "Runs" section is visible
7. Click on the Run ID to see details for the Job Run.  Scroll down to the "Logs tail" section to see the output of the job run.
8. Wait for the Job run to show a Result status of "Success"
9. Navigate back to the overview tab of your model within your project (click Jobs, click project name, click Assets, click Models)
10. Scroll to the "Evaluation results" section and review the results of your model evaluation.
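
To make the "Accuracy Score" metric and the Min/Mid thresholds concrete, here is a minimal sketch that scores a handful of predictions and places the result in a band. The labels are made up, and the band names and the way DSX interprets the Min and Mid values are assumptions for illustration; the generated evaluation script is the authority on the actual logic.

```python
def accuracy_score(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


def threshold_band(score, minimum=0.75, mid=0.90):
    """Place a score relative to the Min and Mid thresholds (assumed semantics)."""
    if score < minimum:
        return "below min"
    if score < mid:
        return "between min and mid"
    return "at or above mid"


# Hypothetical labels: 1 = churn, 0 = not churn.
ACTUAL = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
PREDICTED = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(ACTUAL, PREDICTED)
band = threshold_band(accuracy)
print(accuracy, band)
```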


### 4.3 Publish version 2 of your model.
Publish your model so that it can be deployed and so other collaborators in your project can see it (you won't have any collaborators in this exercise).
1. Click on the "Publish" tab of your model
2. Notice that it is version 2 (/2) that you are publishing.
3. For "Published name" enter a unique name that includes your login name and that you will easily recognize, e.g. "__P\_[USERNAME]_BankingChurnMLNotebookModelLR__"
4. Scroll down and note the "Published model visibility" selection.  Only collaborators in the selected project will be able to see your published model in the DSX Model Management tooling.  In this example you only own 1 project so accept the default selection.
5. Click Publish - "Publishing model" will be displayed and you will be returned to your model's Overview tab.

### 4.4 Deploy your published model using Model Management
Now that your model has been published, it can be deployed.  Deployment creates an external REST endpoint that can be used to invoke the model.
1. From the hamburger menu (3 horizontal lines at top left) click "Model Management"
2. On the Model Management Dashboard you will see 3 example evaluations that were created as samples for this exercise. *NOTE: The dashboard only displays model evaluation results for evaluations performed on deployed models.  In-project evaluations on unpublished models (like the one you performed in the prior step) are not displayed in Model Management.*
3. Click on the "Models" tab.
4. You will see your published model with your username as publisher as well as two sample models published by the admin user.
5. From the 3-dot menu to the right of your published model, select "Deploy"
6. For "Name", enter a unique name that includes your login name and that you will easily recognize, e.g. "__D\_[USERNAME]_BankingChurnMLNotebookModelLR__" 
7. For "Type" select "Online"
8. Click "Create".  
9. Your new model is now deployed.  Review the "External Scoring Endpoint" and other provided information.  The "Deployment Token" is used as an authentication mechanism when programmatically invoking the model.  
10. Optional "extra credit" steps (time allowing):  
    1. Use the deployed model's "Test API" feature to test the external model endpoint
    2. Schedule an evaluation of your deployed model using the "Schedule evaluation" option shown at the bottom of the deployed model screen.  When performing this task follow the same procedure and use the same options that you used when performing in-project model evaluation.
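
A client program would typically call the external scoring endpoint with an HTTP POST that carries the deployment token. The sketch below only builds such a request and does not send it. The endpoint URL, the JSON payload shape, the field names, and the bearer-token header are all assumptions for illustration; check the deployment details page and its "Test API" feature for the format your deployment actually expects.

```python
import json
import urllib.request

# Placeholders: copy the real values from the deployment details page.
SCORING_ENDPOINT = "https://dsx-local.example.com/placeholder-scoring-path"
DEPLOYMENT_TOKEN = "REPLACE_WITH_DEPLOYMENT_TOKEN"


def build_scoring_request(endpoint, token, record):
    """Build (but do not send) a POST request carrying one record as JSON.

    The {"input": [...]} payload and the Bearer header are assumed shapes,
    not the documented DSX Local API.
    """
    body = json.dumps({"input": [record]}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Same values as the in-project test; field names mirror the form labels.
record = {"Age": 30, "Activity": 1, "Education": "Doctorate",
          "Gender": "F", "State": "NY", "Negtweets": 2, "Income": 150000}
request = build_scoring_request(SCORING_ENDPOINT, DEPLOYMENT_TOKEN, record)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`. Because the lab environment uses an untrusted certificate, default certificate verification would likely reject the connection unless the certificate is trusted on the client.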


# 5. Train and save an ML model using the DSX Local Model Builder

In this section, you will use the DSX Local Model Builder GUI to train and save an ML model using a CSV file.  This process is similar to the programmatic approach you just completed using Scala, but the DSX Local Model Builder allows creation of machine learning models without writing code.

1. From the hamburger menu, click on your project name, click "Assets", then either scroll down to "Models" and click "View all" or click the "Models" link along the top.
2. Click "Add model" next to the __+__ sign
3. In the "Name" field, enter a unique name that includes your login name and that you will easily recognize later, e.g. "__[USERNAME]\_ModelBuilderModel__"
4. Scroll down and for "Method" select "__Manual__"
5. Click "Create"
6. On the "Select data asset" screen, click the link for the "__cust_summary_visbuilder_training.csv__" data set.  A preview of the data set will be displayed.  Click "__Use this data__".  *(alternatively, you can simply click the radio button next to the data set and click next to skip the data preview)*
> NOTE:  The Model Builder may display the message "Loading data" for up to several minutes, please be patient.  
7. On the “Prepare data set” screen, note the default selected transformer “Auto Data Preparation” on the right. This is the transformer that we will be using. Click “Add a transformer” on the top-right of the screen to see the other available transformers. After reviewing the list of transformers, dismiss the dialog and then click Next.
8. On the "Select a technique" screen, do the following:
    1. In the "Column value to predict" dropdown, select the "CHURN" column. (The goal of the model is to predict whether or not a customer will churn)
    2. Click "Binary Classification" as the technique.  (It should already be selected as the suggested technique.)
    3. Accept the default split shown in the sliders for Train/Test/Holdout 
    4. Click “Add Estimators” on the top-right of the screen.
    5. Select "Logistic Regression" and "Gradient Boosted Tree Classifier" and click "Add" 
    6. Click Next
9. A "Training models" status message will appear - this can take several minutes, be patient.
10. When model training has completed, review the metrics for the LR and GBTC estimators (such as AREA UNDER ROC CURVE).  In this scenario we want to use Logistic Regression so choose the "Delete" option from the 3-dot menu to the right of the Gradient Boosted Tree Classifier to remove it.  Click "Save" to save the new model, and click "Save" again in the confirmation dialog.  
    > A "Saving model" status message will appear, then you will be returned to the assets list in the project.
11. DSX Local supports hybrid ML scenarios:  In this step you will review the option to publish a model to the IBM Cloud WML service. *(We will not actually publish to WML)*
    * Do this by clicking the 3-dot menu to the right of either of your ML models and selecting "Publish to IBM Cloud".  Click "Cancel" after reviewing the dialog options.
    > If you were publishing to the WML service, you would paste the username/password credentials (long alphanumeric GUIDs) into this dialog.  The model would be published (saved) to your WML service within the IBM cloud.  
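
The AREA UNDER ROC CURVE metric shown for each estimator can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it by direct pairwise comparison, which is fine for small samples. The labels and scores are invented for the example and do not come from the Model Builder; `roc_auc` is an illustrative helper.

```python
def roc_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical churn labels and scores from two estimators.
LABELS = [1, 0, 1, 0, 1, 0]
SCORES_A = [0.9, 0.2, 0.7, 0.4, 0.6, 0.3]  # ranks every churner above every non-churner
SCORES_B = [0.9, 0.8, 0.7, 0.4, 0.3, 0.5]  # makes several ranking mistakes

auc_a = roc_auc(LABELS, SCORES_A)
auc_b = roc_auc(LABELS, SCORES_B)
print(f"Estimator A: {auc_a:.3f}")
print(f"Estimator B: {auc_b:.3f}")
```

A value of 1.0 means perfect ranking and 0.5 is what random scores would achieve, so a higher value indicates the stronger estimator on this metric.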

****
### *This DSX Local Hands On Lab and associated DSX Local lab environments were created by the IBM Data Science Elite Team*
