# DSX Local v1.1.3, Short Hands On Lab - Version 1.0
## IBM Data Science Elite Team 

In this hands-on lab exercise, you will play the role of a data scientist at Cognitive Bank. You will use IBM Data Science Experience (DSX) Local to work through a data science and machine learning use case. In this use case you will explore the full end-to-end machine learning process and the range of advanced functionality provided by IBM DSX Local, such as:
* Open platform environment with support for multiple languages and GUI tools
* Collaboration across projects and teams, integration with Git repositories  
* Deployment and model management capabilities to operationalize machine learning

In this short scenario, you will work with data science assets that have already been made available within DSX. You will explore historical data related to customer churn using a Jupyter notebook and the Python language (*customer "churn" is when a customer stops doing business with your company*). After visualizing and exploring historical data, you will build two machine learning models that predict a given customer's probability to churn. One model is built using a programmatic approach; the other is built using a wizard called the Model Builder. In the programmatic approach you will use the Scala programming language to train, save, and test a model from within a Jupyter notebook. Using the DSX Local Model Builder, you will train a similar machine learning model without writing any code. As part of the development process, you will test and evaluate the performance of the model. Once satisfied, you will publish the model and use DSX model management features to deploy the model and set up ongoing evaluation of model performance. These lab exercises will take approximately 45 minutes to complete.

### This document is also available at the following links:
1. [Shared DSX Cloud notebook](https://ibm.co/2mwoMK6)
2. [Shared DSX Cloud notebook, alternate link](https://dataplatform.ibm.com/analytics/notebooks/v2/61cd7b67-0776-401c-b859-a7f8fb049b05/view?access_token=dc3ec7063543479f110c6801421c35a82cabe8a1cb63dcd1840705d33d168874)
3. [Github](https://github.com/mwalli/DSXLAB)

# 1. Preparation steps
## 1.1. Lab Environment Overview
You will be assigned to a specific DSX Local environment and provided a user ID for that environment. Use only the login ID provided to your team, and when instructed, pay close attention to how you name projects, models, and other resources so that you will be able to easily identify your assets.

>__Required Web browser__: Use either the __Chrome or Firefox__ browser on your personal workstation to complete these lab exercises.

> __Certificate warnings__: When you first connect to the DSX Local application, Firefox will display an "insecure connection" warning and Chrome will display a "connection not private" warning, because the DSX Local installation uses an untrusted certificate. You can safely ignore these warnings and proceed to the site: Firefox users click "Advanced" and then "Add Exception"; Chrome users click "Advanced" and then "Proceed to URL".

## 1.2. Make note of the URL and credentials provided to your team for accessing the DSX Local system
The lab instructors will provide you with the following information:

1. **DSX Local URL:**

2. **DSX Local username/password:**

> **NOTE:** **Use only the URLs and login information provided to you by the lab instructors. These lab exercises will not function properly if multiple people use the same URL and user ID.** Please follow lab procedures exactly as documented. If you need assistance with the procedures, or if you encounter problems or errors with the system, please inform a lab instructor and they will help address the issue.

## 1.3. Login and export the CustomerChurnLabMaster project
In this step you will export an existing project in DSX Local that is shared with your user ID.  You will then create a new project from the file that you exported. 
1. Log in to DSX Local as your assigned team user using the URL, username, and password provided to you by the lab instructor.
2. In the "Recently updated projects" list, click on the shared project called "CustomerChurnLabMaster". *Note: Do not perform your lab exercises in this project; you only have "viewer" privileges for it. You will export this project and then create a new project from the downloaded export file.*
3. From the Export menu (the 3rd icon from top right of screen), select either "Export as Zip" or "Export as TAR.gz" and save the export file to your local computer.   

## 1.4 Create a new DSX Local project from the export file
All work in DSX is performed within one or more projects.  In this step you will create a new project using the exported project file created in the previous step.  The imported project will already contain 2 Jupyter notebooks and 5 CSV datasets.
1. Return to your "All Projects" list either by clicking "View all Projects" from the 3 horizontal bar menu at the top-left corner of the screen, or by clicking "Projects" in the navigation link area of the screen.
2. Click "New Project", select "From File" and do the following:
    * In the "Project File" area click "Browse".  
    * In the "File upload" window locate the previously exported file on your computer and click Open
    * In the "Name" field, enter `[USERNAME]_CustomerChurn` (e.g. ATeam_CustomerChurn, BTeam_CustomerChurn, etc.). *Note: Project names in DSX Local must be unique across the entire cluster, so be sure to give your project a unique name or you will encounter errors.*
    * Verify you have entered a unique project name and click the "Create" button.  
    * The message "Creating project" will be displayed and a new project will be created from the export file.
<BR>
>Note: As you progress through the lab exercises, you will see an asterisk \* next to your project's name and the message *Changes made -- You have local changes that you can commit*. DSXL internally uses git to manage project changes and coordination between collaborators. You can commit changes if you like, but it is not necessary for the exercises in this lab. If other users were added as collaborators to your project, they would not see any additions or changes you made until you committed them.


# 2. Run the Python customer churn visualization notebook
In this step you will use Jupyter and Python to explore and visualize historical customer churn data.  The visualizations will reveal that customer churn is influenced by factors such as income, age, and state of residence.  When using Jupyter notebooks, DSX Local allows programming in the Python, Scala, or R language.  In this lab section, you will execute a Python visualization notebook that was imported into your project.  As you follow the procedures below, be sure to review the output of each cell as it is executed.
1. From within your new project, click the notebook "Churn Visualization Python".  
2. A "Launching Jupyter" message will be displayed. __This message may last for a couple of minutes, please be patient.__
*NOTE:  If after waiting only a blank screen is displayed or if you receive an "NGINX Timeout" error,  open the notebook again by clicking on your project's name and then clicking on the notebook again*
3. Once the notebook opens, you will see a series of cells. The notebook has not yet been executed. Perform the following steps in order:
    1. Click on and run the first cell. To do this, click inside the first cell at the top of the notebook and then click the "Run" icon in the toolbar (__>|__). Within the brackets to the left of the cell, you should see an \* appear and then change to the number "1". This indicates the cell was executed successfully.
    2. Now click in the 2nd cell, where you will see a comment about the "churn_rate_visualizations.csv" dataset. The code below this comment was automatically inserted using the "Insert to code" function in DSX. The generated code creates a pandas dataframe and displays its first 5 rows. After reviewing the code, execute the 2nd cell by clicking the "Run" icon in the toolbar (__>|__). You should see the tabular output of the dataframe.head() statement displayed in the cell output.
    3. Click in the 3rd cell, where the Brunel visualization library is imported.  At this point you will execute this and all remaining cells in the notebook.  To do this, while keeping the 3rd cell selected go to the "Cell" dropdown on the menu bar and select "Run All Below" (this runs the selected cell and all cells below it).  All remaining cells will display "\*" and begin executing sequentially.
    4.  Scroll down through the notebook to ensure that all remaining visualizations of the customer churn data ran and are displayed.
    
4. When the notebook has executed successfully, review the interactive Brunel dashboards. Visualizations can be zoomed in/out using the middle mouse button while your mouse is hovering over the visualization. Charts can be filtered by changing dropdown menu options and clicking on specific columns of interest.
5. Save the notebook by selecting `File -> Save and Checkpoint`. The message "Checkpoint created" will appear in the toolbar (the message will disappear quickly).
6. Close the notebook by clicking on your project's name.
7. To conserve resources, before continuing, __Stop the Kernel__ that was started for your notebook by selecting "Stop Kernel" from the 3-dot menu to the right of the notebook name.
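
For readers who want to reproduce the core of this exploration outside the notebook, the sketch below mirrors what the 2nd cell does (read a CSV into a pandas DataFrame) and adds one simple aggregation of the kind the Brunel charts visualize. It is a minimal sketch, not the code DSX generates: it assumes a local copy of "churn_rate_visualizations.csv" and hypothetical `STATE` and `CHURN` columns, with `CHURN` coded as 1 (churn) or 0 (not churn).

```python
import pandas as pd


def load_churn_data(path: str) -> pd.DataFrame:
    """Read a churn CSV into a DataFrame, similar in spirit to the "Insert to code" cell."""
    return pd.read_csv(path)


def churn_rate_by(df: pd.DataFrame, column: str, target: str = "CHURN") -> pd.Series:
    """Fraction of churned customers (target == 1) for each value of `column`,
    sorted from the highest churn rate to the lowest.

    Assumes the (hypothetical) target column is coded 1 = churn, 0 = not churn,
    so its mean within each group is that group's churn rate.
    """
    return df.groupby(column)[target].mean().sort_values(ascending=False)
```

For example, `churn_rate_by(load_churn_data("churn_rate_visualizations.csv"), "STATE")` would rank states by churn rate, provided those column names exist in the file; calling `.head()` on the loaded DataFrame shows the same first 5 rows as the notebook.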

# 3. Run the Scala customer churn notebook 
In this section you will programmatically create a machine learning model using the Scala language.  The model will predict the probability that a given customer will churn from the business.  You will open the Jupyter Scala notebook that was created when you imported the project  and incrementally execute the cells in the notebook.  This procedure demonstrates using a programmatic approach to train, evaluate, save, and test a machine learning model from within a Jupyter notebook.   As you follow the procedures below, be sure to review the output of each cell as it is executed.  

1. From within your project, click the "Churn ML Training Notebook Scala LR" notebook to open it. A "Launching Jupyter" message will be displayed. __This message may last for a couple of minutes, please be patient.__
2. Once the notebook opens, you will see a series of cells.  The notebook has not yet been executed.  Perform the following steps in order:
    1. Incrementally run all cells in the notebook. Do this by first clicking inside the first code cell (where the %AddJar magic downloads the Brunel library), then click the "Run" icon in the toolbar (__>|__). Within the brackets to the left of the cell, you should see an \* appear and then change to the number "1". This indicates the cell was run successfully. Repeat for all cells in the notebook, reviewing the code and output after each cell executes. Pay special attention to the following cells near the end of the notebook:
        * "Save locally: Save trained model to DSX Local Project" - This cell saves the new trained model into your DSX project. Each time a model is saved with the same name, a new version of the model is stored in your project. The cell outputs the response from the system after the model is saved.
        * "Test Locally: Test model in DSX Local Project" - This cell tests the model by invoking the unpublished model directly in your project using the DSX Local API. This cell outputs both the scoringURL used to invoke the model and the response from invoking the model with test values.
3. Once you have successfully run all cells in the notebook, perform the following steps to execute the notebook again so that version 2 of the model is created:
    1. First clear all of the output from the first execution of the notebook by selecting "Cell" -> "All Output" -> "Clear" from the menubar.
    2. Re-run the entire notebook by selecting "Cell" -> "Run All" from the menubar.  Monitor progress of notebook execution and verify that it completes.  Be sure to notice that the output from the save operation shows that version 2 of the model has now been created (each version is stored in a separate directory in the filesystem, e.g. Bteam_CustomerChurn/models/BankingChurnMLNotebookModelLR/**2**)
4. Save the notebook by selecting `File -> Save and Checkpoint`.  The message "Checkpoint created" will appear in the toolbar.
5. Close the notebook by clicking on your project's name.
6. To conserve resources, before continuing, __Stop the Kernel__ that was started for your notebook by selecting "Stop Kernel" from the 3-dot menu to the right of the notebook name.
7. This notebook programmatically trained, saved, and tested a machine learning model.  Click on your project name, then click on Models in the navigation bar.  You will see the model that was created programmatically and that it has 2 versions.  Click on the model to view its details.  Within the model overview tab, click on the version number icon to view the model version history (you can switch which version is open in the project by clicking on a specific version). 
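
Because each save writes a new numbered directory (e.g. Bteam_CustomerChurn/models/BankingChurnMLNotebookModelLR/2), the latest saved version can be found by scanning those directory names. The sketch below assumes that layout, with each version directory named only by its number; the function name is hypothetical, and how you reach the project's filesystem path from a notebook is not covered here.

```python
from pathlib import Path
from typing import Optional


def latest_model_version(model_dir: str) -> Optional[int]:
    """Return the highest numeric version sub-directory under `model_dir`,
    or None if no version has been saved yet.

    Non-numeric entries (files or other directories) are ignored.
    """
    versions = [
        int(p.name)
        for p in Path(model_dir).iterdir()
        if p.is_dir() and p.name.isdigit()
    ]
    return max(versions, default=None)
```

Under this assumed layout, calling `latest_model_version` on the model directory after the second run of the notebook would return 2. It raises `FileNotFoundError` if the directory does not exist.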

# 4. Use model development features to test and evaluate your new model prior to publishing

In this lab section, you will use the in-project model development features to interactively test and evaluate your new model. After testing and evaluating the model, you will publish the model so it can be deployed and managed in production.

### Test online invocation of the LR model
1. Click on the "Test" tab of your new model and scroll down so the Input and Result sections are visible.
2. Enter the following values in the Input form, then click "Submit". **Note: Text fields are CASE SENSITIVE.** A pie chart will be displayed with the model's prediction for this customer's probability to churn (1 = churn, 0 = not churn).
    1. Age: 30
    2. Activity:  1
    3. Education: **D**octorate  (*Case sensitive*)
    4. Gender: F (*Case sensitive*)
    5. State: NY (*Case sensitive*)
    6. Negtweets: 2
    7. Income: 150000

3. Increase the number of Negtweets to 8 and click "Submit" to invoke the model again. The probability that the customer will churn should increase.
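
The LR model's response to Negtweets follows from how logistic regression turns inputs into a probability: it computes a weighted sum of the inputs plus an intercept and passes the result through the logistic (sigmoid) function, so, holding the other inputs fixed, raising an input that has a positive learned weight always raises the predicted churn probability. The sketch below illustrates this with made-up coefficients that are NOT the ones learned by the lab's model, and it uses only the numeric inputs; the real model also encodes the Education, Gender, and State fields.

```python
import math


def churn_probability(features: dict, weights: dict, intercept: float) -> float:
    """Logistic regression score: P(churn = 1) = 1 / (1 + exp(-(b + sum(w_i * x_i))))."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for illustration only; they are NOT the values learned
# by the lab's model, and categorical inputs (Education, Gender, State) are omitted.
WEIGHTS = {"AGE": -0.02, "ACTIVITY": -0.3, "NEGTWEETS": 0.45, "INCOME": -0.000005}
INTERCEPT = -0.5

# The numeric test inputs from step 2, then the same customer with Negtweets raised to 8.
customer = {"AGE": 30, "ACTIVITY": 1, "NEGTWEETS": 2, "INCOME": 150000}
p_low = churn_probability(customer, WEIGHTS, INTERCEPT)
p_high = churn_probability({**customer, "NEGTWEETS": 8}, WEIGHTS, INTERCEPT)
print(f"Negtweets=2: {p_low:.3f}  Negtweets=8: {p_high:.3f}")
```

Because the sigmoid is strictly increasing, the direction of the change depends only on the sign of the Negtweets weight; its size depends on the weight and on the other inputs.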


# 5. Train, Publish, and Deploy an ML model using the DSX Local Model Builder

In this lab section, you will use the DSX Local Model Builder GUI to train, publish, and deploy an ML model using a data file from HDP HDFS.  During this process you will train, evaluate, deploy, and test a new machine learning model for predicting the probability that a given customer will churn.  This process is similar to the programmatic approach you just completed using Scala, but this time you will use the DSX Local Model Builder which allows creation of machine learning models without writing code.

1. Click on your project name, click "Assets", then either scroll down to "Models" or click the "Models" link along the top.
2. Click "Add model" next to the __+__ sign
3. In the "Name" field, enter "*TEAMNAME* Wizard Churn Model" *(replacing TEAMNAME with your team's name: ateam, bteam, etc.)* **Be sure to give the model a unique name that you will recognize later.**
4. For the "Method" selection, click "__Manual__", then click "Create".
5. On the "Select data asset" screen, click the link for the "__cust_summary_visbuilder_training__" remote data set; a preview of the data set will be displayed. Click "__Use this data__". *(Alternatively, you can simply click the radio button next to the data set and click "Next" to skip the data preview.)*
    > NOTE: The Model Builder may display the message "Loading data" for up to several minutes; please be patient. Notify a lab instructor if the system is still displaying the "Loading data" message after several minutes.
6. On the "Prepare data set" screen, note the default selected transformer "Auto Data Preparation" on the right. This is the transformer that we will be using. Click "Add a transformer" on the top-right of the screen to see the other available transformers. After reviewing the list of transformers, dismiss the dialog and then click "Next".
7. On the "Select a technique" screen, do the following:
    1. In the "Column value to predict" dropdown, select the "CHURN" column. (The goal of the model is to predict whether or not a customer will churn)
    2. Click "Binary Classification" as the technique. (It should already be selected as the suggested technique.)
    3. Accept the default split shown in the sliders for Train/Test/Holdout.
    4. Click "Add Estimators" on the top-right of the screen.
    5. Select "Logistic Regression" and click "Add"  
        > Note: Choose **only one** Estimator (Logistic Regression) for this exercise.
    6. Click Next
8. A "Training models" status message will appear - this can take some time, be patient.
9. When model training has completed, review the metrics for the LR model (such as AREA UNDER ROC CURVE).  The radio button next to "Logistic Regression" should already be selected.  Click "Save" to save the new model, and click "Save" again in the confirmation dialog.  
    > A "Saving model" status message will appear, then you will be returned to the assets list in the project. *Remember the name of the model you just created, you will need it in the following steps.*
10. DSX Local supports hybrid ML scenarios:  In this step you will review the option to publish a model to the Bluemix WML service. *(We will not actually publish to WML)*
    * Do this by clicking the 3-dot menu to the right of your new ML model and selecting "Publish Model".  Click "Cancel" after reviewing the dialog options.
    > If you were publishing to the WML service, you would paste the username/password credentials (long alphanumeric GUIDs) into this dialog.  The model would be published (saved) to your WML service within the IBM cloud.  
    
11. Deploy your new model to the DSX Local ML service  
    1. Click on your project name, click on "Assets", click on "Models", and locate the model you just created and saved using Model Builder.
    2. Select "Deploy" from the 3-dot menu to the right of your saved model. *Be sure to deploy the right model - choose the model you created with Model Builder.*
    3. In the "Create Deployment" dialog box, name your deployment "__Deployed *TEAMNAME* Churn Model__" (using your team's name).
    4. Select "Online" from the "Type" dropdown menu and click "Create".
    5. When the deployment is complete, you will be taken to the "Deployments" section of the DSX Local Model Management UI.
        > The model you just created and deployed with the Model Builder is a different model than the one you created and deployed earlier using a programmatic approach in the Scala notebook.
    
12. Test your deployed model using the Test API feature in DSX Local
    1. From the "Deployments" list in the Model Management UI, Click the deployment name for the model you created with Model Builder.
    2. Review the details for the deployment, then click the "Test API" button on the far top right of the screen.
    3. On the Test API screen, keep the default test values and click "Predict". The model's prediction for this customer to churn appears, along with a pie chart representation of the probabilities.
    4. Modify the NEGTWEETS value in the "Input Data" section (scroll down).  Increase NEGTWEETS to 10 and click "Predict" again.  Note how the predicted churn value for this customer has changed.  
13. When finished, click "Close" to exit the Test API screen
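
Step 7 splits the data set into Train, Test, and Holdout partitions. The lab does not state the default percentages or how Model Builder performs the split, so the sketch below is a generic three-way random split with assumed 60/20/20 fractions, and the function name is hypothetical. In this common pattern, the model is fitted on the training partition, compared on the test partition, and the holdout partition is kept aside for a final check on data that played no part in training or model selection.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_holdout_split(
    rows: Sequence[T],
    train_frac: float = 0.6,  # assumed default, not taken from Model Builder
    test_frac: float = 0.2,   # assumed default; the remainder becomes holdout
    seed: int = 42,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle rows reproducibly and cut them into train/test/holdout partitions.

    Whatever is left after the train and test fractions becomes the holdout set.
    """
    if not (0 < train_frac and 0 <= test_frac and train_frac + test_frac <= 1):
        raise ValueError("train_frac must be positive, test_frac non-negative, sum at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global random state
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    holdout = shuffled[n_train + n_test:]
    return train, test, holdout
```

A fixed seed makes the split reproducible, so repeated runs on the same rows produce the same three partitions.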

# 6. Use Model Management Features to schedule evaluation of deployed ML model

In this lab section, you will use the model management features in DSX Local to schedule periodic evaluation of your deployed ML model. When machine learning models are put into production, DSX model management features provide ongoing evaluation to ensure acceptable performance of the model.

1. From the 3-Horizontal Line menu (aka "Hamburger" menu at the top left of DSXL UI), select "Model Management"
2. Click the Deployments link to see all deployments
3. Click the deployment of the model that __your team__ created with the DSX Model Builder
4. Scroll to the bottom of the deployment details screen and click "Schedule Evaluation"
5. On the "Schedule Evaluation" screen, do the following
    1. Choose "BinaryClassiferEvaluator" from the Evaluator dropdown menu
    2. Check the "Use performance metrics to monitor this model" checkbox
    3. Keep the radio button for "areaUnderROC" selected and accept the default of .7 for "Notify when less than"
    4. In the "Schedule" section, click on the "Starts at" selection and slide both sliders (below the calendar) all the way to the left (this will schedule evaluation to take place 10 minutes from the current time)
    5. Select "Every Day" as the Repeat option
    6. In the "Remote Data Sets" section, select "cust_summary_visbuilder_training" as the evaluation data set and then click "__Schedule__"
    > Note:  Normally the evaluation data set would contain updated data.  In this case we are evaluating the model using the same data that was used for training.
6. In 10-12 minutes (shortly after the scheduled evaluation time) you will see the result of the model evaluation displayed in the Model Management UI.  
    > The completed deployment evaluation should show a green checkmark (indicating success) on the Dashboard tab of the Model Management UI. The list of all deployment evaluations for a deployed model is visible at the bottom of the deployment details window.
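
The scheduled evaluation flags the model when its areaUnderROC metric falls below the threshold (0.7 by default). The sketch below is not the DSX evaluator's implementation; it computes the same metric generically, as the probability that a randomly chosen churned customer receives a higher score than a randomly chosen non-churned customer (ties count as one half), and then applies the "Notify when less than" comparison. The function names are hypothetical, and the pairwise loop is quadratic in the number of rows, which is fine for illustration but not for large evaluation sets.

```python
from typing import Sequence


def area_under_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the pairwise (Mann-Whitney) formulation.

    labels: 1 = churn, 0 = not churn; scores: predicted churn probabilities.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


def needs_alert(auc: float, threshold: float = 0.7) -> bool:
    """Mirror of the "Notify when less than" setting: alert only when AUC < threshold."""
    return auc < threshold
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the 0.7 default asks the model to rank churners above non-churners noticeably better than chance.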

****
### *This DSX Local Hands On Lab and associated Skytap environment was created by the IBM Data Science Elite Team*
