
# Capstone Project 19: Early Diabetes Prediction

#### Goal of the Project

This project is designed for you to practice and solve the activities that are based on the concepts covered in the lessons:

  - Multipage Streamlit App I

  - Multipage Streamlit App II

  - Decision Trees I to Decision Trees IV


---

### Context

Diabetes is a chronic (long-lasting) health condition that affects how your body turns food into energy.

Most of the food you eat is broken down into sugar (also called glucose) and released into your bloodstream. When your blood sugar goes up, it signals your pancreas to release insulin. Insulin acts like a key to let the blood sugar into your body’s cells for use as energy.

If you have diabetes, your body either doesn’t make enough insulin or can’t use the insulin it makes as well as it should. When there isn’t enough insulin or cells stop responding to insulin, too much blood sugar stays in your bloodstream. Over time, that can cause serious health problems, such as heart disease, vision loss, and kidney disease.

There isn’t a cure yet for diabetes, but losing weight, eating healthy food, and being active can really help in reducing the impact of diabetes.


---

#### Getting Started

Follow the steps described below to solve the project:

1. Click on the link provided below to open the Colab file for this project.
   
   https://colab.research.google.com/drive/1YTyQ05Cki451s-mJh9j2fhcfIrSPvDiJ

2. Create a duplicate copy of the Colab file. Here are the steps to create the duplicate copy:

    - Click on the **File** menu. A new drop-down list will appear.

      <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/0_file_menu.png' width=500>

    - Click on the **Save a copy in Drive** option. A duplicate copy will be created and will open in a new tab of your web browser.

      <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/1_create_colab_duplicate_copy.png' width=500>

    - After creating the duplicate copy of the notebook, please rename it in the **YYYY-MM-DD_StudentName_CapstoneProject19** format.

3. Now, write your code in the prescribed code cells.

---

#### Problem Statement

In this project, you are going to create a Multipage Early Diabetes Prediction Web app using the Streamlit framework.

This web app will do the following:

- Predict whether a person has diabetes or is prone to developing diabetes in the future by analysing the values of several features using the Decision Tree Classifier.

- Display the correlation heatmap, the confusion matrix plot and a decision tree plot.



---

### Dataset Description

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset.

The dataset includes 768 instances with 8 features and 1 target column (`Outcome`), which are described below:

|Field|Description|
|---:|:---|
|Pregnancies|Number of times pregnant|
|Glucose|Plasma glucose concentration in an oral glucose tolerance test|
|BloodPressure|Diastolic blood pressure (mm Hg)|
|SkinThickness|Triceps skin fold thickness (mm)|
|Insulin|2-Hour serum insulin (mu U/ml)|
|BMI|Body mass index (weight in $kg$ divided by the square of height in $m$)|
|DiabetesPedigreeFunction|A function which scores likelihood of diabetes based on family history|
|Age|Age of the person|
|Outcome|0 - The person does not have diabetes|
||1 - The person has diabetes|


**Dataset Credits:** https://www.kaggle.com/uciml/pima-indians-diabetes-database


---

### Things to do

1. Design the **Home page** of the Web app to display the Dataset information.
  
2. Create a Decision Tree classifier and an optimised Decision Tree using `GridSearchCV` to predict diabetes based on the values of the features.

3. Design the **Prediction page** of the Web app that allows users to select the values of the features and displays the prediction results on a click of the `Predict` button.

4. Design the **Visualise Decision Tree** page that allows users to

 - Get the top correlated features using a correlation heatmap.

 - Display the confusion matrix plot to get the performance of the classifier.

 - Visualise the classifier with the help of an actual decision tree plot.  



---

#### Activity 1: Creating Python File for Diabetes Prediction Web App

In this activity, you have to create the `diabetes_main.py` file in Sublime editor and save it in the `Python_scripts` folder.

Copy the code given below into the `diabetes_main.py` file. You are already familiar with this code: it defines a function that loads the data from the CSV file.

**Dataset Download Link:** https://s3-whjr-curriculum-uploads.whjr.online/b510b80d-2fd6-4c08-bfdf-2a24f733551d.csv

**Note:** Do not run the code shown below. It will throw an error.


In [None]:
# Code for 'diabetes_main.py' file.

# Importing the necessary Python modules.
import streamlit as st
import numpy as np
import pandas as pd

# Configure your home page by setting its title and icon that will be displayed in a browser tab.
st.set_page_config(page_title = 'Early Diabetes Prediction Web App',
                    page_icon = 'random',
                    layout = 'wide',
                    initial_sidebar_state = 'auto'
                    )

# Loading the dataset.
# 'st.cache_data' replaces 'st.cache()', which is deprecated since Streamlit 1.18.
@st.cache_data
def load_data():
    # Load the Diabetes dataset into DataFrame.
    df = pd.read_csv('https://s3-whjr-curriculum-uploads.whjr.online/b510b80d-2fd6-4c08-bfdf-2a24f733551d.csv')

    # Rename the column names in the DataFrame with a single mapping.
    df = df.rename(columns = {"BloodPressure": "Blood_Pressure",
                              "SkinThickness": "Skin_Thickness",
                              "DiabetesPedigreeFunction": "Pedigree_Function"})

    return df

diabetes_df = load_data()
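
The renaming step can be checked in isolation, without Streamlit or the real CSV file. The sketch below applies the same column mapping to a tiny hand-made DataFrame. The name `sample_df` and all the values in it are assumptions made up for illustration; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical two-row frame that only mimics some of the dataset's column names.
sample_df = pd.DataFrame({
    "Glucose": [110, 95],
    "BloodPressure": [70, 64],
    "SkinThickness": [20, 25],
    "DiabetesPedigreeFunction": [0.5, 0.3],
    "Outcome": [1, 0],
})

# One mapping covers all three columns; columns that are not listed stay unchanged.
renamed_df = sample_df.rename(columns={
    "BloodPressure": "Blood_Pressure",
    "SkinThickness": "Skin_Thickness",
    "DiabetesPedigreeFunction": "Pedigree_Function",
})

print(list(renamed_df.columns))
```

The renamed column names are the ones that the later activities refer to, for example `Blood_Pressure` and `Pedigree_Function`.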

---

#### Activity 2: Page Navigator

In this activity, you need to add radio button widgets to navigate through the **Home**, **Predict Diabetes** and **Visualise Decision Tree** web pages in the web app as shown in the image below:

<img src="https://i.imgur.com/1KMYS5M.png">

You need to create three empty Python files, namely `diabetes_home.py`, `diabetes_predict.py` and `diabetes_plots.py`, inside the same folder that contains the `diabetes_main.py` file.

- When a user selects the `Home` option, the `diabetes_home.py` script will be rendered.

- When a user selects the `Predict Diabetes` option, the `diabetes_predict.py` script will be rendered.

- When a user selects the `Visualise Decision Tree` option, the `diabetes_plots.py` script will be rendered.

To create this navigation bar, perform the following tasks in `diabetes_main.py` file:

1. Import the `diabetes_home`, `diabetes_predict` and `diabetes_plots` modules (the three Python files created above) in the `diabetes_main.py` file.

2. Create a dictionary, say `pages_dict`, with keys being the label to be displayed in the navigation bar and values being the name of Python script to be rendered:

  ```python
  pages_dict = {"Home": diabetes_home,
              "Predict Diabetes": diabetes_predict,
              "Visualise Decision Tree": diabetes_plots}
  ```

3. Add a title in the sidebar with the label `Navigation`.

4. Add a radio button widget with the label `Go to` and the keys of the `pages_dict` dictionary as its options. Pass these keys as a list or a tuple, because the radio button widget accepts its options only in one of these two forms.

5. Store the current value of this radio button widget in a `user_choice` variable.

6. Obtain the value corresponding to the key stored in the `user_choice` variable by passing it to the `pages_dict` dictionary. Store the value obtained from the dictionary in a variable, say `selected_page`. It will be one of the `diabetes_home`, `diabetes_predict` or `diabetes_plots` modules.

7. Call the user-defined `app()` function of the `selected_page` module and pass `diabetes_df` as its input.

In [None]:
# Create the Page Navigator for 'Home', 'Predict Diabetes' and 'Visualise Decision Tree' web pages in 'diabetes_main.py'
# Import the 'diabetes_home', 'diabetes_predict' and 'diabetes_plots' Python files.
import diabetes_home
import diabetes_predict
import diabetes_plots

# Adding a navigation in the sidebar using radio buttons
# Create the 'pages_dict' dictionary to navigate.
pages_dict = {"Home": diabetes_home,
           "Predict Diabetes": diabetes_predict,
           "Visualise Decision Tree": diabetes_plots}
# Add radio buttons in the sidebar for navigation and call the respective pages based on user selection.
st.sidebar.title('Navigation')
user_choice = st.sidebar.radio('Go to', tuple(pages_dict.keys()))
selected_page = pages_dict[user_choice]
selected_page.app(diabetes_df)


After this activity, the user must be able to navigate between the Home page, the Predict Diabetes page and the Visualise Decision Tree page using the radio buttons in the sidebar.
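
The navigation relies on a plain dictionary lookup followed by a call to `app()`, and this pattern can be tried without Streamlit. In the sketch below, the three page modules are replaced by hypothetical `SimpleNamespace` stand-ins, and the helper `render()` is an assumption introduced only for this illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the page modules; each one exposes an app() function.
diabetes_home = SimpleNamespace(app=lambda df: f"home page with {len(df)} rows")
diabetes_predict = SimpleNamespace(app=lambda df: f"prediction page with {len(df)} rows")
diabetes_plots = SimpleNamespace(app=lambda df: f"plots page with {len(df)} rows")

pages_dict = {"Home": diabetes_home,
              "Predict Diabetes": diabetes_predict,
              "Visualise Decision Tree": diabetes_plots}

def render(user_choice, data):
    # Look up the page object for the chosen label and call its app() function.
    selected_page = pages_dict[user_choice]
    return selected_page.app(data)

rows = [1, 2, 3]  # placeholder for 'diabetes_df'
print(tuple(pages_dict.keys()))
print(render("Predict Diabetes", rows))
```

Because every page exposes a function with the same name and signature, adding a new page only requires one more entry in the dictionary.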

---

#### Activity 3: Home Page Configuration

Open the blank `diabetes_home.py` file that you had created in the previous activity. Create a function `app()` in this file with `diabetes_df` as its input and perform the following tasks within this `app()` function:

1. Write the following description of the Web app and style it using the `markdown()` function of Streamlit.

  ```
  Diabetes is a chronic (long-lasting) health condition that affects how your body turns food into energy.
                
  There isn’t a cure yet for diabetes, but losing weight, eating healthy food, and being active can really help in reducing the impact of diabetes.

  This Web app will help you to predict whether a person has diabetes or is prone to get diabetes in future by analysing the values of several features using the Decision Tree Classifier.
  
  ```

2. Add the code to display and hide the entire dataset using the `st.expander()` (named `st.beta_expander()` in older Streamlit releases) and `st.dataframe()` widgets.

3. Display column names, column data types and column data with the click of a checkbox.

4. Show dataset summary with the click of a checkbox.

In [None]:
# Show complete dataset and summary in 'diabetes_home.py'
# Import the streamlit modules.
import streamlit as st

# Define a function 'app()' which accepts 'diabetes_df' as an input.
def app(diabetes_df):
  # Set the title to the home page contents.
  st.title('Early Diabetes Prediction Web App')
  # Provide a brief description for the web app.
  st.markdown("<p style='color:red;font-size:25px'>Diabetes is a chronic (long-lasting) health condition that affects how your body turns food into energy. There isn’t a cure yet for diabetes, but losing weight, eating healthy food, and being active can really help in reducing the impact of diabetes. This Web app will help you to predict whether a person has diabetes or is prone to get diabetes in future by analysing the values of several features using the Decision Tree Classifier. </p>", unsafe_allow_html = True)
  # Add an expander to view the full dataset ('st.expander' is named 'st.beta_expander' in older Streamlit releases).
  st.header("View Data")
  with st.expander("View Data"):
      st.dataframe(diabetes_df)

  st.subheader("Columns Description:")
  # Create three columns ('st.columns' is named 'st.beta_columns' in older Streamlit releases).
  col1, col2, col3 = st.columns(3)
  # Add a checkbox in the first column. Display the column names of 'diabetes_df' on the click of checkbox.
  with col1:
    if st.checkbox("Show all column names"):
      st.table(list(diabetes_df.columns))

  # Add a checkbox in the second column. Display the column data-types of 'diabetes_df' on the click of checkbox.
  with col2:
    if st.checkbox("View column data-type"):
        # Cast the data-types to strings so that they are displayed as plain text.
        st.table(diabetes_df.dtypes.astype(str))

  # Add a checkbox in the third column followed by a selectbox which accepts the column name whose data needs to be displayed.
  with col3:
    if st.checkbox("View column data"):
        column_data = st.selectbox('Select column', tuple(diabetes_df.columns))
        st.write(diabetes_df[column_data])

  # Display the summary of 'diabetes_df' on the click of a checkbox.
  if st.checkbox("Show summary"):
      st.table(diabetes_df.describe())

**Expected Output:**

<img src="https://i.imgur.com/dNtxSgO.png"/>

After this activity, the home page of the web app will allow the user to view the complete dataset as well as view summary of the dataset.
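
The values shown behind the home-page checkboxes come from ordinary pandas calls, so they can be inspected without Streamlit. The following is a minimal sketch on a hypothetical four-row DataFrame; `sample_df` and its values are assumptions for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical four-row frame; the values are illustrative only.
sample_df = pd.DataFrame({
    "Glucose": [100, 120, 140, 160],
    "BMI": [22.0, 26.0, 30.0, 34.0],
    "Outcome": [0, 0, 1, 1],
})

column_names = list(sample_df.columns)       # column names
column_types = sample_df.dtypes.astype(str)  # data-types cast to plain strings
summary = sample_df.describe()               # count, mean, std, min, quartiles and max

print(column_names)
print(column_types.to_dict())
print(summary.loc[["count", "mean"]])
```

`describe()` returns one column per numeric feature, which is why it works well as a compact dataset summary.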

---

#### Activity 4: Prediction Page Configuration

Open the blank `diabetes_predict.py` file that you had created in  **Activity 2: Page Navigator**. Create two functions `d_tree_pred()` and `grid_tree_pred()` to design the Decision Tree Classifier.

The `d_tree_pred()` function takes the following inputs to predict diabetes:
- `diabetes_df`
- `glucose`
- `bp`
- `insulin`
- `bmi`
- `pedigree`
- `age`

Inside this function,

1. Split the original DataFrame into train and test sets.

2. Create an object (say `dtree_clf`) of the `DecisionTreeClassifier()` class.

3. Inside the `DecisionTreeClassifier()` constructor, pass the following two parameters:

 - `criterion = 'entropy'`

 - `max_depth = 3`

4. Call the `fit()` function on the `dtree_clf` object with the train features and target variables as inputs.

5. Get the predicted target value by passing the input feature values to the `predict()` function of `dtree_clf` and store the result in a variable `prediction`. As `predict()` returns an array, use indexing to obtain the predicted value, i.e. `prediction = prediction[0]`.

6. Get the accuracy score on the train set by calling the `accuracy_score()` function of the `metrics` module and store the result in a variable `score`.

7. Return the `prediction` and `score`.
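
With `criterion = 'entropy'`, the tree prefers splits that lower the Shannon entropy of the class labels in the resulting nodes. The sketch below computes that impurity measure for a list of labels. The helper `entropy()` is a hypothetical function written for this illustration; it is not part of scikit-learn.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the proportion p of each class.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

print(entropy([0, 0, 1, 1]))  # evenly mixed node
print(entropy([0, 1, 1, 1]))  # mostly one class
print(entropy([1, 1, 1, 1]))  # pure node
```

An evenly mixed node has the highest entropy for two classes, while a pure node has an entropy of zero, so a good split moves the child nodes towards purity.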

In [None]:
# Import the necessary modules to design the Decision Tree classifier.
import numpy as np
import pandas as pd
import streamlit as st
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn import tree
from sklearn import metrics

In [None]:
# Create the 'd_tree_pred' function to predict the diabetes using the Decision Tree classifier
# 'st.cache_data' replaces 'st.cache()', which is deprecated since Streamlit 1.18.
@st.cache_data
def d_tree_pred(diabetes_df, glucose, bp, insulin, bmi, pedigree, age):
    # Split the train and test dataset.
    feature_columns = list(diabetes_df.columns)

    # Remove the 'Pregnancies' and 'Skin_Thickness' columns and the 'Outcome' target column from the feature columns.
    feature_columns.remove('Skin_Thickness')
    feature_columns.remove('Pregnancies')
    feature_columns.remove('Outcome')

    X = diabetes_df[feature_columns]
    y = diabetes_df['Outcome']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)

    dtree_clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
    dtree_clf.fit(X_train, y_train)
    y_train_pred = dtree_clf.predict(X_train)
    y_test_pred = dtree_clf.predict(X_test)
    # Predict diabetes using the 'predict()' function.
    prediction = dtree_clf.predict([[glucose, bp, insulin, bmi, pedigree, age]])
    prediction = prediction[0]

    score = round(metrics.accuracy_score(y_train, y_train_pred) * 100, 3)

    return prediction, score
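
The `score` returned above is measured on the training split, so it tends to be more optimistic than the accuracy on unseen rows. The sketch below shows how both figures can be computed side by side. It uses synthetic data from `make_classification` as a stand-in, which is an assumption made for illustration; the numbers it prints say nothing about the diabetes dataset.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with six features, mirroring the six inputs used above.
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=1)
clf.fit(X_train, y_train)

# Accuracy on the rows the tree was fitted on, and on the held-out rows.
train_score = round(accuracy_score(y_train, clf.predict(X_train)) * 100, 3)
test_score = round(accuracy_score(y_test, clf.predict(X_test)) * 100, 3)

print("Train accuracy (%):", train_score)
print("Test accuracy (%):", test_score)
```

Reporting the test accuracy next to the train accuracy gives the user a more realistic picture of how the model behaves on new inputs.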

---

#### Activity 5: `GridSearchCV` Decision Tree Classifier

In this activity, you need to run `GridSearchCV` to optimise the classifier.

Create the `grid_tree_pred()` function below the `d_tree_pred()` function in the `diabetes_predict.py` file. This function takes the following inputs:
- `diabetes_df`
- `glucose`
- `bp`
- `insulin`
- `bmi`
- `pedigree`
- `age`

Inside this function:

- First, define a `param_grid` dictionary with the parameters of the `DecisionTreeClassifier` class that you want to optimise. Let us set:
  - `criterion`: `['gini', 'entropy']`
  - `max_depth`: integers from `4` to `20`
  - `random_state`: `[42]`

- Construct a decision tree grid `grid_tree` using the `GridSearchCV` class with the following inputs:
  - `DecisionTreeClassifier()`
  - `param_grid`
  - `scoring` (set to `'roc_auc'` in the code below)

- Call the `fit()` function on `grid_tree` using `X_train` and `y_train` as inputs.

- Create an object `best_tree` and assign it the best decision tree model using the `best_estimator_` attribute of `grid_tree`.

- Get the predicted target value by passing the input feature values to the `predict()` function of the `best_tree` object and store the result in a variable `prediction`.

- Get the score achieved by the best classifier using `grid_tree.best_score_`, convert it to a percentage and store it in a variable `score`. This score is the mean cross-validated value of the chosen `scoring` metric.

- Return the `prediction` and `score`.

In [None]:
def grid_tree_pred(diabetes_df, glucose, bp, insulin, bmi, pedigree, age):
    feature_columns = list(diabetes_df.columns)
    # Remove the 'Pregnancies', 'Skin_Thickness' columns and the 'target' column from the feature columns
    feature_columns.remove('Pregnancies')
    feature_columns.remove('Skin_Thickness')
    feature_columns.remove('Outcome')
    X = diabetes_df[feature_columns]
    y = diabetes_df['Outcome']
    # Split the train and test dataset.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)

    param_grid = {'criterion':['gini','entropy'], 'max_depth': np.arange(4,21), 'random_state': [42]}

    # Create a grid
    grid_tree = GridSearchCV(DecisionTreeClassifier(), param_grid, scoring = 'roc_auc', n_jobs = -1)

    # Training
    grid_tree.fit(X_train, y_train)
    best_tree = grid_tree.best_estimator_

    # Predict diabetes using the 'predict()' function.
    prediction = best_tree.predict([[glucose, bp, insulin, bmi, pedigree, age]])
    prediction = prediction[0]

    score = round(grid_tree.best_score_ * 100, 3)

    return prediction, score
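
The attributes of a fitted `GridSearchCV` object can be explored on a small example. The sketch below uses synthetic data from `make_classification` and a narrower `max_depth` range than the activity so that it runs quickly; both choices are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with six features.
X, y = make_classification(n_samples=200, n_features=6, random_state=1)

# A smaller depth range than in the activity keeps the search quick.
param_grid = {"criterion": ["gini", "entropy"],
              "max_depth": np.arange(2, 6),
              "random_state": [42]}

grid_tree = GridSearchCV(DecisionTreeClassifier(), param_grid, scoring="roc_auc", cv=5)
grid_tree.fit(X, y)

# 2 criteria x 4 depths x 1 random state = 8 candidate settings.
print("Candidates evaluated:", len(grid_tree.cv_results_["params"]))
print("Best parameters:", grid_tree.best_params_)
print("Mean cross-validated ROC AUC (%):", round(grid_tree.best_score_ * 100, 3))
```

Since `scoring` is `'roc_auc'`, `best_score_` is an area under the ROC curve averaged over the cross-validation folds, so it is not directly comparable with the accuracy reported by `d_tree_pred()`.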

---

#### Activity 6: The Prediction Page Configuration

Perform the following tasks in `diabetes_predict.py` page below `grid_tree_pred()` function:

1. Create a user-defined function, say `app()`, with `diabetes_df` as its input. The remaining tasks are performed inside this function.

2. Create six streamlit slider widgets for selecting the values of features (`Glucose`, `Blood_Pressure`, `Insulin`, `BMI`, `Pedigree_Function` and `Age`)  with their minimum and maximum values. Store the widgets in their respective variables.

3. Create a drop down menu to select the classifier.
  - Decision Tree Classifier
  - GridSearchCV Best Tree Classifier

4. If Decision Tree Classifier is selected, pass the values selected in the slider inside the `d_tree_pred` function to predict the diabetes on a click of a button as follows:  

  `prediction, score = d_tree_pred(diabetes_df, glucose, bp, insulin, bmi, pedigree, age)`

5. If `prediction == 1`, display `The person either has diabetes or prone to get diabetes`, else display `The person is free from diabetes`.

6. Also display the accuracy score of the model that is stored in the `score` variable.

Similarly, if GridSearchCV Best Tree Classifier is selected, pass the values selected using the sliders to the `grid_tree_pred()` function to predict diabetes on a click of a button.

In [None]:
# Create the user defined 'app()' function.
def app(diabetes_df):
    st.markdown("<p style='color:red;font-size:25px'>This app uses <b>Decision Tree Classifier</b> for the Early Prediction of Diabetes.</p>", unsafe_allow_html = True)
    st.subheader("Select Values:")

    # Create six sliders with the respective minimum and maximum values of features.
    # Store them in the variables 'glucose', 'bp', 'insulin', 'bmi', 'pedigree' and 'age'.
    # Write your code here:
    glucose = st.slider("Glucose", float(diabetes_df['Glucose'].min()), float(diabetes_df['Glucose'].max()))
    bp = st.slider("Blood Pressure", float(diabetes_df['Blood_Pressure'].min()), float(diabetes_df['Blood_Pressure'].max()))
    insulin = st.slider("Insulin", float(diabetes_df['Insulin'].min()), float(diabetes_df['Insulin'].max()))
    bmi = st.slider("BMI", float(diabetes_df['BMI'].min()), float(diabetes_df['BMI'].max()))
    pedigree = st.slider("Pedigree Function", float(diabetes_df['Pedigree_Function'].min()), float(diabetes_df['Pedigree_Function'].max()))
    age = st.slider("Age", float(diabetes_df['Age'].min()), float(diabetes_df['Age'].max()))
    st.subheader("Model Selection")

    # Add a single select drop down menu with label 'Select the Classifier'
    predictor = st.selectbox("Select the Decision Tree Classifier",('Decision Tree Classifier', 'GridSearchCV Best Tree Classifier'))

    if predictor == 'Decision Tree Classifier':
        if st.button("Predict"):
            prediction, score = d_tree_pred(diabetes_df, glucose, bp, insulin, bmi, pedigree, age)
            st.subheader("Decision Tree Prediction results:")
            if prediction == 1:
                st.info("The person either has diabetes or prone to get diabetes")
            else:
                st.info("The person is free from diabetes")
            st.write("The accuracy score of this model is", score, "%")


    elif predictor == 'GridSearchCV Best Tree Classifier':
        if st.button("Predict"):
            prediction, score = grid_tree_pred(diabetes_df, glucose, bp, insulin, bmi, pedigree, age)
            st.subheader("Optimised Decision Tree Prediction results:")
            if prediction == 1:
                st.info("The person either has diabetes or prone to get diabetes")
            else:
                st.info("The person is free from diabetes")
            st.write("The best score of this model is", score, "%")
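
When a model is fitted on a DataFrame and later receives a plain list, recent scikit-learn releases emit a warning about missing feature names. One way to avoid it is to wrap the slider values in a one-row DataFrame that uses the same column names as the training data. The sketch below illustrates this on a hypothetical four-row training set; the values and the fixed slider readings are assumptions for illustration only.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

feature_columns = ["Glucose", "Blood_Pressure", "Insulin", "BMI", "Pedigree_Function", "Age"]

# Hypothetical training rows; the values are illustrative only.
X_train = pd.DataFrame([
    [90, 70, 80, 22.5, 0.20, 25],
    [95, 68, 85, 24.0, 0.30, 31],
    [160, 85, 200, 35.0, 0.80, 50],
    [170, 90, 220, 38.5, 0.90, 55],
], columns=feature_columns)
y_train = [0, 0, 1, 1]

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=1)
clf.fit(X_train, y_train)

# Values as they might come from the six sliders.
glucose, bp, insulin, bmi, pedigree, age = 165.0, 88.0, 210.0, 36.0, 0.85, 52.0

# One-row DataFrame whose columns match the training features.
user_input = pd.DataFrame([[glucose, bp, insulin, bmi, pedigree, age]], columns=feature_columns)

prediction = clf.predict(user_input)[0]
print(prediction)
```

Keeping the column order of `user_input` identical to the training features also guards against passing the slider values in the wrong order.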

**Expected Output:**

<img src="https://s3-whjr-v2-prod-bucket.whjr.online/4ea07d40-12bd-4edc-a68d-dfafee57e49a.gif"/>


---

#### Activity 7: Visualise the Decision Tree Classifier

Open the blank `diabetes_plots.py` file that you had created in **Activity 2: Page Navigator**. Create an `app()` function in this file with `diabetes_df` as its input and perform the following tasks within this `app()` function:

1. Add the code to display the correlation heatmap on a click of a checkbox.

2. Add a `selectbox` widget with label `Select the Classifier` to select the Classifier model (Decision Tree Classifier and GridSearchCV Best Tree Classifier).

3. If Decision Tree classifier is selected, fit the decision tree model and plot the confusion matrix and decision tree for the Decision Tree classifier.

4. If GridSearchCV Best Tree Classifier is selected, fit the GridSearchCV best tree model and plot the confusion matrix and decision tree for the classifier. You can create the decision tree in the streamlit web app using the `graphviz_chart` function of the streamlit.
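
The heatmap is a picture of the correlation matrix, so the features that correlate most strongly with the target can also be read off numerically. The sketch below ranks the features of a hypothetical six-row DataFrame by their absolute correlation with `Outcome`; `sample_df` and its values are assumptions for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical frame; the values are illustrative only.
sample_df = pd.DataFrame({
    "Glucose": [90, 100, 150, 160, 95, 170],
    "Age": [25, 40, 30, 55, 35, 45],
    "Outcome": [0, 0, 1, 1, 0, 1],
})

# Pairwise Pearson correlations, the same matrix that the heatmap displays.
corr = sample_df.corr()

# Correlation of each feature with the target, strongest first.
ranking = corr["Outcome"].drop("Outcome").abs().sort_values(ascending=False)
print(ranking)
```

In this made-up sample, `Glucose` separates the two classes more cleanly than `Age`, so it appears first in the ranking.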

In [None]:
# Code for 'diabetes_plots.py' file.
# Import necessary modules.
import warnings

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import streamlit as st
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, export_graphviz
# 'ConfusionMatrixDisplay' (scikit-learn 1.0 and later) replaces 'plot_confusion_matrix', which was removed in scikit-learn 1.2.
from sklearn.metrics import ConfusionMatrixDisplay


# Define a function 'app()' which accepts 'diabetes_df' as an input.
def app(diabetes_df):
    warnings.filterwarnings('ignore')
    st.title("Visualise the Diabetes Prediction Web App")

    if st.checkbox("Show the correlation heatmap"):
        st.subheader("Correlation Heatmap")
        # Create the figure explicitly so that it can be passed to 'st.pyplot()'.
        fig, ax = plt.subplots(figsize = (10, 6))
        sns.heatmap(diabetes_df.iloc[:, 1:].corr(), annot = True, ax = ax)
        st.pyplot(fig)

    st.subheader("Predictor Selection")

    # Add a single select drop down menu to choose the classifier to visualise.
    plot_select = st.selectbox("Select the Classifier to Visualise the Diabetes Prediction:", ('Decision Tree Classifier', 'GridSearchCV Best Tree Classifier'))

    # Remove the 'Pregnancies' and 'Skin_Thickness' columns and the 'Outcome' target column from the feature columns.
    feature_columns = list(diabetes_df.columns)
    feature_columns.remove('Pregnancies')
    feature_columns.remove('Skin_Thickness')
    feature_columns.remove('Outcome')

    # Split the train and test dataset.
    X = diabetes_df[feature_columns]
    y = diabetes_df['Outcome']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)

    if plot_select == 'Decision Tree Classifier':
        # Fit the decision tree with fixed parameters.
        clf = DecisionTreeClassifier(criterion = "entropy", max_depth = 3)
        clf.fit(X_train, y_train)
    else:
        # Run the grid search once and keep the best tree that it finds.
        param_grid = {'criterion':['gini','entropy'], 'max_depth': np.arange(4,21), 'random_state': [42]}
        grid_tree = GridSearchCV(DecisionTreeClassifier(), param_grid, scoring = 'roc_auc', n_jobs = -1)
        grid_tree.fit(X_train, y_train)
        clf = grid_tree.best_estimator_

    if st.checkbox("Plot confusion matrix"):
        fig, ax = plt.subplots(figsize = (10, 6))
        ConfusionMatrixDisplay.from_estimator(clf, X_train, y_train, values_format = 'd', ax = ax)
        st.pyplot(fig)

    if st.checkbox("Plot Decision Tree"):
        # Export the decision tree in DOT format and store it in the 'dot_data' variable.
        dot_data = export_graphviz(decision_tree = clf, max_depth = 3, out_file = None, filled = True, rounded = True,
            feature_names = feature_columns, class_names = ['0', '1'])
        # Plot the decision tree using the 'graphviz_chart' function of the 'streamlit' module.
        st.graphviz_chart(dot_data)
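
The string passed to `st.graphviz_chart()` is ordinary DOT source, which can be inspected without Streamlit. The sketch below fits a tree on a tiny hand-made training set with a single feature and exports it; the data and the feature name used here are assumptions for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Tiny hand-made training set with a single feature.
X = [[1.0], [2.0], [8.0], [9.0]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=1)
clf.fit(X, y)

# With out_file=None the function returns the DOT source as a string.
dot_data = export_graphviz(clf, out_file=None, filled=True, rounded=True,
                           feature_names=["Glucose"], class_names=["0", "1"])

print(type(dot_data).__name__)
print(dot_data.splitlines()[0])
```

Printing the full `dot_data` string shows one entry per node with its split condition, impurity, sample count and predicted class, which is the same information that appears in the rendered plot.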




**Expected Output:**

<img src="https://s3-whjr-v2-prod-bucket.whjr.online/349f5dbb-f7c8-4836-89d8-bf9234f8738a.gif"/>

After this activity, the user must be able to predict diabetes and visualise the classifier models using the decision tree plots and confusion matrix.
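
The numbers inside a confusion matrix plot can be verified by hand on a small example. The sketch below uses made-up true and predicted labels (an assumption for illustration) and unpacks the four cells of the matrix.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for eight hypothetical patients.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# Rows are the actual classes and columns are the predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Here three non-diabetic and three diabetic cases are classified correctly, one non-diabetic case is flagged as diabetic (false positive) and one diabetic case is missed (false negative).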


---

### Submitting the Project

Follow the steps described below to submit the project.

1. After finishing the project, click on the **Share** button on the top right corner of the notebook. A new dialog box will appear.

  <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/2_share_button.png' width=500>

2. In the dialog box, click on the **Copy link** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/3_copy_link.png' width=500>


3. The link to the duplicate copy (named **YYYY-MM-DD_StudentName_CapstoneProject19**) of the notebook will be copied.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/4_copy_link_confirmation.png' width=500>

4. Go to your dashboard and click on the **My Projects** option.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/5_student_dashboard.png' width=800>

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/6_my_projects.png' width=800>

5. Click on the **View Project** button for the project you want to submit.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/7_view_project.png' width=800>

6. Click on the **Submit Project Here** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/8_submit_project.png' width=800>

7. Paste the link to the project file named as **YYYY-MM-DD_StudentName_CapstoneProject19** in the URL box and then click on the **Submit** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/9_enter_project_url.png' width=800>


---