<img src="images/bannerugentdwengo.png" alt="Banner" width="400"/>

<div>
    <font color=#690027 markdown="1">
<h1>DECISION TREE: HEART ATTACK EXAMPLE</h1>    </font>
</div>

<div class="alert alert-box alert-success">
In this notebook, you will have Python generate a decision tree based on a table with labeled examples.<br>A decision tree provides a solution for a classification problem, here in a medical context.</div>

<div>
    <font color=#690027 markdown="1">
<h2>1. The medical problem</h2>    </font>
</div>

Several parameters can be taken into account when trying to predict whether a patient is at risk of a heart attack. For a known patient, certain parameters can be found in the patient's record.<br>The following table shows such parameters for six known patients, together with whether or not they had a heart attack.

<table>
 <thead>
    <tr>
      <th><p align="center">Patient Number</p></th>
      <th><p align="center">Chest Pain</p></th>
      <th><p align="center">Male</p></th>
      <th><p align="center">Smokes</p></th>
      <th><p align="center">Sufficient exercise</p></th>
      <th><p align="center">Heart Attack</p></th>
    </tr>
 </thead>
 <tbody>
    <tr><td><p align="left">1</p></td><td><p align="center">yes</p></td><td><p align="center">yes</p></td><td><p align="center">no</p></td><td><p align="center">yes</p></td><td><p align="center">yes</p></td></tr>
    <tr><td><p align="left">2</p></td><td><p align="center">yes</p></td><td><p align="center">yes</p></td><td><p align="center">yes</p></td><td><p align="center">no</p></td><td><p align="center">yes</p></td></tr>
    <tr><td><p align="left">3</p></td><td><p align="center">no</p></td><td><p align="center">no</p></td><td><p align="center">yes</p></td><td><p align="center">no</p></td><td><p align="center">yes</p></td></tr>
    <tr><td><p align="left">4</p></td><td><p align="center">no</p></td><td><p align="center">yes</p></td><td><p align="center">no</p></td><td><p align="center">yes</p></td><td><p align="center">no</p></td></tr>
    <tr><td><p align="left">5</p></td><td><p align="center">yes</p></td><td><p align="center">no</p></td><td><p align="center">yes</p></td><td><p align="center">yes</p></td><td><p align="center">yes</p></td></tr>
    <tr><td><p align="left">6</p></td><td><p align="center">no</p></td><td><p align="center">yes</p></td><td><p align="center">yes</p></td><td><p align="center">yes</p></td><td><p align="center">no</p></td></tr>
 </tbody>
</table>

In this table, each patient is identified by a 'patient number'.<br>The parameters 'chest pain', 'male', 'smokes' and 'sufficient exercise' are considered when evaluating the risk of a heart attack.<br>Each patient belongs to one of two categories, 'heart attack' or 'no heart attack': the class 'yes' (heart attack) or the class 'no' (no heart attack).

<div>
    <font color=#690027 markdown="1">
<h2>2. The decision tree</h2>    </font>
</div>

### Importing modules

First, you import the Python modules that provide the functions and methods you need.

In [None]:
import numpy as np                         # to enter the table as a matrix
import matplotlib.pyplot as plt            # to display an image of the decision tree
from sklearn import tree                   # to generate the decision tree

<div>
    <font color=#690027 markdown="1">
<h3>2.1 Preprocessing of the dataset</h3>    </font>
</div>

<div class="alert alert-box alert-info">
Note that the values of the parameters (variables) here are categorical, so you convert them to numerical values so that the computer can work with them: <br>instead of 'yes' you use the value '1' and instead of 'no' you use the value '0'.</div>    
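
The conversion described above can be sketched in a few lines of Python. This is only an illustration: the dictionary `encoding` and the list `answers` are hypothetical names that do not appear elsewhere in this notebook, and the values are those of patient 1 in the table.

```python
import numpy as np

# Answers for patient 1, written with the words from the table
# (chest pain, male, smokes, sufficient exercise, heart attack).
answers = ["yes", "yes", "no", "yes", "yes"]

# Hypothetical helper dictionary: 'yes' becomes 1 and 'no' becomes 0.
encoding = {"yes": 1, "no": 0}

# Translate every answer and store the result as a NumPy array.
encoded = np.array([encoding[answer] for answer in answers])
print(encoded)
```

In this notebook the table is small, so the numbers are simply typed in by hand instead.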

First, analyze the table. What exactly is in it?

- The first column contains the patient number, which has no influence on whether or not a heart attack occurs. So you disregard the patient number.
- The health parameters 'chest pain', 'male', 'smokes' and 'sufficient exercise' are relevant and should therefore be taken into account.
- The last column indicates the category to which the patient belongs: 'heart attack' or 'no heart attack'. So you are looking for splits on the parameters that separate the patients into these two classes with as little mixing as possible.

<div class="alert alert-box alert-info">
A table of numbers is represented in mathematics by a matrix. <br>In Python, you enter such a matrix row by row. You use the function array() from the NumPy module, which you have already imported.</div>    

In [None]:
# each row in the matrix corresponds to one patient
# values of the health parameters are in columns 1, 2, 3, 4 of the matrix data, respectively
# last column indicates whether the patient is at risk of heart attack ('1') or not ('0')
data = np.array([[1, 1, 0, 1, 1],
                 [1, 1, 1, 0, 1],
                 [0, 0, 1, 0, 1],
                 [0, 1, 0, 1, 0],
                 [1, 0, 1, 1, 1],
                 [0, 1, 1, 1, 0]])

Now you indicate to the computer that the first four columns of the entered matrix contain the parameters of a patient and the last column the class.

In [None]:
# health parameters and class distinction
health_parameters = data[:, :4]        # first 4 columns of matrix are considered parameters
class_ = data[:, 4]                    # last column is the class to which the patient belongs

Using the following two code cells, you can quickly check whether you entered everything correctly.

In [None]:
print(health_parameters)

In [None]:
print(class_)
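
As an additional check, you could also look at the dimensions of the matrices. The sketch below repeats the data so that it runs on its own; it uses the NumPy attribute `shape`, which gives the number of rows and columns.

```python
import numpy as np

# Same data as in the table: one row per patient.
data = np.array([[1, 1, 0, 1, 1],
                 [1, 1, 1, 0, 1],
                 [0, 0, 1, 0, 1],
                 [0, 1, 0, 1, 0],
                 [1, 0, 1, 1, 1],
                 [0, 1, 1, 1, 0]])

health_parameters = data[:, :4]   # all rows, first 4 columns
class_ = data[:, 4]               # all rows, last column

print(data.shape)                 # (6, 5): 6 patients, 5 columns
print(health_parameters.shape)    # (6, 4): 6 patients, 4 parameters
print(class_.shape)               # (6,): one class value per patient
```

If one of these shapes differs from what you expect, a row or a value was probably mistyped.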

<div class="alert alert-box alert-warning">
You can practice with matrices in the notebook 'Matrices'. <br></div>    

<div class="alert alert-box alert-info">
The variables in this notebook can only take two values: 'yes' or 'no'. Such variables are called Boolean variables. Their two values can also be written as '1' and '0', or as 'True' and 'False'.</div>
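
A small sketch of how Python treats these equivalent notations. The variable name `as_boolean` is chosen here only for illustration; the values are those of the class column in the table.

```python
import numpy as np

# Class column of the six patients, written with 0 and 1.
class_ = np.array([1, 1, 1, 0, 1, 0])

# The same information written with True and False.
as_boolean = class_.astype(bool)
print(as_boolean)

# In Python, True counts as 1 and False as 0,
# so the sum gives the number of patients in class 'yes'.
print(as_boolean.sum())   # 4
```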

<div>
    <font color=#690027 markdown="1">
<h3>2.2 Generation of the decision tree</h3>    </font>
</div>

In Python, everything is an object. So you first create an object: a decision tree that will classify patients. You do this with DecisionTreeClassifier() from the tree module, specifying that the computer should use the Gini index in the process. <br>You refer to that object with the variable decision_tree.<br>You then instruct the computer to build a decision tree that fits the data on the parameters and the classes.

In [None]:
# generate decision tree based on data
decision_tree = tree.DecisionTreeClassifier(criterion="gini")   # tree is created using the Gini index
decision_tree.fit(health_parameters, class_)                    # generate tree that corresponds to data
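
To get a feeling for the Gini index mentioned above, you can compute it by hand for this small table. The sketch below is a plain NumPy calculation, not the internal code of scikit-learn; the function names `gini` and `weighted_gini` are hypothetical helpers introduced only for this illustration. The Gini index of a group is 1 minus the sum of the squared class proportions, so a group with only one class has index 0.

```python
import numpy as np

data = np.array([[1, 1, 0, 1, 1],
                 [1, 1, 1, 0, 1],
                 [0, 0, 1, 0, 1],
                 [0, 1, 0, 1, 0],
                 [1, 0, 1, 1, 1],
                 [0, 1, 1, 1, 0]])
feature_names = ["Chest pain", "Male", "Smokes", "Exercise"]
labels = data[:, 4]


def gini(group_labels):
    """Gini index of one group of patients (0 means all in the same class)."""
    if len(group_labels) == 0:
        return 0.0
    proportions = np.bincount(group_labels, minlength=2) / len(group_labels)
    return 1.0 - np.sum(proportions ** 2)


def weighted_gini(feature_column, all_labels):
    """Average Gini index of the two groups obtained by splitting on a parameter,
    weighted by the number of patients in each group."""
    result = 0.0
    for value in (0, 1):
        group = all_labels[feature_column == value]
        result += len(group) / len(all_labels) * gini(group)
    return result


print(f"All patients: {gini(labels):.3f}")          # All patients: 0.444
for index, name in enumerate(feature_names):
    print(f"{name}: {weighted_gini(data[:, index], labels):.3f}")
# Chest pain: 0.222
# Male: 0.333
# Smokes: 0.417
# Exercise: 0.333
```

Splitting on 'chest pain' gives the lowest value, so under this calculation it is the parameter that separates the two classes best at the first step.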

Good, the computer did what you asked it to do. But you can't see the result yet. <br> A final step needs to be taken.

<div>
    <font color=#690027 markdown="1">
<h3>2.3 Displaying the decision tree</h3>    </font>
</div>

To display an image of the decision tree on the screen, you first create a drawing window, then tell the computer what should appear in that window, and finally give the instruction to show the image. <br>You use the functions figure() and show() from matplotlib, and the function plot_tree() from the module tree.

In [None]:
# show decision tree
plt.figure(figsize=(10,10))                                                 # create drawing window
tree.plot_tree(decision_tree,                                               # indicate what needs to be shown
               class_names=["no risk", "risk"],
               feature_names=["Chest pain", "Male", "Smokes", "Exercise"],  # health parameters: 'chest pain', 'male', 'smokes', 'sufficient exercise'
               filled=True, rounded=True)
plt.show()                                                                  # show figure
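
If you prefer a textual view over an image, the tree module also offers the function export_text(). The sketch below is self-contained and repeats the data. Note that `random_state=0` is an assumption added here so that the result is the same on every run; it is not used in the cells above.

```python
import numpy as np
from sklearn import tree

data = np.array([[1, 1, 0, 1, 1],
                 [1, 1, 1, 0, 1],
                 [0, 0, 1, 0, 1],
                 [0, 1, 0, 1, 0],
                 [1, 0, 1, 1, 1],
                 [0, 1, 1, 1, 0]])
health_parameters = data[:, :4]
class_ = data[:, 4]

# random_state=0 is an assumption made for reproducibility
decision_tree = tree.DecisionTreeClassifier(criterion="gini", random_state=0)
decision_tree.fit(health_parameters, class_)

# build a text version of the tree, one line per node
text_view = tree.export_text(decision_tree,
                             feature_names=["Chest pain", "Male", "Smokes", "Exercise"])
print(text_view)
```

Each indentation level in the printed text corresponds to one level deeper in the tree.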

<div>
    <font color=#690027 markdown="1">
<h2>3. All code together</h2>    </font>
</div>

In [None]:
# import necessary modules
import numpy as np                         # to enter the table as a matrix
import matplotlib.pyplot as plt            # to display an image of the decision tree
from sklearn import tree                   # to generate the decision tree

# data
data = np.array([[1, 1, 0, 1, 1],
                 [1, 1, 1, 0, 1],
                 [0, 0, 1, 0, 1],
                 [0, 1, 0, 1, 0],
                 [1, 0, 1, 1, 1],
                 [0, 1, 1, 1, 0]])

# distinguishing parameters and class
health_parameters = data[:, :4]        # first 4 columns of matrix are considered parameters
class_ = data[:, 4]                    # last column is class

# generate decision tree based on data
decision_tree = tree.DecisionTreeClassifier(criterion="gini")   # tree is created using the Gini index
decision_tree.fit(health_parameters, class_)                    # generate tree corresponding to data

# display decision tree
plt.figure(figsize=(10,10))                                                 # create drawing window
tree.plot_tree(decision_tree,                                               # specify what needs to be shown
               class_names=["no risk", "risk"],
               feature_names=["Chest pain", "Male", "Smokes", "Exercise"],  # health parameters: 'chest pain', 'male', 'smokes', 'sufficient exercise'
               filled=True, rounded=True)
plt.show()                                                                  # show figure
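
Once the tree has been generated, it can also classify a patient who is not in the table, using the method predict(). The sketch below is self-contained; the patient in `new_patient` is purely hypothetical and was made up for this illustration.

```python
import numpy as np
from sklearn import tree

data = np.array([[1, 1, 0, 1, 1],
                 [1, 1, 1, 0, 1],
                 [0, 0, 1, 0, 1],
                 [0, 1, 0, 1, 0],
                 [1, 0, 1, 1, 1],
                 [0, 1, 1, 1, 0]])
health_parameters = data[:, :4]
class_ = data[:, 4]

decision_tree = tree.DecisionTreeClassifier(criterion="gini")
decision_tree.fit(health_parameters, class_)

# hypothetical patient: chest pain, not male, does not smoke, sufficient exercise
new_patient = np.array([[1, 0, 0, 1]])
prediction = decision_tree.predict(new_patient)
print(prediction)   # [1], i.e. class 'risk'
```

In the table, every patient with chest pain belongs to class 'yes', so the tree assigns this patient to the same class. With only six examples, such a prediction says little about real patients.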

### Assignment
What in this code will remain the same in the exercises? What will you have to adjust?

### References

Ruiz C. (2001). CS4341 Introduction to Artificial Intelligence Homework - D 2001.<br>      http://web.cs.wpi.edu/~cs4341/D01/HW/homework.html#problem1

<img src="images/cclic.png" alt="Banner" align="left" width="100"/><br><br>
AI in Healthcare Notebook, see <a href="http://www.aiopschool.be">AI At School</a>, by F. Wyffels & N. Gesquière is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.