        *----------------------------- AUTHOR_DETAILS -------------------------------*
        |                                                                            |
        |        Project Title  = Heart Disease Prediction System                    |
        |                                                                            |
        |        Author         = Mr. Hanzala Muhammad Khan                          |
        |                                                                            |
        *----------------------------------------------------------------------------*

<br><br><br>
<center> <h2 style="color:green">-------------------- PROJECT PURPOSE --------------------</h2> </center>
<br>
<center><h3>
The main purpose of this Project is to demonstrate how Heart Disease Prediction can be treated as a Supervised Machine Learning Problem using Python and the Scikit-learn Machine Learning Toolkit </h3></center>
<br>
<center><h3> For this Purpose, we will execute the Machine Learning Cycle </h3></center>
<br>
<center> <h2 style="color:green">-------------------------------------------------------------------------</h2> </center>
<br><br><br>

# Machine Learning Cycle

### The Four Phases of a Machine Learning Cycle are

### Training Phase

    Build the Model using Training Data

### Testing Phase

     Evaluate the performance of Model using Testing Data

### Application Phase

     Deploy the Model in the Real-world, to predict Real-time unseen Data

### Feedback Phase

    Take Feedback from the Users and Domain Experts to improve the Model


### We will execute the Machine Learning Cycle in a Single File using the following steps

#### Step 1: Import Libraries

#### Step 2: Load Sample Data

#### Step 3: Understand and Pre-process Sample Data
    
    Step 3.1: Understand Sample Data
    
    Step 3.2: Pre-process Sample Data

#### Step 4: Feature Extraction 

#### Step 5: Label Encoding (Input and Output are converted into Numeric Representation)

    Step 5.1: Train the Label Encoder

    Step 5.2: Label Encode the Output

    Step 5.3: Label Encode the Input 

#### Step 6: Execute the Training Phase

    Step 6.1: Splitting Sample Data into Training Data and Testing Data 

    Step 6.2: Splitting Input Vectors and Outputs / Labels of Training Data

    Step 6.3: Train the Random Forest Regressor

    Step 6.4: Save the Trained Model

#### Step 7: Execute the Testing Phase 

    Step 7.1: Splitting Input Vectors and Outputs/Labels of Testing Data
    
    Step 7.2: Load the Saved Model
    
    Step 7.3: Evaluate the Performance of Trained Model

        Step 7.3.1: Make Predictions with the Trained Model on Testing Data

    Step 7.4: Calculate the Accuracy Score

#### Step 8: Execute the Application Phase 

    Step 8.1: Take Input from User 

    Step 8.2: Convert User Input into Feature Vector (Exactly Same as Feature Vectors of Sample Data)

    Step 8.3: Label Encoding of Feature Vector (Exactly Same as Label Encoded Feature Vectors of Sample Data)

    Step 8.4: Load the Saved Model

    Step 8.5: Model Prediction

         Step 8.5.1: Apply Model on the Label Encoded Feature Vector of unseen instance and return Prediction to the User

#### Step 9: Execute the Feedback Phase 

#### Step 10: Improve the Model based on Feedback

# Step 1: Import Libraries

In [2]:
# Import Libraries

import numpy as np
import pandas as pd
import pickle
import math

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn import svm
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score

from prettytable import PrettyTable   
# from astropy.table import Table, Column

# Step 2: Load Sample Data

In [3]:
# Load Sample Data
 
complete_sample_data = pd.read_csv("heart-disease-sample-data.csv")
sample_data = pd.DataFrame( complete_sample_data , columns=["age","sex","chol","oldpeak","target"])

print("\n\nSample Data:")
print("============\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(sample_data)



Sample Data:

    age  sex  chol  oldpeak  target
0    63    1   233      2.3       1
1    37    1   250      3.5       1
2    41    0   204      1.4       1
3    56    1   236      0.8       1
4    57    0   354      0.6       1
5    57    1   192      0.4       1
6    56    0   294      1.3       1
7    44    1   263      0.0       1
8    52    1   199      0.5       1
9    57    1   168      1.6       1
10   54    1   239      1.2       1
11   48    0   275      0.2       1
12   49    1   266      0.6       1
13   64    1   211      1.8       1
14   58    0   283      1.0       1
15   50    0   219      1.6       1
16   58    0   340      0.0       1
17   66    0   226      2.6       1
18   43    1   247      1.5       1
19   69    0   239      1.8       1
20   59    1   234      0.5       1
21   44    1   233      0.4       1
22   42    1   226      0.0       1
23   61    1   243      1.0       1
24   40    1   199      1.4       1
25   71    0   302      0.4       1
26   59    1
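
The column selection above can be made stricter. The sketch below is a minimal illustration, not the notebook's own code: it assumes an in-memory CSV as a stand-in for `heart-disease-sample-data.csv`, reuses the first three rows printed above, and adds a placeholder `extra` column that is not part of the original data. It shows that the `pd.DataFrame(..., columns=[...])` constructor fills an absent column with NaN, whereas list indexing raises a `KeyError`.

```python
import io

import pandas as pd

# Stand-in for heart-disease-sample-data.csv: the first three rows printed above,
# plus a placeholder "extra" column that is NOT part of the original data.
csv_text = """age,sex,chol,oldpeak,target,extra
63,1,233,2.3,1,0
37,1,250,3.5,1,0
41,0,204,1.4,1,0
"""

required_columns = ["age", "sex", "chol", "oldpeak", "target"]

complete_sample_data = pd.read_csv(io.StringIO(csv_text))

# List indexing raises KeyError if any required column is absent from the file
sample_data = complete_sample_data[required_columns]

# The DataFrame constructor silently creates an all-NaN column instead
silent = pd.DataFrame(complete_sample_data, columns=["age", "missing_column"])

print(sample_data)
print("missing_column is all NaN:", silent["missing_column"].isna().all())
```

With list indexing, a misspelled or missing attribute fails at load time rather than surfacing later as NaN values during training.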

# Step 3: Understand and Pre-process Sample Data

## Step 3.1: Understand Sample Data

In [4]:
# Understand Sample Data

print("\n\nAttributes in Sample Data:")
print("==========================\n")

print(sample_data.columns)

print("\n\nNumber of Instances in Sample Data:",sample_data["age"].count())
print("========================================\n")



Attributes in Sample Data:

Index(['age', 'sex', 'chol', 'oldpeak', 'target'], dtype='object')


Number of Instances in Sample Data: 100
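
Beyond the attribute names and the instance count, the class balance is worth checking before training. The following is a small sketch under the assumption that six rows copied from the printed data (indices 43, 62, 3, 71, 45 and 48) are enough to illustrate the calls; on the full data the same calls would be applied to `sample_data`.

```python
import pandas as pd

# Six rows copied from the printed data (indices 43, 62, 3, 71, 45, 48)
sample_rows = pd.DataFrame({
    "age": [53, 64, 56, 60, 52, 53],
    "sex": [0, 1, 1, 1, 1, 0],
    "chol": [264, 335, 236, 253, 325, 216],
    "oldpeak": [0.4, 0.0, 0.8, 1.4, 0.2, 0.0],
    "target": [1, 0, 1, 0, 1, 1],
})

# shape gives (number of instances, number of attributes) in one call
n_instances, n_attributes = sample_rows.shape
print("Instances:", n_instances, "Attributes:", n_attributes)

# value_counts shows how many instances carry each label
class_counts = sample_rows["target"].value_counts()
print(class_counts)
```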



## Step 3.2: Pre-process Sample Data
    o	Sample Data is already Preprocessed
    o	No Preprocessing needs to be Performed 
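
The claim that no pre-processing is needed can be checked rather than assumed. This is a minimal sketch, assuming that "already preprocessed" means no missing values, no duplicate rows and only numeric columns; it runs on six rows copied from the printed data, and the same checks would apply to the full `sample_data`.

```python
import pandas as pd

# Six rows copied from the printed data (indices 43, 62, 3, 71, 45, 48)
sample_rows = pd.DataFrame({
    "age": [53, 64, 56, 60, 52, 53],
    "sex": [0, 1, 1, 1, 1, 0],
    "chol": [264, 335, 236, 253, 325, 216],
    "oldpeak": [0.4, 0.0, 0.8, 1.4, 0.2, 0.0],
    "target": [1, 0, 1, 0, 1, 1],
})

# Count missing values per column
missing_per_column = sample_rows.isna().sum()

# Count rows that exactly repeat an earlier row
duplicate_rows = int(sample_rows.duplicated().sum())

# Confirm that every column holds numeric data
all_numeric = all(
    pd.api.types.is_numeric_dtype(sample_rows[column])
    for column in sample_rows.columns
)

print(missing_per_column)
print("Duplicate rows:", duplicate_rows)
print("All columns numeric:", all_numeric)
```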

# Step 4: Feature Extraction
    o	Features are already Extracted
    o	No Feature Extraction needs to be Performed

# Step 5: Label Encoding the Sample Data (Input and Output are converted into Numeric Representation)
    o	Data already in numeric form, does not need to be encoded
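
For reference, here is a sketch of what Steps 5.1 to 5.3 would look like if an attribute were stored as text. The `raw_sex` values are hypothetical; the actual sample data already stores `sex` as 0/1, so this step is skipped in the notebook.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical raw values; the actual sample data already stores sex as 0/1
raw_sex = ["male", "male", "female", "male", "female"]

# Step 5.1: train the Label Encoder (classes are stored in sorted order)
encoder = LabelEncoder()
encoder.fit(raw_sex)

# Step 5.3: Label Encode the Input
encoded_sex = encoder.transform(raw_sex)

print(list(encoder.classes_))
print(list(encoded_sex))

# The fitted encoder can map numeric codes back to the original labels
print(list(encoder.inverse_transform([0, 1])))
```

The fitted encoder would need to be saved and reused in the Application Phase so that user input is encoded exactly as the training data was.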

# Step 6: Execute the Training Phase 

## Step 6.1: Splitting Sample Data into Training Data and Testing Data

In [5]:
# Splitting Sample Data into Training Data and Testing Data

training_data, testing_data = train_test_split( sample_data , test_size=0.2 , random_state=0 , shuffle = True)

# Save the Training and Testing Data into CSV File 

training_data.to_csv(r'training-data.csv', index = False, header = True)
testing_data.to_csv(r'testing-data.csv', index = False, header = True)

# print Training and Testing Data

print("\n\nTraining Data:")
print("==============\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(training_data)
print("\n\nTesting Data:")
print("==============\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(testing_data)



Training Data:

    age  sex  chol  oldpeak  target
43   53    0   264      0.4       1
62   64    1   335      0.0       0
3    56    1   236      0.8       1
71   60    1   253      1.4       0
45   52    1   325      0.2       1
48   53    0   216      0.0       1
6    56    0   294      1.3       1
99   56    1   249      1.2       0
82   67    1   254      0.2       0
76   58    1   216      2.2       0
60   40    1   167      2.0       0
80   59    1   326      3.4       0
90   52    1   255      0.0       0
68   58    1   230      2.5       0
51   67    1   229      2.6       0
27   51    1   175      0.6       1
18   43    1   247      1.5       1
56   48    1   229      1.0       0
63   43    1   177      2.5       0
74   41    1   172      0.0       0
1    37    1   250      3.5       1
61   60    1   230      1.4       0
42   45    1   208      3.0       1
41   48    1   245      0.2       1
4    57    0   354      0.6       1
15   50    0   219      1.6       1
17   66   
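
A plain random split does not guarantee that both classes appear in the same proportion in the training and testing parts. The sketch below shows the `stratify` option of `train_test_split` as a possible variation, not as the notebook's method. It uses the first ten training rows printed above, which happen to contain five instances of each class.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# First ten training rows printed above (five with target 1, five with target 0)
rows = pd.DataFrame({
    "age": [53, 64, 56, 60, 52, 53, 56, 56, 67, 58],
    "sex": [0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
    "chol": [264, 335, 236, 253, 325, 216, 294, 249, 254, 216],
    "oldpeak": [0.4, 0.0, 0.8, 1.4, 0.2, 0.0, 1.3, 1.2, 0.2, 2.2],
    "target": [1, 0, 1, 0, 1, 1, 1, 0, 0, 0],
})

# stratify keeps the class proportions of "target" equal in both parts
train_part, test_part = train_test_split(
    rows,
    test_size=0.2,
    random_state=0,
    shuffle=True,
    stratify=rows["target"],
)

print(train_part["target"].value_counts())
print(test_part["target"].value_counts())
```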

## Step 6.2: Splitting Input Vectors and Outputs / Labels of Training Data

In [6]:
# Splitting Input Vectors and Outputs / Labels of Training Data

print("\n\nInputs Vectors (Feature Vectors) of Training Data:")
print("==================================================\n")
input_vector_train = training_data.iloc[: , :-1]
print(input_vector_train)

print("\n\nOutputs/Labels of Training Data:")
print("================================\n")
print("    Disease")
output_label_train = training_data.iloc[: ,-1]
print(output_label_train)



Inputs Vectors (Feature Vectors) of Training Data:

    age  sex  chol  oldpeak
43   53    0   264      0.4
62   64    1   335      0.0
3    56    1   236      0.8
71   60    1   253      1.4
45   52    1   325      0.2
48   53    0   216      0.0
6    56    0   294      1.3
99   56    1   249      1.2
82   67    1   254      0.2
76   58    1   216      2.2
60   40    1   167      2.0
80   59    1   326      3.4
90   52    1   255      0.0
68   58    1   230      2.5
51   67    1   229      2.6
27   51    1   175      0.6
18   43    1   247      1.5
56   48    1   229      1.0
63   43    1   177      2.5
74   41    1   172      0.0
1    37    1   250      3.5
61   60    1   230      1.4
42   45    1   208      3.0
41   48    1   245      0.2
4    57    0   354      0.6
15   50    0   219      1.6
17   66    0   226      2.6
40   51    0   308      1.5
38   65    0   269      0.8
5    57    1   192      0.4
91   59    1   239      1.2
59   60    1   206      2.4
0    63    1   233    
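
The positional slices `iloc[:, :-1]` and `iloc[:, -1]` rely on the label being the last column. As a hedged alternative, the sketch below selects by column name, so the split stays correct if the column order changes. The `label_column` variable is introduced here for illustration only.

```python
import pandas as pd

# Six training rows copied from the output above (indices 43, 62, 3, 71, 45, 48)
training_rows = pd.DataFrame(
    {
        "age": [53, 64, 56, 60, 52, 53],
        "sex": [0, 1, 1, 1, 1, 0],
        "chol": [264, 335, 236, 253, 325, 216],
        "oldpeak": [0.4, 0.0, 0.8, 1.4, 0.2, 0.0],
        "target": [1, 0, 1, 0, 1, 1],
    },
    index=[43, 62, 3, 71, 45, 48],
)

label_column = "target"  # illustrative name, not from the original notebook

# Everything except the label forms the Input Vectors
input_vector_train = training_rows.drop(columns=[label_column])

# The label column alone forms the Outputs / Labels
output_label_train = training_rows[label_column]

print(input_vector_train)
print(output_label_train)
```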

## Step 6.3: Train the Random Forest Regressor

In [7]:
# Train the Random Forest Regressor

print("\n\nTraining the Random Forest Regressor on Training Data")
print("========================================================\n")
print("\nParameters and their values:")
print("============================\n")
model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(input_vector_train,np.ravel(output_label_train))
print(model)



Training the Random Forest Regressor on Training Data


Parameters and their values:

RandomForestRegressor(n_estimators=20, random_state=0)
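
Because `target` is a binary label, a classifier is a natural alternative to a regressor whose output is rounded afterwards. The sketch below swaps in `RandomForestClassifier` with the same `n_estimators` and `random_state`. This is a different technique from the one used in this notebook, shown only for comparison, and it is trained on the first ten training rows printed earlier.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# First ten training rows printed earlier
rows = pd.DataFrame({
    "age": [53, 64, 56, 60, 52, 53, 56, 56, 67, 58],
    "sex": [0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
    "chol": [264, 335, 236, 253, 325, 216, 294, 249, 254, 216],
    "oldpeak": [0.4, 0.0, 0.8, 1.4, 0.2, 0.0, 1.3, 1.2, 0.2, 2.2],
    "target": [1, 0, 1, 0, 1, 1, 1, 0, 0, 0],
})

features = rows.drop(columns=["target"])
labels = rows["target"]

# Swapped-in technique: a classifier instead of the notebook's regressor
classifier = RandomForestClassifier(n_estimators=20, random_state=0)
classifier.fit(features, labels)

# predict returns class labels directly, so no rounding step is needed
predicted_labels = classifier.predict(features)

# predict_proba returns one probability per class, in the order of classes_
probabilities = classifier.predict_proba(features)

print(classifier.classes_)
print(predicted_labels)
print(probabilities[:3])
```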


## Step 6.4: Save the Trained Model

In [8]:
# Save the Model in a Pkl File

# The with-block closes the file once the Model has been written
with open('rfg_trained_model.pkl', 'wb') as model_file:
    pickle.dump(model, model_file)
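
A quick way to confirm that a saved Model is usable is to load it back and compare its predictions with the in-memory Model. This is a minimal round-trip sketch, assuming a temporary directory instead of the working directory and six training rows copied from the output above.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Six training rows copied from the output above (age, sex, chol, oldpeak)
inputs = np.array([
    [53, 0, 264, 0.4],
    [64, 1, 335, 0.0],
    [56, 1, 236, 0.8],
    [60, 1, 253, 1.4],
    [52, 1, 325, 0.2],
    [53, 0, 216, 0.0],
])
labels = np.array([1, 0, 1, 0, 1, 1])

model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(inputs, labels)

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "rfg_trained_model.pkl")

    with open(model_path, "wb") as model_file:
        pickle.dump(model, model_file)

    with open(model_path, "rb") as model_file:
        restored_model = pickle.load(model_file)

# The restored Model should reproduce the original predictions
round_trip_ok = bool(np.allclose(model.predict(inputs), restored_model.predict(inputs)))
print("Round trip OK:", round_trip_ok)
```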

# Step 7: Execute the Testing Phase 

## Step 7.1: Splitting Input Vectors and Outputs/Labels of Testing Data

In [9]:
# Splitting Input Vectors and Outputs/Labels of Testing Data

print("\n\nInputs Vectors (Feature Vectors) of Testing Data:")
print("=================================================\n")
input_vector_test = testing_data.iloc[: , :-1]
print(input_vector_test)

print("\n\nOutputs/Labels of Testing Data:")
print("==============================\n")
print("    Disease")
output_label_test = testing_data.iloc[: ,-1]
print(output_label_test)



Inputs Vectors (Feature Vectors) of Testing Data:

    age  sex  chol  oldpeak
26   59    1   212      1.6
86   60    1   258      2.8
2    41    0   204      1.4
55   56    1   256      0.6
75   51    0   305      1.2
93   49    1   188      2.0
16   58    0   340      0.0
73   50    1   233      0.6
54   53    1   203      3.1
95   57    1   229      0.4
53   63    1   254      1.4
92   60    0   258      2.6
78   60    1   282      2.8
13   64    1   211      1.8
7    44    1   263      0.0
30   41    0   198      0.0
22   42    1   226      0.0
24   40    1   199      1.4
33   54    1   273      0.5
8    52    1   199      0.5


Outputs/Labels of Testing Data:

    Disease
26    1
86    0
2     1
55    0
75    0
93    0
16    1
73    0
54    0
95    0
53    0
92    0
78    0
13    1
7     1
30    1
22    1
24    1
33    1
8     1
Name: target, dtype: int64


## Step 7.2: Load the Saved Model

In [10]:
# Load the Saved Model

# The with-block closes the file once the Model has been read
with open('rfg_trained_model.pkl', 'rb') as model_file:
    model_test = pickle.load(model_file)

## Step 7.3: Evaluate the Machine Learning Model
### Step 7.3.1: Make Predictions with the Trained Model on Testing Data

In [11]:
# Provide Test data to the Trained Model

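# Round each regression output to the nearest class label (0 or 1)
model_predictions = np.floor(model_test.predict(input_vector_test)+0.5).astype(int)

# Work on an independent copy so that adding a column does not raise a chained-assignment warning
testing_data = testing_data.copy(deep=True)
testing_data["Predictions"] = model_predictions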

# Save the Predictions into CSV File

testing_data.to_csv(r'model-predictions.csv', index = False, header = True)

model_predictions = testing_data 
print("\n\nPredictions Returned by rfg_trained_model:")
print("==========================================\n")
print(model_predictions)



Predictions Returned by rfg_trained_model:

    age  sex  chol  oldpeak  target  Predictions
26   59    1   212      1.6       1            0
86   60    1   258      2.8       0            0
2    41    0   204      1.4       1            1
55   56    1   256      0.6       0            0
75   51    0   305      1.2       0            1
93   49    1   188      2.0       0            0
16   58    0   340      0.0       1            0
73   50    1   233      0.6       0            1
54   53    1   203      3.1       0            1
95   57    1   229      0.4       0            1
53   63    1   254      1.4       0            0
92   60    0   258      2.6       0            1
78   60    1   282      2.8       0            0
13   64    1   211      1.8       1            1
7    44    1   263      0.0       1            1
30   41    0   198      0.0       1            1
22   42    1   226      0.0       1            1
24   40    1   199      1.4       1            0
33   54    1   273     
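
The expression `np.floor(x + 0.5)` rounds a tie at exactly 0.5 upwards, which is not what `np.round` does. The sketch below compares the two on hypothetical regressor outputs. A tie is plausible here if, for example, half of the 20 trees vote for each class, but that depends on how the trees were grown.

```python
import numpy as np

# Hypothetical regressor outputs, including a tie at exactly 0.5
raw_outputs = np.array([0.2, 0.5, 0.8])

# Rule used in this notebook: add 0.5 and floor, so ties go up to 1
half_up = np.floor(raw_outputs + 0.5).astype(int)

# np.round rounds ties to the nearest even integer, so 0.5 becomes 0
half_even = np.round(raw_outputs).astype(int)

print("floor(x + 0.5):", half_up)
print("np.round(x):   ", half_even)
```

The two rules disagree only on exact ties, so the choice decides whether a borderline instance is labelled as having the disease.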

## Step 7.4: Calculate the Accuracy Score

In [12]:
# Calculate the Accuracy

model_accuracy_score = accuracy_score(model_predictions["target"],model_predictions["Predictions"])

print("\n\nAccuracy Score:")
print("===============\n")
print(round(model_accuracy_score,2))



Accuracy Score:

0.55
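
Accuracy alone does not show which kind of error the Model makes. The sketch below builds a confusion matrix from the 18 complete rows visible in the prediction table above. The last two rows are truncated in this document and are left out, so the accuracy computed here (10 of 18) covers only the visible rows and differs slightly from the reported 0.55.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# "target" and "Predictions" for the 18 complete rows shown above
# (row indices 26, 86, 2, 55, 75, 93, 16, 73, 54, 95, 53, 92, 78, 13, 7, 30, 22, 24)
actual = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0]

# For binary labels the matrix is [[TN, FP], [FN, TP]]
matrix = confusion_matrix(actual, predicted)
true_neg, false_pos, false_neg, true_pos = (int(value) for value in matrix.ravel())

visible_accuracy = accuracy_score(actual, predicted)

print(matrix)
print("False positives:", false_pos, "False negatives:", false_neg)
print("Accuracy on visible rows:", round(visible_accuracy, 2))
```

On these rows the Model raises more false alarms (5) than it misses true cases (3), a distinction that a single accuracy figure hides.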


# Step 8: Execute the Application Phase

## Step 8.1: Take Input from User

In [13]:
# Take Input from User

age_input = int(input("\nPlease enter Age here: ").strip())
gender_input = int(input("\nPlease enter your Gender here (For Female write 0, For Male write 1) : ").strip())
chol_input = int(input("\nPlease enter your cholesterol level here: ").strip())
oldpeak_input = float(input("\nPlease enter oldpeak here: ").strip())
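
The `int(...)` and `float(...)` conversions above raise a `ValueError` on non-numeric text but accept any number. The following is a sketch of a validation helper that also range-checks the answers. The function name `parse_patient_input` and the bounds are assumptions made for illustration, not clinical limits, and the example values are the ones shown in Step 8.2.

```python
import math


def parse_patient_input(age_text, sex_text, chol_text, oldpeak_text):
    """Convert raw text answers to numbers and reject implausible values."""
    age = int(age_text.strip())
    sex = int(sex_text.strip())
    chol = int(chol_text.strip())
    oldpeak = float(oldpeak_text.strip())

    # Illustrative bounds only; they are assumptions, not clinical limits
    if not 0 < age < 120:
        raise ValueError(f"age out of range: {age}")
    if sex not in (0, 1):
        raise ValueError(f"sex must be 0 or 1, got: {sex}")
    if chol <= 0:
        raise ValueError(f"chol must be positive, got: {chol}")
    if not math.isfinite(oldpeak):
        raise ValueError(f"oldpeak must be a finite number, got: {oldpeak}")

    return {"age": age, "sex": sex, "chol": chol, "oldpeak": oldpeak}


# Example with the values shown in Step 8.2
patient = parse_patient_input(" 49 ", "0", "334", "2.9")
print(patient)
```

In the notebook, each `input(...)` result would be passed to this helper instead of being converted inline.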

## Step 8.2: Convert User Input into Feature Vector (Exactly Same as Feature Vectors of Sample Data)

In [14]:
# Convert User Input into Feature Vector

user_input = pd.DataFrame({ 'age': [age_input],'sex': [gender_input],'chol': [chol_input],'oldpeak': [oldpeak_input]})

print("\n\nUser Input Feature Vector:")
print("==========================\n")
print(user_input)



User Input Feature Vector:

   age  sex  chol  oldpeak
0   49    0   334      2.9
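
"Exactly Same as Feature Vectors of Sample Data" includes the column order. The sketch below is one way to enforce it, assuming a `feature_columns` list (a name introduced here) that mirrors the training columns. The answers are deliberately given in a different order to show the re-ordering.

```python
import pandas as pd

# Column order used for the training Feature Vectors
feature_columns = ["age", "sex", "chol", "oldpeak"]

# Answers collected in a different order (values from the example above)
raw_answers = {"chol": 334, "age": 49, "oldpeak": 2.9, "sex": 0}

# Selecting by the training column list fixes the order and
# raises KeyError if any expected feature is missing
user_input = pd.DataFrame([raw_answers])[feature_columns]

print(user_input)
```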


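## Step 8.3: Label Encoding of Feature Vector (Exactly Same as Label Encoded Feature Vectors of Sample Data)
    o	User Input is already in numeric form, does not need to be encoded

## Step 8.4: Load the Saved Model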

In [15]:
# Load the Saved Model

# The with-block closes the file once the Model has been read
with open('rfg_trained_model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

## Step 8.5: Model Prediction
### Step 8.5.1: Apply Model on the Label Encoded Feature Vector of unseen instance and return Prediction to the User

In [16]:


# Round the regression output to the nearest class label (0 or 1)
predicted_disease = math.floor(model.predict(user_input)[0]+0.5)

# print(predicted_disease)
# Use else so that prediction is always defined
if predicted_disease == 1:
    prediction = "HAS DISEASE"
else:
    prediction = "DOES NOT HAVE DISEASE"

# Add the Prediction in a Pretty Table

pretty_table = PrettyTable()
pretty_table.add_column("       ** Prediction **       ",[prediction])
print(pretty_table)

+--------------------------------+
|        ** Prediction **        |
+--------------------------------+
|          HAS DISEASE           |
+--------------------------------+
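
The rounding and the message lookup can be kept together in one small helper. This is a sketch only: the name `describe_prediction`, the `threshold` parameter and the example outputs are assumptions for illustration. With a threshold of 0.5 it gives the same label as `math.floor(x + 0.5)` for outputs between 0 and 1.

```python
def describe_prediction(raw_output, threshold=0.5):
    """Map a raw regressor output to the message shown to the User."""
    messages = {0: "DOES NOT HAVE DISEASE", 1: "HAS DISEASE"}

    # Outputs at or above the threshold are treated as class 1
    predicted_class = 1 if raw_output >= threshold else 0
    return messages[predicted_class]


# Hypothetical raw outputs
print(describe_prediction(0.85))
print(describe_prediction(0.10))
```

Keeping the threshold as a parameter makes it easy to adjust later, for example after feedback from Domain Experts.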


# Step 9: Execute the Feedback Phase
## A Two-Step Process
### Step 01: After some time, take Feedback from
    o	Domain Experts and Users on deployed Heart Disease Prediction System
### Step 02: Make a List of Possible Improvements based on Feedback received

# Step 10: Improve Model based on Feedback
### There is Always Room for Improvement
### Based on Feedback from Domain Experts and Users
    o	Improve your Model