# AI 2024 Online Summer Internship
### Name: Rasikh Ali
### Email: rasikhali1234@gmail.com

## System & Libraries
<div class="alert alert-block alert-success">
    Using <b>Python </b> v3.8.18
    <br>
    with <b>Jupyter </b> v7.4.9
</div>
<div class="alert alert-block alert-info">
    
|    Libraries    | Version |     Purpose     |
|-----------------|---------|-----------------|
|Pandas           | v1.4.2  | Used for Data Manipulation and Analysis | 
|Numpy            | v1.23.5 | Used for Array Manipulation             |
|Pickle           | v4.0    | Used for Saving and Loading Model       |
|LabelEncoder     |         | Used for Encoding Categorical Features  |
|SVM              |         | Support Vector Machines; the SVC classifier is used here for classification |
|Accuracy_Score   |         | Used for Calculating Accuracy Score of a Model                |
|Train_Test_Split |         | Used for Splitting array/matrices into train,test subsets     |
    
</div>

In [170]:
import pandas as pd
import numpy as np
import pickle

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

## Loading Dataset

In [171]:
sample_data = pd.read_csv('dataset/heart-disease-sample-data.csv')
sample_data.head()

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target
0   63    1   3       145   233    1        0      150      0      2.3      0   0     1       1
1   37    1   2       130   250    0        1      187      0      3.5      0   0     2       1
2   41    0   1       130   204    0        0      172      0      1.4      2   0     2       1
3   56    1   1       120   236    0        1      178      0      0.8      2   0     2       1
4   57    0   0       120   354    0        1      163      1      0.6      2   0     2       1


## Understanding Dataset

<div class="alert alert-block alert-info">
About <b>Dataset</b>.

| Attributes | Description  |
|------------|--------------|                                                                                                       
|age         | Patient's age                                                                                                                                                         |
|sex         | Patient's Gender  (1 = male, 0 = female)                                                                                                                              |
|cp          | Chest Pain type  (0, 1, 2, 3)                                                                                                                                         |
|trestbps    | Resting Blood Pressure (in mm Hg)                                                                                                                                     |
|chol        | Serum Cholesterol in mg/dl                                                                                                                                            |
|fbs         | Fasting Blood Sugar > 120 mg/dl (1 = true, 0 = false)                                                                                                                 |
|restecg     | Resting Electrocardiographic Results  (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy)                   |
|thalach     | Maximum heart rate achieved during exercise                                                                                                                           |
|exang       | Exercise induced angina (1 = yes, 0 = no)                                                                                                                             |
|oldpeak     | ST depression induced by exercise relative to rest                                                                                                                    |
|slope       | Slope of the peak exercise ST segment (0 = upsloping, 1 = flat, 2 = downsloping)                                                                                      |
|ca          | Number of major vessels (0-3) colored by fluoroscopy                                                                                                                  |
|thal        | Thalassemia (3 = normal, 6 = fixed defect, 7 = reversible defect)                                                                                                     |
|target      | Presence or absence of heart disease (1 = presence, 0 = absence)                                                                                                      |

</div>


In [172]:
print("-- Attributes in Sample Data --")
for cols in sample_data.columns:
    print(cols)

-- Attributes in Sample Data --
age
sex
cp
trestbps
chol
fbs
restecg
thalach
exang
oldpeak
slope
ca
thal
target


In [173]:
print("-- Number of instances in Sample Data --")
print(sample_data.count())

-- Number of instances in Sample Data --
age         100
sex         100
cp          100
trestbps    100
chol        100
fbs         100
restecg     100
thalach     100
exang       100
oldpeak     100
slope       100
ca          100
thal        100
target      100
dtype: int64


In [174]:
print("-- Number of Unique Values in Sample Data --")
print(sample_data.nunique())

-- Number of Unique Values in Sample Data --
age         33
sex          2
cp           4
trestbps    23
chol        78
fbs          2
restecg      2
thalach     53
exang        2
oldpeak     27
slope        3
ca           4
thal         4
target       2
dtype: int64


In [175]:
print("-- Number of Null Values in Sample Data --")
print(sample_data.isnull().sum())

-- Number of Null Values in Sample Data --
age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64


In [176]:
print("-- Insights of Sample Data --")
sample_data.info()

-- Insights of Sample Data --
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       100 non-null    int64  
 1   sex       100 non-null    int64  
 2   cp        100 non-null    int64  
 3   trestbps  100 non-null    int64  
 4   chol      100 non-null    int64  
 5   fbs       100 non-null    int64  
 6   restecg   100 non-null    int64  
 7   thalach   100 non-null    int64  
 8   exang     100 non-null    int64  
 9   oldpeak   100 non-null    float64
 10  slope     100 non-null    int64  
 11  ca        100 non-null    int64  
 12  thal      100 non-null    int64  
 13  target    100 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 11.1 KB


## Pre-Processing

#### Managing Null Values

<div class="alert alert-block alert-success">
    As none of the attributes has a null value, we'll skip this step.
</div>

In [177]:
# Cast oldpeak from float to int (this truncates the decimal part)
sample_data['oldpeak'] = sample_data['oldpeak'].astype(int)

<div class="alert alert-block alert-success">
Selecting the 4 <b>most discriminating</b> input attributes (sex, cp, fbs, restecg), so we'll be dropping the remaining 9 input attributes.
</div>

In [178]:
cols = ['age','trestbps','chol','thalach','exang','oldpeak','slope','ca','thal']
selected_data = sample_data.drop(columns=cols)
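
One rough way to check how discriminating a categorical attribute is: compare the share of positive targets for each attribute value. The sketch below assumes pandas and uses only the `cp` and `target` values of the 20 testing rows printed in this notebook, re-typed by hand so the cell is self-contained. The names `rows`, `demo`, `positive_rate` and `overall_rate` are illustrative and not part of the original notebook, and 20 rows are far too few to draw firm conclusions.

```python
import pandas as pd

# (cp, target) pairs re-typed from the 20 testing rows printed in this notebook
rows = [(2, 1), (0, 0), (1, 1), (2, 0), (0, 0), (2, 0), (2, 1), (2, 0), (0, 0), (2, 0),
        (0, 0), (0, 0), (0, 0), (3, 1), (1, 1), (1, 1), (0, 1), (3, 1), (2, 1), (2, 1)]
demo = pd.DataFrame(rows, columns=["cp", "target"])

# Share of target == 1 for each chest-pain type; values far from the overall
# rate suggest the attribute helps separate the two classes
positive_rate = demo.groupby("cp")["target"].mean()
overall_rate = demo["target"].mean()

print(positive_rate)
print("overall:", overall_rate)
```

The same `groupby(...)["target"].mean()` pattern could be applied to `sample_data` for every candidate attribute before deciding which columns to drop.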

## Label Encoding

<div class="alert alert-block alert-success">
    As all the attributes are already in numerical representation, there's <b>no need</b> for label encoding.
</div>

## Training Phase

In [179]:
print("-- Insights of Sample Data --")
selected_data.info()

-- Insights of Sample Data --
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   sex      100 non-null    int64
 1   cp       100 non-null    int64
 2   fbs      100 non-null    int64
 3   restecg  100 non-null    int64
 4   target   100 non-null    int64
dtypes: int64(5)
memory usage: 4.0 KB


#### Splitting data into train/test 

<div class="alert alert-block alert-info">
    Splitting data into train-test: Training = <b>80%</b> and Testing = <b>20%</b>.
</div>

In [181]:
# random_state=0 makes the split reproducible; shuffle=True is needed
# because the dataset is sorted on "target", so an unshuffled split would not give a mix of both classes in each subset

training_data, testing_data = train_test_split(selected_data, test_size=0.2, random_state=0, shuffle=True)
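
Shuffling mixes the classes, but with only 20 test rows it does not guarantee that both classes keep the same proportions in each subset. As a possible variation (not what the cell above does), `train_test_split` accepts a `stratify` argument for this. A minimal sketch on a toy frame, where `toy`, `toy_train` and `toy_test` are hypothetical names standing in for the notebook's data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for selected_data: 10 rows per class,
# sorted on target like the dataset described above
toy = pd.DataFrame({
    "feature": range(20),
    "target": [1] * 10 + [0] * 10,
})

# stratify= asks for the same class proportions in both subsets
toy_train, toy_test = train_test_split(
    toy, test_size=0.2, random_state=0, shuffle=True, stratify=toy["target"]
)

print(toy_train["target"].value_counts().to_dict())
print(toy_test["target"].value_counts().to_dict())
```

With a balanced toy frame, the 4 test rows contain 2 rows of each class and the 16 training rows contain 8 of each.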

In [182]:
# Save the train and test data into CSV files
training_data.to_csv(r'training-data.csv', index = False, header = True)
testing_data.to_csv(r'testing-data.csv', index = False, header = True)

In [183]:
# Printing Training Data
print("-- Training Data --")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(training_data)

-- Training Data --
    sex  cp  fbs  restecg  target
43    0   0    0        0       1
62    1   2    0        1       0
3     1   1    0        1       1
71    1   0    0        1       0
45    1   1    0        1       1
48    0   2    0        0       1
6     0   1    0        0       1
99    1   0    1        0       0
82    1   0    1        1       0
76    1   0    0        0       0
60    1   0    0        0       0
80    1   0    0        0       0
90    1   0    0        1       0
68    1   2    0        0       0
51    1   0    0        0       0
27    1   2    0        1       1
18    1   0    0        1       1
56    1   1    0        1       0
63    1   0    0        0       0
74    1   0    0        0       0
1     1   2    0        1       1
61    1   0    1        1       0
42    1   0    0        0       1
41    1   1    0        0       1
4     0   0    0        1       1
15    0   2    0        1       1
17    0   3    0        1       1
40    0   2    0        0   

In [184]:
# Printing Testing Data
print("-- Testing Data --")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(testing_data)

-- Testing Data --
    sex  cp  fbs  restecg  target
26    1   2    1        1       1
86    1   0    0        0       0
2     0   1    0        0       1
55    1   2    1        0       0
75    0   0    0        1       0
93    1   2    0        1       0
16    0   2    0        1       1
73    1   2    0        1       0
54    1   0    1        0       0
95    1   2    0        0       0
53    1   0    0        0       0
92    0   0    0        0       0
78    1   0    0        0       0
13    1   3    0        0       1
7     1   1    0        1       1
30    0   1    0        1       1
22    1   0    0        1       1
24    1   3    0        1       1
33    1   2    0        0       1
8     1   2    1        1       1


#### Splitting Input Vector and Output of Training Data

<div class="alert alert-block alert-info">
    Splitting <b>Input (x)</b> Vector and <b>Output (y)</b> of Training Data.
</div>

In [185]:
train_x = training_data.iloc[:, :-1]
train_x.head()

    sex  cp  fbs  restecg
43    0   0    0        0
62    1   2    0        1
3     1   1    0        1
71    1   0    0        1
45    1   1    0        1


In [186]:
train_y = training_data.iloc[:, -1]
train_y.head()

43    1
62    0
3     1
71    0
45    1
Name: target, dtype: int64

#### Training Model Using Support Vector Classifier

In [187]:
print("-- Training using SVC on Training Data --")
print("-- Parameters & Values: ", end='')

model_svc = SVC(gamma='auto', random_state=0)
model_svc.fit(train_x, np.ravel(train_y))

print(model_svc)

-- Training using SVC on Training Data --
-- Parameters & Values: SVC(gamma='auto', random_state=0)
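
For context on the `gamma='auto'` setting: with the default RBF kernel, scikit-learn resolves `'auto'` to `1 / n_features`, while `'scale'` resolves to `1 / (n_features * X.var())`. The sketch below reproduces both formulas by hand with NumPy on the first five training rows printed above; the array `X` and the names `gamma_auto` and `gamma_scale` are illustrative only.

```python
import numpy as np

# First five training rows printed above (sex, cp, fbs, restecg)
X = np.array([
    [0, 0, 0, 0],
    [1, 2, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
], dtype=float)

n_features = X.shape[1]
gamma_auto = 1.0 / n_features               # formula behind gamma='auto'
gamma_scale = 1.0 / (n_features * X.var())  # formula behind gamma='scale'

print("gamma='auto'  ->", gamma_auto)
print("gamma='scale' ->", gamma_scale)
```

With four input attributes, `gamma='auto'` is 0.25 regardless of the data, whereas `'scale'` depends on the variance of the training inputs.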


In [188]:
# Saving Trained Model (the with-block closes the file after writing)
with open('model_svc.pkl', 'wb') as model_file:
    pickle.dump(model_svc, model_file)

## Testing Phase

#### Splitting Input Vector and Output of Testing Data

<div class="alert alert-block alert-info">
    Splitting <b>Input (x)</b> Vector and <b>Output (y)</b> of Testing Data.
</div>

In [189]:
test_x = testing_data.iloc[:, :-1]
test_x.head()

    sex  cp  fbs  restecg
26    1   2    1        1
86    1   0    0        0
2     0   1    0        0
55    1   2    1        0
75    0   0    0        1


In [190]:
test_y = testing_data.iloc[:, -1]
test_y.head()

26    1
86    0
2     1
55    0
75    0
Name: target, dtype: int64

#### Loading Model

In [191]:
# Load saved Model (the with-block closes the file after reading)
with open('model_svc.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

#### Evaluating Model

In [192]:
model_predictions = model.predict(test_x)

testing_data_prediction = testing_data.copy(deep=True)
pd.options.mode.chained_assignment = None

testing_data_prediction['Prediction'] = model_predictions

In [193]:
# Printing Testing Data
print("-- Testing Data with Prediction --")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(testing_data_prediction)

-- Testing Data with Prediction --
    sex  cp  fbs  restecg  target  Prediction
26    1   2    1        1       1           1
86    1   0    0        0       0           0
2     0   1    0        0       1           1
55    1   2    1        0       0           1
75    0   0    0        1       0           0
93    1   2    0        1       0           1
16    0   2    0        1       1           1
73    1   2    0        1       0           1
54    1   0    1        0       0           0
95    1   2    0        0       0           1
53    1   0    0        0       0           0
92    0   0    0        0       0           0
78    1   0    0        0       0           0
13    1   3    0        0       1           1
7     1   1    0        1       1           1
30    0   1    0        1       1           1
22    1   0    0        1       1           0
24    1   3    0        1       1           1
33    1   2    0        0       1           1
8     1   2    1        1       1           1

In [194]:
# Saving Prediction into a CSV file
testing_data_prediction.to_csv(r'model_prediction.csv', index=False, header=True)

In [195]:
# Calculating Accuracy
model_accuracy_score = accuracy_score(testing_data_prediction['target'], testing_data_prediction['Prediction'])

print("-- Model Accuracy Score: ", end='')
print(round(model_accuracy_score,3))

-- Model Accuracy Score: 0.75
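
Accuracy alone hides which kind of mistake the model makes. As a rough supplement, the sketch below counts the confusion-matrix cells by hand from the `target` and `Prediction` columns printed above (re-typed here, so no libraries are needed). The variable names are illustrative and not part of the original notebook.

```python
# target and Prediction columns re-typed from the table printed above
targets     = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
predictions = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1]

pairs = list(zip(targets, predictions))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # disease, predicted disease
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, predicted healthy
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, predicted disease
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # disease, predicted healthy

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("accuracy:", accuracy)
print("precision:", round(precision, 3))
print("recall:", round(recall, 3))
```

On these 20 rows, 4 of the 5 errors are false positives, so the model leans towards predicting presence of heart disease.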


## Evaluating on Unseen Data (Application Phase)

<div class="alert alert-block alert-info">
    Testing on unseen data (real-time user input).
</div>

In [206]:
print("-- Take Input --")
sex_inp     = input("-- Please Enter Gender (1=Male, 0=Female) :").strip()
cp_inp      = input("-- Please Enter Chest Pain type (0, 1, 2, 3) :").strip()
fbs_inp     = input("-- Please Enter If Fasting Blood Sugar > 120 mg/dl (1 = true, 0 = false) :").strip()
restecg_inp = input("-- Please Enter Resting Electrocardiographic Results (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy) :").strip()


-- Take Input --
-- Please Enter Gender (1=Male, 0=Female) :1
-- Please Enter Chest Pain type (0, 1, 2, 3) :0
-- Please Enter If Fasting Blood Sugar > 120 mg/dl (1 = true, 0 = false) :0
-- Please Enter Resting Electrocardiographic Results (0 = normal, 1 = having ST-T wave abnormality, 2 = showing probable or definite left ventricular hypertrophy) :0
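
The prompts above accept any text, so a typo would only surface later at prediction time. One possible guard is to convert and range-check each answer right away. This is a minimal sketch: `parse_choice`, `ALLOWED` and `raw_answers` are hypothetical names, the allowed codes are taken from the dataset description table, and simulated answers replace `input()` so the cell runs unattended.

```python
def parse_choice(raw, allowed):
    """Convert raw text to an int and check it is one of the allowed codes."""
    try:
        value = int(raw.strip())
    except ValueError:
        raise ValueError(f"expected a whole number, got {raw!r}") from None
    if value not in allowed:
        raise ValueError(f"expected one of {sorted(allowed)}, got {value}")
    return value


# Allowed codes per attribute, taken from the dataset description table
ALLOWED = {
    "sex": {0, 1},
    "cp": {0, 1, 2, 3},
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
}

# Simulated answers (the same values entered above) instead of input()
raw_answers = {"sex": "1", "cp": " 0 ", "fbs": "0", "restecg": "0"}
features = {name: parse_choice(raw, ALLOWED[name]) for name, raw in raw_answers.items()}
print(features)
```

In an interactive cell, each `input(...)` result could be passed through `parse_choice` inside a loop that asks again when a `ValueError` is raised.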


In [207]:
# Convert Input into Feature Vector
# input() returns strings, so cast to int to match the dtype used in training
user_inp = pd.DataFrame({
    'sex':     [sex_inp],
    'cp':      [cp_inp],
    'fbs':     [fbs_inp],
    'restecg': [restecg_inp]
}).astype(int)

print("-- User Inputs are: ")
print(user_inp)

-- User Inputs are: 
   sex  cp  fbs  restecg
0    1   0    0        0


In [208]:
# Loading Saved SVC Model (the with-block closes the file after reading)
with open('model_svc.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

In [209]:
# Prediction on Unseen Data
# predict() returns an array; take its single element before comparing
predicted_presence = model.predict(user_inp)[0]
if predicted_presence == 1:
    prediction = 'You have Heart Disease'
else:
    prediction = "You don't have Heart Disease"


print("-- Prediction: ", end='')
print(prediction)

-- Prediction: You don't have Heart Disease
