# Multiple Disease Prediction

This machine learning project uses [SVC (Support Vector Classification)](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) to classify and predict whether a person has diabetes, heart disease, or Parkinson's disease. I trained the models on openly available Kaggle datasets, for which I would like to give due credit:
- [Diabetes Dataset](https://www.kaggle.com/datasets/mathchi/diabetes-data-set)
- [Heart Disease Dataset](https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset)
- [Parkinson's Dataset](https://www.kaggle.com/datasets/vikasukani/parkinsons-disease-data-set)

In [1]:
from timeit import default_timer as timer
import numpy as np
import pandas as pd
from sklearnex import patch_sklearn, unpatch_sklearn
import joblib
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

### Loading the diabetes dataset

In [2]:
diabetes_dataset = pd.read_csv("./multiple_disease_dataset/diabetes.csv")

In [3]:
diabetes_dataset.head()

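|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |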


In [4]:
diabetes_dataset.shape

(768, 9)

In [5]:
diabetes_dataset_y = diabetes_dataset["Outcome"]
diabetes_dataset = diabetes_dataset.drop(columns=["Outcome"])

### Splitting the dataset into train and test data

In [6]:
diabetes_train, diabetes_test, diabetes_train_y, diabetes_test_y = train_test_split(diabetes_dataset,diabetes_dataset_y, test_size = 0.2, stratify=diabetes_dataset_y, random_state=2)
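
The `stratify` argument keeps the class proportions of the full dataset in both splits, which matters when one outcome is rarer than the other. A minimal sketch on synthetic labels (the array sizes and the 65/35 class ratio are made up for illustration, not taken from the dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 130 negatives and 70 positives (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.array([0] * 130 + [1] * 70)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# With stratify=y, both splits keep about the same share of positives as y itself.
overall_rate = y.mean()
train_rate = y_train.mean()
test_rate = y_test.mean()
print(f"overall={overall_rate:.3f} train={train_rate:.3f} test={test_rate:.3f}")
```

Without `stratify`, a small test set can end up with noticeably more or fewer positive cases than the training set, which makes the test accuracy harder to interpret.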

### Using Scikit Learn to train the Model

In [7]:
diabetes_classifier = svm.SVC(kernel='linear')

## Why Intel(R) Extension?

Intel(R) Extension for Scikit-learn accelerates existing scikit-learn applications while keeping conformance with the scikit-learn APIs and algorithms. It is a free software AI accelerator that its documentation describes as bringing 10-100X acceleration across a variety of applications, and it does not require rewriting the existing code. The actual speedup depends on the algorithm, the dataset size, and the hardware.

The acceleration is achieved through patching: the stock scikit-learn algorithms are replaced with the optimized versions provided by the extension.

![](scikit-learn-acceleration.png)
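
The order of patching matters: the extension's documentation recommends calling `patch_sklearn()` before the scikit-learn estimators are imported and created. In this notebook the classifier is instantiated before the patch is applied, which may be why the patched and unpatched timings further down are nearly identical. A minimal sketch of the recommended order, with a fallback for machines where `sklearnex` is not installed (the fallback is my assumption, not part of the notebook):

```python
# Patch first, then import the estimator, so the accelerated class is picked up.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
    patched = True
except ImportError:
    # The extension is optional; without it the stock implementation is used.
    patched = False

from sklearn.svm import SVC  # imported after patching on purpose

# The module path gives a hint about which implementation is active.
svc_module = SVC.__module__
print(patched, svc_module)

classifier = SVC(kernel="linear")  # created after patching
```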

In [8]:
patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [9]:
start = timer()
diabetes_classifier.fit(diabetes_train, diabetes_train_y)
train_patched = timer() - start
f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"

'Intel® extension for Scikit-learn time: 1.01 s'

In [10]:
unpatch_sklearn()

In [11]:
start = timer()
diabetes_classifier.fit(diabetes_train, diabetes_train_y)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"

'Original Scikit-learn time: 1.01 s'
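
A single timed run is sensitive to noise such as caching and background load. One way to make the comparison more robust is to repeat the fit a few times on a fresh model and keep the best time. The helper `best_fit_time` below is hypothetical and the data is synthetic; it is a sketch of the idea, not the measurement used in this notebook:

```python
from timeit import default_timer as timer

import numpy as np
from sklearn.svm import SVC


def best_fit_time(make_model, X, y, repeats=3):
    """Return the fastest of `repeats` fit times, using a fresh model each run."""
    times = []
    for _ in range(repeats):
        model = make_model()
        start = timer()
        model.fit(X, y)
        times.append(timer() - start)
    return min(times)


# Synthetic, linearly separable data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

fit_seconds = best_fit_time(lambda: SVC(kernel="linear"), X, y)
print(f"best of 3 fits: {fit_seconds:.4f} s")
```

Calling the same helper once with patching enabled and once after `unpatch_sklearn()` would give two numbers that are easier to compare than two single runs.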

### Predicting on training data

In [12]:
diabetes_train_prediction = diabetes_classifier.predict(diabetes_train)
diabetes_training_data_accuracy = accuracy_score(diabetes_train_y, diabetes_train_prediction)

In [13]:
print('Accuracy score of the training data : ', diabetes_training_data_accuracy)

Accuracy score of the training data :  0.7833876221498371


### Predicting on test data

In [14]:
diabetes_test_prediction = diabetes_classifier.predict(diabetes_test)
diabetes_testing_data_accuracy = accuracy_score(diabetes_test_y, diabetes_test_prediction)

In [15]:
print('Accuracy score of the testing data : ', diabetes_testing_data_accuracy)

Accuracy score of the testing data :  0.7727272727272727
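
The accuracy score is the fraction of samples whose predicted label equals the true label. A small sketch with made-up labels, computing it by hand and with `accuracy_score` (whose signature is `accuracy_score(y_true, y_pred)`):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration: 6 of the 8 predictions match.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

manual_accuracy = float(np.mean(y_true == y_pred))
library_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, library_accuracy)
```

Both values are 0.75 here. Accuracy gives the same result if the two arguments are swapped, but other metrics such as precision and recall do not, so keeping the `y_true, y_pred` order is a safe habit.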


### Predicting on new data

In [16]:
diabetes_input_data = (5,166,72,19,175,25.8,0.587,51)

diabetes_input_data_as_numpy_array = np.asarray(diabetes_input_data)

diabetes_input_data_reshaped = diabetes_input_data_as_numpy_array.reshape(1,-1)

diabetes_prediction = diabetes_classifier.predict(diabetes_input_data_reshaped)
print(diabetes_prediction)

if diabetes_prediction[0] == 0:
  print('The person is not diabetic')
else:
  print('The person is diabetic')

[1]
The person is diabetic
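
Because the model was fitted on a DataFrame, recent scikit-learn versions warn about missing feature names when `predict` receives a bare NumPy array. One option is to wrap the new sample in a one-row DataFrame that reuses the training columns. A sketch on a tiny made-up frame (the two columns and all values are illustrative, not the real dataset):

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

# Tiny made-up training frame; values are illustrative only.
train = pd.DataFrame({
    "Glucose": [85, 89, 100, 148, 183, 160],
    "BMI": [26.6, 28.1, 25.0, 33.6, 23.3, 35.0],
})
labels = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear").fit(train, labels)

# Build the new sample with the same column names and order as the training frame.
sample = pd.DataFrame([(166, 25.8)], columns=train.columns)
prediction = model.predict(sample)
print(prediction)
```

Reusing `train.columns` also guards against passing the values in the wrong order, since the column names are visible next to the data.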




### Saving the model

In [17]:
joblib.dump(diabetes_classifier, "diabetes_classifier.sav")

['diabetes_classifier.sav']
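
The saved file can be read back with `joblib.load`. A minimal round-trip sketch on synthetic data, writing to a temporary directory (the file name `example_classifier.sav` is hypothetical):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVC

# Synthetic data, only to have a fitted model to save.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] > 0).astype(int)
model = SVC(kernel="linear").fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_classifier.sav")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same_predictions = bool(np.array_equal(model.predict(X), restored.predict(X)))
print(same_predictions)
```

Pickled models are generally only safe to load with the same scikit-learn version that saved them, and only from sources you trust.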

### Loading the heart disease dataset

In [18]:
heart_dataset = pd.read_csv('./multiple_disease_dataset/heart.csv')

In [19]:
heart_dataset.head()

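|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |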


In [20]:
heart_dataset.shape

(303, 14)

In [21]:
heart_dataset_y = heart_dataset["target"]
heart_dataset = heart_dataset.drop(columns=["target"])

### Splitting the dataset into train and test data

In [22]:
heart_train, heart_test, heart_train_y, heart_test_y = train_test_split(heart_dataset,heart_dataset_y, test_size = 0.2, stratify=heart_dataset_y, random_state=2)

### Using Scikit Learn to train the Model

In [23]:
heart_classifier = svm.SVC(kernel='linear')

In [24]:
patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [25]:
start = timer()
heart_classifier.fit(heart_train, heart_train_y)
train_patched = timer() - start
f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"

'Intel® extension for Scikit-learn time: 0.11 s'

In [26]:
unpatch_sklearn()

In [27]:
start = timer()
heart_classifier.fit(heart_train, heart_train_y)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"

'Original Scikit-learn time: 0.12 s'

### Predicting on training data

In [28]:
heart_train_prediction = heart_classifier.predict(heart_train)
heart_training_data_accuracy = accuracy_score(heart_train_y, heart_train_prediction)

In [29]:
print('Accuracy score of the training data : ', heart_training_data_accuracy)

Accuracy score of the training data :  0.8553719008264463


### Predicting on test data

In [30]:
heart_test_prediction = heart_classifier.predict(heart_test)
heart_testing_data_accuracy = accuracy_score(heart_test_y, heart_test_prediction)

In [31]:
print('Accuracy score of the testing data : ', heart_testing_data_accuracy)

Accuracy score of the testing data :  0.819672131147541


### Predicting on new data

In [32]:
heart_input_data = (62,0,0,140,268,0,0,160,0,3.6,0,2,2)

heart_input_data_as_numpy_array = np.asarray(heart_input_data)

heart_input_data_reshaped = heart_input_data_as_numpy_array.reshape(1,-1)

heart_prediction = heart_classifier.predict(heart_input_data_reshaped)
print(heart_prediction)

if heart_prediction[0] == 0:
  print('The Person does not have a Heart Disease')
else:
  print('The Person has Heart Disease')

[0]
The Person does not have a Heart Disease
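
The `reshape(1, -1)` step is needed because `predict` expects a 2D array with one row per sample, while a tuple of feature values converts to a 1D array. A short sketch using the same input tuple as above:

```python
import numpy as np

# The 13 feature values of one patient, as in the cell above.
sample = (62, 0, 0, 140, 268, 0, 0, 160, 0, 3.6, 0, 2, 2)

as_array = np.asarray(sample)    # 1D: one axis holding 13 values
as_row = as_array.reshape(1, -1)  # 2D: 1 sample, 13 features (-1 infers the length)

print(as_array.shape, as_row.shape)
```

This prints `(13,) (1, 13)`. The `-1` lets NumPy work out the number of features, so the same line works for the 8 diabetes features and the 22 Parkinson's features as well.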




### Saving the model

In [33]:
joblib.dump(heart_classifier, "heart_classifier.sav")

['heart_classifier.sav']

### Loading the Parkinson's dataset

In [34]:
parkinsons_dataset = pd.read_csv('./multiple_disease_dataset/parkinsons.csv')

In [35]:
parkinsons_dataset.head()

|   | name | MDVP:Fo(Hz) | MDVP:Fhi(Hz) | MDVP:Flo(Hz) | MDVP:Jitter(%) | MDVP:Jitter(Abs) | MDVP:RAP | MDVP:PPQ | Jitter:DDP | MDVP:Shimmer | ... | Shimmer:DDA | NHR | HNR | status | RPDE | DFA | spread1 | spread2 | D2 | PPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | phon_R01_S01_1 | 119.992 | 157.302 | 74.997 | 0.00784 | 7e-05 | 0.0037 | 0.00554 | 0.01109 | 0.04374 | ... | 0.06545 | 0.02211 | 21.033 | 1 | 0.414783 | 0.815285 | -4.813031 | 0.266482 | 2.301442 | 0.284654 |
| 1 | phon_R01_S01_2 | 122.4 | 148.65 | 113.819 | 0.00968 | 8e-05 | 0.00465 | 0.00696 | 0.01394 | 0.06134 | ... | 0.09403 | 0.01929 | 19.085 | 1 | 0.458359 | 0.819521 | -4.075192 | 0.33559 | 2.486855 | 0.368674 |
| 2 | phon_R01_S01_3 | 116.682 | 131.111 | 111.555 | 0.0105 | 9e-05 | 0.00544 | 0.00781 | 0.01633 | 0.05233 | ... | 0.0827 | 0.01309 | 20.651 | 1 | 0.429895 | 0.825288 | -4.443179 | 0.311173 | 2.342259 | 0.332634 |
| 3 | phon_R01_S01_4 | 116.676 | 137.871 | 111.366 | 0.00997 | 9e-05 | 0.00502 | 0.00698 | 0.01505 | 0.05492 | ... | 0.08771 | 0.01353 | 20.644 | 1 | 0.434969 | 0.819235 | -4.117501 | 0.334147 | 2.405554 | 0.368975 |
| 4 | phon_R01_S01_5 | 116.014 | 141.781 | 110.655 | 0.01284 | 0.00011 | 0.00655 | 0.00908 | 0.01966 | 0.06425 | ... | 0.1047 | 0.01767 | 19.649 | 1 | 0.417356 | 0.823484 | -3.747787 | 0.234513 | 2.33218 | 0.410335 |


In [36]:
parkinsons_dataset.shape

(195, 24)

In [37]:
parkinsons_dataset_y = parkinsons_dataset["status"]
parkinsons_dataset = parkinsons_dataset.drop(columns=["name", "status"])
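
Two columns are dropped here for different reasons: `status` is the label, and `name` is a text identifier of the recording that the classifier cannot use as a numeric feature. A small sketch on a made-up three-row frame (the identifiers and values are illustrative only):

```python
import pandas as pd

frame = pd.DataFrame({
    "name": ["rec_1", "rec_2", "rec_3"],       # made-up identifiers
    "MDVP:Fo(Hz)": [119.992, 122.4, 197.076],
    "status": [1, 1, 0],
})

# Keep the label separately, then drop it together with the identifier column.
target = frame["status"]
features = frame.drop(columns=["name", "status"])

feature_columns = list(features.columns)
print(feature_columns, features.dtypes.tolist())
```

After the drop only numeric columns remain, which is what `SVC.fit` needs.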

### Splitting the dataset into train and test data

In [38]:
parkinsons_train, parkinsons_test, parkinsons_train_y, parkinsons_test_y = train_test_split(parkinsons_dataset,parkinsons_dataset_y, test_size = 0.2, stratify=parkinsons_dataset_y, random_state=2)

### Using Scikit Learn to train the Model

In [39]:
parkinsons_classifier = svm.SVC(kernel='linear')

In [40]:
patch_sklearn()

Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)


In [41]:
start = timer()
parkinsons_classifier.fit(parkinsons_train, parkinsons_train_y)
train_patched = timer() - start
f"Intel® extension for Scikit-learn time: {train_patched:.2f} s"

'Intel® extension for Scikit-learn time: 0.06 s'

In [42]:
unpatch_sklearn()

In [43]:
start = timer()
parkinsons_classifier.fit(parkinsons_train, parkinsons_train_y)
train_unpatched = timer() - start
f"Original Scikit-learn time: {train_unpatched:.2f} s"

'Original Scikit-learn time: 0.06 s'

### Predicting on training data

In [44]:
parkinsons_train_prediction = parkinsons_classifier.predict(parkinsons_train)
parkinsons_training_data_accuracy = accuracy_score(parkinsons_train_y, parkinsons_train_prediction)

In [45]:
print('Accuracy score of the training data : ', parkinsons_training_data_accuracy)

Accuracy score of the training data :  0.8653846153846154


### Predicting on test data

In [46]:
parkinsons_test_prediction = parkinsons_classifier.predict(parkinsons_test)
parkinsons_testing_data_accuracy = accuracy_score(parkinsons_test_y, parkinsons_test_prediction)

In [47]:
print('Accuracy score of the testing data : ', parkinsons_testing_data_accuracy)

Accuracy score of the testing data :  0.8461538461538461


### Predicting on new data

In [48]:
parkinsons_input_data = (197.07600,206.89600,192.05500,0.00289,0.00001,0.00166,0.00168,0.00498,0.01098,0.09700,0.00563,0.00680,0.00802,0.01689,0.00339,26.77500,0.422229,0.741367,-7.348300,0.177551,1.743867,0.085569)

parkinsons_input_data_as_numpy_array = np.asarray(parkinsons_input_data)

parkinsons_input_data_reshaped = parkinsons_input_data_as_numpy_array.reshape(1,-1)

parkinsons_prediction = parkinsons_classifier.predict(parkinsons_input_data_reshaped)
print(parkinsons_prediction)


if parkinsons_prediction[0] == 0:
  print("The Person does not have Parkinsons Disease")
else:
  print("The Person has Parkinsons")

[0]
The Person does not have Parkinsons Disease




### Saving the model

In [49]:
joblib.dump(parkinsons_classifier, "parkinsons_classifier.sav")

['parkinsons_classifier.sav']