# Overview

This notebook implements a disease recommendation system that predicts a disease from its symptoms on a medical dataset, using scikit-learn.
* Model accuracy on the held-out test split is 100%.
* The model is saved as a `.pkl` file.

#### Highlights of this notebook:

1. Loading Libraries.
2. Applying One Hot Encoding Manually using Numpy and Pandas.
3. Creating Model.
4. Testing Model Accuracy.
5. Saving Model as pkl file.

## <font size=4 color='blue'>If you find this notebook useful, leave an upvote, that motivates me to write more such notebooks.</font>

## Loading Libraries and Dataset

In [3]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [9]:
data = pd.read_csv('csv/Disease_Description.csv')

In [10]:
data.head()

| Unnamed: 0 | Disease | Description |
|---|---|---|
| 0 | Drug Reaction | An adverse drug reaction (ADR) is an injury ca... |
| 1 | Malaria | An infectious disease caused by protozoan para... |
| 2 | Allergy | An allergy is an immune system response to a f... |
| 3 | Hypothyroidism | Hypothyroidism, also called underactive thyroi... |
| 4 | Psoriasis | Psoriasis is a common skin disorder that forms... |


## Data Preprocessing

In [None]:
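# Assumes `data` holds a `Disease` column plus the `Symptom_1` ... `Symptom_17` columns used below.
symptom_cols = [c for c in data.columns if c.startswith("Symptom_")]

# Collect each row's symptoms into a list, skipping empty cells (NaN or a 0 placeholder).
# Building the whole column at once avoids chained assignment (data["Symptoms"][i] = ...),
# which pandas warns about and which may not write back to `data`.
data["Symptoms"] = [
    [s for s in row if pd.notna(s) and s != 0]
    for row in data[symptom_cols].to_numpy()
]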
    

In [None]:
data.head()

In [None]:
column_values = data[['Symptom_1', 'Symptom_2', 'Symptom_3', 'Symptom_4',
       'Symptom_5', 'Symptom_6', 'Symptom_7', 'Symptom_8', 'Symptom_9',
       'Symptom_10', 'Symptom_11', 'Symptom_12', 'Symptom_13', 'Symptom_14',
       'Symptom_15', 'Symptom_16', 'Symptom_17']].values.ravel()

symps = pd.unique(column_values).tolist()
symps = [i for i in symps if str(i) != "nan"]

print(len(symps))

In [None]:
# Empty frame with one column per distinct symptom, aligned to the rows of `data`.
symptoms = pd.DataFrame(columns=symps, index=data.index)

In [None]:
symptoms["Symptoms"] = data["Symptoms"]

In [None]:
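# Manual one-hot encoding: 1 where the row's symptom list contains the symptom, else 0.
for symptom in symps:
    symptoms[symptom] = symptoms["Symptoms"].apply(lambda row_symptoms: int(symptom in row_symptoms))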

In [None]:
symptoms["Disease"] = data["Disease"]
symptoms = symptoms.drop("Symptoms",axis=1)

In [None]:
symptoms.head()
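The manual one-hot encoding used in this section can be hard to follow without seeing the data. The sketch below reproduces the idea on a small made-up frame; the diseases, symptom names, and the `toy`, `vocab`, and `encoded` variables are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: each row lists the symptoms recorded for one sample.
toy = pd.DataFrame({
    "Disease": ["Malaria", "Allergy", "Malaria"],
    "Symptoms": [["fever", "chills"], ["sneezing"], ["fever", "headache"]],
})

# Vocabulary of distinct symptoms, in order of first appearance.
vocab = list(dict.fromkeys(s for row in toy["Symptoms"] for s in row))

# One indicator column per symptom: 1 if the row lists it, else 0.
encoded = pd.DataFrame(
    [[int(s in row) for s in vocab] for row in toy["Symptoms"]],
    columns=vocab,
    index=toy.index,
)
encoded["Disease"] = toy["Disease"]
print(encoded)
```

Each row can carry several 1s at once, so strictly speaking this is a multi-hot encoding of the symptom list rather than a one-hot encoding of a single categorical column.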

## Train/Test Split

In [None]:
x = symptoms.drop('Disease', axis = 1)
y = symptoms.Disease

In [None]:
x_train, x_test, y_train , y_test = train_test_split(x ,y, random_state = 50)
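To make the split above concrete, here is a minimal sketch on made-up data. When `test_size` is not given, `train_test_split` holds out 25% of the rows. The `stratify` argument is an addition that the notebook does not use; it is shown only as an option for keeping every class represented in both parts. The arrays `X_toy` and `y_toy` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 40 samples, 3 binary features, 4 balanced classes.
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(40, 3))
y_toy = np.repeat(["A", "B", "C", "D"], 10)

# Default split: 75% train, 25% test. stratify=y_toy (not in the notebook)
# keeps the class proportions roughly equal in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, random_state=50, stratify=y_toy
)
print(X_tr.shape, X_te.shape)
```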

## Creating Model and Evaluating it

In [None]:
model = RandomForestClassifier(n_estimators=200)
model.fit(x_train, y_train)
predictions = model.predict(x_test)
accuracy_score(y_test, predictions)

In [None]:
model = LogisticRegression(max_iter = 1000)
model.fit(x_train, y_train)
predictions = model.predict(x_test)
accuracy_score(y_test, predictions)
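A single train/test split yields one accuracy number. As a complementary check, the sketch below runs 5-fold cross-validation with `cross_val_score` and reports the mean and spread of the fold accuracies. It uses made-up data (`X_toy`, `y_toy`) rather than the notebook's dataset, and the choice of 5 folds and 50 trees is an assumption for illustration.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical toy data: every combination of 4 binary features, repeated 4 times.
# The label is determined by the first feature alone.
X_toy = np.array(list(itertools.product([0, 1], repeat=4)) * 4)
y_toy = np.where(X_toy[:, 0] == 1, "Malaria", "Allergy")

clf = RandomForestClassifier(n_estimators=50, random_state=0)

# One accuracy value per fold; classifiers get stratified folds by default.
scores = cross_val_score(clf, X_toy, y_toy, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())
```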

In [None]:
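import pickle

# Save the most recently fitted model (the LogisticRegression above).
# The `with` block ensures the file is closed after writing.
with open('Model.pkl', 'wb') as f:
    pickle.dump(model, f)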
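A saved model is only useful if it can be loaded and queried later. The sketch below shows one possible round trip: fit a small model, pickle it, load it back, and predict from a list of symptom names. The training rows, the `features` list, and the `encode` helper are hypothetical, and a temporary directory stands in for the notebook's `Model.pkl`.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data in the same indicator-column layout as the notebook.
features = ["fever", "chills", "sneezing", "itching"]
X_toy = pd.DataFrame(
    [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]],
    columns=features,
)
y_toy = ["Malaria", "Malaria", "Allergy", "Allergy"]
model = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

# Round trip through a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "Model.pkl"
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)


def encode(symptom_list):
    """Turn symptom names into a one-row frame matching the training columns."""
    row = [int(name in symptom_list) for name in features]
    return pd.DataFrame([row], columns=features)


prediction = restored.predict(encode(["fever", "chills"]))[0]
print(prediction)
```

The input must use the same columns, in the same order, as the training frame, so it helps to store the feature list next to the model. Pickle files can execute arbitrary code when loaded, so only unpickle files from a trusted source.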
