# **AI Health Guard Model Building & Predictions**

# **Welcome to AI-Health Guard Research Paper**



> AI HEALTH GUARD
* Your Personalized Health Advisor. Predicts diseases, offers
tailored medical advice, workouts, and diet plans for holistic
well-being.

---

# **About Dataset**
### **Context**
---
> The data for "AI HEALTH GUARD" is from [Kaggle](https://www.kaggle.com/datasets/alokchoudhary2005/ai-health-guard), a platform for data scientists and machine learning engineers. The dataset includes 8 `csv` files:

**● Symptom-severity.csv:** Describes the severity of specific symptoms.

**● Original_Dataset.csv:** The main dataset used to train the machine learning model.

**● description.csv**: Gives detailed descriptions of the health conditions.

**● diets.csv:** Provides information about which diets are appropriate for various health conditions.

**● medications.csv**: Describes which medications to take for each health condition, and when and how to take them.

**● precautions_df.csv:** Lists the different precautions that you are advised to adopt when facing various health conditions.

**● symtoms_df.csv:** Contains an exhaustive list of symptoms presented by different illnesses.

**● workout_df.csv:** Lists workout plans suited to an individual's specific health needs, combined with lifestyle advice.

### **Data Analysis Insight:**
* Insights from data analysis shed light on trends, patterns and correlations
between symptoms and health conditions.

### **Recommendation Generation:**
* The recommendation generation process involves analyzing user-input symptoms
and generating personalized health recommendations.


---

## **Conclusion**
* The AI Health Guard project represents a significant endeavor in utilizing data science and machine learning techniques to empower individuals in managing their health effectively. By leveraging advanced algorithms and personalized recommendations, the system aims to enhance healthcare outcomes and promote overall well-being.

In [1]:
#import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objs as go
import plotly.express as px

import warnings
from sklearn.utils import shuffle
warnings.filterwarnings("ignore")


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import MultinomialNB

from sklearn import metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error,r2_score
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.metrics import precision_recall_curve, roc_curve, auc
from sklearn.model_selection import cross_val_score, KFold, RandomizedSearchCV
from sklearn.datasets import make_classification
from collections import Counter
import pickle

In [2]:
# import data
df = pd.read_csv("/content/drive/MyDrive/Data Science My Repository/Projects/AI Health Guard Research /AI Health Guard Datasets/Symptoms-Disease Datasets/Main_Dataset.csv")
df = shuffle(df, random_state=42)
df.head()

Unnamed: 0,Disease,mucoid_sputum,blackheads,mood_swings,movement_stiffness,bladder_discomfort,swelling_joints,yellow_urine,increased_appetite,loss_of_balance,...,itching,dehydration,coma,nausea,irritability,dizziness,headache,vomiting,diarrhoea,anxiety
4984,Tularemia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4710,Arthritis,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9413,Stroke,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
6587,Lead poisoning,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
10418,Sickle-cell anemia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [3]:
df.shape

(10961, 606)

* Shape of dataset is `10961` rows and `606` columns.

# **Data Preprocessing**

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10961 entries, 4984 to 7270
Columns: 606 entries, Disease to anxiety
dtypes: float64(605), object(1)
memory usage: 50.8+ MB


In [5]:
# check null values
null_checker = df.apply(lambda x: sum(x.isnull())).to_frame(name='count')
print(null_checker)

                    count
Disease                 0
mucoid_sputum           0
blackheads              0
mood_swings             0
movement_stiffness      0
...                   ...
dizziness               0
headache                0
vomiting                0
diarrhoea               0
anxiety                 0

[606 rows x 1 columns]
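The same null check can be written with the vectorized `isnull().sum()` and collapsed into a single total for a quick sanity test. A minimal sketch on a small hypothetical frame (`demo` is illustrative and is not the project dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the symptom matrix
demo = pd.DataFrame({
    "Disease": ["Flu", "Cold", None],
    "fever": [1.0, 0.0, 1.0],
    "cough": [np.nan, 1.0, 0.0],
})

# Per-column count of missing values (vectorized equivalent of apply + lambda)
null_counts = demo.isnull().sum().to_frame(name="count")
print(null_counts)

# Total number of missing cells across the whole frame
total_missing = int(demo.isnull().sum().sum())
print("Total missing values:", total_missing)
```

A total of zero confirms in one line what the per-column table shows.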


In [6]:
df.columns = df.columns.str.strip()

In [7]:
df.columns

Index(['Disease', 'mucoid_sputum', 'blackheads', 'mood_swings',
       'movement_stiffness', 'bladder_discomfort', 'swelling_joints',
       'yellow_urine', 'increased_appetite', 'loss_of_balance',
       ...
       'itching', 'dehydration', 'coma', 'nausea', 'irritability', 'dizziness',
       'headache', 'vomiting', 'diarrhoea', 'anxiety'],
      dtype='object', length=606)

In [8]:
# Extract the symptom column names (every column except the 'Disease' target)
unique_columns = df.columns.drop('Disease').tolist()
# Save the symptom names to a CSV file
unique_columns_df = pd.DataFrame(unique_columns, columns=['Unique Symptoms'])
unique_columns_df.to_csv('unique_symptoms.csv', index=False)

In [9]:
# Extract the unique values of the Disease column into a DataFrame
unique_diseases = df['Disease'].unique()
unique_diseases_df = pd.DataFrame(unique_diseases, columns=['Disease'])

# Save the unique diseases to a CSV file
unique_diseases_df.to_csv('unique_diseases.csv', index=False)

# **Model Training**

  * Splitting the data into features (X) and target variable (Y)

In [10]:
X = df.drop('Disease', axis=1)
y = df['Disease']

# Encode the Disease labels as integers
le = LabelEncoder()
le.fit(y)
Y = le.transform(y)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

In [11]:
# Save the LabelEncoder
pickle.dump(le, open('label_encoder.pkl', 'wb'))

In [12]:
print("Shape of X_train : ", X_train.shape)
print("Shape of X_test : ", X_test.shape)
print("Shape of y_train : ", y_train.shape)
print("Shape of y_test : ", y_test.shape)

Shape of X_train :  (8768, 605)
Shape of X_test :  (2193, 605)
Shape of y_train :  (8768,)
Shape of y_test :  (2193,)
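The split above is purely random. With many disease classes, some may be rare, so it can be worth checking the class counts and, where every class has at least two samples, passing `stratify` to `train_test_split`. A minimal sketch on hypothetical toy labels (`X_demo`, `y_demo`, and `can_stratify` are illustrative names, not part of the notebook):

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: three classes, each with at least two samples
y_demo = np.array([0] * 10 + [1] * 6 + [2] * 4)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# Stratified splitting needs at least two samples in every class
counts = Counter(y_demo.tolist())
can_stratify = min(counts.values()) >= 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    random_state=42,
    stratify=y_demo if can_stratify else None,
)
print("Class counts in the test split:", Counter(y_te.tolist()))
```

Whether stratification is possible on the real data depends on its rarest class, which this sketch does not assume.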


In [13]:
X_train[:2]

Unnamed: 0,mucoid_sputum,blackheads,mood_swings,movement_stiffness,bladder_discomfort,swelling_joints,yellow_urine,increased_appetite,loss_of_balance,indigestion,...,itching,dehydration,coma,nausea,irritability,dizziness,headache,vomiting,diarrhoea,anxiety
3917,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0
6915,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [14]:
# Create a dictionary to store models
models = {
    'SVC': SVC(kernel='linear'),
    'Logistic_Regression': LogisticRegression(),
    'NaiveBayes': GaussianNB(),
    'RandomForest': RandomForestClassifier(n_estimators=100, random_state=42),
    'GradientBoosting': GradientBoostingClassifier(n_estimators=100, random_state=42),
    'KNeighbors': KNeighborsClassifier(n_neighbors=5),
    'MultinomialNB': MultinomialNB()
}

In [15]:
# Dictionary to store accuracies and confusion matrices
accuracies = {}
confusion_matrices = {}

# Loop through the models, train, test, and store results
for model_name, model in models.items():
    # Train the model
    model.fit(X_train, y_train)

    # Test the model
    predictions = model.predict(X_test)

    # Calculate accuracy
    accuracy = accuracy_score(y_test, predictions)
    accuracies[model_name] = accuracy

    # Calculate confusion matrix
    cm = confusion_matrix(y_test, predictions)
    confusion_matrices[model_name] = cm

# 30m

In [16]:
# Print accuracies
for model_name, accuracy in accuracies.items():
    print(f"{model_name} Accuracy: {accuracy}")

SVC Accuracy: 0.9028727770177839
Logistic_Regression Accuracy: 0.9215686274509803
NaiveBayes Accuracy: 0.9343365253077975
RandomForest Accuracy: 0.9179206566347469
GradientBoosting Accuracy: 0.8691290469676243
KNeighbors Accuracy: 0.8910168718650251
MultinomialNB Accuracy: 0.853625170998632


> The accuracy results show that the NaiveBayes (GaussianNB) model performed best with an accuracy of `93.43%`, followed by Logistic Regression at `92.16%`. MultinomialNB had the lowest accuracy at `85.36%`, with Gradient Boosting next at `86.91%`, indicating that while most models performed well, some need further tuning or may not be as suitable for this dataset.
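To make the ranking easy to verify, the printed accuracies can be sorted and formatted as percentages. A small sketch using the values copied from the output above (the variable names `ranked`, `best_name`, and `worst_name` are illustrative):

```python
# Accuracies copied from the printed output of the training loop
accuracies = {
    "SVC": 0.9028727770177839,
    "Logistic_Regression": 0.9215686274509803,
    "NaiveBayes": 0.9343365253077975,
    "RandomForest": 0.9179206566347469,
    "GradientBoosting": 0.8691290469676243,
    "KNeighbors": 0.8910168718650251,
    "MultinomialNB": 0.853625170998632,
}

# Sort from best to worst and print as percentages
ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
for name, acc in ranked:
    print(f"{name:<20} {acc:.2%}")

best_name = ranked[0][0]
worst_name = ranked[-1][0]
```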

> Next, compute further evaluation metrics. Per-class precision, recall, and F1-score, followed by cross-validation, are especially useful for understanding a classification model when classes are imbalanced.

In [17]:
def get_metrics(y_true, predictions):
    metrics = {}
    MSE = mean_squared_error(y_true, predictions)
    RMSE = np.sqrt(MSE)
    MAE = mean_absolute_error(y_true, predictions)
    R2 = r2_score(y_true, predictions)

    metrics['MSE'] = MSE
    metrics['RMSE'] = RMSE
    metrics['MAE'] = MAE
    metrics['R2'] = R2

    return metrics
# Create an empty DataFrame to store metrics
metrics_df = pd.DataFrame(columns=['Model', 'MSE', 'RMSE', 'MAE', 'R2'])
# Iterate through each model in the dictionary
for model_name, model in models.items():
    metrics = get_metrics(y_test, predictions)
    metrics['Model'] = model_name
    metrics_df = pd.concat([metrics_df, pd.DataFrame(metrics, index=[0])], ignore_index=True)

# Print the DataFrame
print(metrics_df)

                 Model          MSE       RMSE        MAE        R2
0                  SVC  1839.001368  42.883579  13.201094  0.575237
1  Logistic_Regression  1839.001368  42.883579  13.201094  0.575237
2           NaiveBayes  1839.001368  42.883579  13.201094  0.575237
3         RandomForest  1839.001368  42.883579  13.201094  0.575237
4     GradientBoosting  1839.001368  42.883579  13.201094  0.575237
5           KNeighbors  1839.001368  42.883579  13.201094  0.575237
6        MultinomialNB  1839.001368  42.883579  13.201094  0.575237


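> All models have identical MSE, RMSE, MAE, and R2 values because the loop never calls `model.predict`: it reuses the `predictions` array left over from the training loop, which holds the output of its last model (MultinomialNB). These regression metrics are also not meaningful here, since the targets are label-encoded class IDs with no numeric ordering.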
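One way to get a distinct row per model is to recompute predictions inside the loop and report classification metrics instead of regression ones. The following is a minimal sketch on synthetic data from `make_classification`, not the project dataset; `demo_models`, `rows`, and `per_model_df` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption: not the AI Health Guard dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

demo_models = {
    "Logistic_Regression": LogisticRegression(max_iter=1000),
    "NaiveBayes": GaussianNB(),
}

rows = []
for name, clf in demo_models.items():
    clf.fit(X_tr, y_tr)
    preds = clf.predict(X_te)  # recomputed for every model
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_te, preds),
        "Macro_F1": f1_score(y_te, preds, average="macro", zero_division=0),
    })

per_model_df = pd.DataFrame(rows)
print(per_model_df)
```

Collecting rows in a list and building the DataFrame once also avoids repeated `pd.concat` calls.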

In [18]:
# Print confusion matrices
for model_name, cm in confusion_matrices.items():
    print(f"{model_name} Confusion Matrix:")
    print(np.array2string(cm, separator=', '))
    print("\n" + "="*40 + "\n")

SVC Confusion Matrix:
[[1, 0, 0, ..., 0, 0, 0],
 [0, 1, 0, ..., 0, 0, 0],
 [0, 0, 5, ..., 0, 0, 0],
 ...,
 [0, 0, 0, ..., 3, 0, 0],
 [0, 0, 0, ..., 0, 2, 0],
 [0, 0, 0, ..., 0, 0, 0]]


Logistic_Regression Confusion Matrix:
[[1, 0, 0, ..., 0, 0, 0],
 [0, 1, 0, ..., 0, 0, 0],
 [0, 0, 5, ..., 0, 0, 0],
 ...,
 [0, 0, 0, ..., 2, 0, 0],
 [0, 0, 0, ..., 0, 3, 0],
 [0, 0, 0, ..., 0, 0, 4]]


NaiveBayes Confusion Matrix:
[[1, 0, 0, ..., 0, 0, 0],
 [0, 1, 0, ..., 0, 0, 0],
 [0, 0, 5, ..., 0, 0, 0],
 ...,
 [0, 0, 0, ..., 3, 0, 0],
 [0, 0, 0, ..., 0, 5, 0],
 [0, 0, 0, ..., 0, 0, 0]]


RandomForest Confusion Matrix:
[[1, 0, 0, ..., 0, 0, 0],
 [0, 1, 0, ..., 0, 0, 0],
 [0, 0, 5, ..., 0, 0, 0],
 ...,
 [0, 0, 0, ..., 3, 0, 0],
 [0, 0, 0, ..., 0, 4, 0],
 [0, 0, 0, ..., 0, 0, 0]]


GradientBoosting Confusion Matrix:
[[1, 0, 0, ..., 0, 0, 0],
 [0, 0, 0, ..., 0, 0, 0],
 [0, 0, 0, ..., 0, 0, 0],
 ...,
 [0, 0, 0, ..., 3, 0, 0],
 [0, 0, 0, ..., 0, 3, 0],
 [0, 0, 0, ..., 0, 0, 0]]


KNeighbors Confusion Matr

# Visualise Model Prediction Performance

In [19]:
# Function to plot true vs predicted values
def plot_combined_predictions(models, X_test, y_test):
    fig = go.Figure()

    for model_name, model in models.items():
        predictions = model.predict(X_test)
        fig.add_trace(go.Scatter(x=y_test, y=predictions, mode='markers', name=model_name))

    fig.add_trace(go.Scatter(x=y_test, y=y_test, mode='lines', name='Ideal Fit', line=dict(color='red', dash='dash')))
    fig.update_layout(xaxis_title='True Values', yaxis_title='Predicted Values', title='True vs Predicted Values for Models')
    fig.show()
# Usage:
plot_combined_predictions(models, X_test, y_test)

* The graph plots each model's predicted class label against the true class label. Points that fall on the dashed red line are correct predictions; points off the line are misclassifications.

In [20]:
# Function to plot feature importance for all models
def plot_all_feature_importance(models):
    fig = go.Figure()

    for model_name, model in models.items():
        if hasattr(model, "feature_importances_"):
            importance = model.feature_importances_
            feature_importance = pd.DataFrame({'feature': X.columns, 'importance': importance})
            feature_importance = feature_importance.sort_values(by='importance', ascending=False)
            fig.add_trace(go.Bar(x=feature_importance['feature'], y=feature_importance['importance'], name=model_name))

    fig.update_layout(title="Feature Importance for Models", xaxis_title="Feature", yaxis_title="Importance")
    fig.show()

# Plot feature importance for all models
plot_all_feature_importance(models)

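* The chart illustrates the varying impact of specific features on model predictions, highlighting the importance of feature selection for model accuracy. Only models that expose `feature_importances_` (here RandomForest and GradientBoosting) appear in it.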
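With several hundred symptom columns, a bar chart of every feature is hard to read, so listing only the top few importances can help. A minimal sketch on synthetic data (the `symptom_i` names and the choice of five features are assumptions for illustration):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: not the AI Health Guard dataset)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=4,
    n_classes=3, random_state=42,
)
feature_names = [f"symptom_{i}" for i in range(X_demo.shape[1])]  # hypothetical names

forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(X_demo, y_demo)

# Pair each importance with its feature name and keep the five largest
importance = pd.Series(forest.feature_importances_, index=feature_names)
top_features = importance.sort_values(ascending=False).head(5)
print(top_features)
```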

In [21]:
# Function to plot combined ROC curve
def plot_combined_roc_curve(models, X_test, y_test):
    fig = go.Figure()

    for model_name, model in models.items():
        if hasattr(model, "predict_proba"):
            y_prob = model.predict_proba(X_test)
            if len(np.unique(y_test)) == 2:  # Binary classification
                fpr, tpr, _ = roc_curve(y_test, y_prob[:, 1])
                roc_auc = auc(fpr, tpr)
                fig.add_trace(go.Scatter(x=fpr, y=tpr, mode='lines', name=f'{model_name} (AUC = {roc_auc:.2f})'))
            else:  # Multi-class classification
                for i in range(len(np.unique(y_test))):
                    fpr, tpr, _ = roc_curve(y_test == i, y_prob[:, i])
                    roc_auc = auc(fpr, tpr)
                    fig.add_trace(go.Scatter(x=fpr, y=tpr, mode='lines', name=f'{model_name} - Class {i} (AUC = {roc_auc:.2f})'))

    fig.update_layout(title='Models ROC Curves', xaxis_title='False Positive Rate', yaxis_title='True Positive Rate',)
    fig.add_shape(type='line', line=dict(dash='dash'), x0=0, x1=1, y0=0, y1=1)
    fig.show()

# Plot combined ROC curves for all models
plot_combined_roc_curve(models, X_test, y_test)


* The plot shows one-vs-rest ROC curves, comparing the true positive rate against the false positive rate for each class of every model that supports `predict_proba`.
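With more than 200 classes, one ROC trace per class per model is difficult to compare visually, so a single macro-averaged one-vs-rest AUC per model can be a useful summary. A minimal sketch on synthetic three-class data, using scikit-learn's `roc_auc_score` with `multi_class="ovr"` (the data and variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the AI Health Guard dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)
# Stratify so every class appears in the test split, which roc_auc_score requires
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_prob = clf.predict_proba(X_te)  # shape: (n_samples, n_classes)

# Macro average of the per-class one-vs-rest AUC values
macro_auc = roc_auc_score(y_te, y_prob, multi_class="ovr", average="macro")
print(f"Macro one-vs-rest AUC: {macro_auc:.3f}")
```

On the real data this only works if every class present in training also appears in the test labels, which is an assumption the sketch does not check.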

In [22]:
# Function to plot combined Precision-Recall curve
def plot_precision_recall_curve(models, X_test, y_test):
    fig = go.Figure()

    for model_name, model in models.items():
        if hasattr(model, "predict_proba"):
            y_prob = model.predict_proba(X_test)
            if len(np.unique(y_test)) == 2:  # Binary classification
                precision, recall, _ = precision_recall_curve(y_test, y_prob[:, 1])
                fig.add_trace(go.Scatter(x=recall, y=precision, mode='lines', name=f'{model_name}'))
            else:  # Multi-class classification
                for i in range(len(np.unique(y_test))):
                    precision, recall, _ = precision_recall_curve(y_test == i, y_prob[:, i])
                    fig.add_trace(go.Scatter(x=recall, y=precision, mode='lines', name=f'{model_name} - Class {i}'))

    fig.update_layout(title='Precision-Recall Curves', xaxis_title='Recall', yaxis_title='Precision',)
    fig.show()

# Plot combined Precision-Recall curves for all models
plot_precision_recall_curve(models, X_test, y_test)


* The plot shows one-vs-rest precision-recall curves for each class, which are more informative than ROC curves when classes are imbalanced because they focus on how well the positive class is identified.
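A precision-recall curve can be summarized by its average precision, and the per-class values can be macro-averaged into one number per model. A minimal sketch on synthetic data, binarizing the labels with `label_binarize` (the data and variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import label_binarize

# Synthetic stand-in data (assumption: not the AI Health Guard dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

clf = GaussianNB().fit(X_tr, y_tr)
y_prob = clf.predict_proba(X_te)

# One indicator column per class, in the same order as the probability columns
y_bin = label_binarize(y_te, classes=clf.classes_)

# Average precision per class, then the unweighted mean over classes
macro_ap = average_precision_score(y_bin, y_prob, average="macro")
print(f"Macro average precision: {macro_ap:.3f}")
```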

In [23]:
# Calculate precision, recall, and F1-score for each class
precision = precision_score(y_test, predictions, average=None)
recall = recall_score(y_test, predictions, average=None)
f1 = f1_score(y_test, predictions, average=None)

# Make sure the number of classes matches the length of metrics
num_classes = len(precision)  # or len(recall) or len(f1), they should be the same
class_labels = [f"Class {i}" for i in range(num_classes)]

# Create a DataFrame to display the metrics
metrics_df = pd.DataFrame({'Class': class_labels,  # Use generated class labels
                           'Precision': precision,
                           'Recall': recall,
                           'F1-score': f1})

print(metrics_df)

         Class  Precision    Recall  F1-score
0      Class 0        1.0  1.000000  1.000000
1      Class 1        0.0  0.000000  0.000000
2      Class 2        1.0  0.500000  0.666667
3      Class 3        0.0  0.000000  0.000000
4      Class 4        1.0  1.000000  1.000000
..         ...        ...       ...       ...
218  Class 218        0.0  0.000000  0.000000
219  Class 219        1.0  0.142857  0.250000
220  Class 220        1.0  0.500000  0.666667
221  Class 221        0.0  0.000000  0.000000
222  Class 222        1.0  0.333333  0.500000

[223 rows x 4 columns]


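> The precision, recall, and F1-score results show significant performance variability across classes: some classes are predicted perfectly, while others score 0 on all three metrics. Because `predictions` still holds the output of the last model in the training loop (MultinomialNB), these per-class scores describe that model.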
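A table with 223 rows is hard to summarize by eye, so macro averages (every class weighted equally) are a common complement to per-class scores. A small worked sketch on hand-made labels, where class 2 is never predicted and `zero_division=0` makes its precision count as 0:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: class 2 is never predicted
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 1]

# Per class: precision = [2/3, 1/3, 0], recall = [1, 1/2, 0]
macro_precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
macro_recall = recall_score(y_true, y_pred, average="macro", zero_division=0)
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"Macro precision: {macro_precision:.3f}")
print(f"Macro recall:    {macro_recall:.3f}")
print(f"Macro F1:        {macro_f1:.3f}")
```

Macro precision here is (2/3 + 1/3 + 0) / 3 = 1/3 and macro recall is (1 + 1/2 + 0) / 3 = 1/2, so a class that is never predicted pulls both averages down.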

> Cross-validation is a robust technique for evaluating a model's performance by splitting the data into multiple train-test sets. The next cell performs k-fold cross-validation:

In [24]:
# Define the number of folds for cross-validation
k_folds = 5

# Initialize KFold cross-validation
kf = KFold(n_splits=k_folds, shuffle=True, random_state=42)

# Convert feature names to strings
X.columns = X.columns.astype(str)

# Perform cross-validation
cv_accuracy = cross_val_score(model, X, Y, cv=kf, scoring='accuracy')

# Print the accuracy for each fold
print("Cross-Validation Accuracy for each fold:")
for i, accuracy in enumerate(cv_accuracy, 1):
    print(f"Fold {i}: {accuracy}")

# Calculate the mean and standard deviation of the cross-validation accuracy
mean_cv_accuracy = cv_accuracy.mean()
std_cv_accuracy = cv_accuracy.std()
print(f"\nMean Cross-Validation Accuracy: {mean_cv_accuracy}")
print(f"Standard Deviation of Cross-Validation Accuracy: {std_cv_accuracy}")

Cross-Validation Accuracy for each fold:
Fold 1: 0.853625170998632
Fold 2: 0.8553832116788321
Fold 3: 0.8458029197080292
Fold 4: 0.8499087591240876
Fold 5: 0.8572080291970803

Mean Cross-Validation Accuracy: 0.8523856181413322
Standard Deviation of Cross-Validation Accuracy: 0.0040803052814175084


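> The cross-validation accuracies are consistent across the five folds, with a mean accuracy of 85.24% and a low standard deviation of 0.41%. Note that `model` is the variable left over from the earlier loop over `models`, whose last entry is MultinomialNB (Fold 1 matches its test accuracy of 0.8536), so these scores describe MultinomialNB rather than the GaussianNB model selected below. For that model, performance is stable across different subsets of the data.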
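To cross-validate a specific model rather than whichever one a loop variable last pointed to, the estimator can be passed by name. A minimal sketch on synthetic data that scores two models with the same 5-fold splitter (`cv_summary` and the model choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption: not the AI Health Guard dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
candidates = {
    "NaiveBayes": GaussianNB(),
    "Logistic_Regression": LogisticRegression(max_iter=1000),
}

# Mean and standard deviation of fold accuracy for each named model
cv_summary = {}
for name, clf in candidates.items():
    scores = cross_val_score(clf, X_demo, y_demo, cv=kf, scoring="accuracy")
    cv_summary[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```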

In [25]:
# selecting NaiveBayes
NaiveBayes = GaussianNB()
NaiveBayes.fit(X_train,y_train)
ypred = NaiveBayes.predict(X_test)
accuracy_score(y_test,ypred)

0.9343365253077975

In [26]:
# test 1:
print("predicted disease :",NaiveBayes.predict(X_test.iloc[0].values.reshape(1,-1)))
print("Actual Disease :", y_test[0])

predicted disease : [130]
Actual Disease : 130
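The prediction above is an encoded class ID. A fitted `LabelEncoder` can map it back to a disease name with `inverse_transform`. A small sketch using a hypothetical encoder fitted on three disease names taken from the dataset preview (`le_demo` is illustrative and is not the notebook's `le`):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical encoder; classes are stored in sorted order:
# Arthritis -> 0, Stroke -> 1, Tularemia -> 2
le_demo = LabelEncoder().fit(["Stroke", "Arthritis", "Tularemia"])

encoded = le_demo.transform(["Stroke"])
decoded = le_demo.inverse_transform(encoded)

print("Encoded ID:", encoded[0])
print("Decoded name:", decoded[0])
```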


In [27]:
# save NaiveBayes
pickle.dump(NaiveBayes,open('NaiveBayes.pkl','wb'))

In [28]:
# Load trained model
NaiveBayes = pickle.load(open('NaiveBayes.pkl', 'rb'))
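Opening the pickle file in a `with` block ensures the file handle is closed after saving or loading. A minimal round-trip sketch on a toy model, writing to a temporary directory (the path and data are hypothetical):

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (assumption: not the AI Health Guard dataset)
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
y_demo = np.array([0, 1, 0, 1])
clf = GaussianNB().fit(X_demo, y_demo)

# Hypothetical location for the saved model
path = os.path.join(tempfile.mkdtemp(), "demo_model.pkl")

with open(path, "wb") as fh:
    pickle.dump(clf, fh)

with open(path, "rb") as fh:
    restored = pickle.load(fh)

# The restored model should reproduce the original predictions
same = np.array_equal(clf.predict(X_demo), restored.predict(X_demo))
print("Predictions match after reload:", same)
```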

# Recommendation System and Prediction
## Load database and use logic for recommendations

In [29]:
# Load additional datasets
precautions = pd.read_csv("/content/drive/MyDrive/Data Science My Repository/Projects/AI Health Guard Research /AI Health Guard Datasets/Symptoms-Disease Datasets/precautions_df.csv")
workout = pd.read_csv("/content/drive/MyDrive/Data Science My Repository/Projects/AI Health Guard Research /AI Health Guard Datasets/Symptoms-Disease Datasets/workout_df.csv")
description = pd.read_csv("/content/drive/MyDrive/Data Science My Repository/Projects/AI Health Guard Research /AI Health Guard Datasets/Symptoms-Disease Datasets/description.csv", encoding='latin-1')
medications = pd.read_csv('/content/drive/MyDrive/Data Science My Repository/Projects/AI Health Guard Research /AI Health Guard Datasets/Symptoms-Disease Datasets/medications.csv')
diets = pd.read_csv("/content/drive/MyDrive/Data Science My Repository/Projects/AI Health Guard Research /AI Health Guard Datasets/Symptoms-Disease Datasets/diets.csv")

# Normalize column names and data to handle inconsistencies
workout.rename(columns={'disease': 'Disease'}, inplace=True)

In [30]:
# Strip whitespace and convert to lower case for consistency
def normalize_column(df, column_name):
    df[column_name] = df[column_name].str.strip().str.lower()

# Use a loop variable other than `df` so the main dataset DataFrame is not overwritten
for frame in [description, precautions, medications, workout, diets]:
    normalize_column(frame, 'Disease')

# Normalize user symptoms
def normalize_symptoms(symptoms):
    return [symptom.strip().lower() for symptom in symptoms]

In [31]:
# Function to predict disease based on symptoms
def predict_disease(symptoms):
    # Create a DataFrame for the symptoms
    input_data = pd.DataFrame(columns=X_train.columns)
    input_data.loc[0] = 0
    for symptom in symptoms:
        if symptom in input_data.columns:
            input_data[symptom] = 1

    # Predict the disease
    predicted_disease = NaiveBayes.predict(input_data)
    disease_name = le.inverse_transform(predicted_disease)[0].strip().lower()

    return disease_name

# Function to provide recommendations based on predicted disease
def provide_recommendations(disease):
    print(f"\n=================predicted disease============")
    print(disease.capitalize())

    # Fetch and display related information
    print("\n=================description==================")
    if disease in description['Disease'].values:
        print(description[description['Disease'] == disease]['Description'].values[0])
    else:
        print('No description available')

    print("\n=================precautions==================")
    if disease in precautions['Disease'].values:
        # Access all precaution columns dynamically
        precaution_columns = [col for col in precautions.columns if 'Precaution_' in col]
        precautions_list = precautions[precautions['Disease'] == disease][precaution_columns].values[0]
        for idx, precaution in enumerate(precautions_list, 1):
            if pd.notna(precaution):
                print(f"{idx} : {precaution}")
    else:
        print('No precautions available')


    print("\n=================medications==================")
    if disease in medications['Disease'].values:
        medication_columns = [col for col in medications.columns if 'Medication_' in col]
        medication_list = medications[medications['Disease'] == disease][medication_columns].values[0]
        for idx, medication in enumerate(medication_list, 1):
            if pd.notna(medication):
                print(f"{idx} : {medication}")
    else:
        print('No medications available')

    print("\n=================workout==================")
    if disease in workout['Disease'].values:
        workout_columns = [col for col in workout.columns if 'workout_' in col]
        workouts_list = workout[workout['Disease'] == disease][workout_columns].values[0]
        for idx, workout_item in enumerate(workouts_list, 1):
            if pd.notna(workout_item):  # check the workout entry itself, not `medication`
                print(f"{idx} : {workout_item}")
    else:
        print('No workout information available')

    print("\n=================diets==================")
    if disease in diets['Disease'].values:
        diets_columns = [col for col in diets.columns if 'Diet_' in col]
        diets_list = diets[diets['Disease'] == disease][diets_columns].values[0]
        for idx, diet in enumerate(diets_list, 1):
            if pd.notna(diet):
                print(f"{idx} : {diet}")
    else:
        print('No diet information available')
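`predict_disease` silently ignores any symptom that does not exactly match a column name, so an input such as `'skin rash'` has no effect if the column is `skin_rash`. The sketch below shows one possible way to normalize inputs and report the ones that still do not match. The helper `build_symptom_vector`, the space-to-underscore mapping, and the column list are assumptions for illustration, not the notebook's method.

```python
import pandas as pd

def build_symptom_vector(symptoms, feature_columns):
    """Return a one-row 0/1 frame and the list of symptoms that matched no column."""
    row = pd.DataFrame(0, index=[0], columns=list(feature_columns))
    unmatched = []
    for symptom in symptoms:
        # Assumed normalization: trim, lower-case, spaces to underscores
        key = symptom.strip().lower().replace(" ", "_")
        if key in row.columns:
            row.loc[0, key] = 1
        else:
            unmatched.append(symptom)
    return row, unmatched

# Hypothetical feature columns standing in for X_train.columns
feature_columns = ["itching", "skin_rash", "headache"]
vector, unmatched = build_symptom_vector(
    ["Itching", "skin rash", "itchy blister"], feature_columns
)

print(vector)
print("Unmatched symptoms:", unmatched)
```

Reporting unmatched symptoms makes it visible when a prediction is based on fewer inputs than the user supplied.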

In [32]:
# Get user input for symptoms
user_symptoms = ['fever', 'loss appetite', 'tiredness', 'headache', 'small', 'itchy blister']

# Predict the disease
predicted_disease = predict_disease(user_symptoms)

# Provide recommendations
provide_recommendations(predicted_disease)


Chickenpox

Chickenpox is a highly contagious viral infection causing an itchy, blister-like rash.

1 :  Get vaccinated (Varicella vaccine)
2 :  Avoid contact with infected individuals
3 :  Keep nails short to prevent scratching
4 :  Use antihistamines and calamine lotion for itch relief

1 :  Antiviral drugs
2 :  Pain relievers
3 :  Antihistamines
4 :  Calamine lotion
5 :  Antipyretics

1 : Stay hydrated
2 : Rest and conserve energy
3 : Use calamine lotion for itching
4 : Monitor for symptoms
5 : Practice good hygiene

1 : Comfort measures
2 : Hydration
3 : Antihistamines
4 : Calamine lotion
5 : Avoid scratching


In [33]:
# Get user input for symptoms
user_symptoms = ['headache', 'nausea', 'fatigue']

# Predict the disease
predicted_disease = predict_disease(user_symptoms)

# Provide recommendations
provide_recommendations(predicted_disease)


Dehydration

Dehydration occurs when the body loses more fluids than it takes in.

1 :  Drink plenty of fluids
2 :  Especially in hot weather or during illness
3 :  Eat foods high in water content
4 :  Seek medical attention for severe dehydration

1 :  Oral rehydration solution
2 :  IV fluids
3 :  Electrolyte replacement
4 :  Antipyretics
5 :  Antiemetics

1 : Stay hydrated
2 : Monitor for symptoms
3 : Consult a healthcare professional
4 : Engage in regular physical activity as tolerated
5 : Follow a balanced diet

1 : Oral rehydration solution
2 : Hydration
3 : Electrolyte-rich fluids
4 : Rest
5 : Balanced diet


In [34]:
# Get user input for symptoms
user_symptoms = ['chills', 'knee_pain', 'acidity']

# Predict the disease
predicted_disease = predict_disease(user_symptoms)

# Provide recommendations
provide_recommendations(predicted_disease)


Allergy

Allergy is an immune system reaction to a substance in the environment.

1 : apply calamine
2 : cover area with bandage
4 : use ice to compress itching

1 :  Antihistamines
2 :  Decongestants
3 :  Epinephrine
4 :  Corticosteroids
5 :  Immunotherapy

1 : Avoid allergenic foods
2 : Consume anti-inflammatory foods
3 : Include omega-3 fatty acids
4 : Stay hydrated
5 : Eat foods rich in vitamin C

1 : Elimination Diet
2 : Omega-3-rich foods
3 : Vitamin C-rich foods
4 : Quercetin-rich foods
5 : Probiotics


In [35]:
# Get user input for symptoms
user_symptoms = ['itching', 'skin rash', 'nodal skin eruptions']

# Predict the disease
predicted_disease = predict_disease(user_symptoms)

# Provide recommendations
provide_recommendations(predicted_disease)


Lymphoma

Lymphoma is a type of cancer that affects the lymphatic system.

1 :  Follow prescribed treatment plan (chemotherapy
2 :  radiation)
3 :  Manage side effects of treatment
4 :  Attend regular medical check-ups
5 :  Seek support from oncologists and support groups

1 :  Chemotherapy
2 :  Radiation therapy
3 :  Immunotherapy
4 :  Targeted therapy
5 :  Stem cell transplant

1 : Follow a treatment plan
2 : Engage in light physical activity
3 : Consume a balanced diet
4 : Stay hydrated
5 : Monitor for symptoms

1 : Balanced diet
2 : Hydration
3 : Nutritional supplements
4 : High-calorie foods
5 : Small


In [36]:
# Get user input for symptoms
user_symptoms = ['muscle_wasting', 'patches_in_throat', 'high_fever', 'extra_marital_contacts']

# Predict the disease
predicted_disease = predict_disease(user_symptoms)

# Provide recommendations
provide_recommendations(predicted_disease)


Aids

AIDS (Acquired Immunodeficiency Syndrome) is a disease caused by HIV that weakens the immune system.

1 : avoid open cuts
2 : wear ppe if possible
3 : consult doctor
4 : follow up

1 :  Antiretroviral drugs
2 :  Protease inhibitors
3 :  Integrase inhibitors
4 :  Entry inhibitors
5 :  Fusion inhibitors

1 : Follow a balanced and nutritious diet
2 : Include lean proteins
3 : Consume nutrient-rich foods
4 : Stay hydrated
5 : Include healthy fats

1 : Balanced Diet
2 : Protein-rich foods
3 : Fruits and vegetables
4 : Whole grains
5 : Healthy fats


# **Thanks**