# **AI Health Guard Visualization On Original Dataset**

# **Welcome to the AI Health Guard Research Paper**



> AI HEALTH GUARD
* Your Personalized Health Advisor. Predicts diseases, offers
tailored medical advice, workouts, and diet plans for holistic
well-being.

---

# **About Dataset**
### **Context**
---
> The data for "AI HEALTH GUARD" is from [Kaggle](https://www.kaggle.com/datasets/alokchoudhary2005/ai-health-guard), a platform for data scientists and machine learning engineers. This dataset includes 8 `csv` files:

**● Symptom-severity.csv:** Describes the severity of specific symptoms.

**● Original_Dataset.csv:** The main dataset used to train the machine learning model.

**● description.csv:** Gives detailed descriptions of the health conditions.

**● diets.csv:** Provides information about which diets are appropriate for various health conditions.

**● medications.csv:** Details which medications to take for each health condition, and when and how to take them.

**● precautions_df.csv:** Lists the different precautions that you are advised to adopt when facing various health conditions.

**● symtoms_df.csv:** Contains an exhaustive list of symptoms presented by different illnesses.

**● workout_df.csv:** Lists workout plans suited to an individual's specific health needs, combined with lifestyle advice.

### **Data Analysis Insight:**
* Insights from data analysis shed light on trends, patterns and correlations
between symptoms and health conditions.

### **Recommendation Generation:**
* The recommendation generation process involves analyzing user-input symptoms
and generating personalized health recommendations.


---

## **Conclusion**
* The AI Health Guard project represents a significant endeavor in utilizing data science and machine learning techniques to empower individuals in managing their health effectively. By leveraging advanced algorithms and personalized recommendations, the system aims to enhance healthcare outcomes and promote overall well-being.

In [None]:
# import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from wordcloud import WordCloud
import networkx as nx

import warnings
from sklearn.utils import shuffle
warnings.filterwarnings("ignore")

In [None]:
# import data
df = pd.read_csv('/content/drive/MyDrive/Data Science My Repository/Projects/AI Health Guard Research /AI Health Guard Datasets/Symptoms-Disease Datasets/Original_Dataset.csv')
df = shuffle(df, random_state=42)
df.head()

Unnamed: 0,Disease,Symptom_1,Symptom_2,Symptom_3,Symptom_4,Symptom_5,Symptom_6,Symptom_7,Symptom_8,Symptom_9,Symptom_10,Symptom_11,Symptom_12,Symptom_13,Symptom_14,Symptom_15,Symptom_16,Symptom_17
373,Acne,skin_rash,blackheads,scurring,,,,,,,,,,,,,,
4916,Acne,skin_rash,pus_filled_pimples,blackheads,scurring,,,,,,,,,,,,,
1550,Hyperthyroidism,fatigue,mood_swings,weight_loss,restlessness,sweating,diarrhoea,fast_heart_rate,excessive_hunger,muscle_weakness,irritability,abnormal_menstruation,,,,,,
3081,AIDS,muscle_wasting,patches_in_throat,high_fever,extra_marital_contacts,,,,,,,,,,,,,
3857,Chronic cholestasis,itching,vomiting,yellowish_skin,nausea,loss_of_appetite,abdominal_pain,yellowing_of_eyes,,,,,,,,,,


In [None]:
df.shape

(4920, 18)

* The dataset has `4920` rows and `18` columns.

In [None]:
# characteristics of data
df.describe()

Unnamed: 0,Disease,Symptom_1,Symptom_2,Symptom_3,Symptom_4,Symptom_5,Symptom_6,Symptom_7,Symptom_8,Symptom_9,Symptom_10,Symptom_11,Symptom_12,Symptom_13,Symptom_14,Symptom_15,Symptom_16,Symptom_17
count,4920,4920,4920,4920,4572,3714,2934,2268,1944,1692,1512,1194,744,504,306,240,192,72
unique,41,34,48,54,50,38,32,26,21,22,21,18,11,8,4,3,3,1
top,Acne,vomiting,vomiting,fatigue,high_fever,headache,nausea,abdominal_pain,abdominal_pain,yellowing_of_eyes,yellowing_of_eyes,irritability,malaise,stomach_bleeding,chest_pain,chest_pain,loss_of_smell,muscle_pain
freq,120,822,870,726,378,348,390,264,276,228,198,120,126,72,96,144,72,72


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4920 entries, 373 to 860
Data columns (total 18 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Disease     4920 non-null   object
 1   Symptom_1   4920 non-null   object
 2   Symptom_2   4920 non-null   object
 3   Symptom_3   4920 non-null   object
 4   Symptom_4   4572 non-null   object
 5   Symptom_5   3714 non-null   object
 6   Symptom_6   2934 non-null   object
 7   Symptom_7   2268 non-null   object
 8   Symptom_8   1944 non-null   object
 9   Symptom_9   1692 non-null   object
 10  Symptom_10  1512 non-null   object
 11  Symptom_11  1194 non-null   object
 12  Symptom_12  744 non-null    object
 13  Symptom_13  504 non-null    object
 14  Symptom_14  306 non-null    object
 15  Symptom_15  240 non-null    object
 16  Symptom_16  192 non-null    object
 17  Symptom_17  72 non-null     object
dtypes: object(18)
memory usage: 730.3+ KB


In [None]:
# check null values
null_checker = df.isnull().sum().to_frame(name='count')
print(null_checker)

            count
Disease         0
Symptom_1       0
Symptom_2       0
Symptom_3       0
Symptom_4     348
Symptom_5    1206
Symptom_6    1986
Symptom_7    2652
Symptom_8    2976
Symptom_9    3228
Symptom_10   3408
Symptom_11   3726
Symptom_12   4176
Symptom_13   4416
Symptom_14   4614
Symptom_15   4680
Symptom_16   4728
Symptom_17   4848


In [None]:
# Assuming 'null_checker' is your DataFrame with the null values count
fig = px.line(null_checker, x=null_checker.index, y='count', title='Ratio of Null Values')
fig.update_layout(xaxis_title='Column Names', yaxis_title='Count of Null Values')
fig.update_xaxes(tickangle=45)
fig.add_annotation(xref="paper", yref="paper", showarrow=False, x=0.5, y=-0.2)
fig.show()

* The **Ratio of Null Values** graph shows that `Symptom_1` to `Symptom_3` have no null values, while the null count rises steadily from `Symptom_4` (`348`) to `Symptom_17` (`4848`).
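Since the chart above plots raw counts, a percentage per column can be easier to compare. The following is a minimal sketch on a toy frame (the values and the `null_pct` column name are illustrative assumptions, not taken from the dataset); the same expression could be applied to `df`.

```python
import pandas as pd

# Toy frame standing in for the real data (illustrative values only)
toy = pd.DataFrame({
    "Disease": ["Acne", "Acne", "AIDS", "GERD"],
    "Symptom_1": ["skin_rash", "skin_rash", "muscle_wasting", "symptom_x"],
    "Symptom_2": ["blackheads", None, "high_fever", None],
    "Symptom_3": [None, None, "patches_in_throat", None],
})

# isnull().mean() is the fraction of missing values per column;
# multiplying by 100 turns it into a percentage
null_ratio = toy.isnull().mean().mul(100).round(1).to_frame(name="null_pct")
print(null_ratio)
```

Plotting `null_pct` instead of `count` would put every column on a common 0 to 100 scale.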

In [None]:
df.head()

Unnamed: 0,Disease,Symptom_1,Symptom_2,Symptom_3,Symptom_4,Symptom_5,Symptom_6,Symptom_7,Symptom_8,Symptom_9,Symptom_10,Symptom_11,Symptom_12,Symptom_13,Symptom_14,Symptom_15,Symptom_16,Symptom_17
373,Acne,skin_rash,blackheads,scurring,,,,,,,,,,,,,,
4916,Acne,skin_rash,pus_filled_pimples,blackheads,scurring,,,,,,,,,,,,,
1550,Hyperthyroidism,fatigue,mood_swings,weight_loss,restlessness,sweating,diarrhoea,fast_heart_rate,excessive_hunger,muscle_weakness,irritability,abnormal_menstruation,,,,,,
3081,AIDS,muscle_wasting,patches_in_throat,high_fever,extra_marital_contacts,,,,,,,,,,,,,
3857,Chronic cholestasis,itching,vomiting,yellowish_skin,nausea,loss_of_appetite,abdominal_pain,yellowing_of_eyes,,,,,,,,,,


In [None]:
# Count the number of occurrences of each disease
disease_counts = df['Disease'].value_counts()
disease_counts

Disease
Acne                                       120
Pneumonia                                  120
Gastroenteritis                            120
Varicose veins                             120
Jaundice                                   120
Drug Reaction                              120
(vertigo) Paroymsal  Positional Vertigo    120
Heart attack                               120
Tuberculosis                               120
Typhoid                                    120
Common Cold                                120
Peptic ulcer diseae                        120
Paralysis (brain hemorrhage)               120
Fungal infection                           120
Impetigo                                   120
GERD                                       120
Dengue                                     120
Malaria                                    120
Chicken pox                                120
Hypothyroidism                             120
Hepatitis C                                120
Hyper

In [None]:
# Disease Frequency Distribution
# Plotting the frequency of each disease
fig = px.bar(disease_counts, x=disease_counts.index, y=disease_counts.values,
             title='Frequency of Each Disease', color=disease_counts.index)
fig.update_layout(xaxis_title='Disease', yaxis_title='Count',
                  xaxis={'categoryorder':'total descending'})
fig.show()

* The **Frequency of Each Disease** plot shows that the dataset is balanced: each of the `41` diseases appears `120` times.
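The balance can also be checked numerically rather than read off the chart. This is a small sketch on toy labels (the `toy` frame and the `is_balanced` name are assumptions for illustration); on the real data the same check would run on `df['Disease']`.

```python
import pandas as pd

# Toy label column (illustrative only)
toy = pd.DataFrame({"Disease": ["Acne", "Acne", "GERD", "GERD", "Dengue", "Dengue"]})

counts = toy["Disease"].value_counts()

# If every class has the same count, the counts contain one distinct value
is_balanced = counts.nunique() == 1
print(counts.min(), counts.max(), is_balanced)
```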

In [None]:
# Heatmap of Symptoms by Disease
# Prepare data for heatmap
symptom_columns = df.columns[1:]
disease_symptom_matrix = df.melt(id_vars=['Disease'], value_vars=symptom_columns, var_name='Symptom', value_name='Presence')
disease_symptom_matrix['Presence'] = disease_symptom_matrix['Presence'].notnull().astype(int)

# Create a pivot table
pivot_table = disease_symptom_matrix.pivot_table(index='Disease', columns='Symptom', values='Presence', aggfunc='sum', fill_value=0)

trace = go.Heatmap(z=pivot_table.values, x=pivot_table.columns.tolist(), y=pivot_table.index.tolist(), colorscale='RdBu', colorbar=dict(title='Presence of Symptom'))

layout = go.Layout(title='Heatmap of Symptoms by Disease', xaxis=dict(title='Symptom', ticks='', side='top'), yaxis=dict(title='Disease', ticks='', ticksuffix=' '),
    margin=dict(l=70, b=50, t=90, r=50), hovermode='closest',
    plot_bgcolor='rgba(240,240,240,1)', # Neutral grey background
    paper_bgcolor='rgba(240,240,240,1)' )

fig = go.Figure(data=[trace], layout=layout)
fig.show()

* **Heatmap of Symptoms by Disease** interpretation (each cell counts the records of a disease that have a value in that symptom column):

    * `Red` cells: `zero` records, i.e. the disease never fills that symptom column.
    * `White` cells: roughly `55 to 70` records.
    * `Blue` cells: roughly `90 to 120` records.
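The heatmap above is indexed by symptom column (`Symptom_1` ... `Symptom_17`), not by symptom name. If a disease-by-symptom-name view is wanted, one possible approach is to melt the frame to long form and cross-tabulate. The sketch below uses a toy frame; `long_form` and `name_matrix` are hypothetical names, and the `str.strip()` call is only a precaution in case the raw strings carry stray whitespace.

```python
import pandas as pd

# Toy frame in the same wide layout as the dataset (illustrative values only)
toy = pd.DataFrame({
    "Disease": ["Acne", "Acne", "AIDS"],
    "Symptom_1": ["skin_rash", "skin_rash", "muscle_wasting"],
    "Symptom_2": ["blackheads", "pus_filled_pimples", "high_fever"],
    "Symptom_3": ["scurring", "blackheads", None],
})

# Wide -> long: one row per (Disease, symptom name), dropping empty slots
long_form = (
    toy.melt(id_vars="Disease", value_name="symptom")
       .dropna(subset=["symptom"])
       .assign(symptom=lambda d: d["symptom"].str.strip())
)

# Rows are diseases, columns are symptom names, cells are record counts
name_matrix = pd.crosstab(long_form["Disease"], long_form["symptom"])
print(name_matrix)
```

`name_matrix.values`, `name_matrix.columns` and `name_matrix.index` could then be passed to `go.Heatmap` in the same way as `pivot_table`.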

In [None]:
# Pairplot for Symptoms
# Select a subset of symptom columns to visualize relationships between them
subset_symptoms = ['Symptom_1', 'Symptom_2', 'Symptom_3', 'Symptom_4', 'Symptom_5']  # Select relevant symptoms for pairplot
# Keep rows with missing symptoms: dropping them first would turn every flag below into 1
subset_df = df[['Disease'] + subset_symptoms].copy()

# Convert symptoms to binary presence/absence for the pairplot
for symptom in subset_symptoms:
    subset_df[symptom] = subset_df[symptom].notnull().astype(int)

fig = px.scatter_matrix(subset_df, dimensions=subset_symptoms, color='Disease', title='Representation of the Presence or Absence of Symptoms in Relation to different diseases.',
                        labels={col: col.replace('_', ' ') for col in subset_symptoms})  # Replace underscores with spaces for better readability

fig.update_traces(diagonal_visible=False)
fig.show()

* The plot shows relationships between five symptom columns across different diseases. Key points:
    * **Axes:** Each axis represents one of the five selected symptoms (Symptom 1 to Symptom 5).

    * **Points:** Each point in the scatter matrix represents a data entry (e.g., a patient).

    * **Colors:** Points are colored according to the Disease column, which indicates different diseases. The legend on the right lists the diseases with their corresponding colors.
    
    * **Absence/Presence:** Since symptoms are binary (0 or 1), the plots show whether symptoms are present (1) or absent (0) for the pairs of symptoms.

In [None]:
# Combine all symptoms into one series
all_symptoms = df[symptom_columns].values.flatten()
all_symptoms = [symptom for symptom in all_symptoms if pd.notna(symptom)]

# Generate word cloud
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(all_symptoms))

# Convert word cloud to array
wordcloud_array = wordcloud.to_array()

fig = go.Figure()
# Add trace for word cloud image
fig.add_trace(go.Image(z=wordcloud_array))
# Set axes properties
fig.update_xaxes(showgrid=False, showticklabels=False, zeroline=False)
fig.update_yaxes(showgrid=False, showticklabels=False, zeroline=False)
# Add layout properties
fig.update_layout(title_text='Word Cloud of Symptoms', title_x=0.5, width=800, height=400, margin={'l': 0, 'r': 0, 't': 30, 'b': 0}, clickmode='event+select' )

fig.show()


* The larger a word appears in the cloud, the more often that symptom occurs in the dataset.

In [None]:
# Frequency Plot of Top Symptoms
# Flatten the symptom columns into a single list
symptom_list = df[symptom_columns].values.flatten()
symptom_list = [symptom for symptom in symptom_list if pd.notna(symptom)]

# Count the frequency of each symptom
symptom_counts = pd.Series(symptom_list).value_counts().head(20)

fig = px.bar(symptom_counts, x=symptom_counts.values, y=symptom_counts.index, orientation='h',
             title='Most Prevalent Symptoms in the Dataset', color=symptom_counts.index)
fig.update_layout(xaxis_title='Count', yaxis_title='Symptom')
fig.show()

* The most frequently reported symptom in the dataset is `fatigue`, followed by `vomiting`, `high_fever`, `loss_of_appetite`, `nausea`, and others.

In [None]:
#  Symptom Co-occurrence Heatmap
# Create a dataframe for symptoms
symptom_columns = df.columns[1:]
symptom_df = df[symptom_columns].notna().astype(int)

# Calculate co-occurrence matrix
co_occurrence_matrix = symptom_df.T.dot(symptom_df)

fig = px.imshow(co_occurrence_matrix, labels=dict(x="Symptoms", y="Symptoms", color="Co-occurrence"), x=symptom_columns, y=symptom_columns, text_auto=True)

fig.update_xaxes(side="top")
fig.update_layout(title='Symptom Co-occurrence Matrix', title_x=0.5, coloraxis_colorbar=dict(title='Co-occurrence'),
    autosize=False, width=1300, height=700, margin=dict(l=10, r=10, b=10, t=30))

fig.show()

* In the heatmap, each row and each column represents a symptom column (`Symptom_1` to `Symptom_17`). The value at the intersection of a row and a column is the number of records in which both symptom columns are filled, and the diagonal gives the number of records in which that column is filled.
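The matrix above counts co-occurrence between symptom columns. For co-occurrence between symptom names, one option is to build a binary record-by-symptom indicator matrix first and then take the same `X.T.dot(X)` product. The sketch below uses a toy frame; `record`, `long_form` and `indicator` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Toy symptom columns (illustrative values only)
toy = pd.DataFrame({
    "Symptom_1": ["itching", "itching", "vomiting"],
    "Symptom_2": ["vomiting", "nausea", "nausea"],
    "Symptom_3": ["nausea", None, None],
})

# Wide -> long: one row per (record id, symptom name), dropping empty slots
long_form = (
    toy.rename_axis("record").reset_index()
       .melt(id_vars="record", value_name="symptom")
       .dropna(subset=["symptom"])
)

# Binary indicator: 1 if the record mentions the symptom at least once
indicator = pd.crosstab(long_form["record"], long_form["symptom"]).clip(upper=1)

# Entry (a, b) counts the records that contain both symptom a and symptom b
co_occurrence = indicator.T.dot(indicator)
print(co_occurrence)
```

The diagonal of `co_occurrence` holds the number of records containing each symptom, so it doubles as a frequency table.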

In [None]:
# Symptom Distribution per Disease
disease = 'Common Cold'  # Replace with any disease of interest
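# Count non-null entries per symptom column for the chosen disease, sorted for plotting
symptom_counts = df[df['Disease'] == disease].iloc[:, 1:].notna().sum().sort_values()

# Use the sorted index for both the bars and the colors so they stay aligned
fig = px.bar(x=symptom_counts.values, y=symptom_counts.index, orientation='h', color=symptom_counts.index,
             labels={'x': 'Count', 'y': 'Symptoms'}, title=f'Symptom Distribution for {disease}')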
fig.update_layout(xaxis_title='Count', yaxis_title='Symptoms')
fig.show()

* The bar chart shows the distribution of `Common Cold` symptoms. The x-axis shows the number of records in which a symptom column is filled, and the y-axis shows the symptom columns. For example, `Symptom_1` is filled in all 120 `Common Cold` records.

In [None]:
# Disease-Symptom Network Graph with Interactive Plot
B = nx.Graph()
# Add nodes with the node attribute "bipartite"
diseases = df['Disease'].unique()
symptoms = pd.melt(df, id_vars=['Disease'], value_vars=df.columns[1:]).dropna()['value'].unique()

B.add_nodes_from(diseases, bipartite=0)
B.add_nodes_from(symptoms, bipartite=1)
# Add edges between diseases and symptoms
edges = []
for index, row in df.iterrows():
    for symptom in df.columns[1:]:
        if pd.notna(row[symptom]):
            edges.append((row['Disease'], row[symptom]))

B.add_edges_from(edges)
# Get positions for the nodes in B
pos = nx.spring_layout(B)
# Extract the edge and node information
edge_x = []
edge_y = []
for edge in B.edges():
    x0, y0 = pos[edge[0]]
    x1, y1 = pos[edge[1]]
    edge_x.append(x0)
    edge_x.append(x1)
    edge_x.append(None)
    edge_y.append(y0)
    edge_y.append(y1)
    edge_y.append(None)
edge_trace = go.Scatter(x=edge_x, y=edge_y, line=dict(width=0.5, color='#888'), hoverinfo='none', mode='lines')

node_x = []
node_y = []
node_text = []
for node in B.nodes():
    x, y = pos[node]
    node_x.append(x)
    node_y.append(y)
    node_text.append(node)

node_trace = go.Scatter(x=node_x, y=node_y, mode='markers+text', text=node_text, textposition='top center', hoverinfo='text', marker=dict(color=[], size=10, line=dict(width=2)))
node_trace.marker.color = ['blue' if node in diseases else 'red' for node in B.nodes()]

fig = go.Figure(data=[edge_trace, node_trace],
                layout=go.Layout(title=dict(text='<br>Disease-Symptom Network Graph', font=dict(size=16)), showlegend=False, hovermode='closest', margin=dict(b=20,l=5,r=5,t=40),
                    annotations=[ dict(text="Alok Choudhary", showarrow=False, xref="paper", yref="paper")],
                    xaxis=dict(showgrid=False, zeroline=False), yaxis=dict(showgrid=False, zeroline=False)))
fig.show()

* The network graph shows connections between diseases (blue nodes) and symptoms (red nodes). An edge means the symptom appears in at least one record of that disease.
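Beyond the visual layout, the same bipartite graph can be queried directly, for example to find which symptoms are shared by the most diseases. This is a minimal sketch on a toy edge list (the pairs and the names `toy_edges`, `symptom_nodes` and `shared` are illustrative assumptions); on the real graph `B` the degree query would be the same.

```python
import networkx as nx

# Toy (disease, symptom) pairs, illustrative only
toy_edges = [
    ("Acne", "skin_rash"), ("Acne", "blackheads"),
    ("Drug Reaction", "skin_rash"), ("Drug Reaction", "itching"),
    ("Fungal infection", "skin_rash"), ("Fungal infection", "itching"),
]

B = nx.Graph()
B.add_nodes_from({d for d, _ in toy_edges}, bipartite=0)  # disease side
B.add_nodes_from({s for _, s in toy_edges}, bipartite=1)  # symptom side
B.add_edges_from(toy_edges)

# A symptom's degree is the number of distinct diseases it is linked to
symptom_nodes = [n for n, data in B.nodes(data=True) if data["bipartite"] == 1]
shared = sorted(B.degree(symptom_nodes), key=lambda pair: pair[1], reverse=True)
print(shared)
```

Symptoms with a high degree are weak discriminators on their own, since many diseases share them.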

In [None]:
# Pie Chart of Top Symptoms
# Count the frequency of each symptom
symptom_counts = df.iloc[:, 1:].stack().value_counts()

# Plot the top 10 symptoms
top_symptoms = symptom_counts.head(10)

fig = px.pie(top_symptoms, values=top_symptoms.values, names=top_symptoms.index, title='Prevalence of the Top 10 Symptoms')
fig.update_traces(textinfo='percent+label', pull=[0.1]*10)
fig.show()

* The pie chart shows each symptom's share of occurrences among the ten most frequent symptoms in the whole dataset. `Fatigue` has the largest slice at `15.8%`. Other symptoms include `high fever`, `loss of appetite`, `nausea`, `headache`, `abdominal pain`, `yellowish skin`, and `vomiting`.
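A pie chart normalizes by the sum of the slices it is given, so a percentage on a top-N pie is a share of the top N, not of all symptom occurrences. The arithmetic can be sketched as follows, using made-up counts (the numbers and the names `pie_share` and `overall_share` are assumptions for illustration only).

```python
import pandas as pd

# Made-up symptom counts (not taken from the dataset)
symptom_counts = pd.Series(
    {"fatigue": 60, "nausea": 50, "vomiting": 40, "itching": 30, "cough": 20}
)

top = symptom_counts.nlargest(3)

# Share shown on the pie: normalized by the top-N total only
pie_share = top / top.sum() * 100

# Share of all symptom occurrences: normalized by the grand total
overall_share = top / symptom_counts.sum() * 100

print(pie_share.round(1))
print(overall_share.round(1))
```

In this toy example `fatigue` takes 40% of the pie but only 30% of all occurrences, which is why the two figures should not be used interchangeably.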

In [None]:
# Pie Chart of Disease Distribution
# Count the frequency of each disease
disease_counts = df['Disease'].value_counts().reset_index()
disease_counts.columns = ['Disease', 'Count']

# Create an interactive pie chart with improved readability
fig = px.pie(disease_counts, values='Count', names='Disease', title='Disease Distribution Chart', hover_data=['Disease'], labels={'Disease':'Disease Name'})
fig.update_traces(textposition='inside', textinfo='percent+label', textfont_size=20, marker=dict(line=dict(color='#000000', width=2)))
fig.update_layout(showlegend=True, legend_title_text='Disease')
fig.show()

* The pie chart shows the distribution of diseases. Each slice represents one disease, and the size of the slice corresponds to the number of records for that disease.

## **Thanks**