<a href="https://colab.research.google.com/github/abirinpajamas/adhd-prediction-ml-model/blob/main/adhd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


This dataset contains 6,500 rows and 32 columns and is intended for mental and behavioral health analysis, specifically for diagnosing Attention Deficit Hyperactivity Disorder (ADHD) and its subtypes. It combines demographic data (such as Age, Gender, Educational_Level, and Family_History) with behavioral indicators such as Sleep_Hours, Daily_Activity_Hours, Daily_Phone_Usage_Hours, and Daily_Coffee_Tea_Consumption. It also includes two nine-item questionnaires (Q1_1 to Q1_9 for hyperactivity, Q2_1 to Q2_9 for inattention) and the Diagnosis_Class label.

Collaborator: Ahmed Mohamed

In [27]:
df=pd.read_csv("adhd_data.csv")
df.head()

Unnamed: 0,Age,Gender,Educational_Level,Family_History,Sleep_Hours,Daily_Activity_Hours,Q1_1,Q1_2,Q1_3,Q1_4,...,Q2_8,Q2_9,Diagnosis_Class,Daily_Phone_Usage_Hours,Daily_Walking_Running_Hours,Difficulty_Organizing_Tasks,Focus_Score_Video,Daily_Coffee_Tea_Consumption,Learning_Difficulties,Anxiety_Depression_Levels
0,8,1,Primary,No,8,7,0,0,0,1,...,0,1,0,2,0.5,0,5,1,0,0
1,9,2,Primary,No,11,7,3,2,2,3,...,2,3,3,2,0.9,1,6,0,1,3
2,9,1,Primary,No,9,5,3,2,3,3,...,2,3,3,2,1.4,1,3,0,1,3
3,5,2,Kindergarten,Yes,7,11,3,3,3,2,...,0,0,2,6,0.6,1,6,0,1,1
4,13,1,Middle,No,3,0,3,2,3,3,...,1,0,2,4,1.0,1,5,1,1,2


Categorical encoding of the "Gender" and "Family_History" columns

In [28]:
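# Gender is stored as 1/2; map it to labels (1 = Male, 2 = Female) and store it as a category
df['Gender'] = df['Gender'].map({1: 'Male', 2: 'Female'}).astype('category')
# Family_History holds "Yes"/"No" strings; store it as an unordered category
df['Family_History'] = df['Family_History'].astype('category')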


In [29]:
df.head()

Unnamed: 0,Age,Gender,Educational_Level,Family_History,Sleep_Hours,Daily_Activity_Hours,Q1_1,Q1_2,Q1_3,Q1_4,...,Q2_8,Q2_9,Diagnosis_Class,Daily_Phone_Usage_Hours,Daily_Walking_Running_Hours,Difficulty_Organizing_Tasks,Focus_Score_Video,Daily_Coffee_Tea_Consumption,Learning_Difficulties,Anxiety_Depression_Levels
0,8,Male,Primary,No,8,7,0,0,0,1,...,0,1,0,2,0.5,0,5,1,0,0
1,9,Female,Primary,No,11,7,3,2,2,3,...,2,3,3,2,0.9,1,6,0,1,3
2,9,Male,Primary,No,9,5,3,2,3,3,...,2,3,3,2,1.4,1,3,0,1,3
3,5,Female,Kindergarten,Yes,7,11,3,3,3,2,...,0,0,2,6,0.6,1,6,0,1,1
4,13,Male,Middle,No,3,0,3,2,3,3,...,1,0,2,4,1.0,1,5,1,1,2


In [30]:
print(df['Gender'].dtype)


category
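
A pandas `category` dtype labels the values but is not yet numeric model input. The sketch below runs on a hypothetical toy frame (not the real adhd_data.csv) and shows two things worth checking after this step: codes missing from the `map` dictionary silently become NaN, and `pd.get_dummies` is one common way to expand categories into 0/1 columns, assuming the downstream model needs numeric features.

```python
import pandas as pd

# Hypothetical toy frame standing in for adhd_data.csv (values are illustrative only).
toy = pd.DataFrame({
    "Gender": [1, 2, 1, 3],  # 3 is a deliberately unexpected code
    "Family_History": ["No", "Yes", "No", "No"],
})

# Same mapping as the notebook; any code missing from the dict becomes NaN.
toy["Gender"] = toy["Gender"].map({1: "Male", 2: "Female"}).astype("category")
toy["Family_History"] = toy["Family_History"].astype("category")

# Count values that failed to map, so silent NaNs are caught early.
n_unmapped = toy["Gender"].isna().sum()
print("Unmapped Gender codes:", n_unmapped)

# Expand each category into its own 0/1 indicator column (NaN rows get all zeros).
encoded = pd.get_dummies(toy, columns=["Gender", "Family_History"], dtype=int)
print(encoded.columns.tolist())
```

Whether one-hot encoding or integer codes suit the final model is a design choice this notebook does not settle.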


In [31]:
# List the distinct education labels before choosing an order for them
edu_cat = df['Educational_Level'].unique()
print(edu_cat)

['Primary' 'Kindergarten' 'Middle' 'Secondary' 'Working' 'University'
 'Not Working']
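
Before casting to an ordered `CategoricalDtype`, it can help to confirm that every observed label appears in the planned order, because `.astype()` silently turns unlisted labels into NaN. A minimal check, using the labels printed above and the order defined in the next cell:

```python
import numpy as np

# Labels printed by df['Educational_Level'].unique() above.
edu_cat = np.array(['Primary', 'Kindergarten', 'Middle', 'Secondary',
                    'Working', 'University', 'Not Working'])
# Planned ordinal order, as used in the next cell.
education_order = ['Kindergarten', 'Primary', 'Middle', 'Secondary',
                   'University', 'Working', 'Not Working']

# Any label missing from the order would become NaN after .astype(CategoricalDtype).
missing = sorted(set(edu_cat) - set(education_order))
print("Labels not covered by the order:", missing)
```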


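Ordinal (ordered categorical) encoding of the "Educational_Level" column. Note that 'Working' and 'Not Working' describe employment status rather than schooling, so placing them after 'University' is a modelling choice, not a natural ranking.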

In [32]:
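from pandas.api.types import CategoricalDtype

# Explicit order from the earliest schooling stage upward; 'Working' and 'Not Working' are placed last
education_order = ['Kindergarten', 'Primary', 'Middle', 'Secondary', 'University', 'Working', 'Not Working']
edu_cat_type = CategoricalDtype(categories=education_order, ordered=True)
# Any label not listed in education_order would become NaN here
df['Educational_Level'] = df['Educational_Level'].astype(edu_cat_type)
print("Categories:", df['Educational_Level'].cat.categories)
df.head()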

Categories: Index(['Kindergarten', 'Primary', 'Middle', 'Secondary', 'University',
       'Working', 'Not Working'],
      dtype='object')


Unnamed: 0,Age,Gender,Educational_Level,Family_History,Sleep_Hours,Daily_Activity_Hours,Q1_1,Q1_2,Q1_3,Q1_4,...,Q2_8,Q2_9,Diagnosis_Class,Daily_Phone_Usage_Hours,Daily_Walking_Running_Hours,Difficulty_Organizing_Tasks,Focus_Score_Video,Daily_Coffee_Tea_Consumption,Learning_Difficulties,Anxiety_Depression_Levels
0,8,Male,Primary,No,8,7,0,0,0,1,...,0,1,0,2,0.5,0,5,1,0,0
1,9,Female,Primary,No,11,7,3,2,2,3,...,2,3,3,2,0.9,1,6,0,1,3
2,9,Male,Primary,No,9,5,3,2,3,3,...,2,3,3,2,1.4,1,3,0,1,3
3,5,Female,Kindergarten,Yes,7,11,3,3,3,2,...,0,0,2,6,0.6,1,6,0,1,1
4,13,Male,Middle,No,3,0,3,2,3,3,...,1,0,2,4,1.0,1,5,1,1,2
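
Once the column has an ordered categorical dtype, `.cat.codes` yields integer ranks that follow `education_order`, and the ordering makes comparisons against a level possible. A small sketch on hypothetical sample values (not rows from the dataset):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

education_order = ['Kindergarten', 'Primary', 'Middle', 'Secondary',
                   'University', 'Working', 'Not Working']
edu_cat_type = CategoricalDtype(categories=education_order, ordered=True)

# Hypothetical sample of education labels.
levels = pd.Series(['Primary', 'Kindergarten', 'University', 'Middle']).astype(edu_cat_type)

# Integer codes are positions in education_order (Kindergarten = 0, Primary = 1, ...).
codes = levels.cat.codes
print(codes.tolist())

# ordered=True enables comparisons such as "at least Middle school".
at_least_middle = levels >= 'Middle'
print(at_least_middle.tolist())
```

Integer codes like these impose equal spacing between levels, which a tree-based model tolerates better than a linear one.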


Summing the nine hyperactivity items (Q1_1 to Q1_9) and the nine inattention items (Q2_1 to Q2_9) into one total score each per row

In [37]:
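# Q1_* items measure hyperactivity, Q2_* items measure inattention
hyperactivity_cols = ['Q1_1', 'Q1_2', 'Q1_3', 'Q1_4', 'Q1_5', 'Q1_6', 'Q1_7', 'Q1_8', 'Q1_9']
inattention_cols = ['Q2_1', 'Q2_2', 'Q2_3', 'Q2_4', 'Q2_5', 'Q2_6', 'Q2_7', 'Q2_8', 'Q2_9']

# Row-wise totals: one hyperactivity score and one inattention score per respondent
df['hyperactivity_score'] = df[hyperactivity_cols].sum(axis=1)
df['inattention_score'] = df[inattention_cols].sum(axis=1)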

In [38]:
df.head()

Unnamed: 0,Age,Gender,Educational_Level,Family_History,Sleep_Hours,Daily_Activity_Hours,Q1_1,Q1_2,Q1_3,Q1_4,...,Diagnosis_Class,Daily_Phone_Usage_Hours,Daily_Walking_Running_Hours,Difficulty_Organizing_Tasks,Focus_Score_Video,Daily_Coffee_Tea_Consumption,Learning_Difficulties,Anxiety_Depression_Levels,hyperactivity_score,inattention_score
0,8,Male,Primary,No,8,7,0,0,0,1,...,0,2,0.5,0,5,1,0,0,4,6
1,9,Female,Primary,No,11,7,3,2,2,3,...,3,2,0.9,1,6,0,1,3,21,24
2,9,Male,Primary,No,9,5,3,2,3,3,...,3,2,1.4,1,3,0,1,3,23,21
3,5,Female,Kindergarten,Yes,7,11,3,3,3,2,...,2,6,0.6,1,6,0,1,1,23,3
4,13,Male,Middle,No,3,0,3,2,3,3,...,2,4,1.0,1,5,1,1,2,23,5
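
`DataFrame.sum(axis=1)` skips NaN by default, so a skipped questionnaire item would silently count as 0 in these totals. The sketch below uses hypothetical answers (not rows from the dataset) to contrast the default with the `min_count` parameter, which returns NaN unless enough items are present; whether the real data contains missing answers is not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Build the item names programmatically: Q1_1 ... Q1_9.
hyperactivity_cols = [f"Q1_{i}" for i in range(1, 10)]

# Hypothetical answers for two respondents; the second one skipped item Q1_5.
answers = pd.DataFrame([[0, 0, 0, 1, 0, 1, 0, 1, 1],
                        [3, 2, 2, 3, np.nan, 3, 2, 2, 3]], columns=hyperactivity_cols)

# Default behaviour: NaN is skipped, so the missing answer counts as 0.
lenient = answers[hyperactivity_cols].sum(axis=1)

# min_count=9 returns NaN unless all nine items are answered.
strict = answers[hyperactivity_cols].sum(axis=1, min_count=len(hyperactivity_cols))
print(lenient.tolist(), strict.tolist())
```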


Dropping the individual questionnaire item columns, since their totals are now stored in hyperactivity_score and inattention_score

In [40]:
# The per-item columns are now summarized by the two score columns
drop_cols = inattention_cols + hyperactivity_cols
df.drop(columns=drop_cols, inplace=True)

In [41]:
df.head()

Unnamed: 0,Age,Gender,Educational_Level,Family_History,Sleep_Hours,Daily_Activity_Hours,Diagnosis_Class,Daily_Phone_Usage_Hours,Daily_Walking_Running_Hours,Difficulty_Organizing_Tasks,Focus_Score_Video,Daily_Coffee_Tea_Consumption,Learning_Difficulties,Anxiety_Depression_Levels,hyperactivity_score,inattention_score
0,8,Male,Primary,No,8,7,0,2,0.5,0,5,1,0,0,4,6
1,9,Female,Primary,No,11,7,3,2,0.9,1,6,0,1,3,21,24
2,9,Male,Primary,No,9,5,3,2,1.4,1,3,0,1,3,23,21
3,5,Female,Kindergarten,Yes,7,11,2,6,0.6,1,6,0,1,1,23,3
4,13,Male,Middle,No,3,0,2,4,1.0,1,5,1,1,2,23,5
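
Because `inplace=True` mutates `df`, re-running the drop cell raises a `KeyError` for the already-removed columns. One alternative, sketched here on a hypothetical one-row frame, is to reassign the result and pass `errors="ignore"` so the step is safe to repeat:

```python
import pandas as pd

hyperactivity_cols = [f"Q1_{i}" for i in range(1, 10)]
inattention_cols = [f"Q2_{i}" for i in range(1, 10)]
drop_cols = inattention_cols + hyperactivity_cols

# Hypothetical one-row frame with the 18 item columns plus the two totals.
toy = pd.DataFrame([[1] * 18 + [9, 9]],
                   columns=hyperactivity_cols + inattention_cols
                   + ["hyperactivity_score", "inattention_score"])

# Reassign instead of inplace=True; errors="ignore" makes a second run a no-op
# rather than a KeyError, which simulates re-executing the notebook cell.
toy = toy.drop(columns=drop_cols, errors="ignore")
toy = toy.drop(columns=drop_cols, errors="ignore")
print(toy.columns.tolist())
```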
