# NFHSense: AI-powered Health Data Assistant 🧠🇮🇳

Welcome to **NFHSense**, an intelligent health analytics tool built using the **NFHS-5 District-Level Dataset**.

*To begin the analysis, I imported the essential Python libraries for data analysis and visualization: Pandas for handling and exploring the dataset, NumPy for numerical operations, and Matplotlib and Seaborn for creating visualizations. These libraries help in loading the data, analyzing patterns, handling missing values, and generating charts to better understand the insights hidden in the data.*

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

*I loaded the NFHS-5 dataset into a Pandas DataFrame from an Excel file. This allows me to work with the data in a structured, tabular format within Python.*

In [9]:
dataset_path = "dataset/NFHS_5_India_Districts_Factsheet_Data.xlsx"
df = pd.read_excel(dataset_path, sheet_name="Sheet1")

*Let's quickly preview the data and understand its structure.*

In [10]:
df.head()

Unnamed: 0,District Names,State/UT,Number of Households surveyed,Number of Women age 15-49 years interviewed,Number of Men age 15-54 years interviewed,Female population age 6 years and above who ever attended school (%),Population below age 15 years (%),"Sex ratio of the total population (females per 1,000 males)","Sex ratio at birth for children born in the last five years (females per 1,000 males)",Children under age 5 years whose birth was registered with the civil authority (%),...,Men age 15 years and above wih Mildly elevated blood pressure (Systolic 140-159 mm of Hg and/or Diastolic 90-99 mm of Hg) (%),Men age 15 years and above wih Moderately or severely elevated blood pressure (Systolic ≥160 mm of Hg and/or Diastolic ≥100 mm of Hg) (%),Men age 15 years and above wih Elevated blood pressure (Systolic ≥140 mm of Hg and/or Diastolic ≥90 mm of Hg) or taking medicine to control blood pressure (%),Women (age 30-49 years) Ever undergone a screening test for cervical cancer (%),Women (age 30-49 years) Ever undergone a breast examination for breast cancer (%),Women (age 30-49 years) Ever undergone an oral cavity examination for oral cancer (%),Women age 15 years and above who use any kind of tobacco (%),Men age 15 years and above who use any kind of tobacco (%),Women age 15 years and above who consume alcohol (%),Men age 15 years and above who consume alcohol (%)
0,Nicobars,Andaman & Nicobar Islands,882,764,125,78.01,22.98,973.31,927.41,98.01,...,32.86,11.06,46.97,13.35,13.16,5.37,63.46,76.79,29.59,64.49
1,North & Middle Andaman,Andaman & Nicobar Islands,874,789,108,82.66,19.82,949.82,844.43,100.0,...,22.62,5.97,32.2,1.7,0.25,15.84,46.77,70.47,5.08,45.26
2,South Andaman,Andaman & Nicobar Islands,868,844,134,84.68,20.95,967.48,934.92,96.53,...,17.88,6.09,26.9,1.32,0.67,8.0,19.6,50.76,1.72,32.81
3,Srikakulam,Andhra Pradesh,874,780,100,59.99,20.71,1139.51,1162.58,94.96,...,14.37,5.51,22.89,1.04,0.23,3.76,7.09,21.32,0.55,28.26
4,Vizianagaram,Andhra Pradesh,902,853,134,55.95,20.56,1114.35,898.03,95.42,...,14.83,6.43,25.13,4.9,0.63,7.33,11.41,21.47,0.8,32.3


*Each row in the dataset represents a district and has 109 columns: the district and state/UT names, the survey sample sizes, and a wide range of health, demographic, and social indicators.
To make the dataset easier to work with, I plan to rename these columns to shorter, more meaningful names in the upcoming step.*

*I'm manually creating new, short column names for the dataset and preserving the old column names in a data dictionary for future reference.*


In [11]:
data_dictionary = {
    'district': 'District Names',
    'state_ut': 'State/UT',
    'households_surveyed': 'Number of Households surveyed',
    'women_15_49_interviewed': 'Number of Women age 15-49 years interviewed',
    'men_15_54_interviewed': 'Number of Men age 15-54 years interviewed',
    'female_6plus_school_pct': 'Female population age 6 years and above who ever attended school (%)',
    'population_below_15_pct': 'Population below age 15 years (%)',
    'sex_ratio_total': ' Sex ratio of the total population (females per 1,000 males)',
    'sex_ratio_birth_last5yrs': 'Sex ratio at birth for children born in the last five years (females per 1,000 males)',
    'birth_registered_under5_pct': 'Children under age 5 years whose birth was registered with the civil authority (%)',
    'deaths_registered_3yrs_pct': 'Deaths in the last 3 years registered with the civil authority (%)',
    'electricity_access_pct': 'Population living in households with electricity (%)',
    'improved_water_source_pct': 'Population living in households with an improved drinking-water source1 (%)',
    'improved_sanitation_pct': 'Population living in households that use an improved sanitation facility2 (%)',
    'clean_cooking_fuel_pct': 'Households using clean fuel for cooking3 (%)',
    'iodized_salt_pct': 'Households using iodized salt (%)',
    'health_insurance_coverage_pct': 'Households with any usual member covered under a health insurance/financing scheme (%)',
    'preprimary_school_age5_pct': 'Children age 5 years who attended pre-primary school during the school year 2019-20 (%)',
    'women_literate_15_49_pct': 'Women (age 15-49) who are literate4 (%)',
    'women_10plus_education_pct': 'Women (age 15-49)  with 10 or more years of schooling (%)',
    'women_20_24_married_before_18_pct': 'Women age 20-24 years married before age 18 years (%)',
    'third_or_higher_order_births_pct': 'Births in the 5 years preceding the survey that are third or higher order (%)',
    'teen_mothers_15_19_pct': 'Women age 15-19 years who were already mothers or pregnant at the time of the survey (%)',
    'women_15_24_hygienic_menstrual_pct': 'Women age 15-24 years who use hygienic methods of protection during their menstrual period5 (%)',
    'family_planning_any_method_pct': 'Current Use of Family Planning Methods (Currently Married Women Age 15-49  years) - Any method6 (%)',
    'family_planning_modern_method_pct': 'Current Use of Family Planning Methods (Currently Married Women Age 15-49  years) - Any modern method6 (%)',
    'female_sterilization_pct': 'Current Use of Family Planning Methods (Currently Married Women Age 15-49  years) - Female sterilization (%)',
    'male_sterilization_pct': 'Current Use of Family Planning Methods (Currently Married Women Age 15-49  years) - Male sterilization (%)',
    'iud_ppiud_pct': 'Current Use of Family Planning Methods (Currently Married Women Age 15-49  years) - IUD/PPIUD (%)',
    'pill_usage_pct': 'Current Use of Family Planning Methods (Currently Married Women Age 15-49  years) - Pill (%)',
    'condom_usage_pct': 'Current Use of Family Planning Methods (Currently Married Women Age 15-49  years) - Condom (%)',
    'injectables_usage_pct': 'Current Use of Family Planning Methods (Currently Married Women Age 15-49  years) - Injectables (%)',
    'unmet_need_total_pct': 'Total Unmet need for Family Planning (Currently Married Women Age 15-49  years)7 (%)',
    'unmet_need_spacing_pct': 'Unmet need for spacing (Currently Married Women Age 15-49  years)7 (%)',
    'health_worker_fp_talk_pct': 'Health worker ever talked to female non-users about family planning (%)',
    'side_effects_explained_pct': 'Current users ever told about side effects of current method of family planning8 (%)',
    'anc_first_trimester_pct': 'Mothers who had an antenatal check-up in the first trimester  (for last birth in the 5 years before the survey) (%)',
    'anc_4plus_visits_pct': 'Mothers who had at least 4 antenatal care visits  (for last birth in the 5 years before the survey) (%)',
    'tetanus_protection_pct': 'Mothers whose last birth was protected against neonatal tetanus (for last birth in the 5 years before the survey)9 (%)',
    'ifa_100_days_pct': 'Mothers who consumed iron folic acid for 100 days or more when they were pregnant (for last birth in the 5 years before the survey) (%)',
    'ifa_180_days_pct': 'Mothers who consumed iron folic acid for 180 days or more when they were pregnant (for last birth in the 5 years before the survey} (%)',
    'mcp_card_received_pct': 'Registered pregnancies for which the mother received a Mother and Child Protection (MCP) card (for last birth in the 5 years before the survey) (%)',
    'postnatal_care_mother_2days_pct': 'Mothers who received postnatal care from a doctor/nurse/LHV/ANM/midwife/other health personnel within 2 days of delivery (for last birth in the 5 years before the survey) (%)',
    'oop_expenditure_delivery_public_rs': 'Average out-of-pocket expenditure per delivery in a public health facility (for last birth in the 5 years before the survey) (Rs.)',
    'home_birth_checkup_24hrs_pct': 'Children born at home who were taken to a health facility for a check-up within 24 hours of birth (for last birth in the 5 years before the survey} (%))',
    'postnatal_care_child_2days_pct': 'Children who received postnatal care from a doctor/nurse/LHV/ANM/midwife/ other health personnel within 2 days of delivery (for last birth in the 5 years before the survey) (%)',
    'institutional_births_pct': 'Institutional births (in the 5 years before the survey) (%)',
    'institutional_births_public_pct': 'Institutional births in public facility (in the 5 years before the survey) (%)',
    'skilled_home_births_pct': 'Home births that were conducted by skilled health personnel  (in the 5 years before the survey)10 (%)',
    'skilled_birth_attendance_pct': 'Births attended by skilled health personnel (in the 5 years before the survey)10 (%)',
    'c_section_total_pct': 'Births delivered by caesarean section (in the 5 years before the survey) (%)',
    'c_section_private_pct': 'Births in a private health facility that were delivered by caesarean section (in the 5 years before the survey) (%)',
    'c_section_public_pct': 'Births in a public health facility that were delivered by caesarean section (in the 5 years before the survey) (%)',
    'fully_vaccinated_12_23_either_pct': "Children age 12-23 months fully vaccinated based on information from either vaccination card or mother's recall11 (%)",
    'fully_vaccinated_12_23_card_pct': 'Children age 12-23 months fully vaccinated based on information from vaccination card only12 (%)',
    'bcg_12_23_pct': 'Children age 12-23 months who have received BCG (%)',
    'polio_3_doses_12_23_pct': 'Children age 12-23 months who have received 3 doses of polio vaccine13 (%)',
    'dpt_3_doses_12_23_pct': 'Children age 12-23 months who have received 3 doses of penta or DPT vaccine (%)',
    'measles_1_dose_12_23_pct': 'Children age 12-23 months who have received the first dose of measles-containing vaccine (MCV) (%)',
    'measles_2_dose_24_35_pct': 'Children age 24-35 months who have received a second dose of measles-containing vaccine (MCV) (%)',
    'rota_3_doses_12_23_pct': 'Children age 12-23 months who have received 3 doses of rotavirus vaccine14 (%)',
    'hepB_3_doses_12_23_pct': 'Children age 12-23 months who have received 3 doses of penta or hepatitis B vaccine (%)',
    'vitaminA_9_35_last6mo_pct': 'Children age 9-35 months who received a vitamin A dose in the last 6 months (%)',
    'vax_most_public_12_23_pct': 'Children age 12-23 months who received most of their vaccinations in a public health facility (%)',
    'vax_most_private_12_23_pct': 'Children age 12-23 months who received most of their vaccinations in a private health facility (%)',
    'diarrhoea_prev_u5_pct': 'Prevalence of diarrhoea in the 2 weeks preceding the survey (Children under age 5 years) (%)',
    'diarrhoea_ors_u5_pct': 'Children with diarrhoea in the 2 weeks preceding the survey who received oral rehydration salts (ORS) (Children under age 5 years) (%)',
    'diarrhoea_zinc_u5_pct': 'Children with diarrhoea in the 2 weeks preceding the survey who received zinc (Children under age 5 years) (%)',
    'diarrhoea_treated_u5_pct': 'Children swith diarrhoea in the 2 weeks preceding the survey taken to a health facility or health provider (Children under age 5 years) (%)',
    'ari_prev_u5_pct': 'Children Prevalence of symptoms of acute respiratory infection (ARI) in the 2 weeks preceding the survey (Children under age 5 years) (%)',
    'ari_fever_treated_u5_pct': 'Children with fever or symptoms of ARI in the 2 weeks preceding the survey taken to a health facility or health provider (Children under age 5 years) (%)',
    'breastfed_1hr_u3_pct': 'Children under age 3 years breastfed within one hour of birth15 (%)',
    'exclusive_breastfeeding_u6mo_pct': 'Children under age 6 months exclusively breastfed16 (%)',
    'solid_food_6_8mo_pct': 'Children age 6-8 months receiving solid or semi-solid food and breastmilk16 (%)',
    'adequate_diet_bf_6_23mo_pct': 'Breastfeeding children age 6-23 months receiving an adequate diet16, 17  (%)',
    'adequate_diet_nonbf_6_23mo_pct': 'Non-breastfeeding children age 6-23 months receiving an adequate diet16, 17 (%)',
    'adequate_diet_total_6_23mo_pct': 'Total children age 6-23 months receiving an adequate diet16, 17  (%)',
    'stunted_u5_pct': 'Children under 5 years who are stunted (height-for-age)18 (%)',
    'wasted_u5_pct': 'Children under 5 years who are wasted (weight-for-height)18 (%)',
    'severely_wasted_u5_pct': 'Children under 5 years who are severely wasted (weight-for-height)19 (%)',
    'underweight_u5_pct': 'Children under 5 years who are underweight (weight-for-age)18 (%)',
    'overweight_u5_pct': 'Children under 5 years who are overweight (weight-for-height)20 (%)',
    'women_bmi_low_pct': 'Women (age 15-49 years) whose Body Mass Index (BMI) is below normal (BMI <18.5 kg/m2)21 (%)',
    'women_bmi_high_pct': 'Women (age 15-49 years) who are overweight or obese (BMI ≥25.0 kg/m2)21 (%)',
    'women_high_whr_pct': 'Women (age 15-49 years) who have high risk waist-to-hip ratio (≥0.85) (%)',
    'anaemic_children_6_59mo_pct': 'Children age 6-59 months who are anaemic (<11.0 g/dl)22 (%)',
    'anaemic_nonpreg_women_15_49_pct': 'Non-pregnant women age 15-49 years who are anaemic (<12.0 g/dl)22 (%)',
    'anaemic_preg_women_15_49_pct': 'Pregnant women age 15-49 years who are anaemic (<11.0 g/dl)22 (%)',
    'anaemic_all_women_15_49_pct': 'All women age 15-49 years who are anaemic22 (%)',
    'anaemic_all_women_15_19_pct': 'All women age 15-19 years who are anaemic22 (%)',
    'women_bs_high_pct': 'Women  age 15 years and above with high (141-160 mg/dl) Blood sugar level23 (%)',
    'women_bs_very_high_pct': 'Women age 15 years and above wih very high (>160 mg/dl) Blood sugar level23 (%)',
    'women_bs_high_medicated_pct': 'Women age 15 years and above wih high or very high (>140 mg/dl) Blood sugar level or taking medicine to control blood sugar level23 (%)',
    'men_bs_high_pct': 'Men age 15 years and above wih high (141-160 mg/dl) Blood sugar level23 (%)',
    'men_bs_very_high_pct': 'Men (age 15 years and above wih  very high (>160 mg/dl) Blood sugar level23 (%)',
    'men_bs_high_medicated_pct': 'Men age 15 years and above wih high or very high (>140 mg/dl) Blood sugar level  or taking medicine to control blood sugar level23 (%)',
    'women_bp_mild_pct': 'Women age 15 years and above wih Mildly elevated blood pressure (Systolic 140-159 mm of Hg and/or Diastolic 90-99 mm of Hg) (%)',
    'women_bp_mod_severe_pct': 'Women age 15 years and above wih Moderately or severely elevated blood pressure (Systolic ≥160 mm of Hg and/or Diastolic ≥100 mm of Hg) (%)',
    'women_bp_elevated_medicated_pct': 'Women age 15 years and above wih Elevated blood pressure (Systolic ≥140 mm of Hg and/or Diastolic ≥90 mm of Hg) or taking medicine to control blood pressure (%)',
    'men_bp_mild_pct': 'Men age 15 years and above wih Mildly elevated blood pressure (Systolic 140-159 mm of Hg and/or Diastolic 90-99 mm of Hg) (%)',
    'men_bp_mod_severe_pct': 'Men age 15 years and above wih Moderately or severely elevated blood pressure (Systolic ≥160 mm of Hg and/or Diastolic ≥100 mm of Hg) (%)',
    'men_bp_elevated_medicated_pct': 'Men age 15 years and above wih Elevated blood pressure (Systolic ≥140 mm of Hg and/or Diastolic ≥90 mm of Hg) or taking medicine to control blood pressure (%)',
    'cancer_screen_cervical_women_30_49_pct': 'Women (age 30-49 years) Ever undergone a screening test for cervical cancer (%)',
    'cancer_screen_breast_women_30_49_pct': 'Women (age 30-49 years) Ever undergone a breast examination for breast cancer (%)',
    'cancer_screen_oral_women_30_49_pct': 'Women (age 30-49 years) Ever undergone an oral cavity examination for oral cancer (%)',
    'tobacco_use_women_15plus_pct': 'Women age 15 years and above who use any kind of tobacco (%)',
    'tobacco_use_men_15plus_pct': 'Men age 15 years and above who use any kind of tobacco (%)',
    'alcohol_use_women_15plus_pct': 'Women age 15 years and above who consume alcohol (%)',
    'alcohol_use_men_15plus_pct': 'Men age 15 years and above who consume alcohol (%)'
    
}
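
*Since the data dictionary is kept for future reference, it can also serve as a lookup in both directions. The sketch below uses a three-entry sample of the dictionary; `describe_column` and `reverse_lookup` are hypothetical names introduced here for illustration, not part of the original notebook.*

```python
# Three entries copied from the data dictionary (short name -> original column name).
data_dictionary_sample = {
    "district": "District Names",
    "state_ut": "State/UT",
    "households_surveyed": "Number of Households surveyed",
}


def describe_column(short_name, dictionary):
    """Return the original column name for a short name (hypothetical helper)."""
    # Strip stray whitespace so the description prints cleanly.
    return dictionary[short_name].strip()


# Reverse lookup: original column name -> short name.
reverse_lookup = {
    original.strip(): short for short, original in data_dictionary_sample.items()
}

print(describe_column("state_ut", data_dictionary_sample))
print(reverse_lookup["Number of Households surveyed"])
```

*The same two lines work unchanged on the full `data_dictionary`, assuming its values are unique once whitespace is stripped.*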

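*We now replace the original column names in our DataFrame with the shorter names, using the keys of `data_dictionary`. The assignment is positional, so it relies on the dictionary keys being in the same order as the DataFrame columns.*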

In [20]:
df.columns = list(data_dictionary.keys())
df.columns

Index(['district', 'state_ut', 'households_surveyed',
       'women_15_49_interviewed', 'men_15_54_interviewed',
       'female_6plus_school_pct', 'population_below_15_pct', 'sex_ratio_total',
       'sex_ratio_birth_last5yrs', 'birth_registered_under5_pct',
       ...
       'men_bp_mild_pct', 'men_bp_mod_severe_pct',
       'men_bp_elevated_medicated_pct',
       'cancer_screen_cervical_women_30_49_pct',
       'cancer_screen_breast_women_30_49_pct',
       'cancer_screen_oral_women_30_49_pct', 'tobacco_use_women_15plus_pct',
       'tobacco_use_men_15plus_pct', 'alcohol_use_women_15plus_pct',
       'alcohol_use_men_15plus_pct'],
      dtype='object', length=109)
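
*Because the rename above assigns names by position, a quick check that each dictionary value lines up with the column in the same slot can catch an accidental shift. This is a minimal sketch on a toy frame, not the notebook's actual cell: `raw`, `short_names`, and `normalise` are hypothetical names, and the leading space in one value is added on purpose to mimic the small whitespace differences seen in the real dictionary.*

```python
import pandas as pd

# Toy frame standing in for the raw sheet (values taken from the preview above).
raw = pd.DataFrame({
    "District Names": ["Nicobars", "Srikakulam"],
    "State/UT": ["Andaman & Nicobar Islands", "Andhra Pradesh"],
    "Number of Households surveyed": [882, 874],
})

short_names = {
    "district": "District Names",
    "state_ut": " State/UT",  # leading space on purpose, to mimic small mismatches
    "households_surveyed": "Number of Households surveyed",
}


def normalise(name):
    # Collapse runs of whitespace and trim the ends before comparing.
    return " ".join(str(name).split())


# Positional assignment is only safe when the lengths match ...
assert len(short_names) == len(raw.columns)

# ... and when each dictionary value matches the column in the same position.
mismatches = [
    (position, original, expected)
    for position, (original, expected) in enumerate(
        zip(raw.columns, short_names.values())
    )
    if normalise(original) != normalise(expected)
]

# Rename only when every position lines up.
if not mismatches:
    raw.columns = list(short_names.keys())

print(mismatches)
print(list(raw.columns))
```

*Any entry in `mismatches` points at the position where the dictionary and the sheet disagree, which is easier to debug than a silently mislabelled column.*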

### Let's start by checking the shape of the dataset, which tells us how many districts and features are present.

In [24]:
df.shape

(707, 109)

### So we have 109 columns for 707 districts

*Let's see a general overview of the dataset*

In [25]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 707 entries, 0 to 706
Columns: 109 entries, district to alcohol_use_men_15plus_pct
dtypes: float64(75), int64(3), object(31)
memory usage: 602.2+ KB


*The dataset contains 75 columns with float64 data type, 3 columns as int64, and 31 columns as object data type.*
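
*Only two of the 109 columns (district and state/UT) are obviously textual, so it is worth checking why 31 columns are stored as objects. The sketch below shows one way to find non-numeric columns and measure how many of their values would parse as numbers. It runs on a toy frame: `pct_as_text` and the bracketed value are hypothetical, and whether the real object columns hold numbers stored as text is an assumption to verify, not something the output above confirms.*

```python
import pandas as pd

# Toy frame; "pct_as_text" is a hypothetical column with numbers stored as strings.
toy = pd.DataFrame({
    "district": ["A", "B", "C"],
    "households_surveyed": [882, 874, 868],
    "pct_as_text": ["78.01", "(82.66)", "84.68"],
})

# Columns that pandas does not treat as numeric.
non_numeric_cols = [
    col for col in toy.columns if not pd.api.types.is_numeric_dtype(toy[col])
]

# Share of values in each such column that can be parsed as a number.
# errors="coerce" turns unparseable entries into NaN instead of raising.
parse_rate = {
    col: pd.to_numeric(toy[col], errors="coerce").notna().mean()
    for col in non_numeric_cols
}

print(non_numeric_cols)
print(parse_rate)
```

*A column with a parse rate close to 1 is probably numeric with a few non-standard entries, while a rate of 0 indicates a genuine text column such as the district name.*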

*Let's see a statistical overview of the dataset*

In [27]:
df.describe()

Unnamed: 0,households_surveyed,women_15_49_interviewed,men_15_54_interviewed,female_6plus_school_pct,population_below_15_pct,sex_ratio_total,sex_ratio_birth_last5yrs,birth_registered_under5_pct,electricity_access_pct,improved_water_source_pct,...,men_bp_mild_pct,men_bp_mod_severe_pct,men_bp_elevated_medicated_pct,cancer_screen_cervical_women_30_49_pct,cancer_screen_breast_women_30_49_pct,cancer_screen_oral_women_30_49_pct,tobacco_use_women_15plus_pct,tobacco_use_men_15plus_pct,alcohol_use_women_15plus_pct,alcohol_use_men_15plus_pct
count,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,...,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0,707.0
mean,900.502122,1024.019802,144.002829,71.507822,26.356874,1020.696139,937.990184,91.064795,96.996025,93.726054,...,16.253621,6.042518,24.759661,1.565827,0.6529,0.700495,11.614965,40.599646,2.912631,23.19075
std,69.273371,177.064999,31.953268,10.311666,5.296601,73.367114,165.625452,9.392697,4.354175,8.71469,...,4.336475,2.573082,6.768313,2.774292,1.566614,1.468252,11.943028,14.081028,6.079181,13.36201
min,213.0,216.0,17.0,45.36,15.98,754.98,-1261.45,51.58,68.35,41.18,...,5.29,0.83,10.02,0.0,0.0,0.0,0.06,6.75,0.0,0.07
25%,882.0,911.0,124.0,64.395,22.505,969.055,864.85,87.03,96.39,92.03,...,13.19,4.105,19.825,0.19,0.0,0.0,4.09,30.42,0.27,13.585
50%,908.0,1020.0,145.0,71.34,25.36,1013.26,930.04,94.89,98.65,96.98,...,16.25,5.83,24.42,0.55,0.21,0.28,7.67,42.49,0.5,20.16
75%,931.0,1141.0,164.0,78.97,29.53,1065.53,1014.575,97.745,99.515,99.25,...,18.845,7.555,28.97,1.52,0.5,0.69,14.765,50.97,1.71,30.89
max,990.0,1621.0,241.0,99.16,50.56,1331.64,1484.97,100.0,100.0,100.0,...,32.86,19.46,49.61,23.22,14.55,15.84,70.58,80.56,42.77,68.38
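
*One value in the summary stands out: the minimum of `sex_ratio_birth_last5yrs` is -1261.45, and a sex ratio cannot be negative. The sketch below shows one way to flag such rows, along with percentages outside the 0 to 100 range. It uses a toy frame with placeholder district labels; the column names follow the renamed dataset, and treating every column ending in `_pct` as a percentage is an assumption based on the naming scheme above.*

```python
import pandas as pd

# Toy frame; the negative value mirrors the minimum shown in the summary above.
toy = pd.DataFrame({
    "district": ["A", "B", "C"],
    "sex_ratio_birth_last5yrs": [927.41, -1261.45, 934.92],
    "birth_registered_under5_pct": [98.01, 100.0, 96.53],
})

# Rows where the sex ratio at birth is negative, which is not a valid value.
negative_ratio = toy[toy["sex_ratio_birth_last5yrs"] < 0]

# Assumption: columns ending in "_pct" hold percentages and should lie in [0, 100].
pct_cols = [col for col in toy.columns if col.endswith("_pct")]
outside_range = toy[pct_cols].lt(0) | toy[pct_cols].gt(100)
out_of_range_pct = toy[outside_range.any(axis=1)]

print(negative_ratio["district"].tolist())
print(len(out_of_range_pct))
```

*Rows flagged this way are candidates for a closer look against the source factsheet before any correction is made.*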
