https://www.cdc.gov/brfss/annual_data/annual_2022.html

Data codebook: https://www.cdc.gov/brfss/annual_data/2022/zip/codebook22_llcp-v2-508.zip

Data from: https://www.cdc.gov/brfss/annual_data/2022/files/LLCP2022XPT.zip


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [None]:
df = pd.read_csv('diabetes_012_health_indicators_BRFSS2015.csv.zip')


In [None]:
df

The sample is large: roughly 253 thousand survey responses, each described by 22 columns (the Diabetes_012 target plus 21 features).

What features characterize the data?

* Diabetes_012 - target category: 0 = no diabetes, 1 = prediabetes, 2 = diabetes
* HighBP - whether the person has high blood pressure: 0 = no high BP, 1 = high BP
* HighChol - 0 = no high cholesterol, 1 = high cholesterol
* CholCheck - 0 = no cholesterol check in 5 years, 1 = cholesterol check in 5 years
* BMI - Body Mass Index
* Smoker - Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes] 0 = no, 1 = yes
* Stroke - (Ever told) you had a stroke. 0 = no, 1 = yes
* HeartDiseaseorAttack - coronary heart disease (CHD) or myocardial infarction (MI). 0 = no, 1 = yes
* Fruits - consume fruit 1 or more times per day. 0 = no, 1 = yes
* Veggies - consume vegetables 1 or more times per day. 0 = no, 1 = yes

In [None]:
# For each column: number of missing values, number of unique values, and data type
pd.concat([df.isnull().sum(), df.nunique(), df.dtypes], axis = 1, sort= False, keys=['Null','Unique','Data Type'])

There are no missing values.

Diabetes_012    
0 = no diabetes 1 = prediabetes 2 = diabetes

HighBP
0 = no high BP 1 = high BP

HighChol
0 = no high cholesterol 1 = high cholesterol

CholCheck
0 = no cholesterol check in 5 years 1 = yes cholesterol check in 5 years

BMI
Body Mass Index

Smoker
Have you smoked at least 100 cigarettes in your entire life? [Note: 5 packs = 100 cigarettes] 0 = no 1 = yes

Stroke
(Ever told) you had a stroke. 0 = no 1 = yes

HeartDiseaseorAttack
coronary heart disease (CHD) or myocardial infarction (MI) 0 = no 1 = yes

Fruits
Consume Fruit 1 or more times per day 0 = no 1 = yes

Veggies
Consume Vegetables 1 or more times per day 0 = no 1 = yes

HvyAlcoholConsump
(adult men >=14 drinks per week and adult women>=7 drinks per week) 0 = no 1 = yes

AnyHealthcare
Have any kind of health care coverage, including health insurance, prepaid plans such as HMO, etc. 0 = no 1 = yes

NoDocbcCost
Was there a time in the past 12 months when you needed to see a doctor but could not because of cost? 0 = no 1 = yes

GenHlth
Would you say that in general your health is: scale 1-5 1 = excellent 2 = very good 3 = good 4 = fair 5 = poor

MentHlth
days of poor mental health in past 30 days, scale 0-30

PhysHlth
physical illness or injury days in past 30 days, scale 0-30

DiffWalk
Do you have serious difficulty walking or climbing stairs? 0 = no 1 = yes

Sex
0 = female 1 = male

Age
13-level age category (_AGEG5YR see codebook) 1 = 18-24 9 = 60-64 13 = 80 or older

Education
Education level (EDUCA see codebook) scale 1-6 1 = Never attended school or only kindergarten 2 = elementary etc.

Income
Income scale (INCOME2 see codebook) scale 1-8 1 = less than $10,000 5 = less than $35,000 8 = $75,000 or more






In [None]:

sns.countplot(x='Diabetes_012',data=df,palette='pastel')

In [None]:
# check label matches description, expect [0,1,2] - No Diabetes, Pre-Diabetes, Diabetes
df.Diabetes_012.value_counts()
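
The raw counts are easier to judge as proportions, since the three classes are very unevenly sized. A minimal sketch of that check, using a small made-up Series in place of df.Diabetes_012 (the toy values are an assumption for illustration, not real survey data):

```python
import pandas as pd

# Toy stand-in for df.Diabetes_012 (made-up values, illustration only)
labels = pd.Series([0, 0, 0, 0, 0, 0, 1, 2, 2, 0], name="Diabetes_012")

# Share of each class instead of raw counts, ordered by label
class_share = labels.value_counts(normalize=True).sort_index()
print(class_share)
```

On the real data the same call would be df.Diabetes_012.value_counts(normalize=True).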

In [None]:
df.describe()

In [None]:
# Min and max of every column; list only the columns that are not 0/1 binary
df_min_max = df.describe().loc[['max', 'min']].T  # select rows by name, not position
df_min_max.columns = ['Max', 'Min']
df_min_max[(df_min_max.Max != 1) | (df_min_max.Min != 0)]



In [None]:
fig, ax = plt.subplots()

In [None]:
# BMI against age category for all respondents
plt.scatter(df["BMI"], df["Age"], s=1, c='b', marker='.', alpha=0.3)
plt.xlabel("BMI")
plt.ylabel("Age category")

# Mean BMI per age category, split by sex (0 = female, 1 = male)
sns.catplot(data=df, x="Age", y="BMI", hue="Sex", kind="bar")

In [None]:
# from sklearn.preprocessing import RobustScaler
# scaler = RobustScaler()
# data_scaled = scaler.fit_transform(df)



In [2]:
raw_df = pd.read_sas(r"C:\Users\djhar\Downloads\LLCP2022XPT\LLCP2022.XPT")

In [3]:
df = pd.DataFrame()

df['diabetes'] = raw_df.DIABETE4
# 1	Yes
# 2	Yes, but female told only during pregnancy
# 3	No
# 4	No, pre-diabetes or borderline diabetes
# 7	Don’t know/Not Sure
# 9	Refused
# BLANK	Not asked or Missing


df['completed_survey'] = raw_df.DISPCODE
# 1100	Completed Interview	
# 1200	Partial Complete Interview

df['bmi'] = raw_df._BMI5CAT
# 1	Underweight
# 2	Normal Weight
# 3	Overweight
# 4	Obese
# BLANK	Don’t know/Refused/Missing

df['smoker'] = raw_df._SMOKGRP
# 1	Current smoker, 20+ Pack Years
# 2	Former smoker, 20+ Pack Years, quit < 15 years
# 3	All other current and former smokers
# 4	Never smoker
# BLANK	Don’t know/Refused/Missing

df['stroke'] = raw_df.CVDSTRK3
# 1	Yes
# 2	No
# 7	Don’t know/Not sure
# 9	Refused
# BLANK	Not asked or Missing

df['heart_attack'] = raw_df.CVDINFR4
# 1	Yes
# 2	No
# 7	Don’t know/Not sure	
# 9	Refused	
# BLANK	Not asked or Missing

df['angina_or_chd'] = raw_df.CVDCRHD4
# 1	Yes
# 2	No
# 7	Don’t know/Not sure
# 9	Refused
# BLANK	Not asked or Missing

df['chd_mi'] = raw_df._MICHD
# 1	Reported having MI or CHD
# 2	Did not report having MI or CHD
# BLANK	Not asked or Missing.

df['asthma'] = raw_df._ASTHMS1
# 1	Current
# 2	Former
# 3	Never
# 9	Don’t know/Not Sure Or Refused/Missing
# Notes: ASTHMA3 = 7 or 9 or Missing or ASTHNOW = 7 or 9 or Missing

df['physical_activity'] = raw_df._TOTINDA
# 1	Had physical activity or exercise
# 2	No physical activity or exercise in last 30 days
# 9	Don’t know/Refused/Missing

df['heavy_drinking'] = raw_df._RFDRHV8
# 1	No
# 2	Yes
# 9	Don’t know/Refused/Missing

df['no_doctor_due_to_cost'] = raw_df.MEDCOST1
# 1	Yes
# 2	No
# 7	Don’t know/Not sure
# 9	Refused
# BLANK	Not asked or Missing

df['any_healthcare_insurance'] = raw_df.PRIMINSR
# 1	A plan purchased through an employer or union (including plans purchased through another person´s employer)
# 2	A private nongovernmental plan that you or another family member buys on your own
# 3	Medicare
# 4	Medigap
# 5	Medicaid
# 6	Children´s Health Insurance Program (CHIP)
# 7	Military related health care: TRICARE (CHAMPUS) / VA health care / CHAMP- VA
# 8	Indian Health Service
# 9	State sponsored health plan
# 10 Other government program
# 88 No coverage of any type
# 77 Don’t know/Not Sure
# 99 Refused
# BLANK	Not asked or Missing

df['general_health_status'] = raw_df.GENHLTH
# 1	Excellent
# 2	Very good
# 3	Good	
# 4	Fair
# 5	Poor
# 7	Don’t know/Not Sure
# 9	Refused
# BLANK	Not asked or Missing

df['mental_health_status'] = raw_df._MENT14D
# 1	Zero days when mental health not good
# 2	1-13 days when mental health not good
# 3	14+ days when mental health not good
# 9	Don’t know/Refused/Missing

df['physical_health_status'] = raw_df._PHYS14D
# 1	Zero days when physical health not good	
# 2	1-13 days when physical health not good
# 3	14+ days when physical health not good
# 9	Don’t know/Refused/Missing


df['difficulty_walking'] = raw_df.DIFFWALK
# 1	Yes	
# 2	No	
# 7	Don’t know/Not Sure
# 9	Refused	
# BLANK	Not asked or Missing

df['gender'] = raw_df._SEX
# 1	Male
# 2	Female

df['age'] = raw_df._AGEG5YR
# 1	Age 18 to 24
# 2	Age 25 to 29
# 3	Age 30 to 34
# 4	Age 35 to 39
# 5	Age 40 to 44
# 6	Age 45 to 49
# 7	Age 50 to 54
# 8	Age 55 to 59
# 9	Age 60 to 64
# 10 Age 65 to 69
# 11 Age 70 to 74
# 12 Age 75 to 79
# 13 Age 80 or older
# 14 Don’t know/Refused/Missing

df['education'] = raw_df._EDUCAG
# 1	Did not graduate High School
# 2	Graduated High School
# 3	Attended College or Technical School
# 4	Graduated from College or Technical School
# 9	Don’t know/Not sure/Missing

df['income'] = raw_df._INCOMG1
# 1	Less than $15,000
# 2	$15,000 to < $25,000
# 3	$25,000 to < $35,000
# 4	$35,000 to < $50,000
# 5	$50,000 to < $100,000
# 6	$100,000 to < $200,000
# 7	$200,000 or more
# 9	Don’t know/Not sure/Missing

df['race'] = raw_df._PRACE2
# 1	White
# 2	Black or African American
# 3	American Indian or Alaskan Native
# 4	Asian
# 5	Native Hawaiian or other Pacific Islander
# 7	Multiracial but no preferred race
# 88	No race choice given
# 77	Don’t know/Not sure
# 99	Refused
# BLANK	Missing

df['sleep_time'] = raw_df.SLEPTIM1
# 1 - 24	Number of hours [1-24]
# 77	Don’t know/Not Sure
# 99	Refused
# BLANK	Missing

df['years_smoked'] = raw_df.COPDSMOK
# 1 - 76	Number of years
# 88	Never smoked or smoked < one year
# 77	Don´t know/Not sure
# 99	Refused
# BLANK	Not asked or Missing


AttributeError: 'DataFrame' object has no attribute '_SMOKGRP'

Features bmi, smoker, chd_mi, difficulty_walking, and years_smoked have a lot of nulls and need to be investigated.
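
The AttributeError above means the loaded file has no _SMOKGRP column, so the cell stops partway through. One way to guard against that is to check which BRFSS variables are actually present before copying them. This is only a sketch: the tiny raw_df below is a hypothetical stand-in for the frame returned by pd.read_sas, and the `wanted` dictionary lists just three of the variables used above.

```python
import pandas as pd

# Hypothetical stand-in for raw_df; the real frame comes from pd.read_sas
raw_df = pd.DataFrame({"DIABETE4": [1.0, 3.0], "_BMI5CAT": [2.0, None]})

# BRFSS variable -> new column name (a subset of the mapping used above)
wanted = {"DIABETE4": "diabetes", "_BMI5CAT": "bmi", "_SMOKGRP": "smoker"}

# Split the mapping into variables the file has and variables it lacks
missing = [src for src in wanted if src not in raw_df.columns]
present = {src: dst for src, dst in wanted.items() if src in raw_df.columns}

# Copy only the columns that exist, renaming them in one step
df = raw_df[list(present)].rename(columns=present)
print("Not in this file:", missing)
```

Any variable reported as missing can then be looked up in the codebook for the file version that was downloaded.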

In [None]:
df

In [None]:
pd.concat([df.isnull().sum(), df.nunique(), df.dtypes], axis = 1, sort= False, keys=['Null','Unique','Data Type'])

In [None]:
#1 - diabetes
# 1	Yes
# 2	Yes, but female told only during pregnancy—Go to Section 08.01 AGE
# 3	No—Go to Section 08.01 AGE
# 4	No, pre-diabetes or borderline diabetes—Go to Section 08.01 AGE
# 7	Don’t know/Not Sure—Go to Section 08.01 AGE
# 9	Refused—Go to Section 08.01 AGE
# BLANK	Not asked or Missing	

#Remove refused
df.drop(df[df.diabetes == 9.0].index, inplace=True)

#Update to new layout
#During pregnancy to 0
#No to 0
#Yes to 2
#Pre to 1
#Don't know to 0
df.diabetes = df.diabetes.replace({2:0, 3:0, 1:2, 4:1, 7:0})

#New layout
# 0 - None
# 1 - Pre-Diabetes
# 2 - Diabetes

#Validate
df.groupby(df.diabetes, dropna=False, as_index=False).size()

In [None]:
#2 completed_survey
# Drop incomplete surveys
# DISPCODE is read as a number, so compare against 1200 rather than the string '1200'
df.drop(df[df.completed_survey == 1200].index, inplace=True)

#Drop the completed_survey column
df.drop(["completed_survey"], axis=1, inplace=True)

In [None]:
#3 bmi

# 1	Underweight
# 2	Normal Weight
# 3	Overweight
# 4	Obese
# BLANK	Don’t know/Refused/Missing

#unknown bmi to new category
df.loc[df.bmi.isnull(), "bmi"] = 0

#Validate
df.groupby(df.bmi, dropna=False, as_index=False).size()

In [None]:
# 4 smoker
# 1	Current smoker, 20+ Pack Years
# 2	Former smoker, 20+ Pack Years, quit < 15 years
# 3	All other current and former smokers
# 4	Never smoker
# BLANK	Don’t know/Refused/Missing

#Don’t know/Refused/Missing treated as "all other current and former smokers" (3)
df.loc[df.smoker.isnull(), "smoker"] = 3

#Move heavy current smoker to heavy former smoker
df.loc[df.smoker == 1, "smoker"] = 2

#Move Never smoked to 0
df.loc[df.smoker == 4, "smoker"] = 0

#Validate
df.groupby(df.smoker, dropna=False, as_index=False).size()

In [None]:
#5 stroke
df.groupby(df.stroke, dropna=False, as_index=False).size()
# 1	Yes	
# 2	No	
# 7	Don’t know/Not sure	
# 9	Refused	

#Change to binary
df.stroke = df.stroke.replace({2:0, 7:0, 9:0})

#Validate
df.groupby(df.stroke, dropna=False, as_index=False).size()

In [None]:
# 6 heart_attack
df.groupby(df.heart_attack, dropna=False, as_index=False).size()
# 1	Yes	
# 2	No
# 7	Don’t know/Not sure	
# 9	Refused	
# BLANK	Not asked or Missing

# heart_attack - #Change to binary
# No to 0
# Don't know to 0
# Refused to 0
df.heart_attack = df.heart_attack.replace({2:0, 7:0, 9:0})

#Validate
df.groupby(df.heart_attack, dropna=False, as_index=False).size()

In [None]:
#7 angina_or_chd
# 1	Yes
# 2	No
# 7	Don’t know/Not sure
# 9	Refused
# BLANK	Not asked or Missing
# angina_or_chd - #Change to binary

#To binary
# No - 0
# Don't know/Not sure to 0
# Refused to 0
df.angina_or_chd = df.angina_or_chd.replace({2:0, 7:0, 9:0})

#Validate
df.groupby(df.angina_or_chd, dropna=False, as_index=False).size()
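
The stroke, heart_attack and angina_or_chd cells all apply the same Yes/No/Don't know/Refused recode. A small helper keeps that rule in one place. The function name yes_no_to_binary is hypothetical, and the toy Series stands in for one of the real columns:

```python
import numpy as np
import pandas as pd

def yes_no_to_binary(col: pd.Series) -> pd.Series:
    """Keep 1 (Yes) as 1; map 2 (No), 7 (Don't know) and 9 (Refused) to 0.

    Missing values are left as NaN so they can be handled later.
    """
    return col.replace({2: 0, 7: 0, 9: 0})

# Toy column with every code that appears in the codebook, plus a blank
toy = pd.Series([1.0, 2.0, 7.0, 9.0, np.nan])
recoded = yes_no_to_binary(toy)
print(recoded.tolist())
```

On the real frame this would be applied as, for example, df.stroke = yes_no_to_binary(df.stroke).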

In [None]:
#8 - chd_mi

# 1	Reported having MI or CHD
# 2	Did not report having MI or CHD
# BLANK	Not asked or Missing

#Change to binary
df.loc[df.chd_mi == 2, "chd_mi"] = 0

#Nulls to no
df.loc[df.chd_mi.isnull(), "chd_mi"] = 0

#Validate
df.groupby(df.chd_mi, dropna=False, as_index=False).size()

In [None]:
# 9 asthma

# 1	Current
# 2	Former
# 3	Never
# 9	Don’t know/Not Sure Or Refused/Missing
# Notes: ASTHMA3 = 7 or 9 or Missing or ASTHNOW = 7 or 9 or Missing

# asthma - Change to binary
# former to 1
# Never to 0
# Don't know to 0
df.asthma = df.asthma.replace({2:1, 3:0, 9:0})

#Validate
df.groupby(df.asthma, dropna=False, as_index=False).size()

In [None]:
# 10 physical_activity
# 1	Had physical activity or exercise
# 2	No physical activity or exercise in last 30 days
# 9	Don’t know/Refused/Missing


# To Binary
# No activity to 0
# Don't know to 0
df.physical_activity = df.physical_activity.replace({2:0, 9:0})

#Validate
df.groupby(df.physical_activity, dropna=False, as_index=False).size()

In [None]:
#11 heavy_drinking

# 1	No
# 2	Yes
# 9	Don’t know/Refused/Missing

#To Binary
# No to 0
# Yes to 1
# Don't know to 0
df.heavy_drinking = df.heavy_drinking.replace({1:0, 2:1, 9:0})

#Validate
df.groupby(df.heavy_drinking, dropna=False, as_index=False).size()

In [None]:
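#12 no_doctor_due_to_cost

#Change to binary (1 = Yes; No, Don't know and Refused to 0), as for stroke
df.no_doctor_due_to_cost = df.no_doctor_due_to_cost.replace({2:0, 7:0, 9:0})

#Validate
df.groupby(df.no_doctor_due_to_cost, dropna=False, as_index=False).size()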


In [None]:
#13 any_healthcare_insurance
# 1	A plan purchased through an employer or union (including plans purchased through another person´s employer)
# 2	A private nongovernmental plan that you or another family member buys on your own
# 3	Medicare #
# 4	Medigap #
# 5	Medicaid #
# 6	Children´s Health Insurance Program (CHIP) #
# 7	Military related health care: TRICARE (CHAMPUS) / VA health care / CHAMP- VA #
# 8	Indian Health Service #
# 9	State sponsored health plan#
# 10	Other government program #
# 88	No coverage of any type #
# 77	Don’t know/Not Sure
# 99	Refused
# BLANK	Not asked or Missing


#0 - No Insurance
#1 - Commercial Insurance
#2 - Federal Insurance
#3 - State Insurance
#4 - Unknown Insurance

#No coverage to No Insurance
df.loc[df.any_healthcare_insurance == 88, "any_healthcare_insurance"] = 0

# Commercial Insurance
# Employer or union plans are already coded 1; add private personal plans and Medigap
df.loc[df.any_healthcare_insurance.isin([2,4]), "any_healthcare_insurance"] = 1

# Federal Insurance
# Add Medicare, CHIP, military related health care, Indian Health Service, other government program
df.loc[df.any_healthcare_insurance.isin([3,6,7,8,10]), "any_healthcare_insurance"] = 2

# State Insurance
# Add Medicaid, state sponsored health plan
df.loc[df.any_healthcare_insurance.isin([5,9]), "any_healthcare_insurance"] = 3

#Age 65 or older (age codes 10-13) and don't know/refused = Medicare
df.loc[(df.any_healthcare_insurance.isin([77,99])) & (df.age.isin([10,11,12,13])), "any_healthcare_insurance"] = 2

#Unknown Insurance
df.loc[df.any_healthcare_insurance.isin([77,99]), "any_healthcare_insurance"] = 4

#Validate
df.groupby(df.any_healthcare_insurance, dropna=False, as_index=False).size()    
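
The step-by-step .loc updates above only work because of the order they run in: old code 2 has to be moved to 1 before 2 is reused for federal plans, and so on. An alternative is to map every original code in one pass from a single dictionary. This is a sketch on made-up rows; the insurance_groups dictionary and the insurance_group column name are assumptions, and the 65+ rule is the same assumption as in the cell above.

```python
import numpy as np
import pandas as pd

# Original PRIMINSR code -> group
# 0 = none, 1 = commercial, 2 = federal, 3 = state, 4 = unknown
insurance_groups = {
    88: 0,
    1: 1, 2: 1, 4: 1,
    3: 2, 6: 2, 7: 2, 8: 2, 10: 2,
    5: 3, 9: 3,
    77: 4, 99: 4,
}

# Made-up rows: insurance code and age category (_AGEG5YR)
toy = pd.DataFrame({
    "any_healthcare_insurance": [1.0, 3.0, 5.0, 88.0, 77.0, 99.0, np.nan],
    "age": [5.0, 11.0, 4.0, 2.0, 12.0, 3.0, 6.0],
})

# One-pass recode; blanks stay NaN
grouped = toy.any_healthcare_insurance.map(insurance_groups)

# Don't know/refused from respondents aged 65+ (age codes 10-13): assume Medicare
senior_unknown = toy.any_healthcare_insurance.isin([77, 99]) & toy.age.between(10, 13)
toy["insurance_group"] = grouped.mask(senior_unknown, 2)
print(toy)
```

Because the mapping reads only the original codes, reordering the dictionary entries cannot change the result.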

In [None]:
# 13 general_health_status
# 1	Excellent
# 2	Very good
# 3	Good	
# 4	Fair
# 5	Poor
# 7	Don’t know/Not Sure
# 9	Refused
# BLANK	Not asked or Missing

#Don't know/Not Sure (7) and Refused (9) to a single unknown category 6
df.general_health_status = df.general_health_status.replace({7: 6, 9: 6})

#Validate
df.groupby(df.general_health_status, dropna=False, as_index=False).size()

In [None]:
# 14 mental_health_status
# 1	Zero days when mental health not good
# 2	1-13 days when mental health not good
# 3	14+ days when mental health not good
# 9	Don’t know/Refused/Missing

# Don’t know/Refused/Missing, Zero days to 0
# 1-13 days when mental health not good to 1 
# 14+ days when mental health not good to 2
df.loc[df.mental_health_status.isin([9,1]), "mental_health_status"] = 0
df.loc[df.mental_health_status.isin([2]), "mental_health_status"] = 1
df.loc[df.mental_health_status.isin([3]), "mental_health_status"] = 2

#Validate
df.groupby(df.mental_health_status, dropna=False, as_index=False).size()

In [None]:
# 15 physical_health_status
# 1	Zero days when physical health not good	
# 2	1-13 days when physical health not good
# 3	14+ days when physical health not good
# 9	Don’t know/Refused/Missing

# Don’t know/Refused/Missing, Zero days to 0
# 1-13 days when physical health not good to 1 
# 14+ days when physical health not good to 2
df.loc[df.physical_health_status.isin([9,1]), "physical_health_status"] = 0
df.loc[df.physical_health_status.isin([2]), "physical_health_status"] = 1
df.loc[df.physical_health_status.isin([3]), "physical_health_status"] = 2

#Validate
df.groupby(df.physical_health_status, dropna=False, as_index=False).size()

In [None]:
# 16 difficulty_walking        
# 1	    Yes	                    
# 2	    No	                    
# 7	    Don’t know/Not Sure	    
# 9	    Refused	                
# BLANK	Not asked or Missing

# Set Don't know/Not Sure, Refused, Not asked or Missing, or Null to 2
df.loc[df.difficulty_walking > 2, "difficulty_walking"] = 2.0 # No
df.loc[df.difficulty_walking.isnull(), "difficulty_walking"] = 2.0 # No

#Change to binary
df.loc[df.difficulty_walking == 1, "difficulty_walking"] = 1
df.loc[df.difficulty_walking == 2, "difficulty_walking"] = 0

#validate
df.groupby(df.difficulty_walking, dropna=False, as_index=False).size()

In [None]:
#17 gender

#Female to 0
df.loc[df.gender == 2, "gender"] = 0

#validate
df.groupby(df.gender, dropna=False, as_index=False).size()

In [None]:
#18 age
# 1	Age 18 to 24
# 2	Age 25 to 29
# 3	Age 30 to 34
# 4	Age 35 to 39
# 5	Age 40 to 44
# 6	Age 45 to 49
# 7	Age 50 to 54
# 8	Age 55 to 59
# 9	Age 60 to 64
# 10 Age 65 to 69
# 11 Age 70 to 74
# 12 Age 75 to 79
# 13 Age 80 or older
# 14 Don’t know/Refused/Missing

#validate
df.groupby(df.age, dropna=False, as_index=False).size()

In [None]:
#19 education
# 1	Did not graduate High School
# 2	Graduated High School
# 3	Attended College or Technical School
# 4	Graduated from College or Technical School
# 9	Don’t know/Not sure/Missing


# Don’t know/Not sure/Missing to 1 (did not graduate High School)
df.loc[df.education.isin([9]), "education"] = 1

#validate
df.groupby(df.education, dropna=False, as_index=False).size()

In [None]:
#19 income
# 1	Less than $15,000
# 2	$15,000 to < $25,000
# 3	$25,000 to < $35,000
# 4	$35,000 to < $50,000
# 5	$50,000 to < $100,000
# 6	$100,000 to < $200,000
# 7	$200,000 or more
# 9	Don’t know/Not sure/Missing

# Move don't know to 0 (filter on income itself, not education)
df.loc[df.income.isin([9]), "income"] = 0

#validate
df.groupby(df.income, dropna=False, as_index=False).size()

In [None]:
# 20 race
# 1	White
# 2	Black or African American
# 3	American Indian or Alaskan Native
# 4	Asian
# 5	Native Hawaiian or other Pacific Islander
# 7	Multiracial but no preferred race
# 88	No race choice given
# 77	Don’t know/Not sure
# 99	Refused
# BLANK	Missing


# 1	White
# 2	Black or African American
# 3	American Indian or Alaskan Native
# 4	Asian
# 5	Native Hawaiian or other Pacific Islander
# 7	Multiracial but no preferred race
# 8 Unknown
df.loc[df.race.isin([88,77,99]), "race"] = 8

#validate
df.groupby(df.race, dropna=False, as_index=False).size()

In [None]:
# Mean over valid answers only (1-24 hours); 77 and 99 are codes, not hours
df.loc[df.sleep_time.between(1, 24), "sleep_time"].mean().round()

In [None]:
# 21 sleep_time
# 1 - 24	Number of hours [1-24]
# 77	Don’t know/Not Sure
# 99	Refused
# BLANK	Missing

#Set unknown (77, 99, missing) to the mean of the valid answers (1-24 hours)
valid_sleep = df.sleep_time.between(1, 24)
df.loc[~valid_sleep, "sleep_time"] = df.loc[valid_sleep, "sleep_time"].mean().round()

#validate
df.groupby(df.sleep_time, dropna=False, as_index=False).size()
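
When imputing with the mean, the 77 and 99 codes need to be excluded first, otherwise they are averaged in as if they were hours of sleep. A minimal sketch on a made-up Series (the values are illustrative, not survey data) shows how large the difference can be:

```python
import numpy as np
import pandas as pd

# Made-up sleep answers, including the two special codes and a blank
toy = pd.Series([6.0, 7.0, 8.0, 7.0, 77.0, 99.0, np.nan], name="sleep_time")

# Mean with the 77/99 codes counted as hours
naive_mean = toy.mean()

# Mean over valid answers only; between() is False for 77, 99 and NaN
valid = toy.between(1, 24)
clean_mean = toy[valid].mean()

# Keep valid answers, replace everything else with the rounded clean mean
imputed = toy.where(valid, round(clean_mean))
print(naive_mean, clean_mean, imputed.tolist())
```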

In [None]:
#22 years_smoked
df.groupby(df.years_smoked, dropna=False, as_index=False).size()

#No indication of smoking 0 years smoked
df.loc[df.years_smoked.isin([77,88,99,np.nan]), "years_smoked"] = 0

#validate
df.groupby(df.years_smoked, dropna=False, as_index=False).size()


In [None]:
#data with nulls remaining
print(f"Nulls remaining {df.isnull().sum().sum()}")

#remove nulls
df.dropna(inplace=True)

#Validate
print(f"Nulls remaining {df.isnull().sum().sum()}")
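
Since dropna removes whole rows, it helps to know which columns the remaining nulls come from and how many rows are lost. A short sketch on a made-up frame (the column values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up frame with blanks left in two columns
toy = pd.DataFrame({
    "diabetes": [0.0, np.nan, 2.0, 1.0],
    "race": [1.0, 2.0, np.nan, 4.0],
    "gender": [0.0, 1.0, 1.0, 0.0],
})

# Nulls per column show where the dropped rows come from
nulls_per_column = toy.isnull().sum()

# Row count before and after dropping any row with a null
rows_before = len(toy)
cleaned = toy.dropna()
rows_dropped = rows_before - len(cleaned)

print(nulls_per_column)
print(f"Rows dropped: {rows_dropped} of {rows_before}")
```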

In [None]:
#Validate Nulls
pd.concat([df.isnull().sum(), df.nunique(), df.dtypes], axis = 1, sort= False, keys=['Null','Unique','Data Type'])

In [None]:

df.describe()

In [None]:
df.groupby(df.chd_mi, dropna=False, as_index=False).size()

In [None]:
df.groupby(df.asthma, dropna=False, as_index=False).size()


In [None]:
df.groupby(df.no_doctor_due_to_cost, dropna=False, as_index=False).size()

In [None]:
df.columns