# Exploratory Data Analysis

## Notebook Summary
In this notebook, I summarize where the data came from and how it was collected, and I look through some key columns in the dataset. I also explore statistics that matter for the rest of the project, examining the distributions of various columns and how they relate to the target column.

## Data Source and Methodologies
The National Survey of Children's Health (NSCH) is a household survey that produces national and state-level data on the physical and emotional health of children 0 to 17 years old in the United States. Since 2016, the NSCH has been an annual survey. The survey supports national estimates every year and state-level estimates by combining 2 or 3 years of data. In this project, I look at the 2020 data.

The survey collects information related to the health and well-being of children, including access to and use of health care, family interactions, parental health, school and after-school experiences, and neighborhood characteristics. A parent or other adult caregiver with knowledge of the sampled child’s health and health care filled out the topical questionnaire.

Survey topics include:
- Child and family characteristics
- **Physical and mental health status, including current conditions and functional difficulties**
- Health insurance status, type, and adequacy
- Access to and use of health care services
- Medical, dental, and specialty care needed and received
- Family health and activities
- Impact of child’s health on family
- Neighborhood characteristics

Please see [this document](https://www2.census.gov/programs-surveys/nsch/technical-documentation/methodology/2020-NSCH-Methodology-Report.pdf) for a full report on the methodologies used by the US Census Bureau when obtaining this data.

## Loading and Exploring the Data
Let's first load the data and confirm its size and shape.

In [1]:
# Import statements
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Load the data into a pandas DataFrame
nsch = pd.read_sas('Data/nsch_2020_topical_SAS/nsch_2020_topical.sas7bdat')

FileNotFoundError: [Errno 2] No such file or directory: 'Data/nsch_2020_topical_SAS/nsch_2020_topical.sas7bdat'
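
The `FileNotFoundError` above usually means the relative path does not resolve from the notebook's current working directory. As a minimal sketch, assuming the same relative path as the cell above, the load can be wrapped in a check that reports where it looked. The helper name `load_nsch` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def load_nsch(path="Data/nsch_2020_topical_SAS/nsch_2020_topical.sas7bdat"):
    """Load the NSCH topical file, failing early with a clearer message.

    Hypothetical helper: the default path mirrors the cell above and is
    resolved relative to the current working directory.
    """
    sas_path = Path(path)
    if not sas_path.is_file():
        raise FileNotFoundError(
            f"Could not find {sas_path} (looked in {Path.cwd()}). "
            "Check the working directory or pass the full path."
        )
    # Imported here so the path check itself has no third-party dependency.
    import pandas as pd

    return pd.read_sas(sas_path)
```

Once the path resolves, calling `nsch = load_nsch()` and then inspecting `nsch.shape` confirms the number of rows and columns.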

## ADHD

Our target column for prediction is "K2Q31A", which records the response to this survey question:

> "Has a doctor or other health care provider EVER told you that this child has Attention Deficit Disorder or Attention-Deficit/Hyperactivity Disorder, that is, ADD or ADHD?"
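
A quick way to see how responses to this question are distributed is to tabulate the column, including missing values. The sketch below uses a tiny made-up frame rather than real NSCH data, and `summarize_target` is a hypothetical helper. It assumes the column is coded numerically (commonly 1 = Yes, 2 = No), which should be verified against the 2020 codebook.

```python
import pandas as pd


def summarize_target(df, column="K2Q31A"):
    """Return counts and shares for each response, including missing values."""
    counts = df[column].value_counts(dropna=False)
    shares = counts / counts.sum()
    return pd.DataFrame({"count": counts, "share": shares})


# Tiny made-up frame for illustration only; not real NSCH data.
# Assumed coding: 1 = Yes, 2 = No (check the codebook).
example = pd.DataFrame({"K2Q31A": [1.0, 2.0, 2.0, 2.0, None]})
summary = summarize_target(example)
print(summary)
```

With the real data loaded, `summarize_target(nsch)` would give the class balance of the target, which matters later when choosing evaluation metrics.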
