# Introduction

This is an exploratory analysis of teen birth rates in the United States, using birth data from the National Vital Statistics System.

This dataset includes birth rates for females aged 15-19 years in the United States since 1940. The number of states in the reporting area has varied over time. In 1915, when the birth registration area was established, 10 states and the District of Columbia reported births; by 1933, 48 states and the District of Columbia were reporting births. The last two states, Alaska and Hawaii, were added to the registration area in 1959 and 1960, when these regions gained statehood.

The data is used to answer the following questions:
1. How has the birth rate for females aged 15–19 years changed over the years?
2. Which states had the highest number of teen births in 2016?
3. How do total births compare between the 15-17 and 18-19 age groups?

# Data gathering and exploration

In [142]:
# Import libraries 
import numpy as np
import pandas as pd
import plotly 
import plotly.express as px


df = pd.read_csv("nchs-u.s.-and-state-trends-on-teen-births.csv")

In [143]:
df.head()

Unnamed: 0,Year,State,Age Group (Years),State Rate,State Births,U.S. Births,U.S. Birth Rate,Unit
0,1990,Alabama,15-17 years,47.4,4222,183327,37.5,"per 1,000"
1,1990,Alaska,15-17 years,31.2,335,183327,37.5,"per 1,000"
2,1990,Arizona,15-17 years,47.7,3436,183327,37.5,"per 1,000"
3,1990,Arkansas,15-17 years,50.4,2549,183327,37.5,"per 1,000"
4,1990,California,15-17 years,44.6,24880,183327,37.5,"per 1,000"


In [144]:
# Check for null values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4212 entries, 0 to 4211
Data columns (total 8 columns):
Year                 4212 non-null int64
State                4212 non-null object
Age Group (Years)    4212 non-null object
State Rate           4212 non-null float64
State Births         4212 non-null int64
U.S. Births          4212 non-null int64
U.S. Birth Rate      4212 non-null float64
Unit                 4212 non-null object
dtypes: float64(2), int64(3), object(3)
memory usage: 263.3+ KB


It's clear that there are no null values: every column has 4212 non-null entries.
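
`df.info()` shows non-null counts, but missing values can also be counted directly. The sketch below is a minimal illustration on a stand-in frame built from the first rows printed above. The name `sample` is hypothetical and not part of the notebook; the same calls should work on `df`.

```python
import pandas as pd

# Stand-in for df: the first three rows printed by df.head() above.
sample = pd.DataFrame({
    "Year": [1990, 1990, 1990],
    "State": ["Alabama", "Alaska", "Arizona"],
    "Age Group (Years)": ["15-17 years"] * 3,
    "State Rate": [47.4, 31.2, 47.7],
    "State Births": [4222, 335, 3436],
})

# Count missing values per column; all zeros means no nulls.
null_counts = sample.isna().sum()
print(null_counts)

# Collapse to a single yes/no answer for the whole frame.
has_nulls = bool(sample.isna().any().any())
print("Any nulls:", has_nulls)
```

On the full dataset, `df.isna().sum()` would be expected to return zero for every column, in line with the `df.info()` output.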

In [145]:
# Exploring the data
df['Age Group (Years)'].unique()

array(['15-17 years', '15-19 years', '18-19 years'], dtype=object)

# Data analysis and visualization

## Question 1: How has the birth rate for females aged 15–19 years changed over the years?

In [147]:
# Filter to the national rows for the 15-19 age group
df1 = df[(df['State'] == 'Total U.S.')
         & (df['Age Group (Years)'] == '15-19 years')]
df1.head()

Unnamed: 0,Year,State,Age Group (Years),State Rate,State Births,U.S. Births,U.S. Birth Rate,Unit
1396,1990,Total U.S.,15-19 years,59.9,521826,521826,59.9,"per 1,000"
1448,1991,Total U.S.,15-19 years,61.8,519577,519577,61.8,"per 1,000"
1500,1992,Total U.S.,15-19 years,60.3,505415,505415,60.3,"per 1,000"
1552,1993,Total U.S.,15-19 years,59.0,501093,501093,59.0,"per 1,000"
1604,1994,Total U.S.,15-19 years,58.2,505488,505488,58.2,"per 1,000"


In [148]:
# Plot the data
fig = px.line(df1, x="Year", y="U.S. Birth Rate", text="U.S. Birth Rate")
fig.update_traces(textposition="top center")
fig.update_xaxes(dtick=2, showgrid=False)
fig.update_yaxes(showgrid=False)
fig.update_layout(
    title = 'Birth rate over years',
    title_x= 0.5 )
fig.show()
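
To complement the line chart, the year-over-year change can be computed directly. This is a sketch using only the five rows of `df1` printed above (1990 to 1994), so it says nothing about later years. The names `rates` and `peak_year` are illustrative, not part of the notebook.

```python
import pandas as pd

# First five rows of df1 as printed above (1990-1994 only).
rates = pd.DataFrame({
    "Year": [1990, 1991, 1992, 1993, 1994],
    "U.S. Birth Rate": [59.9, 61.8, 60.3, 59.0, 58.2],
})

# Year-over-year change in births per 1,000 females aged 15-19.
rates["Change"] = rates["U.S. Birth Rate"].diff().round(1)

# Year with the highest rate among the rows shown.
peak_year = int(rates.loc[rates["U.S. Birth Rate"].idxmax(), "Year"])

print(rates)
print("Peak year in this sample:", peak_year)
```

Applied to the full `df1`, the same two steps would give the change for every year in the series.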


## Question 2: Which states had the highest number of teen births in 2016?

In [149]:
df2 = df[(df['Year'] == 2016)
         & (df['State'] != 'Total U.S.')
         & (df['Age Group (Years)'] == '15-19 years')]

# Sum births per state (select the column before aggregating)
df2 = df2.groupby('State')['State Births'].sum().reset_index()
df2.head()

Unnamed: 0,State,State Births
0,Alabama,4480
1,Alaska,583
2,Arizona,5357
3,Arkansas,3372
4,California,21412


In [150]:
fig = px.treemap(df2, path=[px.Constant("Total U.S."), 'State'], values='State Births')
fig.data[0].textinfo = 'label+text+value'
fig.update_layout(
    title='Births by state',
    title_x=0.5,
    margin=dict(t=50, l=25, r=25, b=25))
fig.show()
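
The treemap shows relative sizes, but an explicit ranking answers the question more directly. The sketch below ranks only the five states printed by `df2.head()` above, so its result is illustrative rather than the actual national top three. The names `births_2016` and `top` are hypothetical.

```python
import pandas as pd

# The five states printed by df2.head() above (2016, ages 15-19).
births_2016 = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "State Births": [4480, 583, 5357, 3372, 21412],
})

# Keep the three rows with the largest birth counts, largest first.
top = births_2016.nlargest(3, "State Births").reset_index(drop=True)
print(top)
```

Running `df2.nlargest(10, 'State Births')` on the full frame would list the ten states with the most teen births in 2016.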

## Question 3: How do total births compare between the 15-17 and 18-19 age groups?

In [151]:
# Exclude the combined 15-19 group and the national rows, then sum state births.
# Only 'State Births' is summed: adding up rates or repeated U.S. totals is not meaningful.
df3 = df[(df['Age Group (Years)'] != '15-19 years')
         & (df['State'] != 'Total U.S.')]
df3 = df3.groupby(['Year', 'Age Group (Years)'])['State Births'].sum().reset_index()
df3.head()

Unnamed: 0,Year,Age Group (Years),State Births
0,1990,15-17 years,183327
1,1990,18-19 years,338499
2,1991,15-17 years,188226
3,1991,18-19 years,331351
4,1992,15-17 years,187549


In [153]:
fig = px.bar(df3, x='Year', y='State Births', color='Age Group (Years)')
fig.update_layout(
    title = 'Births by age group',
    title_x= 0.5,
    xaxis_title = 'Year',
    yaxis_title = 'Number of births'
)
fig.update_xaxes(dtick=2)
fig.update_yaxes(showgrid=False)
fig.show()
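
As a sanity check on the aggregation, the two age groups should add up to the national 15-19 total for each year. The sketch below tests this for 1990 and 1991 using values copied from the `df3` and `df1` outputs above, and also computes each group's share of births. The names `by_group`, `us_total`, `summed`, and `matches` are illustrative, not part of the notebook.

```python
import pandas as pd

# State-summed births by age group, from the df3 output above.
by_group = pd.DataFrame({
    "Year": [1990, 1990, 1991, 1991],
    "Age Group (Years)": ["15-17 years", "18-19 years"] * 2,
    "State Births": [183327, 338499, 188226, 331351],
})

# National 15-19 totals, from the 'Total U.S.' rows in the df1 output.
us_total = pd.Series({1990: 521826, 1991: 519577}, name="U.S. Births")

# Sum the two groups per year and compare with the reported totals.
summed = by_group.groupby("Year")["State Births"].sum()
matches = summed.eq(us_total)
print(matches)

# Share of each year's births contributed by each age group.
year_totals = by_group.groupby("Year")["State Births"].transform("sum")
by_group["Share"] = by_group["State Births"] / year_totals
print(by_group)
```

For both years the sums agree with the reported totals, which suggests that summing over states and excluding the `Total U.S.` rows does not double count births.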
