# Exploring US Births

This Jupyter notebook is based on a guided project from [DataQuest](https://www.dataquest.io), a data analytics tutorial.

The data comes from a [FiveThirtyEight](https://fivethirtyeight.com/) analysis, [Some People Are Too Superstitious to Have a Baby on Friday the 13th](https://github.com/fivethirtyeight/data/tree/master/births).

## Scope

In this project, I'll analyze CDC and SSA data to determine the frequency of births by

- year,
- month,
- and day of the week.
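
As a rough preview of the kind of tally this implies, here is a minimal sketch that assumes rows shaped like `[year, month, date_of_month, day_of_week, births]` with string values. The helper name `sum_births_by` is illustrative only and not part of the guided project; the three sample rows are the first data rows of the CDC file.

```python
from collections import defaultdict

# Column order assumed from the CSV header row.
COLUMNS = ['year', 'month', 'date_of_month', 'day_of_week', 'births']

def sum_births_by(rows, column):
    """Total the births column, grouped by the values of `column`."""
    index = COLUMNS.index(column)
    totals = defaultdict(int)
    for row in rows:
        # Values arrive as strings from csv.reader, so convert before adding.
        totals[int(row[index])] += int(row[-1])
    return dict(totals)

# First three data rows of the CDC file (header row excluded).
sample_rows = [
    ['1994', '1', '1', '6', '8096'],
    ['1994', '1', '2', '7', '7772'],
    ['1994', '1', '3', '1', '10142'],
]

by_year = sum_births_by(sample_rows, 'year')
by_weekday = sum_births_by(sample_rows, 'day_of_week')
print(by_year)
print(by_weekday)
```

The same helper covers all three groupings by changing the `column` argument, which keeps the per-year, per-month, and per-weekday tallies consistent with one another.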


# Making a list...

I'll start by reading both datasets into lists, and then creating a list of those lists:

In [5]:
import csv
import requests

CDC_CSV_URL_1 = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv'
SSA_CSV_URL_2 = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_2000-2014_SSA.csv'

seeEssVees = [CDC_CSV_URL_1, SSA_CSV_URL_2]
listOfLists = []

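# Reuse one HTTP session for both downloads.
with requests.Session() as s:
    for each in seeEssVees:
        download = s.get(each)
        download.raise_for_status()  # stop early if a download failed
        decoded_content = download.content.decode('utf-8')
        # Parse the CSV text into a list of rows (each row is a list of strings).
        data = csv.reader(decoded_content.splitlines(), delimiter=',')
        listOfLists.append(list(data))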

# Checking it twice...

The first element of each list should be a header row describing the datapoints that follow:

In [6]:
listOfLists[0][0:5]

[['year', 'month', 'date_of_month', 'day_of_week', 'births'],
 ['1994', '1', '1', '6', '8096'],
 ['1994', '1', '2', '7', '7772'],
 ['1994', '1', '3', '1', '10142'],
 ['1994', '1', '4', '2', '11248']]

In [7]:
listOfLists[1][0:5]

[['year', 'month', 'date_of_month', 'day_of_week', 'births'],
 ['2000', '1', '1', '6', '9083'],
 ['2000', '1', '2', '7', '8006'],
 ['2000', '1', '3', '1', '11363'],
 ['2000', '1', '4', '2', '13032']]
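
Since both files appear to share the same layout, the header check can also be done in code rather than by eye. The sketch below assumes the list-of-lists shape printed above; `split_header` and `EXPECTED_HEADER` are hypothetical names, and the preview rows are copied from the two outputs.

```python
EXPECTED_HEADER = ['year', 'month', 'date_of_month', 'day_of_week', 'births']

def split_header(rows):
    """Separate the header row from the data rows and convert values to int."""
    header, body = rows[0], rows[1:]
    if header != EXPECTED_HEADER:
        raise ValueError(f'unexpected header: {header}')
    return header, [[int(value) for value in row] for row in body]

# Rows copied from the printed previews of each dataset.
cdc_preview = [
    ['year', 'month', 'date_of_month', 'day_of_week', 'births'],
    ['1994', '1', '1', '6', '8096'],
    ['1994', '1', '2', '7', '7772'],
]
ssa_preview = [
    ['year', 'month', 'date_of_month', 'day_of_week', 'births'],
    ['2000', '1', '1', '6', '9083'],
    ['2000', '1', '2', '7', '8006'],
]

cdc_header, cdc_rows = split_header(cdc_preview)
ssa_header, ssa_rows = split_header(ssa_preview)
print(cdc_header == ssa_header)
print(cdc_rows[0])
```

Converting to integers at this stage means later comparisons and sums do not have to call `int()` on every access.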

# Benchmarking...

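Both datasets contain datapoints for the years 2000-2003. Presumably, the CDC and SSA births should match.

As a first check, the code below filters each dataset down to those years and compares the number of rows:

In [30]:
def filterYears(rows, start=2000, stop=2004):
    # Skip the header row, then keep the rows whose year is in [start, stop).
    return [row for row in rows[1:] if start <= int(row[0]) < stop]

checkList = filterYears(listOfLists[0])     # CDC rows for 2000-2003
ssaCheckList = filterYears(listOfLists[1])  # SSA rows for 2000-2003

print("It's",(len(checkList) == len(ssaCheckList)),"that both lists have the same number of elements.")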



It's True that both lists have the same number of elements.


In [28]:
len(listOfLists[1])

5480

In [31]:
len(checkList)

1461
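
Matching row counts only shows that two lists cover the same number of days; it does not show that the birth counts agree. A date-by-date comparison could look like the sketch below. The helper `diff_births` is hypothetical, and the toy rows use made-up birth counts purely to exercise the function; they are not values from either dataset.

```python
def diff_births(rows_a, rows_b):
    """Return {(year, month, day): (births_a, births_b)} for dates whose counts differ.

    Both inputs are data rows (no header) shaped like
    [year, month, date_of_month, day_of_week, births], with string values.
    Dates present in only one input are ignored.
    """
    births_a = {tuple(row[:3]): int(row[4]) for row in rows_a}
    births_b = {tuple(row[:3]): int(row[4]) for row in rows_b}
    shared = births_a.keys() & births_b.keys()
    # Sort the shared dates so the result has a stable order.
    return {date: (births_a[date], births_b[date])
            for date in sorted(shared)
            if births_a[date] != births_b[date]}

# Toy rows with made-up birth counts, for illustration only.
toy_a = [
    ['2000', '1', '1', '6', '100'],
    ['2000', '1', '2', '7', '200'],
]
toy_b = [
    ['2000', '1', '1', '6', '100'],
    ['2000', '1', '2', '7', '250'],
    ['2000', '1', '3', '1', '300'],
]

mismatches = diff_births(toy_a, toy_b)
print(mismatches)
```

With the real data, `rows_a` and `rows_b` would be the 2000-2003 rows from each source, and an empty result would indicate that the two sources agree on every shared date.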