## Birth Dates in the United States

This is the raw data behind the story **Some People Are Too Superstitious To Have A Baby on Friday the 13th**, which you can read [here](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/).

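We'll be working with U.S. births data published by FiveThirtyEight. The story draws on the Centers for Disease Control and Prevention's National Center for Health Statistics and the Social Security Administration; the file loaded below, `US_births_2000-2014_SSA.csv`, covers 2000 to 2014 and, as its name indicates, comes from the Social Security Administration. The data set has the following structure: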

- `year` - Year
- `month` - Month
- `date_of_month` - Day number of the month
- `day_of_week` - Day of week, where 1 is Monday and 7 is Sunday
- `births` - Number of births

The data can be downloaded [here](https://github.com/fivethirtyeight/data/tree/master/births/).

### Get the data

There are a few ways to read data directly from GitHub.

In [49]:
url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_2000-2014_SSA.csv"

Using `urllib` from the standard library. The `csv` module is not needed here, because the rows are split by hand further down.

In [13]:
import urllib.request as ur

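Open the URL, read the raw bytes of the CSV file, and decode them into a string so we can investigate the contents.

In [50]:
# urlopen returns a file-like response; read() gives the body as bytes
with ur.urlopen(url) as response:
    raw = response.read()
text_urllib = raw.decode()  # bytes -> str (UTF-8 by default)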

We can also use the `requests` package, which returns the decoded text directly.

In [51]:
import requests

text_requests = requests.get(url).text

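From here on, both methods use the same code. Split the text into a list with one string per CSV row, using the carriage return `'\r'` that separates the lines in this file.

In [47]:
# Either text works; the second assignment overwrites the first
split = text_urllib.split('\r')
split = text_requests.split('\r')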
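
Splitting on `'\r'` depends on this particular file's line endings. As a hedged alternative, `str.splitlines()` splits on `\r`, `\n`, and `\r\n` alike, so the same code would keep working if the file were saved with different line endings. The sketch below uses a small inline sample (`sample_text`, a hypothetical stand-in for the downloaded text) instead of the real download.

```python
# Hypothetical stand-in for the downloaded text, using '\r' line endings
# like the file described above.
sample_text = (
    "year,month,date_of_month,day_of_week,births\r"
    "2000,1,1,6,9083\r"
    "2000,1,2,7,8006"
)

# splitlines() recognises '\r', '\n' and '\r\n', and drops the delimiters
rows_cr = sample_text.splitlines()

# The same data with Windows-style endings yields identical rows
rows_crlf = sample_text.replace("\r", "\r\n").splitlines()

print(rows_cr)
print(rows_cr == rows_crlf)
```

In the cell above, `text_requests.splitlines()` would be the drop-in replacement for `text_requests.split('\r')`.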

In [48]:
for row in split[0:5]:
    line = row.split(',')
    print(line)

['year', 'month', 'date_of_month', 'day_of_week', 'births']
['2000', '1', '1', '6', '9083']
['2000', '1', '2', '7', '8006']
['2000', '1', '3', '1', '11363']
['2000', '1', '4', '2', '13032']
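
Splitting each row on `','` works here because no field contains a comma or quotes. The standard-library `csv` module is a more general option. The sketch below is one possible way to use it, again on a hypothetical inline sample (`sample_text`) rather than the downloaded file; `csv.DictReader` takes the field names from the header row.

```python
import csv

# Hypothetical stand-in for the downloaded text
sample_text = (
    "year,month,date_of_month,day_of_week,births\r"
    "2000,1,1,6,9083\r"
    "2000,1,2,7,8006"
)

# DictReader accepts any iterable of lines; the first line supplies the keys
reader = csv.DictReader(sample_text.splitlines())
records = list(reader)

for record in records:
    # Values are still strings, so convert numbers explicitly
    print(record["day_of_week"], int(record["births"]))
```

Each record is a dictionary keyed by column name, which avoids positional lookups such as `line[3]`.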


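Or we can use pandas, which reads the CSV straight from the URL. Because `index_col=0` and `parse_dates=[0]` both point at the `year` column, the index holds January 1 of each row's year rather than the actual birth date.

In [52]:
import pandas as pd

# Column 0 (year) becomes the index and is parsed as a date
df = pd.read_csv(url, index_col=0, parse_dates=[0])

print(df.head(5))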

            month  date_of_month  day_of_week  births
year                                                 
2000-01-01      1              1            6    9083
2000-01-01      1              2            7    8006
2000-01-01      1              3            1   11363
2000-01-01      1              4            2   13032
2000-01-01      1              5            3   12558
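
Every row above is labeled `2000-01-01` because only the `year` column was parsed as a date. If a real calendar date per row is wanted, one possible approach is to assemble it from the three date columns with `pd.to_datetime`, which accepts a DataFrame whose columns are named `year`, `month`, and `day`. The sketch below assumes a small inline sample (`sample_csv`, hypothetical, copied from the rows shown above) and introduces a new `date` column that is not part of the original data.

```python
import io

import pandas as pd

# Hypothetical inline sample copied from the first rows shown above
sample_csv = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "2000,1,1,6,9083\n"
    "2000,1,2,7,8006\n"
    "2000,1,3,1,11363\n"
)
sample_df = pd.read_csv(sample_csv)

# to_datetime assembles dates from columns named year, month and day
parts = sample_df[["year", "month", "date_of_month"]].rename(
    columns={"date_of_month": "day"}
)
sample_df["date"] = pd.to_datetime(parts)

# Sanity check: pandas numbers weekdays 0 (Monday) to 6 (Sunday),
# while the data uses 1 (Monday) to 7 (Sunday)
weekday_matches = bool(
    (sample_df["date"].dt.dayofweek + 1 == sample_df["day_of_week"]).all()
)

sample_df = sample_df.set_index("date")
print(sample_df)
print(weekday_matches)
```

The weekday check is a cheap way to confirm that the assembled dates agree with the `day_of_week` column in the file.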


### Count births on each day of week

Create a dictionary that maps each day of the week to its total number of births.

In [58]:
day_counts = {}
data_rows = split[1:]  # skip the header row

for row in data_rows:
    line = row.split(',')
    day_of_week = line[3]
    births = int(line[4])
    # Add to the running total, starting from 0 for a day not seen yet
    day_counts[day_of_week] = day_counts.get(day_of_week, 0) + births

print(day_counts)

{'6': 6704495, '7': 5886889, '1': 9316001, '2': 10274874, '3': 10109130, '4': 10045436, '5': 9850199}


Or using pandas, grouping by the `day_of_week` column and summing `births`:

In [56]:
df.groupby('day_of_week')['births'].sum()

day_of_week
1     9316001
2    10274874
3    10109130
4    10045436
5     9850199
6     6704495
7     5886889
Name: births, dtype: int64
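
The numeric keys are easier to read once they are mapped to day names, using the convention from the column description (1 is Monday, 7 is Sunday). The sketch below is one way to do that; it copies the totals printed above into `day_counts` and adds two hypothetical names, `day_names` and `named_share`, to report each day's share of all births.

```python
# Totals copied from the output above (keys are day_of_week as strings)
day_counts = {'6': 6704495, '7': 5886889, '1': 9316001, '2': 10274874,
              '3': 10109130, '4': 10045436, '5': 9850199}

# Index 0 is Monday, matching day_of_week == 1
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

total = sum(day_counts.values())

# Sort numerically so the days come out Monday to Sunday
named_share = {
    day_names[int(day) - 1]: births / total
    for day, births in sorted(day_counts.items(), key=lambda item: int(item[0]))
}

for name, share in named_share.items():
    print(f"{name:<9} {share:.1%}")
```

In these totals, Tuesday has the most births and Sunday the fewest, with both weekend days well below every weekday.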