## US Birth Data Set

[Original article by FiveThirtyEight about Friday the 13th](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/)

The data set contains U.S. birth data for the years 1994 to 2003, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.

---

## Imports

In [19]:
import os
import os.path as osp
from typing import Dict, List

### Assignment

- Open the CSV and split its contents on newlines
- Preview the first 10 entries

In [9]:
os.chdir(osp.join('..', 'data'))
with open('us_births.csv', 'r') as f:
    data = f.readlines()
data = [x.rstrip('\n') for x in data]
data[:10]

['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']
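As a side note, the read-then-strip step can be collapsed into one call with `str.splitlines()`. The sketch below is a minimal illustration under one assumption: an in-memory `io.StringIO` buffer holding the first rows from the preview stands in for `us_births.csv`, so the cell runs without the file. The names `sample` and `rows` are illustrative only.

```python
import io

# Stand-in for us_births.csv: the first rows shown in the preview above.
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)

# read() returns the whole text; splitlines() splits on line boundaries
# and drops the terminators, so no separate rstrip pass is needed.
rows = sample.read().splitlines()
print(rows[:10])
```

With a real file, the same two calls would apply to the handle returned by `open`.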

### Assignment

- Create a function that takes in a CSV and converts the data into a list of lists
    - Each row will be a list
    - Make sure to convert the values to int
    - Return the final list of lists
- Preview the first 10 entries of the output

In [17]:
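from typing import Tuple

def convert_csv(csv_file: str) -> Tuple[List[str], List[List[int]]]:
    """Convert a comma-separated values (.csv) file into a header and a list of rows.

    :param str csv_file: path to .csv file
    :returns: (header, csv data), with every data value converted to int
    :rtype: tuple of (list of str, list of lists of int)
    """
    with open(csv_file, 'r') as f:
        data = f.readlines()
    header = data[0].rstrip('\n').split(',')
    # int() ignores the trailing newline; blank lines are skipped so they cannot raise ValueError
    return header, [list(map(int, x.split(','))) for x in data[1:] if x.strip()]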

header, data = convert_csv('us_births.csv')
print('Fields: {}'.format(tuple(zip(range(5), header))))
data[:10]

Fields: ((0, 'year'), (1, 'month'), (2, 'date_of_month'), (3, 'day_of_week'), (4, 'births'))


[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
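For comparison, the same conversion could be built on the standard library's `csv.reader` instead of manual `split(',')` calls. This is only a sketch: `convert_csv_reader` is a hypothetical name, and it takes an iterable of lines rather than a path so that it can be tried on an in-memory sample copied from the preview.

```python
import csv
import io
from typing import Iterable, List, Tuple


def convert_csv_reader(lines: Iterable[str]) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical variant of convert_csv that parses with csv.reader."""
    reader = csv.reader(lines)
    header = next(reader)  # the first row holds the column names
    # Skip empty rows (csv.reader yields [] for blank lines), then cast to int.
    return header, [[int(value) for value in row] for row in reader if row]


# In-memory stand-in for the file; rows copied from the preview above.
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)
sample_header, sample_rows = convert_csv_reader(sample)
print(sample_header)
print(sample_rows)
```

A file object opened with `open(path, newline='')` can be passed in the same way, since `csv.reader` accepts any iterable of strings.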

### Assignment

- Create a function that calculates the number of births each month
    - The function input should be the previous list of lists you created
    - Use a dictionary and increment the values associated with each month key
    - Return the final dictionary
- Preview the output

In [37]:
def qty_birth_month(birth_data: List[List]) -> Dict:
    """Determine the number of births per month based on provided data.
    
    :param birth_data: data set of births
    :type birth_data: list of lists
    :returns: dictionary mapping each month to its total births
    :rtype: dict
    """
    births = {}
    for entry in birth_data:
        key = entry[1]
        births[key] = births.get(key, 0) + entry[4]
    return births

jan = sum([x[4] for x in data if x[1] == 1])
print('Births in January: {}'.format(jan))

data_months = qty_birth_month(data)
data_months

Births in January: 3232517


{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}
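The `dict.get(key, 0)` idiom used above has a common alternative in `collections.defaultdict`. The sketch below shows that variant on three rows copied from the earlier preview. The name `monthly` and the unpacked field names are illustrative, not part of the original notebook.

```python
from collections import defaultdict

# Three rows from the preview: year, month, date_of_month, day_of_week, births.
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
]

# defaultdict(int) starts every missing key at 0, so += works on first sight.
monthly = defaultdict(int)
for _year, month, _day, _weekday, births in sample_rows:
    monthly[month] += births

print(dict(monthly))  # {1: 26010}
```

Both approaches produce the same totals. The choice is mostly a matter of taste, although `defaultdict` keeps the loop body to a single statement.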

### Assignment

- Create a function that calculates the number of births each day of the week

In [35]:
def qty_birth_days(birth_data: List[List]) -> Dict:
    """Determine the number of births per day of the week based on provided data.
    
    :param birth_data: data set of births
    :type birth_data: list of lists
    :returns: dictionary mapping each day of the week to its total births
    :rtype: dict
    """
    births = {}
    for entry in birth_data:
        key = entry[3]
        births[key] = births.get(key, 0) + entry[4]
    return births

day_1 = sum([x[4] for x in data if x[3] == 1])
print('Births on Day 1: {}'.format(day_1))

data_days = qty_birth_days(data)
data_days

Births on Day 1: 5789166


{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
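The day keys are easier to read with names attached. The sketch below first checks, on three rows from the preview, whether `day_of_week` follows the ISO numbering used by `date.isoweekday()` (1 = Monday, 7 = Sunday). Extending that mapping to the whole file is an assumption, not something this notebook verifies. `DAY_NAMES`, `named_totals`, and `busiest` are illustrative names.

```python
from datetime import date

# First rows of the data set, copied from the preview earlier in this notebook.
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
]

# date.isoweekday() returns 1 for Monday through 7 for Sunday.
iso_matches = all(
    date(year, month, day).isoweekday() == weekday
    for year, month, day, weekday, _births in sample_rows
)
print(iso_matches)

# Totals copied from the output above.
day_totals = {1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
              5: 6233657, 6: 4562111, 7: 4079723}

# Assumed labels: only valid if the ISO numbering holds for every row.
DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
named_totals = {DAY_NAMES[key - 1]: total for key, total in day_totals.items()}

busiest = max(named_totals, key=named_totals.get)
print(busiest, named_totals[busiest])
```

Under that assumption, the two weekend days have the lowest totals in the output above.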

### Assignment

- Create a general function that takes the data list of lists and a column index, and returns a dictionary mapping each unique key in that column to the sum of its births

In [39]:
def general_sum(input_data: List[List], col: int) -> Dict:
    """Sum the data on the requested column.
    
    :param input_data: data set
    :type input_data: list of lists
    :param int col: index of the column whose unique values become the keys
    :returns: dictionary mapping each key to its summed births (column 4)
    :rtype: dict
    """
    sum_data = {}
    for entry in input_data:
        key = entry[col]
        sum_data[key] = sum_data.get(key, 0) + entry[4]
    return sum_data

general_sum(data, 3)

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
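`general_sum` always adds up column 4 (births). One possible generalization, sketched below, also takes the value column as a parameter. `sum_by` and `value_col` are hypothetical names that do not appear in the original notebook, and the ten sample rows are copied from the earlier preview.

```python
from collections import defaultdict
from typing import Dict, List


def sum_by(rows: List[List[int]], key_col: int, value_col: int = 4) -> Dict[int, int]:
    """Hypothetical helper: total value_col for each distinct value in key_col."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_col]] += row[value_col]
    return dict(totals)


# First ten data rows from the preview earlier in this notebook.
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 4, 2, 11248],
    [1994, 1, 5, 3, 11053],
    [1994, 1, 6, 4, 11406],
    [1994, 1, 7, 5, 11251],
    [1994, 1, 8, 6, 8653],
    [1994, 1, 9, 7, 7910],
    [1994, 1, 10, 1, 10498],
]

by_month = sum_by(sample_rows, 1)    # group on the month column
by_weekday = sum_by(sample_rows, 3)  # group on the day_of_week column
print(by_month)
print(by_weekday)
```

A quick consistency check is that every grouping of the same rows adds up to the same grand total, whichever key column is chosen.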