# Explore U.S. Baby Births

## Objective

Practice the basics of Python by exploring baby births in the U.S.

This covers working with loops, Booleans and `if` statements, dictionaries, and list operations, and writing basic functions.


## Data Set

The data set contains U.S. births data for the years 1994 to 2003, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics and compiled by [FiveThirtyEight](https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv).

The data set has the following structure:

| Header        | Definition                                     |
|---------------|------------------------------------------------|
| year          | Year                                           |
| month         | Month                                          |
| date_of_month | Day number of the month                        |
| day_of_week   | Day of week, where 1 is Monday and 7 is Sunday |
| births        | Number of births                               |


## Reading the Data

In [1]:
data_list = open("csv/US_births_1994-2003_CDC_NCHS.csv").read().split("\n")

Display the first 10 values in the resulting list (the header row plus nine data rows):

In [2]:
data_list[0:10]

['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']
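
The one-liner in `In [1]` never closes the file, and `split("\n")` returns a trailing empty string whenever the file ends with a newline. A minimal sketch of an alternative follows. The helper name `read_lines` and the small temporary sample file are assumptions made for illustration and are not part of the original notebook.

```python
import os
import tempfile


def read_lines(file_name):
    """Return the lines of a text file without their newline characters."""
    # The context manager closes the file as soon as the block ends
    with open(file_name) as f:
        # splitlines() does not produce a trailing empty string
        return f.read().splitlines()


# Hypothetical sample: the header and the first two rows shown above
sample_text = (
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)

# Write the sample to a temporary file so the sketch runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

sample_lines = read_lines(sample_path)
os.remove(sample_path)

print(sample_lines)
```

For comparison, `sample_text.split("\n")` would end with an empty string, which matters later when every field is converted with `int()`.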

## Converting Data Into A List of Lists

Convert the list of strings into a list of lists of integers, a more structured format that is easier to analyze.

In [3]:
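def read_data(file_name):
    # Read the whole file; the context manager closes it afterwards
    with open(file_name) as f:
        data_string = f.read()
    # Drop the header row
    data_list = data_string.split("\n")[1:]
    final_data = []

    for each in data_list:
        # Skip blank lines (e.g. after a trailing newline), which int() cannot parse
        if not each:
            continue
        each_string = each.split(",")
        int_list = []
        for val in each_string:
            int_list.append(int(val))
        final_data.append(int_list)
    return final_data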

births_data = read_data("csv/US_births_1994-2003_CDC_NCHS.csv")

In [4]:
births_data[0:10]

[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
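
The same conversion can also be sketched with the standard-library `csv` module, which handles the splitting. The function name `parse_births` and the in-memory sample are assumptions for illustration. The notebook itself uses plain string operations, which is the point of the exercise.

```python
import csv
import io


def parse_births(text_stream):
    """Parse CSV text into a list of integer rows, skipping the header."""
    reader = csv.reader(text_stream)
    next(reader)  # skip the header row
    # csv.reader yields an empty list for blank lines, so filter those out
    return [[int(value) for value in row] for row in reader if row]


# Hypothetical in-memory sample taken from the first rows shown above
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)

sample_rows = parse_births(sample)
print(sample_rows)
```

With a real file, the stream would come from `open(file_name, newline="")`, as the `csv` documentation recommends.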

## Calculating Number of Births Each Month

Start analyzing the data by writing a function that calculates the total number of births per month.

In [5]:
def calc_month_births(data):
    births_per_month = {}
    
    for each in data:
        month = each[1]
        births = each[4]
        if month in births_per_month:
            births_per_month[month] = births_per_month[month] + births
        else:
            births_per_month[month] = births
    return births_per_month

total_births_per_month = calc_month_births(births_data)
    

In [6]:
total_births_per_month

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}
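
Once the totals are in a dictionary, the months with the most and fewest births can be found with `max` and `min`. The sketch below copies the totals printed above into a literal so that it runs on its own. The variable names `busiest_month` and `quietest_month` are illustrative assumptions.

```python
# Totals copied from the output above
total_births_per_month = {
    1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314,
    5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858,
    9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860,
}

# key=dict.get makes max/min compare the values while returning the key
busiest_month = max(total_births_per_month, key=total_births_per_month.get)
quietest_month = min(total_births_per_month, key=total_births_per_month.get)

print(busiest_month, total_births_per_month[busiest_month])    # 8 3525858
print(quietest_month, total_births_per_month[quietest_month])  # 2 3018140
```

These are raw totals, so February's low figure partly reflects that it is the shortest month.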

## Calculating Number of Births Each Day of Week

In [7]:
def calc_dow_births(data):
    births_per_dow = {}
    
    for each in data:
        dow = each[3]
        births = each[4]
        if dow in births_per_dow:
            births_per_dow[dow] = births_per_dow[dow] + births
        else:
            births_per_dow[dow] = births
    return births_per_dow

total_births_per_dow = calc_dow_births(births_data)

In [8]:
total_births_per_dow

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
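
The numeric keys are easier to read when mapped to day names, using the convention from the data set table (1 is Monday, 7 is Sunday). The following sketch uses the totals printed above. `DAY_NAMES`, `births_by_day_name`, and `weekend_share` are hypothetical names introduced for illustration.

```python
# Day numbering as defined in the data set table: 1 is Monday, 7 is Sunday
DAY_NAMES = {
    1: "Monday", 2: "Tuesday", 3: "Wednesday", 4: "Thursday",
    5: "Friday", 6: "Saturday", 7: "Sunday",
}

# Totals copied from the output above
total_births_per_dow = {
    1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
    5: 6233657, 6: 4562111, 7: 4079723,
}

# Re-key the dictionary by day name
births_by_day_name = {
    DAY_NAMES[dow]: births for dow, births in total_births_per_dow.items()
}

# Fraction of all births that fall on Saturday or Sunday
weekend_births = total_births_per_dow[6] + total_births_per_dow[7]
weekend_share = weekend_births / sum(total_births_per_dow.values())

print(births_by_day_name)
print(f"Weekend share of births: {weekend_share:.1%}")
```

If births were spread evenly over the week, the weekend share would be 2/7, or about 28.6%. The computed share is noticeably lower.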

## Writing a More General Function

Create a single function that sums births grouped by the values of any column, identified by its index.

In [9]:
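def calc_counts(data, column):
    # Maps each distinct value in the chosen column to its total births
    sums_dict = {}
    
    for each in data:
        col = each[column]
        births = each[4]  # births is always the fifth column
        if col in sums_dict:
            sums_dict[col] = sums_dict[col] + births
        else:
            sums_dict[col] = births
    return sums_dict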

total_births_per_year = calc_counts(births_data, 0) 
total_births_per_month = calc_counts(births_data, 1)
total_births_per_dom = calc_counts(births_data, 2) 
total_births_per_dow = calc_counts(births_data, 3) 
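
The `if`/`else` accumulation in `calc_counts` can also be written with `dict.get`, which returns a default when the key is missing. This is a sketch of that variant, not a replacement for the function above. The name `calc_counts_get` and the four sample rows are assumptions for illustration.

```python
def calc_counts_get(data, column):
    """Sum births (index 4) grouped by the value found at index `column`."""
    sums_dict = {}
    for row in data:
        key = row[column]
        # get() returns 0 the first time a key is seen
        sums_dict[key] = sums_dict.get(key, 0) + row[4]
    return sums_dict


# Hypothetical sample: four rows taken from the data shown earlier
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 8, 6, 8653],
]

per_dow = calc_counts_get(sample_rows, 3)
per_month = calc_counts_get(sample_rows, 1)

print(per_dow)    # {6: 16749, 7: 7772, 1: 10142}
print(per_month)  # {1: 34663}
```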

In [10]:
total_births_per_year

{1994: 3952767,
 1995: 3899589,
 1996: 3891494,
 1997: 3880894,
 1998: 3941553,
 1999: 3959417,
 2000: 4058814,
 2001: 4025933,
 2002: 4021726,
 2003: 4089950}

In [11]:
total_births_per_month

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}

In [12]:
total_births_per_dom

{1: 1276557,
 2: 1288739,
 3: 1304499,
 4: 1288154,
 5: 1299953,
 6: 1304474,
 7: 1310459,
 8: 1312297,
 9: 1303292,
 10: 1320764,
 11: 1314361,
 12: 1318437,
 13: 1277684,
 14: 1320153,
 15: 1319171,
 16: 1315192,
 17: 1324953,
 18: 1326855,
 19: 1318727,
 20: 1324821,
 21: 1322897,
 22: 1317381,
 23: 1293290,
 24: 1288083,
 25: 1272116,
 26: 1284796,
 27: 1294395,
 28: 1307685,
 29: 1223161,
 30: 1202095,
 31: 746696}
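
The low totals for days 29 to 31 mostly reflect how rarely those dates occur, not fewer births on those days. A rough way to check this is to divide each total by the number of times that day of the month appears in 1994 to 2003. The sketch below assumes the data set has exactly one row for every calendar day in that range. The helper `count_day_occurrences` is hypothetical.

```python
import calendar


def count_day_occurrences(first_year, last_year):
    """Count how often each day of the month occurs in the given year range."""
    occurrences = {}
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # monthrange returns (weekday of the first day, days in the month)
            days_in_month = calendar.monthrange(year, month)[1]
            for day in range(1, days_in_month + 1):
                occurrences[day] = occurrences.get(day, 0) + 1
    return occurrences


occurrences = count_day_occurrences(1994, 2003)

# Totals for the last four days of the month, copied from the output above
totals_for_late_days = {28: 1307685, 29: 1223161, 30: 1202095, 31: 746696}

# Average births per occurrence of each day of the month
average_births = {
    day: total / occurrences[day]
    for day, total in totals_for_late_days.items()
}

for day, average in average_births.items():
    print(day, occurrences[day], round(average))
```

Day 31 occurs only 70 times in ten years, compared with 120 times for day 28, so the per-day averages end up far closer to each other than the raw totals suggest.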

In [13]:
total_births_per_dow

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}
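
Every grouping partitions the same rows, so the values of each dictionary should add up to the same grand total. The sketch below runs that sanity check on three of the outputs above, copied into literals so that it runs on its own. The `groupings` and `grand_totals` names are assumptions.

```python
# Totals copied from the outputs above
total_births_per_year = {
    1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894,
    1998: 3941553, 1999: 3959417, 2000: 4058814, 2001: 4025933,
    2002: 4021726, 2003: 4089950,
}
total_births_per_month = {
    1: 3232517, 2: 3018140, 3: 3322069, 4: 3185314,
    5: 3350907, 6: 3296530, 7: 3498783, 8: 3525858,
    9: 3439698, 10: 3378814, 11: 3171647, 12: 3301860,
}
total_births_per_dow = {
    1: 5789166, 2: 6446196, 3: 6322855, 4: 6288429,
    5: 6233657, 6: 4562111, 7: 4079723,
}

groupings = {
    "year": total_births_per_year,
    "month": total_births_per_month,
    "day_of_week": total_births_per_dow,
}

# Sum the values of each grouping
grand_totals = {name: sum(totals.values()) for name, totals in groupings.items()}
print(grand_totals)

# A set collapses equal totals; more than one element would signal a bug
all_consistent = len(set(grand_totals.values())) == 1
print(all_consistent)  # True
```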