**I was given two files: `US_births_1994-2003_CDC_NCHS.csv` and `US_births_2000-2014_SSA.csv`. The first file contains the number of births per day in the United States from 1994-2003 according to the Centers for Disease Control and Prevention's National Center for Health Statistics (CDC NCHS). The second file contains the number of births per day in the United States from 2000-2014 according to the Social Security Administration (SSA). Let's see if we can find something interesting here.**

# 1) Open the files, make them useful

In [46]:
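def read_csv(file_name):
    # Read the whole file and close it when done
    with open(file_name, 'r') as f:
        data = f.read()
    split_list = data.split('\n')
    string_list = split_list[1:]  # skip the header row
    final_list = []
    for each in string_list:
        if not each.strip():
            continue  # skip blank lines, e.g. after a trailing newline
        int_fields = []
        string_fields = each.split(',')
        for every in string_fields:
            int_fields.append(int(every))
        # append each row once, after all of its fields are converted
        final_list.append(int_fields)
    return final_list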
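
As an alternative sketch, the standard library's `csv` module can do the splitting. The name `read_csv_rows` is hypothetical, and the sketch assumes, as the code above does, that the first row is a header and every remaining field is an integer.

```python
import csv

def read_csv_rows(file_name):
    """Return each data row of a CSV file as a list of ints (header skipped)."""
    rows = []
    with open(file_name, newline='') as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if there is one
        for fields in reader:
            if fields:  # csv.reader yields [] for blank lines
                rows.append([int(field) for field in fields])
    return rows
```

It should be usable in place of `read_csv`, for example `read_csv_rows('US_births_1994-2003_CDC_NCHS.csv')`.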

In [47]:
def combine_data(csv_1, csv_2):
    data_set_1 = read_csv(csv_1)
    data_set_2 = read_csv(csv_2)
    third_dict = {}

    # Key each row by (year, month, day of month, day of week); value is births
    first_dict = {}
    for item in data_set_1:
        key_1 = item[0], item[1], item[2], item[3]
        value_1 = item[4]
        first_dict[key_1] = value_1
    third_dict.update(first_dict)

    second_dict = {}
    for row in data_set_2:
        key_2 = row[0], row[1], row[2], row[3]
        value_2 = row[4]
        second_dict[key_2] = value_2
    third_dict.update(second_dict)

    # Where both files cover the same date, use the integer average
    for key in first_dict:
        if key in second_dict:
            val_1 = first_dict[key]
            val_2 = second_dict[key]
            avg = int((int(val_1) + int(val_2)) / 2)
            third_dict[key] = avg

    # Flatten back into sorted [year, month, dom, dow, births] rows
    aggregated_lst = []
    for entries in third_dict:
        temp_list = list(entries)
        temp_list.append(third_dict[entries])
        aggregated_lst.append(temp_list)

    sort_aggr = sorted(aggregated_lst)

    return sort_aggr

In [48]:
final_data = combine_data('US_births_1994-2003_CDC_NCHS.csv','US_births_2000-2014_SSA.csv')
final_data[0:10]

[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
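
The overlap handling in `combine_data` is easier to see on a toy example. The two dictionaries below hold made-up values, not rows from either file. Where both sources report the same date the result is the integer average, and otherwise the single available value is kept.

```python
# Made-up sample values keyed by (year, month, day of month, day of week)
cdc = {(1999, 12, 31, 5): 12000, (2000, 1, 1, 6): 9000}
ssa = {(2000, 1, 1, 6): 9100, (2004, 1, 1, 4): 8000}

merged = dict(cdc)
for key, value in ssa.items():
    if key in merged:
        # date present in both sources: keep the integer average
        merged[key] = int((merged[key] + value) / 2)
    else:
        merged[key] = value

# Flatten to sorted [year, month, dom, dow, births] rows
rows = sorted(list(key) + [births] for key, births in merged.items())
# rows == [[1999, 12, 31, 5, 12000], [2000, 1, 1, 6, 9050], [2004, 1, 1, 4, 8000]]
```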

# 2) Use it

## How many births are we talking about?

85,549,129 new humans from 1994-2014

In [49]:
def sum_list_index(input_list, index):
    total = 0
    for each in input_list:
        total = total + each[index]
    return total

total_births = sum_list_index(final_data, 4)
total_births

85549129
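
The same kind of total can be written with the built-in `sum` and a generator expression. A minimal sketch on the first three rows of the combined data shown earlier (the name `sum_column` is hypothetical):

```python
sample = [[1994, 1, 1, 6, 8096],
          [1994, 1, 2, 7, 7772],
          [1994, 1, 3, 1, 10142]]

def sum_column(rows, index):
    """Sum the values at position `index` across all rows."""
    return sum(row[index] for row in rows)

sample_total = sum_column(sample, 4)  # 26010
```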

## What else can we find here?

Let's create a few lists to see the total number of births per variable: year, month, day of month (dom), and day of week (dow).

Here are two functions, `month_births` and `dow_births`, that each calculate the total number of births for one specific variable (month and day of the week, respectively):

In [43]:
def month_births(data):
    births_per_month = {}
    for days in data:
        month = days[1]
        births = days[4]
        if month in births_per_month:
            births_per_month[month] = births_per_month[month] + births
        else:
            births_per_month[month] = births
    return births_per_month

In [44]:
def dow_births(data):
    births_per_day_of_week = {}
    for days in data:
        day = days[3]
        births = days[4]
        if day in births_per_day_of_week:
            births_per_day_of_week[day] = births_per_day_of_week[day] + births
        else:
            births_per_day_of_week[day] = births
    return births_per_day_of_week

...but this function, `calc_counts`, is better because a single function can total births for any column.

In [80]:
def calc_counts(data,column):
    births_per_variable = {}
    for days in data:
        variable = days[column]
        births = days[4]
        if variable in births_per_variable:
            births_per_variable[variable] = births_per_variable[variable] + births
        else:
            births_per_variable[variable] = births
    return births_per_variable
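
The if/else bookkeeping can also be handed to `collections.defaultdict`. This is a sketch of the same grouping idea under a hypothetical name, `calc_counts_dd`. Like `calc_counts`, it assumes births sit in column 4.

```python
from collections import defaultdict

def calc_counts_dd(data, column):
    """Total births (column 4) grouped by the value found in `column`."""
    totals = defaultdict(int)  # missing keys start at 0
    for row in data:
        totals[row[column]] += row[4]
    return dict(totals)

sample = [[1994, 1, 1, 6, 8096],
          [1994, 1, 2, 7, 7772],
          [1994, 1, 3, 1, 10142]]
by_month = calc_counts_dd(sample, 1)  # {1: 26010}
```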

**Total births per year**

In [81]:
total_year_births = calc_counts(final_data, 0)
total_year_births

{1994: 3952767,
 1995: 3899589,
 1996: 3891494,
 1997: 3880894,
 1998: 3941553,
 1999: 3959417,
 2000: 4104119,
 2001: 4068359,
 2002: 4060428,
 2003: 4126419,
 2004: 4186863,
 2005: 4211941,
 2006: 4335154,
 2007: 4380784,
 2008: 4310737,
 2009: 4190991,
 2010: 4055975,
 2011: 4006908,
 2012: 4000868,
 2013: 3973337,
 2014: 4010532}

**Total births per month**

In [82]:
total_month_births = calc_counts(final_data, 1)
total_month_births

{1: 6965310,
 2: 6499459,
 3: 7134617,
 4: 6838762,
 5: 7162927,
 6: 7110295,
 7: 7514008,
 8: 7610244,
 9: 7425952,
 10: 7278923,
 11: 6869491,
 12: 7139141}

**Total births per dom**

In [83]:
total_dom_births = calc_counts(final_data, 2)
total_dom_births

{1: 2760350,
 2: 2794016,
 3: 2808740,
 4: 2761441,
 5: 2794079,
 6: 2802351,
 7: 2831672,
 8: 2835775,
 9: 2817897,
 10: 2840733,
 11: 2821260,
 12: 2842163,
 13: 2752336,
 14: 2850695,
 15: 2846776,
 16: 2838432,
 17: 2850928,
 18: 2851472,
 19: 2836179,
 20: 2860735,
 21: 2857502,
 22: 2833601,
 23: 2793161,
 24: 2751434,
 25: 2715173,
 26: 2752609,
 27: 2797042,
 28: 2822641,
 29: 2642868,
 30: 2589769,
 31: 1595299}

**Total births per dow**

In [84]:
total_dow_births = calc_counts(final_data, 3)
total_dow_births

{1: 12696731,
 2: 14044930,
 3: 13803846,
 4: 13717707,
 5: 13497703,
 6: 9433618,
 7: 8354594}
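
The keys here are day-of-week numbers. The first row of the combined data lists 1 January 1994, a Saturday, as day 6, which matches the ISO convention of 1 = Monday through 7 = Sunday. Assuming that convention holds throughout both files, the totals can be labelled with a small sketch like this (`DAY_NAMES` and `label_days` are hypothetical names):

```python
import datetime

DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

def label_days(dow_totals):
    """Replace day-of-week numbers (1 = Monday ... 7 = Sunday) with names."""
    return {DAY_NAMES[day - 1]: births for day, births in dow_totals.items()}

# Sanity check of the convention against the first data row: 1994-01-01 is day 6
assert datetime.date(1994, 1, 1).isoweekday() == 6

labeled = label_days({2: 14044930, 7: 8354594})
# {'Tuesday': 14044930, 'Sunday': 8354594}
```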

# What are the min and max births?
per year, month, dom, dow

**This function returns the max and the min for any variable, each with its key, as `(max_key, max_value, min_key, min_value)`.**

In [85]:
def max_min(dic):
    values_list = dic.values()
    max_value = max(values_list)
    min_value = min(values_list)
    max_key = next(key for key, value in dic.items() if value == max_value)
    min_key = next(key for key, value in dic.items() if value == min_value)
    return(max_key, max_value, min_key, min_value)
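
A shorter way to get the same tuple is to let `max` and `min` compare keys by their values. This is only a sketch under a hypothetical name, `max_min_by_key`. On ties it returns the first matching key in dictionary order, as the `next(...)` lookup above does.

```python
def max_min_by_key(dic):
    """Return (max_key, max_value, min_key, min_value) for a dict of counts."""
    max_key = max(dic, key=dic.get)  # key whose value is largest
    min_key = min(dic, key=dic.get)  # key whose value is smallest
    return max_key, dic[max_key], min_key, dic[min_key]

sample = {1997: 3880894, 2000: 4104119, 2007: 4380784}
result = max_min_by_key(sample)  # (2007, 4380784, 1997, 3880894)
```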

**There were about 500k more births in 2007 than in 1997**

In [86]:
max_min_annual_births = max_min(total_year_births)
max_min_annual_births

(2007, 4380784, 1997, 3880894)

**February has the fewest births per month: about 1.1 million fewer than August, which has the most births per month.**

In [87]:
max_min_month_births = max_min(total_month_births)
max_min_month_births

(8, 7610244, 2, 6499459)

In [88]:
max_min_dom_births = max_min(total_dom_births)
max_min_dom_births

**About 14 million births on Tuesdays. About 8.4 million births on Sundays.**

In [89]:
max_min_dow_births = max_min(total_dow_births)
max_min_dow_births

(2, 14044930, 7, 8354594)

# How has the number of births changed over time across these variables?

In [90]:
def calc_counts2(data, column, val):
    # Total births per year, counting only rows where row[column] == val
    births = {}
    for element in data:
        if element[column] == val:
            year = element[0]
            births[year] = births.get(year, 0) + element[4]
    return births

def calc_diff(input_dict):
    # Year-over-year percent change, keyed by the later year.
    # range(1994, 2013) yields results for 1995 through 2013 only.
    differences = {}
    for i in range(1994, 2013):
        differences[i+1] = round(((input_dict[i+1]-input_dict[i])/input_dict[i]*100), 2)
    return differences
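
The percent change for a year is `(this_year - last_year) / last_year * 100`. As a sketch, the year range can be taken from the dictionary itself rather than hard-coded. The name `calc_diff_sorted` is hypothetical, and the sketch assumes no year is missing between the first and last keys. Unlike `calc_diff`, it would also report a value for 2014 when given the full data.

```python
def calc_diff_sorted(input_dict):
    """Percent change between consecutive years, keyed by the later year."""
    years = sorted(input_dict)
    differences = {}
    for prev, curr in zip(years, years[1:]):
        change = (input_dict[curr] - input_dict[prev]) / input_dict[prev] * 100
        differences[curr] = round(change, 2)
    return differences

# First three annual totals from the per-year output above
sample = {1994: 3952767, 1995: 3899589, 1996: 3891494}
sample_change = calc_diff_sorted(sample)  # {1995: -1.35, 1996: -0.21}
```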

Percent change in **total** births per year

In [91]:
change_annual = calc_diff(total_year_births)
change_annual

{1995: -1.35,
 1996: -0.21,
 1997: -0.27,
 1998: 1.56,
 1999: 0.45,
 2000: 3.65,
 2001: -0.87,
 2002: -0.19,
 2003: 1.63,
 2004: 1.46,
 2005: 0.6,
 2006: 2.93,
 2007: 1.05,
 2008: -1.6,
 2009: -2.78,
 2010: -3.22,
 2011: -1.21,
 2012: -0.15,
 2013: -0.69}

Percent change in **February** births per year

In [92]:
change_february_births = calc_diff(calc_counts2(final_data,1,2))
change_february_births

{1995: -2.07,
 1996: 2.26,
 1997: -3.39,
 1998: 2.46,
 1999: -0.38,
 2000: 7.78,
 2001: -4.38,
 2002: 0.09,
 2003: 1.0,
 2004: 3.66,
 2005: -2.01,
 2006: 3.03,
 2007: 2.21,
 2008: 3.54,
 2009: -6.47,
 2010: -4.59,
 2011: -1.46,
 2012: 2.1,
 2013: -4.34}

Percent change in **first of month** births per year

In [93]:
change_first_of_month_births = calc_diff(calc_counts2(final_data,2,1))
change_first_of_month_births

{1995: -3.41,
 1996: -1.87,
 1997: -3.78,
 1998: 2.97,
 1999: 4.86,
 2000: 4.26,
 2001: -5.92,
 2002: 1.44,
 2003: -3.02,
 2004: 6.64,
 2005: 6.28,
 2006: -0.32,
 2007: -3.41,
 2008: -1.53,
 2009: -1.99,
 2010: 2.59,
 2011: 1.75,
 2012: -8.7,
 2013: 1.88}

Percent change in **Sunday** births per year

In [94]:
change_sunday_births = calc_diff(calc_counts2(final_data,3,7))
change_sunday_births

{1995: -0.69,
 1996: -2.92,
 1997: -2.14,
 1998: 0.66,
 1999: -1.26,
 2000: 4.63,
 2001: -4.72,
 2002: -1.56,
 2003: 0.42,
 2004: -0.09,
 2005: -1.77,
 2006: 4.72,
 2007: -0.87,
 2008: -1.85,
 2009: -3.11,
 2010: -2.66,
 2011: -0.97,
 2012: 3.09,
 2013: -1.26}