# Exploring U.S. Births 
### In this project, I analyze U.S. birth data from the Centers for Disease Control and Prevention's National Center for Health Statistics.

#### The 'births.csv' file contains U.S. birth data from 1994 to 2003. The csv file has the following structure:

| Header        |  Definition            |
|:--------------|:-----------------------|
| year          | Year                   |
| month         | Month                  |
| date_of_month | Day number of the month|
| day_of_week   | Day of week, where 1 is Monday and 7 is Sunday|
| births        | Number of births|

In [5]:
with open('births.csv', 'r') as file:   # Context manager closes the file automatically
    data = file.read()
data = data.split('\n')
data[:10]                       # Preview of the first 10 lines (header plus 9 data rows)

['year,month,date_of_month,day_of_week,births',
 '1994,1,1,6,8096',
 '1994,1,2,7,7772',
 '1994,1,3,1,10142',
 '1994,1,4,2,11248',
 '1994,1,5,3,11053',
 '1994,1,6,4,11406',
 '1994,1,7,5,11251',
 '1994,1,8,6,8653',
 '1994,1,9,7,7910']

## Converting The Data Into A List Of Lists
#### As the output above shows, each line of the file is a single string. To analyze the data, I first need to split each line into its fields and convert those fields to integers (every field is numeric), producing one list per line, that is, one list per day.

In [6]:
def read_csv(file_name):                # Convert the data file to a list of lists of integers
    with open(file_name) as file:       # Context manager closes the file when done
        data = file.read()
    string_list = data.split('\n')
    final_list = []
    
    for line in string_list[1:]:        # Skip the header row
        if not line.strip():            # Skip blank lines, such as one left by a trailing newline
            continue
        int_fields = []
        string_fields = line.split(',')
        for string in string_fields:
            int_fields.append(int(string))
        final_list.append(int_fields)
    
    return final_list
        
cdc_list = read_csv('births.csv')
cdc_list[:10]                           #Preview first 10 rows of list

[[1994, 1, 1, 6, 8096],
 [1994, 1, 2, 7, 7772],
 [1994, 1, 3, 1, 10142],
 [1994, 1, 4, 2, 11248],
 [1994, 1, 5, 3, 11053],
 [1994, 1, 6, 4, 11406],
 [1994, 1, 7, 5, 11251],
 [1994, 1, 8, 6, 8653],
 [1994, 1, 9, 7, 7910],
 [1994, 1, 10, 1, 10498]]
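
The same conversion can also be sketched with Python's standard `csv` module, which handles the line splitting. This is only an alternative illustration, not the approach used in this notebook: the function name `read_csv_rows` is hypothetical, and the sample text reuses the first two data rows shown above instead of reading 'births.csv'.

```python
import csv
import io

def read_csv_rows(text_stream):
    """Parse CSV text into a list of integer lists, skipping the header row."""
    reader = csv.reader(text_stream)
    next(reader, None)  # Discard the header line if there is one
    # csv.reader yields an empty list for blank lines, so filter those out
    return [[int(field) for field in row] for row in reader if row]

# In-memory stand-in for the file, using the first rows previewed above
sample = io.StringIO(
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)
rows = read_csv_rows(sample)
print(rows)
```

With a real file, the same function could be called on an open file object, for example inside a `with open('births.csv', newline='') as f:` block.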

## Calculating Number Of Births Each Month

In [7]:
def month_births(list_name):            #Function to create dictionary of birth counts by month
    births_per_month = {}
  
    for line in list_name:
        month = line[1]
        births = line[4]
        
        if month in births_per_month:
            births_per_month[month] = births_per_month[month] + births
        else:
            births_per_month[month] = births
        
    return births_per_month
            
cdc_month_births = month_births(cdc_list)
cdc_month_births

{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}

## Calculating Number Of Births Per Day-Of-Week

In [8]:
def dow_births(list_name):
    births_per_weekday = {}
    
    for line in list_name:
        week_day = line[3]
        num_births = line[4]
        
        if week_day in births_per_weekday:
            births_per_weekday[week_day] = births_per_weekday[week_day] + num_births
        else:
            births_per_weekday[week_day] = num_births
    
    return births_per_weekday

cdc_day_births = dow_births(cdc_list)
cdc_day_births

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}

## Defining A General Function To Calculate Births By User Input
#### Previously, I defined separate functions to calculate the number of births for a specific column (month and day of week). It would be more efficient to define a single function that can calculate the number of births for any time measure, based on two parameters: the list name and the column index.
#### The function below can return each of the outputs above, depending on the column index it is given.

In [9]:
def calc_counts(list_name, column):       #Defining a general function to calculate births by year, month, day, or week day
    totals_dict = {}
    
    for line in list_name:
        column_value = line[column]
        num_births = line[4]
        
        if column_value in totals_dict:
            totals_dict[column_value] = totals_dict[column_value] + num_births
        else:
            totals_dict[column_value] = num_births
            
    return totals_dict
    
cdc_year_births = calc_counts(cdc_list, 0)
cdc_month_births = calc_counts(cdc_list, 1)
cdc_dom_births = calc_counts(cdc_list, 2)
cdc_dow_births = calc_counts(cdc_list, 3)
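
As a side note, the "check whether the key exists, then add" pattern in `calc_counts` can be shortened with `collections.defaultdict` from the standard library. The sketch below is one possible variant, not a replacement for the function above; the name `calc_counts_dd` and the `births_column` parameter are hypothetical, and the sample rows are taken from the first few rows of the data shown earlier.

```python
from collections import defaultdict

def calc_counts_dd(rows, column, births_column=4):
    """Sum births grouped by the value found in the given column."""
    totals = defaultdict(int)  # Missing keys start at 0
    for row in rows:
        totals[row[column]] += row[births_column]
    return dict(totals)  # Convert back to a plain dict for display

# A few rows from the dataset preview above
sample_rows = [
    [1994, 1, 1, 6, 8096],
    [1994, 1, 2, 7, 7772],
    [1994, 1, 3, 1, 10142],
    [1994, 1, 8, 6, 8653],
]
dow_totals = calc_counts_dd(sample_rows, 3)
print(dow_totals)
```

Here the two Saturday rows (day 6) are summed into one entry, while Sunday and Monday each keep their single value.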

### The cells below show the different dictionaries based on the second parameter:

In [10]:
print("Yearly births:")
cdc_year_births

Yearly births:


{1994: 3952767,
 1995: 3899589,
 1996: 3891494,
 1997: 3880894,
 1998: 3941553,
 1999: 3959417,
 2000: 4058814,
 2001: 4025933,
 2002: 4021726,
 2003: 4089950}
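
To see how the yearly totals move from one year to the next, the differences between consecutive years can be computed directly. This is a small sketch under the assumption that the dictionary holds the yearly totals printed above; the helper name `yearly_changes` is hypothetical.

```python
def yearly_changes(year_totals):
    """Return {year: change in births compared with the previous year}."""
    years = sorted(year_totals)
    return {
        year: year_totals[year] - year_totals[prev]
        for prev, year in zip(years, years[1:])
    }

# Yearly totals copied from the output above
cdc_year_births = {
    1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894,
    1998: 3941553, 1999: 3959417, 2000: 4058814, 2001: 4025933,
    2002: 4021726, 2003: 4089950,
}
changes = yearly_changes(cdc_year_births)
for year, change in changes.items():
    print(year, change)
```

A negative value means fewer births than in the previous year, which makes it easy to check whether the totals rise every year or not.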

In [11]:
print("Monthly births")
cdc_month_births

Monthly births


{1: 3232517,
 2: 3018140,
 3: 3322069,
 4: 3185314,
 5: 3350907,
 6: 3296530,
 7: 3498783,
 8: 3525858,
 9: 3439698,
 10: 3378814,
 11: 3171647,
 12: 3301860}

In [12]:
print("Day-of-Month births:")
cdc_dom_births

Day-of-Month births:


{1: 1276557,
 2: 1288739,
 3: 1304499,
 4: 1288154,
 5: 1299953,
 6: 1304474,
 7: 1310459,
 8: 1312297,
 9: 1303292,
 10: 1320764,
 11: 1314361,
 12: 1318437,
 13: 1277684,
 14: 1320153,
 15: 1319171,
 16: 1315192,
 17: 1324953,
 18: 1326855,
 19: 1318727,
 20: 1324821,
 21: 1322897,
 22: 1317381,
 23: 1293290,
 24: 1288083,
 25: 1272116,
 26: 1284796,
 27: 1294395,
 28: 1307685,
 29: 1223161,
 30: 1202095,
 31: 746696}

In [13]:
print("Day-of-Week births:")
cdc_dow_births

Day-of-Week births:


{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}

## Calculating The Minimum And Maximum Births For Each Time Scale
#### Now that I have a dictionary with the number of births for each time scale, I want to find the minimum and maximum number of births in each dictionary.

In [27]:
print("Maximum and minimum births by year (1994-2003):")
print("Maximum births:", max(cdc_year_births.items(), key=lambda x: x[1]))
print("Minimum births:", min(cdc_year_births.items(), key=lambda x: x[1]))

Maximum and minimum births by year (1994-2003):
Maximum births: (2003, 4089950)
Minimum births: (1997, 3880894)


In [26]:
print("Maximum and minimum births by month (1-12):")
print("Maximum births:", max(cdc_month_births.items(), key=lambda x: x[1]))
print("Minimum births:", min(cdc_month_births.items(), key=lambda x: x[1]))

Maximum and minimum births by month (1-12):
Maximum births: (8, 3525858)
Minimum births: (2, 3018140)


In [29]:
print("Maximum and minimum births by day-of-month (1-31):")
print("Maximum births:", max(cdc_dom_births.items(), key=lambda x: x[1]))
print("Minimum births:", min(cdc_dom_births.items(), key=lambda x: x[1]))

Maximum and minimum births by day-of-month (1-31):
Maximum births: (18, 1326855)
Minimum births: (31, 746696)


In [31]:
print("Maximum and minimum births by day of week (1-7):")
print("Maximum births:", max(cdc_dow_births.items(), key=lambda x: x[1]))
print("Minimum births:", min(cdc_dow_births.items(), key=lambda x: x[1]))

Maximum and minimum births by day of week (1-7):
Maximum births: (2, 6446196)
Minimum births: (7, 4079723)
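
The four cells above repeat the same `max`/`min` calls, so they could be wrapped in one small helper. The sketch below is only an illustration: the function name `min_max` is hypothetical, and the dictionary repeats the yearly totals from the earlier output so that the block runs on its own.

```python
def min_max(totals):
    """Return the (key, births) pairs with the fewest and the most births."""
    lowest = min(totals.items(), key=lambda item: item[1])
    highest = max(totals.items(), key=lambda item: item[1])
    return lowest, highest

# Yearly totals copied from the earlier output
year_totals = {
    1994: 3952767, 1995: 3899589, 1996: 3891494, 1997: 3880894,
    1998: 3941553, 1999: 3959417, 2000: 4058814, 2001: 4025933,
    2002: 4021726, 2003: 4089950,
}
lowest, highest = min_max(year_totals)
print("Minimum births:", lowest)
print("Maximum births:", highest)
```

The same helper would work unchanged on the month, day-of-month, and day-of-week dictionaries.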


## The Table Below Summarizes The Max And Min Of Each Dictionary:

<h3 align="center">Minimum Births</h3> 

| Dictionary         |  Time of Minimum-Births| Number of Births |          
|:-------------------|:-----------------------|:-----------------|     
| cdc_year_births    | 1997                   | 3,880,894        |          
| cdc_month_births   | 2   (February)         | 3,018,140        |
| cdc_dom_births     | 31                     | 746,696          |
| cdc_dow_births     | 7   (Sunday)           | 4,079,723        |

<h3 align="center">Maximum Births</h3> 

| Dictionary         |  Time of Maximum-Births | Number of Births |          
|:-------------------|:------------------------|:-----------------|          
| cdc_year_births    | 2003                    | 4,089,950        |          
| cdc_month_births   | 8  (August)             | 3,525,858        |
| cdc_dom_births     | 18                      | 1,326,855        |
| cdc_dow_births     | 2  (Tuesday)            | 6,446,196        |

## Analysis
- Births did not rise every year. The dataset starts in 1994, and totals fell each year until 1997, which has the fewest recorded births. They then generally rose, with small dips in 2001 and 2002, to 2003, the last year in the dataset, which has the most. The cause of the later increase cannot be determined from this dataset alone; advances in medicine and technology are one possible factor.
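- On a monthly basis, February had the fewest births while August had the most. February is also the shortest month, which accounts for part of its low total. Summing the monthly totals gives 19,405,477 births for January through June and 20,316,660 for July through December, so more births occurred in the second half of the year than in the first.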
- The day of the month with the fewest births is the 31st, but this is largely because only seven months have 31 days. The 18th had the most births, although the totals for days 1 through 28 differ by only a few percent, so this is weak evidence that births concentrate mid-month.
- As for the day of the week, Sunday had the fewest births while Tuesday had the most. The gap between these two, over 2.36 million births, is the largest in any of the dictionaries. Both weekend days have far fewer births than any weekday, so more Americans are born on weekdays than on weekends.
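
Because totals depend on how often each value occurs (the 31st appears in only seven months, and February is short), one way to make the comparison fairer is to average births per recorded day instead of summing them. The sketch below shows the idea; the function name `calc_averages` is hypothetical, and the birth counts in `sample_rows` are made-up illustrative values, not rows from 'births.csv'.

```python
def calc_averages(rows, column, births_column=4):
    """Average births per row, grouped by the value in the given column."""
    totals = {}
    counts = {}
    for row in rows:
        key = row[column]
        totals[key] = totals.get(key, 0) + row[births_column]
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}

# Illustrative rows only: the birth counts here are invented for the example
sample_rows = [
    [1994, 1, 30, 7, 8000],
    [1994, 1, 31, 1, 10000],
    [1994, 3, 30, 3, 10000],
    [1994, 3, 31, 4, 11000],
    [1994, 4, 30, 6, 9000],
]
dom_averages = calc_averages(sample_rows, 2)
print(dom_averages)
```

In this toy sample the 31st has a lower total than the 30th (21,000 versus 27,000) but a higher average (10,500 versus 9,000), which is the kind of effect that totals alone can hide.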

## Future Research
#### To analyze the number of births further, I plan to work with a dataset covering a longer period, and with datasets from other countries as well. This dataset contains only American births, but analyzing the number of births in other countries may reveal some interesting patterns or results. I also think it would be interesting to analyze birthdays (especially in leap years!).