# <font color="crimson">Dataquest Guided Project 1:</font>
## Explore US Births

In this project, I analyze CDC and SSA data to determine the frequency of births by

- year, 
- month, 
- date of month, 
- and day of the week.

First, I need to

- **download the raw CSV files from the web**
- **and convert the data into lists**.

The code to do this follows:

In [1]:
import csv
import requests

CSV_URL_1 = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv'
CSV_URL_2 = 'https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_2000-2014_SSA.csv'

def read_csv_from_url(url):
    """Download a CSV file and return its rows as a list of lists of strings."""
    download = requests.get(url)
    download.raise_for_status()  # stop early if the download failed
    decoded_content = download.content.decode('utf-8')
    return list(csv.reader(decoded_content.splitlines(), delimiter=','))

my_list_1 = read_csv_from_url(CSV_URL_1)  # CDC data
my_list_2 = read_csv_from_url(CSV_URL_2)  # SSA data
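
To see what `csv.reader` produces without touching the network, the parsing step can be tried on a small in-memory string. This is only a sketch: `sample_text` is a stand-in I made up for the decoded download, holding the header and the first two CDC rows, and the integer conversion at the end is an optional extra step, not part of the cell above.

```python
import csv

# Stand-in for the decoded download: the header plus the first two CDC rows.
sample_text = (
    "year,month,date_of_month,day_of_week,births\n"
    "1994,1,1,6,8096\n"
    "1994,1,2,7,7772\n"
)

# Same parsing call as in the download cell, applied to the sample text.
rows = list(csv.reader(sample_text.splitlines(), delimiter=','))

print(rows[0])   # header row
print(rows[1:])  # data rows; every value is still a string

# Optional: convert the data rows (everything after the header) to integers.
int_rows = [[int(value) for value in row] for row in rows[1:]]
print(int_rows)
```

Each row comes back as a list of strings, so any arithmetic on the birth counts needs a conversion like the one in the last step.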


Now there are two lists containing the birth data:

- *my_list_1* has the CDC data,
- and *my_list_2* has the SSA data.

Both lists begin with a header row that names the variable in each column. The following code shows that the two headers are identical:


In [2]:
print(my_list_1[0])
print(my_list_2[0])

['year', 'month', 'date_of_month', 'day_of_week', 'births']
['year', 'month', 'date_of_month', 'day_of_week', 'births']
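
The same comparison can be made programmatically instead of by eye. A minimal sketch, assuming two small stand-in lists (`cdc_sample` and `ssa_sample`, each holding the header and the first data row printed in this notebook) and a hypothetical helper named `same_header`:

```python
def same_header(first, second):
    """Return True when both lists start with the same header row."""
    return first[0] == second[0]

header = ['year', 'month', 'date_of_month', 'day_of_week', 'births']

# Stand-ins for my_list_1 and my_list_2: header plus one data row each.
cdc_sample = [header, ['1994', '1', '1', '6', '8096']]
ssa_sample = [header, ['2000', '1', '1', '6', '9083']]

print(same_header(cdc_sample, ssa_sample))

# A list whose columns are in a different order fails the check.
reordered = [['year', 'day_of_week', 'date_of_month', 'month', 'births']]
print(same_header(cdc_sample, reordered))
```

List equality in Python compares element by element, so the check fails if a column name differs or the columns appear in a different order.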


The next step is to **append one list to the other**. 

Since the columns in the lists are the same, I don't need to rearrange the columns before appending one list to the other. In other words, both lists record data on the same variables in the same order. 

NB: If this were not the case, Python would make it easy to swap columns. For example, to swap the variables *month* and *day_of_week* in the list *my_list_1*, a simple loop is enough:


In [3]:
for row in my_list_1:
    # swap column 1 (month) with column 3 (day_of_week), header included
    row[1], row[3] = row[3], row[1]

print(my_list_1[0:10])

[['year', 'day_of_week', 'date_of_month', 'month', 'births'], ['1994', '6', '1', '1', '8096'], ['1994', '7', '2', '1', '7772'], ['1994', '1', '3', '1', '10142'], ['1994', '2', '4', '1', '11248'], ['1994', '3', '5', '1', '11053'], ['1994', '4', '6', '1', '11406'], ['1994', '5', '7', '1', '11251'], ['1994', '6', '8', '1', '8653'], ['1994', '7', '9', '1', '7910']]


But the columns already match, so we don't want that swap here. Swapping the same two columns a second time restores the original order:

In [4]:
for row in my_list_1:
    # swapping the same two columns again undoes the previous cell
    row[3], row[1] = row[1], row[3]

print(my_list_1[0:10])

[['year', 'month', 'date_of_month', 'day_of_week', 'births'], ['1994', '1', '1', '6', '8096'], ['1994', '1', '2', '7', '7772'], ['1994', '1', '3', '1', '10142'], ['1994', '1', '4', '2', '11248'], ['1994', '1', '5', '3', '11053'], ['1994', '1', '6', '4', '11406'], ['1994', '1', '7', '5', '11251'], ['1994', '1', '8', '6', '8653'], ['1994', '1', '9', '7', '7910']]


Good as new!
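
The in-place swap above changes *my_list_1* itself, which is why it had to be undone. One alternative is to build a reordered copy and leave the original alone. A sketch of that idea, assuming a hypothetical helper `reorder_columns` and a two-row stand-in for the real list:

```python
def reorder_columns(rows, order):
    """Return a new list of rows with columns arranged according to `order`."""
    return [[row[index] for index in order] for row in rows]

# Stand-in for my_list_1: the header and the first data row.
sample = [
    ['year', 'month', 'date_of_month', 'day_of_week', 'births'],
    ['1994', '1', '1', '6', '8096'],
]

# Positions 1 (month) and 3 (day_of_week) trade places; the rest stay put.
swapped = reorder_columns(sample, [0, 3, 2, 1, 4])

print(swapped)
print(sample)  # the original rows are unchanged
```

Because `swapped` is a separate list, there is nothing to undo afterwards.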

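Unfortunately, while the column orders of the two lists match, the time periods they cover overlap. Printing the first and last data rows of each list shows that the CDC data runs from 1994 through 2003 and the SSA data from 2000 through 2014, so both lists cover 2000 through 2003: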


In [5]:
print(my_list_1[1], my_list_1[-1])
print(my_list_2[1], my_list_2[-1])

['1994', '1', '1', '6', '8096'] ['2003', '12', '31', '3', '12374']
['2000', '1', '1', '6', '9083'] ['2014', '12', '31', '3', '11990']
