<div style="font-size:16px; border:1px solid black; padding:10px">
    <center><h1>Goals</h1></center>
<ul>
<li>Demonstrate how to construct a Census API request and load the response into a pandas dataframe<br>          
    </li><br> 
</ul>
</div>

<hr style="border-top: 3px solid Black;">

# Import Dependencies

In [1]:
import requests
import pandas as pd

<div style="font-size:16px; border:1px solid black; padding:10px">
    <h3>Dependencies explained</h3>
<ul>    
    <li><code>requests</code>: a library for sending HTTP/1.1 requests. It removes the need to manually add query strings to your URLs or to form-encode your <code>PUT</code> and <code>POST</code> data.<ul>
            <li>Requests documentation (12/6/20)<br>     
                <a href="https://requests.readthedocs.io/en/master/">python requests documentation</a></li></ul>
    </li>
    <br>
    <li><code>pandas</code>: open source data analysis and manipulation tool, built on top of the Python programming language.
    </li>
    <li>Will be used in this example to load and manipulate census data
    <ul>
        <li>pandas documentation (12/6/20)<br>     
                <a href="https://pandas.pydata.org/">python pandas documentation</a></li>
    </ul>
    </li>
</ul>
</div>

<hr style="border-top: 3px solid Black;">

<div style="font-size:16px; border:1px solid black; padding:10px">

<h2>Elements of a Census API Request</h2>

<h3><em>Base URL</em> is everything before the <code>?</code> mark.</h3>
<ul>
    <li>Host = <code>https://api.census.gov/data</code></li>
     <li>Year = <code>2010</code></li>
     <li>Dataset = <code>dec/sf1</code></li>    
</ul> 

<h3><em>Query String</em> is everything after the <code>?</code> mark.</h3>
<ul>
    <li><code>get</code>: specifies the parameters such as the variables being requested.</li>
     <li><code>for</code>: specifies the geography of interest</li>
</ul> 
    </div>
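
To make the split concrete, the sketch below uses Python's standard `urllib.parse` module to take a full request URL apart at the `?`. The example URL is assembled from the host, year, dataset, and query parameters used in this notebook; the variable names `parts`, `base`, and `query` are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs

# Example URL assembled from the parts described above
url = "https://api.census.gov/data/2010/dec/sf1?get=NAME,P001001&for=state:*"

parts = urlsplit(url)
base = f"{parts.scheme}://{parts.netloc}{parts.path}"  # everything before "?"
query = parse_qs(parts.query)                          # everything after "?"

print(base)
print(query)
```

Note that `parse_qs` returns each value as a list, because a query string may repeat a key.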

<h3>Step 1: Define variables</h3>
<li><code>dataset</code>: <code>dec/sf1</code> stands for Summary File 1 of the decennial census and refers to the full-count data.</li>

In [2]:
HOST = "https://api.census.gov/data"
year = "2010"
dataset = "dec/sf1"

<h3>Step 2: Build base URL by joining variables with slashes</h3>

In [3]:
base_url = "/".join([HOST, year, dataset])

<h3>Step 3: Build and Explain the <strong><code>requests.get()</code></strong> method and the <strong>query string</strong> (everything after question mark)</h3>

<strong><code>requests.get()</code></strong>: the request method accepts query parameters as a dictionary.<br> 
<ol>General Steps:
<li>The Census API documentation refers to these parameters as <em>predicates</em>, so create an empty Python dictionary named <strong><em>predicates = {}</em></strong>.</li>
<li><code>get_vars</code>: the list of variables to request.
    <ul>
<li><code>"NAME"</code>: the name of the geographic unit</li>
<li><code>"P001001"</code>: the total population count</li>
    </ul>
</li>
<li><code>predicates["get"]</code>: the dictionary entry that holds the requested variables. Its value is built with the <code>.join()</code> method, which joins the variable names into a comma-separated string.</li>
<li><code>predicates["for"]</code>: sets the geographic level.
    <ul>
<li><code>"state:*"</code>: the wildcard requests all states</li>
    </ul>
</li>
<li><code>r = requests.get(base_url, params=predicates)</code>: executes the request and stores the return value in a response object, named <strong><code>r</code></strong> (for response) in this example.</li>
</ol>    

In [4]:
predicates = {}
get_vars = ["NAME", "P001001"]
predicates["get"] = ",".join(get_vars)
predicates["for"] = "state:*"
r = requests.get(base_url, params=predicates)
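
As a rough illustration of what the `params` argument does, the sketch below builds an equivalent URL by hand with the standard library's `urlencode`. This approximates the URL that `requests` generates and is not a description of its internals; after the real call in the cell above, `r.url` shows the URL that was actually sent.

```python
from urllib.parse import urlencode

base_url = "https://api.census.gov/data/2010/dec/sf1"
predicates = {"get": "NAME,P001001", "for": "state:*"}

# urlencode percent-encodes characters such as "," ":" and "*"
query_string = urlencode(predicates)
full_url = f"{base_url}?{query_string}"
print(full_url)
```

Passing the dictionary through `params` saves you from doing this encoding yourself.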

<hr style="border-top: 3px solid Black;">

<h3>Step 4: Ways to Access the Response Payload</h3><br>

<div style="font-size:16px; border:1px solid black; padding:10px">
The response of a GET request is known as a payload. Using the attributes and methods of Response, you can view the payload in a variety of different formats.<br>
    <ul>
        <li><code>r.content</code>: gives access to the raw bytes of the payload. Not very readable.
        </li><br>
        <li><code>r.text</code>: returns the payload decoded to a string using a character encoding such as UTF-8. <code>r.encoding</code> is an attribute, not a method; to override the encoding, assign to it before reading <code>r.text</code>, for example <code>r.encoding = 'utf-8'</code>.
        </li><br>    
         <li><code>r.json()</code>: parses the JSON payload into Python objects. The result is a dictionary when the payload is a JSON object; the Census API returns an array of arrays, so here you get a list of lists.
        </li>
         <li><code>r.headers</code>: returns a dictionary-like object, allowing you to access header values by key.
        </li><br>        
    </ul>
</div>
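
The relationship between these views of the payload can be sketched with the standard library alone. The bytes below are a hard-coded sample copied from the first rows of this notebook's output, standing in for `r.content`; the decode and parse steps only approximate what `r.text` and `r.json()` do for you.

```python
import json

# Hard-coded sample standing in for r.content (first rows of the payload)
content = b'[["NAME","P001001","state"],\n["Alabama","4779736","01"]]'

text = content.decode("utf-8")  # roughly what r.text gives you
data = json.loads(text)         # roughly what r.json() gives you

print(type(content).__name__, type(text).__name__, type(data).__name__)
print(data[0])     # header row
print(data[1][1])  # population value, still a string
```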

<h2>Using the <strong><code>.text</code></strong> attribute on response object</h2>

In [5]:
print(r.text)

[["NAME","P001001","state"],
["Alabama","4779736","01"],
["Alaska","710231","02"],
["Arizona","6392017","04"],
["Arkansas","2915918","05"],
["California","37253956","06"],
["Louisiana","4533372","22"],
["Kentucky","4339367","21"],
["Colorado","5029196","08"],
["Connecticut","3574097","09"],
["Delaware","897934","10"],
["District of Columbia","601723","11"],
["Florida","18801310","12"],
["Georgia","9687653","13"],
["Hawaii","1360301","15"],
["Idaho","1567582","16"],
["Illinois","12830632","17"],
["Indiana","6483802","18"],
["Iowa","3046355","19"],
["Kansas","2853118","20"],
["Maine","1328361","23"],
["Maryland","5773552","24"],
["Massachusetts","6547629","25"],
["Michigan","9883640","26"],
["Minnesota","5303925","27"],
["Mississippi","2967297","28"],
["Missouri","5988927","29"],
["Montana","989415","30"],
["Nebraska","1826341","31"],
["Nevada","2700551","32"],
["New Hampshire","1316470","33"],
["New Jersey","8791894","34"],
["New Mexico","2059179","35"],
["New York","19378102","36"],
["

<div style="font-size:16px; border:1px solid black; padding:10px">
    <ul><code>r.text</code> attribute summary
        <li>payload is returned as a string.
        </li><br>
        <li>Each sublist is a "row" of data.
        </li><br>    
         <li>First row is the table header row.
        </li>
         <li>Since all values are quoted strings, we will need to clean them and convert them to the correct data types in pandas.
        </li><br>        
    </ul>
</div>

<hr style="border-top: 3px solid Black;">

<h2>Using the <strong><code>.json()</code></strong> method on response object</h2>

In [6]:
# .json() returns a list of lists. Inspect a few of these lists
for list_item in r.json()[0:10]:
    print(list_item)

['NAME', 'P001001', 'state']
['Alabama', '4779736', '01']
['Alaska', '710231', '02']
['Arizona', '6392017', '04']
['Arkansas', '2915918', '05']
['California', '37253956', '06']
['Louisiana', '4533372', '22']
['Kentucky', '4339367', '21']
['Colorado', '5029196', '08']
['Connecticut', '3574097', '09']


<div style="font-size:16px; border:1px solid black; padding:10px">
    <ul><strong><code>r.json()</code> method summary</strong>
        <li>Returns a list of lists
        </li><br>    
         <li>First sublist is the table header row.
        </li><br>       
    </ul>
</div>

<hr style="border-top: 3px solid Black;">

<h2>Load Data into Pandas Dataframe</h2>

In [7]:
# step 1: create new column header names for ease of reading
col_names = ["state_name", "population_size", "state_id"]

In [8]:
# step 2: construct data frame using pandas
df = pd.DataFrame(columns=col_names, data=r.json()[1:])

<div style="font-size:16px; border:1px solid black; padding:10px">
    <ul><strong>Code Explanation</strong>
        <li><code>pd.DataFrame()</code> instantiates a dataframe object
        </li><br>    
         <li><a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html">Documentation</a>
        </li><br>  
        <li><code>columns</code> parameter is used to specify column names
        </li><br>
        <li><code>data</code> parameter is used to specify the data
        </li><br>   
        <li><code>r.json()[1:]</code> limits the payload response data to everything after the first list. The first list is the old column name list we do not want to import and use, so we skip it here.
        </li><br>         
    </ul>
</div>
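
An alternative sketch, in case you would rather keep the API's own header row and rename the columns afterwards. The `rows` list is a hard-coded sample standing in for `r.json()`, `df_sample` is an illustrative name, and the `rename` mapping is just one possible naming choice.

```python
import pandas as pd

# Hard-coded sample standing in for r.json()
rows = [
    ["NAME", "P001001", "state"],
    ["Alabama", "4779736", "01"],
    ["Alaska", "710231", "02"],
]

# Use the first sublist as column names and the rest as data
df_sample = pd.DataFrame(rows[1:], columns=rows[0])

# Rename to the friendlier names used in this notebook
new_names = {"NAME": "state_name", "P001001": "population_size", "state": "state_id"}
df_sample = df_sample.rename(columns=new_names)
print(df_sample)
```

Renaming by mapping keeps each new name tied to the variable it came from, which can help if the order of requested variables changes.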

<h2>Clean Data</h2>

In [9]:
# inspect datatypes using the .dtypes attribute
df.dtypes

state_name         object
population_size    object
state_id           object
dtype: object

In [10]:
# change population size column from object (string) to integers
df['population_size'] = df['population_size'].astype(int)

In [11]:
# inspect data
df.head()

   state_name  population_size state_id
0     Alabama          4779736       01
1      Alaska           710231       02
2     Arizona          6392017       04
3    Arkansas          2915918       05
4  California         37253956       06


<hr style="border-top: 3px solid Black;">

<h2>Export Data to a CSV file</h2>

<code>df.to_csv('census_data.csv')</code><br>
Documentation (12/6/20): <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html">to_csv</a>
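
Two details are worth watching when exporting. By default `to_csv` also writes the row index, which shows up as an `Unnamed: 0` column when the file is read back, and `read_csv` will infer `state_id` as an integer and drop its leading zeros. The sketch below shows one way to avoid both, using a small hard-coded dataframe and an in-memory buffer in place of a real file; `df_sample`, `buffer`, and `df_back` are illustrative names.

```python
import io
import pandas as pd

df_sample = pd.DataFrame(
    {
        "state_name": ["Alabama", "Alaska"],
        "population_size": [4779736, 710231],
        "state_id": ["01", "02"],
    }
)

# index=False leaves the row index out of the file
buffer = io.StringIO()  # in-memory stand-in for 'census_data.csv'
df_sample.to_csv(buffer, index=False)
buffer.seek(0)

# Reading state_id as str keeps the leading zeros
df_back = pd.read_csv(buffer, dtype={"state_id": str})
print(df_back)
```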

<hr style="border-top: 3px solid Black;">

<div style="font-size:16px; border:1px solid black; padding:10px">
    <ul><strong>Common Error</strong>
        <li><code>error: unknown variable 'nonexistentvariable'</code>: the variable name was specified incorrectly.
        </li><br>      
    </ul>
</div>
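
One way to surface this kind of error early is to check the response before parsing it. The sketch below assumes that a failed request comes back with a non-200 status code and a plain-text message instead of JSON; that behavior, and the status code 400 used in the simulation, are assumptions rather than documented facts. `parse_census_payload` is a hypothetical helper, not part of `requests` or the Census API.

```python
import json


def parse_census_payload(status_code, text):
    """Return the payload as a list of lists, or raise ValueError with the API message."""
    if status_code != 200:
        raise ValueError(f"Census API request failed ({status_code}): {text.strip()}")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response was not valid JSON: {text[:80]!r}") from exc


# Simulated responses; with a real request you would pass r.status_code and r.text
good = parse_census_payload(200, '[["NAME","P001001","state"],["Alabama","4779736","01"]]')
print(good[1])

try:
    parse_census_payload(400, "error: unknown variable 'nonexistentvariable'")
except ValueError as err:
    print(err)
```

With a live response, `r.raise_for_status()` is another option: it raises an exception for 4xx and 5xx status codes.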