# Data USA API Pulls - Statewide

In order to pull median property value, median income, age breakdown, and population of every small town in Montana, I will need to call each of them by their unique identifier from the DataUSA API. 

The unique identifiers for each "place" in Data USA's database is stored in the table on this page: https://datausa.io/about/classifications/Geography/Place

I did a V-Lookup in Excel to match the IDs from the API to the towns for which I pulled listings (those with populations fewer than 2,500). 

Note for the future: if you do this pull at a later date, remember to provide some time between each pull so the API is not overly taxed. Allowing several hours between each pull resulted in no errors, while trying to run them one after the other results in many errors. 

In [1]:
# imports
import requests
import pandas as pd
from pandas.io.json import json_normalize
import csv
import json    
from collections import defaultdict

In [2]:
# import list of town names and unique ID
towns = pd.read_csv('placeIDs.towns.csv')
towns.drop_duplicates()
towns.head()

Unnamed: 0,Town,ID Look up
0,"Yaak, MT",16000US3082130
1,"Wyola, MT",16000US3082075
2,"Wye, MT",16000US3081980
3,"Worden, MT",16000US3081925
4,"Woods Bay, MT",16000US3081700


In [5]:
# pull out the IDs
IDs = towns['Town'].tolist()
len(IDs)

277

In [6]:
# these test IDs are used to test the code on a sample before running full scripts
test_IDs = IDs[1:3]
test_IDs

['Wyola, MT', 'Wye, MT']

### Median Home Values

In [6]:
home_values = defaultdict(list)

for ID in test_IDs :
    url = (f'https://datausa.io/api/data?measure=Property%20Value&Geography={ID}')

    results = requests.get(url).json()       
        
    for item in results['data'] :
       
        year = item['Year']
        property_value = item['Property Value']
        town = item['Geography']

        home_values[ID].append((year, property_value, town))   
       

In [None]:
with open('median_propvalue.txt','w') as outfile:
    # input the header row
    outfile.write('IDtown\tyear\tproperty_value\ttown\n')
    
    for town, list_of_values in home_values.items() :
        
        for list_of_values in home_values[town] :
            #print(list_of_values)
            
            out_line = [town]
           
            out_line.extend(list_of_values)
            
            outfile.write('\t'.join(str(v) for v in out_line) + '\n')

Note to self: there were no errors in this pull, though that line of code was deleted. 

### Median Income Values

In [22]:
income_values = defaultdict(list)
errors = []

for ID in IDs :
    url = (f'https://datausa.io/api/data?measure=Household%20Income%20by%20Race&Geography={ID}')

    try :
        results = requests.get(url).json()  
        
    except :
            errors.append(ID)
    
    for item in results['data'] :
       
        year = item['Year']
        income_value = item['Household Income by Race']
        town = item['Geography']

        income_values[ID].append((year, income_value, town))   
       

In [23]:
len(errors)

0

In [24]:
with open('median_income.txt','w') as outfile:
    # input the header row
    outfile.write('IDtown\tyear\tincome_value\ttown\n')
    
    for town, list_of_values in income_values.items() :
        
        for list_of_values in income_values[town] :
            #print(list_of_values)
            
            out_line = [town]
           
            out_line.extend(list_of_values)
            
            outfile.write('\t'.join(str(v) for v in out_line) + '\n')

### Age Distribution

In [7]:
ages = defaultdict(list)
errors = []

for ID in test_IDs :
    url_age = (f'https://datausa.io/api/data?Geography={ID}&measures=Birthplace,Birthplace%20Moe&drilldowns=Place of Birth,Age')

    try :
        results = requests.get(url).json()  
        
    except :
        errors.append(ID)
    
 

In [8]:
results

NameError: name 'results' is not defined

In [26]:
errors

[]

In [6]:
with open('age_data.txt','w') as outfile:
    # input the header row
    outfile.write('IDtown\tyear\tage_bands\ttown\n')
    
    for town, list_of_values in ages.items() :
        
        for list_of_values in ages[town] :
            #print(list_of_values)
            
            out_line = [town]
           
            out_line.extend(list_of_values)
            
            outfile.write('\t'.join(str(v) for v in out_line) + '\n')

In [5]:
len(errors)

0

### Populations

In [4]:
pop = defaultdict(list)
errors_pop = []

for ID in IDs :
    url = (f'https://datausa.io/api/data?Geography={ID}&measure=Population')

    try :
        results = requests.get(url).json()  
        
    except :
        errors_pop.append(ID)
    
    for item in results['data'] :
       
        year = item['Year']
        population = item['Population']
        town = item['Geography']

        pop[ID].append((year, population, town))   

In [5]:
with open('population_data.txt','w') as outfile:
    # input the header row
    outfile.write('IDtown\tyear\tpopulation\ttown\n')
    
    for town, list_of_values in pop.items() :
        
        for list_of_values in pop[town] :
            #print(list_of_values)
            
            out_line = [town]
           
            out_line.extend(list_of_values)
            
            outfile.write('\t'.join(str(v) for v in out_line) + '\n')

In [6]:
len(errors_pop)

0