### Requesting Data from the USDA's NASS API

This is part 3 of a series of blog posts on working with data from the USDA NASS database. It follows the second article <a href="" class="inlinelink"> insert info here</a>. In this post, we'll be using <a href="" class="inlinelink">JupyterLab</a> to request data from the NASS database using their API. Since we've already established our <a href="" class="inlinelink">credentials</a> and set up our infrastructure, we're ready to retrieve data and send it to AWS for storage.

Before we can start requesting data, we need to import a number of modules that will help us make the requests and format the retrieved information. Explaining each module is beyond the scope of this article, but I'll point out that _boto3_ and _requests_ are two of the most important. _Requests_ helps us create HTTP requests using Python, and _boto3_ is AWS's Python software development kit. At the bottom of the next cell are variables read from the container's environment. These variables will allow us to upload our data to S3 later.

In [3]:
import ndjson
import requests
import random
import os
import pandas as pd
import time
import json
import copy
import boto3.session
key = os.environ.get("USDAKEY")
bucket = os.environ.get("CROP_BUCKET")
profile = os.environ.get("AWS_LP")
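
Because `os.environ.get` returns `None` for a variable that isn't set, a missing credential only shows up later as a failed request or upload. Here is a minimal sketch of an early check, assuming the three variable names used above. `missing_env_vars` is a hypothetical helper, not part of any library.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Variable names taken from the cell above.
required = ["USDAKEY", "CROP_BUCKET", "AWS_LP"]
missing = missing_env_vars(required)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```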

Now that we have the modules and variables set, we can make a test GET request to the USDA NASS database. I'm retrieving information about North Carolina.

In [4]:
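nc = requests.get(
    'http://quickstats.nass.usda.gov/api/api_GET/',
    # Passing the query as params keeps line breaks and spaces out of the URL.
    params={'key': key, 'group_desc': 'INCOME', 'commodity_desc': 'COMMODITY TOTALS',
            'statisticcat_desc': 'SALES', 'unit_desc': '$', 'state_alpha': 'NC',
            'format': 'json'},
).json()['data']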

This request retrieved 9627 records.

In [5]:
len(nc)

9627
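
To see exactly what is sent to the API, the query string can be built from a dictionary with the standard library, which also handles characters such as `$` and spaces. This is only a sketch: `build_url` is a hypothetical helper and `"YOUR_KEY"` is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "http://quickstats.nass.usda.gov/api/api_GET/"


def build_url(api_key, state):
    """Assemble the request URL for one state's commodity sales records."""
    params = {
        "key": api_key,
        "group_desc": "INCOME",
        "commodity_desc": "COMMODITY TOTALS",
        "statisticcat_desc": "SALES",
        "unit_desc": "$",
        "state_alpha": state,
        "format": "json",
    }
    # urlencode turns spaces into "+" and "$" into "%24".
    return f"{BASE_URL}?{urlencode(params)}"


print(build_url("YOUR_KEY", "NC"))
```

_Requests_ performs the same encoding when the dictionary is passed through the `params` argument of `requests.get`.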

By selecting the first record in this list, we can see an example of the data we received.

In [9]:
nc[0]

{'asd_code': '',
 'source_desc': 'CENSUS',
 'freq_desc': 'ANNUAL',
 'county_name': '',
 'zip_5': '',
 'asd_desc': '',
 'unit_desc': '$',
 'CV (%)': '4.3',
 'region_desc': '',
 'commodity_desc': 'COMMODITY TOTALS',
 'state_alpha': 'NC',
 'begin_code': '00',
 'country_code': '9000',
 'congr_district_code': '',
 'reference_period_desc': 'YEAR',
 'load_time': '2012-12-31 00:00:00',
 'watershed_code': '00000000',
 'country_name': 'UNITED STATES',
 'end_code': '00',
 'domaincat_desc': 'OPERATORS: (1 OPERATORS)',
 'agg_level_desc': 'STATE',
 'county_ansi': '',
 'county_code': '',
 'class_desc': 'ALL CLASSES',
 'year': 2012,
 'watershed_desc': '',
 'domain_desc': 'OPERATORS',
 'util_practice_desc': 'ALL UTILIZATION PRACTICES',
 'week_ending': '',
 'Value': '6,612,983,000',
 'prodn_practice_desc': 'ALL PRODUCTION PRACTICES',
 'location_desc': 'NORTH CAROLINA',
 'state_ansi': '37',
 'short_desc': 'COMMODITY TOTALS - SALES, MEASURED IN $',
 'statisticcat_desc': 'SALES',
 'group_desc': 'INCOME',
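
Note that `Value` arrives as a comma-formatted string rather than a number. Here is a minimal sketch of a conversion, assuming that anything that isn't a plain comma-separated integer (a suppression code, for example) should become `None`. `parse_value` is a hypothetical helper.

```python
def parse_value(raw):
    """Convert a string such as '6,612,983,000' to an int, or None if not numeric."""
    cleaned = raw.replace(",", "").strip()
    return int(cleaned) if cleaned.isdigit() else None


print(parse_value("6,612,983,000"))  # 6612983000
```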
 

In [10]:
with open('test-state.jsonl','w') as f:
    ndjson.dump(nc, f)
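
The `.jsonl` file holds one JSON object per line. As an illustration of that layout, here is a sketch that writes and reads it with only the standard `json` module. The second sample record and the temporary file are placeholders made up for the example.

```python
import json
import os
import tempfile

# Sample records standing in for the API response; the second is a dummy placeholder.
records = [
    {"state_alpha": "NC", "year": 2012, "Value": "6,612,983,000"},
    {"state_alpha": "XX", "year": 0, "Value": "0"},
]

path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")

# Write one JSON object per line.
with open(path, "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Read the file back, one record per line.
with open(path) as f:
    loaded = [json.loads(line) for line in f]

print(loaded == records)  # True
```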

In [77]:
states = ['AL', 'NC']
#     'AL', 'AK', 'AZ', 'AR', 'CA', 'CO',
#     'CT', 'DE', 'FL', 'GA', 'HI', 'ID',
#     'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 
#     'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 
#     'MO', 'MT', 'NV', 'NE', 'NH', 'NJ', 
#     'NM', 'NY', 'NC', 'ND', 'OH', 'OK',
#     'OR', 'PA', 'RI', 'SC', 'SD', 'TN',
#     'TX', 'UT', 'VT', 'VA', 'WA', 'WV',
#     'WI', 'WY'
# ]

In [109]:
def state_info(states):
    # Request the sales records for each state and key them by state abbreviation.
    info = {}

    for st in states:
        response = requests.get(
            'http://quickstats.nass.usda.gov/api/api_GET/',
            params={'key': key, 'group_desc': 'INCOME',
                    'commodity_desc': 'COMMODITY TOTALS',
                    'statisticcat_desc': 'SALES', 'unit_desc': '$',
                    'state_alpha': st, 'format': 'json'},
        )
        info[st] = response.json()['data']
        # Pause for a random 2 to 15 seconds between requests.
        time.sleep(random.randint(2, 15))

    return info

In [110]:
si = state_info(states)
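
A single failed request would stop the loop above and discard the states already collected. One possible refinement is sketched here, assuming that skipping a failed state is acceptable. The hypothetical `collect_states` takes the request function as an argument, so the looping and pausing logic can be tried without touching the API.

```python
import random
import time


def collect_states(states, fetch, min_pause=2, max_pause=15):
    """Call fetch(state) for each state, pausing a random interval between calls."""
    results, failed = {}, []
    for index, st in enumerate(states):
        try:
            results[st] = fetch(st)
        except Exception as error:  # keep going if one state fails
            print(f"{st} failed: {error}")
            failed.append(st)
        # Pause between requests, but not after the last one.
        if index < len(states) - 1:
            time.sleep(random.randint(min_pause, max_pause))
    return results, failed


# Stand-in fetch function so the example runs offline.
def fake_fetch(state):
    if state == "ZZ":
        raise ValueError("unknown state")
    return [{"state_alpha": state}]


results, failed = collect_states(["AL", "ZZ", "NC"], fake_fetch, min_pause=0, max_pause=0)
print(sorted(results), failed)  # ['AL', 'NC'] ['ZZ']
```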

In [111]:
def test(d):
    # Trial run: rename the key on the first two records of each state, on a copy.
    ss = copy.deepcopy(d)

    for x in ss.values():
        for i in x[0:2]:
            i['cv_per'] = i.pop("CV (%)")

    return ss

In [13]:
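def change_column(state_records):
    # Rename "CV (%)" to "cv_per" in every record, working on a copy.
    sd = copy.deepcopy(state_records)

    final_ls = []

    # state_info returns a dict keyed by state, so iterate over its values.
    for records in sd.values():
        for record in records:
            record['cv_per'] = record.pop("CV (%)")
        final_ls.append(records)

    return final_ls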

In [15]:
cc = change_column(si)

In [16]:
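def write_files(ls):
    # Write each state's records to its own JSON Lines file.
    for i, j in enumerate(ls):
        with open(f'../data/state_{i}.jsonl', 'w') as filehandle:
            for item in j:
                # json.dumps writes valid JSON; '%s' would write the Python dict repr.
                filehandle.write(json.dumps(item) + '\n')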

In [17]:
write_files(cc)

In [69]:
s3 = (
    boto3.session.Session(profile_name=profile)
    .resource('s3')
)

In [21]:
def write_to_s3(ls):
    
    for file in ls:
        s3.Object(bucket, f'crop-data/{file}').upload_file(f'../data/{file}')

In [22]:
write_to_s3(gf)
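
The `gf` list passed to `write_to_s3` is not defined in the cells shown here. Assuming it holds the names of the files written to `../data`, it could be built along these lines. `list_state_files` is a hypothetical helper, and the demo uses a temporary directory.

```python
import tempfile
from pathlib import Path


def list_state_files(data_dir):
    """Return the sorted names of the state_*.jsonl files in data_dir."""
    return sorted(p.name for p in Path(data_dir).glob("state_*.jsonl"))


# Demo in a temporary directory; in the notebook this would be '../data'.
demo_dir = Path(tempfile.mkdtemp())
for name in ("state_0.jsonl", "state_1.jsonl", "notes.txt"):
    (demo_dir / name).touch()

print(list_state_files(demo_dir))  # ['state_0.jsonl', 'state_1.jsonl']
```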