# USDA FoodData Central Database API
Created April 28, 2021<br>
Alex Hegeman | ahegem1@gmail.com

## Resources
***

Overview:<br>https://fdc.nal.usda.gov/about-us.html<br>API documentation:<br>https://fdc.nal.usda.gov/api-guide.html<br>Data documentation:<br>https://fdc.nal.usda.gov/data-documentation.html<br>https://fdc.nal.usda.gov/help.html<br>Historical data downloads:<br>https://fdc.nal.usda.gov/download-datasets.html

## Data Sources / Types
***

There are 5 data types (sources) available in the FoodData Central (FDC) database. Some of these data types were previously housed in separate locations.

- **Foundation Foods**<span style="color:red">*</span>
    - expanded data values and extensive underlying metadata of analysis data including number of samples, sampling location, analytical approaches used, and if appropriate, agricultural information such as genotype and production practices
- **Experimental Foods**<span style="color:red">*</span>
    - Foods produced, acquired, or studied under unique conditions, such as alternative management systems, experimental genotypes, or research/analytical protocols
- **SR Legacy**<span style="color:green">*</span>
    - Primary food composition data type in the United States for decades
    - comprehensive list of values for food components including nutrients, imputations, and published literature
    - April 2018 was the latest and final release of this data
- **Food and Nutrient Database for Dietary Studies (FNDDS)**<span style="color:green">*</span>
    - nutrient and food component values for foods reported in *What We Eat in America*, by the National Health and Nutrition Examination Survey (NHANES)
    - released in two-year data cycles
- **USDA Global Branded Food Products Database**<span style="color:green">*</span>
    - formerly hosted on USDA Food Composition Database website
    - public-private partnership
    - goal is to enhance the open sharing of nutrient data that appear on branded and private label foods and are provided by the food industry

<span style="color:red">* According to the documentation, the Foundation and Experimental Foods data will <i>'be the primary focus of efforts in coming years'</i></span>

<span style="color:green">* These data are noted as <i>'well-established and familiar food composition data types'</i> </span>


### Data Type Conclusion

So after reviewing some of the documentation, my initial thoughts are to keep things simple and just consider the `SR Legacy` data for now.

- This data is noted as the primary data source for a long period, and is updated through 2018
- It seems like the FNDDS data (at least the nutrition data I'm concerned with right now) is a subset of the SR Legacy data
- The Foundation and Experimental Foods data are relatively new and probably don't offer much beyond SR Legacy for my basic needs

## Exploration
***

<h3 style="color:Green">Questions / Thoughts</h3>

1. What is the difference between `calculated` and `analytical` derivation types?
    1. <span style="color:purple">I think analytical means they did chemistry analysis on food samples and calculated means the values are derived from another measurement (see below question)</span>
1. What are the conversion factors and how are they used?
    1. Fat, Carb, and Protein factors for kCal energy values (calculated)
        1. <span style="color:purple">I think that the kCal value is a multiplication of the values for these macronutrients or something similar.</span>
    1. Protein from nitrogen for protein values (analytical)
        1. <span style="color:purple">This is labeled as analytical in the web search for avocado. Kinda blows up my theory for the first question</span>
1. Can use the `brandOwner` parameter in the /foods/search endpoint to search for specific brands
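
To make questions 1 and 2 a bit more concrete, here is a minimal sketch. It assumes the calculated energy value is a weighted sum of the macronutrient amounts using per-gram conversion factors, and that protein is the measured nitrogen multiplied by a nitrogen-to-protein factor. The default factors (4, 9, 4 kcal/g and 6.25) are the general textbook values, not values read from the API, and FDC may report food-specific factors instead. The function names are hypothetical.

```python
def energy_kcal(protein_g, fat_g, carb_g,
                protein_factor=4.0, fat_factor=9.0, carb_factor=4.0):
    """Estimate energy (kcal) as a weighted sum of macronutrient grams.

    The default factors are the general 4-9-4 values (an assumption here);
    a food-specific set of factors can be passed in instead.
    """
    return (protein_g * protein_factor
            + fat_g * fat_factor
            + carb_g * carb_factor)


def protein_from_nitrogen(nitrogen_g, nitrogen_factor=6.25):
    """Convert measured nitrogen (g) to protein (g) with a conversion factor.

    6.25 is the commonly used general factor (an assumption here).
    """
    return nitrogen_g * nitrogen_factor


# Made-up amounts per 100 g, purely for illustration
print(energy_kcal(protein_g=2.0, fat_g=15.0, carb_g=8.0))  # 175.0
print(protein_from_nitrogen(0.32))
```

If protein really is computed this way, that would be consistent with it being labeled analytical: the nitrogen is measured in the lab and the factor only rescales that measurement.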

In [2]:
from urllib.request import urlopen, Request
from urllib.parse import quote, urlencode
from explore_funcs import showkeys, showitems
import json
import pandas as pd
# insert valid API key below
api_key = ''
base_url = 'https://api.nal.usda.gov/fdc/v1'

### /foods/search

In [70]:
# define API endpoint parameters
params = dict()
params['query'] = "cabbage"
params['dataType'] = "SR Legacy"
params['pageSize'] = 20
params['api_key'] = api_key
# create endpoint url and get search results for the query above
url = base_url + "/foods/search?" + urlencode(params)

with urlopen(url) as httpcon:
    payload = httpcon.read().decode()

av = json.loads(payload)
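
Since the same URL-building steps come up again later, here is a small sketch of how they could be wrapped in a helper. The function name `build_search_url` and the `DEMO_KEY` placeholder are assumptions for illustration; the parameter names (`query`, `dataType`, `pageSize`, `brandOwner`, `api_key`) are the ones used elsewhere in these notes.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.nal.usda.gov/fdc/v1"


def build_search_url(query, api_key, data_type="SR Legacy",
                     page_size=20, brand_owner=None):
    """Build a /foods/search URL (hypothetical helper, no request is made)."""
    params = {
        "query": query,
        "dataType": data_type,
        "pageSize": page_size,
        "api_key": api_key,
    }
    # Only send brandOwner when a brand filter is actually wanted
    if brand_owner:
        params["brandOwner"] = brand_owner
    return BASE_URL + "/foods/search?" + urlencode(params)


# "DEMO_KEY" is a placeholder; substitute a valid API key
print(build_search_url("cabbage", api_key="DEMO_KEY"))
```

Keeping the URL construction separate from the network call makes it easy to print and inspect the request before sending it.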

In [3]:
# show structure of json results
showkeys(av)

 totalHits <class 'int'> 
 currentPage <class 'int'> 
 totalPages <class 'int'> 
 pageList <class 'list'> 1
     <class 'int'> 1
 foodSearchCriteria <class 'dict'> 8
     dataType <class 'list'> 1
         <class 'str'> 1
     query <class 'str'> 7
     generalSearchInput <class 'str'> 7
     pageNumber <class 'int'> 
     numberOfResultsPerPage <class 'int'> 
     pageSize <class 'int'> 
     requireAllWords <class 'bool'> 
     foodTypes <class 'list'> 1
         <class 'str'> 1
 foods <class 'list'> 4
     <foods[0] dict keys>
         fdcId <class 'int'> 
         description <class 'str'> 12
         lowercaseDescription <class 'str'> 12
         commonNames <class 'str'> 0
         additionalDescriptions <class 'str'> 0
         dataType <class 'str'> 9
         ndbNumber <class 'int'> 
         publishedDate <class 'str'> 10
         foodCategory <class 'str'> 13
         allHighlightFields <class 'str'> 0
         score <class 'float'> 
         foodNutrients <class 'list'> 41


***
OK, so it looks like we have a handful of metadata fields and one field with the actual results.<br><br>
Metadata includes how many results are returned for the specified data type(s), the number of pages, page information, search criteria, and the number of matching results from other data types (not necessarily included in the request).<br><br>
The food search results are in the `foods` field, and it looks like we got 4 results. Let's take a look to see which result we may be most interested in.
***
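
Because the results are paged, collecting every match means walking through the pages. Below is a rough sketch of that loop. It assumes each response carries `totalPages` and `foods` as in the structure above; `iter_foods` and `fetch_page` are hypothetical names, and the fake pages stand in for real HTTP calls (a real `fetch_page` would presumably set the `pageNumber` parameter on the request).

```python
def iter_foods(fetch_page):
    """Yield every food across all result pages.

    fetch_page(page_number) must return one parsed search response,
    i.e. a dict with 'foods' and 'totalPages' keys.
    """
    page = 1
    while True:
        result = fetch_page(page)
        yield from result.get("foods", [])
        # Stop once the last page reported by the API has been read
        if page >= result.get("totalPages", 1):
            break
        page += 1


# Stand-in for real requests: two fake pages of results
fake_pages = {
    1: {"totalPages": 2, "foods": [{"description": "Cabbage, raw"}]},
    2: {"totalPages": 2, "foods": [{"description": "Cabbage, red, raw"}]},
}
print([f["description"] for f in iter_foods(fake_pages.__getitem__)])
# ['Cabbage, raw', 'Cabbage, red, raw']
```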

In [71]:
for i in range(len(av['foods'])):
    print(i, av['foods'][i]['description'])

0 Cabbage, kimchi
1 Cabbage, raw
2 Cabbage, mustard, salted
3 Cabbage, napa, cooked
4 Cabbage, red, raw
5 Cabbage, savoy, raw
6 Cabbage, chinese (pak-choi), raw
7 Cabbage, chinese (pe-tsai), raw
8 Cabbage, japanese style, fresh, pickled
9 Cabbage, common, cooked, boiled, drained, with salt
10 Cabbage, cooked, boiled, drained, without salt
11 Cabbage, red, cooked, boiled, drained, with salt
12 Cabbage, savoy, cooked, boiled, drained, with salt
13 Cabbage, red, cooked, boiled, drained, without salt
14 Cabbage, savoy, cooked, boiled, drained, without salt
15 Cabbage, chinese (pak-choi), cooked, boiled, drained, with salt
16 Cabbage, chinese (pe-tsai), cooked, boiled, drained, with salt
17 Cabbage, common (danish, domestic, and pointed types), stored, raw
18 Cabbage, chinese (pak-choi), cooked, boiled, drained, without salt
19 Cabbage, chinese (pe-tsai), cooked, boiled, drained, without salt


In [67]:
# I am interested in the whole raw avocado fruit.
# We'll pick the California variety for further exploration
av_ca = av['foods'][2]
# get id numbers
av_ca_desc = av_ca['description']
av_ca_id_fdc = av_ca['fdcId']
av_ca_id_ndb = av_ca['ndbNumber']
av_ca_category = av_ca['foodCategory']
# get non-zero nutrient values (entries with no 'value' are skipped)
nutrient_list = [n for n in av_ca['foodNutrients'] if (n.get('value') or 0) > 0]
# display information
print("Food:", av_ca_desc)
print("Category:", av_ca_category)
print("FDC Id:", str(av_ca_id_fdc))
print("NDB Id:", str(av_ca_id_ndb))
print("\n")

nutrient_list.sort(key=lambda x: x['nutrientName'])
for n in nutrient_list:
    print("{} - {}  {}:{}{} {}".format(n['nutrientId'], n['nutrientNumber'] , n['nutrientName'], " "*10, n['value'], n['unitName']))

Food: Fish, salmon, sockeye, cooked, dry heat
Category: Finfish and Shellfish Products
FDC Id: 173692
NDB Id: 15086


1222 - 513  Alanine:          1.65 G
1220 - 511  Arginine:          1.72 G
1007 - 207  Ash:          1.5 G
1223 - 514  Aspartic acid:          2.71 G
1087 - 301  Calcium, Ca:          11.0 MG
1253 - 601  Cholesterol:          61.0 MG
1180 - 421  Choline, total:          113 MG
1098 - 312  Copper, Cu:          0.076 MG
1216 - 507  Cystine:          0.295 G
1008 - 208  Energy:          156 KCAL
1062 - 268  Energy:          653 kJ
1292 - 645  Fatty acids, total monounsaturated:          1.86 G
1293 - 646  Fatty acids, total polyunsaturated:          1.33 G
1258 - 606  Fatty acids, total saturated:          0.969 G
1257 - 605  Fatty acids, total trans:          0.023 G
1190 - 435  Folate, DFE:          7.0 UG
1187 - 432  Folate, food:          7.0 UG
1177 - 417  Folate, total:          7.0 UG
1224 - 515  Glutamic acid:          3.9 G
1225 - 516  Glycine:          1.27 G
122

***
Cool. Looks like a lot of good nutrient data. Something I don't see here is the portion sizes. Now that we have the FDC Id, we can make a request to the 'details' endpoint and see what additional information we can get there.

### /food/{fdcId}

The documentation says this endpoint returns details for a given food, as identified by that food's FDC Id number. We retrieved this id from the /foods/search endpoint above. The documentation also states that if we want nutrient data for the given food, we must specify the nutrient number(s) in the request. However, only up to 25 nutrient numbers may be provided. So if we want more than that, it seems like maybe the search endpoint is going to be a better bet. But let's see what we've got here.
<br>
<div style="color:blue">EDIT -- The details endpoint actually returns all nutrients by default. The nutrient parameter is used to <i>reduce</i> the number of nutrients returned to only those specified</div>
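
As a sketch of how the `nutrients` filter and its 25-number limit could be handled in one place, the helper below builds the details URL and refuses longer lists up front. The name `build_details_url`, the `DEMO_KEY` placeholder, and the example FDC Id `123456` are assumptions for illustration; the limit itself is the one quoted from the documentation above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.nal.usda.gov/fdc/v1"
MAX_NUTRIENTS = 25  # limit quoted from the API documentation


def build_details_url(fdc_id, api_key, nutrient_numbers=None):
    """Build a /food/{fdcId} URL (hypothetical helper, no request is made).

    With no nutrient_numbers the endpoint returns all nutrients;
    passing some narrows the response to just those.
    """
    params = {"api_key": api_key}
    if nutrient_numbers:
        if len(nutrient_numbers) > MAX_NUTRIENTS:
            raise ValueError(
                f"at most {MAX_NUTRIENTS} nutrient numbers are allowed"
            )
        params["nutrients"] = ",".join(str(n) for n in nutrient_numbers)
    return f"{BASE_URL}/food/{fdc_id}?{urlencode(params)}"


# 123456 and "DEMO_KEY" are placeholders
print(build_details_url(123456, "DEMO_KEY", nutrient_numbers=[203, 301]))
```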

In [12]:
# create endpoint url and retrieve detail data
# will limit nutrients returned to protein and calcium for this example
params_details = dict()
params_details['nutrients'] = '203,301,313' # 313 is not returned, presumably b/c avocados don't contain Fluoride
params_details['api_key'] = api_key

url_details = base_url + "/food/" + str(av_ca_id_fdc) + "?" + urlencode(params_details)

with urlopen(url_details) as httpcon:
    payload = httpcon.read().decode()

av_details = json.loads(payload)

In [13]:
# investigate json results
showkeys(av_details)

 fdcId <class 'int'> 
 description <class 'str'> 25
 publicationDate <class 'str'> 8
 foodNutrients <class 'list'> 2
     <foodNutrients[0] dict keys>
         type <class 'str'> 12
         nutrient <class 'dict'> 5
             id <class 'int'> 
             number <class 'str'> 3
             name <class 'str'> 7
             rank <class 'int'> 
             unitName <class 'str'> 1
         foodNutrientDerivation <class 'dict'> 4
             id <class 'int'> 
             code <class 'str'> 1
             description <class 'str'> 10
             foodNutrientSource <class 'dict'> 3
                 id <class 'int'> 
                 code <class 'str'> 1
                 description <class 'str'> 37
         id <class 'int'> 
         amount <class 'float'> 
         dataPoints <class 'int'> 
         max <class 'float'> 
         min <class 'float'> 
 foodPortions <class 'list'> 3
     <foodPortions[0] dict keys>
         id <class 'int'> 
         dataPoints <class 'int'> 
      

In [14]:
print(json.dumps(av_details['foodNutrients'], indent=4))

[
    {
        "type": "FoodNutrient",
        "nutrient": {
            "id": 1003,
            "number": "203",
            "name": "Protein",
            "rank": 600,
            "unitName": "g"
        },
        "foodNutrientDerivation": {
            "id": 1,
            "code": "A",
            "description": "Analytical",
            "foodNutrientSource": {
                "id": 1,
                "code": "1",
                "description": "Analytical or derived from analytical"
            }
        },
        "id": 1632893,
        "amount": 1.96,
        "dataPoints": 30,
        "max": 3.0,
        "min": 1.53
    },
    {
        "type": "FoodNutrient",
        "nutrient": {
            "id": 1087,
            "number": "301",
            "name": "Calcium, Ca",
            "rank": 5300,
            "unitName": "mg"
        },
        "foodNutrientDerivation": {
            "id": 1,
            "code": "A",
            "description": "Analytical",
            "foodNutrien

***
So it looks like the nutrient data is pretty similar to that available in the search endpoint. However, one notable difference is that the details data includes the `min`, `max`, and `dataPoints` values for a nutrient when available.
<br><br>Let's look at some of the other data available.
***
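
The details records are nested, so here is a small sketch of flattening one `foodNutrients` entry into a single row, which is easier to compare against the flatter search results. The function name `flatten_nutrient` and the chosen output keys are assumptions; the sample entry is copied from the protein record printed above.

```python
def flatten_nutrient(entry):
    """Flatten one details-endpoint foodNutrients entry into a flat dict.

    Missing fields (e.g. min/max when not reported) come back as None.
    """
    nutrient = entry.get("nutrient", {})
    derivation = entry.get("foodNutrientDerivation", {})
    return {
        "number": nutrient.get("number"),
        "name": nutrient.get("name"),
        "unit": nutrient.get("unitName"),
        "amount": entry.get("amount"),
        "min": entry.get("min"),
        "max": entry.get("max"),
        "dataPoints": entry.get("dataPoints"),
        "derivation": derivation.get("description"),
    }


# Protein entry as printed in the output above
sample = {
    "type": "FoodNutrient",
    "nutrient": {"id": 1003, "number": "203", "name": "Protein",
                 "rank": 600, "unitName": "g"},
    "foodNutrientDerivation": {"id": 1, "code": "A",
                               "description": "Analytical"},
    "id": 1632893,
    "amount": 1.96,
    "dataPoints": 30,
    "max": 3.0,
    "min": 1.53,
}
row = flatten_nutrient(sample)
print(row)
```

A list of such rows could be handed straight to `pd.DataFrame` if a table view is wanted.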

In [45]:
print(av_details['foodClass'])
print(av_details['scientificName'])
print(av_details['foodCategory'])

FinalFood
Persea americana
{'id': 9, 'code': '0900', 'description': 'Fruits and Fruit Juices'}


In [46]:
print(json.dumps(av_details['foodPortions'], indent=4))

[
    {
        "id": 89229,
        "dataPoints": 22,
        "gramWeight": 136.0,
        "sequenceNumber": 2,
        "amount": 1.0,
        "modifier": "fruit, without skin and seed",
        "measureUnit": {
            "id": 9999,
            "name": "undetermined",
            "abbreviation": "undetermined"
        }
    },
    {
        "id": 89228,
        "gramWeight": 230.0,
        "sequenceNumber": 1,
        "amount": 1.0,
        "modifier": "cup, pureed",
        "measureUnit": {
            "id": 9999,
            "name": "undetermined",
            "abbreviation": "undetermined"
        }
    },
    {
        "id": 89230,
        "gramWeight": 50.0,
        "sequenceNumber": 3,
        "amount": 1.0,
        "modifier": "NLEA serving",
        "measureUnit": {
            "id": 9999,
            "name": "undetermined",
            "abbreviation": "undetermined"
        }
    }
]


***
So it seems like the `foodPortions` field may be the biggest value add from the details endpoint. This contains conversions from grams to other commonly-used measurement units, which will allow a calculation of nutrients based off the 100-gram baselines given in the nutrient information.
***
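
The portion conversion described above can be sketched as a one-line scaling. This assumes nutrient amounts are reported per 100 g and uses the `gramWeight` values from the portions output together with the protein amount (1.96 g) from the details output; the function name `per_portion` is hypothetical.

```python
def per_portion(amount_per_100g, gram_weight):
    """Scale a per-100-g nutrient amount to a portion of gram_weight grams."""
    return amount_per_100g * gram_weight / 100.0


# gramWeight values taken from the foodPortions output above
portions = [
    {"modifier": "fruit, without skin and seed", "gramWeight": 136.0},
    {"modifier": "cup, pureed", "gramWeight": 230.0},
    {"modifier": "NLEA serving", "gramWeight": 50.0},
]
protein_per_100g = 1.96  # protein amount from the details output above

for p in portions:
    grams = per_portion(protein_per_100g, p["gramWeight"])
    print(f"{p['modifier']}: {grams:.2f} g protein")
```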

### /foods/list

In [51]:
params_list = dict()
# params_list['nutrients'] = '203,301'
params_list['dataType'] = 'SR Legacy'
params_list['pageSize'] = 3
params_list['api_key'] = api_key

url_list = base_url + "/foods/list?" + urlencode(params_list)
print(url_list)

with urlopen(url_list) as httpcon:
    payload = httpcon.read().decode()

food_list = json.loads(payload)

https://api.nal.usda.gov/fdc/v1/foods/list?dataType=SR+Legacy&pageSize=3&api_key=<API_KEY>


***
So this could be helpful if you wanted to return a top 10 list of something ranked by a certain nutrient.
It will return a list of foods and you can specify how many to return and how the query should be sorted.
<br>
<div style="color:red">EDIT -- mmm nevermind. After reviewing documentation it looks like we can only sort on name/description, id, or published date. Not very useful for me</div>

***
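
Since the endpoint can't sort by nutrient, one workaround would be to rank results on the client side. The sketch below assumes each food has a `foodNutrients` list with `nutrientNumber` and `value` keys, as in the search results earlier; the function names and the sample values are made up for illustration.

```python
def nutrient_value(food, nutrient_number):
    """Return one nutrient's value for a search-result food, 0.0 if absent."""
    for n in food.get("foodNutrients", []):
        if str(n.get("nutrientNumber")) == str(nutrient_number):
            return n.get("value") or 0.0
    return 0.0


def top_foods(foods, nutrient_number, limit=10):
    """Return up to `limit` foods, highest value of the nutrient first."""
    ranked = sorted(foods,
                    key=lambda f: nutrient_value(f, nutrient_number),
                    reverse=True)
    return ranked[:limit]


# Made-up foods and values, purely for illustration (301 = Calcium, Ca)
foods = [
    {"description": "Food A", "foodNutrients": [{"nutrientNumber": "301", "value": 11.0}]},
    {"description": "Food B", "foodNutrients": [{"nutrientNumber": "301", "value": 40.0}]},
    {"description": "Food C", "foodNutrients": []},
]
print([f["description"] for f in top_foods(foods, "301", limit=2)])
# ['Food B', 'Food A']
```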

## Play with Search Operators

In [2]:
# original query from above
params = dict()
params['query'] = "Avocado"
params['dataType'] = "SR Legacy"
params['pageSize'] = 4
params['api_key'] = api_key
# create endpoint url and get search results for 'Avocado'
url = base_url + "/foods/search?" + urlencode(params)

with urlopen(url) as httpcon:
    payload = httpcon.read().decode()

av = json.loads(payload)

for food in av['foods']:
    print(food['description'])

Oil, avocado
Avocados, raw, California
Avocados, raw, Florida
Avocados, raw, all commercial varieties


In [3]:
# updated query to exclude results containing 'oil' in description
params['query'] = "Avocado -oil"
url = base_url + "/foods/search?" + urlencode(params)

with urlopen(url) as httpcon:
    payload = httpcon.read().decode()

av = json.loads(payload)

for food in av['foods']:
    print(food['description'])

Avocados, raw, California
Avocados, raw, Florida
Avocados, raw, all commercial varieties


In [24]:
# find all results with descriptions containing 'berries' and 'raw'
params['query'] = '+*berries* +raw'
params['pageSize'] = ""
url = base_url + "/foods/search?" + urlencode(params)
print(url)
with urlopen(url) as httpcon:
    payload = httpcon.read().decode()

av = json.loads(payload)

for food in av['foods']:
    print(food['description'])

https://api.nal.usda.gov/fdc/v1/foods/search?query=%2B%2Aberries%2A+%2Braw&dataType=SR+Legacy&pageSize=&api_key=<API_KEY>
Blackberries, raw
Blueberries, raw
Cranberries, raw
Elderberries, raw
Gooseberries, raw
Mulberries, raw
Oheloberries, raw
Raspberries, raw
Strawberries, raw
Cloudberries, raw (Alaska Native)
Huckleberries, raw (Alaska Native)
Salmonberries, raw (Alaska Native)
Blackberries, wild, raw (Alaska Native)
Blueberries, wild, raw (Alaska Native)
Groundcherries, (cape-gooseberries or poha), raw
Cranberries, wild, bush, raw (Alaska Native)
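
The operator syntax used in the last few cells (`+term` to require, `-term` to exclude, `*` as a wildcard) can be assembled with a tiny helper rather than by hand. This is only a sketch of the string building; `build_query` is a hypothetical name and it makes no request itself.

```python
def build_query(optional=(), required=(), excluded=()):
    """Compose a search query string from plain, required, and excluded terms."""
    parts = list(optional)
    parts += [f"+{term}" for term in required]   # term must appear
    parts += [f"-{term}" for term in excluded]   # term must not appear
    return " ".join(parts)


print(build_query(optional=["Avocado"], excluded=["oil"]))  # Avocado -oil
print(build_query(required=["*berries*", "raw"]))           # +*berries* +raw
```

The resulting string goes into `params['query']` as before, and `urlencode` takes care of escaping the `+` and `*` characters.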


In [31]:
# find all results with common names containing 'hamburger'
params['query'] = 'commonNames:"hamburger"'
params['pageSize'] = ""
url = base_url + "/foods/search?" + urlencode(params)
print(url)
with urlopen(url) as httpcon:
    payload = httpcon.read().decode()

av = json.loads(payload)

for food in av['foods']:
    print(food['description'])

https://api.nal.usda.gov/fdc/v1/foods/search?query=commonNames%3A%22hamburger%22&dataType=SR+Legacy&pageSize=&api_key=<API_KEY>
Beef, Australian, imported, grass-fed, ground, 85% lean / 15% fat, raw
Beef, grass-fed, ground, raw
Beef, ground, 70% lean meat / 30% fat, crumbles, cooked, pan-browned
Beef, ground, 70% lean meat / 30% fat, loaf, cooked, baked
Beef, ground, 70% lean meat / 30% fat, patty cooked, pan-broiled
Beef, ground, 70% lean meat / 30% fat, patty, cooked, broiled
Beef, ground, 70% lean meat / 30% fat, raw
Beef, ground, 75% lean meat / 25% fat, crumbles, cooked, pan-browned
Beef, ground, 75% lean meat / 25% fat, loaf, cooked, baked
Beef, ground, 75% lean meat / 25% fat, patty, cooked, broiled
Beef, ground, 75% lean meat / 25% fat, patty, cooked, pan-broiled
Beef, ground, 75% lean meat / 25% fat, raw
Beef, ground, 80% lean meat / 20% fat, crumbles, cooked, pan-browned
Beef, ground, 80% lean meat / 20% fat, loaf, cooked, baked
Beef, ground, 80