# Food item lookup in the USDA SR dataset

This notebook contains experiments with food item lookups against the USDA SR dataset.

The dataset can be downloaded from the USDA FoodDataCentral site: https://fdc.nal.usda.gov/download-datasets.html

In [1]:
import pandas as pd

# for regex matching
import re

# for fuzzy string matching
from fuzzywuzzy import fuzz, process

Food item descriptions are stored in the `FOOD_DES` table (see Section 4.1 of the SR Legacy documentation, https://www.ars.usda.gov/ARSUserFiles/80400525/Data/SR-Legacy/SR-Legacy_Doc.pdf). The table has several name and description fields that can be searched.

In [17]:
FOOD_DES_filepath = "USDA_SR/FOOD_DES.txt"

FOOD_DES_schema = [
    "NDB_No",
    "FdGrp_Cd",
    "Long_Desc",
    "Shrt_Desc",
    "ComName",
    "ManufacName",
    "Survey",
    "Ref_desc",
    "Refuse",
    "SciName",
    "N_Factor",
    "Pro_Factor",
    "Fat_Factor",
    "CHO_Factor"
]

# FOOD_DES.txt uses '^' as the field delimiter and '~' around text fields
SR_foods = pd.read_csv(
    FOOD_DES_filepath,
    delimiter="^",
    quotechar="~",
    names=FOOD_DES_schema,
)
SR_foods[0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | 100 | Butter, salted | BUTTER,WITH SALT | | | Y | | 0.0 | | 6.38 | 4.27 | 8.79 | 3.87 |
| 1 | 1002 | 100 | Butter, whipped, with salt | BUTTER,WHIPPED,W/ SALT | | | Y | | 0.0 | | 6.38 | | | |
| 2 | 1003 | 100 | Butter oil, anhydrous | BUTTER OIL,ANHYDROUS | | | Y | | 0.0 | | 6.38 | 4.27 | 8.79 | 3.87 |
| 3 | 1004 | 100 | Cheese, blue | CHEESE,BLUE | | | Y | | 0.0 | | 6.38 | 4.27 | 8.79 | 3.87 |
| 4 | 1005 | 100 | Cheese, brick | CHEESE,BRICK | | | Y | | 0.0 | | 6.38 | 4.27 | 8.79 | 3.87 |
| 5 | 1006 | 100 | Cheese, brie | CHEESE,BRIE | | | Y | | 0.0 | | 6.38 | 4.27 | 8.79 | 3.87 |
| 6 | 1007 | 100 | Cheese, camembert | CHEESE,CAMEMBERT | | | Y | | 0.0 | | 6.38 | 4.27 | 8.79 | 3.87 |
| 7 | 1008 | 100 | Cheese, caraway | CHEESE,CARAWAY | | | | | 0.0 | | 6.38 | 4.27 | 8.79 | 3.87 |
| 8 | 1009 | 100 | Cheese, cheddar (Includes foods for USDA's Foo... | CHEESE,CHEDDAR | | | Y | | 0.0 | | | | | |
| 9 | 1010 | 100 | Cheese, cheshire | CHEESE,CHESHIRE | | | | | 0.0 | | 6.38 | 4.27 | 8.79 | 3.87 |
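The `delimiter` and `quotechar` arguments reflect the layout of the SR Legacy ASCII files: fields are separated by `^` and text fields are wrapped in `~`. As a small sketch, Python's standard `csv` module can parse such a line directly. The sample line below is an illustrative reconstruction of the first row above, not copied from the file. Note also that `pd.read_csv` parsed `NDB_No` and `FdGrp_Cd` as integers (`1001`, `100`), dropping the leading zeros of codes such as `01001` and `0100`; passing `dtype={"NDB_No": str, "FdGrp_Cd": str}` to `pd.read_csv` would keep them as text.

```python
import csv
import io

FOOD_DES_schema = [
    "NDB_No", "FdGrp_Cd", "Long_Desc", "Shrt_Desc", "ComName", "ManufacName",
    "Survey", "Ref_desc", "Refuse", "SciName", "N_Factor", "Pro_Factor",
    "Fat_Factor", "CHO_Factor",
]

# Illustrative FOOD_DES-style line (an assumption modeled on the first row above):
# '^' separates fields, '~' wraps text fields, numeric fields are bare.
sample = "~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87\n"

reader = csv.reader(io.StringIO(sample), delimiter="^", quotechar="~")
rows = [dict(zip(FOOD_DES_schema, fields)) for fields in reader]

# The csv module keeps every field as a string, so the leading zeros survive.
print(rows[0]["NDB_No"], rows[0]["FdGrp_Cd"], rows[0]["Long_Desc"])
# prints: 01001 0100 Butter, salted
```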


We can see here that candidate fields for searching against include:
* `Long_Desc`
* `Shrt_Desc`
* `ComName`
* `ManufacName`
* `SciName`

Let's focus first on matching against the `Long_Desc` field. The most basic search method is simply to select the names that contain the search term exactly, ignoring case:

In [18]:
SR_foods[SR_foods['Long_Desc'].str.contains('apple', case=False)][0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 249 | 1304 | 100 | Yogurt, Greek, 2% fat, pineapple, CHOBANI | YOGURT,GREEK,2% FAT,PNAPPL,CHOBANI | | Chobani | | | 0.0 | | 6.38 | | | |
| 344 | 3022 | 300 | Babyfood, GERBER, 2nd Foods, apple, carrot and... | BABYFOOD,GERBER,2ND FOODS,APPL,CARROT & SQUASH... | | | Y | | 0.0 | | 6.25 | | | |
| 345 | 3023 | 300 | Babyfood, finger snacks, GERBER, GRADUATES, PU... | BABYFOOD,FINGER SNACKS,GERBER,GRADUATES,PUFFS,... | | GERBER | Y | | 0.0 | | 6.25 | 3.36 | 6.37 | 3.6 |
| 347 | 3025 | 300 | Babyfood, GERBER, 3rd Foods, apple, mango and ... | BABYFOOD,GERBER,3RD FOODS,APPL,MANGO & KIWI | | GERBER | Y | | 0.0 | | 6.25 | 3.6 | 8.37 | 3.6 |
| 394 | 3115 | 300 | Babyfood, apples, dices, toddler | BABYFOOD,APPLS,DICES,TODD | | | Y | | 0.0 | | 6.25 | | | |
| 395 | 3116 | 300 | Babyfood, fruit, applesauce, strained | BABYFOOD,FRUIT,APPLSAUC,STR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 |
| 396 | 3117 | 300 | Babyfood, fruit, applesauce, junior | BABYFOOD,FRUIT,APPLSAUC,JR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 |
| 416 | 3142 | 300 | Babyfood, fruit, applesauce and apricots, stra... | BABYFOOD,FRUIT,APPLSAUC&APRICOTS,STR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 |
| 417 | 3143 | 300 | Babyfood, fruit, applesauce and apricots, junior | BABYFOOD,FRUIT,APPLSAUC&APRICOTS,JR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 |
| 418 | 3144 | 300 | Babyfood, fruit, applesauce and cherries, stra... | BABYFOOD,FRUIT,APPLSAUC&CHERRIES,STR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 |


As the results demonstrate, this method is too simplistic:
* What if there's a misspelling, either in the search term or in the record? Then this method will fail to find the desired results.
* What if the search term is contained within another word? For example, `apple` matches `pineapple` and `applebee's` just as well as it matches a literal `apple`, and nothing prioritizes the results to account for this kind of "common sense" consideration.
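Both failure modes can be reproduced with a plain case-insensitive substring test, which behaves like the `str.contains(..., case=False)` query above for a term without regex metacharacters. The names are taken from the results in this notebook; `contains` is a hypothetical helper and `aple` a made-up misspelling.

```python
names = [
    "Strudel, apple",
    "Scrapple, pork",
    "Yogurt, Greek, 2% fat, pineapple, CHOBANI",
    "APPLEBEE'S, chili",
    "Kale, raw",
]

def contains(term, names):
    """Case-insensitive substring filter, like the str.contains query above."""
    return [n for n in names if term.lower() in n.lower()]

# Problem 1: a misspelled query finds nothing at all.
print(contains("aple", names))   # []
# Problem 2: 'apple' also hits words that merely contain it, in no useful order.
print(contains("apple", names))  # every name except 'Kale, raw'
```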

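We can address the first problem with fuzzy string matching. There are several fuzzy matching algorithms, each with its own pros and cons. Below are four scorers from Python's `fuzzywuzzy` package, plus `process.extract`, which scores a query against a list of choices (using `fuzz.WRatio` by default) and returns them ranked: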
* `fuzz.ratio`
* `fuzz.partial_ratio`
* `fuzz.token_sort_ratio`
* `fuzz.token_set_ratio`
* `process.extract`
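To build intuition for how the four scorers differ, here is a rough, standard-library-only approximation built on `difflib.SequenceMatcher`. It is a sketch of the ideas, not `fuzzywuzzy`'s actual code: `partial_ratio` here simply slides a window over the longer string, and the text preprocessing is simplified. For the simple examples below it happens to reproduce the scores shown in the tables below for `Strudel, apple` and `Croissants, apple`, but it will not agree with `fuzzywuzzy` in general.

```python
import re
from difflib import SequenceMatcher

def _process(s):
    """Lowercase and replace runs of non-alphanumeric characters with a space."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()

def ratio(a, b):
    """Similarity of the two whole strings, scaled to 0-100."""
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b).ratio())

def partial_ratio(a, b):
    """Best ratio of the shorter string against any equal-length window of the longer."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0
    n = len(short)
    return max(ratio(short, long_[i:i + n]) for i in range(len(long_) - n + 1))

def token_sort_ratio(a, b):
    """Compare after sorting the words, so word order does not matter."""
    def sort_tokens(s):
        return " ".join(sorted(_process(s).split()))
    return ratio(sort_tokens(a), sort_tokens(b))

def token_set_ratio(a, b):
    """Compare the shared words against each side's shared + remaining words."""
    ta, tb = set(_process(a).split()), set(_process(b).split())
    common = " ".join(sorted(ta & tb))
    rest_a = (common + " " + " ".join(sorted(ta - tb))).strip()
    rest_b = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(ratio(common, rest_a), ratio(common, rest_b), ratio(rest_a, rest_b))

for name in ["Strudel, apple", "Pineapple, raw", "Croissants, apple"]:
    lowered = name.lower()
    print(name, ratio("apple", lowered), partial_ratio("apple", lowered),
          token_sort_ratio("apple", lowered), token_set_ratio("apple", lowered))
# Strudel, apple 53 100 56 100
# Pineapple, raw 53 100 56 56
# Croissants, apple 45 100 48 100
```

The pattern is visible already: `partial_ratio` gives `pineapple` a perfect score, while `token_set_ratio` rewards names that contain `apple` as a whole word.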

In [19]:
SR_foods['score'] = SR_foods['Long_Desc'].apply(lambda x : fuzz.ratio('apple', x.lower()))
SR_foods.sort_values('score', ascending=False)[0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5349 | 18354 | 1800 | Strudel, apple | STRUDEL,APPLE | | | Y | | 0.0 | | 6.0 | 4.2 | 8.9 | 3.9 | 53 |
| 1533 | 7951 | 700 | Scrapple, pork | SCRAPPLE,PORK | | | Y | | 0.0 | | 6.25 | 4.0 | 9.0 | 4.0 | 53 |
| 1813 | 9077 | 900 | Crabapples, raw | CRABAPPLES,RAW | | | | Core and stems | 8.0 | Malus spp. | 6.25 | 3.36 | 8.37 | 3.6 | 50 |
| 2001 | 9312 | 900 | Rose-apples, raw | ROSE-APPLES,RAW | | | | Caps and pits | 33.0 | Syzygium jambos | 6.25 | 3.36 | 8.37 | 3.6 | 48 |
| 7481 | 36019 | 3600 | APPLEBEE'S, chili | APPLEBEE'S,CHILI | family style, applebees | Applebee's | | | 0.0 | | | | | | 45 |
| 5258 | 18240 | 1800 | Croissants, apple | CROISSANTS,APPLE | | | | | 0.0 | | 5.9 | 4.0 | 8.8 | 4.0 | 45 |
| 5778 | 19340 | 1900 | Sugars, maple | SUGARS,MAPLE | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.87 | 44 |
| 5785 | 19353 | 1900 | Syrups, maple | SYRUPS,MAPLE | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.87 | 44 |
| 5987 | 20069 | 2000 | Triticale | TRITICALE | | | | | 0.0 | X Triticosecale spp. | 5.83 | 3.32 | 8.37 | 3.82 | 43 |
| 2598 | 11233 | 1100 | Kale, raw | KALE,RAW | | | | Stem ends, tough stems and tough midrib parts | 28.0 | Brassica oleracea (Acephala Group) | 6.25 | 2.44 | 8.37 | 3.57 | 43 |


In [20]:
SR_foods['score'] = SR_foods['Long_Desc'].apply(lambda x : fuzz.partial_ratio('apple', x.lower()))
SR_foods.sort_values('score', ascending=False)[0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5741 | 19294 | 1900 | Fruit butters, apple | FRUIT BUTTERS,APPLE | | | Y | | 0.0 | | 6.25 | 4.0 | 9.0 | 4.0 | 100 |
| 396 | 3117 | 300 | Babyfood, fruit, applesauce, junior | BABYFOOD,FRUIT,APPLSAUC,JR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 100 |
| 456 | 3191 | 300 | Babyfood, cereal, oatmeal, with applesauce and... | BABYFOOD,CRL,OATMEAL,W/ APPLSAUC & BANANAS,STR | | | Y | | 0.0 | | 6.04 | 3.5 | 8.4 | 4.1 | 100 |
| 3904 | 14238 | 1400 | Beverages, cranberry-apple juice drink, bottled | BEVERAGES,CRANBERRY-APPLE JUC DRK,BTLD | | | | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.9 | 100 |
| 457 | 3192 | 300 | Babyfood, cereal, oatmeal, with applesauce and... | BABYFOOD,CRL,OATMEAL,W/ APPLSAUC & BANANAS,JR,... | | | Y | | 0.0 | | 6.04 | 3.5 | 8.4 | 4.1 | 100 |
| 442 | 3174 | 300 | Babyfood, juice, orange and apple and banana | BABYFOOD,JUC,ORANGE&APPL&BANANA | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.92 | 100 |
| 460 | 3195 | 300 | Babyfood, cereal, rice, with applesauce and ba... | BABYFOOD,CRL,RICE,W/ APPLSAUC & BANANAS,STR | | | Y | | 0.0 | | 6.0 | 3.8 | 8.4 | 4.2 | 100 |
| 3941 | 14286 | 1400 | Beverages, MOTTS, Apple juice light, fortified... | BEVERAGES,MOTTS,APPL JUC LT,FORT W/ VIT C | | Mott's INC. | | | 0.0 | | 6.25 | | | | 100 |
| 1688 | 8493 | 800 | Cereals ready-to-eat, MALT-O-MEAL, Apple ZINGS | CEREALS RTE,MALT-O-MEAL,APPL ZINGS | | MOM Brands | | | 0.0 | | 6.25 | | | | 100 |
| 3944 | 14291 | 1400 | Beverages, SNAPPLE, tea, black and green, read... | BEVERAGES,SNAPPLE,TEA,BLACK & GRN,READY TO DRK... | | Snapple Beverage Corporation | | | 0.0 | | 6.25 | | | | 100 |


In [21]:
SR_foods['score'] = SR_foods['Long_Desc'].apply(lambda x : fuzz.token_sort_ratio('apple', x.lower()))
SR_foods.sort_values('score', ascending=False)[0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1533 | 7951 | 700 | Scrapple, pork | SCRAPPLE,PORK | | | Y | | 0.0 | | 6.25 | 4.0 | 9.0 | 4.0 | 56 |
| 5349 | 18354 | 1800 | Strudel, apple | STRUDEL,APPLE | | | Y | | 0.0 | | 6.0 | 4.2 | 8.9 | 3.9 | 56 |
| 1813 | 9077 | 900 | Crabapples, raw | CRABAPPLES,RAW | | | | Core and stems | 8.0 | Malus spp. | 6.25 | 3.36 | 8.37 | 3.6 | 53 |
| 2001 | 9312 | 900 | Rose-apples, raw | ROSE-APPLES,RAW | | | | Caps and pits | 33.0 | Syzygium jambos | 6.25 | 3.36 | 8.37 | 3.6 | 50 |
| 7481 | 36019 | 3600 | APPLEBEE'S, chili | APPLEBEE'S,CHILI | family style, applebees | Applebee's | | | 0.0 | | | | | | 48 |
| 5258 | 18240 | 1800 | Croissants, apple | CROISSANTS,APPLE | | | | | 0.0 | | 5.9 | 4.0 | 8.8 | 4.0 | 48 |
| 5778 | 19340 | 1900 | Sugars, maple | SUGARS,MAPLE | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.87 | 47 |
| 5785 | 19353 | 1900 | Syrups, maple | SYRUPS,MAPLE | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.87 | 47 |
| 2598 | 11233 | 1100 | Kale, raw | KALE,RAW | | | | Stem ends, tough stems and tough midrib parts | 28.0 | Brassica oleracea (Acephala Group) | 6.25 | 2.44 | 8.37 | 3.57 | 46 |
| 5793 | 19366 | 1900 | Toppings, pineapple | TOPPINGS,PINEAPPLE | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.8 | 43 |


In [22]:
SR_foods['score'] = SR_foods['Long_Desc'].apply(lambda x : fuzz.token_set_ratio('apple', x.lower()))
SR_foods.sort_values('score', ascending=False)[0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3904 | 14238 | 1400 | Beverages, cranberry-apple juice drink, bottled | BEVERAGES,CRANBERRY-APPLE JUC DRK,BTLD | | | | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.9 | 100 |
| 3891 | 14220 | 1400 | Beverages, OCEAN SPRAY, Cranberry-Apple Juice ... | Beverages, OCEAN SPRAY, Cranberry-Apple Juice ... | | Ocean Spray Cranberries, Inc. | | | 0.0 | | 6.25 | | | | 100 |
| 3941 | 14286 | 1400 | Beverages, MOTTS, Apple juice light, fortified... | BEVERAGES,MOTTS,APPL JUC LT,FORT W/ VIT C | | Mott's INC. | | | 0.0 | | 6.25 | | | | 100 |
| 534 | 3711 | 300 | Babyfood, cereal, high protein, with apple and... | BABYFOOD,CRL,HI PROT,W/APPL&ORANGE,PREP W/WHL ... | | | | | 0.0 | | 0.0 | | | | 100 |
| 424 | 3153 | 300 | Babyfood, fruit, apple and raspberry, junior | BABYFOOD,FRUIT,APPL & RASPBERRY,JR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 100 |
| 423 | 3152 | 300 | Babyfood, fruit, apple and raspberry, strained | BABYFOOD,FRUIT,APPL & RASPBERRY,STR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 100 |
| 347 | 3025 | 300 | Babyfood, GERBER, 3rd Foods, apple, mango and ... | BABYFOOD,GERBER,3RD FOODS,APPL,MANGO & KIWI | | GERBER | Y | | 0.0 | | 6.25 | 3.6 | 8.37 | 3.6 | 100 |
| 345 | 3023 | 300 | Babyfood, finger snacks, GERBER, GRADUATES, PU... | BABYFOOD,FINGER SNACKS,GERBER,GRADUATES,PUFFS,... | | GERBER | Y | | 0.0 | | 6.25 | 3.36 | 6.37 | 3.6 | 100 |
| 5258 | 18240 | 1800 | Croissants, apple | CROISSANTS,APPLE | | | | | 0.0 | | 5.9 | 4.0 | 8.8 | 4.0 | 100 |
| 5264 | 18246 | 1800 | Danish pastry, fruit, enriched (includes apple... | DANISH PASTRY,FRUIT,ENR | | | Y | | 0.0 | | 5.8 | 4.1 | 8.8 | 3.9 | 100 |


In [24]:
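# score every Long_Desc against 'apple'; limit=None keeps all rows, and the
# result is a best-first list of (name, score, index) tuples
match_list = process.extract('apple', SR_foods['Long_Desc'], limit=None)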
match_list[0:50]

[('Babyfood, apples, dices, toddler', 90, 394),
 ('Babyfood, fruit, applesauce, strained', 90, 395),
 ('Babyfood, fruit, applesauce, junior', 90, 396),
 ('Babyfood, juice, apple', 90, 436),
 ('Babyfood, apple-banana juice', 90, 437),
 ('Babyfood, juice, apple and peach', 90, 438),
 ('Babyfood, juice, apple and prune', 90, 439),
 ('Babyfood, juice, orange and apple', 90, 441),
 ('Babyfood, juice, orange and pineapple', 90, 445),
 ('Babyfood, dessert, dutch apple, strained', 90, 475),
 ('Babyfood, dessert, dutch apple, junior', 90, 476),
 ('Babyfood, juice, apple and grape', 90, 491),
 ('Babyfood, juice, apple, with calcium', 90, 493),
 ('Babyfood, apples with ham, strained', 90, 502),
 ('Scrapple, pork', 90, 1533),
 ('Apples, raw, without skin', 90, 1748),
 ('Apples, dried, sulfured, uncooked', 90, 1754),
 ('Crabapples, raw', 90, 1813),
 ("Custard-apple, (bullock's-heart), raw", 90, 1821),
 ('Mammy-apple, (mamey), raw', 90, 1891),
 ('Pineapple, raw, all varieties', 90, 1960),
 ('Pineapp

`process.extract` arguably does the best job of getting literal `apples` near the top of the search results, but it still struggles to favor them over results like `pineapple`, `applesauce` and `applebee's`: many results tie at the same score of 90. Rather than digging into the fuzzy matching algorithms, which are fairly complex, a simpler way to improve the results is to cascade several comparisons, going from more exact to less exact matching.

In [9]:
# everything is case-insensitive
def new_score(search, name, score):
    # exact match
    if search.lower() == name.lower():
        modifier = 3
    # exact match of substring, *isolated*
    # for example 'apple' should match with 'apple, raw' but not 'pineapple'
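    # re.escape makes regex metacharacters in the search term (e.g. '(') match literally
    elif re.match(r'.*(^|[^a-zA-Z0-9]+)' + re.escape(search) + r'($|[^a-zA-Z0-9]+)', name, re.IGNORECASE):
        modifier = 2
    # simple '-s' plurals
    elif re.match(r'.*(^|[^a-zA-Z0-9]+)' + re.escape(search) + r's($|[^a-zA-Z0-9]+)', name, re.IGNORECASE):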
        modifier = 1
    else:
        modifier = 0
    return modifier + score/100

[(name, new_score('egg',name,20)) for name in ['egg','legg','eggs','egg, white', 'eggs, white', ' egg ', ' egg', 'eggo', 'EGG', 'LEGG', 'EGGS', 'EGG, S', ' EGG ', 'EGGO']]

[('egg', 3.2),
 ('legg', 0.2),
 ('eggs', 1.2),
 ('egg, white', 2.2),
 ('eggs, white', 1.2),
 (' egg ', 2.2),
 (' egg', 2.2),
 ('eggo', 0.2),
 ('EGG', 3.2),
 ('LEGG', 0.2),
 ('EGGS', 1.2),
 ('EGG, S', 2.2),
 (' EGG ', 2.2),
 ('EGGO', 0.2)]

With this new cascading scoring method, `egg` scores higher than `egg, white`, which scores higher than `eggs, white`, which in turn scores higher than `eggo` and `legg`.
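One caveat with `modifier + score/100`: a fuzzy score of 100 adds a full point, so a result can tie with one from the tier above (for example `1 + 100/100 == 2 + 0/100`). A minimal variant, sketched here as a hypothetical `cascade_key` (not part of the notebook), returns a `(tier, score)` tuple instead, so the tier always dominates and the fuzzy score only breaks ties within a tier. It also uses lookarounds with `re.search` in place of the leading `.*` with `re.match`, which should be equivalent for these patterns.

```python
import re

def cascade_key(search, name, score):
    """Hypothetical (tier, score) sort key mirroring new_score's tiers.

    tier 3: exact match, 2: isolated term, 1: isolated '-s' plural, 0: none.
    """
    term = re.escape(search)  # treat characters like '(' literally
    if search.lower() == name.lower():
        tier = 3
    elif re.search(rf"(?<![a-zA-Z0-9]){term}(?![a-zA-Z0-9])", name, re.IGNORECASE):
        tier = 2
    elif re.search(rf"(?<![a-zA-Z0-9]){term}s(?![a-zA-Z0-9])", name, re.IGNORECASE):
        tier = 1
    else:
        tier = 0
    return (tier, score)

# (name, fuzzy score) pairs; with modifier + score/100, 'eggs' (1 + 1.0)
# would tie with 'egg, white' (2 + 0.0)
candidates = [("eggs", 100), ("egg, white", 0), ("eggo", 100), ("egg", 20)]
ranked = sorted(candidates, key=lambda c: cascade_key("egg", *c), reverse=True)
print([name for name, _ in ranked])  # ['egg', 'egg, white', 'eggs', 'eggo']
```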

In [25]:
def sort_criteria(item):
    _, score, _ = item
    return score

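# note: match_list holds the fuzzy scores computed for 'apple' in In [24], so the
# fractional tie-breaker here does not reflect similarity to 'egg'
new_matches = [(name, new_score('egg', name, score), idx) for name, score, idx in match_list]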
new_matches.sort(key=sort_criteria, reverse=True)
new_matches[0:50]

[('Egg, whole, cooked, scrambled', 2.54, 122),
 ('Fast foods, egg, scrambled', 2.54, 6117),
 ('Egg rolls, vegetable, frozen, prepared', 2.54, 6446),
 ('Babyfood, cereal, with egg yolks, junior', 2.51, 462),
 ('Wonton wrappers (includes egg roll wrappers)', 2.48, 5361),
 ('Bagels, egg', 2.38, 5084),
 ('Egg, whole, raw, fresh', 2.36, 113),
 ('Egg, yolk, raw, frozen, pasteurized', 2.36, 116),
 ('Egg, whole, cooked, fried', 2.36, 118),
 ('Egg, whole, cooked, hard-boiled', 2.36, 119),
 ('Egg, whole, cooked, omelet', 2.36, 120),
 ('Egg, whole, cooked, poached', 2.36, 121),
 ('Egg, whole, dried', 2.36, 123),
 ('Egg, duck, whole, fresh, raw', 2.36, 128),
 ('Egg, goose, whole, fresh, raw', 2.36, 129),
 ('Egg, quail, whole, fresh, raw', 2.36, 130),
 ('Egg, turkey, whole, fresh, raw', 2.36, 131),
 ('Egg substitute, powder', 2.36, 132),
 ('Egg, yolk, raw, frozen, salted, pasteurized', 2.36, 144),
 ('Egg, white, raw, frozen, pasteurized', 2.36, 154),
 ('Egg, whole, raw, frozen, salted, pasteurized'

In [26]:
match_list2 = process.extract('apple',SR_foods['Long_Desc'], limit=None)
match_list2[0:50]

[('Babyfood, apples, dices, toddler', 90, 394),
 ('Babyfood, fruit, applesauce, strained', 90, 395),
 ('Babyfood, fruit, applesauce, junior', 90, 396),
 ('Babyfood, juice, apple', 90, 436),
 ('Babyfood, apple-banana juice', 90, 437),
 ('Babyfood, juice, apple and peach', 90, 438),
 ('Babyfood, juice, apple and prune', 90, 439),
 ('Babyfood, juice, orange and apple', 90, 441),
 ('Babyfood, juice, orange and pineapple', 90, 445),
 ('Babyfood, dessert, dutch apple, strained', 90, 475),
 ('Babyfood, dessert, dutch apple, junior', 90, 476),
 ('Babyfood, juice, apple and grape', 90, 491),
 ('Babyfood, juice, apple, with calcium', 90, 493),
 ('Babyfood, apples with ham, strained', 90, 502),
 ('Scrapple, pork', 90, 1533),
 ('Apples, raw, without skin', 90, 1748),
 ('Apples, dried, sulfured, uncooked', 90, 1754),
 ('Crabapples, raw', 90, 1813),
 ("Custard-apple, (bullock's-heart), raw", 90, 1821),
 ('Mammy-apple, (mamey), raw', 90, 1891),
 ('Pineapple, raw, all varieties', 90, 1960),
 ('Pineapp

In [27]:
new_matches2 = [(name, new_score('apple', name, score), idx) for name, score, idx in match_list2]
new_matches2.sort(key=sort_criteria, reverse=True)
new_matches2[0:50]

[('Babyfood, juice, apple', 2.9, 436),
 ('Babyfood, apple-banana juice', 2.9, 437),
 ('Babyfood, juice, apple and peach', 2.9, 438),
 ('Babyfood, juice, apple and prune', 2.9, 439),
 ('Babyfood, juice, orange and apple', 2.9, 441),
 ('Babyfood, dessert, dutch apple, strained', 2.9, 475),
 ('Babyfood, dessert, dutch apple, junior', 2.9, 476),
 ('Babyfood, juice, apple and grape', 2.9, 491),
 ('Babyfood, juice, apple, with calcium', 2.9, 493),
 ("Custard-apple, (bullock's-heart), raw", 2.9, 1821),
 ('Mammy-apple, (mamey), raw', 2.9, 1891),
 ('Croissants, apple', 2.9, 5258),
 ('Pie, apple, prepared from recipe', 2.9, 5308),
 ('Strudel, apple', 2.9, 5349),
 ('Pie, Dutch Apple, Commercially Prepared', 2.9, 5476),
 ('Fruit butters, apple', 2.9, 5741),
 ('Pie fillings, apple, canned', 2.9, 5755),
 ('Babyfood, apple yogurt dessert, strained', 2.9, 7590),
 ('Babyfood, juice, apple-sweet potato', 2.9, 7623),
 ('Babyfood, juice, apple - cherry', 2.9, 7759),
 ('Babyfood, banana apple dessert, stra

This works pretty well: for the search term `apple` it pushes matches like `pineapple` and `applebee's` below the rest. Next, we should prioritize records where the term names a more primary category, as well as records where it is an isolated term rather than part of a compound term. For example, the following is an appropriate ordering for the search `apple`:

* `Apples, raw, without skin`
* `Apple juice, canned or bottled, unsweetened, without added ascorbic acid`
* `Babyfood, juice, apple`
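One way to encode that ordering, sketched here as a hypothetical `match_key` function (an assumption, not the notebook's method), is to split `Long_Desc` on commas, since SR descriptions tend to run from the primary category to finer qualifiers, and rank by: whether the term appears as a stand-alone word rather than only inside a hyphenated compound like `Custard-apple`; then the position of the comma-separated segment containing it; then whether that segment is the term itself.

```python
import re

def match_key(search, name):
    """Hypothetical sort key for ranking `name` against `search`; lower sorts first.

    Returns (tier, position, kind):
      tier      0 = term is a stand-alone word, 1 = only inside a hyphenated
                compound such as 'custard-apple', 2 = no word-level match
      position  index of the comma-separated segment holding the match
                (0 = the leading, most primary category)
      kind      0 = the segment is exactly the term (or its '-s' plural),
                1 = the term is one word within the segment
    """
    term = search.lower()
    escaped = re.escape(term)
    # optional '-s' plural; a hyphen disqualifies an "isolated" match
    isolated = re.compile(rf"(?<![a-z0-9-]){escaped}s?(?![a-z0-9-])")
    compound = re.compile(rf"(?<![a-z0-9]){escaped}s?(?![a-z0-9])")
    best = (2, 0, 0)
    for position, segment in enumerate(s.strip() for s in name.lower().split(",")):
        if segment in (term, term + "s"):
            key = (0, position, 0)
        elif isolated.search(segment):
            key = (0, position, 1)
        elif compound.search(segment):
            key = (1, position, 1)
        else:
            continue
        best = min(best, key)
    return best


names = [
    "Babyfood, juice, apple",
    "Pineapple, raw, all varieties",
    "Custard-apple, (bullock's-heart), raw",
    "Apple juice, canned or bottled, unsweetened, without added ascorbic acid",
    "Apples, raw, without skin",
    "APPLEBEE'S, chili",
]
for name in sorted(names, key=lambda n: match_key("apple", n)):
    print(match_key("apple", name), name)
```

Ranking compound terms after every stand-alone match is a design choice; one could instead let segment position outweigh it, which would move `Custard-apple` above `Babyfood, juice, apple`.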

In [28]:
SR_foods[SR_foods['Long_Desc'].str.contains('apple', case=False)][0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 249 | 1304 | 100 | Yogurt, Greek, 2% fat, pineapple, CHOBANI | YOGURT,GREEK,2% FAT,PNAPPL,CHOBANI | | Chobani | | | 0.0 | | 6.38 | | | | 24 |
| 344 | 3022 | 300 | Babyfood, GERBER, 2nd Foods, apple, carrot and... | BABYFOOD,GERBER,2ND FOODS,APPL,CARROT & SQUASH... | | | Y | | 0.0 | | 6.25 | | | | 100 |
| 345 | 3023 | 300 | Babyfood, finger snacks, GERBER, GRADUATES, PU... | BABYFOOD,FINGER SNACKS,GERBER,GRADUATES,PUFFS,... | | GERBER | Y | | 0.0 | | 6.25 | 3.36 | 6.37 | 3.6 | 100 |
| 347 | 3025 | 300 | Babyfood, GERBER, 3rd Foods, apple, mango and ... | BABYFOOD,GERBER,3RD FOODS,APPL,MANGO & KIWI | | GERBER | Y | | 0.0 | | 6.25 | 3.6 | 8.37 | 3.6 | 100 |
| 394 | 3115 | 300 | Babyfood, apples, dices, toddler | BABYFOOD,APPLS,DICES,TODD | | | Y | | 0.0 | | 6.25 | | | | 29 |
| 395 | 3116 | 300 | Babyfood, fruit, applesauce, strained | BABYFOOD,FRUIT,APPLSAUC,STR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 26 |
| 396 | 3117 | 300 | Babyfood, fruit, applesauce, junior | BABYFOOD,FRUIT,APPLSAUC,JR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 27 |
| 416 | 3142 | 300 | Babyfood, fruit, applesauce and apricots, stra... | BABYFOOD,FRUIT,APPLSAUC&APRICOTS,STR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 19 |
| 417 | 3143 | 300 | Babyfood, fruit, applesauce and apricots, junior | BABYFOOD,FRUIT,APPLSAUC&APRICOTS,JR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 20 |
| 418 | 3144 | 300 | Babyfood, fruit, applesauce and cherries, stra... | BABYFOOD,FRUIT,APPLSAUC&CHERRIES,STR | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 19 |


In [29]:
SR_foods[SR_foods['Shrt_Desc'].str.contains('apple', case=False)][0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 436 | 3166 | 300 | Babyfood, juice, apple | BABYFOOD,JUICE,APPLE | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.92 | 100 |
| 437 | 3167 | 300 | Babyfood, apple-banana juice | BABYFOOD,APPLE-BANANA JUC | | | Y | | 0.0 | | | | | | 100 |
| 1533 | 7951 | 700 | Scrapple, pork | SCRAPPLE,PORK | | | Y | | 0.0 | | 6.25 | 4.0 | 9.0 | 4.0 | 56 |
| 1747 | 9003 | 900 | Apples, raw, with skin (Includes foods for USD... | APPLES,RAW,WITH SKIN | | | Y | Core and stem | 10.0 | Malus domestica | 6.25 | 3.36 | 8.37 | 3.6 | 13 |
| 1748 | 9004 | 900 | Apples, raw, without skin | APPLES,RAW,WITHOUT SKIN | | | Y | 10% core and stem, 13% skin | 23.0 | Malus domestica | 6.25 | 3.36 | 8.37 | 3.6 | 36 |
| 1749 | 9005 | 900 | Apples, raw, without skin, cooked, boiled | APPLES,RAW,WO/SKN,CKD,BLD | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 24 |
| 1750 | 9006 | 900 | Apples, raw, without skin, cooked, microwave | APPLES,RAW,WO/ SKN,CKD,MICROWAVE | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 22 |
| 1751 | 9008 | 900 | Apples, canned, sweetened, sliced, drained, he... | APPLES,CND,SWTND,SLICED,DRND,HTD | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.7 | 20 |
| 1752 | 9009 | 900 | Apples, dehydrated (low moisture), sulfured, u... | APPLES,DEHYD (LO MOIST),SULFURED,UNCKD | | | Y | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 19 |
| 1753 | 9010 | 900 | Apples, dehydrated (low moisture), sulfured, s... | APPLES,DEHYD (LO MOIST),SULFURED,STWD | | | | | 0.0 | | 6.25 | 3.36 | 8.37 | 3.6 | 20 |


In [30]:
SR_foods[SR_foods['ComName'].fillna("").str.contains('apple', case=False)][0:20]

| | NDB_No | FdGrp_Cd | Long_Desc | Shrt_Desc | ComName | ManufacName | Survey | Ref_desc | Refuse | SciName | N_Factor | Pro_Factor | Fat_Factor | CHO_Factor | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7317 | 35027 | 3500 | Cloudberries, raw (Alaska Native) | CLOUDBERRIES,RAW (ALASKA NATIVE) | baked apple berry, yellowberry, salmonberry | | | | 0.0 | Rubus chamaemorus L. | 5.3 | | | | 17 |
| 7462 | 36000 | 3600 | APPLEBEE'S, 9 oz house sirloin steak | APPLEBEE'S,9 OZ HOUSE SIRLOIN STEAK | family style, applebees | Applebee's | | | 0.0 | | | | | | 25 |
| 7463 | 36001 | 3600 | APPLEBEE'S, Double Crunch Shrimp | APPLEBEE'S,DOUBLE CRUNCH SHRIMP | family style, applebees | Applebee's | | | 0.0 | | | | | | 28 |
| 7464 | 36002 | 3600 | APPLEBEE'S, french fries | APPLEBEE'S,FRENCH FR | family style, applebees | Applebee's | | | 0.0 | | | | | | 36 |
| 7465 | 36003 | 3600 | APPLEBEE'S, KRAFT, Macaroni & Cheese, from kid... | APPLEBEE'S,KRAFT,MACARONI & CHS,FROM KID'S MENU | family style, applebees | Applebee's | | | 0.0 | | | | | | 20 |
| 7466 | 36004 | 3600 | APPLEBEE'S, mozzarella sticks | APPLEBEE'S,MOZZARELLA STKS | family style, applebees | Applebee's | | | 0.0 | | | | | | 30 |
| 7467 | 36005 | 3600 | APPLEBEE'S, chicken tenders, from kids' menu | APPLEBEE'S,CHICK TENDERS,FROM KIDS' MENU | family style, applebees | Applebee's | | | 0.0 | | | | | | 22 |
| 7480 | 36018 | 3600 | APPLEBEE'S, fish, hand battered | APPLEBEE'S,FISH,HAND BATTERED | applebees, family style | Applebee's | | | 0.0 | | | | | | 29 |
| 7481 | 36019 | 3600 | APPLEBEE'S, chili | APPLEBEE'S,CHILI | family style, applebees | Applebee's | | | 0.0 | | | | | | 48 |
| 7483 | 36021 | 3600 | APPLEBEE'S, coleslaw | APPLEBEE'S,COLESLAW | applebees, family style | Applebee's | | | 0.0 | | | | | | 42 |
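The three queries above show that each field surfaces a different first page of results. In `Shrt_Desc`, abbreviations such as `PNAPPL` and `APPLSAUC` mean those records no longer match `apple`, while `ComName` brings in `Cloudberries` (common name "baked apple berry") along with every `APPLEBEE'S` item. As a sketch of searching several fields at once, the hypothetical `search_fields` function below checks fields in a chosen priority order, skips missing values (the role `fillna("")` plays above), and applies the same isolated-word, optional `-s` plural test as the earlier scoring, so `applebees` no longer counts as a match for `apple`. Records are plain dictionaries here to keep the example self-contained; the sample values are copied from the tables above, with `None` standing in for empty cells.

```python
import re

def search_fields(records, search, fields):
    """Return (Long_Desc, matched_field) for records where any field matches.

    Fields are tried in the given priority order; None (missing) values are skipped.
    A match is the search term as an isolated word, optionally with an '-s' plural.
    """
    pattern = re.compile(rf"(?<![a-z0-9]){re.escape(search.lower())}s?(?![a-z0-9])")
    hits = []
    for record in records:
        for field in fields:
            value = record.get(field)
            if value and pattern.search(value.lower()):
                hits.append((record["Long_Desc"], field))
                break  # report only the highest-priority field that matched
    return hits


records = [
    {"Long_Desc": "Apples, raw, without skin",
     "Shrt_Desc": "APPLES,RAW,WITHOUT SKIN", "ComName": None},
    {"Long_Desc": "Cloudberries, raw (Alaska Native)",
     "Shrt_Desc": "CLOUDBERRIES,RAW (ALASKA NATIVE)",
     "ComName": "baked apple berry, yellowberry, salmonberry"},
    {"Long_Desc": "APPLEBEE'S, chili", "Shrt_Desc": "APPLEBEE'S,CHILI",
     "ComName": "family style, applebees"},
    {"Long_Desc": "Yogurt, Greek, 2% fat, pineapple, CHOBANI",
     "Shrt_Desc": "YOGURT,GREEK,2% FAT,PNAPPL,CHOBANI", "ComName": None},
]
print(search_fields(records, "apple", ["Long_Desc", "Shrt_Desc", "ComName"]))
# [('Apples, raw, without skin', 'Long_Desc'), ('Cloudberries, raw (Alaska Native)', 'ComName')]
```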
