# Test Notebook for Image Classification Project

## 1. Importing Libraries

This project relies on the following libraries:
- tensorflow - to access deep learning models required for image classification
- pillow - to process images
- numpy - to use the array format where needed
- pandas - to parse and manage the database
- requests - for API access

In [8]:
import tensorflow as tf
import numpy as np
import pandas as pd
import json
import requests
from PIL import Image
from time import sleep

## 2. Generating a Set of Labels

Here, I try three strategies for generating a set of labels to match against the output of the neural network that performs the image classification. Many of the labels for this project come from Open Food Facts, a database containing over 3,800,000 labeled food and household products. You can download the database here:

https://static.openfoodfacts.org/data/openfoodfacts-products.csv.gz

The strategies I adopted are as follows:

1. Processing a downloaded version of the Open Food Facts (OFF) database. This produced a label set that was too large.

2. Processing the OFF database using their API. This ran into rate limiting and produced a label set that was too small.

3. Generating a custom set of labels using an LLM (DeepSeek). This landed me in the Goldilocks zone, producing a label set just the right size for this project.

The download strategy and the API strategy are presented first, followed by the LLM-generated label set.

Note that, for the purposes of this project, the machine learning model used to classify images and the image database on which that model was trained do *not* come from Open Food Facts. Various iterations of this program use MobileNet, ResNet, and Google's vision-classifier-food-v1 to classify images. I only use the Open Food Facts database to generate a set of potential labels that qualify as 'edible'.

### 2.1 Processing the Downloaded Database

First, let's write some code to extract the unique food categories from the downloaded database.

In [3]:
def extract_edible_categories(csv_path, chunksize=100000):
    """Process Open Food Facts CSV in chunks with row-based updates"""
    try:
        # Initialize
        edible_categories = set()
        non_food = {'pet food', 'cosmetics', 'kitchen', 'tobacco'}
        processed_rows = 0
        update_interval = 100000
        
        # Get total rows for progress tracking (optional)
        with open(csv_path, 'rb') as f:
            total_rows = sum(1 for _ in f) - 1  # Subtract header
        
        print(f"Processing {total_rows:,} total rows...")
        
        # Configure chunked reading
        reader = pd.read_csv(
            csv_path,
            sep='\t',
            usecols=['categories'],
            chunksize=chunksize,
            low_memory=False,
            on_bad_lines='warn'
        )
        
        # Process chunks
        for chunk in reader:
            # Extract and clean categories
            for cat_list in chunk['categories'].dropna().str.split(','):
                edible_categories.update(
                    c.strip().lower() 
                    for c in cat_list 
                    if isinstance(c, str)
                )
            
            processed_rows += len(chunk)
            if processed_rows % update_interval == 0:
                print(f"  Processed {processed_rows:,} rows | {len(edible_categories):,} categories found")
        
        # Final filtering and results
        edible_categories -= non_food
        print(f"\nCompleted! Processed {processed_rows:,} total rows")
        print(f"Found {len(edible_categories):,} unique edible categories")
        
        return sorted(edible_categories)
    
    except Exception as e:
        print(f"\nError: {str(e)}")
        return None

# Usage
categories = extract_edible_categories(
    "../data/openfoodfacts/en.openfoodfacts.org.products.csv",
    chunksize=100000
)

if categories:
    # Save results
    with open("off_edible_labels.py", "w", encoding='utf-8') as f:
        f.write("# Auto-generated from Open Food Facts\n")
        f.write(f"EDIBLE_CATEGORIES = {categories}\n")
    print("Saved to off_edible_labels.py")

Processing 3,890,800 total rows...
  Processed 100,000 rows | 6,916 categories found
  Processed 200,000 rows | 9,640 categories found
  Processed 300,000 rows | 12,567 categories found
  Processed 400,000 rows | 14,991 categories found
  Processed 500,000 rows | 17,173 categories found
  Processed 600,000 rows | 19,550 categories found
  Processed 700,000 rows | 22,388 categories found
  Processed 800,000 rows | 24,184 categories found
  Processed 900,000 rows | 25,904 categories found
  Processed 1,000,000 rows | 26,966 categories found
  Processed 1,100,000 rows | 30,936 categories found
  Processed 1,200,000 rows | 38,566 categories found
  Processed 1,300,000 rows | 40,400 categories found
  Processed 1,400,000 rows | 42,586 categories found
  Processed 1,500,000 rows | 47,725 categories found
  Processed 1,600,000 rows | 52,433 categories found
  Processed 1,700,000 rows | 55,185 categories found
  Processed 1,800,000 rows | 58,464 categories found
  Processed 1,900,000 rows | 60
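
The core of the cell above is the split-and-normalize step applied to each `categories` value. As a minimal sketch of that step in isolation, the snippet below runs the same logic on a few in-memory strings. The helper name `unique_categories` and the sample values are made up for illustration, and unlike the original loop it also skips empty fragments.

```python
def unique_categories(values):
    """Split comma-separated category strings into a normalized set."""
    found = set()
    for value in values:
        if not isinstance(value, str):
            continue  # skip missing values such as None or NaN
        # Same normalization as the notebook: trim whitespace, lowercase.
        # Assumption: empty fragments (e.g. from "a,,b") are dropped.
        found.update(
            part.strip().lower()
            for part in value.split(",")
            if part.strip()
        )
    return found


# Hypothetical sample rows, not taken from the real database
sample = ["Snacks, Sweet snacks", None, "snacks,Beverages ", ""]
print(sorted(unique_categories(sample)))
# ['beverages', 'snacks', 'sweet snacks']
```

Because the result is a set, repeated categories across rows collapse to one entry, which is why the running total in the output grows more slowly than the row count.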

### 2.2 Generate a List of Items Using the Open Food Facts API
The previous cell generated an *enormous* list: 119,787 categories of edible food items, which you can view in the file off_edible_labels.py. Now, let's see if we can do the same thing using the Open Food Facts API. This method could be better, since it avoids downloading a 10 GB dataset. It could also be worse, due to latency when contacting the server.

In [13]:

def get_off_categories():
    """Fetch only food-related categories from Open Food Facts API"""
    categories = set()

    print("Fetching categories from Open Food Facts API...")

    try:
        url = "https://world.openfoodfacts.org/categories.json"
        headers = {
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/114.0.0.0 Safari/537.36"
            )
        }
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        if response.headers.get("Content-Type", "").startswith("text/html"):
            print("Blocked by server. Response looks like HTML.")
            print(response.text[:300])
            return []

        data = response.json()
        all_tags = data.get('tags', [])
        print(f"Retrieved {len(all_tags)} raw categories")

        # Manual filtering
        non_food_keywords = {
            'cosmetic', 'pet', 'cleaning', 'baby food', 'tobacco',
            'hygiene', 'detergent', 'soap', 'shampoo', 'diaper',
            'beauty', 'makeup', 'vitamin', 'supplement', 'disinfectant',
            'sanitizer', 'candle', 'perfume', 'appliance', 'accessory'
        }

        for tag in all_tags:
            name = tag['name'].lower().strip()
            # If none of the bad words are in the name, keep it
            if not any(bad_word in name for bad_word in non_food_keywords):
                categories.add(name)

        print(f"Filtered down to {len(categories)} likely food-related categories")

    except requests.exceptions.RequestException as e:
        print(f"Error fetching categories: {str(e)}")

    return sorted(categories)


def save_categories(categories, filename="off_edible_labels_api.py"):
    """Save categories to a Python file"""
    with open(filename, 'w', encoding='utf-8') as f:
        f.write("# Auto-generated from Open Food Facts API\n")
        f.write("# Last updated: " + pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S") + "\n\n")
        f.write("EDIBLE_CATEGORIES = [\n")
        
        # Write categories in alphabetical order, 8 per line
        for i in range(0, len(categories), 8):
            line_categories = categories[i:i+8]
            # repr() quotes each name safely, even if it contains an apostrophe
            f.write("    " + ", ".join(repr(cat) for cat in line_categories) + ",\n")
        
        f.write("]\n")
    print(f"Saved to {filename}")

# Execute
if __name__ == "__main__":
    try:
        categories = get_off_categories()
        
        # Optional: Filter non-food items
        non_food = {'pet food', 'cosmetics', 'tobacco'}
        categories = [cat for cat in categories if cat not in non_food]
        
        save_categories(categories)
    except Exception as e:
        print(f"Failed: {str(e)}")

Fetching categories from Open Food Facts API...
Retrieved 100 raw categories
Filtered down to 98 likely food-related categories
Saved to off_edible_labels_api.py
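
One caveat about the keyword filter in the cell above: it uses substring matching, so a keyword such as `'pet'` also matches inside unrelated words (for example `crumpets`). A possible alternative, sketched here under my own assumptions, is to match whole words with a regular expression. The names `build_keyword_pattern` and `is_probably_food` and the shortened keyword set are hypothetical, and the optional trailing `s` is only a crude way to catch simple plurals.

```python
import re

# Shortened, illustrative keyword set (the notebook's set is longer)
NON_FOOD_KEYWORDS = {"cosmetic", "pet", "soap", "baby food"}


def build_keyword_pattern(keywords):
    """Compile one regex that matches any keyword as a whole word."""
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    # \b anchors the match at word boundaries; s? tolerates a simple plural
    return re.compile(r"\b(?:" + "|".join(escaped) + r")s?\b")


def is_probably_food(name, pattern):
    """Return True when no non-food keyword appears as a whole word."""
    return pattern.search(name.lower()) is None


pattern = build_keyword_pattern(NON_FOOD_KEYWORDS)
print(is_probably_food("crumpets", pattern))   # True: 'pet' is only a substring
print(is_probably_food("pet food", pattern))   # False: 'pet' is a whole word
print(is_probably_food("Cosmetics", pattern))  # False: plural of 'cosmetic'
```

Irregular plurals (such as `accessories` for `accessory`) would still slip through, so this is a trade-off rather than a complete fix.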


The results here are worth noting: a single request returns only 100 categories, because the endpoint is paginated. As a result, the list the program produces is nowhere near big enough. We can try to get past this limit by cycling through the paginated API pages. Let's see if that works below.

In [14]:

def get_off_categories():
    """Fetch only food-related categories from Open Food Facts API (all pages)"""
    categories = set()
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/114.0.0.0 Safari/537.36"
        )
    }
    page = 1
    print("Fetching categories from Open Food Facts API...")

    while True:
        try:
            url = f"https://world.openfoodfacts.org/categories/{page}.json"
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()

            data = response.json()
            tags = data.get("tags", [])
            if not tags:
                break  # No more categories

            print(f"Page {page} | Retrieved {len(tags)} tags")

            non_food_keywords = {
                'cosmetic', 'pet', 'cleaning', 'baby food', 'tobacco',
                'hygiene', 'detergent', 'soap', 'shampoo', 'diaper',
                'beauty', 'makeup', 'vitamin', 'supplement', 'disinfectant',
                'sanitizer', 'candle', 'perfume', 'appliance', 'accessory'
            }

            for tag in tags:
                name = tag['name'].lower().strip()
                if not any(bad_word in name for bad_word in non_food_keywords):
                    categories.add(name)

            page += 1
            sleep(0.5)  # Rate limit

        except requests.exceptions.RequestException as e:
            print(f"Error on page {page}: {str(e)}")
            break

    print(f"Total food-related categories: {len(categories)}")
    return sorted(categories)



def save_categories(categories, filename="off_edible_labels_api_paginated.py"):
    """Save categories to a Python file"""
    with open(filename, 'w', encoding='utf-8') as f:
        f.write("# Auto-generated from Open Food Facts API\n")
        f.write("# Last updated: " + pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S") + "\n\n")
        f.write("EDIBLE_CATEGORIES = [\n")
        
        # Write categories in alphabetical order, 8 per line
        for i in range(0, len(categories), 8):
            line_categories = categories[i:i+8]
            # repr() quotes each name safely, even if it contains an apostrophe
            f.write("    " + ", ".join(repr(cat) for cat in line_categories) + ",\n")
        
        f.write("]\n")
    print(f"Saved to {filename}")

# Execute
if __name__ == "__main__":
    try:
        categories = get_off_categories()
        
        # Optional: Filter non-food items
        non_food = {'pet food', 'cosmetics', 'tobacco'}
        categories = [cat for cat in categories if cat not in non_food]
        
        save_categories(categories)
    except Exception as e:
        print(f"Failed: {str(e)}")

Fetching categories from Open Food Facts API...
Page 1 | Retrieved 100 tags
Page 2 | Retrieved 100 tags
Error on page 3: 429 Client Error: Too Many Requests for url: https://world.openfoodfacts.org/facets/categories/3.json
Total food-related categories: 195
Saved to off_edible_labels_api_paginated.py
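
For reference, a common way to react to an HTTP 429 response is to retry with exponential backoff instead of stopping at the first failure. This was not tried in this notebook, and there is no guarantee it would stay within the server's limits. The sketch below is a generic pattern, not Open Food Facts guidance: the function name `get_json_with_backoff` and its parameters are hypothetical, and whether the server sends a `Retry-After` header is an assumption.

```python
import time


def get_json_with_backoff(fetch, url, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) and retry on HTTP 429 with exponentially growing waits.

    fetch must return a requests-style response object exposing
    .status_code, .headers, .raise_for_status() and .json().
    """
    for attempt in range(max_retries + 1):
        response = fetch(url)
        if response.status_code != 429:
            response.raise_for_status()  # surface other HTTP errors
            return response.json()
        if attempt == max_retries:
            break
        # Assumption: honor Retry-After when it is a plain number of seconds,
        # otherwise wait base_delay, 2*base_delay, 4*base_delay, ...
        retry_after = response.headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
        sleep(delay)
    raise RuntimeError(f"Still rate-limited after {max_retries} retries: {url}")


# Possible use with requests (not executed here):
# data = get_json_with_backoff(
#     lambda u: requests.get(u, headers=headers, timeout=10), url
# )
```

Passing `fetch` and `sleep` in as arguments keeps the retry logic testable without touching the network.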


As expected, this approach gets throttled as well: the third request fails with HTTP 429 (Too Many Requests), even with a 0.5-second pause between pages. It looks like Open Food Facts does not want clients walking through the whole category list via the API instead of downloading the database. For this project, then, the API approach as written is not workable. With that settled, let's move on to the final labeling method: a list of food items curated by an LLM.

### 2.3 Generating a Label Set Using an LLM

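Finally, I took the DeepSeek LLM up on its offer to generate a custom set of labels for me. The cell below shows the result. Note that the item counts in its comments are the LLM's own figures and do not match the abbreviated list shown here, and that the list contains repeated entries, which the set literal silently de-duplicates.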

In [None]:
"""
Curated list of 5,216 edible items (foods, ingredients, and dishes)
Sources: Open Food Facts (OFF), Food-101, and USDA
Last updated: June 2024
"""

EDIBLE_LABELS = {
    
    # Fruits (342 items)
    'apple', 'banana', 'orange', 'strawberry', 'blueberry', 'mango', 'pineapple',
    'watermelon', 'kiwi', 'pear', 'peach', 'plum', 'grape', 'cherry', 'raspberry',
    'blackberry', 'cantaloupe', 'honeydew', 'apricot', 'nectarine', 'pomegranate',
    'fig', 'persimmon', 'guava', 'passion fruit', 'dragon fruit', 'lychee', 'star fruit',
    'kiwano', 'soursop', 'breadfruit', 'jackfruit', 'durian', 'rambutan', 'longan',
    'ackee', 'sapodilla', 'mangosteen', 'carambola', 'custard apple', 'tamarind',
    'date', 'prune', 'raisin', 'currant', 'goji berry', 'elderberry', 'boysenberry',
    'loganberry', 'mulberry', 'cranberry', 'lingonberry', 'cloudberry', 'salmonberry',
    'huckleberry', 'kiwi berry', 'marionberry', 'olallieberry', 'tayberry', 'youngberry',
    'jostaberry', 'wineberry', 'thimbleberry', 'serviceberry', 'buffaloberry', 'chokeberry',
    'barberry', 'gooseberry', 'sea buckthorn', 'kiwano melon', 'canary melon', 'crenshaw melon',
    'galia melon', 'horned melon', 'persian melon', 'santa claus melon', 'sharlyn melon',
    'casaba melon', 'charentais melon', 'gac melon', 'honey globe melon', 'jade dew melon',
    'korean melon', 'moon and stars melon', 'muskmelon', 'oriental melon', 'piel de sapo melon',
    'sugar melon', 'tigger melon', 'touchon melon', 'yubari melon', 'black sapote', 'mamey sapote',
    'white sapote', 'canistel', 'abiu', 'cempedak', 'chempedak', 'cupuacu', 'durian', 'guanabana',
    'ilama', 'jackfruit', 'keppel fruit', 'langsat', 'longan', 'loquat', 'lucuma', 'mangosteen',
    'marang', 'paniala', 'peanut butter fruit', 'pulasan', 'rambutan', 'salak', 'santol',
    'soursop', 'sugar apple', 'tamarillo', 'ugli fruit', 'wampee', 'yangmei', 'yellow mombin',
    'zapote', 'black mulberry', 'white mulberry', 'red mulberry', 'texas mulberry', 'himalayan mulberry',
    'pakistani mulberry', 'persian mulberry', 'russian mulberry', 'weeping mulberry', 'white shatoot',
    'black currant', 'red currant', 'white currant', 'pink currant', 'buffalo currant', 'clove currant',
    'golden currant', 'missouri currant', 'american black elderberry', 'blue elderberry', 'european elderberry',
    'red elderberry', 'dwarf elderberry', 'black chokeberry', 'purple chokeberry', 'red chokeberry',
    'american mayapple', 'himalayan mayapple', 'american persimmon', 'black sapote', 'date plum',
    'japanese persimmon', 'mabolo', 'texas persimmon', 'velvet apple', 'white sapote', 'yellow sapote',
    'african cucumber', 'bitter melon', 'cucamelon', 'ivy gourd', 'kiwano', 'snake gourd', 'tinda',
    'apple berry', 'blue sausage fruit', 'che', 'chilean guava', 'finger lime', 'fuchsia berry',
    'jabuticaba', 'miracle fruit', 'muntingia', 'naranjilla', 'pitomba', 'rose apple', 'rumberry',
    'safou', 'sapodilla', 'sweet granadilla', 'tamarind', 'wax jambu', 'white jaboticaba', 'yellow mombin',
    'abiu', 'acerola', 'akee', 'amla', 'aronia', 'bacuri', 'biriba', 'black apple', 'burdekin plum',
    'caimito', 'camu camu', 'cape gooseberry', 'carissa', 'cempedak', 'cocona', 'dabai', 'damson',
    'elephant apple', 'emblic', 'grumichama', 'imbe', 'june plum', 'kabosu', 'kakadu plum', 'karonda',
    'kepel apple', 'korlan', 'kumquat', 'lanzones', 'lemon aspen', 'lucuma', 'madrono', 'malay apple',
    'mamoncillo', 'mangaba', 'marula', 'maqui berry', 'midgen berry', 'mombin', 'monstera deliciosa',
    'mora de castilla', 'morinda', 'mountain soursop', 'mundu', 'muscadine', 'nance', 'noni', 'oil palm fruit',
    'pequi', 'pili nut', 'poha berry', 'pomelo', 'pulasan', 'quince', 'riberry', 'rollinia', 'sageretia',
    'santol', 'sapote', 'saskatoon berry', 'soncoya', 'sugar palm fruit', 'surinam cherry', 'tamarind',
    'ugni', 'velvet tamarind', 'wampee', 'white sapote', 'yangmei', 'yellow mombin', 'zapote', 'ziziphus',
    'ackee', 'african cucumber', 'amazon grape', 'arctic bramble', 'ataulfo mango', 'babaco', 'bacupari',
    'bali citrus', 'batuan', 'bignay', 'bilimbi', 'binjai', 'black apple', 'black raspberry', 'blood lime',
    'blue marble fruit', 'bolivian mountain coconut', 'bottle gourd', 'brazilian grape', 'burmese grape',
    'calabash', 'calamansi', 'calamondin', 'cambuca', 'camu camu', 'candlenut', 'cape gooseberry',
    'carambola', 'cashew apple', 'cattley guava', 'ceriman', 'charichuelo', 'cherimoya', 'chico',
    'chinese mulberry', 'cocoplum', 'coffee cherry', 'corossol', 'cranberry hibiscus', 'cupuacu',
    'custard apple', 'dabai', 'date', 'desert lime', 'djenkol', 'doubah', 'duku', 'elephant apple',
    'emerald apple', 'feijoa', 'fibrous satinash', 'finger lime', 'fuji apple', 'gac', 'galia melon',
    'garcinia', 'genip', 'golden apple', 'gooseberry', 'goraka', 'green apple', 'grumichama', 'guavaberry',
    'hala fruit', 'hog plum', 'honeyberry', 'horned melon', 'ice cream bean', 'illawarra plum', 'imbe',
    'indian almond', 'indian gooseberry', 'indian jujube', 'indian prune', 'jaboticaba', 'jambolan',
    'japanese raisin', 'jocote', 'junglesop', 'kabosu', 'kakadu plum', 'kaki', 'kapundung', 'karonda',
    'kasturi', 'kawista', 'kepel', 'ketembilla', 'key lime', 'kitembilla', 'kokum', 'kumquat', 'kundong',
    'kwai muk', 'lakoocha', 'lanzones', 'lemonade fruit', 'lilly pilly', 'longan', 'loquat', 'lucuma',
    'macadamia nut', 'madrono', 'malay apple', 'mamey', 'mamoncillo', 'mandarin', 'mangosteen', 'marang',
    'marula', 'maypop', 'medlar', 'melinjo', 'mexican lime', 'midyim', 'monkey orange', 'monstera deliciosa',
    'morinda', 'mountain papaya', 'muntingia', 'muscadine', 'nagami', 'nance', 'naranjilla', 'natal plum',
    'neem', 'noni', 'oil palm fruit', 'oregon grape', 'palmyra', 'pawpaw', 'peach palm', 'pear', 'pequi',
    'persimmon', 'pigeon plum', 'pili nut', 'pineapple guava', 'pitomba', 'poha', 'pomelo', 'pulasan',
    'quandong', 'quince', 'rambai', 'rangpur', 'red banana', 'riberry', 'rose apple', 'rowal', 'safou',
    'salak', 'santol', 'sapodilla', 'saskatoon', 'sea buckthorn', 'soncoya', 'soursop', 'spanish lime',
    'star apple', 'sugar apple', 'surinam cherry', 'sweet granadilla', 'sweetie', 'tamarillo', 'tangelo',
    'ugni', 'velvet apple', 'wampee', 'water apple', 'white sapote', 'wild orange', 'yangmei', 'yuzu',
    'zapote', 'ziziphus',

    # Vegetables (418 items)
    'artichoke', 'arugula', 'asparagus', 'aubergine', 'bean sprouts', 'beet greens', 'beetroot',
    'bell pepper', 'bitter melon', 'black radish', 'bok choy', 'broccoflower', 'broccoli', 'brussels sprouts',
    'butternut squash', 'cabbage', 'calabrese', 'carrot', 'cauliflower', 'celeriac', 'celery', 'chard',
    'chayote', 'chicory', 'chinese cabbage', 'collard greens', 'courgette', 'cucumber', 'daikon',
    'delicata squash', 'dulse', 'edamame', 'endive', 'fennel', 'fiddleheads', 'frisee', 'garlic',
    'gem squash', 'ginger', 'green bean', 'hijiki', 'jalapeno', 'jerusalem artichoke', 'kale',
    'kohlrabi', 'komatsuna', 'kombu', 'leek', 'lettuce', 'lotus root', 'mache', 'mizuna', 'morel',
    'mustard greens', 'napa cabbage', 'nori', 'okra', 'onion', 'oyster mushroom', 'pattypan squash',
    'pepper', 'potato', 'pumpkin', 'purple sprouting broccoli', 'radicchio', 'radish', 'rhubarb',
    'romanesco', 'rutabaga', 'salsify', 'scallion', 'shallot', 'shiitake mushroom', 'spaghetti squash',
    'spinach', 'sprouts', 'squash', 'sweet potato', 'swiss chard', 'taro', 'tomatillo', 'tomato',
    'turnip', 'wakame', 'water chestnut', 'watercress', 'yam', 'zucchini', 'acorn squash', 'alfalfa sprouts',
    'amaranth leaves', 'arrowroot', 'bamboo shoots', 'banana squash', 'basella', 'batata', 'belgian endive',
    'bitter gourd', 'black salsify', 'broad beans', 'burdock root', 'butter lettuce', 'calabash',
    'cardoon', 'cassava', 'catsear', 'celery cabbage', 'celery root', 'chaya', 'chickweed', 'chinese broccoli',
    'chinese celery', 'chinese chives', 'chinese kale', 'chinese mustard', 'chinese okra', 'chinese spinach',
    'chinese water chestnut', 'chinese yam', 'chives', 'christophine', 'celtuce', 'chervil', 'chickpea',
    'chicory root', 'chinese artichoke', 'chinese cabbage', 'chinese long bean', 'chinese parsley',
    'chinese radish', 'chinese squash', 'chinese turnip', 'chinese wolfberry', 'chrysanthemum leaves',
    'collards', 'corn salad', 'cowpea', 'cress', 'cucamelon', 'cucumber', 'culantro', 'daikon radish',
    'dandelion greens', 'dasheen', 'dill', 'dinosaur kale', 'drumstick leaves', 'earthnut pea', 'eddo',
    'eggplant', 'elephant foot yam', 'endive', 'english spinach', 'escarole', 'fava bean', 'fennel',
    'fiddlehead fern', 'field pea', 'florence fennel', 'french bean', 'gai lan', 'garden cress',
    'garland chrysanthemum', 'garlic chives', 'gem squash', 'gherkin', 'gobo', 'golden beet', 'golden zucchini',
    'good king henry', 'gourd', 'grape leaves', 'green onion', 'green papaya', 'green pepper', 'groundnut',
    'hijiki', 'hokkaido pumpkin', 'horenso', 'horseradish', 'hubbard squash', 'iceberg lettuce', 'indian spinach',
    'italian parsley', 'japanese eggplant', 'japanese pumpkin', 'japanese radish', 'japanese sweet potato',
    'jerusalem artichoke', 'jicama', 'julienne carrots', 'kabocha squash', 'kai-lan', 'kale', 'karela',
    'kohlrabi', 'komatsuna', 'kombu', 'kuka', 'kumara', 'lacinato kale', 'lady finger', 'lamb\'s lettuce',
    'land cress', 'leek', 'lemongrass', 'lentil', 'lettuce', 'lima bean', 'lollo rosso', 'lotus root',
    'luffa', 'malabar spinach', 'mallow', 'manchurian wild rice', 'mangel-wurzel', 'mangetout', 'marrow',
    'mashua', 'melon', 'mibuna', 'michihili cabbage', 'microgreens', 'mitsuba', 'mizuna', 'moqua',
    'morel mushroom', 'morning glory', 'moth bean', 'mountain yam', 'mung bean', 'mushroom', 'mustard greens',
    'napa cabbage', 'navy bean', 'neep', 'new zealand spinach', 'nopal', 'norwegian kelp', 'oaxacan green dent',
    'oca', 'okra', 'onion', 'opal basil', 'orach', 'oregano', 'oyster mushroom', 'oyster plant', 'pak choi',
    'palm heart', 'pansy', 'parsley', 'parsnip', 'pattypan squash', 'pea', 'peanut', 'pearl onion',
    'pepino', 'pepper', 'pepperoncini', 'perilla', 'pickling cucumber', 'pigeon pea', 'pinto bean',
    'plantain', 'pokeberry', 'polish mushroom', 'pomegranate', 'poppy seed', 'portobello mushroom',
    'potato', 'prairie turnip', 'prussian asparagus', 'pumpkin', 'purple asparagus', 'purple broccoli',
    'purple cabbage', 'purple carrot', 'purple cauliflower', 'purple kale', 'purple potato', 'purple sprouting broccoli',
    'purslane', 'radicchio', 'radish', 'radish sprouts', 'rainbow chard', 'rapini', 'red cabbage',
    'red carrot', 'red chili pepper', 'red kale', 'red lettuce', 'red onion', 'red pepper', 'red potato',
    'red radish', 'red spinach', 'rice bean', 'romaine lettuce', 'roman broccoli', 'romanesco',
    'root parsley', 'roquette', 'rosemary', 'runner bean', 'rutabaga', 'salsify', 'samphire', 'scallion',
    'scorzonera', 'sea beet', 'sea kale', 'serrano pepper', 'shallot', 'shiitake mushroom', 'silverbeet',
    'skirret', 'snap pea', 'snow pea', 'soybean', 'spaghetti squash', 'spinach', 'split pea', 'spring greens',
    'spring onion', 'sprouts', 'squash', 'squash blossoms', 'striped beet', 'sugar snap pea', 'sunchoke',
    'sunflower sprouts', 'swede', 'sweet corn', 'sweet pepper', 'sweet potato', 'swiss chard', 'tamarillo',
    'tamarind', 'taro', 'tatsoi', 'thai basil', 'thai eggplant', 'tinda', 'tomatillo', 'tomato',
    'topinambur', 'tree onion', 'tronchuda cabbage', 'turnip', 'turnip greens', 'upland cress', 'water chestnut',
    'water spinach', 'watercress', 'wax bean', 'white asparagus', 'white beet', 'white cabbage', 'white carrot',
    'white onion', 'white radish', 'wild leek', 'winged bean', 'winter melon', 'winter squash', 'yam',
    'yardlong bean', 'yellow beet', 'yellow carrot', 'yellow onion', 'yellow pepper', 'yellow squash',
    'yellow tomato', 'yellow wax bean', 'yuca', 'zucchini',

    # Grains & Breads (287 items)
    'amaranth', 'barley', 'basmati rice', 'black rice', 'bread flour', 'brown rice', 'buckwheat',
    'bulgur', 'cake flour', 'cornmeal', 'couscous', 'durum wheat', 'farro', 'freekeh', 'glutinous rice',
    'gram flour', 'jasmine rice', 'kamut', 'millet', 'oat flour', 'oats', 'pastry flour', 'polenta',
    'popcorn', 'quinoa', 'red rice', 'rice flour', 'rye flour', 'semolina', 'sorghum', 'spelt',
    'tapioca flour', 'teff', 'triticale', 'wheat berries', 'white rice', 'wild rice', 'whole wheat flour',
    'bagel', 'baguette', 'bannock', 'barmbrack', 'bazlama', 'bhatura', 'bialy', 'biscuit', 'blaa',
    'bolo do caco', 'boule', 'breadstick', 'brioche', 'broa', 'bun', 'chapati', 'ciabatta', 'cornbread',
    'cottage loaf', 'cracker', 'croissant', 'crumpet', 'damper', 'dampfnudel', 'dosa', 'english muffin',
    'farl', 'flatbread', 'flatkaka', 'focaccia', 'fougasse', 'frybread', 'gingerbread', 'hallulla',
    'hamburger bun', 'hoagie roll', 'injeera', 'kaiser roll', 'kalach', 'karavai', 'khachapuri',
    'kifli', 'knackebrod', 'koulouri', 'kouign-amann', 'kulcha', 'laobing', 'lavash', 'lefse',
    'limpa', 'luchi', 'malooga', 'manakish', 'mantou', 'marraqueta', 'matzo', 'miche', 'mohnflesserl',
    'muffin', 'naan', 'obwarzanek', 'pandesal', 'panettone', 'paratha', 'paska', 'paximadi', 'pita',
    'pizza crust', 'potato bread', 'pretzel', 'pumpernickel', 'puri', 'roti', 'rye bread', 'saj bread',
    'scone', 'soda bread', 'sourdough', 'stollen', 'tandoori roti', 'tortilla', 'vada pav', 'vanocka',
    'waffle', 'whole wheat bread', 'zopf', 'anadama bread', 'apple fritter', 'bacon bread', 'banana bread',
    'beer bread', 'biscotti', 'black bread', 'boston brown bread', 'boule', 'bread pudding', 'brioche',
    'brown bread', 'butterhorn', 'carrot bread', 'challah', 'cheese bread', 'chocolate bread', 'ciabatta',
    'cinnamon bread', 'cornbread', 'cranberry bread', 'croissant', 'crumpet', 'cuban bread', 'dampfnudel',
    'date bread', 'dinner roll', 'dutch crunch', 'english muffin', 'focaccia', 'french bread', 'garlic bread',
    'gingerbread', 'gluten-free bread', 'honey bread', 'irish soda bread', 'italian bread', 'jalapeno bread',
    'jewish rye bread', 'kaiser roll', 'limpa bread', 'marble rye', 'milk bread', 'monkey bread', 'nut bread',
    'olive bread', 'onion bread', 'pane di casa', 'pane ticinese', 'paneer kulcha', 'paska', 'pita bread',
    'potato bread', 'pumpernickel bread', 'pumpkin bread', 'raisin bread', 'rye bread', 'saffron bread',
    'sourdough bread', 'squaw bread', 'stollen', 'sweet bread', 'tea bread', 'toast', 'tortilla', 'vienna bread',
    'wheat bread', 'white bread', 'whole wheat bread', 'zucchini bread', 'zwieback',
    
    # Dairy & Alternatives (198 items)
    'acidophilus milk', 'aged cheese', 'almond milk', 'american cheese', 'artisan cheese', 'asiago cheese',
    'australian cheese', 'bavarian cream', 'blue cheese', 'bocconcini', 'brie cheese', 'buffalo milk',
    'butter', 'buttermilk', 'camembert cheese', 'casein', 'cheddar cheese', 'cheese curd', 'cheese spread',
    'clotted cream', 'colby cheese', 'condensed milk', 'cottage cheese', 'cream', 'cream cheese', 'creme fraiche',
    'cultured buttermilk', 'danish cheese', 'devonshire cream', 'dulce de leche', 'edam cheese', 'evaporated milk',
    'farmer cheese', 'feta cheese', 'fontina cheese', 'fresh cheese', 'fromage blanc', 'goat cheese',
    'goat milk', 'gorgonzola cheese', 'gouda cheese', 'gruyere cheese', 'half-and-half', 'havarti cheese',
    'heavy cream', 'homogenized milk', 'kefir', 'kefir cheese', 'lactose-free milk', 'light cream', 'limburger cheese',
    'manchego cheese', 'mascarpone cheese', 'milk', 'milk powder', 'monterey jack cheese', 'mozzarella cheese',
    'muenster cheese', 'neufchatel cheese', 'parmesan cheese', 'pasteurized milk', 'pepper jack cheese', 'port salut cheese',
    'provolone cheese', 'quark', 'queso blanco', 'queso fresco', 'raw milk', 'ricotta cheese', 'romano cheese',
    'roquefort cheese', 'ryazhenka', 'semi-skimmed milk', 'sheep milk', 'skim milk', 'sour cream', 'soy cheese',
    'soy milk', 'stilton cheese', 'string cheese', 'swiss cheese', 'triple cream cheese', 'ultra-pasteurized milk',
    'unpasteurized milk', 'velveeta cheese', 'whipping cream', 'whole milk', 'yogurt', 'yogurt cheese', 'ziger',
    'cashew cheese', 'coconut milk yogurt', 'coconut cream', 'coconut kefir', 'almond yogurt', 'soy yogurt',
    'rice milk', 'hemp milk', 'oat milk', 'flax milk', 'macadamia milk', 'hazelnut milk', 'pea milk',
    'quinoa milk', 'spelt milk', 'tiger nut milk', 'walnut milk', 'camembert', 'emmental', 'jarlsberg',
    'leicester cheese', 'lancashire cheese', 'double gloucester', 'cheshire cheese', 'wensleydale',
    'caerphilly cheese', 'red leicester', 'shropshire blue', 'dorset blue vinney', 'berkswell cheese',
    'stinking bishop', 'yarg cheese', 'devon blue', 'harbourne blue', 'ticklemore cheese', 'water buffalo mozzarella',
    'burrata', 'stracciatella', 'scamorza', 'caciocavallo', 'provola', 'pasta filata', 'pecorino romano',
    'pecorino sardo', 'pecorino toscano', 'pecorino siciliano', 'grana padano', 'parmigiano-reggiano',
    'piave cheese', 'asiago', 'montasio', 'raschera', 'bra', 'castelmagno', 'toma', 'fontina val d\'aosta',
    'taleggio', 'gorgonzola dolce', 'gorgonzola piccante', 'robiola', 'crescenza', 'stracchino', 'bel paese',
    'casera', 'bitto', 'valtellina casera', 'quartirolo lombardo', 'formaggio di fossa', 'pecorino',
    'canestrato', 'caciotta', 'ricotta salata', 'caprino', 'fior di latte', 'mozzarella di bufala',
    'burrata', 'scamorza affumicata', 'provolone del monaco', 'caciocavallo silano', 'pallone di gravina',
    'vastedda', 'primo sale', 'tuma', 'tuma persa', 'formaggetta', 'casizolu', 'fiore sardo', 'canestrato',
    'pecorino di filiano', 'pecorino crotonese', 'pecorino delle balze volterrane', 'pecorino di picinisco',
    'pecorino siciliano', 'pecorino di farindola', 'pecorino di atri', 'pecorino dei monti sibillini',
    'pecorino di carmasciano', 'pecorino di fossa', 'pecorino romano', 'pecorino sardo', 'pecorino toscano',
    'pecorino siciliano', 'grana padano', 'parmigiano-reggiano', 'piave cheese', 'asiago', 'montasio',
    'raschera', 'bra', 'castelmagno', 'toma', 'fontina val d\'aosta', 'taleggio', 'gorgonzola dolce',
    'gorgonzola piccante', 'robiola', 'crescenza', 'stracchino', 'bel paese', 'casera', 'bitto',
    'valtellina casera', 'quartirolo lombardo', 'formaggio di fossa', 'pecorino', 'canestrato', 'caciotta',
    'ricotta salata', 'caprino', 'fior di latte', 'mozzarella di bufala', 'burrata', 'scamorza affumicata',
    'provolone del monaco', 'caciocavallo silano', 'pallone di gravina', 'vastedda', 'primo sale', 'tuma',
    'tuma persa', 'formaggetta', 'casizolu', 'fiore sardo', 'canestrato', 'pecorino di filiano',
    'pecorino crotonese', 'pecorino delle balze volterrane', 'pecorino di picinisco', 'pecorino siciliano',
    'pecorino di farindola', 'pecorino di atri', 'pecorino dei monti sibillini', 'pecorino di carmasciano',
    'pecorino di fossa', 'pecorino romano', 'pecorino sardo', 'pecorino toscano', 'pecorino siciliano',
    
    # Meat & Seafood (512 items)
    'beef', 'chicken', 'pork', 'lamb', 'duck',
    'turkey', 'venison', 'bison', 'goat', 'rabbit',
    'salmon', 'tuna', 'cod', 'halibut', 'trout',
    'mackerel', 'sardine', 'anchovy', 'herring', 'swordfish',
    
    # Processed Foods (1,842 items)
    'chocolate chip cookie', 'brownie', 'cupcake', 'donut', 'muffin',
    'croissant', 'danish', 'scone', 'biscotti', 'macaron',
    'cheesecake', 'tiramisu', 'creme brulee', 'flan', 'pudding',
    'apple pie', 'pumpkin pie', 'pecan pie', 'key lime pie', 'cherry pie',
    
    # International Dishes (1,617 items)
    'sushi', 'ramen', 'pad thai', 'pho', 'bibimbap',
    'dim sum', 'spring roll', 'dumpling', 'gyoza', 'bao',
    'taco', 'burrito', 'enchilada', 'quesadilla', 'tamale',
    'paella', 'risotto', 'lasagna', 'ravioli', 'gnocchi',
    
    # Full list continues...
}

# Aliases for fuzzy matching
SYNONYMS = {
    'yogurt': {'yoghurt', 'yoghourt'},
    'cookie': {'biscuit'},
    'eggplant': {'aubergine'},
    'zucchini': {'courgette'},
    'cilantro': {'coriander'},
}

def is_edible(label, threshold=85):
    """
    Check if a label matches any edible item (with fuzzy matching).
    Args:
        label (str): Input food label (e.g., "Granny Smith Apple")
        threshold (int): Fuzzy match threshold (0-100)
    Returns:
        bool: True if edible
    """
    from fuzzywuzzy import fuzz  # pip install fuzzywuzzy
    
    label = label.lower().strip()
    
    # Direct match
    if label in EDIBLE_LABELS:
        return True
        
    # Synonym match
    for food, variants in SYNONYMS.items():
        if label in variants:
            return True
    
    # Fuzzy match
    for food in EDIBLE_LABELS:
        if fuzz.ratio(label, food) > threshold:
            return True
            
    return False
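
To get a feel for how the `threshold` in `is_edible` behaves, here is a small sketch that scores string pairs on the same 0-100 scale. It uses the standard library's `difflib.SequenceMatcher` as a stand-in, so the scores may differ slightly from `fuzz.ratio`; the helper name `similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two strings (case-insensitive)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


pairs = [("yoghurt", "yogurt"), ("granny smith apple", "apple")]
for label, food in pairs:
    score = similarity(label, food)
    verdict = "match" if score > 85 else "no match"
    print(f"{label!r} vs {food!r}: {score} ({verdict})")
```

A whole-string ratio rewards near-identical spellings but penalizes extra words, so a multi-word label such as "Granny Smith Apple" is unlikely to clear a threshold of 85 against `'apple'`. If such labels should count as edible, a partial or token-based scorer (for example `fuzz.partial_ratio` or `fuzz.token_set_ratio` in fuzzywuzzy) may be a better fit.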

In [16]:
def load_model():
    # Load a pretrained deep learning model suitable for mobile applications
    model = tf.keras.applications.MobileNetV2(weights='imagenet')
    return model

def preprocess_image(image_path):
    # Use Keras API functions to preprocess the image to be classified
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224), color_mode='rgb')
    img_array = tf.keras.utils.img_to_array(img)  # same module as load_img above
    img_array = tf.keras.applications.mobilenet_v2.preprocess_input(img_array)
    return np.expand_dims(img_array, axis=0)

def predict_edible(image_path, model):
    # Get set of raw predictions, then determine whether top result is edible
    img = preprocess_image(image_path)
    predictions = model.predict(img)
    decoded = tf.keras.applications.mobilenet_v2.decode_predictions(predictions)[0]
    print(decoded)
    food_classes = ['apple', 'pizza']
    top_prediction = decoded[0][1] # Get class name for top result
    print(top_prediction)
    return top_prediction in food_classes

In [17]:
model = load_model()
predict_edible('C:/Users/mailr/code/edible-classifier/test-images/pizza.png', model)

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1s/step
[('n07873807', 'pizza', np.float32(0.65991193)), ('n07747607', 'orange', np.float32(0.0632888)), ('n07768694', 'pomegranate', np.float32(0.04330279)), ('n03530642', 'honeycomb', np.float32(0.028241707)), ('n07745940', 'strawberry', np.float32(0.023253899))]
pizza


True
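
`predict_edible` above compares only the top prediction against a hard-coded two-item list. A minimal sketch of checking the decoded predictions against a label set such as `EDIBLE_LABELS` might look like the following. The helper names `normalize_label` and `any_edible` and the `min_confidence` cutoff are hypothetical and not part of the notebook. The underscore-to-space normalization assumes class names in the style returned by `decode_predictions` (for example `Granny_Smith`).

```python
def normalize_label(name):
    """Convert a class name like 'Granny_Smith' to 'granny smith'."""
    return name.replace("_", " ").lower().strip()


def any_edible(decoded, edible_labels, min_confidence=0.5):
    """Return True if any sufficiently confident prediction is in the label set.

    decoded is a list of (class_id, class_name, score) tuples, the shape
    produced by decode_predictions(...)[0].
    """
    for _class_id, name, score in decoded:
        if float(score) >= min_confidence and normalize_label(name) in edible_labels:
            return True
    return False


# Predictions copied from the notebook output above (scores rounded)
decoded = [
    ("n07873807", "pizza", 0.6599),
    ("n07747607", "orange", 0.0633),
    ("n07768694", "pomegranate", 0.0433),
]
print(any_edible(decoded, {"pizza", "apple"}))  # True
print(any_edible(decoded, {"orange"}))          # False: score below cutoff
```

The cutoff keeps low-confidence guesses such as `orange` from flipping the result; the value 0.5 is arbitrary and would need tuning on real images.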