# Keto/Vegan Diet classifier
Argmax, a consulting firm specializing in search and recommendation solutions with offices in New York and Israel, is hiring entry-level Data Scientists and Machine Learning Engineers.

At Argmax, we prioritize strong coding skills and a proactive, “get-things-done” attitude over a perfect resume. As part of our selection process, candidates are required to complete a coding task demonstrating their practical abilities.

In this task, you’ll work with a large recipe dataset sourced from Allrecipes.com. Your challenge will be to classify recipes based on their ingredients, accurately identifying keto (low-carb) and vegan (no animal products) dishes.

Successfully completing this assignment is a crucial step toward joining Argmax’s talented team.

In [1]:
from opensearchpy import OpenSearch
from decouple import config
import pandas as pd

client = OpenSearch(
    hosts=[config('OPENSEARCH_URL', 'http://localhost:9200')],
    http_auth=None,
    use_ssl=False,
    verify_certs=False,
    ssl_show_warn=False,
)

# Recipes Index
Our data is stored in OpenSearch, and you can query it using either Elasticsearch syntax or SQL.
## Elasticsearch Syntax

In [2]:
query = {
    "query": {
        "match": {
            "description": { "query": "egg" }
        }
    }
}

res = client.search(
    index="recipes",
    body=query,
    size=2
)

hits = res['hits']['hits']
hits

[{'_index': 'recipes',
  '_id': 'eXVQeZcBIOqAXBOBXz16',
  '_score': 3.9817066,
  '_source': {'title': 'Genuine Egg Noodles',
   'description': 'These egg noodles are the original egg noodles.  ',
   'instructions': ['Combine flour, salt and baking powder. Mix in eggs and enough water to make the dough workable. Knead dough until stiff. Roll into ball and cut into quarters. Using 1/4 of the dough at a time, roll flat to about 1/8 inch use flour as needed, top and bottom, to prevent sticking. Peel up and roll from one end to the other. Cut roll into 3/8 inch strips. Noodles should be about 4 to 5 inches long depending on how thin it was originally flattened. Let dry for 1 to 3 hours.',
    'Cook like any pasta or, instead of drying first cook it fresh but make sure water is boiling and do not allow to stick. It takes practice to do this right.'],
   'ingredients': ['2 cups Durum wheat flour',
    '1/2 teaspoon salt',
    '1/4 teaspoon baking powder',
    '3 eggs',
    'water as needed'],
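
Both classifiers only need the `ingredients` list inside each hit's `_source`. The sketch below pulls those lists out of a search response shaped like the one above. The helper name `ingredients_from_hits` is hypothetical. The sample response is a trimmed copy of the first hit, so the snippet runs without an OpenSearch server.

```python
def ingredients_from_hits(response):
    """Return (title, ingredients) pairs from an OpenSearch search response.

    Assumes each hit's `_source` carries `title` and `ingredients`, as in the
    recipes index output shown above; missing fields fall back to empty values.
    """
    return [
        (hit["_source"].get("title", ""), hit["_source"].get("ingredients", []))
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the response shape shown above (one hit only).
sample_response = {
    "hits": {"hits": [{
        "_index": "recipes",
        "_source": {
            "title": "Genuine Egg Noodles",
            "ingredients": ["2 cups Durum wheat flour", "1/2 teaspoon salt",
                            "1/4 teaspoon baking powder", "3 eggs", "water as needed"],
        },
    }]}
}

for title, ingredients in ingredients_from_hits(sample_response):
    print(title, len(ingredients))
```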

## SQL Syntax

In [3]:
query = """
SELECT *
FROM recipes
LIMIT 20
OFFSET 1200
"""

res = client.sql.query(body={'query': query})
df = pd.DataFrame(res["datarows"], columns=[c["name"] for c in res["schema"]])
df

| | description | ingredients | instructions | photo_url | title |
|---|---|---|---|---|---|
| 0 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 1 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 2 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 3 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 4 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 5 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 6 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 7 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 8 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |
| 9 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... |


# Task Instructions

Your goal is to implement two classifiers:

1.	Vegan Meal Classifier
1.	Keto Meal Classifier

Unlike typical supervised machine learning tasks, the labels are not provided in the dataset. Instead, you will rely on clear and verifiable definitions to classify each meal based on its ingredients.

### Definitions:

1. **Vegan Meal**: Contains no animal products whatsoever (no eggs, milk, meat, etc.).
1. **Keto Meal**: Contains no ingredients with more than 10g of carbohydrates per 100g serving. For example, eggs are keto-friendly, while apples are not.

Note that some meals may meet both vegan and keto criteria (e.g., meals containing avocados), though most meals typically fall into neither category.
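
The keto definition is a per-ingredient rule: one ingredient above 10 g of carbohydrate per 100 g disqualifies the whole meal. The sketch below applies that rule directly. It assumes ingredient names have already been normalized to keys of a lookup table. The table `CARBS_PER_100G` and the function `is_keto_by_definition` are hypothetical. The figures are approximate per-100 g values for raw foods and serve only as illustrations; a real solution would draw them from a nutrition database.

```python
# Approximate carbohydrate content (g per 100 g, raw). Illustrative values only.
CARBS_PER_100G = {
    "egg": 0.7,
    "avocado": 8.5,
    "apple": 13.8,
}

KETO_CARB_LIMIT_G = 10.0  # from the definition: more than 10 g per 100 g disqualifies


def is_keto_by_definition(ingredient_names, carbs_table=CARBS_PER_100G):
    """Return True if no ingredient exceeds the per-100 g carbohydrate limit.

    Assumption: an ingredient missing from the table is treated as
    disqualifying, which errs on the side of labelling a meal non-keto.
    """
    for name in ingredient_names:
        carbs = carbs_table.get(name)
        if carbs is None or carbs > KETO_CARB_LIMIT_G:
            return False
    return True


print(is_keto_by_definition(["egg", "avocado"]))  # True
print(is_keto_by_definition(["egg", "apple"]))    # False
```

Treating unknown ingredients as non-keto is a design choice. It trades recall for precision, and the opposite default is equally defensible.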

## Example heuristic:

In [4]:
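def is_ingredient_vegan(ing):
    # Naive check: case-sensitive substring matching against a short keyword list
    for animal_product in "egg meat milk butter veal lamb beef chicken sausage".split():
        if animal_product in ing:
            return False
    return True

def is_vegan_example(ingredients):
    # A recipe is vegan only if every ingredient line passes the check
    return all(map(is_ingredient_vegan, ingredients))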
    
df["vegan"] = df["ingredients"].apply(is_vegan_example)
df

| | description | ingredients | instructions | photo_url | title | vegan |
|---|---|---|---|---|---|---|
| 0 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 1 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 2 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 3 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 4 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 5 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 6 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 7 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 8 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |
| 9 | Squares of cheesy pizza are served up with sli... | [1 (12 inch) pre-baked pizza crust, 1 1/2 cups... | [Top pizza crust with cheese. Bake crust accor... | http://images.media-allrecipes.com/global/reci... | Johnsonville® Three Cheese Italian Style Chick... | True |


### Limitations of the Simplistic Heuristic

The heuristic described above is straightforward but can lead to numerous false positives and negatives due to its reliance on keyword matching. Common examples of incorrect classifications include:
- "Peanut butter" being misclassified as non-vegan, as “butter” is incorrectly assumed to imply dairy.
- "eggless" recipes being misclassified as non-vegan, due to the substring “egg.”
- Recipes containing animal-derived ingredients such as “pork” and “bacon” being misclassified as vegan, because those words are not in the keyword list.
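
The sketch below addresses these failure modes in three steps:

1. Lowercase each line, so that "Bacon" is treated the same as "bacon". The original substring check is case-sensitive.
2. Strip a few known plant-based phrases such as "peanut butter".
3. Match whole words only with `\b` word boundaries, so "eggless" and "butternut" no longer trigger "egg" and "butter".

The keyword and phrase lists and both function names are hypothetical. The lists are far from complete, so treat this as an illustration rather than a reference solution.

```python
import re

# Hypothetical, incomplete keyword list of animal products.
NON_VEGAN_TERMS = [
    "egg", "eggs", "milk", "butter", "cheese", "cream", "yogurt", "honey",
    "meat", "beef", "veal", "lamb", "pork", "bacon", "ham", "chicken",
    "turkey", "sausage", "sausages", "fish", "anchovy", "anchovies",
    "shrimp", "gelatin",
]
# Plant-based phrases that contain a keyword above (illustrative, incomplete).
PLANT_BASED_PHRASES = [
    "peanut butter", "almond butter", "cocoa butter", "cream of tartar",
    "almond milk", "soy milk", "oat milk", "coconut milk", "coconut cream",
]

# \b anchors make "egg" match "3 egg yolks" but not "eggless" or "eggplant".
NON_VEGAN_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, NON_VEGAN_TERMS)) + r")\b")


def is_ingredient_vegan_sketch(ingredient: str) -> bool:
    text = ingredient.lower()  # case-insensitive: "Bacon" -> "bacon"
    for phrase in PLANT_BASED_PHRASES:
        text = text.replace(phrase, " ")  # drop known plant-based phrases first
    return NON_VEGAN_RE.search(text) is None


def is_vegan_sketch(ingredients) -> bool:
    return all(is_ingredient_vegan_sketch(line) for line in ingredients)


print(is_ingredient_vegan_sketch("2 tablespoons peanut butter"))  # True
print(is_ingredient_vegan_sketch("4 slices Bacon"))               # False
```

Keyword matching still misses ingredients whose names hide an animal product, such as mayonnaise (egg) or Worcestershire sauce (traditionally anchovies). Those cases need a larger vocabulary or an ingredient database.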


# Submission
## 1. Implement Diet Classifiers
Complete the two classifier functions in the diet_classifiers.py file within this repository. Ensure your implementation correctly identifies “keto” and “vegan” meals. After implementing these functions, verify that the Flask server displays the appropriate badges (“keto” and “vegan”) next to the corresponding recipes.

> **Note**
>
> This repo contains two `diet_classifiers.py` files:
> 1. One in this folder (`nb/src/diet_classifiers.py`)
> 2. One in the Flask web app folder (`web/src/diet_classifiers.py`)
>
> You can develop your solution here in the notebook environment, but to apply it
> to the Flask app you must copy your implementation into the `diet_classifiers.py`
> file in the Flask folder.
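
Copying the file by hand is easy to forget, so a small helper can sync the two copies. This is a minimal sketch that assumes it is called with the repository root and uses the two paths from the note above. The function name `sync_classifiers` is hypothetical.

```python
import shutil
from pathlib import Path


def sync_classifiers(repo_root="."):
    """Copy nb/src/diet_classifiers.py over web/src/diet_classifiers.py.

    Assumes `repo_root` is the repository root and both folders exist.
    """
    root = Path(repo_root)
    src = root / "nb" / "src" / "diet_classifiers.py"
    dst = root / "web" / "src" / "diet_classifiers.py"
    shutil.copyfile(src, dst)  # overwrites the Flask copy
    return dst
```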

In [5]:
from keto_helpers import (parse_ingredient, _as_float, pick_usda_hit_cached, fetch_food_cached, get_macronutrients,
                        _unit_to_grams, estimate_weight, _scale_macros)



In [6]:
def is_keto(ingredients, verbose=False):
    """Classify a recipe as keto from its raw ingredient lines.

    Each line is parsed, matched to a USDA food, converted to grams and
    scaled; the recipe counts as keto when net carbs (carbs - fiber)
    supply at most 20% of its total calories.
    """
    try:
        # Running macro totals for the whole recipe.
        row_macros = dict(carbs=0.0, protein=0.0, fat=0.0, fiber=0.0, calories=0.0)

        for line in ingredients:
            try:
                # Split the raw line into quantity, unit and ingredient name.
                parsed = parse_ingredient(line)
                qty = _as_float(parsed["quantity"])
                unit = (parsed["unit"] or "").lower()
                ingredient_name = parsed["ingredient"]

                # Look up the closest USDA food and its macronutrients.
                fdc_id, usda_name = pick_usda_hit_cached(ingredient_name)
                info = fetch_food_cached(int(fdc_id))
                macros = get_macronutrients(info)

                # Convert the unit to grams, falling back to an estimated weight.
                g_per_unit = _unit_to_grams(info, unit)
                if g_per_unit is None:
                    g_per_unit = estimate_weight(ingredient_name)

                grams_needed = qty * g_per_unit
                scaled = _scale_macros(macros, grams_needed)

                row_macros["carbs"] += scaled["carbs_g"]
                row_macros["protein"] += scaled["protein_g"]
                row_macros["fat"] += scaled["fat_g"]
                row_macros["fiber"] += scaled["fiber_g"]
                row_macros["calories"] += scaled["calories"]
                if verbose:
                    print("Line Name: ", ingredient_name)
                    print("USDA Name: ", usda_name)
                    print("quantity: ", qty, unit)
                    print("Carbs: ", scaled["carbs_g"])
                    print("Protein: ", scaled["protein_g"])
                    print("Fat: ", scaled["fat_g"])
                    print("Fiber: ", scaled["fiber_g"])
                    print("Calories: ", scaled["calories"])

            except Exception as e:
                # Lines that cannot be parsed or matched are skipped and add nothing.
                print(e)
                continue

        # Share of calories from net carbs, at 4 kcal per gram of carbohydrate.
        # If no line could be resolved, calories stay 0, carb_pct falls back
        # to 0 and the recipe is classified as keto.
        net_carbs = max(row_macros["carbs"] - row_macros["fiber"], 0)
        carb_pct = (
            (net_carbs * 4) / row_macros["calories"] * 100 if row_macros["calories"] else 0
        )
        is_this_keto = (carb_pct <= 20)

        if verbose:
            print("total carbs: ", row_macros["carbs"])
            print("total protein: ", row_macros["protein"])
            print("total fat: ", row_macros["fat"])
            print("total fiber: ", row_macros["fiber"])
            print("is keto: ", is_this_keto)
            print()
            print()

        return is_this_keto
    except Exception as e:
        # Any unexpected failure is treated as "not keto".
        print(e)
        return False


def is_ingredient_vegan(ingredient):
    # TODO: complete
    return False    
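
`is_keto` above applies a recipe-level rule rather than the per-ingredient definition from the task. It sums macros over all lines and labels the recipe keto when net carbs (carbs minus fiber, at 4 kcal per gram) supply at most 20% of total calories. The sketch below checks that arithmetic in isolation. `scale_per_100g` is a hypothetical stand-in for `_scale_macros` and assumes nutrient values are given per 100 g. The nutrient numbers are made up to keep the arithmetic easy to follow.

```python
KCAL_PER_G_CARB = 4  # same factor as `net_carbs * 4` in is_keto


def scale_per_100g(macros_per_100g, grams):
    """Scale per-100 g nutrient values to the amount actually used (hypothetical helper)."""
    return {k: v * grams / 100 for k, v in macros_per_100g.items()}


def net_carb_calorie_pct(scaled_items):
    """Percentage of calories from net carbs, mirroring the rule in is_keto."""
    carbs = sum(item["carbs_g"] for item in scaled_items)
    fiber = sum(item["fiber_g"] for item in scaled_items)
    calories = sum(item["calories"] for item in scaled_items)
    net = max(carbs - fiber, 0)
    return net * KCAL_PER_G_CARB / calories * 100 if calories else 0


items = [
    # 50 g of a food with 10 g carbs, 2 g fiber, 200 kcal per 100 g -> 5 g carbs, 1 g fiber, 100 kcal
    scale_per_100g({"carbs_g": 10.0, "fiber_g": 2.0, "calories": 200.0}, 50),
    # 12.5 g of a carb-free fat with 800 kcal per 100 g -> 100 kcal
    scale_per_100g({"carbs_g": 0.0, "fiber_g": 0.0, "calories": 800.0}, 12.5),
]
pct = net_carb_calorie_pct(items)  # 4 g net carbs * 4 kcal / 200 kcal = 8%
print(round(pct, 2), pct <= 20)
```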

For your convenience, you can sanity-check your solution on a subset of labeled recipes by running `diet_classifiers.py`:

In [17]:
! python diet_classifiers.py --ground_truth /usr/src/data/ground_truth_sample.csv



===Keto===
              precision    recall  f1-score   support

       False       0.85      0.88      0.87        60
        True       0.82      0.78      0.79        40

    accuracy                           0.84       100
   macro avg       0.84      0.83      0.83       100
weighted avg       0.84      0.84      0.84       100

===Vegan===
              precision    recall  f1-score   support

       False       0.60      1.00      0.75        60
        True       0.00      0.00      0.00        40

    accuracy                           0.60       100
   macro avg       0.30      0.50      0.38       100
weighted avg       0.36      0.60      0.45       100

== Time taken: 27.566752433776855 seconds ==
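
The vegan report follows from the stub `is_ingredient_vegan`, which always returns `False`. Because the classifier never predicts `True`, precision for the `True` class is 0/0. scikit-learn reports that as 0.00 and emits a warning. The stdlib sketch below reproduces the per-class numbers from raw counts for a constant-`False` classifier on 60 non-vegan and 40 vegan recipes. The function name `per_class_scores` is hypothetical. It returns 0.0 for an undefined ratio and computes F1 as 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall whenever both are defined.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class; undefined ratios become 0.0."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# Same support as the sample: 60 non-vegan, 40 vegan; the stub answers False every time.
y_true = [False] * 60 + [True] * 40
y_pred = [False] * 100

for label in (False, True):
    p, r, f = per_class_scores(y_true, y_pred, label)
    print(label, round(p, 2), round(r, 2), round(f, 2))

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy", accuracy)
```

Averaging the two rows gives the macro figures (0.30, 0.50, 0.38). Weighting them by support (0.6 and 0.4) gives the weighted figures (0.36, 0.60, 0.45).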


## 2. Repository Setup
Create a **private** GitHub repository for your solution, and invite the GitHub user `argmax2025` as a collaborator. **Do not** share your implementation using a **forked** repository.

## 3. Application Form
Once you’ve completed the implementation and shared your private GitHub repository with argmax2025, please fill out the appropriate application form:
1. [US Application Form](https://forms.clickup.com/25655193/f/rexwt-1832/L0YE9OKG2FQIC3AYRR)
2.  [IL Application Form](https://forms.clickup.com/25655193/f/rexwt-1812/IP26WXR9X4P6I4LGQ6)


Your application will not be considered complete until this form is submitted.

## Evaluation process


Your submission will be assessed based on the following criteria:


1.	**Readability & Logic** – Clearly explain your approach, including your reasoning and any assumptions. If you relied on external resources (e.g., ingredient databases, nutrition datasets), be sure to cite them.
2.	**Executability** – Your code should run as-is when cloned from your GitHub repository. Ensure that all paths are relative, syntax is correct, and no manual setup is required.
3.	**Accuracy** – Your classifiers will be evaluated against a holdout set of 20,000 recipes with verified labels. Performance will be compared to the ground truth data.
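
The sample run above took about 27.6 seconds for 100 recipes. Extrapolating linearly to the 20,000-recipe holdout gives roughly 5,500 seconds, or about 92 minutes. The sketch below shows that back-of-the-envelope estimate. It assumes a constant per-recipe cost, which one-off startup work and cache hits on repeated ingredients both violate, so treat the result as a rough figure only.

```python
SAMPLE_RECIPES = 100                  # support in the sample classification report
SAMPLE_SECONDS = 27.566752433776855   # "Time taken" from the sample run
HOLDOUT_RECIPES = 20_000              # size of the evaluation holdout set


def extrapolate_runtime(sample_seconds, sample_size, target_size):
    """Linear extrapolation; assumes every recipe costs the same time."""
    return sample_seconds / sample_size * target_size


est = extrapolate_runtime(SAMPLE_SECONDS, SAMPLE_RECIPES, HOLDOUT_RECIPES)
print(f"~{est:.0f} s (~{est / 60:.0f} min)")
```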


## Next steps
If your submission passes the initial review, you’ll be invited to a 3-hour live coding interview, where you’ll be asked to extend and adapt your solution in real time.

Please make sure you join from a quiet environment and have access to a Python-ready workstation capable of running your submitted project.