# NutriLensAI

## Nutrient Analysis from Food Label Images Using Data-Driven Models

**Developed By:**
- Prabin Raj Shrestha ([Email](mailto:prbn.ms@gmail.com) | [Website](https://prbn.info))
- Aryan Nilesh Sadvelkar ([Email](mailto:asadvelk@syr.edu))
- Peiying Chen ([Email](mailto:pchen21@syr.edu))

\

<a href="https://colab.research.google.com/drive/1fvJCff6jsY3x04Y06eaQaxrnkw9XV-rV" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


---

In [None]:
# @title Install Libraries

!pip install easyocr
!pip install aisuite

Collecting easyocr
  Downloading easyocr-1.7.2-py3-none-any.whl.metadata (10 kB)
Collecting python-bidi (from easyocr)
  Downloading python_bidi-0.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting pyclipper (from easyocr)
  Downloading pyclipper-1.3.0.post6-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (9.0 kB)
Collecting ninja (from easyocr)
  Downloading ninja-1.11.1.2-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.3 kB)
Downloading easyocr-1.7.2-py3-none-any.whl (2.9 MB)
Downloading ninja-1.11.1.2-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB)
Downloading pyclipper-1.3.0.post6-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (9

In [None]:
# @title Import Libraries

import torch
from transformers import pipeline
from google.colab import userdata
import os
import re
import json
import gdown
from IPython.display import Image

In [None]:
# @title Class: llm_nutritionist
import easyocr
import torch
from transformers import pipeline
from google.colab import userdata
import os
import re
import json
from IPython.display import Image

# os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')
# Access until December 2024


class llm_nutritionist():
  def __init__(self, lang_list = None):
    self.lang_list = lang_list or ['en']
    self.reader = easyocr.Reader(self.lang_list)
    self.__llm_model__()

    self.json_prompt_l = []
    self._prompt_json()

    self.nutritionist_prompt_l = []
    self._prompt_nutritionist()

    self.data_text = None
    self.data_json = None
    self.data_nutritionist = None

  def __llm_model__(self):
    self.model_id = "meta-llama/Llama-3.2-1B-Instruct"
    self.pipe = pipeline(
        "text-generation",
        model=self.model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

  def _prompt_json(self):
    prompt_1 = '''
    You are a tool designed to scrape and process food nutrition facts from plain text and output them in a structured JSON format. For any given input text containing nutrition facts, your task is to extract and organize the data into the following structured JSON format:
    {
      "ingredients": ["ingredient1", "ingredient2", "..."],
      "macro_nutrients": {
        "calories_kcal": 0,
        "total_fat_g": 0,
        "saturated_fat_g": 0,
        "trans_fat_g": 0,
        "polyunsaturated_fat_g": 0,
        "monounsaturated_fat_g": 0,
        "cholesterol_mg": 0,
        "sodium_mg": 0,
        "total_carbohydrates_g": 0,
        "dietary_fiber_g": 0,
        "sugars_g": 0,
        "added_sugars_g": 0,
        "protein_g": 0
      },
      "micro_nutrients": {
        "vitamin_d_mcg": 0,
        "calcium_mg": 0,
        "iron_mg": 0,
        "potassium_mg": 0,
        "vitamin_a_mcg": 0,
        "vitamin_c_mg": 0,
        "vitamin_e_mg": 0,
        "vitamin_k_mcg": 0,
        "thiamin_mg": 0,
        "riboflavin_mg": 0,
        "niacin_mg": 0,
        "vitamin_b6_mg": 0,
        "folate_mcg_dfe": 0,
        "vitamin_b12_mcg": 0,
        "biotin_mcg": 0,
        "pantothenic_acid_mg": 0,
        "phosphorus_mg": 0,
        "magnesium_mg": 0,
        "zinc_mg": 0,
        "selenium_mcg": 0,
        "copper_mg": 0,
        "manganese_mg": 0,
        "chromium_mcg": 0,
        "molybdenum_mcg": 0,
        "choline_mg": 0
      },
      "other_components": {
        "caffeine_mg": 0,
        "alcohol_g": 0,
        "water_g": 0
      }
    }
    Instructions:
    1.	Parse the input text to extract all available data.
    2.	If a nutrient is not mentioned in the input text, default its value to 0.
    3.	Maintain the same JSON structure in your response every time, even if the input does not contain certain data.
    4.	Ensure accurate conversion of all quantities, including units such as grams (g), milligrams (mg), and micrograms (mcg).
    '''

    self.json_prompt_l.append(prompt_1)

    return None

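  def display_image(self, img_path):
    """Displays an image using IPython.display."""
    from IPython.display import display  # local import keeps this method self-contained
    try:
      # Building an Image inside a function does not render it; display() does.
      display(Image(filename=img_path))
    except FileNotFoundError:
      print(f"Error: Image file not found at {img_path}")
    except Exception as e:
      print(f"An error occurred: {e}")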

  def _prompt_nutritionist(self):
    prompt_1 = '''
    You are an experienced nutritionist with extensive knowledge of food science, dietary guidelines, and health optimization. Your task is to analyze the nutritional information and ingredients list provided in a JSON file for ready-to-eat food products. Based on this analysis, you will:
    1.	Determine if the food item is generally healthy or not.
    2.	Explain your reasoning, highlighting both positive and negative aspects of the food’s nutritional profile.
    3.	Identify any concerning ingredients or nutritional red flags.
    4.	Suggest healthier alternatives or ways to balance the diet if the item is consumed.
    5.	Provide tips on how to incorporate this food into a balanced diet, if appropriate.
    6.	Offer general advice for maintaining a healthy diet.

    Instructions:
    - Begin your response with the phrase “Nutritional Analysis:” followed by your evaluation.
    - Use the JSON data to extract key information such as ingredients, macro-nutrients, micro-nutrients, and other components.
    - Highlight important values like calories, sugar, sodium, saturated fat, and beneficial nutrients (fiber, vitamins, etc.).
    - Conclude with a “Healthy Eating Tip:” that provides actionable advice for maintaining a nutritious diet.

    Example Response:

    Nutritional Analysis:
    This food item contains 100 calories per serving with no fat or protein but a significant amount of sugar (24g, including added sugars). While it provides 30mg of vitamin C (a positive aspect), the high sugar content makes it less ideal for regular consumption, especially for individuals managing blood sugar levels or aiming for weight control.

    Positive Aspects:

    Negative Aspects:

    Concerning Ingredients:

    Suggestions:

    Healthy Eating Tip:
    '''

    self.nutritionist_prompt_l.append(prompt_1)

    return None


  def read_text(self, img_path):
    return self.reader.readtext(img_path)

  def LLM(self, prompt, query, token_size = 256):
    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": query},
    ]

    text = self.pipe(
        messages,
        max_new_tokens=token_size,
    )

    return text[0]["generated_text"][-1]['content']

  def convert_string_to_json(self, input_string):
    try:
        json_object = json.loads(input_string)
        return json_object
    except json.JSONDecodeError as e:
        print(f"Invalid JSON string: {e}")
        raise e


  def extract_json_from_marked_text(self, text):
    """
    Extract JSON data enclosed between ```json and ```.
    """
    try:
        # Regular expression to match JSON blocks starting with ```json and ending with ```
        json_pattern = r"```json\s*(\{(?:.|\n)*?\})\s*```"
        matches = re.findall(json_pattern, text)

        # Parse each match as JSON
        extracted_jsons = []
        for match in matches:
            try:
                json_data = json.loads(match)  # Parse as JSON
                extracted_jsons.append(json_data)
            except json.JSONDecodeError as e:
                print(f"Invalid JSON found: {e}")
        return extracted_jsons

    except Exception as e:
        print(f"Error during extraction: {e}")
        raise e

  def get_nutrition_text(self, img_path, print_flag = False):
    read = self.read_text(img_path)
    self.data_text = '\n'.join([r[1] for r in read])
    if print_flag: print('OCR', self.data_text)
    return self.data_text

  def get_text_JSON(self, data_text, print_flag = False, max_attempts = 3):
    """Convert OCR text to structured JSON with the LLM, retrying on errors."""
    data_json = None
    for prompt in self.json_prompt_l:
      for attempt in range(max_attempts):
        try:
          reply = self.LLM(prompt, data_text, 1024)
          if print_flag: print('LLM', reply)
          data_json = self.extract_json_from_marked_text(reply)
          if print_flag: print('JSON', data_json)
          return data_json
        except Exception as e:
          # Report the failure instead of silently swallowing it, then retry.
          print(f"JSON extraction attempt {attempt + 1} failed: {e}")
    return data_json

  def get_nutrition_JSON(self, img_path, print_flag = False):
    data_text = self.get_nutrition_text(img_path)
    self.data_json = self.get_text_JSON(data_text, print_flag)
    return self.data_json

  def analysis(self, img_path, display_flag = True, print_flag = False):
    data_json = self.get_nutrition_JSON(img_path, print_flag)
    # Chat message content must be a string, so serialize the parsed JSON.
    query = data_json if isinstance(data_json, str) else json.dumps(data_json)
    data_nutritionist = None
    for prompt in self.nutritionist_prompt_l:
      for _ in range(2):  # try each prompt at most twice
        try:
          data_nutritionist = self.LLM(prompt, query, 256)
          break
        except Exception as e:
          print(f"Nutritionist LLM call failed: {e}")
      if data_nutritionist is not None:
        break

    self.data_nutritionist = data_nutritionist

    if display_flag:
      self.display_image(img_path)
      print('')
      print('')
      print('Nutrition Facts:')
      print(data_nutritionist)

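  def test_analysis(self, img_path, print_flag = False):
    self.display_image(img_path)
    data_json = self.get_nutrition_JSON(img_path, print_flag)
    # Chat message content must be a string, so serialize the parsed JSON.
    query = data_json if isinstance(data_json, str) else json.dumps(data_json)
    data_nutritionist = None
    for prompt in self.nutritionist_prompt_l:
      for _ in range(2):  # try each prompt at most twice
        try:
          data_nutritionist = self.LLM(prompt, query, 256)
          break
        except Exception as e:
          print(f"Nutritionist LLM call failed: {e}")
      if data_nutritionist is not None:
        break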

    self.data_nutritionist = data_nutritionist

    print('Nutrition Facts:')
    print(data_json)
    print('Nutrition Analysis:')
    print(data_nutritionist)
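
The JSON step relies on the model wrapping its answer in a fenced `json` block, which `extract_json_from_marked_text` then pulls out with a regular expression. The sketch below reproduces that idea as a standalone function so it can be tried without loading a model. The function name `extract_json_blocks` and the sample reply are made up for illustration, and `re.DOTALL` is used here in place of the `(?:.|\n)` alternation; it is not the class's exact code.

```python
import json
import re

# Build the fence marker programmatically so this snippet contains no literal fences.
FENCE = "`" * 3

# Capture a {...} object between a json fence and the closing fence.
# re.DOTALL lets "." match newlines inside the JSON body.
JSON_BLOCK = re.compile(FENCE + r"json\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def extract_json_blocks(text):
    """Return every fenced JSON object in `text` that parses cleanly."""
    blocks = []
    for match in JSON_BLOCK.findall(text):
        try:
            blocks.append(json.loads(match))
        except json.JSONDecodeError:
            # Skip malformed blocks instead of failing the whole call.
            continue
    return blocks


# Made-up LLM reply, for illustration only.
sample_reply = (
    "Here is the data:\n"
    + FENCE + "json\n"
    + '{"macro_nutrients": {"calories_kcal": 150, "sugars_g": 24}}\n'
    + FENCE + "\n"
    + "Let me know if you need anything else."
)

result = extract_json_blocks(sample_reply)
print(result)
```

Note that a reply with no fenced block yields an empty list rather than an error, so callers may want to treat an empty result as a failed extraction.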

In [None]:
# @title Class: openai_nutritionist
import easyocr
import torch
from transformers import pipeline
from google.colab import userdata
import os
import re
import json
import aisuite as ai


# Access until December 2024
# Read secrets from Colab's secret store; never hard-code tokens in the notebook.
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')
os.environ["OPENAI_API_KEY"] = userdata.get('CIS667Test')  # 'CIS667Test' is the name of the Colab secret that holds the OpenAI key


class openai_nutritionist(llm_nutritionist):
  def __init__(self, lang_list = None):
    super().__init__(lang_list)

  def __llm_model__(self):
    self.model_id = "openai:gpt-4o"
    self.client = ai.Client()

  def LLM(self, prompt, query, token_size = 256):
    messages = [
        {"role": "system", "content": prompt},
        {"role": "user", "content": query},
    ]
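    # Pass token_size through so the limit requested by the caller is respected.
    response = self.client.chat.completions.create(model=self.model_id, messages=messages, temperature=0.15, max_tokens=token_size)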
    return response.choices[0].message.content
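
Both classes retry an LLM call when it raises. One possible way to factor that pattern out is a small helper like the one below. This is a sketch under assumptions, not part of the notebook: `call_with_retries`, its parameters, and the `flaky_llm` stand-in are all hypothetical names introduced for illustration.

```python
import time


def call_with_retries(fn, *args, attempts=3, base_delay=0.0, **kwargs):
    """Call fn(*args, **kwargs), retrying up to `attempts` times.

    Returns the first successful result, or re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception as err:  # narrow this to the errors you expect
            last_error = err
            if attempt < attempts - 1:
                # Exponential backoff between attempts (no wait when base_delay is 0).
                time.sleep(base_delay * (2 ** attempt))
    raise last_error


# Hypothetical stand-in for an LLM call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_llm(prompt, query):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return f"answer to: {query}"


reply = call_with_retries(flaky_llm, "system prompt", "is this healthy?")
print(reply, calls["count"])
```

A helper like this surfaces the final error instead of returning `None`, which makes failures such as an exhausted API quota easier to diagnose.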


In [None]:
# @title Verify OpenAI API Key

# Confirm the key is set without echoing its value into the notebook output.
print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))

In [None]:
# @title Load Sample labels for testing

gdown.download(url = 'https://drive.google.com/uc?id=1pqDp2YDRAA4VkeXJwC7mLKJ1D6NCPHtA', output = 'sample_label.zip')

!unzip -o /content/sample_label.zip
!rm sample_label.zip


Downloading...
From: https://drive.google.com/uc?id=1pqDp2YDRAA4VkeXJwC7mLKJ1D6NCPHtA
To: /content/sample_label.zip
100%|██████████| 1.02M/1.02M [00:00<00:00, 9.89MB/s]


Archive:  /content/sample_label.zip
  inflating: img01.jpg               
  inflating: img02.jpg               
  inflating: img03.jpg               
  inflating: img04.jpg               
  inflating: img05.jpg               
  inflating: img06.jpg               
  inflating: img07.jpg               
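
Before running the pipeline on these images, it may help to see how the OCR output becomes plain text. EasyOCR's `readtext` returns one `(bounding_box, text, confidence)` entry per detection, and `get_nutrition_text` joins the text fields with newlines. The sketch below mimics that on a hand-written result list so it runs without EasyOCR. The sample values, the function name `ocr_to_text`, and the `min_confidence` filter are assumptions added for illustration; the class itself does not filter by confidence.

```python
# Hand-written stand-in for reader.readtext(...) output; values are illustrative.
ocr_result = [
    ([[10, 10], [200, 10], [200, 40], [10, 40]], "Nutrition Facts", 0.98),
    ([[10, 50], [120, 50], [120, 80], [10, 80]], "Calories 150", 0.91),
    ([[10, 90], [60, 90], [60, 120], [10, 120]], "~|", 0.12),
]


def ocr_to_text(result, min_confidence=0.0):
    """Join recognized strings, one per line, dropping low-confidence detections."""
    return "\n".join(
        text for _bbox, text, conf in result if conf >= min_confidence
    )


all_text = ocr_to_text(ocr_result)                          # same join as get_nutrition_text
filtered_text = ocr_to_text(ocr_result, min_confidence=0.5)  # drops the noisy "~|" entry
print(filtered_text)
```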


In [None]:
# @title Initiating Object

nutritionist = llm_nutritionist()

In [None]:
# @title Initiating Object

openai_nutritionist = openai_nutritionist()

# Results

In [None]:
Image('img01.jpg')
nutritionist.analysis('img01.jpg')

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.




Nutrition Facts:
Nutritional Analysis:
This food item contains 150 calories per serving, with a significant amount of sugar (24g, including added sugars). While it provides 30mg of vitamin C, the high sugar content makes it less ideal for regular consumption, especially for individuals managing blood sugar levels or aiming for weight control.

Positive Aspects:

- The food item is relatively low in calories, making it a suitable option for those looking to manage their weight.
- It contains a moderate amount of vitamin C, which is essential for immune function and overall health.

Negative Aspects:

- The high sugar content (24g, including added sugars) is a significant concern, as excessive sugar consumption is linked to various health problems, including obesity, type 2 diabetes, and heart disease.
- The lack of beneficial nutrients like fiber, protein, and healthy fats may not provide the nutritional value that consumers are seeking.
- The presence of wheat starch and wheat ingred

In [None]:
Image('img02.jpg')
openai_nutritionist.analysis('img02.jpg')



Nutrition Facts:
None


We used up all our token credits 😞

---