<a href="https://colab.research.google.com/github/TairCohen/personal-nutritionist-agent/blob/bar's_branch/Copy_of_simple_csv_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple RAG (Retrieval-Augmented Generation) System for CSV Files

## Overview

This code implements a basic Retrieval-Augmented Generation (RAG) system for processing and querying CSV documents. The system encodes the document content into a vector store, which can then be queried to retrieve relevant information.

## CSV File Structure and Use Case
The CSV file contains food items, one per row, with five columns: a Hebrew name (`shmmitzrach`), an English name (`english_name`), an energy value (`food_energy`), and rephrased descriptions in English and Hebrew (`rephrased_english`, `rephrased_hebrew`). This dataset is used for a RAG use case: a nutrition assistant that estimates the calories in a dish.

## Key Components

1. Loading and splitting CSV files
2. Vector store creation using [FAISS](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) and OpenAI embeddings
3. Retriever setup for querying the processed documents
4. Creating a question-answering chain over the CSV data

## Method Details

### Document Preprocessing

1. The CSV is loaded using LangChain's `CSVLoader`.
2. The data is split into chunks; in this notebook each CSV row becomes one document.
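The row-to-document step can be sketched with the standard library alone. This is an illustration of the `column: value` layout seen in `docs[0]` further down, not `CSVLoader`'s actual implementation; `rows_to_documents` and `SAMPLE_CSV` are hypothetical names, and the two sample rows are taken from the data preview in this notebook.

```python
import csv
import io

# Two rows from the dataset preview, reduced to two columns for brevity
SAMPLE_CSV = (
    "english_name,food_energy\n"
    '"Whey, acid, fluid",24\n'
    '"Milk, human",70\n'
)

def rows_to_documents(csv_text):
    """Turn each CSV row into 'column: value' text plus its row index."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for i, row in enumerate(reader):
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append({"page_content": page_content, "metadata": {"row": i}})
    return documents

docs_sketch = rows_to_documents(SAMPLE_CSV)
print(docs_sketch[0]["page_content"])
```

Keeping one row per document means every retrieved chunk is a complete food entry, so a name is never separated from its energy value.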


### Vector Store Creation

1. OpenAI embeddings are used to create vector representations of the text chunks.
2. A FAISS vector store is created from these embeddings for efficient similarity search.
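To make the similarity search concrete, here is a minimal pure-Python sketch of an exhaustive squared-L2 search, which is, to my understanding, what a flat L2 index such as `faiss.IndexFlatL2` computes (FAISS does it far faster). The function names and toy vectors are hypothetical; real embeddings have many more dimensions.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_l2_search(vectors, query, k=2):
    """Compare the query against every stored vector and keep the k closest."""
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Toy 2-D "embeddings" standing in for real embedding vectors
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
nearest = flat_l2_search(toy_vectors, query=[0.9, 0.1], k=2)
print([index for index, _ in nearest])
```

A flat index stores the vectors as-is and scans all of them per query, which is simple and exact and is reasonable for a few thousand rows.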

### Retriever Setup

1. A retriever is configured to fetch the most relevant chunks for a given query.

## Benefits of this Approach

1. Scalability: Can handle large documents by processing them in chunks.
2. Flexibility: Easy to adjust parameters like chunk size and number of retrieved results.
3. Efficiency: Utilizes FAISS for fast similarity search in high-dimensional spaces.
4. Integration with Advanced NLP: Uses OpenAI embeddings for state-of-the-art text representation.

## Conclusion

This simple RAG system provides a solid foundation for building more complex information retrieval and question-answering systems. By encoding document content into a searchable vector store, it enables efficient retrieval of relevant information in response to queries. This approach is particularly useful for applications requiring quick access to specific information within a CSV file.

Install libraries

In [1]:
!pip install -q --upgrade langchain-text-splitters langchain-community langgraph
!pip install -q langchain-openai
!pip install -q "faiss-cpu>=1.7.4"  # quoted so the shell does not treat >= as a redirect


Import libraries

In [2]:
import os

from google.colab import userdata
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Read the API key from Colab's secret store
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")  # TODO: revisit the model choice


In [3]:
from google.colab import drive
import pandas as pd

drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
file_path = "/content/drive/MyDrive/Baot/30_NLP_2025/calories_dataset_consistent_rephrasing.csv"
tabular_data = pd.read_csv(file_path)
print(f"data shape {tabular_data.shape}")
tabular_data.head()

data shape (4623, 5)


Unnamed: 0,shmmitzrach,english_name,food_energy,rephrased_english,rephrased_hebrew
0,"מי גבינה, חומצי, נוזלי","Whey, acid, fluid",24,"A liquid dairy byproduct from cheese-making, a...","תוצר לוואי נוזלי של תהליך ייצור גבינה, חומצי ב..."
1,"בורגול, מבושל עם שעועית לבנה ועגבניות","Bulgur, cooked with white beans and tomatoes",112,"A cooked grain dish containing bulgur, white b...","תבשיל דגן מבושל המכיל בורגול, שעועית לבנה ועגב..."
2,חלב אם,"Milk, human",70,"Breast milk, produced by humans.","חלב אם, מופק על ידי בני אדם."
3,"חלב 3% שומן, תנובה, טרה, הרדוף, יטבתה","Milk, cow, 3% fat",60,Cow's milk with 3% fat content.,חלב פרה עם תכולת שומן של 3%.
4,"חלב 1% שומן בקרטון מועשר ויטמין A,D, וסידן","Milk, cow, 1% fat, fortified with calcium",42,Low-fat cow’s milk (1%) enriched with calcium ...,חלב פרה דל שומן (1%) מועשר בסידן ובוויטמינים A...


In [5]:
tabular_data.head(20)

Unnamed: 0,shmmitzrach,english_name,food_energy,rephrased_english,rephrased_hebrew
0,"מי גבינה, חומצי, נוזלי","Whey, acid, fluid",24,"A liquid dairy byproduct from cheese-making, a...","תוצר לוואי נוזלי של תהליך ייצור גבינה, חומצי ב..."
1,"בורגול, מבושל עם שעועית לבנה ועגבניות","Bulgur, cooked with white beans and tomatoes",112,"A cooked grain dish containing bulgur, white b...","תבשיל דגן מבושל המכיל בורגול, שעועית לבנה ועגב..."
2,חלב אם,"Milk, human",70,"Breast milk, produced by humans.","חלב אם, מופק על ידי בני אדם."
3,"חלב 3% שומן, תנובה, טרה, הרדוף, יטבתה","Milk, cow, 3% fat",60,Cow's milk with 3% fat content.,חלב פרה עם תכולת שומן של 3%.
4,"חלב 1% שומן בקרטון מועשר ויטמין A,D, וסידן","Milk, cow, 1% fat, fortified with calcium",42,Low-fat cow’s milk (1%) enriched with calcium ...,חלב פרה דל שומן (1%) מועשר בסידן ובוויטמינים A...
5,"חלב 3% שומן, מועשר בסידן, תנובה,טרה,יטבתה","Milk, cow, 3% fat, fortified with calcium, Tnu...",58,Cow's milk with 3% fat content.,חלב פרה עם תכולת שומן של 3%.
6,"חלב 3% שומן, מועשר בויטמינים B12, D,E, יטבתה","Milk, cow, 3% fat, fortified with vitamins, Yo...",57,Cow's milk with 3% fat content.,חלב פרה עם תכולת שומן של 3%.
7,"חלב 1% שומן, תנובה, טרה, הרדוף, יטבתה","Milk, cow, 1% fat, Tnuva/Tara/Harduf/Yotvata",43,Low-fat cow’s milk (1%) enriched with calcium ...,חלב פרה דל שומן (1%) מועשר בסידן ובוויטמינים A...
8,"משקה חלב בטעם וניל,3% שומן, טרה","Milk drink, 3% fat, vanilla/banana/mocha, Tara",86,"A flavored milk-based beverage, typically with...","משקה על בסיס חלב, לרוב עם תוספת סוכר או טעמים."
9,"חלב 2% שומן, כולל דל לקטוז, תנובה","Milk, cow, 3% fat, reduced lactose, Tnuva",51,Cow's milk with 3% fat content.,חלב פרה עם תכולת שומן של 3%.


Load and process the CSV data

In [6]:
loader = CSVLoader(file_path=file_path)
docs = loader.load_and_split()
docs[0]

Document(metadata={'source': '/content/drive/MyDrive/Baot/30_NLP_2025/calories_dataset_consistent_rephrasing.csv', 'row': 0}, page_content='shmmitzrach: מי גבינה, חומצי, נוזלי\nenglish_name: Whey, acid, fluid\nfood_energy: 24\nrephrased_english: A liquid dairy byproduct from cheese-making, acidic in nature.\nrephrased_hebrew: תוצר לוואי נוזלי של תהליך ייצור גבינה, חומצי באופיו.')

Initialize the FAISS vector store and OpenAI embeddings

In [7]:
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
# Size the index from one sample embedding, reusing the same embeddings object
index = faiss.IndexFlatL2(len(embeddings.embed_query(" ")))
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

Add the split CSV data to the vector store

In [8]:
vector_store.add_documents(documents=docs)
len(docs) # doc for each row in table

4623

Create the retrieval chain

Query the RAG bot with a question based on the CSV data

In [9]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

retriever = vector_store.as_retriever()

# Set up system prompt
system_prompt = (
    "You are an AI nutrition assistant that estimates the total calories in a dish based on a text description or an image.\n\n"
    "### **Estimation Methodology:**\n"
    "1. **Check for an exact match in the database.**\n"
    "   - If an exact match exists, return its calorie count per 100g.\n"
    "2. **If no exact match exists, break the dish into ingredients and estimate calories.**\n"
    "   - Identify the **most relevant base food** (e.g., a plain omelet for 'cheese omelet').\n"
    "   - Check for **similar variations** (e.g., 'Egg or omelet, fried without oil' as the base).\n"
    "   - **Only include ingredients explicitly mentioned in the description.**\n"
    "   - Add ingredients like cheese based on the closest match in the database. **Do not assume any extra ingredients (e.g., mushrooms) unless explicitly mentioned.**\n"
    "   - Adjust calorie estimates proportionally to the expected ingredient ratio.\n"
    "3. **Do NOT assume extra ingredients unless explicitly mentioned.**\n"
    "4. **Do NOT use the calorie value of a mixed dish (e.g., 'omelet with mushrooms and cheese') as a direct replacement for a different variant (e.g., 'cheese omelet').**\n"
    "5. **Clearly explain the steps taken, including any assumptions about portions.**\n"
    "6. **For each ingredient:**\n"
    "   - Provide the closest match from the database (e.g., 'Egg or omelet, fried without oil') and its calorie count per 100g.\n"
    "   - If the exact calorie count for an ingredient is missing (e.g., for cheese), explain that and provide an estimated serving size (e.g., 150g for eggs, 30g for cheese).\n"
    "   - Use the standard serving size to calculate the calories from each ingredient based on the proportion of the total dish.\n"
    "7. **Provide the final total calories for the dish.**\n\n"
    "Use the retrieved database context below to find accurate calorie values:\n"
    "{context}\n\n"
    "If the exact ingredient is not found, use the closest alternative and explain why.\n"
    "If specific calorie counts are missing, make assumptions based on standard serving sizes and ingredient ratios. Always provide the final total calorie estimate."
)


prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

# Create the question-answer chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [10]:
# Example usage
answer = rag_chain.invoke({"input": "Estimate the calories in a cheese omelet."})
print(answer['answer'])

To estimate the calories in a cheese omelet, we will break down the dish into its primary ingredients and calculate the total calories based on the closest matches in the database. 

Considering the dish is a "cheese omelet," we will start with the base ingredients of eggs and cheese. Since the database does not have an exact match for a "cheese omelet," we will separately estimate the calories for eggs and cheese to calculate the total.

Here are the details for each ingredient:
1. **Eggs:** The closest match in the database is "Egg or omelet, fried without oil" with 149 calories per 100g. Assuming an average serving of 150g for an omelet made from 2-3 eggs, we estimate around 223.5 calories from eggs.
2. **Cheese:** Since the database does not have an exact match for the type of cheese in a cheese omelet, we will estimate based on a popular type like yellow cheese. The closest match is "Egg or omelet with mushrooms, yellow cheese, milk 3% fat, and butter" with 189 calories per 100g. 
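The per-ingredient arithmetic the prompt asks for is simple enough to check by hand. The sketch below reproduces the egg figure from the answer above (150 g at 149 kcal per 100 g); `portion_calories` and `total_calories` are hypothetical helper names, and the cheese value is a placeholder chosen for illustration, not a number from the dataset.

```python
def portion_calories(grams, kcal_per_100g):
    """Calories contributed by a portion, given a per-100 g energy value."""
    return grams * kcal_per_100g / 100

def total_calories(ingredients):
    """Sum the calories of (grams, kcal_per_100g) pairs."""
    return sum(portion_calories(grams, kcal) for grams, kcal in ingredients)

egg_part = portion_calories(150, 149)  # figures quoted in the model's answer
print(egg_part)  # 223.5

# Placeholder cheese value (350 kcal per 100 g), NOT taken from the dataset
omelet_total = total_calories([(150, 149), (30, 350)])
print(omelet_total)  # 328.5
```

Note that the dataset preview at the end of this notebook lists "Egg or omelet, fried without oil" at 162 rather than 149, so the same 150 g portion would come to 243 kcal with the database value.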

# Now with a photo

In [20]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

retriever = vector_store.as_retriever()

# Set up system prompt
system_prompt = (
    "You are an AI nutrition assistant that estimates the total calories in a dish based on an image.\n\n"
    "### **Estimation Methodology:**\n"
    "1. **Analyze the image to identify the dish and its components.**\n"
    "   - Detect and classify the main ingredients (e.g., eggs, cheese, butter).\n"
    "   - If the dish is unclear, provide a confidence-based guess and request clarification if needed.\n"
    "2. **Find the closest ingredient matches in the database.**\n"
    "   - For each identified component, provide the best database match (e.g., 'Egg or omelet, fried without oil' for an omelet).\n"
    "   - List its calorie count per 100g.\n"
    "3. **Estimate the serving size of each ingredient.**\n"
    "   - Use visual estimation techniques to approximate weight (e.g., a whole omelet is ~150g, a cheese slice ~30g).\n"
    "   - If uncertain, provide a range or ask the user for confirmation.\n"
    "4. **Calculate the calorie breakdown.**\n"
    "   - Compute the calorie contribution from each ingredient based on estimated weight.\n"
    "   - Clearly show the breakdown (e.g., 'Eggs: (150g / 100g) * 162 cal = 243 cal').\n"
    "5. **Provide the final total calorie estimate.**\n"
    "   - Sum up the estimated calories for all ingredients.\n"
    "   - Ensure the final answer is clear and concise.\n"
    "6. **Avoid assumptions about extra ingredients.**\n"
    "   - Only include what is visually identified (e.g., do not add mushrooms unless present in the image).\n"
    "   - Do NOT use pre-mixed dish values (e.g., 'omelet with mushrooms and cheese') as a replacement for the detected ingredients.\n"
    "7. **Use the retrieved database context below to find accurate calorie values:**\n"
    "{context}\n\n"
    "If any ingredient is unclear, use the closest alternative and explain why.\n"
    "If serving size estimation is uncertain, provide a reasonable assumption and mention it in the response.\n"
)



prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

# Create the question-answer chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [18]:
import base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

In [19]:
image_path = "/content/drive/MyDrive/Baot/30_NLP_2025/hamburger.jpg"
base64_image = encode_image(image_path)

In [None]:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text":  "Estimate the calories in this dish.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)

In [14]:
with open("/content/drive/MyDrive/Baot/30_NLP_2025/hamburger.jpg", "rb") as image_file:
    answer = rag_chain.invoke({
        "input": "Estimate the calories in this dish.",
        "image": image_file.read()  # Reads binary data of the image
    })

print(answer['answer'])


Sure, I'd be happy to help! Please provide an image of the dish you'd like me to analyze for estimating the total calories.
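The reply above suggests the model never received the image. A plausible explanation is that the prompt template only has `{context}` and `{input}` placeholders, so the extra `image` key has nowhere to go. The snippet below is only an analogy using plain Python string formatting, not LangChain's internals; the template text and byte string are hypothetical.

```python
# A template with a single placeholder, standing in for the chat prompt
template = "Question: {input}"

# The payload carries an extra "image" key that the template never mentions
payload = {
    "input": "Estimate the calories in this dish.",
    "image": b"\xff\xd8\xff",  # stand-in bytes, not a real photo
}

# str.format silently ignores keyword arguments the template does not use
rendered = template.format(**payload)
print(rendered)
```

For the image to reach the model, it would have to be part of the message content itself, which is what the later cells attempt with the OpenAI client directly.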


In [21]:
import openai

# Reuse system_prompt as defined for the image prompt above ("Now with a photo")

# Load image and send request to OpenAI
from openai import OpenAI
client = OpenAI()

with open("/content/drive/MyDrive/Baot/30_NLP_2025/hamburger.jpg", "rb") as image_file:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": [
                {"type": "text", "text": "Estimate the calories in this dish."},
                # Fails: raw bytes cannot be serialized into the JSON request
                # (the next cell base64-encodes the image instead)
                {"type": "image", "image": image_file.read()},
            ]},
        ],
    )

# Print AI response (ChatCompletion objects use attribute access)
print(response.choices[0].message.content)


TypeError: Object of type bytes is not JSON serializable
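The error above can be reproduced without calling the API, since the request body has to be JSON. This minimal sketch uses a four-byte stand-in for the image (hypothetical data, not the hamburger photo) to show that raw bytes are rejected by `json.dumps` while a base64 data URL serializes fine.

```python
import base64
import json

fake_image = b"\xff\xd8\xff\xe0"  # hypothetical stand-in for JPEG bytes

# Raw bytes cannot be placed in a JSON body
try:
    json.dumps({"type": "image", "image": fake_image})
    bytes_ok = True
except TypeError as err:
    bytes_ok = False
    print(err)

# Base64 text wrapped in a data URL is an ordinary string, so it serializes
encoded = base64.b64encode(fake_image).decode("utf-8")
part = {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded}"}}
payload = json.dumps(part)
print(payload)
```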

In [23]:
import openai
import base64

# Reuse system_prompt as defined for the image prompt above ("Now with a photo")

# Load image and send request to OpenAI
from openai import OpenAI
client = OpenAI()

with open("/content/drive/MyDrive/Baot/30_NLP_2025/hamburger.jpg", "rb") as image_file:
    # Encode the image to base64 so it can travel inside the JSON request
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": [
            {"type": "text", "text": "Estimate the calories in this dish."},
            # Pass the image as a base64 data URL in an image_url part
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
        ]},
    ],
)

# Print AI response (ChatCompletion objects use attribute access, not subscripting)
print(response.choices[0].message.content)

In [26]:
# Print AI response
print(response)

ChatCompletion(id='chatcmpl-BFFZML6keYQSa0DwV2ZuUDeffmCUA', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="To estimate the calories in this dish, let's break it down into its components:\n\n1. **Burger (with bun, meat, lettuce, tomato, bacon):**\n   - **Bun:** ~100g, ~250 calories\n   - **Beef patty:** ~150g, ~300 calories\n   - **Lettuce and tomato:** ~30g, ~5 calories\n   - **Bacon:** ~2 strips (20g), ~100 calories\n\n2. **Fries:**\n   - Looks like a small portion, ~100g, ~300 calories\n\n3. **Beer (small glass):**\n   - ~200ml, ~80 calories\n\n### Calorie Breakdown:\n- **Burger:**\n  - Bun: \\( \\frac{100g}{100g} \\times 250 \\text{ cal} = 250 \\text{ cal} \\)\n  - Beef patty: \\( \\frac{150g}{100g} \\times 200 \\text{ cal} = 300 \\text{ cal} \\)\n  - Lettuce and tomato: \\( \\frac{30g}{100g} \\times 15 \\text{ cal} = 5 \\text{ cal} \\)\n  - Bacon: \\( \\frac{20g}{100g} \\times 500 \\text{ cal} = 100 \\text{ cal} \\)\n\n- **Fries
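One thing to note about the direct OpenAI calls: `system_prompt` is sent as a plain string, so its `{context}` placeholder reaches the model unfilled and no database rows are involved. This may explain why the figures above look like generic estimates. A minimal sketch of filling the placeholder by hand follows; `fill_context` and the short template are hypothetical, the sample row comes from the omelet lookup at the end of this notebook, and in a real pipeline the rows would presumably come from the retriever.

```python
def fill_context(prompt_template, rows):
    """Substitute retrieved rows for the literal {context} placeholder."""
    context = "\n\n".join(
        f"english_name: {name}\nfood_energy: {energy}" for name, energy in rows
    )
    # str.replace avoids surprises if the prompt contains other braces
    return prompt_template.replace("{context}", context)

template = "Use the retrieved database context below:\n{context}\n\nAnswer concisely."
retrieved_rows = [("Egg or omelet, fried without oil", 162)]
filled = fill_context(template, retrieved_rows)
print(filled)
```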

In [17]:
import openai

# Reuse system_prompt as defined for the image prompt above ("Now with a photo")

# Load image and send request to OpenAI
# Note: this attempt fails. The 404 below shows that "gpt-4-vision-system" is
# not an available model, and str(bytes) yields "b'...'" rather than base64 text.
with open("/content/drive/MyDrive/Baot/30_NLP_2025/hamburger.jpg", "rb") as image_file:
    response = openai.chat.completions.create(
        model="gpt-4-vision-system",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": [
                {"type": "text", "text": "Estimate the calories in this dish."},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64," + str(image_file.read())}}
            ]}
        ]
    )

# Print AI response (attribute access on the response object)
print(response.choices[0].message.content)

NotFoundError: Error code: 404 - {'error': {'message': 'The model `gpt-4-vision-system` does not exist or you do not have access to it.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

In [11]:
food = 'omelet'
tabular_data[tabular_data['rephrased_english'].str.contains(food.lower(), na=False)]

Unnamed: 0,shmmitzrach,english_name,food_energy,rephrased_english,rephrased_hebrew
985,ביצה או חביתה מטוגנת בשמן סויה,"Egg or omelet, fried in soy oil",194,"Food item: Egg or omelet, fried in soy oil",מזון: ביצה או חביתה מטוגנת בשמן סויה
986,ביצה או חביתה מטוגנת במרגרינה,"Egg or omelet, fried in margarine",179,"Food item: Egg or omelet, fried in margarine",מזון: ביצה או חביתה מטוגנת במרגרינה
987,ביצה או חביתה מטוגנת בשמן חמניות,"Egg or omelet, fried in sunflower oil",194,"Food item: Egg or omelet, fried in sunflower oil",מזון: ביצה או חביתה מטוגנת בשמן חמניות
988,ביצה או חביתה מטוגנת בשמן זית,"Egg or omelet, fried in olive oil",194,"Food item: Egg or omelet, fried in olive oil",מזון: ביצה או חביתה מטוגנת בשמן זית
989,ביצה או חביתה מטוגנת בשמן תירס,"Egg or omelet, fried in corn oil",228,"Food item: Egg or omelet, fried in corn oil",מזון: ביצה או חביתה מטוגנת בשמן תירס
990,ביצה או חביתה מטוגנת בשמן קנולה,"Egg or omelet, fried in canola oil",193,"Food item: Egg or omelet, fried in canola oil",מזון: ביצה או חביתה מטוגנת בשמן קנולה
991,ביצה או חביתה מטוגנת ללא שמן,"Egg or omelet, fried without oil",162,"Food item: Egg or omelet, fried without oil",מזון: ביצה או חביתה מטוגנת ללא שמן
999,ביצה או חביתה עם חלב בשמן סויה,Egg or omelet with milk in soy oil,218,Food item: Egg or omelet with milk in soy oil,מזון: ביצה או חביתה עם חלב בשמן סויה
1000,ביצה או חביתה עם פטריות גבינה צהובה חלב 3% ושמ...,"Egg or omelet with mushrooms, yellow cheese, m...",199,"Food item: Egg or omelet with mushrooms, yello...",מזון: ביצה או חביתה עם פטריות גבינה צהובה חלב ...
1001,ביצה או חביתה עם נקניק פסטרמה מטוגן בשמן קנולה,Egg or omelet with pastrami fried in canola oil,185,Food item: Egg or omelet with pastrami fried i...,מזון: ביצה או חביתה עם נקניק פסטרמה מטוגן בשמן...
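The pandas filter above lowercases only the search keyword, so a capitalised "Omelet" in the column would not match. A case-insensitive lookup can be sketched in plain Python as below; `find_foods` is a hypothetical helper, and the sample rows are copied from the outputs in this notebook.

```python
def find_foods(rows, keyword):
    """Return rows whose English name contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [row for row in rows if needle in row["english_name"].lower()]

sample_rows = [
    {"english_name": "Egg or omelet, fried without oil", "food_energy": 162},
    {"english_name": "Milk, human", "food_energy": 70},
    {"english_name": "Egg or omelet, fried in olive oil", "food_energy": 194},
]

matches = find_foods(sample_rows, "Omelet")
print([row["food_energy"] for row in matches])
```

A keyword check like this is a handy way to confirm which rows exist in the database before judging whether the retriever surfaced the right ones.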
