Using any OpenAI model or a Groq-hosted model (Llama 70B), solve the aisle mapping problem. OpenAI code is provided here. As usual, there are three setup steps: mount Google Drive, load your API key, and install the client library for your chosen LLM; then import pandas. Your goal is to compare models and validate their output.
**The code below works with the OpenAI gpt-4o model. It has not been tested on the Llama model. Running it on the full dataset may blow your budget if you are not careful with the size of the test dataset, so exercise caution.**
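
One way to keep spending predictable is to cap the number of API calls before anything is sent. The sketch below is only an illustration: `plan_batches` and its `max_calls` parameter are hypothetical names, not part of the assignment code, and the item names are placeholders rather than rows from the dataset.

```python
def plan_batches(items, batch_size=50, max_calls=5):
    """Split items into batches of batch_size, keeping at most max_calls batches."""
    if batch_size <= 0 or max_calls <= 0:
        raise ValueError("batch_size and max_calls must be positive")
    # Truncate first so the number of requests can never exceed max_calls
    capped = items[: batch_size * max_calls]
    return [capped[i:i + batch_size] for i in range(0, len(capped), batch_size)]

# Placeholder items (assumption: the real list comes from the spreadsheet)
sample_items = [f"item {i}" for i in range(1, 231)]
batches = plan_batches(sample_items, batch_size=50, max_calls=3)
print(len(batches), sum(len(b) for b in batches))  # 3 150
```

Each batch corresponds to one chat completion request, so `max_calls` is a direct upper bound on the number of billed calls for a test run.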

In [1]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [2]:
!pip install -r "/content/drive/MyDrive/LLM/Groq/requirements.txt"

Collecting groq (from -r /content/drive/MyDrive/LLM/Groq/requirements.txt (line 1))
  Downloading groq-0.9.0-py3-none-any.whl (103 kB)
Collecting langchain (from -r /content/drive/MyDrive/LLM/Groq/requirements.txt (line 2))
  Downloading langchain-0.2.3-py3-none-any.whl (974 kB)
Collecting httpx<1,>=0.23.0 (from groq->-r /content/drive/MyDrive/LLM/Groq/requirements.txt (line 1))
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting langchain-core<0.3.0,>=0.2.0 (from langchain->-r /content/drive/MyDrive/LLM/Groq/requirements.txt (line 2))
  Downloading langchain_core-0.2.5-py3-none-any.whl (314 kB)

In [None]:
import os  ## step-3

def load_exports(path):
    """Set environment variables from the `export VAR=value` lines of a file."""
    with open(path) as file:
        for line in file:
            if line.startswith('export '):
                # Split on the first '=' only, since a value may itself contain '='
                var, _, value = line[len('export '):].strip().partition('=')
                # Drop surrounding quotes, which a shell would also remove
                os.environ[var.strip()] = value.strip().strip('"\'')

# Read and set the environment variables from the .bashrc files
load_exports('/content/drive/MyDrive/LLM/Groq/.bashrc')  # Groq key
load_exports('/content/drive/MyDrive/LLM/.bashrc')       # OpenAI key

# Verify that the environment variables are set
# (this prints the keys, so clear the cell output before sharing the notebook)
!echo $GROQ_API_KEY
!echo $OPENAI_API_KEY

In [16]:

from groq import Groq
client = Groq(api_key = os.getenv('GROQ_API_KEY'))

In [17]:
import pandas as pd
df = pd.read_excel('Aisle-Mapping.xlsx')

In [18]:
groceries = df['Grocery ITEM'].dropna().tolist()
aisles = df['Aisle Category'].dropna().tolist()
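
The grocery column can contain repeated items, and every repeat adds tokens to the prompt. A minimal sketch for removing repeats while keeping the original order is shown here; `unique_in_order` is a hypothetical helper, not part of the provided code, and the example list is made up for illustration.

```python
def unique_in_order(items):
    """Drop repeated items, keeping first-occurrence order.

    Values are converted to str and stripped of surrounding whitespace first,
    so 'pastry' and 'pastry ' count as the same item.
    """
    return list(dict.fromkeys(str(item).strip() for item in items))

example = ["sausage", "pastry", "whole milk", "pastry ", "sausage"]
print(unique_in_order(example))  # ['sausage', 'pastry', 'whole milk']
```

Applied to the lists above, this would be `groceries = unique_in_order(groceries)` and `aisles = unique_in_order(aisles)`.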

In [19]:
batch_size = 50  # Adjust the batch size as needed
separator = ","

# Function to get the best matches for a batch of grocery items using chat completion
def get_best_matches_batch(grocery_batch, aisle_list):
    """Ask the model to map each item in grocery_batch to one aisle in aisle_list.

    Returns the raw text of the model's reply (expected to contain JSON).
    """
    prompt = "Match each grocery item in the grocery list with the most appropriate aisle category from the provided list of aisle categories.\n\n"
    prompt += "The grocery list items are separated by commas. The aisle categories are also separated by commas.\n\n"
    prompt += "List of grocery items to match:\n\n\n" + separator.join(grocery_batch) + "\n\n"
    prompt += "List of provided aisle categories:\n\n\n" + separator.join(aisle_list) + "\n\n"
    prompt += "If an appropriate aisle category is not to be found in the list, use Other.\n\n"
    prompt += "Return the matches in the JSON format 'grocery item' : 'aisle category'.\n\n"
    prompt += "You must absolutely make sure that each grocery item is mapped to an aisle category."
    #print(prompt)
    response = client.chat.completions.create(
        model="mixtral-8x7b-32768",  # model used in this run; replace with the model under evaluation
        messages=[
            {"role": "system", "content": "You are a helpful assistant that matches grocery items to aisle categories based on a typical grocery store or a supermarket in the USA."},
            {"role": "user", "content": prompt}
        ],
        max_tokens=4096,
        temperature=0.0  # low-variance output makes model comparison easier
    )
    matches = response.choices[0].message.content
    return matches
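
The function returns the model's reply as raw text, and a model may add commentary before or after the JSON. For validation it helps to turn the reply into a dictionary and check which items were left unmapped. The sketch below assumes the reply contains a single JSON object and that no stray `{` appears before it; `parse_matches` and `missing_items` are hypothetical helpers, and the reply string is an invented example.

```python
import json

def parse_matches(reply_text):
    """Extract the first JSON object from a model reply and return it as a dict."""
    start = reply_text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value and ignores any text that follows it
    obj, _ = json.JSONDecoder().raw_decode(reply_text[start:])
    if not isinstance(obj, dict):
        raise ValueError("reply did not contain a JSON object")
    return obj

def missing_items(grocery_batch, matches):
    """Return the items in grocery_batch that the model did not map."""
    return [item for item in grocery_batch if item not in matches]

# Invented reply text, used only to exercise the helpers
reply = 'Here are the matches:\n{"whole milk": "Dairy", "beef": "Meat"}\nPlease note ...'
matches = parse_matches(reply)
print(matches)
print(missing_items(["beef", "coffee"], matches))
```

If the reply repeats a key, `json` keeps the last value for that key, so repeated grocery items collapse to a single entry.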

In [20]:
grocery_batch = groceries[0: batch_size]
batch_matches = get_best_matches_batch(grocery_batch, aisles)
print(batch_matches)




{
"tropical fruit": "Produce",
"whole milk": "Dairy",
"pip fruit": "Produce",
"other vegetables": "Produce",
"rolls/buns": "Bakery",
"pot plants": "Garden",
"beef": "Meat",
"frankfurter": "Meat",
"chicken": "Meat",
"butter": "Dairy",
"fruit/vegetable juice": "Beverages",
"packaged fruit/vegetables": "Produce",
"chocolate": "Candy",
"specialty bar": "Candy",
"butter milk": "Dairy",
"yogurt": "Dairy",
"sausage": "Meat",
"brown bread": "Bakery",
"hamburger meat": "Meat",
"root vegetables": "Produce",
"pork": "Meat",
"pastry": "Bakery",
"canned beer": "Alcohol",
"citrus fruit": "Produce",
"berries": "Produce",
"misc. beverages": "Beverages",
"coffee": "Misc",
"canned goods": "Canned Goods",
"pastry": "Bakery",
"misc. beverages": "Beverages",
"root vegetables": "Produce",
"sausage": "Meat"
}

Please note that some items like "sausage" and "pastry" appear in the grocery list more than once, but they should still be matched to the same aisle category. Also, some items like "misc. beverages" a