<h4>Gemini API Setup</h4>
<p>To compare product suggestions from Bing and ChatGPT, we extract the recommended products from two kinds of text: ChatGPT’s response text and the page content collected for each Bing search result. The Gemini API performs this extraction, turning free-form text into a structured list of product names.</p>
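<p>The notebook itself stops at extraction. As an illustration of what the comparison could look like once both product lists exist, <code>compare_recommendations</code> below is a hypothetical sketch (not the authors' metric). It normalizes names by case and whitespace, reports shared and exclusive products, and computes the Jaccard similarity of the two sets. Real product names vary more than this (model numbers, brand prefixes), so exact matching after normalization is a simplifying assumption.</p>

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace (hypothetical matching rule)."""
    return " ".join(name.casefold().split())


def compare_recommendations(chatgpt: list[str], bing: list[str]) -> dict:
    """Compare two product lists by overlap after simple name normalization."""
    a = {normalize(p) for p in chatgpt}
    b = {normalize(p) for p in bing}
    shared = a & b
    union = a | b

    # Report shared products in ChatGPT's order so ranking information survives.
    shared_in_order: list[str] = []
    for p in chatgpt:
        key = normalize(p)
        if key in shared and key not in shared_in_order:
            shared_in_order.append(key)

    return {
        "shared": shared_in_order,
        "only_chatgpt": sorted(a - b),
        "only_bing": sorted(b - a),
        # Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0 for two empty lists.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

<p>With made-up inputs such as <code>["Alpha Blender 500", "Gamma Mixer"]</code> and <code>["alpha  blender 500", "Delta Juicer"]</code>, the first product counts as shared despite the different casing and spacing.</p>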

In [1]:
import os
import json
from google import genai
from google.genai import types
from dotenv import load_dotenv

<p>We use the Gemini 2.5 Pro model to extract recommended products from long texts because of its strong natural language understanding and large context window. In a single request, the model can read an entire passage, identify the products it mentions, and infer how those products relate to the user's stated preferences.</p>

In [2]:
# Gemini setup: load GOOGLE_API_KEY from a local .env file
load_dotenv()
google_api_key = os.getenv("GOOGLE_API_KEY")
client = genai.Client(api_key=google_api_key)

# Model
gemini_model = "gemini-2.5-pro"

# Behavior
with open("gemini_behavior.txt", "r", encoding="utf-8") as f:
    gemini_behavior = f.read().strip()

# Rate limit (maximum requests per minute)
gemini_rpm = 150

<p>
We configure Gemini 2.5 Pro to return the extracted products as a JSON object whose <code>products</code> field is a list of strings.
This keeps the output structured, machine-readable, and easy to parse downstream.
The model is also instructed, through the system instruction loaded from <code>gemini_behavior.txt</code>, to keep the products in the order they are mentioned in the text, so the original sequence of recommendations, and any ranking or preference it implies, is preserved.
</p>
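<p>Even with a response schema, a missing or malformed reply is worth handling defensively. <code>parse_products</code> is a hypothetical helper (part of neither the notebook nor google-genai) that could stand in for the <code>json.loads(...)</code> / <code>result.get(...)</code> lines of <code>get_recommended_products</code>. It returns an empty list for anything that is not a <code>{"products": [...]}</code> object and drops blank or repeated names while keeping the first mention, so the model's ordering survives. Treating names that differ only in case as duplicates is an assumption.</p>

```python
import json
from typing import Optional


def parse_products(raw: Optional[str]) -> list[str]:
    """Parse a {"products": [...]} JSON reply into an ordered, de-duplicated list.

    Hypothetical helper: returns [] for missing or malformed replies instead of raising.
    """
    if not raw:
        return []
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    # Anything other than an object with a list under "products" is treated as empty.
    items = data.get("products", []) if isinstance(data, dict) else []
    if not isinstance(items, list):
        return []

    products: list[str] = []
    seen: set[str] = set()
    for item in items:
        if not isinstance(item, str):
            continue
        name = item.strip()
        key = name.casefold()
        # Keep only the first mention, so the model's ordering is preserved.
        if name and key not in seen:
            seen.add(key)
            products.append(name)
    return products
```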

In [3]:
def get_recommended_products(product_text: str) -> list[str]:
    try:
        response = client.models.generate_content(
            model = gemini_model,
            config=types.GenerateContentConfig(
                system_instruction = gemini_behavior,
                temperature = 0,
                response_mime_type = "application/json",  # Force JSON output
                response_schema = {
                    "type": "object",
                    "properties": {
                        "products": {
                            "type": "array",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["products"]
                }
            ),
            contents = product_text
        )
        
        # parse json response
        result = json.loads(response.text)
        return result.get("products", [])
        
    except Exception as e:
        print(f"Error: {e}")
        return []
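<p><code>get_recommended_products</code> catches every exception and returns an empty list, so a failed request (for example, one rejected for exceeding the rate limit) cannot be told apart from a text that recommends no products. One option is to let the exception propagate and retry with exponential backoff. <code>call_with_retries</code> is a hypothetical, generic sketch, not part of google-genai; the default attempt count and delays are assumed values.</p>

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff (base_delay, 2*base_delay, ...).

    Re-raises the last exception once all attempts fail, so the caller can
    distinguish a failed request from an empty product list.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Double the wait after each failed attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

<p>For example, wrapping the raw API call as <code>call_with_retries(lambda: client.models.generate_content(model=gemini_model, contents=product_text, config=...))</code> would retry transient failures, and JSON parsing would then run only on a successful response.</p>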

<h4>Recommended Products Extraction</h4>

In [4]:
import pandas as pd
import time

In [5]:
raw_chatgpt_path = "../../data/ChatGPT data/raw/chatgpt_chrome_ext.xlsx"
raw_bing_path = "../../data/Bing data/raw/bing_chrome_ext.xlsx"

chatgpt_df = pd.read_excel(raw_chatgpt_path).dropna(how = "all").reset_index(drop = True)
bing_df = pd.read_excel(raw_bing_path).dropna(how = "all").reset_index(drop = True)

chatgpt_df["recommended_products"], bing_df["recommended_products"] = None, None

In [6]:
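def extract_recommended_products(df: pd.DataFrame, text_col: str, save_path: str) -> pd.DataFrame:
    # Wait twice the minimum interval implied by gemini_rpm (60 / 150 * 2 = 0.8 s),
    # keeping the request rate at no more than half of the per-minute limit.
    delay_per_call = 60.0 / gemini_rpm * 2

    for idx, row in df.iterrows():
        print(f"Processing row {idx + 1} / {len(df)}")

        text = row[text_col]
        if pd.isna(text) or str(text).strip() == "":
            print(f"Skipping empty row {idx + 1}...")
            continue

        df.at[idx, "recommended_products"] = get_recommended_products(text)

        # Checkpoint after every row. save_path is an .xlsx file and Excel cells
        # cannot hold Python lists, so lists are stored as JSON strings.
        df.assign(
            recommended_products = df["recommended_products"].map(
                lambda v: json.dumps(v, ensure_ascii = False) if isinstance(v, list) else v
            )
        ).to_excel(save_path, index = False)

        # rate limiting delay
        time.sleep(delay_per_call)

    return df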
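<p><code>delay_per_call = 60.0 / gemini_rpm * 2</code> works out to 60 / 150 * 2 = 0.8 s, i.e. at most 75 requests per minute, half of <code>gemini_rpm</code>. Because the sleep starts only after the API call returns, the time spent waiting on each response is added on top of those 0.8 s. A throttle that measures from the start of one request to the start of the next, and sleeps only for the remainder, keeps the same ceiling without that extra idle time. <code>Throttle</code> is a hypothetical sketch; its <code>safety_factor=2.0</code> mirrors the notebook's <code>* 2</code>.</p>

```python
import time
from typing import Optional


class Throttle:
    """Keep request starts at least `min_interval` seconds apart.

    Hypothetical helper: unlike a fixed sleep after each call, time already
    spent waiting on the API counts toward the interval.
    """

    def __init__(self, requests_per_minute: float, safety_factor: float = 2.0):
        # 150 rpm with the notebook's 2x margin -> 60 / 150 * 2 = 0.8 s
        self.min_interval = 60.0 / requests_per_minute * safety_factor
        self._last_start: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request may start, then record its start time."""
        now = time.monotonic()
        if self._last_start is not None:
            remaining = self.min_interval - (now - self._last_start)
            if remaining > 0:
                time.sleep(remaining)
        self._last_start = time.monotonic()
```

<p>Inside <code>extract_recommended_products</code>, one would create <code>throttle = Throttle(gemini_rpm)</code> before the loop, call <code>throttle.wait()</code> immediately before <code>get_recommended_products(...)</code>, and drop the trailing <code>time.sleep(delay_per_call)</code>.</p>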

<h4>ChatGPT</h4>

In [7]:
chatgpt_df.columns

Index(['market_type', 'product', 'query_level', 'query_index', 'run_number',
       'query', 'response_text', 'web_search_forced', 'sources_cited',
       'sources_additional', 'recommended_products'],
      dtype='object')

In [8]:
# ChatGPT
# modified_chatgpt_path = "../../data/ChatGPT data/modified/chatgpt.xlsx"
# chatgpt_df = extract_recommended_products(df = chatgpt_df, text_col = "response_text", save_path = modified_chatgpt_path)

<h4>Bing</h4>

In [9]:
bing_df.columns

Index(['market_type', 'product', 'query_level', 'query', 'position', 'page',
       'title', 'url', 'domain', 'display_url', 'snippet', 'content',
       'content_length', 'content_error', 'manual_content_inspection',
       'recommended_products'],
      dtype='object')
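<p>The two DataFrames differ in granularity: <code>chatgpt_df</code> holds one row per ChatGPT response, while <code>bing_df</code> holds one row per search result, identified by <code>page</code> and <code>position</code>. Before comparing, the per-result Bing lists would presumably need to be merged into one list per query. <code>merge_bing_products</code> is a hypothetical sketch that works on plain records (for example <code>bing_df.to_dict("records")</code>) and expects <code>recommended_products</code> to hold Python lists. It assumes that sorting by <code>(page, position)</code> reflects Bing's ranking and that a product keeps its highest-ranked appearance.</p>

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def merge_bing_products(records: Iterable[Mapping[str, Any]]) -> dict[str, list[str]]:
    """Merge per-result product lists into one ordered list per query.

    Hypothetical helper. Each record needs the bing_df fields "query", "page",
    "position" and "recommended_products". Results are visited in (page, position)
    order and a product keeps its first, highest-ranked appearance.
    """
    by_query: dict[str, list[Mapping[str, Any]]] = defaultdict(list)
    for rec in records:
        by_query[rec["query"]].append(rec)

    merged: dict[str, list[str]] = {}
    for query, rows in by_query.items():
        rows.sort(key=lambda r: (r["page"], r["position"]))
        seen: set[str] = set()
        ordered: list[str] = []
        for row in rows:
            products = row.get("recommended_products")
            # Skipped rows hold None (or NaN once reloaded from disk), not a list.
            if not isinstance(products, list):
                continue
            for product in products:
                key = " ".join(product.casefold().split())
                if key not in seen:
                    seen.add(key)
                    ordered.append(product)
        merged[query] = ordered
    return merged
```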

In [10]:
# Bing
# modified_bing_path = "../../data/Bing data/modified/bing.xlsx"
# bing_df = extract_recommended_products(df = bing_df, text_col = "content", save_path = modified_bing_path)
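<p>The checkpoint file stores each product list as text, so reading it back yields strings rather than lists. <code>parse_saved_products</code> is a hypothetical helper that accepts either JSON text (<code>["A", "B"]</code>) or Python-literal text (<code>['A', 'B']</code>, the form <code>str(list)</code> produces) and maps empty or unparsable cells to an empty list.</p>

```python
import ast
import json
from typing import Any


def parse_saved_products(value: Any) -> list[str]:
    """Turn a stored product-list cell back into a Python list.

    Accepts JSON or Python-literal list text; anything else
    (NaN, None, blank or unparsable text) becomes [].
    """
    if isinstance(value, list):
        return value
    if not isinstance(value, str) or not value.strip():
        return []
    # Try strict JSON first, then Python literal syntax (single quotes).
    for parser in (json.loads, ast.literal_eval):
        try:
            parsed = parser(value)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(parsed, list):
            return [str(p) for p in parsed]
    return []
```

<p>Applied column-wise, e.g. <code>chatgpt_df["recommended_products"] = chatgpt_df["recommended_products"].map(parse_saved_products)</code>, this restores list values for downstream comparison.</p>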