In [1]:
import datasets

# Load Geo-Bench
train_dataset = datasets.load_dataset("GEO-optim/geo-bench", 'train')
test_dataset = datasets.load_dataset("GEO-optim/geo-bench", 'test')

ModuleNotFoundError: No module named 'datasets'

In [None]:
# Checking how the dataset looks

print(train_dataset['train'][0])
print(test_dataset['test'][0]['sources'][0])
print(test_dataset['test'][0])
print(test_dataset['test'][0]['query'])
print(test_dataset['test'][1]['sources'])

{'query': "how does the pokemon bank work? What can I do with it? I'm considering getting it", 'dset': 'eli5', 'sugg_idx': 3}
{'raw_text': 'Sports in KarnatakaThis style and content may require cleanup to meet Wikipedia\'s quality standards. (November 2011)Cricket is by far the most popular sport in Karnataka with International cricket matches attracting a sizeable number of spectators who are willing to pay more than the standard ticket price to get a chance to watch the match.[1] The sports related infrastructure is mainly concentrated in Bangalore which also played host to the 4th National Games of India in the year 1997.[2] Bangalore is also the location of the Sports Authority of India (SAI) which is the premier sports institute in the country.[3] Karnataka is sometimes referred to as the cradle of Indian swimming because of high standards in swimming compared to other states.[4]Association Football[edit]Amidst of cricket, which is the most popular sport of Karnataka, football fin

In [None]:
# Experimenting with Gemini's API

from google import genai
from google.genai import types
client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-pro", 
    contents="Explain how AI works in short",
)
print(response.text)

Of course. Here is a simple explanation of how AI works.

Think of it like teaching a child to recognize a cat.

**1. You give it lots of examples (Data).**
You don't write a list of rules like "a cat has fur, four legs, and pointy ears." Instead, you just show the child thousands of pictures, pointing out, "This is a cat," "This is also a cat," and "This is a dog, *not* a cat."

**2. It learns to find patterns (Training).**
The child's brain (the AI model) starts to figure out the common features on its own. It notices that things labeled "cat" usually have whiskers, a certain eye shape, and pointy ears. It creates its own complex set of internal "rules" based on these patterns.

**3. It makes a prediction (Inference).**
Now, you show the child a picture of a new cat it has never seen before. Using the patterns it learned, it makes an educated guess: **"That's a cat!"**

---

That’s the core of how most modern AI works.

*   **ChatGPT** was trained on a massive amount of text from the

In [None]:
# Google's latest Gemini SDK is google.genai. The following is the old one.
import google.generativeai as googlegenai
model = 'gemini-2.5-pro'
model_client = googlegenai.GenerativeModel(model)
response = model_client.generate_content(
                    "Explain how AI works in short",
                    generation_config={
                        'temperature': 0.5,
                        'top_p': 1.0,
                        'max_output_tokens': 1024,
                    }
                )
response

response:
GenerateContentResponse(
    done=True,
    iterator=None,
    result=protos.GenerateContentResponse({
      "candidates": [
        {
          "content": {
            "role": "model"
          },
          "finish_reason": "MAX_TOKENS",
          "index": 0
        }
      ],
      "usage_metadata": {
        "prompt_token_count": 7,
        "total_token_count": 1030
      },
      "model_version": "gemini-2.5-pro"
    }),
)

In [None]:
# Pipeline which picks up queries from the test dataset and queries Gemini using the sources mentioned in the query.
import datasets
from google import genai
from google.genai import types

# Load Geo-Bench
train_dataset = datasets.load_dataset("GEO-optim/geo-bench", 'train')
test_dataset = datasets.load_dataset("GEO-optim/geo-bench", 'test')

query_prompt = """Write an accurate and concise answer for the given user question, using _only_ the provided summarized web 
search results. The answer should be correct, high-quality, and written by an expert using an unbiased and journalistic tone. 
The user's language of choice such as English, Français, Español, Deutsch, or 日本語 should be used. The answer should be 
informative, interesting, and engaging. The answer's logic and reasoning should be rigorous and defensible. Every sentence 
in the answer should be _immediately followed_ by an in-line citation to the search result(s). The cited search result(s) 
should fully support _all_ the information in the sentence. Search results need to be cited using [index]. When citing 
several search results, use [1][2][3] format rather than [1, 2, 3]. You can use multiple search results to respond 
comprehensively while avoiding irrelevant search results.

Question: {query}

Search Results:
{source_text}
"""

response = ""

for row in test_dataset['test']:
    query = row['query']
    sources = row['sources']
    print("Query: " + query)
    source_text = ""
    response = ""

    for i, s in enumerate(sources):
        source_text += "Source " + str(i + 1) + ": " + s['cleaned_text']
        source_text += "\n\n\n\n"

    print("Source Text:\n\n" + source_text)

    client = genai.Client()
    response = client.models.generate_content(
        model="gemini-2.5-pro", 
        contents=query_prompt.format(query = query, source_text = source_text),
    )

    print("Response:\n\n" + response.text)

    # Testing only one query
    break

Query: mention the names of any 3 famous folklore sports in karnataka state
Source Text:

Source 1: Sports in Karnataka
Cricket is by far the most popular sport in Karnataka with International cricket matches attracting a sizeable number of spectators who are willing to pay more than the standard ticket price to get a chance to watch the match. The sports related infrastructure is mainly concentrated in Bangalore which also played host to the 4th National Games of India in the year 1997. Bangalore is also the location of the Sports Authority of India (SAI) which is the premier sports institute in the country. Karnataka is sometimes referred to as the cradle of Indian swimming because of high standards in swimming compared to other states.

Amidst of cricket, which is the most popular sport of Karnataka, football finds its way in the state and attracts good amount of spectators during Indian Super League games of the club, Bengaluru FC. The game is also popular in the districts of Balla

ReadError: [Errno 54] Connection reset by peer

In [None]:
# Test the position-adjusted word count metric 
# Observation: it counts the punctuation marks and stylistic characters (such as '*') as words,
# leading to a miscalculation

from utils import extract_citations_new, impression_wordpos_count_simple
x = extract_citations_new(response.text)

print("Response:\n" + response.text)
print("\nExtracted citations:")
for i in x:
    print(i)

print("\nPosition-adjusted word count = " + str(impression_wordpos_count_simple(x)))

Response:
Based on the provided information, three famous folklore sports in the state of Karnataka are Kambala, Kobri Hori Sparde, and Kichchu Haisodu [2].

*   **Kambala** Kambala is an annual buffalo race and a rural folk sport traditionally organized by farming families after the harvest of the rabi crop [2][3]. The event, which takes place in paddy fields filled with slush, is held between November and March [2].
*   **Kobri Hori Sparde** Considered one of the state's most popular rural sports, Kobri Hori Sparde translates to "catch a running bull" [2]. The sport is held during Diwali and involves farmers and youngsters attempting to subdue a bull in order to tie dry coconuts around its sharpened horns [2].
*   **Kichchu Haisodu** This is a ritualistic sport associated with farming that is observed during Makara Sankranti in the old Mysuru region, including areas around Bengaluru, Mandya, and Hassan [2]. In this event, cattle are decorated, their horns are painted, and they are ma
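
The observation above (punctuation and Markdown characters such as '*' being counted as words) can be reproduced and avoided without NLTK. The following is a minimal sketch, not the code in utils: `count_words` is a hypothetical helper, and it assumes citations always look like `[2]` or `[2][3]`.

```python
import re

# Citation markers such as [2]; [2][3] is matched as two separate markers.
CITATION_PATTERN = r'\[[^\w\s]*\d+[^\w\s]*\]'

def count_words(sentence):
    """Return (tokens, cited indices) with citations and punctuation removed."""
    cited = [int(re.findall(r'\d+', c)[0])
             for c in re.findall(CITATION_PATTERN, sentence)]
    # Drop the citation markers first so their digits are not counted as words.
    no_citations = re.sub(CITATION_PATTERN, '', sentence)
    # Drop everything that is neither a word character nor whitespace ('*', '.', ',').
    no_punctuation = re.sub(r'[^\w\s]', '', no_citations)
    return no_punctuation.split(), cited

tokens, cited = count_words('*   **Kambala** is a buffalo race [2][3].')
print(tokens)
print(cited)
```

Splitting on whitespace is a simplification of a real tokenizer; it is only meant to show that the bullet and bold markers no longer inflate the word count.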

In [None]:
# Added a helper function to disregard punctuation marks and stylistic characters (such as '*')

# Previous response:
# Extracted citations:
# [(['Based', 'on', 'the', 'provided', 'information', ',', 'three', 'famous', 'folklore', 'sports', 'in', 'the', 'state', 'of', 'Karnataka', 'are', 'Kambala', ',', 'Kobri', 'Hori', 'Sparde', ',', 'and', 'Kichchu', 'Haisodu', '[', '2', ']', '.'], 'Based on the provided information, three famous folklore sports in the state of Karnataka are Kambala, Kobri Hori Sparde, and Kichchu Haisodu [2].', [2])]
# [(['*', '*', '*', 'Kambala', '*', '*', 'Kambala', 'is', 'an', 'annual', 'buffalo', 'race', 'and', 'a', 'rural', 'folk', 'sport', 'traditionally', 'organized', 'by', 'farming', 'families', 'after', 'the', 'harvest', 'of', 'the', 'rabi', 'crop', '[', '2', ']', '[', '3', ']', '.'], '*   **Kambala** Kambala is an annual buffalo race and a rural folk sport traditionally organized by farming families after the harvest of the rabi crop [2][3].', [2, 3]), (['The', 'event', ',', 'which', 'takes', 'place', 'in', 'paddy', 'fields', 'filled', 'with', 'slush', ',', 'is', 'held', 'between', 'November', 'and', 'March', '[', '2', ']', '.'], 'The event, which takes place in paddy fields filled with slush, is held between November and March [2].', [2]), (['*', '*', '*', 'Kobri', 'Hori', 'Sparde', '*', '*', 'Considered', 'one', 'of', 'the', 'state', "'s", 'most', 'popular', 'rural', 'sports', ',', 'Kobri', 'Hori', 'Sparde', 'translates', 'to', '``', 'catch', 'a', 'running', 'bull', "''", '[', '2', ']', '.'], '*   **Kobri Hori Sparde** Considered one of the state\'s most popular rural sports, Kobri Hori Sparde translates to "catch a running bull" [2].', [2]), (['The', 'sport', 'is', 'held', 'during', 'Diwali', 'and', 'involves', 'farmers', 'and', 'youngsters', 'attempting', 'to', 'subdue', 'a', 'bull', 'in', 'order', 'to', 'tie', 'dry', 'coconuts', 'around', 'its', 'sharpened', 'horns', '[', '2', ']', '.'], 'The sport is held during Diwali and involves farmers and youngsters attempting to subdue a bull in order to tie dry coconuts around its sharpened horns [2].', [2]), (['*', '*', '*', 'Kichchu', 'Haisodu', '*', '*', 'This', 'is', 'a', 'ritualistic', 'sport', 'associated', 'with', 'farming', 'that', 'is', 'observed', 'during', 'Makara', 'Sankranti', 'in', 'the', 'old', 'Mysuru', 'region', ',', 'including', 'areas', 'around', 'Bengaluru', ',', 'Mandya', ',', 'and', 'Hassan', '[', '2', ']', '.'], '*   **Kichchu Haisodu** This is a ritualistic sport associated with farming that is 
observed during Makara Sankranti in the old Mysuru region, including areas around Bengaluru, Mandya, and Hassan [2].', [2]), (['In', 'this', 'event', ',', 'cattle', 'are', 'decorated', ',', 'their', 'horns', 'are', 'painted', ',', 'and', 'they', 'are', 'made', 'to', 'jump', 'over', 'a', 'fire', ',', 'a', 'practice', 'farmers', 'believe', 'kills', 'insects', 'on', 'the', 'animals', "'", 'bodies', '[', '2', ']', '.'], "In this event, cattle are decorated, their horns are painted, and they are made to jump over a fire, a practice farmers believe kills insects on the animals' bodies [2].", [2])]
# 
# Position-adjusted word count = [0.0, 0.9073443266149541, 0.09265567338504593, 0.0, 0.0]

from utils import extract_citations_new, impression_wordpos_count_simple
response = """Based on the provided information, three famous folklore sports in the state of Karnataka are Kambala, Kobri Hori Sparde, and Kichchu Haisodu [2].

*   **Kambala** Kambala is an annual buffalo race and a rural folk sport traditionally organized by farming families after the harvest of the rabi crop [2][3]. The event, which takes place in paddy fields filled with slush, is held between November and March [2].
*   **Kobri Hori Sparde** Considered one of the state's most popular rural sports, Kobri Hori Sparde translates to "catch a running bull" [2]. The sport is held during Diwali and involves farmers and youngsters attempting to subdue a bull in order to tie dry coconuts around its sharpened horns [2].
*   **Kichchu Haisodu** This is a ritualistic sport associated with farming that is observed during Makara Sankranti in the old Mysuru region, including areas around Bengaluru, Mandya, and Hassan [2]. In this event, cattle are decorated, their horns are painted, and they are made to jump over a fire, a practice farmers believe kills insects on the animals' bodies [2]."""

x = extract_citations_new(response)

print("Response:\n" + response)
print("\nExtracted citations:")
for i in x:
    print(i)

print("\nPosition-adjusted word count = " + str(impression_wordpos_count_simple(x)))

Response:
Based on the provided information, three famous folklore sports in the state of Karnataka are Kambala, Kobri Hori Sparde, and Kichchu Haisodu [2].

*   **Kambala** Kambala is an annual buffalo race and a rural folk sport traditionally organized by farming families after the harvest of the rabi crop [2][3]. The event, which takes place in paddy fields filled with slush, is held between November and March [2].
*   **Kobri Hori Sparde** Considered one of the state's most popular rural sports, Kobri Hori Sparde translates to "catch a running bull" [2]. The sport is held during Diwali and involves farmers and youngsters attempting to subdue a bull in order to tie dry coconuts around its sharpened horns [2].
*   **Kichchu Haisodu** This is a ritualistic sport associated with farming that is observed during Makara Sankranti in the old Mysuru region, including areas around Bengaluru, Mandya, and Hassan [2]. In this event, cattle are decorated, their horns are painted, and they are ma

In [None]:
from utils import extract_citations_new, impression_subjective_impression, impression_wordpos_count_simple, impression_subjpos_detailed, impression_diversity_detailed, impression_uniqueness_detailed, impression_follow_detailed, impression_influence_detailed, impression_relevance_detailed, impression_subjcount_detailed, impression_pos_count_simple, impression_word_count_simple

IMPRESSION_FNS = {
	'simple_wordpos' : impression_wordpos_count_simple, 
	'simple_word' : impression_word_count_simple,
	'simple_pos' : impression_pos_count_simple,
	'subjective_score' : impression_subjective_impression,
	'subjpos_detailed' : impression_subjpos_detailed,
	'diversity_detailed' : impression_diversity_detailed,
	'uniqueness_detailed' : impression_uniqueness_detailed,
	'follow_detailed' : impression_follow_detailed,
	'influence_detailed' : impression_influence_detailed,
	'relevance_detailed' : impression_relevance_detailed,
	'subjcount_detailed' : impression_subjcount_detailed,
}

scores = [
            IMPRESSION_FNS['simple_wordpos'](extract_citations_new(response.text)), 
            IMPRESSION_FNS['subjpos_detailed'](response.text, query), 
            IMPRESSION_FNS['follow_detailed'](response.text, query), 
            IMPRESSION_FNS['influence_detailed'](response.text, query), 
            IMPRESSION_FNS['subjcount_detailed'](response.text, query)
        ]

print(scores)

[[0.0, 0.9073443266149541, 0.09265567338504593, 0.0, 0.0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]


In [1]:
import json

root = '../'
with open(root + 'optimization_dataset.json', 'r') as f:
    data = json.load(f)

print(data[0])
queries = []
tags = dict()
tagId = 0
for d in data:
    for t in d['tags']:
        if t not in tags:
            tags[t] = tagId
            tagId += 1

print(tags)

{'query': 'mention the names of any 3 famous folklore sports in karnataka state', 'sugg_idx': 3, 'tags': ['informational', 'simple', 'sports', 'non-technical', 'command', 'research', 'fact', 'non-sensitive'], 'sources': [{'url': 'https://en.wikipedia.org/wiki/Sports_in_Karnataka', 'raw_text': 'Sports in KarnatakaThis style and content may require cleanup to meet Wikipedia\'s quality standards. (November 2011)Cricket is by far the most popular sport in Karnataka with International cricket matches attracting a sizeable number of spectators who are willing to pay more than the standard ticket price to get a chance to watch the match.[1] The sports related infrastructure is mainly concentrated in Bangalore which also played host to the 4th National Games of India in the year 1997.[2] Bangalore is also the location of the Sports Authority of India (SAI) which is the premier sports institute in the country.[3] Karnataka is sometimes referred to as the cradle of Indian swimming because of hig

In [2]:
# Get queries tagged with a certain tag e.g. all 'shopping queries'
topicWiseQueries = dict()
for i, d in enumerate(data):
    for t in d['tags']:
        if t not in topicWiseQueries:
            topicWiseQueries[t] = []

        topicWiseQueries[t].append(i)

perTopicNumberOfQueries = dict()
for t in tags:
    perTopicNumberOfQueries[t] = len(topicWiseQueries[t])

print(perTopicNumberOfQueries)

for q in topicWiseQueries['food and drink']:
    print(data[q]['query'])

{'informational': 347, 'simple': 180, 'sports': 16, 'non-technical': 324, 'command': 27, 'research': 269, 'fact': 140, 'non-sensitive': 301, 'arts and entertainment': 56, 'question': 325, 'history': 29, 'historical': 49, 'computers and electronics': 12, 'technical': 83, 'statement': 11, 'intermediate': 177, 'autos and vehicles': 2, 'law and government': 29, 'food and drink': 18, 'books and literature': 19, 'navigational': 20, 'games': 2, 'health': 31, 'latest': 15, 'business and industrial': 11, 'science': 51, 'jobs and education': 5, 'people and society': 18, 'open-ended': 5, 'prediction': 11, 'explanation': 58, 'travel': 6, 'reference': 5, 'mathematics': 2, 'list': 1, 'geography': 3, 'complex': 45, 'biology': 10, 'transactional': 44, 'pets and animals': 4, 'purchase': 5, 'sensitive': 42, 'opinion': 108, 'internet and telecom': 2, 'physics': 12, 'philosophy': 3, 'comparison': 2, 'instructional': 1, 'learning': 2, 'guide': 3, 'debate': 73, 'finance': 4, 'home and garden': 1, 'incomplet
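
The tag-to-query index built in the cell above can also be written with `collections.defaultdict`, which removes the explicit membership check. This is a sketch on toy records; `build_tag_index` and `toy_records` are hypothetical names, and the records only mimic the `tags` field of optimization_dataset.json.

```python
from collections import defaultdict

def build_tag_index(records):
    """Map each tag to the list of record positions that carry it."""
    index = defaultdict(list)
    for i, record in enumerate(records):
        for tag in record['tags']:
            index[tag].append(i)
    return index

toy_records = [
    {'query': 'q0', 'tags': ['sports', 'fact']},
    {'query': 'q1', 'tags': ['fact']},
]
index = build_tag_index(toy_records)
counts = {tag: len(ids) for tag, ids in index.items()}
print(counts)
```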

In [None]:
from google import genai
import time
import random

def getNumberOfQueriesForTopic(tag):
    return perTopicNumberOfQueries[tag]

def getQueriesForTopic(tag):
    queries = []

    for q in topicWiseQueries[tag]:
        queries.append((q, data[q]))
    
    return queries

def queryLLM(query, sources):
    query_prompt = """Write an accurate and concise answer for the given user question, using _only_ the provided summarized web 
    search results. The answer should be correct, high-quality, and written by an expert using an unbiased and journalistic tone. 
    The user's language of choice such as English, Français, Español, Deutsch, or 日本語 should be used. The answer should be 
    informative, interesting, and engaging. The answer's logic and reasoning should be rigorous and defensible. Every sentence 
    in the answer should be _immediately followed_ by an in-line citation to the search result(s). The cited search result(s) 
    should fully support _all_ the information in the sentence. Search results need to be cited using [index]. When citing 
    several search results, use [1][2][3] format rather than [1, 2, 3]. You can use multiple search results to respond 
    comprehensively while avoiding irrelevant search results.

    Question: {query}

    Search Results:
    {source_text}
    """

    response = ""
    source_text = ""
    client = genai.Client()

    for i, s in enumerate(sources):
        source_text += "Source " + str(i + 1) + ": " + s['cleaned_text']
        source_text += "\n\n\n\n"

    model = "gemini-2.5-pro"

    while True:
        try:
            print("Running Gemini")
            response = client.models.generate_content(
                model=model,
                contents=query_prompt.format(query = query, source_text = source_text),
            )
            break
        except Exception as e:
            # Fall back to the 2.5 Flash model if the specified one is unavailable.
            # The google.genai client takes the model name per call, so only the name changes.
            if model != 'gemini-2.5-flash' and ('404' in str(e) or 'not found' in str(e).lower()):
                model = 'gemini-2.5-flash'
                continue
            
            # Any other error (e.g. a connection reset): wait, then retry.
            time.sleep(15)
            continue

    return response.text

def generateResponses(queryData):
    responses = []

    for q in queryData:
        content = q[1]
        responses.append(queryLLM(content['query'], content['sources']))

    return responses

def sampleResponsesForTopic(topic, count):
    queryData = getQueriesForTopic(topic)
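    # random.sample draws `count` distinct queries; random.choice returns a single element.
    samples = random.sample(queryData, min(count, len(queryData)))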
    return generateResponses(samples)
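
The `while True` loop in queryLLM retries indefinitely, so a persistent failure (such as the "Connection reset by peer" seen earlier) would hang the notebook. One possible alternative is a bounded retry with exponential backoff. This is a sketch only: `call_with_retries` and `flaky` are hypothetical names, and the flaky function stands in for the Gemini call so the example runs offline.

```python
import time

def call_with_retries(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            last_error = e
            # Wait 1x, 2x, 4x, ... base_delay between attempts, but not after the last one.
            if attempt < max_attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error

# Stand-in for an API call that fails twice before succeeding.
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("reset by peer")
    return "ok"

result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, len(attempts))
```

In the notebook, `fn` could be a lambda wrapping `client.models.generate_content(...)`; the `sleep` parameter exists only so the delay can be skipped in tests.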

In [None]:
queries = getQueriesForTopic('food and drink')
for q in queries:
    print(q)

# 359, 'Where can I order a birthday cake online?'
# 334, 'What are the processes involved in brewing beer?'
# 323, 'What are the various steps involved in wine making?'
# 388, "Reserve a table at the top sushi restaurant in town for tomorrow's dinner"
# 349, 'What effects does fast food have on the body?'

preferred_queries = set({359, 334, 323, 388, 349})
for q in queries:
    id = q[0]
    content = q[1]

    if id == 359:
        print(queryLLM(content['query'], content['sources']))
        print("\n\n\n")

(8, {'query': 'where do the hearts of palm come from', 'sugg_idx': 1, 'tags': ['informational', 'simple', 'food and drink', 'non-technical', 'question', 'fact', 'research', 'non-sensitive'], 'sources': [{'url': 'https://atlantapalms.com/blogs/blog/where-is-the-palm-heart', 'raw_text': 'Most of us have tried, or at least heard of, a vegetable called the heart of palm. People eat it fresh or preserved all over the world. When eating the heart of a palm, also commonly called a palmito, chonta, swamp cabbage, or palm heart, you are eating the actual heart of the palm tree.Main Physical Characteristics of a PalmPalm trees, belonging to the family genus Arecaceae, are a type of evergreen tree characterized by branchless stems and a crown of fronds. These fronds fall into one of three styles: fan, feather, or entire. The stems of the palms are composed of a trunk or multiple trunks. This is where the heart of the palm is located: at the core of the trunk.The main stem in the palm is what keep

In [None]:
for q in queries:
    id = q[0]
    content = q[1]

    if id == 359:
        # print(len(content['sources']))
        # print(content['sources'][2])
        # print("\n\n\n")
        source_text = ""
        for i, s in enumerate(content['sources']):
            source_text += "Source " + str(i + 1) + ": " + s['cleaned_text']
            source_text += "\n\n\n\n"


{'url': 'https://milkbarstore.com/products/birthday-cake', 'raw_text': 'Birthday CakeThree thick cake layers, dreamy frosting, and colorful rainbow sprinkles unite in one of Chef Christina Tosi’s most iconic creations—the classic Milk Bar Birthday Cake. Order yours today, and get the party shipped right to your front door. All it takes is a few clicks to get delicious Milk Bar birthday cakes delivered in a snap....Read moreMilk Bar’s desserts are inspired by the nostalgic stuff we grew up getting at the supermarket, and the Birthday Cake is no exception. Boxed funfetti goes gourmet in this sweet vanilla treat—top it with candles (avoid a trip to the store by adding them to your birthday cake order online), and you’ve got a beloved memory made new again. Hosting a party? Go big with a ten-inch cake. You can make your birthday cake delivery even better by adding assorted Milk Bar treats, like cookies and truffles.Birthday cake delivery is sweet and simple with Milk Bar. We ship our cakes

In [60]:
import itertools
import math
import os
import re
import time
import nltk
from glob import glob
from google import genai
# from dotenv import load_dotenv

# load_dotenv()
# Read the API key from the environment; never commit it to the notebook.
# genai.configure(api_key=os.environ['GEMINI_API_KEY'])

def get_num_words(line):
    return len([x for x in line if len(x)>2])

def extract_citations(text):
    def extract_citation(sentence):
        citation_pattern = r'\[[^\w\s]*\d+[^\w\s]*\]'
        return [int(re.findall(r'\d+', citation)[0]) for citation in re.findall(citation_pattern, sentence)]

    # Helper to clean the sentence string before tokenizing.
    def clean_and_tokenize(sentence):
        # Pattern to find citation markers like [1]; [1][2] is matched as two separate markers.
        citation_pattern = r'\[[^\w\s]*\d+[^\w\s]*\]'
        
        # 1. Remove the citation markers from the string.
        text_no_citations = re.sub(citation_pattern, '', sentence)
        
        # 2. Remove punctuation. The pattern [^\w\s] matches any character
        #    that is NOT a word character (a-z, A-Z, 0-9, _) or whitespace.
        text_no_punctuation = re.sub(r'[^\w\s]', '', text_no_citations)
        
        # 3. Tokenize the fully cleaned string.
        return nltk.word_tokenize(text_no_punctuation)

    paras = re.split(r'\n\n', text)

    # Split each paragraph into sentences
    sentences = [nltk.sent_tokenize(p) for p in paras]

    # Split each sentence into words
    words = [[(clean_and_tokenize(s), s, extract_citation(s)) for s in sentence] for sentence in sentences]
    return list(itertools.chain(*words))

def getVisibilityScores(response, query, sources):
    scores = []
    formattedResponse = extract_citations(response)
    print("Formatted response: " + str(formattedResponse))
    citationCount = len(sources)
    positionAdjustedWordCountScores = impression_wordpos_count_simple(formattedResponse, citationCount)
    print("Position adjusted word count scores: " + str(positionAdjustedWordCountScores))

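    subjectiveScores = impression_subjective_impression(formattedResponse, query, 0)
    # Only the first source is scored for now; the commented-out loop below covers every source.
    scores.append(dict())
    scores[0].update(subjectiveScores)
    scores[0]['position_adjusted_word_count_score'] = positionAdjustedWordCountScores[0]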

    # for citation_index in range(citationCount):
    #     scores.append(dict())
    #     subjectiveScores = impression_subjective_impression(formattedResponse, query, citation_index)
    #     scores[citation_index].update(subjectiveScores)
    #     scores[citation_index]['position_adjusted_word_count_score'] = positionAdjustedWordCountScores[citation_index]
    
    return scores

def impression_wordpos_count_simple(sentences, citationCount = 5, normalize = True):
    print("Calculating position-adjusted word count")
    sentenceCount = len(sentences)
    scores = [0] * citationCount
    print(scores)

    for i, sentence in enumerate(sentences):
        words = sentence[0]
        citations = sentence[2]

        for c in citations:
            score = get_num_words(words)
            score *= math.exp(-1 * i / (sentenceCount - 1)) if sentenceCount > 1 else 1
            score /= len(citations)

            try: scores[c - 1] += score
            except: print(f'Citation Hallucinated: {c}')

    return [x / sum(scores) for x in scores] if normalize and sum(scores) != 0 else [1/citationCount for _ in range(citationCount)] if normalize else scores

def impression_word_count_simple(sentences, citationCount = 5, normalize = True):
    scores = [0] * citationCount

    for sentence in sentences:
        words = sentence[0]
        citations = sentence[2]

        for c in citations:
            score = get_num_words(words)
            score /= len(citations)

            try: scores[c - 1] += score
            except: print(f'Citation Hallucinated: {c}')

    return [x / sum(scores) for x in scores] if normalize and sum(scores) != 0 else [1/citationCount for _ in range(citationCount)] if normalize else scores
    

def impression_pos_count_simple(sentences, citationCount = 5, normalize = True):
    sentenceCount = len(sentences)
    scores = [0] * citationCount

    for i, sentence in enumerate(sentences):
        citations = sentence[2]

        for c in citations:
            score = 1
            score *= math.exp(-1 * i / (sentenceCount - 1)) if sentenceCount > 1 else 1
            score /= len(citations)

            try: scores[c - 1] += score
            except: print(f'Citation Hallucinated: {c}')

    return [x / sum(scores) for x in scores] if normalize and sum(scores) != 0 else [1/citationCount for _ in range(citationCount)] if normalize else scores


def impression_subjective_impression(sentences, query, citation_index = 0):
    print("Calculating subjective impression score")
    def convert_to_number(x, min_value = 1.0):
        try: return max(min(5, float(x)), min_value)
        except: return min_value

    scores = dict()
    client = genai.Client()

    for prompt_file in glob('../geval_prompts/*.txt'):
        metric_name = os.path.split(prompt_file)[-1].split('.')[0]
        print("Running prompt for metric: " + metric_name)
        prompt = open(prompt_file).read()
        prompt = prompt.replace('[1]', f'[{citation_index + 1}]')
        formatted_prompt = prompt.format(query = query, answer = sentences) + "\n\nRespond with only a single number from 1 to 5."

        model = "gemini-2.5-pro"

        while True:
            try:
                # Ask Gemini to return a single number 1-5
                print("Running Gemini")
                response = client.models.generate_content(
                    model=model,
                    contents=formatted_prompt,
                )

                response_text = (response.text or '').strip()
                
                if response_text == '':
                    print('Empty response from Gemini-Eval')
                    continue

                print("Response from Gemini: " + response_text)

                # Take the first number in the reply so that answers such as "Score: 4" still parse;
                # fall back to the midpoint 3.0 when the reply contains no number.
                nums = re.findall(r"\d+\.?\d*", response_text)
                score = convert_to_number(nums[0]) if nums else 3.0

                
                scores[metric_name] = score
                print("Score for metric \"" + metric_name + "\" = " + str(score))

                break
            except Exception as e:
                print('Error in Gemini-Eval', e)
                # Fall back to the 2.5 Flash model if the specified one is unavailable.
                # The google.genai client takes the model name per call, so only the name changes.
                if model != 'gemini-2.5-flash' and ('404' in str(e) or 'not found' in str(e).lower()):
                    print('Shifting to Gemini Flash', e)
                    model = 'gemini-2.5-flash'
                    continue
                
                time.sleep(15)
                continue

    return scores
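
To make the weighting in impression_wordpos_count_simple concrete: sentence i out of n contributes its word count multiplied by exp(-i / (n - 1)), split evenly across the sources it cites, and the totals are then normalized to sum to 1. The sketch below is a simplified restatement under assumptions: `position_adjusted_scores` is a hypothetical name, it takes precomputed word counts instead of token lists, and it silently skips out-of-range citations.

```python
import math

def position_adjusted_scores(sentences, source_count):
    """sentences: list of (word_count, cited_source_indices) with 1-based indices."""
    n = len(sentences)
    scores = [0.0] * source_count
    for i, (word_count, citations) in enumerate(sentences):
        # The first sentence has weight 1, the last has weight exp(-1).
        decay = math.exp(-i / (n - 1)) if n > 1 else 1.0
        for c in citations:
            if 1 <= c <= source_count:
                scores[c - 1] += word_count * decay / len(citations)
    total = sum(scores)
    # With no valid citations, fall back to a uniform distribution.
    return [s / total for s in scores] if total else [1 / source_count] * source_count

# Two sentences of equal length: the earlier one cites source 1, the later one source 2.
toy = [(10, [1]), (10, [2])]
result = position_adjusted_scores(toy, 2)
print(result)
```

With equal word counts the split is 1 : exp(-1), so the source cited first receives roughly 73% of the impression even though both sentences are the same length.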

In [None]:
queries = getQueriesForTopic('food and drink')
# 359, 'Where can I order a birthday cake online?'
# 334, 'What are the processes involved in brewing beer?'
# 323, 'What are the various steps involved in wine making?'
# 388, "Reserve a table at the top sushi restaurant in town for tomorrow's dinner"
# 349, 'What effects does fast food have on the body?'

preferred_queries = set({359, 334, 323, 388, 349})
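for q in queries:
    id = q[0]
    content = q[1]
    # getVisibilityScores expects the query string, not the (index, record) tuple.
    query = content['query']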

    if id == 359:
        # response = queryLLM(content['query'], content['sources'])
        response = """You can order birthday cakes online from a variety of bakeries and purveyors that ship nationwide [3][5].

Several retailers offer a wide selection of cakes for delivery [1][5]. Harry & David is a one-stop shop for gifts and celebrations, providing an assortment of gourmet cakes, including festive birthday options and cake bites [1][5]. For a particularly large variety, Bake Me a Wish sells classic cakes that can be personalized, as well as coffee cakes and cheesecakes like the Tiramisu Classico [4][5]. Williams Sonoma is a specialty food shop that allows you to order from famous U.S. bakeries like Georgetown Cupcake and We Take the Cake, in addition to its own in-house seasonal options [5].

Some bakeries are well-known for specific, signature cakes [3][5]. Milk Bar, founded by celebrity baker Christina Tosi, is famous for its Birthday Cake and is credited with reviving the popularity of "funfetti" [3][5]. The company ships its cakes, pies, and truffles nationwide using a specialized cryogenic packing method [3][5]. From Spartanburg, South Carolina, Caroline's Cakes is a favorite for classic southern recipes, including a 7 Layer Caramel Cake [5]. The Japanese-French bakery Lady M Confections specializes in Mille Crêpes cakes, which are constructed from twenty thin crepes layered with pastry cream [5].

A number of New York-based bakeries also deliver their products [5]. Junior's Cheesecake, an NYC staple for 70 years, ships its signature spongecake-crusted cheesecakes [5]. Magnolia Bakery, made famous by the show *Sex and the City*, offers a variety of full-sized layer cakes, flourless cakes, and its popular banana pudding [5].

For those with dietary restrictions, Karma Baker offers cakes that are both gluten-free and vegan, with flavors like lemon raspberry and chocolate mousse tuxedo [5]. Gigi's Little Kitchen in Brooklyn also creates dairy-free and vegan cakes in unique flavors such as lavender and matcha, but its delivery is restricted to the New York City area [5]."""
        scores = getVisibilityScores(response, query, content['sources'])
        
        print("Response:\n")
        print(response)
        print(scores)
        print("\n\n\n")
        break

Formatted response: [(['You', 'can', 'order', 'birthday', 'cakes', 'online', 'from', 'a', 'variety', 'of', 'bakeries', 'and', 'purveyors', 'that', 'ship', 'nationwide'], 'You can order birthday cakes online from a variety of bakeries and purveyors that ship nationwide [3][5].', [3, 5]), (['Several', 'retailers', 'offer', 'a', 'wide', 'selection', 'of', 'cakes', 'for', 'delivery'], 'Several retailers offer a wide selection of cakes for delivery [1][5].', [1, 5]), (['Harry', 'David', 'is', 'a', 'onestop', 'shop', 'for', 'gifts', 'and', 'celebrations', 'providing', 'an', 'assortment', 'of', 'gourmet', 'cakes', 'including', 'festive', 'birthday', 'options', 'and', 'cake', 'bites'], 'Harry & David is a one-stop shop for gifts and celebrations, providing an assortment of gourmet cakes, including festive birthday options and cake bites [1][5].', [1, 5]), (['For', 'a', 'particularly', 'large', 'variety', 'Bake', 'Me', 'a', 'Wish', 'sells', 'classic', 'cakes', 'that', 'can', 'be', 'personalized