# Metric for Citation in Generative LLMs

In this notebook, we propose a method for evaluating LLM responses based on their citations.

**_Note:_** Some sections of this pipeline use referee LLMs (mostly GPT3.5 Turbo). To query them and collect their responses automatically, we use the [poe-api-wrapper](https://github.com/snowby666/poe-api-wrapper) Python library.

This library has usage limits. Each account gets 3000 points per day; **GPT3.5** costs 20 points per message, **GPT4-o** costs 300 points per message, and so on. If you hit the limit error, replace the tokens used in the code with your own or with those of another account.
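As a rough illustration of what these limits mean in practice, the sketch below computes how many messages fit into one day's budget. It uses only the point values quoted above; the function and variable names are made up for this example.

```python
# Daily point budget per account and per-message costs, as quoted above.
DAILY_POINTS = 3000
COST_PER_MESSAGE = {"GPT3.5": 20, "GPT4-o": 300}


def messages_per_day(cost: int, budget: int = DAILY_POINTS) -> int:
    """Number of whole messages that fit into the daily budget."""
    return budget // cost


for model, cost in COST_PER_MESSAGE.items():
    print(f"{model}: {messages_per_day(cost)} messages per day")
```

With these numbers, one account allows 150 GPT3.5 messages or 10 GPT4-o messages per day.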

### How to get your Token

#### Getting p-b and p-lat cookies (*required*)
Sign in at https://poe.com/

Open DevTools with F12 (or right-click and choose Inspect), then find the cookies:
- Chromium: Devtools > Application > Cookies > poe.com
- Firefox: Devtools > Storage > Cookies
- Safari: Devtools > Storage > Cookies

Copy the values of `p-b` and `p-lat` cookies
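One way to keep the copied cookie values out of the notebook itself is to read them from environment variables. This is only a sketch: the variable names `POE_P_B` and `POE_P_LAT` and the helper `load_poe_tokens` are assumptions made for this example, not part of poe-api-wrapper.

```python
import os


def load_poe_tokens(env=os.environ) -> dict:
    """Build the `tokens` dict from environment variables.

    POE_P_B and POE_P_LAT are hypothetical names chosen for this sketch.
    """
    try:
        return {"p-b": env["POE_P_B"], "p-lat": env["POE_P_LAT"]}
    except KeyError as missing:
        raise RuntimeError(
            f"Set the {missing.args[0]} environment variable first"
        ) from None
```

The result has the same shape as the `tokens` dictionaries used in the cells below, so it could be passed as `tokens=load_poe_tokens()`.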

## Install Prerequisites

In [1]:
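# System build dependencies for the library
! sudo apt-get install build-essential libssl-dev libffi-dev python3-dev
# Note: each `!` line runs in its own subshell, so the `source` below does not
# persist; the final `pip install` targets the notebook's own environment.
! python -m venv myenv
! source myenv/bin/activate
! pip install poe-api-wrapper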

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
build-essential is already the newest version (12.9ubuntu3).
libffi-dev is already the newest version (3.4.2-4).
libssl-dev is already the newest version (3.0.2-0ubuntu1.17).
python3-dev is already the newest version (3.10.6-1~22.04).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.
The virtual environment was not created successfully because ensurepip is not
available.  On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.

    apt install python3.10-venv

You may need to use sudo with that command.  After installing the python3-venv
package, recreate your virtual environment.

Failing command: /content/myenv/bin/python3

/bin/bash: line 1: myenv/bin/activate: No such file or directory


## Extract Cited Sentences with their URLs

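Here, we define extraction functions for **Copilot** and **Perplexity.ai** responses. Each function converts a response into our common structure: a list of tuples, where the first element is the cited text and the remaining elements are the URLs of the references it cites.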
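To make the target structure concrete, here is a small hand-written example of what both extractors in the next cell return. The texts and URLs are placeholders, not real extraction output.

```python
# Each entry is (cited text, url_1, url_2, ...). Placeholder values only.
citations = [
    ("First cited paragraph.", "https://example.com/a"),
    ("Second cited paragraph.", "https://example.com/a", "https://example.com/b"),
]

# Star-unpacking separates the text from its variable number of URLs
for text, *urls in citations:
    print(f"{text!r} is supported by {len(urls)} reference(s)")
```

Because the number of URLs varies per entry, the later stages always treat element `0` as the text and everything from index `1` onward as references.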

In [8]:
import re

def copilot_extract_citations(text):
    # Define the pattern to match cited sentences ending with UTF-8 encoded superscript numbers
    sentence_pattern = r'([^!?.]*?[\u00b2\u00b3\u00b9\u2074-\u2079]+)\.'

    # Define the pattern to match URLs with their corresponding superscript numbers
    url_pattern = r'([\u00b2\u00b3\u00b9\u2074-\u2079]): \[.*?\]\((https?://[^\s]+)\)'

    # Find all cited sentences in the text
    cited_sentences = re.findall(sentence_pattern, text)

    # Find all URLs and their superscript numbers in the text
    urls_with_superscripts = re.findall(url_pattern, text)

    # Create a dictionary to map superscript numbers to URLs
    url_dict = {superscript: url for superscript, url in urls_with_superscripts}

    # Create a list of tuples with cited sentences and their corresponding URLs
    citations = []
    for sentence in cited_sentences:
        # Remove leading whitespace and superscripts from the end of the sentence
        cleaned_sentence = re.sub(r'[\u00b2\u00b3\u00b9\u2074-\u2079]+$', '', sentence).strip()

        # Find all superscript numbers in the original sentence
        superscripts = re.findall(r'[\u00b2\u00b3\u00b9\u2074-\u2079]', sentence)

        # Map the superscript numbers to their corresponding URLs
        urls = [url_dict[superscript] for superscript in superscripts if superscript in url_dict]

        citations.append((cleaned_sentence, *urls))

    return citations

def perplexityAI_extract_citations(text):
    # Define the pattern to match paragraphs that contain citation numbers in brackets
    paragraph_pattern = r'([^\n]+?\[(\d+)\](\[\d+\])*)'

    # Define the pattern to match URLs with their corresponding citation numbers
    url_pattern = r'\[(\d+)\] (https?://[^\s]+)'

    # Find all paragraphs with citations in the text
    cited_paragraphs = re.findall(paragraph_pattern, text)

    # Find all URLs and their citation numbers in the text
    urls_with_numbers = re.findall(url_pattern, text)

    # Create a dictionary to map citation numbers to URLs
    url_dict = {number: url for number, url in urls_with_numbers}

    # Create a list of tuples with cleaned paragraphs and their corresponding URLs
    citations = []
    for paragraph in cited_paragraphs:
        paragraph = paragraph[0]
        # Find all citation numbers in the paragraph
        citation_numbers = re.findall(r'\[(\d+)\]', paragraph)

        # Map the citation numbers to their corresponding URLs
        urls = [url_dict[number] for number in citation_numbers if number in url_dict]

        # Clean the paragraph by removing the citation brackets
        cleaned_paragraph = re.sub(r'\[(\d+)\]', '', paragraph).strip()

        # Append the cleaned paragraph along with all corresponding URLs
        citations.append((cleaned_paragraph, *urls))

    return citations

# Example text
text = """
The operation to kill Osama bin Laden, known as Operation Neptune Spear, was a significant and covert military mission carried out by the United States. Here is a detailed account of the operation:

### Planning and Intelligence
The operation was the culmination of years of intelligence work. In September 2010, the CIA identified a compound in Abbottabad, Pakistan, believed to be housing bin Laden. This was based on surveillance photos and intelligence reports indicating that a known al-Qaeda courier was visiting the compound. Despite the lack of conclusive evidence that bin Laden was present, the intelligence was deemed strong enough to justify an operation[5].

### Execution of the Raid
The mission was executed by the Red Squadron of U.S. Navy SEAL Team Six, chosen for their extensive experience and specialized skills. The SEALs were transported by two helicopters piloted by Army aviators from a U.S. base in Jalalabad, Afghanistan, to the compound in Pakistan. The mission commenced on May 1, 2011, at 10:30 p.m. local time[2].

Upon arrival, one helicopter experienced instability and made a hard landing inside the compound, but the SEALs continued the mission without injury. The team engaged in a firefight as they moved through the compound. They encountered resistance and killed several combatants, including bin Laden's couriers and his son, Khalid[2][5].

### Killing of Osama bin Laden
Osama bin Laden was found on the third floor of the main building. He was killed in his bedroom, where he was found with at least one weapon nearby. The SEALs collected documents and electronics for intelligence purposes before destroying the downed helicopter to protect its stealth technology[2].

### Aftermath and Confirmation
The SEALs spent about 45 minutes on the ground before departing with bin Laden's body. In Afghanistan, his identity was confirmed through DNA analysis, fingerprinting, and facial recognition. The body was then flown to the USS Carl Vinson in the Arabian Sea, where bin Laden was buried at sea following Islamic funeral rites to prevent his gravesite from becoming a shrine[2][5].

### Significance
The operation marked a defining moment in U.S. military history, ending the life of the mastermind behind the September 11, 2001, terror attacks. It was a testament to the intelligence and military collaboration that enabled the successful execution of such a high-stakes mission[5].

Citations:
[1] https://www.fbi.gov/history/famous-cases/osama-bin-laden
[2] https://www.911memorial.org/learn/resources/digital-exhibitions/digital-exhibition-revealed-hunt-bin-laden/operation-neptune-spear
[3] https://www.defense.gov/News/News-Stories/Article/Article/2234142/ai-gleaned-information-about-emerging-threats-future-plots-from-bin-laden-raid/
[4] https://www.dni.gov/index.php/features/bin-laden-s-bookshelf
[5] https://www.military.com/history/osama-bin-laden-operation-neptune-spear
"""

citations = perplexityAI_extract_citations(text)

for item in citations:
  print("LLM's Sentences:")
  print(item[0])
  print('\nURLs of Corresponding References:')
  for url in item[1:]:
    print(url)

  print('\n\n')

LLM's Sentences:
The operation was the culmination of years of intelligence work. In September 2010, the CIA identified a compound in Abbottabad, Pakistan, believed to be housing bin Laden. This was based on surveillance photos and intelligence reports indicating that a known al-Qaeda courier was visiting the compound. Despite the lack of conclusive evidence that bin Laden was present, the intelligence was deemed strong enough to justify an operation

URLs of Corresponding References:
https://www.military.com/history/osama-bin-laden-operation-neptune-spear



LLM's Sentences:
The mission was executed by the Red Squadron of U.S. Navy SEAL Team Six, chosen for their extensive experience and specialized skills. The SEALs were transported by two helicopters piloted by Army aviators from a U.S. base in Jalalabad, Afghanistan, to the compound in Pakistan. The mission commenced on May 1, 2011, at 10:30 p.m. local time

URLs of Corresponding References:
https://www.911memorial.org/learn/resour

## Extract Atomic Facts

We convert each group of consecutive sentences that share the same citations into atomic facts, using a specific prompt that yields our pre-defined structure for atomic facts.

In [9]:
from poe_api_wrapper import AsyncPoeApi, PoeApi
import asyncio
import time

tokens = {
    'p-b': 'vncQfzRB2bhM4GZ87IkEWQ%3D%3D',
    'p-lat': 't4gMytQTSDP%2Fls9Cvq3DbTXsBbC%2Ft8FO57HLLiwd2g%3D%3D',
}

client = await AsyncPoeApi(tokens=tokens).create()
bot = "gpt4_o_mini"

async def process_atomic_facts(text: str) -> str:
    """
    Extracts the atomic facts of the given sentence(s).
    Uses the model selected by the module-level `bot` variable.
    """
    response = ""  # Initialize an empty string to store the response

    # Prepare the prompt for atomic-fact extraction
    message = f"The text below is a response stated by a language model. Please extract all of its atomic facts, such that atomic fact sentences are independent from each other. Your response should just have atomic facts in each line and no extra sentence or character:\nText:\n\n{text}"

    # Send the message and collect the response chunks
    async for chunk in client.send_message(bot=bot, message=message):
        response += chunk["response"]  # Append each chunk to the response variable

    # Delete all chats of the bot (delete_chat is a coroutine, so it must be awaited)
    await client.delete_chat(bot, del_all=True)

    return response  # Return the response for further processing

def convert_to_numbered_list(text):
    # Split the text into lines
    lines = text.strip().split('\n')

    # Collect the cleaned, non-empty lines
    processed_lines = []

    for line in lines:
        # Remove a leading bullet ("-", "*") or list number ("1.", "1)") if present,
        # without stripping numbers that belong to the fact itself (e.g. "2011 ...")
        line = re.sub(r"^\s*(?:[-*]|\d+[.)])\s+", "", line).strip()

        # Skip blank lines so they are not numbered as facts
        if line:
            processed_lines.append(f"{len(processed_lines) + 1}. {line}")

    # Join the processed lines with newlines and return the result
    return '\n'.join(processed_lines)


def atomic_facts_replacer(citations):
  atomic_replaced = []
  for cited_sentences in citations:
    sentences = cited_sentences[0]
    atomic_facts = asyncio.run(process_atomic_facts(sentences))
    atomic_facts = convert_to_numbered_list(atomic_facts)
    urls = cited_sentences[1:]
    atomic_replaced.append((atomic_facts, *urls))
    time.sleep(5)

  return atomic_replaced

# Delete all chats of the bot before starting (the coroutine must be awaited)
await client.delete_chat(bot, del_all=True)

atomic_replaced = atomic_facts_replacer(citations)

for i, item in enumerate(atomic_replaced):
  print("LLM's Sentences:")
  print(citations[i][0])
  print('\nCorresponding Atomic Facts:')
  for fact in item[0].splitlines():
    print(fact)

  print('\n\n')

2024-08-16 16:50:53.792 | INFO     | poe_api_wrapper.bundles:init_window:21 - Initializing web data
2024-08-16 16:50:54.182 | INFO     | poe_api_wrapper.bundles:init_window:41 - Web data initialized
2024-08-16 16:50:54.188 | INFO     | poe_api_wrapper.bundles:get_form_key:82 - Retrieved formkey successfully: e067c7f772f914a38f8423f4aff15e8a
2024-08-16 16:50:54.948 | INFO     | poe_api_wrapper.async_api:create:89 - Async instance created
  client.delete_chat(bot, del_all=True)
2024-08-16 16:50:55.904 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imbjb05k820aawm7qu
  client.delete_chat(bot, del_all=True)
2024-08-16 16:51:04.081 | INFO     | poe_api_wrapper.async_api:se

LLM's Sentences:
The operation was the culmination of years of intelligence work. In September 2010, the CIA identified a compound in Abbottabad, Pakistan, believed to be housing bin Laden. This was based on surveillance photos and intelligence reports indicating that a known al-Qaeda courier was visiting the compound. Despite the lack of conclusive evidence that bin Laden was present, the intelligence was deemed strong enough to justify an operation

Corresponding Atomic Facts:
1. The operation was the culmination of years of intelligence work.
2. In September 2010, the CIA identified a compound in Abbottabad, Pakistan.
3. The compound was believed to be housing bin Laden.
4. The identification was based on surveillance photos.
5. The identification was based on intelligence reports.
6. The reports indicated that a known al-Qaeda courier was visiting the compound.
7. There was a lack of conclusive evidence that bin Laden was present.
8. The intelligence was deemed strong enough to jus

## Extract Text Contents of Webpages

In this section, we scrape each URL and replace it with the text content of its webpage. The scraped text may contain advertisements or other unrelated content.
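To show why unrelated text survives this step, here is a dependency-free sketch of visible-text extraction. It uses the standard library's `html.parser` instead of the BeautifulSoup-based scraper defined in the next cell, and the class name, function name and sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes that are not inside non-visible containers."""

    HIDDEN = {"style", "script", "head", "title"}

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden container
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self.hidden_depth > 0:
            self.hidden_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach handle_data, so they are skipped automatically
        if self.hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample_html = """
<html><head><title>Ad-heavy page</title><style>p {color: red}</style></head>
<body><nav>Home News Jobs</nav><!-- tracking pixel -->
<p>The compound was in Abbottabad.</p><script>var x = 1;</script></body></html>
"""
print(visible_text(sample_html))
```

The navigation labels ("Home News Jobs") end up in the output next to the actual sentence. The scraped pages show the same effect, so the verification step has to tolerate such noise.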

In [10]:
import requests
from bs4 import BeautifulSoup, Comment
import re
from typing import Optional

class Webscraper:
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:79.0) Gecko/20100101 Firefox/79.0',
            'Referer': 'https://www.google.com/'
        }
        self.google_headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:79.0) Gecko/20100101 Firefox/79.0',
            'Host': 'www.google.com',
            'Referer': 'https://www.google.com/'
        }

    def _get_source(self, url: str, is_google=False) -> requests.Response:
        headers = self.google_headers if is_google else self.headers
        return requests.get(url, headers=headers, timeout=10, allow_redirects=False)

    def get_content(self, url: str) -> Optional[str]:
        try:
            response = self._get_source(url)
            response.raise_for_status()  # Raises HTTPError for bad responses
        except requests.exceptions.RequestException as e:
            print(f"Error fetching the URL: {e}")
            return None

        if response.status_code != 200:
            print(f"Non-200 status code received: {response.status_code}")
            return None

        return self.text_from_html(response.text)

    @classmethod
    def tag_visible(cls, element):
        if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
            return False
        if isinstance(element, Comment):
            return False
        return True

    def text_from_html(self, body):
        soup = BeautifulSoup(body, 'html.parser')
        texts = soup.find_all(string=True)  # Use string=True instead of text=True
        visible_texts = filter(self.tag_visible, texts)
        return re.sub(' +', ' ', " ".join(t.strip() for t in visible_texts)).strip()

In [12]:
# Initialize the Webscraper instance
scraper = Webscraper()

# Function to replace URLs with scraped text content
def replace_urls_with_content(data):
    updated_data = []

    for item in data:
        new_item = []
        # Loop through each element in the tuple
        for element in item:
            if isinstance(element, str) and element.startswith("http"):
                # Scrape the content of the URL
                content = scraper.get_content(element)
                if content:
                    # Replace URL with content
                    new_item.append(content)
                else:
                    new_item.append("")  # Put empty string if scraping failed
            else:
                new_item.append(element)

        updated_data.append(tuple(new_item))

    return updated_data

# Replace URLs with the scraped text content
updated_scraped_data = replace_urls_with_content(atomic_replaced)

for item in updated_scraped_data:
  print('Atomic Facts:')
  for fact in item[0].splitlines():
    print(fact)

  print('\nCorresponding Webpage Contents:')
  for i, content in enumerate(item[1:]):
      print(f'Text Content of Webpage {i+1}:\n{content}')

  print('\n\n')

Atomic Facts:
1. The operation was the culmination of years of intelligence work.
2. In September 2010, the CIA identified a compound in Abbottabad, Pakistan.
3. The compound was believed to be housing bin Laden.
4. The identification was based on surveillance photos.
5. The identification was based on intelligence reports.
6. The reports indicated that a known al-Qaeda courier was visiting the compound.
7. There was a lack of conclusive evidence that bin Laden was present.
8. The intelligence was deemed strong enough to justify an operation.

Corresponding Webpage Contents:
Text Content of Webpage 1:
Profile Profile Resumes Cover Letters Jobs I've Applied To Saved Jobs Saved Searches Subscriptions Log out News News Home Army Navy Air Force Marine Corps Coast Guard Space Force Military Podcasts Opinion Videos Benefits Benefits Home Military Pay and Money GI Bill Veteran Health Care Tricare VA Loans Insurance Retirement VA eBenefits Veteran Jobs Veteran Job Search Military Skills Transl

## Check Remaining Number of Messages for each Model

In [13]:
tokens = {
    'p-b': 'vncQfzRB2bhM4GZ87IkEWQ%3D%3D',
    'p-lat': 't4gMytQTSDP%2Fls9Cvq3DbTXsBbC%2Ft8FO57HLLiwd2g%3D%3D',
}

client = PoeApi(tokens=tokens)
bot = 'gpt4_o_mini'
print(client.get_botInfo(handle=bot))

2024-08-16 16:53:48.390 | INFO     | poe_api_wrapper.bundles:init_window:21 - Initializing web data
2024-08-16 16:53:48.734 | INFO     | poe_api_wrapper.bundles:init_window:41 - Web data initialized
2024-08-16 16:53:48.738 | INFO     | poe_api_wrapper.bundles:get_form_key:82 - Retrieved formkey successfully: e067c7f772f914a38f8423f4aff15e8a


{'handle': 'GPT-4o-Mini', 'model': 'gpt4_o_mini', 'supportsFileUpload': True, 'messageTimeoutSecs': 15, 'displayMessagePointPrice': 15, 'numRemainingMessages': 126, 'viewerIsCreator': False, 'id': 'Qm90OjMwMTc='}


## Verify Atomic Facts by their References

Here, using a dedicated prompt, we check whether each atomic fact provided by the LLM is actually stated in the references it cites.
The referee model is chosen according to the length of the query: longer messages are sent to a model with a larger context window.

Finally, for each group of sentences with the same citation, we output a binary vector in which each element is the validation result of the corresponding atomic fact:

- **0** if the atomic fact is **not** supported by the cited webpages.
- **1** if the atomic fact is supported by the cited webpages.
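The mapping from the referee's reply to this vector can be sketched as follows. This is a stricter, hypothetical variant of the `convert_to_binary_list` function in the next cell: it aligns verdicts by fact number and scores any fact missing from the reply as 0, which is an assumption made for this example.

```python
import re


def parse_validation_reply(reply: str, n_facts: int) -> list:
    """Parse lines like '2. False' into a 0/1 vector of length n_facts.

    Facts that the referee did not mention are conservatively scored 0.
    """
    vector = [0] * n_facts
    pattern = r"^\s*(\d+)\.\s*(True|False)\b"
    for match in re.finditer(pattern, reply, flags=re.MULTILINE | re.IGNORECASE):
        index = int(match.group(1)) - 1
        if 0 <= index < n_facts:
            vector[index] = 1 if match.group(2).lower() == "true" else 0
    return vector


reply = "1. True\n2. False\n\n3. True"
print(parse_validation_reply(reply, n_facts=4))  # fact 4 is missing from the reply
```

Aligning by number keeps the vector the same length as the list of atomic facts even when the reply contains blank lines or skips an item.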

In [15]:
from poe_api_wrapper import AsyncPoeApi
import asyncio
import time  # used by validator() to pause between requests

tokens = {
    'p-b': 'vncQfzRB2bhM4GZ87IkEWQ%3D%3D',
    'p-lat': 't4gMytQTSDP%2Fls9Cvq3DbTXsBbC%2Ft8FO57HLLiwd2g%3D%3D',
}

client = await AsyncPoeApi(tokens=tokens).create()

async def process_message(message: str) -> str:
    """
    Sends a message to the Poe API and returns the response.
    Uses 'GPT-4o-mini' if the message is shorter than 2500 words,
    otherwise uses 'GPT-4o-Mini-128k'.
    """
    response = ""  # Initialize an empty string to store the response

    # Determine the model to use based on the message length (in words)
    if len(message.split()) < 2500:
        bot = "gpt4_o_mini"
    else:
        bot = "gpt4_o_mini_128k"

    # Send the message and collect the response chunks
    async for chunk in client.send_message(bot=bot, message=message):
        response += chunk["response"]  # Append each chunk to the response variable

    # Delete all chats of the bot (delete_chat is a coroutine, so it must be awaited)
    await client.delete_chat(bot, del_all=True)

    return response  # Return the response for further processing

def convert_to_binary_list(text):
    # Split the text into lines
    lines = text.strip().split('\n')

    # Initialize an empty list for the binary values
    binary_list = []

    # Iterate through each line
    for line in lines:
        # Check if the line contains 'True' or 'False'
        if 'True' in line:
            binary_list.append(1)
        else:
            binary_list.append(0)

    return binary_list

def validator(response_with_cites):
  validations = []
  for item in response_with_cites:
    atomic_facts = item[0]
    webpage_contents = '\n'.join(list(item[1:]))
    # Build the verification prompt from the atomic facts and the webpage contents
    message = f"""
    Here are some enumerated atomic facts that are extracted from a response of an LLM:
    {atomic_facts}


    The response has some citations. I have extracted the text contents of the cited webpages which may have advertisements, named of icons of the website, and another non-related text contents mixed with related text content. You should explore these text contents and tell me if each atomic fact is stated and confirmed by the text contents or not. Your answer should be exactly like this and has no extra characters:
    1. True
    2. False
    3. True
    which means for example we had 3 atomic facts and first and third are True, but second is False.
    Text Contents:


    {webpage_contents}
    """

    final_response = asyncio.run(process_message(message))
    validations.append(convert_to_binary_list(final_response))
    time.sleep(5)

  return validations

validations = validator(updated_scraped_data)

for i, item in enumerate(updated_scraped_data):
  print('Atomic Facts:')
  for fact in item[0].splitlines():
    print(fact)

  print(f'\nCorresponding Validation Vector:\n{validations[i]}')

  print('\n\n')


2024-08-16 17:10:42.260 | INFO     | poe_api_wrapper.bundles:init_window:21 - Initializing web data
2024-08-16 17:10:42.607 | INFO     | poe_api_wrapper.bundles:init_window:41 - Web data initialized
2024-08-16 17:10:42.613 | INFO     | poe_api_wrapper.bundles:get_form_key:82 - Retrieved formkey successfully: e067c7f772f914a38f8423f4aff15e8a
2024-08-16 17:10:43.379 | INFO     | poe_api_wrapper.async_api:create:89 - Async instance created


2660


2024-08-16 17:10:45.158 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imagrkfsfdyxarcz7g
  client.delete_chat(bot, del_all=True)


1562


2024-08-16 17:10:52.408 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imahomjn6swegad8qt


4000


2024-08-16 17:11:00.539 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imai0zk0kh5mx0kzue


1547


2024-08-16 17:11:08.637 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imaforeslr0j7i1g6t


4024


2024-08-16 17:11:16.810 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imaff32ddhvaa1pipn


2623


2024-08-16 17:11:25.126 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imafz6ofbezr05sp97


Atomic Facts:
1. The operation was the culmination of years of intelligence work.
2. In September 2010, the CIA identified a compound in Abbottabad, Pakistan.
3. The compound was believed to be housing bin Laden.
4. The identification was based on surveillance photos.
5. The identification was based on intelligence reports.
6. The reports indicated that a known al-Qaeda courier was visiting the compound.
7. There was a lack of conclusive evidence that bin Laden was present.
8. The intelligence was deemed strong enough to justify an operation.

Corresponding Validation Vector:
[1, 1, 1, 1, 1, 1, 1, 1]



Atomic Facts:
1. The mission was executed by the Red Squadron of U.S. Navy SEAL Team Six.
2. Red Squadron was chosen for their extensive experience and specialized skills.
3. The SEALs were transported by two helicopters.
4. The helicopters were piloted by Army aviators.
5. The helicopters were flown from a U.S. base in Jalalabad, Afghanistan.
6. The destination was a compound in Pakist

## Determine Final Score

Finally, we compute the final score for the LLM's response using linearly decaying attention weights. Earlier groups of sentences receive higher weights, because the propagation problem may affect the later sentences of a response more.
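A small worked example may help to see the effect of the weighting. The sketch below is a pure-Python variant written for illustration (the cell that follows uses NumPy). The validation vectors are invented, and returning `0.0` for empty input is an assumption of this sketch.

```python
def linear_decay_score(validations):
    """Weighted mean of per-group accuracies with weights n, n-1, ..., 1."""
    # Per-group accuracy; empty groups are skipped to avoid division by zero
    scores = [sum(v) / len(v) for v in validations if v]
    n = len(scores)
    if n == 0:
        return 0.0  # assumption: no evidence yields a score of zero
    weights = range(n, 0, -1)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Three groups with accuracies 1.0, 0.5 and 0.0, weighted 3, 2 and 1
example = [[1, 1], [1, 0], [0, 0]]
print(f"{linear_decay_score(example):.4f}")
```

Here the score is (3·1.0 + 2·0.5 + 1·0.0) / 6 ≈ 0.6667, compared with an unweighted mean of 0.5, because the fully supported first group carries the largest weight.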

In [16]:
import numpy as np

def weighted_average(validations):
    """Calculate weighted average with linearly decaying attention."""
    scores = [sum(score)/len(score) for score in validations]

    n = len(scores)

    # Create linearly decaying weights (e.g., [n, n-1, ..., 1])
    attention = np.arange(n, 0, -1)

    # Calculate weighted average
    weighted_avg = np.dot(scores, attention) / np.sum(attention)

    return weighted_avg

# Calculate weighted average of the scores
weighted_avg_score = weighted_average(validations)

print(f"Final Score: {weighted_avg_score:.4f}")


Final Score: 1.0000
