# Metric for Citation in Generative LLMs

In this notebook, we propose a method to evaluate the responses of LLMs based on their citations.

**_Note:_** Some sections of this pipeline use referee LLMs (mostly GPT-3.5 Turbo). To query them and collect their responses automatically, we use the [poe-api-wrapper](https://github.com/snowby666/poe-api-wrapper) Python library.

Poe limits how many queries you can send. Each account gets 3000 points per day, and every message costs points: for example, **GPT-3.5** costs 20 points per message and **GPT-4o** costs 300. Once an account runs out of points, requests fail with a limit error, so replace the tokens used in the code with those of your own (or another) account:

### How to get your Token

#### Getting p-b and p-lat cookies (*required*)
Sign in at https://poe.com/

Press F12 to open DevTools (or right-click and choose Inspect), then go to:
- Chromium: Devtools > Application > Cookies > poe.com
- Firefox: Devtools > Storage > Cookies
- Safari: Devtools > Storage > Cookies

Copy the values of the `p-b` and `p-lat` cookies.
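Rather than pasting the cookie values directly into a cell, they could be read from environment variables. The sketch below is one possible way to do that; the variable names `POE_P_B` and `POE_P_LAT` and the helper `load_poe_tokens` are assumptions made for illustration, and only the dictionary keys `p-b` and `p-lat` come from this notebook.

```python
import os

def load_poe_tokens(env=None):
    """Build the tokens dict from environment variables.

    POE_P_B and POE_P_LAT are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    # Collect the names of any variables that are unset or empty
    missing = [name for name in ("POE_P_B", "POE_P_LAT") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"p-b": env["POE_P_B"], "p-lat": env["POE_P_LAT"]}
```

With both variables set, `tokens = load_poe_tokens()` yields a dictionary in the shape the client expects, and the cookie values never appear in the saved notebook.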

## Install Prerequisites

In [17]:
! sudo apt-get install build-essential libssl-dev libffi-dev python3-dev
! python -m venv myenv
! source myenv/bin/activate
! pip install poe-api-wrapper

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
build-essential is already the newest version (12.9ubuntu3).
libffi-dev is already the newest version (3.4.2-4).
libssl-dev is already the newest version (3.0.2-0ubuntu1.17).
python3-dev is already the newest version (3.10.6-1~22.04).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.
The virtual environment was not created successfully because ensurepip is not
available.  On Debian/Ubuntu systems, you need to install the python3-venv
package using the following command.

    apt install python3.10-venv

You may need to use sudo with that command.  After installing the python3-venv
package, recreate your virtual environment.

Failing command: /content/myenv/bin/python3

/bin/bash: line 1: myenv/bin/activate: No such file or directory


## Extract Cited Sentences with their URLs

Here, we provide extraction functions for **Copilot** and **Perplexity.ai** that convert their responses into a common structure: a list of tuples, each holding a cited passage followed by the URLs of its references.

We then take an example response generated by Perplexity.ai and alter it slightly, to see whether the referee model recognizes these mistakes using the references.
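To make the target structure concrete, here is a minimal sketch of bracket-style citation extraction on a toy input. The function name `extract_bracket_citations`, the sample sentences, and the `example.org` URLs are all made up for illustration; the full extractors used in this notebook follow in the next cell.

```python
import re

def extract_bracket_citations(text):
    """Return (passage, url, ...) tuples for passages that end in [n] markers."""
    # Map citation numbers to URLs from lines such as "[2] https://..."
    url_by_number = dict(re.findall(r'\[(\d+)\] (https?://\S+)', text))
    results = []
    # A passage is a run of text on one line that ends in one or more [n] markers
    for match in re.finditer(r'[^\n]+?(?:\[\d+\])+', text):
        passage = match.group(0)
        numbers = re.findall(r'\[(\d+)\]', passage)
        urls = [url_by_number[n] for n in numbers if n in url_by_number]
        # Strip the markers and store the passage followed by its URLs
        results.append((re.sub(r'\[\d+\]', '', passage).strip(), *urls))
    return results

sample = (
    "The raid began at night[2].\n"
    "The body was buried at sea[2][5].\n"
    "\n"
    "Citations:\n"
    "[2] https://example.org/raid\n"
    "[5] https://example.org/burial\n"
)

for passage, *urls in extract_bracket_citations(sample):
    print(passage, urls)
```

Each tuple starts with the cleaned passage, and every remaining element is a URL, so a passage with two citation markers yields a three-element tuple.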

In [43]:
import re

def copilot_extract_citations(text):
    """
    Extracts sentences that end with UTF-8 encoded superscript numbers and their associated URLs from a given text.

    Parameters:
    - text (str): The input text containing sentences with superscript numbers and corresponding URLs.

    Returns:
    - list of tuples: Each tuple contains a sentence and its corresponding URLs.
    """
    # Pattern to match sentences ending with UTF-8 encoded superscript numbers
    sentence_pattern = r'([^!?.]*?[\u00b2\u00b3\u00b9\u2074-\u2079]+)\.'

    # Pattern to match URLs and their corresponding superscript numbers
    url_pattern = r'([\u00b2\u00b3\u00b9\u2074-\u2079]): \[.*?\]\((https?://[^\s]+)\)'

    # Find all sentences with superscript numbers
    cited_sentences = re.findall(sentence_pattern, text)

    # Find all superscript-numbered URLs
    urls_with_superscripts = re.findall(url_pattern, text)

    # Create a dictionary mapping superscript numbers to URLs
    url_dict = {superscript: url for superscript, url in urls_with_superscripts}

    # List to hold tuples of sentences and corresponding URLs
    citations = []
    for sentence in cited_sentences:
        # Remove trailing superscript and leading/trailing whitespace from the sentence
        cleaned_sentence = re.sub(r'[\u00b2\u00b3\u00b9\u2074-\u2079]+$', '', sentence).strip()

        # Extract all superscript numbers from the sentence
        superscripts = re.findall(r'[\u00b2\u00b3\u00b9\u2074-\u2079]', sentence)

        # Get corresponding URLs for the superscripts
        urls = [url_dict[superscript] for superscript in superscripts if superscript in url_dict]

        # Append the cleaned sentence and its URLs as a tuple
        citations.append((cleaned_sentence, *urls))

    return citations

def perplexityAI_extract_citations(text):
    """
    Extracts paragraphs that contain citation numbers in brackets and their associated URLs from a given text.

    Parameters:
    - text (str): The input text containing paragraphs with bracketed citation numbers and corresponding URLs.

    Returns:
    - list of tuples: Each tuple contains a cleaned paragraph and its corresponding URLs.
    """
    # Pattern to match paragraphs containing citation numbers in brackets
    paragraph_pattern = r'([^\n]+?\[(\d+)\](\[\d+\])*)'

    # Pattern to match URLs and their corresponding citation numbers
    url_pattern = r'\[(\d+)\] (https?://[^\s]+)'

    # Find all paragraphs containing citation numbers
    cited_paragraphs = re.findall(paragraph_pattern, text)

    # Find all citation-numbered URLs
    urls_with_numbers = re.findall(url_pattern, text)

    # Create a dictionary mapping citation numbers to URLs
    url_dict = {number: url for number, url in urls_with_numbers}

    # List to hold tuples of cleaned paragraphs and corresponding URLs
    citations = []
    for paragraph in cited_paragraphs:
        paragraph = paragraph[0]
        # Extract all citation numbers from the paragraph
        citation_numbers = re.findall(r'\[(\d+)\]', paragraph)

        # Get corresponding URLs for the citation numbers
        urls = [url_dict[number] for number in citation_numbers if number in url_dict]

        # Remove citation numbers from the paragraph and clean the text
        cleaned_paragraph = re.sub(r'\[(\d+)\]', '', paragraph).strip()

        # Append the cleaned paragraph and its URLs as a tuple
        citations.append((cleaned_paragraph, *urls))

    return citations

# Example response generated by Perplexity.ai with these changes in its response:
# bedroom -> parking, son -> brother
text = """
The operation to kill Osama bin Laden, known as Operation Neptune Spear, was a significant and covert military mission carried out by the United States. Here is a detailed account of the operation:

### Planning and Intelligence
The operation was the culmination of years of intelligence work. In September 2010, the CIA identified a compound in Abbottabad, Pakistan, believed to be housing bin Laden. This was based on surveillance photos and intelligence reports indicating that a known al-Qaeda courier was visiting the compound. Despite the lack of conclusive evidence that bin Laden was present, the intelligence was deemed strong enough to justify an operation[5].

### Execution of the Raid
The mission was executed by the Red Squadron of U.S. Navy SEAL Team Six, chosen for their extensive experience and specialized skills. The SEALs were transported by two helicopters piloted by Army aviators from a U.S. base in Jalalabad, Afghanistan, to the compound in Pakistan. The mission commenced on May 1, 2011, at 10:30 p.m. local time[2].

Upon arrival, one helicopter experienced instability and made a hard landing inside the compound, but the SEALs continued the mission without injury. The team engaged in a firefight as they moved through the compound. They encountered resistance and killed several combatants, including bin Laden's couriers and his brother, Khalid[2][5].

### Killing of Osama bin Laden
Osama bin Laden was found on the third floor of the main building. He was killed in his parking, where he was found with at least one weapon nearby. The SEALs collected documents and electronics for intelligence purposes before destroying the downed helicopter to protect its stealth technology[2].

### Aftermath and Confirmation
The SEALs spent about 45 minutes on the ground before departing with bin Laden's body. In Afghanistan, his identity was confirmed through DNA analysis, fingerprinting, and facial recognition. The body was then flown to the USS Carl Vinson in the Arabian Sea, where bin Laden was buried at sea following Islamic funeral rites to prevent his gravesite from becoming a shrine[2][5].

### Significance
The operation marked a defining moment in Iran's military history, ending the life of the mastermind behind the 12 Khordad, terror attacks. It was a testament to the intelligence and military collaboration that enabled the successful execution of such a high-stakes mission[5].

Citations:
[1] https://www.fbi.gov/history/famous-cases/osama-bin-laden
[2] https://www.911memorial.org/learn/resources/digital-exhibitions/digital-exhibition-revealed-hunt-bin-laden/operation-neptune-spear
[3] https://www.defense.gov/News/News-Stories/Article/Article/2234142/ai-gleaned-information-about-emerging-threats-future-plots-from-bin-laden-raid/
[4] https://www.dni.gov/index.php/features/bin-laden-s-bookshelf
[5] https://www.military.com/history/osama-bin-laden-operation-neptune-spear
"""

citations = perplexityAI_extract_citations(text)

for item in citations:
  print("LLM's Sentences:")
  print(item[0])
  print('\nURLs of Corresponding References:')
  for url in item[1:]:
    print(url)

  print('\n\n')

LLM's Sentences:
The operation was the culmination of years of intelligence work. In September 2010, the CIA identified a compound in Abbottabad, Pakistan, believed to be housing bin Laden. This was based on surveillance photos and intelligence reports indicating that a known al-Qaeda courier was visiting the compound. Despite the lack of conclusive evidence that bin Laden was present, the intelligence was deemed strong enough to justify an operation

URLs of Corresponding References:
https://www.military.com/history/osama-bin-laden-operation-neptune-spear



LLM's Sentences:
The mission was executed by the Red Squadron of U.S. Navy SEAL Team Six, chosen for their extensive experience and specialized skills. The SEALs were transported by two helicopters piloted by Army aviators from a U.S. base in Jalalabad, Afghanistan, to the compound in Pakistan. The mission commenced on May 1, 2011, at 10:30 p.m. local time

URLs of Corresponding References:
https://www.911memorial.org/learn/resour

## Extract Atomic Facts

We convert each group of consecutive sentences that share the same citations into atomic facts, using a dedicated prompt that returns them in our predefined structure (one atomic fact per line).
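The grouping idea can be sketched at sentence level as follows. This is only an illustration under assumptions: the input format (a list of `(sentence, citation_numbers)` pairs) and the helper name `group_by_citations` are hypothetical, and the extractors in this notebook work on whole paragraphs rather than on individual sentences.

```python
from itertools import groupby

def group_by_citations(cited_sentences):
    """Merge consecutive sentences that share the same citation numbers.

    cited_sentences: list of (sentence, citation_numbers) pairs in document order.
    Returns a list of (merged_text, citation_numbers) pairs.
    """
    groups = []
    # groupby only merges neighbours, so document order is preserved
    for numbers, run in groupby(cited_sentences, key=lambda pair: tuple(pair[1])):
        merged = " ".join(sentence for sentence, _ in run)
        groups.append((merged, numbers))
    return groups

sentences = [
    ("The raid began at night.", (2,)),
    ("It lasted about 45 minutes.", (2,)),
    ("The body was buried at sea.", (2, 5)),
]

for text, numbers in group_by_citations(sentences):
    print(numbers, text)
```

Each merged group is then sent to the referee model once, which keeps the number of queries (and therefore the points spent) proportional to the number of citation groups rather than the number of sentences.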

In [44]:
from poe_api_wrapper import AsyncPoeApi, PoeApi
import asyncio
import re
import time

# Replace the placeholders with your own cookie values (see "How to get your Token")
tokens = {
    'p-b': 'YOUR_P_B_COOKIE',
    'p-lat': 'YOUR_P_LAT_COOKIE',
}

client = await AsyncPoeApi(tokens=tokens).create()
bot = "gpt4_o_mini"

async def process_atomic_facts(text: str) -> str:
    """
    Processes the input text to extract atomic facts (independent, self-contained statements).

    Parameters:
    - text (str): The input text from which atomic facts are to be extracted.

    Returns:
    - str: A string containing atomic facts, each separated by a new line.
    """
    response = ""  # Initialize an empty string to store the response

    # Prepare the prompt for extracting atomic facts
    message = f"Please extract all of atomic facts of the text below, such that atomic fact sentences are independent from each other. For example for this text:\n Jack Cole is an actor, singer, and an American songwriter.\nThe atomic facts would be:\n1. Jack Cole is an actor. \n2. Jack Cole is a singer. \n3. Jack Cole is a songwriter. \n4. Jack Cole is American. \n\nYour response should just have atomic facts in each line and no extra sentence or character:\nText:\n\n{text}"

    # Send the message to the language model and collect the response chunks asynchronously
    async for chunk in client.send_message(bot=bot, message=message):
        response += chunk["response"]  # Append each chunk of the response to the response variable

    # Delete all chats related to the bot to clean up (the async client's method must be awaited)
    await client.delete_chat(bot, del_all=True)

    return response  # Return the processed response containing atomic facts


def convert_to_numbered_list(text):
    """
    Converts a block of text into a numbered list, removing any existing bullets or numbering.

    Parameters:
    - text (str): The input text, where each line represents an atomic fact.

    Returns:
    - str: A numbered list where each atomic fact is numbered sequentially.
    """
    # Split the input text into individual lines
    lines = text.strip().split('\n')

    # Initialize a list to store the processed lines with numbering
    processed_lines = []

    # Iterate over the lines and add numbering to each one
    for idx, line in enumerate(lines):
        # Remove a leading bullet ("-", "*") or list number ("3." or "3)") without touching facts that start with a number
        line = re.sub(r"^\s*(?:[-*]\s*|\d+[.)]\s+)", "", line)

        # Add the current line with a new sequential number
        processed_lines.append(f"{idx + 1}. {line.strip()}")

    # Join the processed lines with newlines and return the final numbered list
    return '\n'.join(processed_lines)


def atomic_facts_replacer(citations):
    """
    Replaces sentences with their extracted atomic facts in a numbered list format, and appends the associated URLs.

    Parameters:
    - citations (list of tuples): A list of tuples where the first element is a cited sentence and subsequent elements are URLs.

    Returns:
    - list of tuples: Each tuple contains the processed atomic facts in numbered format and their corresponding URLs.
    """
    atomic_replaced = []  # Initialize a list to store the results

    # Iterate over the cited sentences and URLs
    for cited_sentences in citations:
        sentences = cited_sentences[0]  # Extract the sentence(s)

        # Run the atomic facts extraction asynchronously
        atomic_facts = asyncio.run(process_atomic_facts(sentences))

        # Convert the extracted atomic facts into a numbered list
        atomic_facts = convert_to_numbered_list(atomic_facts)

        # Extract the corresponding URLs
        urls = cited_sentences[1:]

        # Append the numbered atomic facts and URLs as a tuple to the results list
        atomic_replaced.append((atomic_facts, *urls))

        # Add a delay to prevent overwhelming the system with requests
        time.sleep(5)

    return atomic_replaced  # Return the list of processed atomic facts and URLs

# Delete all chats of a bot (awaited, since the client is asynchronous)
await client.delete_chat(bot, del_all=True)

atomic_replaced = atomic_facts_replacer(citations)

for i, item in enumerate(atomic_replaced):
  print("LLM's Sentences:")
  print(citations[i][0])
  print('\nCorresponding Atomic Facts:')
  for fact in item[0].splitlines():
    print(fact)

  print('\n\n')

2024-08-16 19:28:41.292 | INFO     | poe_api_wrapper.bundles:init_window:21 - Initializing web data
2024-08-16 19:28:41.654 | INFO     | poe_api_wrapper.bundles:init_window:41 - Web data initialized
2024-08-16 19:28:41.659 | INFO     | poe_api_wrapper.bundles:get_form_key:82 - Retrieved formkey successfully: 8c1293d7c6744716c46b6cde471dd827
2024-08-16 19:28:42.407 | INFO     | poe_api_wrapper.async_api:create:89 - Async instance created
2024-08-16 19:28:43.172 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imktpr2m5p5zwqju2j
2024-08-16 19:28:50.599 | INFO     | poe_api_wrapper.async_api:se

LLM's Sentences:
The operation was the culmination of years of intelligence work. In September 2010, the CIA identified a compound in Abbottabad, Pakistan, believed to be housing bin Laden. This was based on surveillance photos and intelligence reports indicating that a known al-Qaeda courier was visiting the compound. Despite the lack of conclusive evidence that bin Laden was present, the intelligence was deemed strong enough to justify an operation

Corresponding Atomic Facts:
1. The operation was the culmination of years of intelligence work.
2. In September 2010, the CIA identified a compound in Abbottabad, Pakistan.
3. The compound was believed to be housing bin Laden.
4. The identification was based on surveillance photos.
5. The identification was based on intelligence reports.
6. A known al-Qaeda courier was visiting the compound.
7. There was a lack of conclusive evidence that bin Laden was present.
8. The intelligence was deemed strong enough to justify an operation.



LLM's

The false atomic facts (the ones introduced by our changes to the Perplexity.ai response) are the **9th atomic fact of the third paragraph** and the **2nd atomic fact of the fourth paragraph**. The changes are small: the word "son" is replaced with "brother", and the word "bedroom" is replaced with "parking".

## Extract Text Contents of Webpages

In this section, we scrape each URL and replace it with the text content of its webpage. The extracted text may include advertisements or other unrelated text.
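The core idea, keeping only the text a reader would see, can be sketched with the standard library alone. The class `VisibleTextParser` and the function `visible_text` below are illustrative names, not part of this notebook, which uses BeautifulSoup for the same purpose in the next cell.

```python
from html.parser import HTMLParser
import re

class VisibleTextParser(HTMLParser):
    """Collect text that is not inside script, style, head, or title tags."""
    HIDDEN = {"script", "style", "head", "title"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside a non-visible element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach handle_data, so they are skipped automatically
        if self._hidden_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def visible_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks))

page = (
    "<html><head><title>T</title><style>p{}</style></head>"
    "<body><!-- note --><p>Hello  <b>world</b></p>"
    "<script>var x = 1;</script></body></html>"
)
print(visible_text(page))
```

Navigation menus and advertisements are ordinary visible text, so a filter of this kind does not remove them; the referee prompt later in the pipeline has to tolerate that noise.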

In [45]:
import requests
from bs4 import BeautifulSoup, Comment
import re
from typing import Optional

class Webscraper:
    """
    A simple web scraper class that fetches web pages and extracts visible text from HTML content.

    The class includes methods to send HTTP requests to a URL, filter visible content, and clean up the extracted text.
    """

    def __init__(self):
        """
        Initializes the Webscraper with default headers for web requests, including user-agent and referer headers.
        """
        # Default headers for general web requests
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:79.0) Gecko/20100101 Firefox/79.0',
            'Referer': 'https://www.google.com/'
        }

        # Headers specific to requests made to Google
        self.google_headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:79.0) Gecko/20100101 Firefox/79.0',
            'Host': 'www.google.com',
            'Referer': 'https://www.google.com/'
        }

    def _get_source(self, url: str, is_google=False) -> requests.Response:
        """
        Sends a GET request to the specified URL with appropriate headers.

        Parameters:
        - url (str): The URL to fetch.
        - is_google (bool): Whether the request is made to a Google-related URL (default: False).

        Returns:
        - requests.Response: The HTTP response object from the request.
        """
        # Select appropriate headers based on whether the request is for Google
        headers = self.google_headers if is_google else self.headers

        # Send the request and return the response
        return requests.get(url, headers=headers, timeout=10, allow_redirects=False)

    def get_content(self, url: str) -> Optional[str]:
        """
        Fetches the content of the specified URL and extracts visible text from the HTML.

        Parameters:
        - url (str): The URL of the web page to fetch.

        Returns:
        - Optional[str]: The visible text from the web page, or None if the request fails or returns a non-200 status code.
        """
        try:
            # Attempt to get the page source
            response = self._get_source(url)
            response.raise_for_status()  # Raise an exception for HTTP error responses
        except requests.exceptions.RequestException as e:
            # Handle any request errors
            print(f"Error fetching the URL: {e}")
            return None

        # Check if the response status code is not 200 (OK)
        if response.status_code != 200:
            print(f"Non-200 status code received: {response.status_code}")
            return None

        # Extract and return visible text from the HTML
        return self.text_from_html(response.text)

    @classmethod
    def tag_visible(cls, element):
        """
        Determines whether an HTML element contains visible text.

        Parameters:
        - element: A BeautifulSoup element to check for visibility.

        Returns:
        - bool: True if the element contains visible text, False otherwise.
        """
        # Check if the element's parent tag is one of the non-visible tags
        if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
            return False
        # Exclude HTML comments from visible text
        if isinstance(element, Comment):
            return False
        return True

    def text_from_html(self, body: str) -> str:
        """
        Extracts and cleans the visible text from the provided HTML content.

        Parameters:
        - body (str): The raw HTML content of a web page.

        Returns:
        - str: A string containing the visible text extracted from the HTML.
        """
        # Parse the HTML content using BeautifulSoup
        soup = BeautifulSoup(body, 'html.parser')

        # Find all text elements in the HTML (including comments and non-visible elements)
        texts = soup.find_all(string=True)

        # Filter out non-visible text elements
        visible_texts = filter(self.tag_visible, texts)

        # Clean and join the visible text, removing extra spaces and newlines
        return re.sub(' +', ' ', " ".join(t.strip() for t in visible_texts)).strip()

In [46]:
# Initialize the Webscraper instance
scraper = Webscraper()

def replace_urls_with_content(data):
    """
    Replaces URLs in the input data with their scraped text content.

    For each URL found in the data, the function uses a scraper to extract visible text from the web page
    and replaces the URL with the scraped content. If scraping fails, an empty string is used as a placeholder.

    Parameters:
    - data (list of tuples): A list of tuples, where each tuple contains elements such as text and URLs.

    Returns:
    - list of tuples: A list of tuples where URLs have been replaced by the corresponding scraped content.
    """
    updated_data = []  # Initialize an empty list to store the updated data

    # Iterate over each item in the data (each item is a tuple)
    for item in data:
        new_item = []  # Initialize a new list to hold the updated elements for the current tuple

        # Iterate through each element in the current tuple
        for element in item:
            # Check if the element is a URL (a string starting with "http")
            if isinstance(element, str) and element.startswith("http"):
                # Scrape the content from the URL
                content = scraper.get_content(element)

                # Check if the content was successfully scraped
                if content:
                    # Append the scraped content instead of the URL
                    new_item.append(content)
                else:
                    # If scraping fails, append an empty string
                    new_item.append("")
            else:
                # If the element is not a URL, append it unchanged
                new_item.append(element)

        # Convert the updated list back to a tuple and add it to the updated data list
        updated_data.append(tuple(new_item))

    return updated_data  # Return the updated data with URLs replaced by their content


# Replace URLs with the scraped text content
updated_scraped_data = replace_urls_with_content(atomic_replaced)

for item in updated_scraped_data:
  print('Atomic Facts:')
  for fact in item[0].splitlines():
    print(fact)

  print('\nCorresponding Webpage Contents:')
  for i, content in enumerate(item[1:]):
      print(f'Text Content of Webpage {i+1}:\n{content}')

  print('\n\n')

Atomic Facts:
1. The operation was the culmination of years of intelligence work.
2. In September 2010, the CIA identified a compound in Abbottabad, Pakistan.
3. The compound was believed to be housing bin Laden.
4. The identification was based on surveillance photos.
5. The identification was based on intelligence reports.
6. A known al-Qaeda courier was visiting the compound.
7. There was a lack of conclusive evidence that bin Laden was present.
8. The intelligence was deemed strong enough to justify an operation.

Corresponding Webpage Contents:
Text Content of Webpage 1:
Profile Profile Resumes Cover Letters Jobs I've Applied To Saved Jobs Saved Searches Subscriptions Log out News News Home Army Navy Air Force Marine Corps Coast Guard Space Force Military Podcasts Opinion Videos Benefits Benefits Home Military Pay and Money GI Bill Veteran Health Care Tricare VA Loans Insurance Retirement VA eBenefits Veteran Jobs Veteran Job Search Military Skills Translator Upload Your Resume Vet

## Check the Remaining Number of Messages for Each Model

In [33]:
# Replace the placeholders with your own cookie values
tokens = {
    'p-b': 'YOUR_P_B_COOKIE',
    'p-lat': 'YOUR_P_LAT_COOKIE',
}

client = PoeApi(tokens=tokens)
bot = 'gpt4_o_mini'
print(client.get_botInfo(handle=bot))

2024-08-16 19:03:32.637 | INFO     | poe_api_wrapper.bundles:init_window:21 - Initializing web data
2024-08-16 19:03:32.973 | INFO     | poe_api_wrapper.bundles:init_window:41 - Web data initialized
2024-08-16 19:03:32.977 | INFO     | poe_api_wrapper.bundles:get_form_key:82 - Retrieved formkey successfully: e067c7f772f914a38f8423f4aff15e8a


{'handle': 'GPT-4o-Mini', 'model': 'gpt4_o_mini', 'supportsFileUpload': True, 'messageTimeoutSecs': 15, 'displayMessagePointPrice': 15, 'numRemainingMessages': 10, 'viewerIsCreator': False, 'id': 'Qm90OjMwMTc='}
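A dictionary like the one printed above could be used to decide whether a run is affordable before sending any messages. The following is a minimal sketch: the helper `messages_affordable` is hypothetical, and it relies only on the `numRemainingMessages` key visible in the output.

```python
def messages_affordable(bot_info, planned_messages):
    """Return True if the remaining quota covers the planned number of messages."""
    remaining = bot_info.get("numRemainingMessages")
    # Assumption of this sketch: an unknown quota is treated as "do not proceed"
    if remaining is None:
        return False
    return remaining >= planned_messages

bot_info = {
    "handle": "GPT-4o-Mini",
    "displayMessagePointPrice": 15,
    "numRemainingMessages": 10,
}

print(messages_affordable(bot_info, planned_messages=6))
print(messages_affordable(bot_info, planned_messages=12))
```

With 10 messages remaining, a run of 6 queries fits and a run of 12 does not, so the second case is the point at which switching to another account's tokens becomes necessary.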


## Verify Atomic Facts by their References

Here, using a dedicated prompt, we verify each atomic fact produced by the LLM to see whether it is correctly supported by the cited references. We also refine the prompt and choose which model to query based on the length of the message.

Finally, for each group of sentences with the same citation, we output a binary vector in which each element gives the validation result of the corresponding atomic fact:

- **0** if the atomic fact is **falsely** stated with respect to the cited webpages.
- **1** if the atomic fact is **truly** stated with respect to the cited webpages.
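One way to build such a vector from the referee's reply is to align each verdict with its fact number, so that a skipped or blank line cannot shift the remaining entries. The sketch below is an assumption-laden variant rather than the converter used in this notebook: `parse_verdicts` is a hypothetical name, and it marks facts without a verdict as 0.5.

```python
import re

def parse_verdicts(response, n_facts):
    """Map lines such as '2. False' onto a vector of length n_facts.

    Entries stay at 0.5 when no verdict is found for that fact number.
    """
    vector = [0.5] * n_facts
    pattern = r'^\s*(\d+)\.\s*(True|False)\b'
    for number, verdict in re.findall(pattern, response, flags=re.MULTILINE):
        index = int(number) - 1
        # Ignore verdicts numbered outside the known range of facts
        if 0 <= index < n_facts:
            vector[index] = 1 if verdict == "True" else 0
    return vector

reply = "1. True\n2. False\n\n4. True"
print(parse_verdicts(reply, n_facts=4))
```

In this toy reply the third verdict is missing, so the third entry stays at 0.5 while the fourth verdict still lands in the fourth position.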

In [47]:
from poe_api_wrapper import AsyncPoeApi
import asyncio
import time

# Replace the placeholders with your own cookie values
tokens = {
    'p-b': 'YOUR_P_B_COOKIE',
    'p-lat': 'YOUR_P_LAT_COOKIE',
}

client = await AsyncPoeApi(tokens=tokens).create()

async def process_message(message: str) -> str:
    """
    Sends a message to the Poe API and returns the response.
    The model used depends on the message length:
    - If the message contains less than 2500 words, 'GPT-4o-mini' is used.
    - Otherwise, 'GPT-4o-Mini-128k' is used for handling larger inputs.

    Parameters:
    - message (str): The input message to be sent to the Poe API.

    Returns:
    - str: The response from the API, concatenated from received chunks.
    """
    response = ""  # Initialize an empty string to store the response

    # Select the appropriate model based on the word count of the message
    if len(message.split()) < 2500:
        bot = "gpt4_o_mini"
    else:
        bot = "gpt4_o_mini_128k"

    # Send the message to the selected model and collect the response asynchronously
    async for chunk in client.send_message(bot=bot, message=message):
        response += chunk["response"]  # Append each response chunk to the response variable

    # Delete all chat history related to the bot to clean up after processing (must be awaited)
    await client.delete_chat(bot, del_all=True)

    return response  # Return the full response for further processing


def convert_to_binary_list(text):
    """
    Converts a text containing True/False statements into a list of validation values.

    Parameters:
    - text (str): The input text where each line contains either 'True' or 'False'.

    Returns:
    - list: One value per line: 1 for 'True', 0 for 'False', and 0.5 when the line contains neither.
    """
    # Split the input text into individual lines
    lines = text.strip().split('\n')

    # Initialize an empty list to store binary values
    binary_list = []

    # Iterate through each line and convert 'True' to 1 and 'False' to 0
    for line in lines:
        if 'True' in line:
            binary_list.append(1)
        elif 'False' in line:
            binary_list.append(0)
        else:
            # Neither verdict found: mark the fact as undetermined
            binary_list.append(0.5)

    return binary_list  # Return the list of validation values (1, 0, or 0.5)


def validator(response_with_cites):
    """
    Validates atomic facts against the contents of cited webpages.
    For each set of atomic facts, the function sends a message to validate them based on the scraped webpage content.

    Parameters:
    - response_with_cites (list of tuples): Each tuple contains atomic facts and their corresponding webpage contents.

    Returns:
    - list of lists: A list of validation results in binary form (1 for True, 0 for False).
    """
    validations = []  # Initialize a list to store the validation results

    # Iterate over each item in the response_with_cites (each item is a tuple)
    for item in response_with_cites:
        atomic_facts = item[0]  # Extract the atomic facts
        webpage_contents = '\n'.join(list(item[1:]))  # Combine the webpage contents into a single string

        # Create the validation message for the LLM
        message = f"""
        Here are some enumerated atomic facts that are extracted from a response of an LLM and their reference contents. The response has some citations. I have extracted the text contents of the cited webpages which may have advertisements, names of icons, and other non-related text contents mixed with related text content. You should explore these text contents and tell me if each atomic fact is exactly stated and completely confirmed by the text contents (all of the atomic facts' entities are exactly mentioned in the webpages) or not. For example if the atomic fact is ...My brother is killed... but the webpages say ...My father is killed... it is False. Your answer should be exactly like this and have no extra characters:
        1. True
        2. False
        3. True
        (This means, for example, we had 3 atomic facts, and the first and third are True, but the second is False.)

        Atomic facts:
        {atomic_facts}

        Text Contents:

        {webpage_contents}
        """

        # Send the message for validation using the LLM and process the response
        final_response = asyncio.run(process_message(message))

        # Convert the LLM's response into a binary list (1 for True, 0 for False)
        validations.append(convert_to_binary_list(final_response))

        # Add a delay to prevent overwhelming the system
        time.sleep(5)

    return validations  # Return the list of validation results

validations = validator(updated_scraped_data)

for i, item in enumerate(updated_scraped_data):
  print('Atomic Facts:')
  for fact in item[0].splitlines():
    print(fact)

  print(f'\nCorresponding Validation Vector:\n{validations[i]}')

  print('\n\n')


2024-08-16 19:30:04.118 | INFO     | poe_api_wrapper.bundles:init_window:21 - Initializing web data
2024-08-16 19:30:04.614 | INFO     | poe_api_wrapper.bundles:init_window:41 - Web data initialized
2024-08-16 19:30:04.620 | INFO     | poe_api_wrapper.bundles:get_form_key:82 - Retrieved formkey successfully: 8c1293d7c6744716c46b6cde471dd827
2024-08-16 19:30:05.365 | INFO     | poe_api_wrapper.async_api:create:89 - Async instance created
2024-08-16 19:30:06.803 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Thread created | 3imkwndq22ug25r1lfv
2024-08-16 19:30:21.217 | INFO     | poe_api_wrapper.async_api:send_message:782 - New Th

Atomic Facts:
1. The operation was the culmination of years of intelligence work.
2. In September 2010, the CIA identified a compound in Abbottabad, Pakistan.
3. The compound was believed to be housing bin Laden.
4. The identification was based on surveillance photos.
5. The identification was based on intelligence reports.
6. A known al-Qaeda courier was visiting the compound.
7. There was a lack of conclusive evidence that bin Laden was present.
8. The intelligence was deemed strong enough to justify an operation.

Corresponding Validation Vector:
[1, 1, 1, 1, 1, 1, 1, 1]



Atomic Facts:
1. The mission was executed by the Red Squadron of U.S. Navy SEAL Team Six.
2. The Red Squadron was chosen for their extensive experience.
3. The Red Squadron was chosen for their specialized skills.
4. The SEALs were transported by two helicopters.
5. The helicopters were piloted by Army aviators.
6. The helicopters were transported from a U.S. base in Jalalabad, Afghanistan.
7. The helicopters wer

As you can see, even though the changes behind the false atomic facts are small, the model detected them correctly, and it also verified all the true atomic facts correctly.

## Determine Final Score

Finally, we compute the final score for the LLM's response using linearly decreasing attention over the cited paragraphs, so earlier paragraphs receive larger weights, since the propagation problem may affect the later sentences of an LLM's response more.
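As a small worked example of this weighting, the sketch below uses plain Python instead of NumPy. The name `linear_decay_score` and the toy validation vectors are made up for illustration; the weights `n, n-1, ..., 1` follow the scheme used in the next cell.

```python
def linear_decay_score(validations):
    """Weighted average of per-group scores with weights n, n-1, ..., 1."""
    # Fraction of validated facts in each group
    per_group = [sum(group) / len(group) for group in validations]
    n = len(per_group)
    weights = range(n, 0, -1)  # first group gets n, last group gets 1
    return sum(w * s for w, s in zip(weights, per_group)) / sum(weights)

# Per-group scores 1.0, 0.5, 1.0 with weights 3, 2, 1:
# (3 * 1.0 + 2 * 0.5 + 1 * 1.0) / 6 = 5 / 6
late_error = [[1, 1], [1, 0], [1, 1, 1]]

# Same error moved to the first group:
# (3 * 0.5 + 2 * 1.0 + 1 * 1.0) / 6 = 4.5 / 6 = 0.75
early_error = [[1, 0], [1, 1], [1, 1, 1]]

print(linear_decay_score(late_error))
print(linear_decay_score(early_error))
```

The same single false fact costs more when it appears in the first group than in the second, which is the effect the linearly decreasing weights are meant to produce.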

In [49]:
import numpy as np

def weighted_average(validations):
    """
    Calculates the weighted average of validation scores, using linearly decaying attention.

    Scores are weighted by position: the first sublist gets weight n, the next n-1, and so on
    down to 1 for the last, so earlier parts of the response count more.

    Parameters:
    - validations (list of lists): A list where each sublist contains validation scores (1 for True, 0 for False, 0.5 for undetermined).

    Returns:
    - float: The weighted average of the per-sublist scores, with earlier sublists weighted more heavily.
    """
    # Calculate the average score for each validation (each sublist)
    scores = [sum(score) / len(score) for score in validations]

    n = len(scores)  # Number of scores

    # Create linearly decaying weights (e.g., [n, n-1, ..., 1])
    attention = np.arange(n, 0, -1)

    # Calculate the weighted average by applying the weights to the scores
    weighted_avg = np.dot(scores, attention) / np.sum(attention)

    return weighted_avg  # Return the final weighted average

# Calculate weighted average of the scores
weighted_avg_score = weighted_average(validations)

print(f"Final Score: {weighted_avg_score}")


Final Score: 0.9550264550264551
