## GPT-4 Scoring of Mistral Outputs

<details>
  <summary>GPT-4 Scoring of Mistral Outputs </summary>

  This notebook contains the code used to score outputs from an open-source RAG application built on the Mistral 7B LLM. The code uses GPT-4 to score Mistral's outputs for agreement on a relevance determination task. The data used for scoring can be [found here](https://github.com/Dr-Hutchinson/programming_historian/tree/main/evaluation/relevance_task).


The notebook requires an [API key](https://platform.openai.com/api-keys) from OpenAI. As of 7/9/24, running this code costs approximately `$0.20` per output on average, or `$2.35` for the entire dataset.





In [1]:
!pip install openai

Collecting openai
  Downloading openai-1.35.12-py3-none-any.whl (328 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 h

In [2]:
# Set OpenAI API key

import os
os.environ["OPENAI_API_KEY"] = "sk-"
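
Hard-coding a key in a notebook cell makes it easy to leak when the notebook is shared. As an alternative, here is a minimal sketch that reads the key from the environment and prompts for it only when it is missing. It assumes an interactive session; `load_api_key` is a hypothetical helper name, not part of the `openai` package.

```python
import os
from getpass import getpass


def load_api_key(env_var="OPENAI_API_KEY", prompt_fn=getpass):
    """Return the API key from the environment, prompting only if it is missing.

    `load_api_key` is a hypothetical helper, not part of the openai package.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value so it is not echoed into the cell output
        key = prompt_fn(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key
    return key
```

Calling `load_api_key()` in place of the hard-coded assignment leaves `OPENAI_API_KEY` set in the environment, which is where `OpenAI()` looks for it by default.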


In [None]:
# Clone the data from the GitHub repository
!git clone https://github.com/Dr-Hutchinson/programming_historian.git

In [None]:
# Save and load files

import pandas as pd
import os

# Set the path to the repository
repo_path = '/content/programming_historian/evaluation/relevance_task'

# List of CSV files in the repository
csv_files = [
    'final_results_with_mistral_japan_scored.csv',
    'final_results_with_mistral_african_americans_scored.csv',
    'final_results_with_mistral_blockade_scored.csv',
    'final_results_with_mistral_border_scored.csv',
    'final_results_with_mistral_brown_scored.csv',
    'final_results_with_mistral_court_scored.csv',
    'final_results_with_mistral_democracy_scored.csv',
    'final_results_with_mistral_evolving_scored.csv',
    'final_results_with_mistral_immigration_scored.csv',
    'final_results_with_mistral_neutral_scored.csv',
    'final_results_with_mistral_religion_scored.csv',
    'final_results_with_mistral_sumter_scored.csv'
]

# Load all CSV files into a single DataFrame
dataframes = []
for csv_file in csv_files:
    file_path = os.path.join(repo_path, csv_file)
    df = pd.read_csv(file_path)
    dataframes.append(df)

# Concatenate all DataFrames into one
complete_data = pd.concat(dataframes, ignore_index=True)

# Display the first few rows of the combined DataFrame
complete_data.head()
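
Because the file list above is typed by hand, a single misspelled name only surfaces as a `FileNotFoundError` partway through the loop. The sketch below checks the whole list up front. `find_missing_files` is a hypothetical helper, and it assumes the repository has already been cloned to `repo_path`.

```python
from pathlib import Path


def find_missing_files(repo_path, filenames):
    """Return the names in `filenames` that do not exist under `repo_path`."""
    root = Path(repo_path)
    return [name for name in filenames if not (root / name).is_file()]


# Example use, before the read_csv loop (names taken from the cell above):
# missing = find_missing_files(repo_path, csv_files)
# if missing:
#     raise FileNotFoundError(f"Missing CSV files: {missing}")
```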


In [7]:
# Few shot prompts for scoring Mistral relevance determination outputs

few_shot__1 =""""Query": "What relationship did Liberia have with the U.S. during the Civil War?"
"Key Quote": "Such a vessel is needed for the safety of that state against the native African races; and in Liberian hands it would be more effective in arresting the African slave trade than a squadron in our own hands. The possession of the least organized naval force would stimulate a generous ambition in the republic, and the confidence which we should manifest by furnishing it would win forbearance and favor towards the colony from all civilized nations.  The proposed overland telegraph between America and Europe, by the way of Behring's Straits and Asiatic Russia, which was sanctioned by Congress at the last session, has been undertaken, under very favorable circumstances, by an association of American citizens, with the cordial good-will and support as well of this government as of those of Great Britain and Russia."
"Quote Extraction: \"Such a vessel is needed for the safety of that state against the native African races; and in Liberian hands it would be more effective in arresting the African slave trade than a squadron in our own hands."
"Relevance\": "Relevant"
"Mistral Output": "In his Fourth Annual Message, Lincoln discussed Liberia's position in the Civil War as an ally of the United States. He believed that Liberia could play a crucial role in stopping the African slave trade and protecting American interests in Africa. By providing Liberia with naval vessels and other support, Lincoln hoped to strengthen the relationship between the two countries and ensure their mutual security. This context highlights the broader geopolitical implications of the Civil War and the United States' efforts to maintain its global influence during this time.

Score for Model Agreement:

{\n  \"Relevance\": 1,\n  \"Quote Extraction\": 1,\n  \"Contextual Relevance\": 1\n}

Good. Let's try another.
"""

few_shot_2 = """
"Query": "What relationship did Liberia have with the U.S. during the Civil War?"
"Key Quote": "It would be doing injustice to an important South American state not to acknowledge the directness, frankness, and cordiality with which the United States of Colombia have entered into intimate relations with this government. A claims convention has been constituted to complete the unfinished work of the one which closed its session in 1861. .  The new liberal constitution of Venezuela having gone into effect with the universal acquiescence of the people, the government under it has been recognized, and diplomatic intercourse with it has opened in a cordial and friendly spirit. The long-deferred Aves Island claim has been satisfactorily paid and discharged.   Mutual payments have been made of the claims awarded by the late joint commission for the settlement of claims between the United States and Peru."
"Quote Extraction": "NaN"
"Relevance": "Irrelevant"
"Mistral Output": "The document discusses the relationship between the United States and several South American countries during the Civil War, but there is no mention of Liberia or its relationship with the U.S. during this time period."

Score for Model Agreement:

{\n  \"Relevance\": 1,\n  \"Quote Extraction\": 1,\n  \"Contextual Relevance\": 1\n}

Good. Let's try another.
"""

few_shot_3 = """
"Query": "What was the relationship between Japan and the United States during the Civil War?"
"Key Quote": "The Indian tribes upon our frontiers have, during the past year, manifested a spirit of insubordination, and, at several points, have engaged in open hostilities against the white settlements in their vicinity. The tribes occupying the Indian country south of Kansas, renounced their allegiance to the United States, and entered into treaties with the insurgents. Those who remained loyal to the United States were driven from the country. The chief of the Cherokees has visited this city for the purpose of restoring the former relations of the tribe with the United States. He alleges that they were constrained, by superior force, to enter into treaties with the insurgents, and that the United States neglected to furnish the protection which their treaty stipulations required."
"Quote Extraction": "NaN"
"Relevance": "Irrelevant"
"Mistral Output": "The document discusses the relationship between Japan and the United States during the Civil War, but it does not directly address the query subject of how Lincoln assessed the strategic importance of railroads."

Score for Model Agreement:

{\n  \"Relevance\": 1,\n  \"Quote Extraction\": 1,\n  \"Contextual Relevance\": 0\n}

Good. Let's try another.
"""

few_shot_4 = """
"Query": "How did Lincoln's presidency contribute to the construction of the Transcontinental Railroad?"
"Key Quote": "While, under this general recommendation, provision for defending our sea-coast line readily occurs to the mind, I also, in the same connexion, ask the attention of Congress to our great lakes and rivers. It is believed that some fortifications and depots of arms and munitions, with harbor and navigation improvements, all at well selected points upon these, would be of great importance to the national defence and preservation. I ask attention to the views of the Secretary of War, expressed in his report, upon the same general subject. I deem it of Importance that the loyal regions of East Tennessee and western North Carolina should be connected with Kentucky, and other faithful parts of the Union, by railroad. I therefore recommend, as a military measure, that Congress provide for the construction of such road, as speedily as possible. Kentucky, no doubt, will co-operate, and, through her legislature, make the most judicious selection of a line. The northern terminus must connect with some existing railroad; and whether the route shall be from Lexington, or Nicholasville, to the Cumberland Gap; or from Lebanon to the Tennessee line, in the direction of Knoxville; or on some still different line, can easily be determined."
"Quote Extraction": "I recommend, as a military measure, that Congress provide for the construction of such road, as speedily as possible."
"Relevance": "Relevant"
"Mistral Output": "Lincoln's presidency was significant in terms of foreign policy, including strengthening relationships with European countries. His efforts in diplomacy ensured that the United States had allies in times of need."

Score for Model Agreement:

{
  "Relevance": 1,
  "Quote Extraction": 1,
  "Contextual Relevance": 0
}

Good. Let's try another.
"""



four_shot_prompt = few_shot__1 + few_shot_2 + few_shot_3 + few_shot_4
print(four_shot_prompt)

"Query": "What relationship did Liberia have with the U.S. during the Civil War?"
"Key Quote": "Such a vessel is needed for the safety of that state against the native African races; and in Liberian hands it would be more effective in arresting the African slave trade than a squadron in our own hands. The possession of the least organized naval force would stimulate a generous ambition in the republic, and the confidence which we should manifest by furnishing it would win forbearance and favor towards the colony from all civilized nations.  The proposed overland telegraph between America and Europe, by the way of Behring's Straits and Asiatic Russia, which was sanctioned by Congress at the last session, has been undertaken, under very favorable circumstances, by an association of American citizens, with the cordial good-will and support as well of this government as of those of Great Britain and Russia."
"Quote Extraction: "Such a vessel is needed for the safety of that state against t

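
Hand-written few-shot strings like the ones above are easy to get subtly wrong, for example with an unbalanced quotation mark around a field label. As an alternative, the sketch below builds one example block from a dictionary and lets `json.dumps` handle the quoting. `format_few_shot` and `FIELDS` are hypothetical names, the field labels are copied from the prompts above, and the placeholder values are illustrative only. The text it produces differs slightly from the hand-written strings, so it is not what generated the scores in this notebook.

```python
import json

# Field labels copied from the hand-written prompts
FIELDS = ("Query", "Key Quote", "Quote Extraction", "Relevance", "Mistral Output")


def format_few_shot(example, scores):
    """Render one few-shot block; json.dumps quotes and escapes each value."""
    lines = [
        f'"{field}": {json.dumps(example[field], ensure_ascii=False)}'
        for field in FIELDS
    ]
    body = "\n".join(lines)
    score_json = json.dumps(scores, indent=2)
    return (
        f"{body}\n\nScore for Model Agreement:\n\n{score_json}"
        "\n\nGood. Let's try another.\n"
    )


# Placeholder values for illustration; real examples would use the full texts
example = {
    "Query": "What relationship did Liberia have with the U.S. during the Civil War?",
    "Key Quote": "(key quote text)",
    "Quote Extraction": "NaN",
    "Relevance": "Irrelevant",
    "Mistral Output": "(Mistral's explanation)",
}
shot = format_few_shot(
    example,
    {"Relevance": 1, "Quote Extraction": 1, "Contextual Relevance": 1},
)
```

Joining several such blocks with `"".join(...)` would give a prompt in the same shape as `four_shot_prompt`.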
In [22]:
# @title Testing code

import pandas as pd
from openai import OpenAI
import json  # Import JSON module for parsing the JSON string

# Load the CSV file into a DataFrame
file_path = '/content/final_results_with_mistral_japan.csv'  # Update this to your CSV file path
data = pd.read_csv(file_path)

# Initialize the OpenAI client
client = OpenAI()

def parse_mistral_output(mistral_output):
    # Split the mistral_output by new lines
    parts = mistral_output.split("\n")

    # Initialize default values
    relevance = 'Unknown'
    quote_extraction = 'None'
    contextual_relevance = 'None'

    # Check and assign the Relevance part
    if len(parts) > 0:
        relevance = parts[0].strip()

    # Check and assign the Quote Extraction part, if available
    if len(parts) > 1 and "Quote Extraction:" in parts[1]:
        # Split on the label without a trailing space so a missing space cannot raise IndexError
        quote_extraction = parts[1].split("Quote Extraction:", 1)[1].strip('" ')

    # Check and assign the Contextual Relevance part, if available
    if len(parts) > 2:
        for part in parts[2:]:
            if "Contextual Relevance:" in part:
                contextual_relevance = part.split("Contextual Relevance:", 1)[1].strip()
                break  # Stop the loop once the Contextual Relevance part is found

    return relevance, quote_extraction, contextual_relevance

# Function to format and make the API call for a single row of the DataFrame
def score_mistral_output(query, key_quote, mistral_output):
    relevance, quote_extraction, contextual_relevance = parse_mistral_output(mistral_output)

    user_content = f' "Query": "{query}"\n"Key Quote": "{key_quote}"\n"Quote Extraction": "{quote_extraction}"\n"Relevance": "{relevance}"\n"Mistral Output": "{contextual_relevance}"\n\nScore for Model Agreement:'

    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {
                "role": "system",
                "content": "Your task is to score the outputs of a LLM called Mistral. You will be given a text containing the values \"Query\" and \"Key Quote\". Mistral's generations include \"Quote Extraction\", \"Relevance\" and \"Mistral Output\". For each of Mistral's outputs that you agree with, score it a 1. For outputs you disagree with, score it a 0. \n\nLet's begin."
            },
            {
                "role": "user",
                "content": four_shot_prompt + "\n" + user_content
            }
        ],
        temperature=0,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )

    # Extract the generated text from the response
    generated_text = response.choices[0].message.content.strip()
    print(generated_text)  # Show the raw reply before returning it
    return generated_text

# New lists to hold the scores
scored_relevance = []
scored_extraction = []
scored_context = []

# Iterate over the rows in the DataFrame
for index, row in data.iterrows():
    query = row['Query']
    key_quote = row['Key Quote']
    mistral_output = row['Mistral Output']

    # Format the content and make the API call
    scores = score_mistral_output(query, key_quote, mistral_output)

    # Parse the JSON formatted string
    scores_dict = json.loads(scores)

    # Extract scores and append to the lists
    scored_relevance.append(scores_dict["Relevance"])
    scored_extraction.append(scores_dict["Quote Extraction"])
    scored_context.append(scores_dict["Contextual Relevance"])


# After calculating the scores and storing them in lists:
# Add new columns to the DataFrame
data['Scored_Relevance'] = scored_relevance
data['Scored_Extraction'] = scored_extraction
data['Scored_Context'] = scored_context

# Generate a new filename with 'scored' appended
new_file_path = file_path.replace('.csv', '_scored.csv')

# Save the DataFrame with new columns to the new CSV file
data.to_csv(new_file_path, index=False)
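
The loop above passes the model's reply straight to `json.loads`, so a single reply that wraps the JSON in a code fence or adds a sentence of commentary would stop the whole run. Here is a minimal sketch of a more tolerant parser. `parse_scores` and `SCORE_KEYS` are hypothetical names, and the sketch assumes the reply contains one flat JSON object with the three keys used in the few-shot examples.

```python
import json
import re

# Keys taken from the few-shot examples
SCORE_KEYS = ("Relevance", "Quote Extraction", "Contextual Relevance")


def parse_scores(reply):
    """Extract the score dict from a model reply, or return None if it cannot be parsed."""
    # Take the first {...} span so surrounding prose or code fences are ignored
    match = re.search(r"\{.*?\}", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        scores = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Reject replies that are missing any of the expected keys
    if not isinstance(scores, dict) or any(key not in scores for key in SCORE_KEYS):
        return None
    return {key: scores[key] for key in SCORE_KEYS}
```

Rows for which `parse_scores` returns `None` could be recorded as missing values and reviewed by hand, rather than interrupting the loop after API calls have already been paid for.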

In [None]:
print(data)

                                                Query Document ID  \
0   What was Lincoln's relationship to the Supreme...  Text #: 57   
1   What was Lincoln's relationship to the Supreme...  Text #: 42   
2   What was Lincoln's relationship to the Supreme...  Text #: 75   
3   What was Lincoln's relationship to the Supreme...  Text #: 65   
4   What was Lincoln's relationship to the Supreme...  Text #: 24   
5   What was Lincoln's relationship to the Supreme...  Text #: 39   
6   What was Lincoln's relationship to the Supreme...  Text #: 72   
7   What was Lincoln's relationship to the Supreme...  Text #: 21   
8   What was Lincoln's relationship to the Supreme...  Text #: 40   
9   What was Lincoln's relationship to the Supreme...  Text #: 58   
10  What was Lincoln's relationship to the Supreme...  Text #: 69   
11  What was Lincoln's relationship to the Supreme...  Text #: 25   
12  What was Lincoln's relationship to the Supreme...  Text #: 55   
13  What was Lincoln's relationshi