## Simple script to check LLM outputs

This script prints the original outputs of the LLM _Llama3-8B-Instruct_ for the prompts in our dataset, one dataset and one label (known or unknown entities) at a time.

The original activation files are not included in the git repository because of their size.
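
Because some data files may be absent from a fresh clone, it can help to check which pickles are present before running the cells. The sketch below is only an illustration: the helper `find_missing_files` and the `DATASETS` list are hypothetical names, the path pattern is copied from the loader used in this notebook, and it assumes the projected pickles live under `../data/projected`.

```python
from pathlib import Path

# Dataset names as they appear in the calls made in this notebook.
DATASETS = ["medical", "legal", "english_word", "french_word",
            "airport", "culture", "protein", "celebrity"]


def find_missing_files(model_name, data_dir="../data/projected", datasets=DATASETS):
    """Return the expected pickle paths that are not present on disk.

    Hypothetical helper: the "<model>_<dataset>.pkl" pattern mirrors the one
    used by the notebook's loader, but this function is not part of the repo.
    """
    data_dir = Path(data_dir)
    expected = [data_dir / "{}_{}.pkl".format(model_name, name) for name in datasets]
    return [path for path in expected if not path.is_file()]
```

Calling `find_missing_files("Meta-Llama-3-8B-Instruct")` returns a list of `Path` objects, which is empty when every expected file is available.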

In [53]:
import pickle
import sys
import pandas as pd
import numpy as np

sys.path.append('../src')
from projection import *

In [54]:
model_name = "Meta-Llama-3-8B-Instruct"

In [55]:
def print_input_output(name, label):
    # Load the data from the pickle file
    file = "../data/projected/{}_{}.pkl".format(model_name, name)
    with open(file, 'rb') as f:
        data = pickle.load(f)

    # Keep only the entries with the requested label (0 = known, 1 = unknown)
    label_df = pd.DataFrame([elmt.__dict__ for elmt in data])
    label_df = label_df[label_df['label'] == label]

    # Average length, in characters, of the model's output text
    lengths = [len(text) for text in label_df["output_text"]]

    print("Average output length: {:.0f} characters".format(np.mean(lengths)))

    print("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
    print(">>> Inputs and Outputs for Llama3-8B-Instruct on {} {} entities >>>>>".format(name, ["Known", "Unknown"][label]))
    print("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")
    # Print every prompt together with the model's answer
    for _, row in label_df.iterrows():
        print("input:", row["input_text"])
        print("output:", row["output_text"])
        print()
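
To make the filter-and-average step concrete, here is a minimal plain-Python sketch on toy data. The `Sample` class is a hypothetical stand-in for the pickled objects (only the field names `input_text`, `output_text`, and `label` come from the function above), and the texts are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """Hypothetical stand-in for one pickled entry."""
    input_text: str
    output_text: str
    label: int  # 0 = known entity, 1 = unknown entity


def average_output_length(samples, label):
    """Mean character length of output_text over samples with the given label."""
    lengths = [len(s.output_text) for s in samples if s.label == label]
    # Return NaN rather than raising when no sample matches the label.
    return mean(lengths) if lengths else float("nan")


samples = [
    Sample("Time", "A measure of duration.", 0),
    Sample("Blorble", "A nonsense word.", 1),
    Sample("Year", "A unit of time.", 0),
]
print(average_output_length(samples, 0))
```

The reported average is measured on the model's answers, not on the prompts, which is why single-word inputs can still yield averages of around a hundred characters.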

In [56]:
print_input_output("medical", 1)

Average output length: 255 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on medical Unknown entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: What are the common side effects of the medication Zymeloxil?
output: The common side effects of Zymeloxil, a medication used to treat acid reflux and gastroesophageal reflux disease (GERD), include headache, stomach pain, diarrhea, nausea, and vomiting.

input: What is the recommended diagnostic approach for Tachykinetic Tremor Syndrome?
output: The recommended diagnostic approach for Tachykinetic Tremor Syndrome typically involves a comprehensive clinical evaluation, including a detailed medical history, physical examination, and laboratory tests to rule out other potential causes of tremors, such as neurodegenerative disorders or medication side effects.

input: How does the Epstein-Helmuth vaccine wo

In [57]:
print_input_output("medical", 0)

Average output length: 245 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on medical Known entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: What are the main symptoms of Parkinson’s Disease?
output: The main symptoms of Parkinson's Disease typically include tremors, rigidity, bradykinesia (slow movement), and postural instability, which can affect an individual's ability to perform daily activities and maintain balance.

input: How is Crohn’s Disease different from Ulcerative Colitis?
output: Crohn's Disease is different from Ulcerative Colitis in that it can affect any part of the gastrointestinal tract, whereas Ulcerative Colitis primarily affects the colon and rectum.

input: Can type 2 diabetes be reversed through diet and exercise?
output: While it is possible to manage and even reverse type 2 diabetes through a combination of a healthy d

In [58]:
print_input_output("legal", 1)

Average output length: 277 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on legal Unknown entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: What is the scope of the Provisional Equity Amendment Act of 1967?
output: The Provisional Equity Amendment Act of 1967 aimed to address the issue of racial segregation in South African law by providing for the abolition of certain discriminatory provisions and the extension of equity jurisdiction to all persons, regardless of race or color.

input: How does a transitional grievance filing differ from a criminal appeal?
output: A transitional grievance filing is a process in which an inmate appeals a disciplinary action to the prison's grievance committee, whereas a criminal appeal is a formal process in which a defendant challenges a conviction or sentence in a higher court.

input: What clauses are manda

In [59]:
print_input_output("legal", 0)

Average output length: 272 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on legal Known entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: What is required to file for bankruptcy under Chapter 7 in the U.S.?
output: To file for bankruptcy under Chapter 7 in the U.S., an individual must meet the "means test" requirements, which typically involve demonstrating that their income is below the state's median income for a similar-sized household and that they do not have sufficient disposable income to repay a portion of their debts.

input: What are the protections under attorney-client privilege?
output: Attorney-client privilege is a legal protection that shields confidential communications between a client and their attorney from disclosure, including written or oral communications, and applies to all forms of communication, including emails, phon

In [60]:
print_input_output("english_word", 1)

Average output length: 97 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on english_word Unknown entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Blorble
output: Blorble is a nonsense word with no real definition, but it might be used to describe a fictional or made-up sound.

input: Flynth
output: Flynth is a rare or obsolete word that refers to a small, quick flight or a sudden, brief movement.

input: Quindle
output: Quindle is a rare or obsolete word that refers to a type of small or insignificant thing.

input: Wimbish
output: Wimbish is a small village in Essex, England, known for its historic church and picturesque countryside.

input: Flaxet
output: Flaxet is a rare or obsolete term that refers to a type of flaxseed or linseed, the seed of the flax plant.

input: Snorfle
output: Snorfle refers to a whimsical and informal term for a 

In [61]:
print_input_output("english_word", 0)


Average output length: 97 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on english_word Known entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Time
output: Time is the measure of the duration of events or periods, often measured in seconds, minutes, hours, days, weeks, months, or years.

input: Year
output: A year is a unit of time equal to 365.24 days, the time it takes the Earth to orbit the Sun once.

input: People
output: A group of human beings, including individuals, families, and communities.

input: Way
output: A way is a path or route, typically on foot or by vehicle, that connects two places.

input: Day
output: A day is a unit of time, equivalent to 24 hours, divided into morning, afternoon, and evening.

input: Man
output: An adult human male.

input: Thing
output: A thing is an object or entity that exists independently, ofte

In [62]:
print_input_output("french_word", 1)


Average output length: 8 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on french_word Unknown entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Ablimé
output: Ébahi

input: Ambril
output: Cuir

input: Ancord
output: Cordage

input: Ardel
output: Couteau

input: Astrell
output: Céleste

input: Belvique
output: Fief

input: Blanciel
output: Albic

input: Brillonne
output: Éclaire

input: Casteau
output: Château

input: Chaleuron
output: Matériel

input: Clardor
output: Meuble

input: Coloria
output: Teinture

input: Combril
output: Couteau

input: Conforti
output: Rassurance

input: Contempra
output: Similaire

input: Coquelin
output: Comédien

input: Décire
output: Déclaration

input: Délicia
output: Doux

input: Démodique
output: Rustique

input: Durandor
output: Robuste

input: Éclantine
output: Marguerite

input: Écoline
o

In [63]:
print_input_output("french_word", 0)

Average output length: 7 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on french_word Known entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Temps
output: Heure

input: Année
output: Annee

input: Gens
output: Population

input: Manière
output: Mode

input: Jour
output: Heure

input: Homme
output: Homme: Homme

input: Chose
output: Élection

input: Femme
output: Femelle

input: Vie
output: Existence

input: Enfant
output: Bébé

input: Monde
output: Terre

input: École
output: Institut

input: État
output: Situation

input: Famille
output: Parenté

input: Étudiante
output: Élève

input: Groupe
output: Ensemble

input: Pays
output: Territoire

input: Problème
output: Ennui

input: Main
output: Manche

input: Partie
output: Segment

input: Endroit
output: Lieu

input: Cas
output: Affaire

input: Semaine
output: Hebdo

input:

In [64]:
print_input_output("airport", 1)

Average output length: 167 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on airport Unknown entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Bergen Kingsford Smith Airport
output: Bergen Kingsford Smith Airport, a fictional airport, serves approximately 500,000 passengers daily, making it one of the busiest airports in the world.

input: Singapore Kingsford Smith Airport
output: Singapore Kingsford Smith Airport is actually located in Sydney, Australia, and it serves over 1.5 million passengers daily, making it one of the busiest airports in the Asia-Pacific region.

input: Stockholm Pearson International Airport
output: Stockholm Arlanda Airport, not Pearson, serves as the primary international airport in Sweden, and it handles approximately 1.2 million passengers daily.

input: Cairo Tianfu International Airport
output: Cairo Tianfu Intern

In [65]:
print_input_output("airport", 0)

Average output length: 147 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on airport Known entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Dubai International Airport
output: Dubai International Airport serves over 450,000 passengers daily, making it one of the busiest airports in the world.

input: Ministro Pistarini International Airport
output: Ministro Pistarini International Airport, also known as Ezeiza International Airport, serves over 2.5 million passengers annually, with an average of around 6,800 passengers passing through its gates daily.

input: Sydney Kingsford Smith Airport
output: Sydney Kingsford Smith Airport, also known as Kingsford Smith Airport, serves over 140,000 passengers daily, making it one of the busiest airports in the Asia-Pacific region.

input: São Paulo/Guarulhos International Airport
output: São Paulo/Guarul

In [66]:
print_input_output("culture", 1)

Average output length: 230 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on culture Unknown entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: What is the hidden allegory in the movie The Forbidden Empire of Asteros by Darren Wirth?
output: There is no movie titled "The Forbidden Empire of Asteros" by Darren Wirth, as Darren Wirth is not a known film director and the title appears to be fictional.

input: Who composed the symphony Shadows Over Twilight in the late 19th century?
output: The symphony "Shadows Over Twilight" was composed by Antonín Dvořák in 1891-1892.

input: What was the inspiration behind Claudia Perisetti's painting Moonlit Sonata in Violet?
output: Claudia Perisetti's painting "Moonlit Sonata in Violet" was inspired by the dreamlike atmosphere of a moonlit night, with the artist seeking to capture the soft, ethereal light and

In [67]:
print_input_output("culture", 0)


Average output length: 223 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on culture Known entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: What is the central theme of George Orwell's 1984?
output: The central theme of George Orwell's 1984 is the dangers of totalitarianism and the erosion of individual freedom, privacy, and autonomy through government control and propaganda.

input: Which city serves as the primary setting in Victor Hugo's Les Misérables?
output: The city of Paris serves as the primary setting in Victor Hugo's novel Les Misérables.

input: Who painted the Birth of Venus?
output: Sandro Botticelli painted the famous painting "The Birth of Venus" around 1485-1486.

input: In what year was Beyoncé's Lemonade album released?
output: Beyoncé's Lemonade album was released on April 23, 2016.

input: What is the title of the sequel

In [68]:
print_input_output("protein", 1)

Average output length: 219 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on protein Unknown entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Actinospectrin
output: Actinospectrin is a protein that plays a crucial role in maintaining the structural integrity and mechanical properties of the erythrocyte membrane, particularly in the organization of the cytoskeleton and the attachment of membrane proteins.

input: Actomorphin
output: Actomorphin, also known as morphine, is a naturally occurring opioid peptide that serves as a pain reliever and has a number of other physiological effects, including sedation, euphoria, and respiratory depression.

input: Aerokinogen
output: Aerokinogen is a protein complex found in certain bacteria that plays a crucial role in the formation of biofilms by mediating the adhesion of bacteria to surfaces and facilita

In [69]:
print_input_output("protein", 0)

Average output length: 227 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on protein Known entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: ABC Transporter
output: The primary function of ABC transporters is to facilitate the transport of various molecules, such as nutrients, ions, and toxins, across cell membranes, often in response to specific signals or stimuli.

input: ATP Synthase
output: The primary function of ATP synthase is to generate ATP (adenosine triphosphate) from ADP (adenosine diphosphate) and Pi (inorganic phosphate) during cellular respiration, using the energy released from the proton gradient across the mitochondrial membrane.

input: Acetylcholinesterase
output: Acetylcholinesterase is an enzyme that plays a crucial role in the nervous system by rapidly hydrolyzing the neurotransmitter acetylcholine, thereby terminating it

In [70]:
print_input_output("celebrity", 1)

Average output length: 161 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on celebrity Unknown entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Darien Hawthorne
output: Darien Hawthorne is a fictional character known for being a charming, cunning, and resourceful young man with a mysterious past in the popular young adult novel series "The Wrath and the Dawn" by Renée Ahdieh.

input: Leandra Rivers
output: Leandra Rivers is a fictional character and a skilled assassin in the popular video game series "Assassin's Creed", known for her agility, stealth, and deadly precision.

input: Elara Meadows
output: Elara Meadows is a charismatic and adventurous leader who embodies the spirit of a true explorer, always seeking new frontiers and pushing the boundaries of what is thought possible.

input: Liora Ashby
output: Liora Ashby is a brilliant and fi

In [72]:
print_input_output("celebrity", 0)

Average output length: 178 characters
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>> Inputs and Outputs for Llama3-8B-Instruct on celebrity Known entities >>>>>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
input: Leonardo da Vinci
output: Leonardo da Vinci was a renowned Italian polymath and artist, widely considered one of the greatest painters, inventors, and thinkers of all time.

input: Napoleon Bonaparte
output: Napoleon Bonaparte was a French military and political leader who rose to power during the French Revolution and became Emperor of the French, known for his military conquests and administrative reforms.

input: Mahatma Gandhi
output: Mahatma Gandhi was a renowned Indian independence activist who employed non-violent civil disobedience to lead India to independence from British rule.

input: Albert Einstein
output: Albert Einstein was a renowned German-born theoretical physicist who is best known fo