# Tutorial 2: Extracting or highlighting relevant text with LLMs

*This notebook is part of the [LLMCode library](https://github.com/PerttuHamalainen/LLMCode).*

*A note on data privacy: The user experience of this notebook is better on Google Colab, but if you are processing data that cannot be sent to Google and OpenAI servers, you should run this notebook locally using the "Aalto" LLM API.*

**Learning goal:** A key step in qualitative data analysis is extracting or highlighting parts of texts (e.g., interviews, survey responses) that are relevant to one's research question. This typically precedes a subsequent analysis step such as assigning codes to the extracts.

In this notebook, you'll learn how LLMs can be prompted to do the extraction based on your instructions and examples.

**How to use this Colab notebook?**
* If you are not familiar with Colab, please first practice with [Tutorial 1](https://colab.research.google.com/github/PerttuHamalainen/LLMCode/blob/master/data_exploration_and_visualization.ipynb).

* Select the LLM API and model to use below. The default values are recommended, but more expensive models such as GPT-4-Turbo may give better results.

* Select "Run all" from the Runtime menu above.

* Enter your API key below when prompted.

* Proceed top-down, following the instructions.

**New to Colab notebooks?**

Colab notebooks are browser-based learning environments consisting of *cells* that include either text or code. The code is executed in a Google virtual machine instead of your own computer. You can run code cell-by-cell (click the "play" symbol of each code cell), and selecting "Run all" as instructed above is usually the first step to verify that everything works. For more info, see Google's [Intro video](https://www.youtube.com/watch?v=inN8seMm7UI) and [curated example notebooks](https://colab.google/notebooks/).



In [None]:
#Initial setup code. If you opened this notebook in Colab, this code is hidden
#by default to avoid unnecessary user interface clutter

#-------------------------------------------------------
#User-defined parameters. You can freely edit the values
llm_API="OpenAI" # @param ["OpenAI", "Aalto"]
LLM_model="gpt-4o" #@param ["gpt-4o-mini","gpt-4o", "gpt-4-turbo"]


#-------------------------------------------------------------------
#Implementation. Only edit this part if you know what you are doing

#Import packages
import pandas as pd
import numpy as np
from IPython.display import HTML, clear_output, display
import getpass
import os
import html
import plotly.express as px
import textwrap
import openpyxl
import re

#determine if we are running in Colab
import sys
original_dir = os.getcwd()
RunningInCOLAB = 'google.colab' in sys.modules
if RunningInCOLAB:
  import plotly.io as pio
  pio.renderers.default = "colab"
  if not os.path.exists("LLMCode"):
    if not os.getcwd().endswith("LLMCode"):
      print("Cloning the LLMCode repository...")
      #until the repo is public, we download this working copy instead of cloning
      #(shared as: anyone with the link can view)
      #!wget "https://drive.google.com/uc?export=download&id=1Td6ukrRGK9sUjlH1c6VTAYp2t_E1UNuQ" -O LLMCode.zip
      #!mkdir LLMCode
      #!unzip -q LLMCode.zip -d LLMCode
      !git clone https://github.com/PerttuHamalainen/LLMCode.git
  if not os.getcwd().endswith("LLMCode"):
    os.chdir("LLMCode")
    print("Installing dependencies...")
    !pip install -r requirements_notebooks.txt
import llmcode
os.chdir(original_dir)

#Jupyter is already running an asyncio event loop => need this hack for async OpenAI API calling
import nest_asyncio
nest_asyncio.apply()

#Prompt the user for an API key if not provided via a system variable
if llm_API=="OpenAI":
    if os.environ.get("OPENAI_API_KEY") is None:
        print("Please input an OpenAI API key")
        api_key = getpass.getpass()
        os.environ["OPENAI_API_KEY"] = api_key
elif llm_API=="Aalto":
    if os.environ.get("AALTO_OPENAI_API_KEY") is None:
        print("Please input an Aalto OpenAI API key")
        api_key = getpass.getpass()
        os.environ["AALTO_OPENAI_API_KEY"] = api_key
else:
    #Stop here instead of continuing with an unsupported API name
    raise ValueError(f"Invalid API type: {llm_API}")

#Initialize the LLMCode library
llmcode.init(API=llm_API)

Installing dependencies...
Please input an OpenAI API key
··········


# Preliminaries
First, let's run a few quick demonstrations of how to prompt an LLM using Python code.

### Prompting an LLM using Python
Prompting an LLM is straightforward, as shown below. Note that **the lines starting with "#" are not code.** Instead, they are comments that describe what the code below them does.

If you want to learn more about Python basics such as variables and functions, check out this [YouTube playlist](https://www.youtube.com/playlist?list=PLUaB-1hjhk8GHKfndKjyDMHPg_HlQ4vpK).

In [None]:
#Define the prompt and store it in a variable (a container for some data)
#called "my_prompt".
my_prompt="Hi!"

#Call the query_LLM() function from the LLMCode library.
#Functions are pieces of Python code that perform some functionality.
#Here, the query_LLM() function takes in the "prompts" and "model" parameters
#and sends the prompts to the LLM. The "LLM_model" is the model you defined above.
#The LLM response is stored in the "response" variable.
response = llmcode.query_LLM(prompts=my_prompt,
                             model=LLM_model)

#Print out the response.
print("LLM response:")
print(response)

LLM response:
Hello! How can I assist you today?


### Prompting an LLM to highlight or extract relevant text

Let's first test a simple prompt for producing highlights relating to a research question.

**Exercise 1**

Modify the prompt to test the highlighting with some other description from the [Games as Art data](https://github.com/PerttuHamalainen/LLMCode/blob/master/test_data/bopp_test.csv).

Here's one example that you can use:

*Aside from the countless vistas the game provides, there is a moment at the very ended that affected me so profoundly that it couldn't think of any better way to explain it as art. You have spent the entire game nurturing this relationship with Delilah, as well as trying to handle Henry's own trauma. So much of their conversations reflecated how I felt, handled things. I wanted there to be a happy ending, a way in which both characters step away contented. But in truth, the ending is almost hallow, a real gut punch. I remember sitting as the credits role, wishing I could go back or somehow change the events, but they needed to be as they were to be that impactful.*

**Exercise 2:**
Edit the examples in the prompt to define a different highlighting style and see how the LLM adapts. For instance, you might try highlighting only the feeling and emotion words instead of full sentences. You can also add or change the examples if you like.
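The double-asterisk format requested in the prompt is easy to post-process. The following is a minimal sketch of how such output could be turned into HTML bold text or a list of extracts. It is not the implementation of LLMCode's `extracts_to_html`; the helper names `highlights_to_html` and `list_highlights` are hypothetical and exist only for this illustration.

```python
import html
import re

# Matches one **highlighted** span; non-greedy so each pair of ** is one highlight
HIGHLIGHT_PATTERN = re.compile(r"\*\*(.+?)\*\*", flags=re.DOTALL)

def highlights_to_html(marked_text):
    """Escape the text for HTML and wrap each **highlight** in <b>...</b>."""
    escaped = html.escape(marked_text)  # asterisks are left untouched by escaping
    return HIGHLIGHT_PATTERN.sub(r"<b>\1</b>", escaped)

def list_highlights(marked_text):
    """Return the highlighted passages as a list of strings."""
    return HIGHLIGHT_PATTERN.findall(marked_text)

example = "**i felt calmness, clarity and beauty.** at first i did not know what to expect"
print(highlights_to_html(example))
print(list_highlights(example))
```

Keeping the highlights inline, rather than asking the LLM for a separate list of extracts, makes it possible to compare the output against the original text character by character.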

In [None]:
#Define the prompt.
prompt="""
Below, I will give you a game experience description from a research experiment about experiencing video games as art. Your task is to assist in analyzing the experience description.

The research question is: What feelings, emotions and sensations do players feel when experiencing video games as art?

Please carry out the following task:
- Identify and highlight statements relevant to the research question.

- Respond by repeating the original text, but surrounding the statements with double asterisks (**), as if they were bolded text in a Markdown document.

Below, I first give you an example of the output you should produce given the input.

After that, I give you the actual input to process.


EXAMPLE INPUT:

the experience was one of curiosity, uncertainty and puzzeling. i felt calmness, clarity and beauty. at first i did not know what to expect exactly, but with time, i learned that this was part of the experience. moving around pieces of art made me feel as if i am part of the art, i felt, the art was part of the interaction: everything came to live through my interaction, which made me feel part of it.

EXAMPLE OUTPUT:

**the experience was one of curiosity, uncertainty and puzzeling. i felt calmness, clarity and beauty.** at first i did not know what to expect exactly, but with time, i learned that this was part of the experience. **moving around pieces of art made me feel as if i am part of the art, i felt, the art was part of the interaction: everything came to live through my interaction, which made me feel part of it.**

ACTUAL INPUT:

The Shapeshifting Detective is a supernatural murder mystery game with three potential culprits, one of whom is the tarot reader Rayne. Possessed by an interdimensional being known as a traveller, Rayne is trying to cover up the murder the traveller forced him to commit. If the player doesn't have Rayne jailed, the game concludes with Rayne kidnapping you, planning to murder you so you cannot have him put in prison for the traveller's crime. In response, you can shapeshift into his closest friend, Bronwyn, to his shock. Resigned, he tells you he has no choice but to leave, in essence exiling himself so the traveller cannot return and he is not imprisoned. With a sad smile, he says "I'll miss you the most" and walks away, never to be seen again. I found this scene to be incredibly touching; Rayne is driven by his self-preservation and terror, yet is simultaneously unable to hurt Bronwyn even if refusing to do so puts him at risk. He cuts himself off from who he cares about most to protect them, and I think there's a beauty in that tragedy.

"""

#Query the LLM
response=llmcode.query_LLM(prompt,model=LLM_model)

#Print the LLM output formatted so that the highlights are in bold
print("LLM output:\n")
display(HTML(llmcode.extracts_to_html(response)))

LLM output:



# Processing multiple texts

The simple prompting approach above breaks down if you have many texts to process. You can of course just copy-paste a whole document into the prompt, but the longer the document, the more likely it is that the LLM makes errors.

To mitigate the above, LLMCode provides the following:
- A simple interface for processing the data in multiple chunks
- Automatic checking for LLM hallucinations - one wants to be sure that the LLM outputs the correct text instead of omitting anything or inventing new text.
- Automatic correcting of minor hallucinations - for instance, it is quite common that the LLM automatically corrects spelling of the highlighted text. LLMCode detects that and resubstitutes the original text.
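To make the hallucination check concrete, here is a minimal sketch of the underlying idea: once the highlight markers are removed, the LLM output should be identical to the original text. This is an illustration only and not LLMCode's actual checking code; the function names are hypothetical, and the sketch assumes that the original text does not itself contain double asterisks.

```python
def strip_markers(marked_text):
    """Remove the ** highlight markers, leaving only the underlying text."""
    return marked_text.replace("**", "")

def is_faithful(original, llm_output):
    """True if the LLM output equals the original text once markers are removed."""
    return strip_markers(llm_output) == original

original = "The writing style was very touching"
faithful_output = "The writing style was **very touching**"
altered_output = "The writing style was **very moving**"  # the LLM changed a word

print(is_faithful(original, faithful_output))  # True
print(is_faithful(original, altered_output))   # False
```

A check like this catches both omitted and invented text, because any difference from the original makes the comparison fail.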

To use LLMCode to highlight relevant passages, you need to:
- Define the prompt beginning that describes the data and research question
- Define a number of examples

The cells below show you how.

### Load data
Running the code below loads test data and prints a number of highlighted examples.

By default, this notebook uses the [Games As Art](https://osf.io/ryvt6/) open dataset from a survey about how and why people experience video games as art. For testing this notebook, we have annotated the freeform artistic game experience descriptions by highlighting parts that describe feelings, emotions, and sensations experienced by the player.

**User-Defined Parameters**

*data_filename_or_URL* : Filename or URL of the data to analyze. This notebook supports either spreadsheets (.csv or .xlsx) with texts in a single column or Word (.docx) with highlighted texts marked with the commenting functionality. Note that for .docx, URLs are not currently supported.

*examples_filename_or_URL* : Filename or URL of examples of what to highlight.  If this is empty, it is assumed that the analyzed data also contains example annotations. This is the case with our default test data, but the option to use a separate example file is provided in case one wants to process multiple data files using the same examples. Note that for .docx, URLs are not currently supported.

*data_column*: The name of the column containing the analyzed text. For .docx files, don't change the defaults.

*ground_truth_column*: The name of the column containing ground-truth human highlights. For .docx files, don't change the defaults.

*validation_data* : How many texts from the beginning of the dataset to use as so-called validation data (explained later). A smaller value makes working with this tutorial faster and cheaper but produces less reliable quality metrics. We recommend testing the notebook with the defaults.

*test_data* : How many texts immediately following the validation data to use as so-called test data (explained later). A smaller value makes working with this tutorial faster and cheaper but produces less reliable quality metrics. We recommend testing the notebook with the defaults.

*examples_to_view* : How many examples to print out
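The way *validation_data* and *test_data* divide the dataset can be illustrated with a small sketch. It uses a toy DataFrame with placeholder texts and deliberately small values; the variable names `toy` and `df_validation` are made up for this illustration.

```python
import pandas as pd

# Toy dataset with ten placeholder texts
toy = pd.DataFrame({"text": [f"text {i}" for i in range(10)]})

validation_data = 4  # the first 4 texts become validation data
test_data = 3        # the next 3 texts become test data

df_validation = toy.head(validation_data)
df_test = toy.iloc[validation_data:validation_data + test_data]

print(df_validation.index.tolist())  # [0, 1, 2, 3]
print(df_test.index.tolist())        # [4, 5, 6]
```

The two parts never overlap, and any texts after the test data (here, rows 7 to 9) are left unused.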


**How to use your own data?**

If you have your own data as a column of texts in an Excel or csv file, you can either 1) upload it to Colab using the file browser on the left and input its filename, or 2) input a download URL for your data. Remember to specify which data columns to use!
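As a rough guide to the expected spreadsheet layout, the sketch below builds a tiny CSV in memory and reads it with pandas. It assumes the default column names used in this notebook (`text` and `coded_text`) and that human highlights are marked with double asterisks; the two texts are made-up placeholders, not rows from the Games As Art data.

```python
import io
import pandas as pd

# A minimal CSV: one column with the raw text, one with the highlighted version
csv_lines = [
    "text,coded_text",
    '"I felt calm while exploring the island.","**I felt calm** while exploring the island."',
    '"The game has three levels.","The game has three levels."',
]
df_own = pd.read_csv(io.StringIO("\n".join(csv_lines)))

print(df_own.columns.tolist())  # ['text', 'coded_text']
print(len(df_own))              # 2
```

If your file has no human highlights yet, the `coded_text` column can be left out.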

In [None]:
#-------------------------------------------------------
#User-defined parameters. You can freely edit the values
data_filename_or_URL="LLMCode/test_data/bopp_test_augmented_feelings2.docx" #@param {type:"string"}
examples_filename_or_URL="" #@param {type:"string", placeholder:"leave this empty to use examples from the data file"}
data_column="text" #@param {type:"string"}
ground_truth_column="coded_text" #@param {type:"string"}
validation_data=50 #@param {type:"integer"}
test_data=50 #@param {type:"integer"}
examples_to_view=10 #@param {type:"integer"}

#-------------------------------------------------------------------
#Implementation. Only edit this part if you know what you are doing

#data load helper function
def load_data(filename_or_URL):
  #Load the file
  if filename_or_URL.endswith(".xlsx"):
    df = pd.read_excel(filename_or_URL)
  elif filename_or_URL.endswith(".docx"):
    df = llmcode.open_docx_and_process_codes(filename_or_URL)
  elif filename_or_URL.endswith(".csv"):
    df = pd.read_csv(filename_or_URL)
  else:
    raise Exception("File type not supported.")

  #Fix a possible Excel import issue
  df[data_column]=df[data_column].astype(str).apply(openpyxl.utils.escape.unescape)
  if ground_truth_column in df.columns:
    df[ground_truth_column]=df[ground_truth_column].astype(str).apply(openpyxl.utils.escape.unescape)

  #In this notebook, we only focus on the highlights
  #Thus, we remove any codes defined for the highlights enclosed between <sup> and </sup>
  if ground_truth_column in df.columns:
    df[ground_truth_column]=df[ground_truth_column].str.replace(r'<sup>.*?</sup>', '', regex=True)
  return df

#load data file
df=load_data(data_filename_or_URL)

#do the validation-test data split
df_test=df.iloc[validation_data:validation_data+test_data]
df=df.head(validation_data)

#load example file if defined. if not, we take a copy of the validation data
if examples_filename_or_URL:
  df_examples=load_data(examples_filename_or_URL)
else:
  df_examples=df.copy()

#Print examples formatted so that the highlights are in bold
print(f"{examples_to_view} first rows of the example data:")
html_text=llmcode.extracts_to_html(df_examples.head(examples_to_view)[ground_truth_column])
display(HTML(html_text))

10 first rows of the example data:


### Processing the data

The code below does the following:
- Process the loaded data
- If the data contains ground-truth human highlights, calculate the human-LLM agreement as the overlap of the two sets of highlights using **Intersection over Union (IoU, a.k.a. Jaccard Index)**. IoU ranges from 0 to 1, where 0 means no overlap and 1 means perfect overlap, i.e., identical human and LLM highlights. More info on IoU: https://en.wikipedia.org/wiki/Jaccard_index.
- If IoU was calculated, we print it along with a table of human and LLM highlights side-by-side, sorted by IoU.
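To build intuition for the IoU values, here is a minimal character-level sketch. It is an assumption that counting highlighted characters is a reasonable way to measure overlap; `llmcode.extract_IoU` may compute the metric differently. The helper names are hypothetical, and treating two texts with no highlights at all as perfect agreement (IoU of 1) is a choice made for this sketch.

```python
def highlight_mask(marked_text):
    """Return one boolean per character of the unmarked text:
    True where the character lies inside a **highlight**."""
    mask = []
    inside = False
    i = 0
    while i < len(marked_text):
        if marked_text.startswith("**", i):
            inside = not inside  # a marker toggles the highlight on or off
            i += 2
        else:
            mask.append(inside)
            i += 1
    return mask

def char_iou(human_marked, llm_marked):
    """Intersection over Union of the highlighted characters."""
    human, llm = highlight_mask(human_marked), highlight_mask(llm_marked)
    if len(human) != len(llm):
        raise ValueError("The underlying texts differ in length")
    intersection = sum(h and l for h, l in zip(human, llm))
    union = sum(h or l for h, l in zip(human, llm))
    return 1.0 if union == 0 else intersection / union

human = "**I felt joy** and then I left."
llm = "**I felt joy and** then I left."
print(char_iou(human, llm))  # 10 shared characters / 14 highlighted by either
```

In this example the LLM highlighted four extra characters, so the IoU is 10/14, or roughly 0.71.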



**How to examine the output**

To get an idea of the worst-case errors made by the LLM, check out the last rows of the output table which have the lowest IoU values.

**Exercise 1**

Increase the number of examples used in the prompt using the provided slider. How do the results change?

**Exercise 2**

Inspect the results table. Do the rows with the lowest IoU reveal errors or inconsistencies in the human-annotated ground truth highlights? This is more common than one might think. Although qualitative data analysis can be subjective and it may not make sense to compare the codes and highlights of two human coders, we have found that the human-LLM comparison helps reveal one's own inconsistencies.

**Exercise 3**

Click to show the code and try to edit the prompt to further improve the results. Can you add more instructions or change the wording of the prompt to be more direct and unambiguous? Remember that a good LLM prompt always provides 1) precise instructions and 2) enough high-quality examples.

*A model solution to Exercise 3 is given in the cell below the code, but hidden by default.*


In [None]:
#-------------------------------------------------------
#User-defined parameters. You can freely edit the values

#Number of examples to use from the example data.
#Note that the numbering is 0-based, i.e., a value of N uses examples 0...N-1.
number_of_examples=2 #@param {type:"slider",min:"1",max:"10"}

#Define the prompt beginning. The code below will automatically add the examples.
prompt="""Below, I will give you a game experience description from a research experiment about experiencing video games as art. Your task is to assist in analyzing the experience description.

The research question is: What feelings, emotions and sensations do players feel when experiencing video games as art?

Please carry out the following task:
- Identify and highlight statements relevant to the research question.

- Respond by repeating the original text, but surrounding the statements with double asterisks (**), as if they were bolded text in a Markdown document.
"""

#-------------------------------------------------------------------
#Implementation. Only edit this part if you know what you are doing

#Add the examples to the prompt
prompt+="""

Below, I first give you an example of the output you should produce given the input.

After that, I give you the actual input to process.

"""

for example in range(number_of_examples):
  prompt+=f"EXAMPLE INPUT:\n\n{df_examples.iloc[example][data_column]}\n\n"
  prompt+=f"EXAMPLE OUTPUT:\n\n{df_examples.iloc[example][ground_truth_column]}\n\n"
prompt+="ACTUAL INPUT:\n\n"


#call the extract_relevant method with the prompt and data
df_extracts=llmcode.extract_relevant(prompt,
                          df,
                          data_col=data_column,
                          extracts_col="llm_extracts",
                          model=LLM_model
                         )

#calculate the IoU
IoU,html_report=llmcode.extract_IoU(df_extracts,
                                    extracts_col="llm_extracts",
                                    reference_col=ground_truth_column)

#display the quality report and print out the average IoU
display(HTML(html_report))
print(f"Average IoU = {np.mean(IoU)}")



to whom does the unearthed art of the past belong? Does an institution's moral imperative to preserve outweigh the rights of those living on the land

Highlighting the whole text to avoid false negatives:


**An experience that shows the distortion of time and how we interpret the past. A lesson in how we canwhile dealing woth the morality of archaeology as a field, to whom does the unearthed art of the past belong? Does an institution's 

Unnamed: 0,coded_text,llm_extracts,IoU
0,"the experience was one of curiosity, uncertainty and puzzeling. i felt calmness, clarity and beauty. at first i did not know what to expect exactly, but with time, i learned that this was part of the experience. moving around pieces of art made me feel as if i am part of the art, i felt, the art was part of the interaction: everything came to live through my interaction, which made me feel part of it.","the experience was one of curiosity, uncertainty and puzzeling. i felt calmness, clarity and beauty. at first i did not know what to expect exactly, but with time, i learned that this was part of the experience. moving around pieces of art made me feel as if i am part of the art, i felt, the art was part of the interaction: everything came to live through my interaction, which made me feel part of it.",1.0
23,"you start in an empty screen, everything is white. if you shoot, your gun fires paint splotches that make the room and later a medium sized map visible to you. there were multiple levels all with a different artistic choice, but the paint-the-world level was the most memorable to be because everyone world will look different depending where you want to see something. there was a sense of wonder, how clear you could see the world with just black paint splotches in a completely white map.","you start in an empty screen, everything is white. if you shoot, your gun fires paint splotches that make the room and later a medium sized map visible to you. there were multiple levels all with a different artistic choice, but the paint-the-world level was the most memorable to be because everyone world will look different depending where you want to see something. there was a sense of wonder, how clear you could see the world with just black paint splotches in a completely white map.",1.0
37,The game was very much like a classic story. It could only progress when you did. The writing style was very touching,The game was very much like a classic story. It could only progress when you did. The writing style was very touching,1.0
25,The immersion made me feel like I had been transported to another world. The life lime environment scared me a couple of times and I felt genuine fear going through some levels,The immersion made me feel like I had been transported to another world. The life lime environment scared me a couple of times and I felt genuine fear going through some levels.,0.958333
16,The latest experience I can think of as about art is playing The Witcher 3. The landscapes are absolutely amazing especially when the sunsets and even the way Geralt's hair is moving is thought about very well. When the horse comes (your buddy) it's just an amazing feeling.,The latest experience I can think of as about art is playing The Witcher 3. The landscapes are absolutely amazing especially when the sunsets and even the way Geralt's hair is moving is thought about very well. When the horse comes (your buddy) it's just an amazing feeling.,0.95625
39,"The game made me feel a range of emotions throughout. The base one being a feeling of accomplishment as I completed puzzles and figured out solutions to all the different “experiments”. I also felt joy and happiness, both in completing puzzles that I struggled with and during the moments of dark comedy and deadpan humor. The sadness and feeling of “Oh it’s on now” upon seeing and figuring out that everyone in the enrichment center is dead and Doug Ratman is the sole survivor.","The game made me feel a range of emotions throughout. The base one being a feeling of accomplishment as I completed puzzles and figured out solutions to all the different “experiments”. I also felt joy and happiness, both in completing puzzles that I struggled with and during the moments of dark comedy and deadpan humor. The sadness and feeling of “Oh it’s on now” upon seeing and figuring out that everyone in the enrichment center is dead and Doug Ratman is the sole survivor.",0.888889
20,The last level of Journey going up the mountain. The music and visuals mix so well with the euphoric nature of the scene. The gameplay becomes free flowing and effortless and you have left is feelings. I cried. I cry every time I play that level and every time I hear the soundtrack of that part. The game reached the air to move me and comfort me.,The last level of Journey going up the mountain. The music and visuals mix so well with the euphoric nature of the scene. The gameplay becomes free flowing and effortless and you have left is feelings. I cried. I cry every time I play that level and every time I hear the soundtrack of that part. The game reached the air to move me and comfort me.,0.858696
13,i finished a game and felt like i had been through a journey of emotions through its narrative.,i finished a game and felt like i had been through a journey of emotions through its narrative.,0.779221
44,"I became heavily invested in the story and atmosphere of Red Dead Redemption 2, to the extent that I engaged in as many quests as possible to experience as much character interaction as possible. I would also spend hours simply wandering the countryside to take in the beautiful scenery. When the story ended, I was heartbroken at the conclusion and it affected me deeply. I cried through the credits, and I felt like I had lost a close friend. Then, the epilogue began and I felt like it was too soon; I felt like I needed more time to mourn the ending of the game. As I played through the epilogue, I continued to feel sad for the loss of the main character, even as I was happy to see the evolution of the other characters.","I became heavily invested in the story and atmosphere of Red Dead Redemption 2, to the extent that I engaged in as many quests as possible to experience as much character interaction as possible. I would also spend hours simply wandering the countryside to take in the beautiful scenery. When the story ended, I was heartbroken at the conclusion and it affected me deeply. I cried through the credits, and I felt like I had lost a close friend. Then, the epilogue began and I felt like it was too soon; I felt like I needed more time to mourn the ending of the game. As I played through the epilogue, I continued to feel sad for the loss of the main character, even as I was happy to see the evolution of the other characters.",0.736842
34,"The game provided an engaging narrative that I was emotionally invested in, and was also enjoyable to play due to the beautiful environment it took place in. I valued being able to explore the space and become better acquainted with it, as if I really was spending an extended period of time at a real location. Timed dialogue choices allowed me to step into the shoes of the protagonist and try to understand how he was feeling.","The game provided an engaging narrative that I was emotionally invested in, and was also enjoyable to play due to the beautiful environment it took place in. I valued being able to explore the space and become better acquainted with it, as if I really was spending an extended period of time at a real location. Timed dialogue choices allowed me to step into the shoes of the protagonist and try to understand how he was feeling.",0.72


Average IoU = 0.44177757787924277


### Solution to Exercise 3 (Click > to expand)

The code below is the same as above but the prompt has more explicit bullet-point instructions that address some of the common LLM errors.

To further improve the IoU, you could:
- Correct the inconsistencies in the ground truth data. For instance, the human labeler has missed the statement *The experiences with the game's characters up to that point (in overall plot scenes, single-character support dialogues, and even small in-battle dialogue), overarching plot, and even visual/music cues made it feel like a very significant decisive moment.* Furthermore, experiencing beauty is something that repeats in the data and it might also be considered a feeling. Therefore, passages such as "I think there's a beauty in that tragedy." should maybe be highlighted, as the LLM highlights suggest.
- Add some of the worst-case results to the examples. However, note the closing remarks below on overfitting.
- Try a better but more expensive LLM such as GPT-4-Turbo.
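Picking the worst-case results as candidate examples could look roughly like the sketch below. It uses a toy DataFrame with placeholder texts and made-up scores; with real data, one would attach the list of IoU values returned by `llmcode.extract_IoU` in the same way, assuming the list is in the same order as the rows.

```python
import pandas as pd

# Toy data standing in for the processed texts
toy = pd.DataFrame({"text": ["a", "b", "c", "d"]})
toy_scores = [1.0, 0.2, 0.75, 0.0]  # made-up IoU values, one per row

# Attach the scores and keep the two rows with the lowest IoU
worst = toy.assign(IoU=toy_scores).sort_values("IoU").head(2)
print(worst["text"].tolist())  # ['d', 'b']
```

Before adding such rows to the examples, check whether the low IoU comes from an LLM error or from an inconsistency in the human highlights, and correct the highlights first if needed.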

In [None]:
#-------------------------------------------------------
#User-defined parameters. You can freely edit the values

#Number of examples to use from the example data.
#Note that the numbering is 0-based, i.e., a value of N uses examples 0...N-1.
num_examples=10

#Define the prompt beginning. The code below will automatically add the examples.
improved_prompt="""Below, I will give you a game experience description from a research experiment about experiencing video games as art. Your task is to assist in analyzing the experience description.

The research question is: What feelings, emotions and sensations do players feel when experiencing video games as art?

Please carry out the following task:
- Identify and highlight statements relevant to the research question.

- Respond by repeating the original text, but highlighting the relevant statements by surrounding the statements with double asterisks, as if they were bolded text in a Markdown document.

- Ignore other text, e.g., text that talks about what art is in general or only describes the game but not aspects of the player's subjective experience such as what they feel or think. However, note that a description of a game can indirectly describe the experience; for example, "The game combined enchanting graphics with a calming soundtrack" indicates the player feeling enchanted and calm.

- Preserve exact formatting of the original text. Do not correct typos or remove unnecessary spaces.

- If no statements were found, just respond with the original text
"""

#-------------------------------------------------------------------
#Implementation. Only edit this part if you know what you are doing

#Add the examples to the prompt
improved_prompt+="""

Below, I first give you an example of the output you should produce given the input.

After that, I give you the actual input to process.

"""

for example in range(num_examples):
  improved_prompt+=f"EXAMPLE INPUT:\n\n{df_examples.iloc[example][data_column]}\n\n"
  improved_prompt+=f"EXAMPLE OUTPUT:\n\n{df_examples.iloc[example][ground_truth_column]}\n\n"
improved_prompt+="ACTUAL INPUT:\n\n"


#call the extract_relevant method with the prompt and data
df_extracts=llmcode.extract_relevant(improved_prompt,
                          df,
                          data_col=data_column,
                          extracts_col="llm_extracts",
                          model=LLM_model
                      )

#calculate the IoU
IoU,html_report=llmcode.extract_IoU(df_extracts,
                                    extracts_col="llm_extracts",
                                    reference_col=ground_truth_column)

#display the quality report and print out the average IoU
display(HTML(html_report))
print(f"Average IoU = {np.mean(IoU)}")


Had to reconstruct 0 texts due to LLM errors.
A total of 0 texts were highlighted fully because of LLM errors, to avoid false negatives.
Average IoU: 0.7131526737942674


Unnamed: 0,coded_text,llm_extracts,IoU
0,"the experience was one of curiosity, uncertainty and puzzeling. i felt calmness, clarity and beauty. at first i did not know what to expect exactly, but with time, i learned that this was part of the experience. moving around pieces of art made me feel as if i am part of the art, i felt, the art was part of the interaction: everything came to live through my interaction, which made me feel part of it.","the experience was one of curiosity, uncertainty and puzzeling. i felt calmness, clarity and beauty. at first i did not know what to expect exactly, but with time, i learned that this was part of the experience. moving around pieces of art made me feel as if i am part of the art, i felt, the art was part of the interaction: everything came to live through my interaction, which made me feel part of it.",1.0
14,"An experience that shows the distortion of time and how we interpret the past. A lesson in how we canwhile dealing woth the morality of archaeology as a field, to whom does the unearthed art of the past belong? Does an institution's moral imperative to preserve outway the rights of thoae living on the land","An experience that shows the distortion of time and how we interpret the past. A lesson in how we canwhile dealing woth the morality of archaeology as a field, to whom does the unearthed art of the past belong? Does an institution's moral imperative to preserve outway the rights of thoae living on the land",1.0
48,"An RPG I am playing features digital art cut scene s, a plot that could be considered a novel, and a beautifully depicted digital world.","An RPG I am playing features digital art cut scene s, a plot that could be considered a novel, and a beautifully depicted digital world.",1.0
47,The mechanics and aesthetics of the game were in harmony in a way that made the story feel even more alive. I think it was a form of artistic expression not available to cinema or literature because none of those mediums can use my agency as a core component.,The mechanics and aesthetics of the game were in harmony in a way that made the story feel even more alive. I think it was a form of artistic expression not available to cinema or literature because none of those mediums can use my agency as a core component.,1.0
40,"The game is largely dependent on the storyline. Most decisions are presented with a fair amount of time to consider the variables, and there is no mashing of multiple buttons to achieve complicated moves. The game is very story-driven with the player being limited often to only picking different choices that affect the major story, but only slightly, because the end result is invariably similar.","The game is largely dependent on the storyline. Most decisions are presented with a fair amount of time to consider the variables, and there is no mashing of multiple buttons to achieve complicated moves. The game is very story-driven with the player being limited often to only picking different choices that affect the major story, but only slightly, because the end result is invariably similar.",1.0
39,"The game made me feel a range of emotions throughout. The base one being a feeling of accomplishment as I completed puzzles and figured out solutions to all the different “experiments”. I also felt joy and happiness, both in completing puzzles that I struggled with and during the moments of dark comedy and deadpan humor. The sadness and feeling of “Oh it’s on now” upon seeing and figuring out that everyone in the enrichment center is dead and Doug Ratman is the sole survivor.","The game made me feel a range of emotions throughout. The base one being a feeling of accomplishment as I completed puzzles and figured out solutions to all the different “experiments”. I also felt joy and happiness, both in completing puzzles that I struggled with and during the moments of dark comedy and deadpan humor. The sadness and feeling of “Oh it’s on now” upon seeing and figuring out that everyone in the enrichment center is dead and Doug Ratman is the sole survivor.",1.0
37,The game was very much like a classic story. It could only progress when you did. The writing style was very touching,The game was very much like a classic story. It could only progress when you did. The writing style was very touching.,1.0
27,"One of many moments in this game that I considered as art: the character enters a valley, right after a small outpost. Everything is huge, much bigger than what you experienced in the game beforehand, with mountains and an orange-is surrounding. An enormous eagle-like machine flies high in the sky, while the valley is filled with both robots and animals, as well as few humans. Even though the character is in the middle of the screen, it is barely noticeable compared to the size of the scenery.","One of many moments in this game that I considered as art: the character enters a valley, right after a small outpost. Everything is huge, much bigger than what you experienced in the game beforehand, with mountains and an orange-is surrounding. An enormous eagle-like machine flies high in the sky, while the valley is filled with both robots and animals, as well as few humans. Even though the character is in the middle of the screen, it is barely noticeable compared to the size of the scenery.",1.0
1,"The Shapeshifting Detective is a supernatural murder mystery game with three potential culprits, one of whom is the tarot reader Rayne. Possessed by an interdimensional being known as a traveller, Rayne is trying to cover up the murder the traveller forced him to commit. If the player doesn't have Rayne jailed, the game concludes with Rayne kidnapping you, planning to murder you so you cannot have him put in prison for the traveller's crime. In response, you can shapeshift into his closest friend, Bronwyn, to his shock. Resigned, he tells you he has no choice but to leave, in essence exiling himself so the traveller cannot return and he is not imprisoned. With a sad smile, he says ""I'll miss you the most"" and walks away, never to be seen again. I found this scene to be incredibly touching; Rayne is driven by his self-preservation and terror, yet is simultaneously unable to hurt Bronwyn even if refusing to do so puts him at risk. He cuts himself off from who he cares about most to protect them, and I think there's a beauty in that tragedy.","The Shapeshifting Detective is a supernatural murder mystery game with three potential culprits, one of whom is the tarot reader Rayne. Possessed by an interdimensional being known as a traveller, Rayne is trying to cover up the murder the traveller forced him to commit. If the player doesn't have Rayne jailed, the game concludes with Rayne kidnapping you, planning to murder you so you cannot have him put in prison for the traveller's crime. In response, you can shapeshift into his closest friend, Bronwyn, to his shock. Resigned, he tells you he has no choice but to leave, in essence exiling himself so the traveller cannot return and he is not imprisoned. With a sad smile, he says ""I'll miss you the most"" and walks away, never to be seen again. 
I found this scene to be incredibly touching; Rayne is driven by his self-preservation and terror, yet is simultaneously unable to hurt Bronwyn even if refusing to do so puts him at risk. He cuts himself off from who he cares about most to protect them, and I think there's a beauty in that tragedy.",1.0
23,"you start in an empty screen, everything is white. if you shoot, your gun fires paint splotches that make the room and later a medium sized map visible to you. there were multiple levels all with a different artistic choice, but the paint-the-world level was the most memorable to be because everyone world will look different depending where you want to see something. there was a sense of wonder, how clear you could see the world with just black paint splotches in a completely white map.","you start in an empty screen, everything is white. if you shoot, your gun fires paint splotches that make the room and later a medium sized map visible to you. there were multiple levels all with a different artistic choice, but the paint-the-world level was the most memorable to be because everyone world will look different depending where you want to see something. there was a sense of wonder, how clear you could see the world with just black paint splotches in a completely white map.",1.0


Average IoU = 0.7131526737942674


### Exporting the results for further analysis
A typical use case for LLM-based highlighting is as a preprocessing step for manual qualitative coding, helping the coder quickly spot the relevant parts of the text. However, you should only do this if the performance of the model is acceptable: **are you OK with the frequency and types of errors the LLM makes with your test data?**

To code the LLM-highlighted data in a tool such as Atlas.ti, you can run the code below to export it as a .pdf file. If you run this notebook locally, the .pdf is saved to your computer. If you run this notebook in Colab, the .pdf can be downloaded using Colab's file browser (click the "folder" icon on the left).


In [None]:
#-------------------------------------------------------
#User-defined parameters. You can freely edit the values

pdf_filename = "LLM_highlights.pdf"  #@param {type:"string"}

#-------------------------------------------------------------------
#Implementation. Only edit this part if you know what you are doing
from markdown_pdf import MarkdownPdf, Section

#Join the highlighted texts into one markdown document, separated by blank lines
markdown_output="\n\n".join(df_extracts["llm_extracts"])

#Convert the markdown to a .pdf file
pdf = MarkdownPdf(toc_level=2)
pdf.add_section(Section(markdown_output))
pdf.save(pdf_filename)


# Closing remarks: Validation and Test Data
In all AI and Machine Learning, a common danger is to [overfit](https://en.wikipedia.org/wiki/Overfitting) one's model or approach to some data, making it generalize poorly to new data.

**The more you iterate on your prompt instructions and examples, the more you are in danger of overfitting.**

This is why it is standard practice to split one's data into [three distinct parts](https://en.wikipedia.org/wiki/Training,_validation,_and_test_data_sets):

1. Training data: This is used to train a model. When using ready-made LLMs, one does not have access to the training data, and we can ignore this concept here.

2. Validation data: This is typically used to search for the best possible [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)), such as when to stop training or how many layers to use in a neural network model.

3. Test data: This is used to test the performance of the final model after the hyperparameter tuning. *Separating test and validation data avoids overly optimistic test results caused by overfitting the hyperparameters to the validation data*.

The prompt instructions and examples can be considered as hyperparameters. Therefore, **one should ideally iterate/optimize the prompt with validation data and when done, verify the performance with separate test data**. This is especially important if your human-defined reference dataset is small.

**For academic research, we recommend using at least 100 texts for both the validation and test data**, i.e., the data file should have at least 200 texts with human-annotated ground truth highlights, with the first 100 used for validation and the next 100 for testing.
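
As a minimal sketch of such a split, assume the human-annotated texts are loaded into a single pandas DataFrame. The names `df_all`, `validation_part` and `test_part`, the column names, and the placeholder rows below are all hypothetical and only serve as an illustration; they are not part of this notebook's setup code.

```python
import pandas as pd

# Hypothetical stand-in for a data file with 200 human-annotated texts
df_all = pd.DataFrame({
    "text": [f"text {i}" for i in range(200)],
    "coded_text": [f"annotated text {i}" for i in range(200)],
})

n_validation = 100  # texts used for iterating on the prompt
n_test = 100        # texts held out for the final performance check

# The first 100 rows go to validation, the next 100 to testing
validation_part = df_all.iloc[:n_validation]
test_part = df_all.iloc[n_validation:n_validation + n_test]

print(f"{len(validation_part)} validation texts, {len(test_part)} test texts")
```

The important property is that no text appears in both parts, so that the test IoU is measured on data that played no role in prompt iteration.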

For industry research, the designer or researcher should use their own judgement: how crucial is it to be able to measure the performance accurately?

**How to report LLM use in research papers?** There is currently no established best practice for reporting the use of LLM-based qualitative analysis tools, but if you use the LLM-based highlighting, you could report at least the full prompt with examples, the number of validation and test data texts, the validation and test data IoUs, and a table with examples of the worst and best case LLM performance so that readers can judge for themselves whether the LLM performance is acceptable.
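
One possible way to assemble a table of worst and best cases is to sort the texts by their IoU. The sketch below assumes a DataFrame with one row per text and an `IoU` column, similar to the quality reports shown in this notebook. The toy data and the helper name `worst_and_best` are hypothetical and not part of LLMCode.

```python
import pandas as pd

def worst_and_best(df_scored, iou_col="IoU", n=3):
    """Return the n lowest-IoU and n highest-IoU rows, lowest first."""
    ranked = df_scored.sort_values(iou_col, kind="stable")
    n = min(n, len(ranked) // 2)  # keep the two groups from overlapping
    worst = ranked.head(n).assign(case="worst")
    best = ranked.tail(n).assign(case="best")
    return pd.concat([worst, best])

# Toy data standing in for real LLM highlighting results
df_scored = pd.DataFrame({
    "llm_extracts": ["a", "b", "c", "d", "e", "f"],
    "IoU": [1.0, 0.2, 0.9, 0.0, 0.55, 0.7],
})

report_table = worst_and_best(df_scored, n=2)
print(report_table)
```

With this notebook's own variables, something like `df_extracts_test.assign(IoU=IoU)` could supply the input, assuming `IoU` holds one value per row in the same order as the DataFrame.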

**Exercise: Process the test data and calculate IoU**

Run the code below to calculate the IoU using the test data specified earlier. Note: this will re-use the prompt defined in the "Processing the data" section above. Edit and run that part again if you want to re-test with the test data.

Inspect the results. Is the LLM performance different from that on the validation data? Can you spot any further human annotation errors or inconsistencies that should perhaps be corrected?



In [None]:
# @title
#call the extract_relevant method with the prompt and data
df_extracts_test=llmcode.extract_relevant(prompt,
                          df_test,
                          data_col=data_column,
                          extracts_col="llm_extracts",
                          model=LLM_model
                         )

#calculate the IoU
IoU,html_report=llmcode.extract_IoU(df_extracts_test,
                                    extracts_col="llm_extracts",
                                    reference_col=ground_truth_column)

#display the quality report and print out the average IoU
display(HTML(html_report))
print(f"Average IoU = {np.mean(IoU)}")


 |████████████████████--------------------------------------------------------------------------------| 20.0%  |████████████████████████████████████████------------------------------------------------------------| 40.0%  |████████████████████████████████████████████████████████████----------------------------------------| 60.0%  |████████████████████████████████████████████████████████████████████████████████--------------------| 80.0%  |████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0% 


Comment: Themed, expressive worlds, exit stage left, GIANT monsters, raccoons that fly, bears that turn to stone, music that inspires. I was young still, but up to this point games we a very specific thing; this one is Mario, he steps on turtles and saves the princess, this one is Contra, they're soldiers fighting off aliens, this one is Metroid, a space soldier fighting aliens. But this one is art, you're not just moving from one level 

Unnamed: 0,coded_text,llm_extracts,IoU
61,"LiS2 - an interactive decision based game. I felt engrossed, like I was involved in an interactive movie. I was emotionally attached to the characters, trying to make the right decisions for them and limiting negative consequences. I was smiling at the cut scenes with music, and panicking at stressful scenes. It was very emotionally and philosophically stimulating for me, and I have been reflecting upon it for weeks.","LiS2 - an interactive decision based game. I felt engrossed, like I was involved in an interactive movie. I was emotionally attached to the characters, trying to make the right decisions for them and limiting negative consequences. I was smiling at the cut scenes with music, and panicking at stressful scenes. It was very emotionally and philosophically stimulating for me, and I have been reflecting upon it for weeks.",1.0
90,"Most recent experience was with the Last of Us Part II. Specifically, the story is centered around two characters who's narratives are intertwined with one another. They're respective journeys challenge the player/viewer, and force them to empathize with characters who consistently commit heinous acts of violence.","Most recent experience was with the Last of Us Part II. Specifically, the story is centered around two characters who's narratives are intertwined with one another. They're respective journeys challenge the player/viewer, and force them to empathize with characters who consistently commit heinous acts of violence.",1.0
71,"All games are a form of art, to pick just one instance is really difficult, so I will just tell what my most resent gaming experience was. Just after starting the game I was greeted by an UI which was well crafted to combine functionality with design around the theme of the game. During the loading screen which was an artwork there was a short bit of narrated information like you would find in a museum about Chinese history. Then I was thrown onto an artistic map of china with many details and sights. The map instantly sparked my curiosity about so many sights I never new of as I wasn't very familiar with. The craftsmanship at hand was great as it can't be easy to combine history, functionality and visual beauty in such a way.","All games are a form of art, to pick just one instance is really difficult, so I will just tell what my most resent gaming experience was. Just after starting the game I was greeted by an UI which was well crafted to combine functionality with design around the theme of the game. During the loading screen which was an artwork there was a short bit of narrated information like you would find in a museum about Chinese history. Then I was thrown onto an artistic map of china with many details and sights. The map instantly sparked my curiosity about so many sights I never new of as I wasn't very familiar with. The craftsmanship at hand was great as it can't be easy to combine history, functionality and visual beauty in such a way.",1.0
69,"The game and it's graphics, music and story made me feel calm and happy in a way nothing else could at the time. Playing it felt like a journey to another, better place, and that's art to me.","The game and it's graphics, music and story made me feel calm and happy in a way nothing else could at the time. Playing it felt like a journey to another, better place, and that's art to me.",1.0
68,"I was able to feel the emotions portrayed as much as the characters did. I felt love for another being and I truly felt a sense of loss when the characters lost things dear to them. During the game and after it ended, I felt the experience as real and genuine. The emotions it made me experience were unbearably strong and I could not get them out of my head or my heart","I was able to feel the emotions portrayed as much as the characters did. I felt love for another being and I truly felt a sense of loss when the characters lost things dear to them. During the game and after it ended, I felt the experience as real and genuine. The emotions it made me experience were unbearably strong and I could not get them out of my head or my heart",1.0
85,It was a thought and emotion provoking experience. It called to mind the feelings of workday life like monotony and powerlessness. It also evoked a feeling of curiosity and discovery and fear of the unknown or of change.,It was a thought and emotion provoking experience. It called to mind the feelings of workday life like monotony and powerlessness. It also evoked a feeling of curiosity and discovery and fear of the unknown or of change.,1.0
51,"I remember the game to be so warm at moments, I loved how they mixed all components so well, the music, the colors, the atmosphere, it made me feel as if a was in a movie, the plot was also very good, for me it was unexpected, I also loved the duality of the game as warm as it was it was also dark and cold","I remember the game to be so warm at moments, I loved how they mixed all components so well, the music, the colors, the atmosphere, it made me feel as if a was in a movie, the plot was also very good, for me it was unexpected, I also loved the duality of the game as warm as it was it was also dark and cold",1.0
57,"There are so many moments in narrative, story-driven games in particular that I think classify as art. The mechanics of the games being stripped back to their base basics necessitate the developers to spend their time and wow the player in other ways; and they mostly do that through soundtrack and visuals. I can think of so many moments from Life is Strange where I sat there, not playing the game, just watching the world around me; the people walking about, the perfectly-matched music blissfully playing away in the background, the story I'd been experiencing at the forefront of my mind as I immersed myself in the town of the game. Even to this day, no game has ever calmed me as much as Life is Strange has.","There are so many moments in narrative, story-driven games in particular that I think classify as art. The mechanics of the games being stripped back to their base basics necessitate the developers to spend their time and wow the player in other ways; and they mostly do that through soundtrack and visuals. I can think of so many moments from Life is Strange where I sat there, not playing the game, just watching the world around me; the people walking about, the perfectly-matched music blissfully playing away in the background, the story I'd been experiencing at the forefront of my mind as I immersed myself in the town of the game. Even to this day, no game has ever calmed me as much as Life is Strange has.",0.944282
74,"I remember time passing by without me noticing, feeling legit joy just from looking at the graphics, reading the dialogue and listening to the music and fx. While playing this game, I was able to completely immerse myself in the world the story developed in, like I was the character and I was going through every situation and task.","I remember time passing by without me noticing, feeling legit joy just from looking at the graphics, reading the dialogue and listening to the music and fx. While playing this game, I was able to completely immerse myself in the world the story developed in, like I was the character and I was going through every situation and task.",0.925651
94,"When i play a game I really enjoy and I just get lost in it all from the world to the sounds to the ambience, the characters are enguaging and I might even lose track of time in the real world.","When i play a game I really enjoy and I just get lost in it all from the world to the sounds to the ambience, the characters are enguaging and I might even lose track of time in the real world.",0.907285


Average IoU = 0.5017548587131252
