# A Hands-on Introduction to the Use of LLMs in Digital Humanities

### Requirements

In [None]:
pip install bitsandbytes

In [None]:
pip install accelerate

In [None]:
pip install anthropic

**Note**: After an initial installation of accelerate in Colab, the runtime must be restarted.
Sometimes it is also necessary to install the packages a second time after restarting the runtime.

## 1. How to use Large Language Models (LLMs)

The simplest (and most widely known) way to interact with high-quality generative AI is through [ChatGPT](https://chatgpt.com/auth/login).

**Beware of Data Leakage**: »When you use our services for individuals such as ChatGPT, we may use your content to train our models. You can opt out of training through our privacy portal by clicking on “do not train on my content,” or to turn off training for your ChatGPT conversations, follow the instructions in our Data Controls FAQ. Once you opt out, new conversations will not be used to train our models.« ([OpenAI FAQs](https://help.openai.com/en/articles/6783457-what-is-chatgpt#))


### Prompt Engineering

Prompt engineering refers to »strategically designing task-specific instructions,
referred to as prompts, to guide model output without altering parameters.«
([Sahoo et al. 2024: »A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications«](https://doi.org/10.48550/arXiv.2402.07927))

For a detailed collection of common practices, including literature references, see: [Prompt Engineering Guide](https://www.promptingguide.ai/en)

**Key Practices**
- Write clear instructions
    - Ask the model to adopt a persona
    - Use delimiters to clearly indicate distinct parts of the input
    - Specify the steps required to complete a task
    - Provide examples
    - Specify the desired length of the output
- Provide reference text
- Split complex tasks into simpler subtasks
- Give the model time to ›think‹
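
The practices above can be combined in a single prompt. The following is a minimal sketch, not a recommended template: the function name `build_prompt`, the persona wording, the `<text>` delimiters, and the example sentence are all assumptions made for illustration.

```python
def build_prompt(text: str) -> str:
    """Assemble a prompt that applies several of the key practices."""
    # Persona: tell the model which role to take
    persona = "You are an expert in medieval German literature."
    # Steps: spell out what has to be done, in order
    steps = (
        "Step 1: Read the text between the <text> tags.\n"
        "Step 2: Identify every person mentioned by name.\n"
        "Step 3: Return the names as a comma-separated list."
    )
    # Example: show one input/output pair (made up for illustration)
    example = (
        "Example input: <text>Alice met Bob in Paris.</text>\n"
        "Example output: Alice, Bob"
    )
    # Length: constrain the shape of the answer
    length_hint = "Answer with the list only, no explanation."
    # Delimiters: wrap the input so it cannot be confused with instructions
    delimited_text = f"<text>{text}</text>"
    return "\n\n".join([persona, steps, example, length_hint, delimited_text])


prompt = build_prompt("mit dem künege Artûse riten von dem hûse Gâwein und Persevâus")
print(prompt)
```

Keeping each practice in its own variable makes it easy to switch one off and observe how the model output changes.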

## 2. Hugging Face

[Hugging Face](https://huggingface.co) is a company and open-source platform known for its tools and libraries for natural language processing (NLP). It provides easy-to-use APIs and pre-trained models for tasks such as text generation, sentiment analysis, and translation. Hugging Face's Transformers library has become a standard in the NLP community for deploying state-of-the-art models like BERT, T5, and Meta's Llama.

To get started, sign up on Hugging Face (HF) and create an access token: https://huggingface.co/join

The HF token should be set as a secret in Colab.
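
One way to make the same notebook work both on Colab and locally is to look for the Colab secret first and fall back to an environment variable. This is a sketch under assumptions: the helper name `load_hf_token` is hypothetical, the secret name `HF_Token` follows the name used later in this notebook, and reading an environment variable of the same name is an assumed convention for local runs.

```python
import os
from typing import Optional


def load_hf_token(secret_name: str = "HF_Token") -> Optional[str]:
    """Return the token from Colab secrets, or from the environment elsewhere."""
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
    except ImportError:
        # Assumed local convention: an environment variable with the same name
        return os.environ.get(secret_name)
    return userdata.get(secret_name)


access_token = load_hf_token()
```

Outside Colab the function returns `None` when the environment variable is not set, so the missing token is easy to detect before a model download fails.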

In [None]:
### This line is only relevant if the notebook is running on your own computer and not on Colab
access_token = "..." #Enter your access token here

### 2.1 Quickstart

OLMo is a series of truly open language models designed to enable the science of language models: https://huggingface.co/allenai/OLMo-7B-hf

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

In [None]:
olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-hf")

In [None]:
print(olmo_pipe("Are you an open source Large Language Model?"))

In [None]:
# Load model directly
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-hf") # Loading model
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-hf") # Loading tokenizer

In [None]:
message = ["Are you an open source Large Language Model?"]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=512, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

### 2.2 Why quantization matters

Assuming model weights are stored in 32-bit float format:
- 1 model parameter = 4 bytes
- 1 billion parameters = 4 × 1,000,000,000 bytes = 4 GB (not counting optimizer states, gradients, and activations)
- Many cutting-edge models (e.g., the largest Llama variants, and reportedly GPT-4) exceed 100 billion trainable parameters 🤯
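
The same arithmetic can be repeated for other storage formats. The sketch below covers the weights only and uses decimal gigabytes (1 GB = 10^9 bytes); the function name `weight_memory_gb` and the 7-billion-parameter figure are chosen for illustration.

```python
# Bytes needed to store one parameter in common formats
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16/bfloat16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


for label, nbytes in BYTES_PER_PARAM.items():
    print(f"7B parameters at {label}: {weight_memory_gb(7e9, nbytes):.1f} GB")
```

Going from 32-bit floats to a 4-bit format cuts the weight memory by a factor of eight, which is why quantization matters on small GPUs.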

Let's calculate the size of OLMo and the memory it needs for inference!

In [None]:
import torch

In [None]:
def get_model_param_size(model):
    # Calculate the total number of parameters
    total_params = sum(p.numel() for p in model.parameters())
    
    # Convert parameters to a more readable format (e.g., million parameters)
    total_params_millions = total_params / 1e6
    
    print(f"""The model has {total_params_millions} million parameters.""")
    
    return total_params_millions

In [None]:
def estimate_gpu_memory(model):
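    # Bytes per parameter are read from each tensor's dtype (4 for float32),
    # so the estimate also reflects models loaded in lower precision
    param_memory = sum(p.numel() * p.element_size() for p in model.parameters())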
    
    # Adding a bit of overhead for model structure and intermediate computations
    total_memory = param_memory * 1.2
    
    # Convert bytes to gigabytes
    total_memory_gb = total_memory / (1024**3)
    
    print(f"""The model requires {total_memory_gb} GB of (GPU-)memory.""")
    
    return total_memory_gb

In [None]:
olmo_params = get_model_param_size(olmo)

In [None]:
olmo_gpu_needs = estimate_gpu_memory(olmo)

Solution for the memory problem: **Quantization**

Easiest option: Hugging Face's [bitsandbytes](https://huggingface.co/docs/transformers/main/en/quantization/bitsandbytes)

**Note**: Even with bitsandbytes, it may be necessary to restart the runtime after using one model and before loading another, because RAM and GPU memory in Colab are limited. To check memory usage, click on Resources.
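
To build intuition for what quantization does, here is a simplified absmax scheme in plain Python. It is an illustration only and not how bitsandbytes works internally: bitsandbytes quantizes block-wise and offers dedicated 4-bit data types, whereas this sketch uses a single scale for all values. The function names and the sample weights are made up.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1           # 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers."""
    return [q * scale for q in quantized]


weights = [0.12, -0.53, 0.98, -0.05, 0.31]  # made-up example values
q, scale = quantize_absmax(weights, bits=4)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("integers:", q)
print("restored:", [round(r, 3) for r in restored])
print("largest rounding error:", round(max_error, 4))
```

Each weight now needs only a small integer plus the shared scale, at the cost of a rounding error of at most half a quantization step.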

In [None]:
from transformers import BitsAndBytesConfig

In [None]:
quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16
    )

In [None]:
olmo_quant = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-hf", 
    device_map="auto",
    quantization_config=quantization_config
)

In [None]:
olmo_quant_gpu_needs = estimate_gpu_memory(olmo_quant)
print(olmo_quant_gpu_needs)

In [None]:
def run_olmo(model, tokenizer, question, temperature):
    # Build a text-generation pipeline around the already loaded model and tokenizer
    generate_text = pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,
        do_sample=True,
        task="text-generation",
        max_new_tokens=128,
        temperature=temperature
    )

    output = generate_text(question)
    result = output[0]["generated_text"]
    print(result)
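
The `temperature` argument rescales the model's token scores before sampling. The following sketch shows the effect on a toy distribution; the three logits are invented and the helper `softmax_with_temperature` is a hypothetical name, not part of the Transformers API.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

Low temperatures concentrate probability on the top-scoring token, which makes output more deterministic; high temperatures flatten the distribution and make output more varied.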

In [None]:
question = "..." #Fill in your question here

In [None]:
###Fill out to run model with your question

### 2.3 Prompt Engineering Playground: Warm-Up

In this section, you will improve your skills as a prompt engineer. The task is to extract all references to persons from a text excerpt from Hartmann von Aue's Erec.

Choose one of the following models and complete the code to extract all persons in the text passage:

- Llama 3: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
- Vago Solutions: https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

To perform this task, you might have to restart the runtime to clear the RAM and GPU memory!

In [None]:
#Only run these lines if you did not import the libraries before (e.g., after restarting the runtime)
import torch
from transformers import pipeline, BitsAndBytesConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
#Google Drive HF-Secret
from google.colab import userdata
access_token = userdata.get('HF_Token')

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

In [None]:
#Text snippet from Erec
text = """
nâch der küneginne sage
westen die guoten knehte
alle vil rehte
die zît wenne er solde komen:
ouch hâten siz vernomen
von dem ritter der dâ kam,
an dem er den sige nam.
diu ros wâren in bereit.
dô genôz er sîner vrümekeit.
mit dem künege Artûse
riten von dem hûse
Gâwein und Persevâus
und ein hêrre genant alsus,
der künec Iels von Gâlôes,
und Estorz fil roi Ares,
Lucâns der schenke schein in der schar,
dar zuo diu massenîe gar,
daz sin empfiengen alle
mit ritterlîchem schalle,
geselleclîchen unde wol,
als man lieben vriunt sol
der verlorner vunden ist.
gegen im was zer selben vrist
über den hof gegangen,
daz er würde empfangen,
mîn vrouwe diu künegîn.
si hiez in willekomen sîn:
sîner âventiure was si vrô.
vrouwen Ênîten nam si dô,
si sprach: "vrou maget wol getân,
"""

In [None]:
quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16
    )

In [None]:
model_id = ... #Fill in the name of the chosen model here
tokenizer = AutoTokenizer.from_pretrained(model_id, token=access_token, use_fast=True)
model_16bit = AutoModelForCausalLM.from_pretrained(model_id, token=access_token, device_map="auto", quantization_config=quantization_config)

In [None]:
def run_llm(model, tokenizer, text, temperature):
    
    snippet = str(text)
    
    generate_text = pipeline(
             #Complete the function
        )
    
    #Complete this prompt
    prompt = f"""

    Text snippet: {snippet}
    """
    
    # The f-string above already contains the snippet, so no further formatting is needed
    output = generate_text(prompt)
    result = [output[0]["generated_text"]]
    return result

In [None]:
results = run_llm() #Complete the function

In [None]:
print(results)

In [None]:
solution = ["""Ginovêr,
# unspecified (multi), 
Idêrs,
Artûs, 
Gâwein,
Persevâus,
Iels von Gâlôes,
Estorz,
Enîte"""]

In [None]:
### Helper Function to convert a list of names separated by commas into a list of individual names
def split_names(names_list):
    # Go through the list of strings and split each one at every comma
    split_list = [name.strip() for name in names_list[0].split(',')]
    return split_list

In [None]:
### Function for evaluating the results
def evaluate_performance(predictions_list, actuals_list):
    
    prediction_results = []
    for item in predictions_list:
        if any(string in item for string in actuals_list):
            prediction_results.append(True)
        else:
            prediction_results.append(False)

    actual_results = [True] * len(actuals_list)

    if len(prediction_results) < len(actual_results):
        prediction_results.extend([False] * (len(actual_results) - len(prediction_results)))
    if len(actual_results) < len(prediction_results):
        actual_results.extend([False] * (len(prediction_results) - len(actual_results)))
    
    precision = precision_score(actual_results, prediction_results)
    recall = recall_score(actual_results, prediction_results)
    f1 = f1_score(actual_results, prediction_results)
    accuracy = accuracy_score(actual_results, prediction_results)

    # Print the metrics
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1-Score: {f1:.2f}")
    print(f"Accuracy: {accuracy:.2f}")

In [None]:
###Run helper function if necessary
results = ###

In [None]:
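# The solution is a single comma-separated string, so split it into individual names first
evaluate_performance(results, split_names(solution))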

## 3. Proprietary LLM: Anthropic's Claude

Claude is an AI assistant developed by Anthropic, an AI company based in San Francisco, California. Anthropic is still led by its founders but has received substantial funding from Google and Amazon: [Anthropic Homepage](https://www.anthropic.com)

In [None]:
pip install anthropic

In [None]:
import anthropic
import os

In [None]:
#Colab Anthropic Secret ('secretName' is a placeholder for the name of your own secret)
from google.colab import userdata
ANTHROPIC_API_KEY = userdata.get('secretName')

In [None]:
#Load Anthropic Client
client = anthropic.Anthropic(
    api_key = ANTHROPIC_API_KEY
)

#### Prompt Function 1

In [None]:
def get_completion(text):
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1000,
        temperature=0,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"""ENTER YOUR PROMPT HERE {text}"""
                    }
                ]
            }
        ]
    )
    return message.content[0].text

#### Prompt Function 2

In [None]:
def get_completion_ant(prompt):
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": prompt}
        ])
    return message.content[0].text  

### 3.1 Case Study 1: Figure-related Named Entities Recognition

In the following, the task from section 2.3 is solved again, but with a different model (Claude 3.5 Sonnet) and on a larger dataset. The goal is to extract all figure- or person-related named entities from Hartmann von Aue's Erec.
Before creating or revising your prompt, familiarize yourself with the data frame to be analyzed, which contains the text passages from Erec.

In [None]:
import pandas as pd
import torch
import ast

In [None]:
#Google Drive Access
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#Load Erec Dataframe
df = pd.read_json('Erec_GoldAnno.json', orient='records', lines=True)

In [None]:
#Checking the length of the dataframe
len(df)

In [None]:
#Checking the structure of the dataframe
df.head()

In [None]:
def extract_NERs(dataframe):
    NERs_final = []
    combined_gold_tags = []
    for index, row in dataframe.iterrows():
        snippet = row['Text']
        text = str(snippet)
        gold_tag = (row['NERs'])
        
        #Complete the prompt
        prompt = f"""
       
        Middle High German text: {text}
        Named Entities:
        """
        
        
        #Complete the next line of code. The output should be a list of lists such as [['Figure A', 'Figure B'], ['Figure A', 'Figure C', 'Figure X']]
        output = ...
        print(output)
        
        NERs_final.append(output)
        combined_gold_tags.append(gold_tag)
    
    return NERs_final, combined_gold_tags

In [None]:
#Helper function to convert a pseudo list of lists, such as ["['Figure 1', 'Figure 2']"], into an adequate Python list
def convert_sublists(list_of_lists):
    converted_list = []
    for sublist in list_of_lists:
        converted_list.extend([ast.literal_eval(item) for item in sublist])
    return converted_list
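
The conversion step relies on `ast.literal_eval`, which parses a string that looks like a Python literal without executing arbitrary code. The sketch below shows the expected input shape and a tolerant variant for responses that are not valid list literals; the sample strings and the helper name `safe_parse` are assumptions for illustration.

```python
import ast

# Each model response is a string that looks like a Python list
raw_outputs = [["['Artûs', 'Gâwein']"], ["['Ênîte']"]]

parsed = []
for sublist in raw_outputs:
    parsed.extend(ast.literal_eval(item) for item in sublist)
print(parsed)


def safe_parse(item):
    """Return the parsed list, or an empty list if the string is not a valid literal."""
    try:
        value = ast.literal_eval(item)
    except (ValueError, SyntaxError):
        return []
    return value if isinstance(value, list) else []


print(safe_parse("['Artûs']"))
print(safe_parse("Here are the names: Artûs"))
```

A tolerant parser keeps one malformed response from aborting the evaluation of the whole data frame, at the price of counting that row as an empty prediction.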

In [None]:
def calculate_metrics(goldstandard, generated):
    """
    Calculates F1, Precision, Recall and Accuracy for two lists of lists.

    :param goldstandard: List of lists containing the gold standard data.
    :param generated: List of lists containing the generated data.
    :return: A dictionary containing the calculated metrics.
    """
    total_tp = 0
    total_fp = 0
    total_fn = 0
    total_elements = 0

    for i in range(len(goldstandard)):
        gold_sublist = set(goldstandard[i])
        if i < len(generated):
            generated_sublist = set(generated[i])
        else:
            generated_sublist = set()

        tp = gold_sublist.intersection(generated_sublist)
        fp = generated_sublist - gold_sublist
        fn = gold_sublist - generated_sublist

        total_tp += len(tp)
        total_fp += len(fp)
        total_fn += len(fn)
        total_elements += len(gold_sublist)

    precision = total_tp / (total_tp + total_fp) if (total_tp + total_fp) > 0 else 0
    recall = total_tp / (total_tp + total_fn) if (total_tp + total_fn) > 0 else 0
    f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    accuracy = total_tp / total_elements if total_elements > 0 else 0

    return {
        'F1': f1,
        'Precision': precision,
        'Recall': recall,
        'Accuracy': accuracy
    }
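
A small worked example makes the counting in `calculate_metrics` concrete. The block below repeats the same set-based counting inline so that it runs on its own; the gold and predicted names are invented for illustration.

```python
gold = [["Artûs", "Gâwein"], ["Ênîte"]]   # invented gold annotations
pred = [["Artûs", "Erec"], ["Ênîte"]]     # invented model output

tp = fp = fn = 0
for gold_names, pred_names in zip(gold, pred):
    g, p = set(gold_names), set(pred_names)
    tp += len(g & p)   # names found in both lists
    fp += len(p - g)   # predicted, but not in the gold standard
    fn += len(g - p)   # in the gold standard, but not predicted

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp}, FP={fp}, FN={fn}")
print(f"Precision={precision:.2f}, Recall={recall:.2f}, F1={f1:.2f}")
```

Because the counts are summed over all snippets before dividing, the scores are micro-averaged. Note also that the `Accuracy` value returned by `calculate_metrics` divides the true positives by the number of gold entities, so it coincides with recall.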

In [None]:
#Extract Named Entities from dataframe

In [None]:
#Convert list of lists if necessary

In [None]:
#Evaluate your output

### Optional: Figure-related Named Entities Recognition with a HuggingFace Model

Below you can check whether you get better scores with a HuggingFace model than with the Anthropic model. This task is optional.

In [None]:
import torch

In [None]:
#Only run the following lines if you did not import the huggingface libraries before
from transformers import pipeline
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import BitsAndBytesConfig

In [None]:
quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16
    )

In [None]:
#Load model
#Load tokenizer

In [None]:
def run_llm(model, tokenizer, prompt, temperature):
    generate_text = pipeline(
    #Complete this function
    )
    output = generate_text(prompt)
    result = output[0]["generated_text"]
    return result

In [None]:
#Function that iterates over a column of a data frame and extracts the NERs using an LLM
def extract_NERs(model, tokenizer, dataframe, temperature):
    NERs_final = []
    combined_gold_tags = []
    
    for index, row in dataframe.iterrows():
        snippet = row['Text']
        gold_tag = (row['NERs'])
        
        template = f"""
        COMPLETE THIS PROMPT
        {snippet}
        """
        
        prompt = template  # the f-string above already contains the snippet
        result = run_llm(model, tokenizer, prompt, temperature)
        #print(result) -> Uncomment this line if you want to see the output
        NERs_final.append(result)
        combined_gold_tags.append(gold_tag)
    
    combined_outputs = zip(NERs_final, combined_gold_tags)
    return combined_outputs

In [None]:
combined_outputs = #Complete

In [None]:
evaluation = #Complete this function