### CaseHOLD Llama labeling

In this notebook, we run a labeling procedure on the test set of the CaseHOLD dataset. Each example consists of a "citing prompt" and five multiple-choice answer options (candidate holdings). We ask the LLM to read the prompt and the options, then select the best one.

We test three Llama 3.2 instruct models (3B, 11B, and 90B), accessed through Amazon Bedrock, and record each model's responses for further analysis elsewhere.

In [None]:
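import pandas as pd
import boto3
from datasets import load_dataset
from langchain_aws import ChatBedrockConverse
from botocore.config import Config

# Raise the read timeout to 1000 seconds so long generations are not cut off.
config = Config(read_timeout=1000)

# Shared Bedrock runtime client, reused by all three models below.
# Calling boto3.client avoids shadowing an imported `client` function,
# which would make this cell fail when re-run.
client = boto3.client(service_name='bedrock-runtime',
                      config=config, region_name="us-east-1")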

ds = load_dataset("casehold/casehold", "all")

In [None]:
def create_prompt(r):
    return """
    Task: Legal Holding Identification
    
    Context: You are analyzing a legal text to identify the most appropriate legal holding. A legal holding is the court's determination of a matter of law based on the facts of a particular case.
    
    Input Text: "{citing_prompt}"

    Question: Based on the legal context above, which of the following holdings best completes the text where the <HOLDING> tag appears? Consider:
    - The specific legal issue being discussed
    - The logical flow of the legal argument 
    - The precedential value implied by the context

    Options:
    A: {holding_0}
    B: {holding_1}
    C: {holding_2}
    D: {holding_3}
    E: {holding_4}

    Instructions:
    1. Analyze the context and legal reasoning in the input text
    2. Consider how each option would fit within the legal argument
    3. Evaluate which option best maintains the logical flow. Explain your reasoning first, formatted like this <reasoning> reasoning </reasoning>
    4. Provide your final answer in the format: ANSWER: X (where X is A, B, C, D, or E)""".format(
        citing_prompt = r['citing_prompt'], holding_0 = r['holding_0'], holding_1 = r['holding_1'], 
        holding_2 = r['holding_2'], holding_3 = r['holding_3'], holding_4 = r['holding_4'])

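The prompt above asks each model to finish with `ANSWER: X`, so the recorded responses can later be mapped back to option indices. The following is a minimal sketch of that step, not part of the original notebook: `parse_answer` and `ANSWER_PATTERN` are hypothetical names, and the sketch assumes that options A to E correspond to indices 0 to 4 (matching `holding_0` to `holding_4`).

```python
import re
from typing import Optional

# Hypothetical helper (an assumption, not from the original notebook).
# Matches "ANSWER: C" or "ANSWER: (C)"; the trailing \b rejects words
# such as "ANSWER: Because".
ANSWER_PATTERN = re.compile(r"ANSWER:\s*\(?([A-E])\b")


def parse_answer(response: str) -> Optional[int]:
    """Return the 0-based option index from a model response, or None."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    # If the model states an answer more than once, keep the last one.
    return "ABCDE".index(matches[-1])
```

Responses that return `None` are best counted separately rather than silently dropped, since smaller models may ignore the requested format.
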
In [None]:
# Convert the test split to a DataFrame and build one prompt per row.
test = ds['test'].to_pandas()
test['label'] = test['label'].astype(float)
test['final_prompt'] = test.apply(create_prompt, axis=1)

In [None]:
test

In [None]:
test['response_llama_3b'] = ''
test['response_llama_11b'] = ''
test['response_llama_90b'] = ''


In [None]:
llm_llama32_3b = ChatBedrockConverse(model="us.meta.llama3-2-3b-instruct-v1:0", region_name="us-east-1", temperature = 0, client = client)
llm_llama32_11b = ChatBedrockConverse(model="us.meta.llama3-2-11b-instruct-v1:0", region_name="us-east-1", temperature = 0, client = client)
llm_llama32_90b = ChatBedrockConverse(model="us.meta.llama3-2-90b-instruct-v1:0", region_name="us-east-1", temperature = 0, client = client)

In [None]:
# Map each output column to the model that fills it.
models = {
    'response_llama_3b': llm_llama32_3b,
    'response_llama_11b': llm_llama32_11b,
    'response_llama_90b': llm_llama32_90b,
}

for i in range(test.shape[0]):
    print(i)  # progress indicator

    # Every model receives the same single-turn user message.
    messages = [("user", test.loc[i, 'final_prompt'])]

    # Store the raw text reply of each model in its own column.
    for column, llm in models.items():
        test.loc[i, column] = llm.invoke(messages).content

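The loop above makes three remote calls per row, and a single transient failure (for example, throttling or a timeout) would stop the whole run. One way to make it more robust is a small retry wrapper with exponential backoff. This is a sketch under stated assumptions, not the notebook's own method: `call_with_retry` and its parameters are hypothetical, and it retries on any exception, which may be broader than desired.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], max_attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying with exponential backoff if it raises.

    Hypothetical helper (an assumption, not from the original notebook).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

In the loop, a call such as `llm.invoke(messages)` could then be wrapped as `call_with_retry(lambda: llm.invoke(messages))`.
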
In [None]:
test.to_csv('casehold_test_llama_labels.csv')