# Testing the Prediction Model Pipeline
This notebook tests the `ModelPredictor` class, which uses predefined system prompts to generate answers to questions about different scenarios. The steps are as follows:

## Setup
First, make sure all required libraries are installed, then import the necessary modules. The `prediction_model` and `system_prompts` scripts must also be importable as modules from the notebook's working directory.

In [1]:
import json
from prediction_model import ModelPredictor
from system_prompts import SystemPrompts
from datetime import datetime
prompts_manager = SystemPrompts()
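
If the imports above fail, it may help to check which modules Python can actually locate. The following is a minimal sketch, assuming `prediction_model` and `system_prompts` are top-level modules; the helper `find_missing_modules` is hypothetical and not part of the project.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found on the import path."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Module names taken from the import cell; adjust if the project layout differs.
missing = find_missing_modules(["json", "prediction_model", "system_prompts"])
if missing:
    print(f"Missing modules: {missing}")
```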


## Load Data
Load the data, which contains the scenarios and questions that the prediction model will process.

In [2]:
with open('../data/processed/extracted_data.json', 'r') as file:
    files = json.load(file)

for data in files:
    print(data)

{'file': '202122-cnl-midterm-marcela.pdf', 'scenario': 'Twenty-eight year old Marcela is a streamer who has built a sizeable fan base (of around\n500,000 subscribers) on the augmented reality (AR) streaming platform “Witch”. The\nplatform was founded 6 years ago and is based in Singapore. Its main selling point is a\nrange of AR ﬁlters called — “perspectives” or “paxes” by the community — which let\nstreamers superimpose hyper-realistic 3D graphics over real world video. These can be\nactivated at the click of a button, and streamers generally do not concern themselves with\nhow they work.\nThe platform rose to fame because of its dedication towards helping its creators make con-\ntent (and money) “like magic”. Since the beginning, Witch maintained a “creator ﬁrst”\npolicy which very publicly declares that all subscription fees “go directly in the ﬁrst in-\nstance to our content creators; we only take a modest amount to keep the platform run-\nning”. This modest amount is 10% of the su
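
Later cells read the `file`, `scenario` and `questions` keys of each record, so a quick structural check can catch malformed entries early. This is a sketch only: the helper `validate_record` and the sample record are hypothetical, and it assumes `questions` is a list of strings, as the later cells suggest.

```python
REQUIRED_KEYS = ("file", "scenario", "questions")


def validate_record(record):
    """Return a list of problems found in one scenario record (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    questions = record.get("questions")
    if questions is not None and not isinstance(questions, list):
        problems.append("'questions' should be a list")
    return problems


# Hypothetical record used only for illustration
sample = {"file": "example.pdf", "scenario": "Some facts.", "questions": ["Q1?"]}
print(validate_record(sample))
```

In the notebook, the same check could be run over every entry in `files` before any predictions are made.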

## Setup Pipeline

In [3]:
# Retrieve and print a system prompt using the SystemPrompts class
prompt_key = 'ans_tort_qns'
sys_prompt = prompts_manager.retrieve(prompt_key)
print(sys_prompt)

You are a Singapore lawyer specialized in the Law of Torts.

Task: Analyze provided hypothetical scenarios under Singapore Tort Law, addressing specific questions with detailed legal principles and relevant case law.

Instructions:

1. Read and understand the scenario provided.
2. Identify all legal issues related to the question.
3. Answer using the 'Issue', 'Rule', 'Application', 'Conclusion' structure:
a. **Issue**: Clearly state the legal issue(s) identified.
b. **Rule**: Define and explain the relevant legal principles, statutes, and case law that apply to the issue.
c. **Application**: Apply the legal principles to the specific facts of the scenario, analyzing how the rule affects the case.
d. **Conclusion**: Provide a reasoned conclusion based on the application of the rule to the facts.
4. Justify answers with references to relevant statutes and cases.
5. Organize the answer clearly, using headings or bullet points as needed.

Example:

**Issue**: Did the defendant owe a duty o

In [4]:
# Load settings from settings.json
with open('settings.json', 'r') as settings_file:
    settings = json.load(settings_file)

model_name = settings["model"]
# model_name = 'gpt-4-turbo'
print(model_name)

predictor = ModelPredictor(
    system_prompt=sys_prompt,
    model_name = model_name
)

Meta-Llama-3-8B-Instruct-GGUF
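
The cell above assumes that `settings.json` exists and contains a `"model"` key. A slightly more defensive loader might look like the sketch below; the function `load_model_name` and its `default` parameter are hypothetical additions, not part of the project.

```python
import json
from pathlib import Path


def load_model_name(settings_path="settings.json", default=None):
    """Read the "model" entry from a JSON settings file, falling back to a default."""
    path = Path(settings_path)
    if not path.exists():
        if default is None:
            raise FileNotFoundError(f"Settings file not found: {path}")
        return default
    with path.open("r", encoding="utf-8") as settings_file:
        settings = json.load(settings_file)
    # A missing "model" key also falls back to the default
    return settings.get("model", default)


# The fallback value here is only an example
print(load_model_name("settings.json", default="gpt-4-turbo"))
```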


## Generate Answers
The function below uses the `ModelPredictor` instance to answer each loaded question `epoch` times, so that several answers are collected per question.

In [5]:
def generate_answers(data, epoch):
    """Generate `epoch` answers per question for a single scenario record."""
    # Relies on the global `predictor` created in the previous cell
    scenario = data['scenario']
    questions = data['questions']
    answers = {}

    # For each question, prepare the prompt and get the model's prediction multiple times
    for i, question in enumerate(questions):
        question_key = f"question_{i + 1}"
        answers[question_key] = []

        for _ in range(epoch):
            prompt = f"Scenario: {scenario}\nQuestion: {question}\n\nMy answer is:"
            answer = predictor.predict(prompt)
            print(answer)
            answers[question_key].append(answer)

    # Attach the answers to the record (this modifies `data` in place)
    data['answers'] = answers

    return data
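
To check the shape of the returned answers without calling a real model, the same loop can be run against a stub. This is a sketch under stated assumptions: `StubPredictor` and `collect_answers` are hypothetical names, and the only thing assumed about `ModelPredictor` is that it exposes `predict(prompt)`, as used above.

```python
class StubPredictor:
    """Hypothetical stand-in for ModelPredictor that returns canned text."""

    def predict(self, prompt):
        return f"stub answer ({len(prompt)} prompt characters)"


def collect_answers(record, predictor, epoch):
    """Mirror the loop above, with the predictor passed in explicitly."""
    answers = {}
    for i, question in enumerate(record["questions"]):
        prompt = f"Scenario: {record['scenario']}\nQuestion: {question}\n\nMy answer is:"
        # One list per question, holding `epoch` independent answers
        answers[f"question_{i + 1}"] = [predictor.predict(prompt) for _ in range(epoch)]
    return answers


# Made-up record used only for illustration
record = {"scenario": "A trips over B's wire.", "questions": ["Is B liable?", "What damages?"]}
answers = collect_answers(record, StubPredictor(), epoch=2)
print(sorted(answers))  # ['question_1', 'question_2']
```

Passing the predictor as an argument, rather than reading it from a global, also makes the function easier to reuse across models.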

## Run Prediction
Execute the prediction process and save the updated data with the generated answers.

In [6]:
for data in files:
    updated_data = generate_answers(data, epoch=3)

    name = data['file'].split(".")[0]

    output_filename = f'../results/{name}_{model_name}.json'
    with open(output_filename, 'w') as file:
        json.dump(updated_data, file, indent=4)

    print(f"Processing complete. The updated data is saved in '{output_filename}'.")

**Issue**: Does Tommy have any tort claims against Aisya and Kwan (A&K)?

**Rule**: Under Singapore law, a person can only be liable for damages if they owe a duty of care to the claimant. The duty of care arises when there is a foreseeable risk of harm to the claimant.

**Application**: In this scenario, Tommy claims that A&K failed to take reasonable care to prevent him from tripping over an electrical wire and breaking his camera. However, it was SI who left the construction site gate open, allowing Tommy to enter the property. A&K did not have any control or responsibility for the construction site at the time of the incident.

**Conclusion**: Therefore, Tommy does not have a tort claim against Aisya and Kwan (A&K). The duty of care lies with SI, who failed to take reasonable care to prevent Tommy from entering the property. Tommy should ask SI to compensate him for his losses, as they are the ones responsible for the incident.

Reference: Chan Hiang Leng Colin v Minister for the E
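
The output file name above is built by splitting the source file name on its first dot. An alternative sketch using `pathlib` is shown here; the helper `build_output_path` is hypothetical. Note that `Path.stem` strips only the final extension, so it behaves differently from `split(".")[0]` when the file name contains more than one dot.

```python
from pathlib import Path


def build_output_path(source_file, model_name, results_dir="../results"):
    """Build '<results_dir>/<source stem>_<model_name>.json'."""
    stem = Path(source_file).stem  # drops only the last extension, e.g. '.pdf'
    return Path(results_dir) / f"{stem}_{model_name}.json"


print(build_output_path("202122-cnl-midterm-marcela.pdf", "Meta-Llama-3-8B-Instruct-GGUF"))
```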

## Manage System Prompt

In [7]:
# Store a new prompt 
prompt_key = 'test_123'
new_prompt_text = """
testing1232412432
"""
prompts_manager.store(prompt_key, new_prompt_text)
print('New prompt stored successfully.')

New prompt stored successfully.


In [8]:
# Delete a prompt
prompt_key = 'test_123'

prompts_manager.remove(prompt_key)
print('Prompt removed successfully.')

Prompt removed successfully.
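
The notebook only shows the `retrieve`, `store` and `remove` calls, not how `SystemPrompts` is implemented. For readers who want a mental model, the sketch below shows one way such a store could work, persisting prompts to a JSON file. The class `JsonPromptStore` is entirely hypothetical and is not the project's implementation.

```python
import json
import tempfile
from pathlib import Path


class JsonPromptStore:
    """Hypothetical JSON-backed prompt store; not the project's SystemPrompts class."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.prompts = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.prompts = {}

    def _save(self):
        self.path.write_text(json.dumps(self.prompts, indent=4), encoding="utf-8")

    def retrieve(self, key):
        return self.prompts[key]  # raises KeyError for unknown keys

    def store(self, key, text):
        self.prompts[key] = text
        self._save()

    def remove(self, key):
        del self.prompts[key]
        self._save()


# Usage against a temporary file, so nothing in the project is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    store = JsonPromptStore(Path(tmp_dir) / "prompts.json")
    store.store("test_123", "testing")
    print(store.retrieve("test_123"))
    store.remove("test_123")
```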


## Evaluation

In [1]:
import os
import pandas as pd
import json
from eval import SummaryChecker 
checker = SummaryChecker("MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli")


In [4]:
def process_json_files(folder_path, output_csv_path):
    """Score the first answer to each question in every results file and write a summary CSV."""
    results_list = []
    for filename in os.listdir(folder_path):
        if filename.endswith('.json'):
            # Assumes names of the form '<scenario>_<model>.json' with exactly one underscore
            model_name = filename.split('_')[1].replace('.json', '')
            with open(os.path.join(folder_path, filename), 'r') as file:
                data = json.load(file)

            # Source text: the scenario followed by every question
            source_document = data["scenario"] + " " + " ".join(data["questions"])
            # Candidate text: the first generated answer to each question
            summary = " ".join([data["answers"][f"question_{i+1}"][0] for i in range(len(data["questions"]))])

            results = checker.evaluate_summary(source_document, summary)
            results_list.append({
                "ModelName": model_name,
                "AvgContradictionScore": results["Average Contradiction Score"],
                "Coverage_Perc": results["Coverage Percentage"]
            })
    df = pd.DataFrame(results_list)
    df.to_csv(output_csv_path, index=False)
    return df
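
The model name is recovered from the file name by splitting on underscores, which breaks if the scenario name itself contains one. A slightly more tolerant variant is sketched below; the helper `model_name_from_filename` is hypothetical, and it still assumes the model name contains no underscore.

```python
from pathlib import Path


def model_name_from_filename(filename):
    """Return the text after the last underscore in the file stem."""
    stem = Path(filename).stem  # strip the '.json' extension
    # rpartition splits on the LAST underscore; with no underscore the whole stem is returned
    _, _, model = stem.rpartition("_")
    return model


print(model_name_from_filename("202122-cnl-midterm-marcela_gpt-4-turbo.json"))
```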

In [5]:
folder_path = "../results"
output_path = "../results/eval/results.csv"
process_json_files(folder_path, output_path)

Entailment: 1.04%, Neutral: 98.86%, Contradiction: 0.10%
Entailment: 1.27%, Neutral: 96.45%, Contradiction: 2.28%
Entailment: 0.31%, Neutral: 99.38%, Contradiction: 0.31%
Entailment: 0.67%, Neutral: 97.96%, Contradiction: 1.38%
Entailment: 0.69%, Neutral: 98.96%, Contradiction: 0.36%
Entailment: 0.65%, Neutral: 98.29%, Contradiction: 1.06%
Entailment: 0.02%, Neutral: 99.94%, Contradiction: 0.04%
Entailment: 0.50%, Neutral: 99.01%, Contradiction: 0.49%
Entailment: 0.07%, Neutral: 99.56%, Contradiction: 0.37%
Entailment: 0.69%, Neutral: 97.18%, Contradiction: 2.13%
Entailment: 0.41%, Neutral: 26.13%, Contradiction: 73.46%
Entailment: 0.09%, Neutral: 99.86%, Contradiction: 0.05%
Entailment: 0.06%, Neutral: 99.93%, Contradiction: 0.01%
Entailment: 0.21%, Neutral: 99.71%, Contradiction: 0.07%
Entailment: 2.81%, Neutral: 96.95%, Contradiction: 0.24%
Entailment: 1.91%, Neutral: 97.99%, Contradiction: 0.10%
Entailment: 0.21%, Neutral: 99.77%, Contradiction: 0.02%
Entailment: 0.33%, Neutral: 99

| | ModelName | AvgContradictionScore | Coverage_Perc |
|---|---|---|---|
| 0 | gemma-2-27b-it-GGUF | 4.81628 | 87.234043 |
| 1 | gpt-4-turbo | 3.430307 | 35.483871 |
| 2 | llama-3 | 7.039984 | 20.967742 |
| 3 | Meta-Llama-3-8B-Instruct-GGUF | 5.105873 | 10.638298 |
| 4 | Mistral-Nemo-Instruct-2407 | 9.099237 | 8.510638 |
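
To compare models, the results can be sorted by either metric. The sketch below uses only the standard library and the values reported above; the helper `rank_models` is hypothetical. The notebook does not state which direction is better for each metric, so the sort order is left as a parameter.

```python
import csv
import io

# Values copied from the results reported above
RESULTS_CSV = """ModelName,AvgContradictionScore,Coverage_Perc
gemma-2-27b-it-GGUF,4.81628,87.234043
gpt-4-turbo,3.430307,35.483871
llama-3,7.039984,20.967742
Meta-Llama-3-8B-Instruct-GGUF,5.105873,10.638298
Mistral-Nemo-Instruct-2407,9.099237,8.510638
"""


def rank_models(csv_text, column, descending=True):
    """Return the CSV rows sorted by a numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)


for row in rank_models(RESULTS_CSV, "Coverage_Perc"):
    print(f"{row['ModelName']}: {row['Coverage_Perc']}")
```

The same function could be pointed at the contents of `../results/eval/results.csv` once that file has been written.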
