# Validation of Extracted Topics
This notebook validates the topics extracted from the deposition transcript. For each sampled topic, a reasoning LLM is asked whether the topic is relevant to the transcript excerpt it was extracted from.

This is the shorter version of the validation notebook; it samples just 10 topics for demonstration.

In [1]:
# imports
import json
import random
import os
import re
from PyPDF2 import PdfReader
from ollama import chat, ChatResponse

In [2]:
# topics loader
def load_extracted_topics(file_path):
    with open(file_path, "r") as f:
        return json.load(f)

# topic file path
topics_file = "../outputs/extracted_topics.json"
extracted_topics = load_extracted_topics(topics_file)

In [3]:
# NUMBER OF TOPICS TO BE VALIDATED
# number_of_topics = int(input("Enter the number of random topics to validate: "))
number_of_topics = 10  # testing default

In [4]:
# random topics are picked from pool of extracted topics
def select_random_topics(extracted_topics, num_samples):
    all_topics = []
    for page, topics in extracted_topics.items():
        for topic in topics:
            all_topics.append({"page": page, **topic})
    return random.sample(all_topics, num_samples)

random_topics = select_random_topics(extracted_topics, number_of_topics)
print("Selected Topics:\n")
for i, topic in enumerate(random_topics, start=1):
    print(f"{i}. {topic['topic']}, Page: {topic['page_start']}")

Selected Topics:

1. Answer - General Consumer Protections, Page: 17
2. Mr. Purcell's Clarification, Page: 83
3. ITT, Page: 16
4. Attorney's Request to Answer, Page: 8
5. Settlement Agreement, Page: 45
6. Question Regarding Vervent's Cessation of PEAKS Loan Servicing (Repeat), Page: 79
7. Definition of Servicing Loans, Page: 80
8. Retention Rates at ITT, Page: 30
9. FedLoan Servicing to Mohela Packaging, Page: 21
10. Witness's Opinion - Unenforceability, Page: 83
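
The selection above changes on every run because `random.sample` draws from the unseeded global generator. If a repeatable sample is wanted, one option is a private seeded generator. This is a minimal sketch, not the notebook's method: the function name `select_random_topics_seeded`, the `seed` parameter, and the toy data are all assumptions made for illustration.

```python
import random

def select_random_topics_seeded(extracted_topics, num_samples, seed=0):
    """Flatten the page -> topics mapping and draw a repeatable sample."""
    all_topics = [
        {"page": page, **topic}
        for page, topics in extracted_topics.items()
        for topic in topics
    ]
    rng = random.Random(seed)  # private generator; the global random state is untouched
    # Clamp the sample size so a small pool does not raise ValueError
    return rng.sample(all_topics, min(num_samples, len(all_topics)))

# Toy data (hypothetical), in the same shape the notebook's loop expects
toy_topics = {
    "1": [{"topic": "A", "page_start": 1, "line_start": 3}],
    "2": [
        {"topic": "B", "page_start": 2, "line_start": 7},
        {"topic": "C", "page_start": 2, "line_start": 15},
    ],
}

first = select_random_topics_seeded(toy_topics, 2, seed=42)
second = select_random_topics_seeded(toy_topics, 2, seed=42)
print(first == second)  # the same seed yields the same selection
```

Using `random.Random(seed)` rather than `random.seed(seed)` keeps the seeding local, so other cells that rely on the global generator are unaffected.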


In [5]:
# transcript loader
def load_transcript(file_path):
    reader = PdfReader(file_path)
    return [page.extract_text() for page in reader.pages]

# path to transcript
transcript_file = "../inputs/deposition.pdf"
transcript_pages = load_transcript(transcript_file)

# this function gets a text excerpt around the target line (up to char_context characters
# on each side, roughly 1000 characters in total) to provide context to the LLM
def get_text_excerpt(page_number, line_start, transcript_pages, char_context=500):
    page_text = transcript_pages[page_number - 1]
    lines = page_text.split("\n")
    if 0 <= line_start - 1 < len(lines):
        target_line = lines[line_start - 1]
        # offset of the target line, computed from the preceding lines (plus their newlines)
        # so that repeated text on the page cannot match the wrong position
        target_index = sum(len(line) + 1 for line in lines[:line_start - 1])
        start_index = max(0, target_index - char_context)   # 500 chars before target line
        end_index = min(len(page_text), target_index + len(target_line) + char_context) # 500 chars after target line
        return page_text[start_index:end_index]
    return None

# excerpts attached to topics
for topic in random_topics:
    topic["text_excerpt"] = get_text_excerpt(
        int(topic["page_start"]), int(topic["line_start"]), transcript_pages
    )
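
To see how the window is taken around a target line, here is a small self-contained sketch on a synthetic page. The helper name `excerpt_around_line` and the toy text are illustrative assumptions. The sketch locates the line by summing the lengths of the preceding lines, which keeps the window on the intended line even when the same text appears more than once on the page.

```python
def excerpt_around_line(page_text, line_start, char_context=500):
    """Return the target line plus up to char_context characters on each side."""
    lines = page_text.split("\n")
    if not 1 <= line_start <= len(lines):
        return None
    # Offset of the target line: lengths of all earlier lines plus one newline each
    target_index = sum(len(line) + 1 for line in lines[: line_start - 1])
    target_len = len(lines[line_start - 1])
    start = max(0, target_index - char_context)
    end = min(len(page_text), target_index + target_len + char_context)
    return page_text[start:end]

# Synthetic page (hypothetical) where lines 1 and 3 are identical
page = "\n".join(["Q  Same question.", "A  Yes.", "Q  Same question.", "A  No."])

excerpt = excerpt_around_line(page, 3, char_context=8)
print(repr(excerpt))
```

With `char_context=8` the excerpt covers the answer before line 3, line 3 itself, and the answer after it. A lookup based on `str.find` would instead anchor on line 1, because that is the first place the same text occurs.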

# Validation
Validation is done using the reasoning capabilities of DeepSeek R1 (8b).

With Ollama, R1 output is structured as follows:

```
<think>
Reasoning goes here.
</think>
Output goes here.
```

Therefore, we can display the reasoning alongside the output, and get rid of everything between the `<think>` tags to get our final Yes/No outputs. With those, we can calculate our accuracy.
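
As a quick illustration of that split, the sketch below separates the reasoning from the final answer on a hand-written sample string. The helper `split_reasoning` and the sample text are assumptions for demonstration only; no model is called.

```python
import re

# Non-greedy match across newlines, so each <think>...</think> block is matched separately
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)

def split_reasoning(raw_response):
    """Return (reasoning, answer) from an R1-style response string."""
    match = THINK_BLOCK.search(raw_response)
    reasoning = match.group(1).strip() if match else ""
    answer = THINK_BLOCK.sub("", raw_response).strip()
    return reasoning, answer

# Hand-written sample in the format described above (not real model output)
sample = "<think>\nThe excerpt discusses consumer protections.\n</think>\nYes."

reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

The `re.DOTALL` flag matters here: without it, `.` does not match newlines and a multi-line reasoning block would be left in the answer.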

In [6]:
# VALIDATION WITH REASONING
def validate_topic_with_llm(topic):
    prompt = f"""
    The following is an excerpt from a deposition transcript file:
    "{topic['text_excerpt']}"

    The extracted topic is: "{topic['topic']}".
    Is the topic relevant to the excerpt?
    Provide either "Yes." or "No." and nothing else. Do not use formatting.
    """
    try:
        response: ChatResponse = chat(
            model="deepseek-r1",
            messages=[{"role": "user", "content": prompt}],
        )
        raw_response = response.message.content.strip()
        print(f"Topic: {topic['topic']}\n\nExcerpt:\n{topic['text_excerpt']}\n\nResponse:\n{raw_response}\n\n---------------")
        
        # We remove everything inside <think> tags
        cleaned_response = re.sub(r"<think>.*?</think>", "", raw_response, flags=re.DOTALL).strip()
        return cleaned_response
    except Exception as e:
        return f"Error: {e}"

# iterate and validate
for i, topic in enumerate(random_topics, start=1):
    if "llm_response" in topic:
        continue
    print(f"\n\nTopic {i}/{len(random_topics)}:\n")
    topic["llm_response"] = validate_topic_with_llm(topic)



Topic 1/10:

Topic: Answer - General Consumer Protections

Excerpt:
   regulations; they're not subject to notice and comment,      01:29
8   but I have provided input and have reviewed those            01:29
9   contracts.                                                   01:29
10       Q    And what were the nature of the comments that you   01:29
11   had?                                                         01:29
12       A    The nature of the comments that we submitted on     01:29
13   the proposal were with regards to -- I mean, in -- the       01:29
14   overarching comments were to ensure that general consumer    01:29
15   protections were provided in these contracts to ensure       01:29
16   that there were adequate protections for student loan        01:30
17   borrowers to -- so that -- when the laws were violated,      01:30
18   that those borrowers were able to seek redress for -- for    01:30
19   those violations; that there were adequate -- that -- that   01:3

# Results

### Accuracy

In [7]:
# accuracy calculation
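def calculate_accuracy(random_topics):
    if not random_topics:   # avoid dividing by zero when nothing was sampled
        print("No topics were selected for validation.")
        return
    # only an exact "Yes." counts; any other response (including errors) counts against accuracy
    correct_count = sum(1 for topic in random_topics if topic["llm_response"] == "Yes.")
    accuracy = (correct_count / len(random_topics)) * 100   # percentage of topics the LLM judged relevant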
    
    print(f"{len(random_topics)} random topics selected for validation.")
    print(f"Of those, {correct_count} were evaluated as accurate to the context.")
    print(f"\nValidation Accuracy: {accuracy:.2f}%")

calculate_accuracy(random_topics)

10 random topics selected for validation.
Of those, 10 were evaluated as accurate to the context.

Validation Accuracy: 100.00%
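
The exact comparison against `"Yes."` treats every other string as not accurate, including `"Yes"` without a period or an `"Error: ..."` message. A more tolerant tally could normalize the answers first. This is a hedged sketch: `normalize_verdict` is a hypothetical helper, and the sample responses are made up for illustration.

```python
from collections import Counter

def normalize_verdict(response):
    """Map a free-form model answer to "yes", "no", or "unknown"."""
    # Drop surrounding whitespace, quotes, and markdown emphasis, then trailing punctuation
    cleaned = response.strip().strip("\"'*`").rstrip(".!").strip().lower()
    if cleaned in ("yes", "no"):
        return cleaned
    return "unknown"

# Made-up responses covering common formatting variations
responses = ["Yes.", "yes", "No.", "**Yes**", "Error: timeout"]

counts = Counter(normalize_verdict(r) for r in responses)
print(counts["yes"], counts["no"], counts["unknown"])
```

Keeping `"unknown"` as its own bucket separates failed or malformed responses from genuine "No" verdicts, so they can be retried rather than counted as inaccurate topics.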


### Inaccurate Topics

In [8]:
def print_inaccurate(random_topics):
    for i, topic in enumerate(random_topics, start=1):
        # list every topic that did not count toward accuracy, i.e. anything other than an exact "Yes."
        if topic["llm_response"] != "Yes.":
            print(f"Topic {i}: {topic['topic']}, Page: {topic['page_start']}")

print_inaccurate(random_topics)