In this section, I originally wanted to use a hosted API, but because of my token limits and the need for debugging, I preferred to run the model offline. Based on my research and the resources available to me, I chose the `google/flan-t5-large` model.

Loading model

In [2]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "google/flan-t5-large"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Load Dataset

In [3]:
from datasets import load_dataset

dataset = load_dataset("dzunggg/legal-qa-v1")

train_data = dataset["train"]
# print(train_data[0])

train_data

Dataset({
    features: ['question', 'answer'],
    num_rows: 3742
})

In this step we add a category label to each question in the dataset.
To do this, each question is converted to lowercase and assigned the first category whose keyword list has a match inside the text (plain substring matching); questions with no match are labelled `other`.

Categorizing Questions

In [4]:
categories = {
    'contract_law': ['contract', 'agreement', 'breach', 'deal', 'obligation', 'party', 'terms', 'binding', 'enforce', 'offer', 'acceptance', 'consideration', 'void', 'unilateral'],
    'criminal_law': ['crime', 'criminal', 'theft', 'assault', 'murder', 'felony', 'misdemeanor', 'arrest', 'probable cause', 'self-defense', 'bail', 'plea', 'intent', 'innocence'],
    'intellectual_property': ['patent', 'copyright', 'trademark', 'ip', 'intellectual', 'infringement', 'license', 'fair use', 'trade secret', 'plagiarism'],
    'family_law': ['divorce', 'custody', 'marriage', 'child', 'alimony', 'adoption', 'spouse', 'guardianship', 'prenuptial', 'domestic violence', 'support'],
    'other': []  
}

def categorize(question):
    question_lower = question.lower()
    for cat, keywords in categories.items():
        if any(kw in question_lower for kw in keywords):
            return cat
    return 'other'

train_data = train_data.map(lambda x: {'category': categorize(x['question'])})

train_data[0:5]

{'question': ['Q: I was wondering if a pain management office is acting illegally/did an illegal action.. I was discharged as a patient from a pain management office after them telling me that a previous pain management specialist I saw administered a steroid shot wrong and I told them in the portal that I spoke to lawyers for advice but no lawsuit/case was created. It was maybe 1-2 months after I was discharged that I no longer have access to my patient portal with them. Every time I try to login I enter my credentials, wait a few seconds, and then I get re-directed back to the original screen where I have various options to login. I know I can speak to the office directly and ask them about what specifically is going on, talk to other lawyers if this is a violation of my rights, etc. but I was just wondering if anyone on this site would know if this action is in fact illegal. ',
  'Q: My houseboat was moved while I was at work and was asked to leave the property after 16 months witho
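Because `categorize` uses plain substring tests and returns the first category that matches, short keywords can fire inside unrelated words: for example, `'ip'` occurs inside "relationship". A minimal sketch of word-boundary matching with Python's `re` module, using a hypothetical three-category subset of the keyword table (`KEYWORDS`) and an assumed list of simple suffixes so that plurals and past tenses still match:

```python
import re

# Hypothetical three-category subset of the notebook's keyword table
KEYWORDS = {
    "contract_law": ["contract", "breach"],
    "intellectual_property": ["ip", "patent", "copyright"],
    "family_law": ["divorce", "custody"],
}

def categorize_substring(text):
    """Same strategy as `categorize`: plain substring test, first match wins."""
    t = text.lower()
    for cat, kws in KEYWORDS.items():
        if any(kw in t for kw in kws):
            return cat
    return "other"

# One pattern per category: a whole keyword, optionally followed by a simple
# suffix (assumed list), bounded by word boundaries on both sides
SUFFIXES = r"(?:s|es|d|ed|ing)?"
PATTERNS = {
    cat: re.compile(r"\b(?:" + "|".join(re.escape(kw) for kw in kws) + r")" + SUFFIXES + r"\b")
    for cat, kws in KEYWORDS.items()
}

def categorize_words(text):
    """Word-boundary variant: keywords must appear as whole words."""
    t = text.lower()
    for cat, pattern in PATTERNS.items():
        if pattern.search(t):
            return cat
    return "other"

q = "Can I end my relationship with my business partner?"
print(categorize_substring(q))  # intellectual_property ("ip" inside "relationship")
print(categorize_words(q))      # other
```

The trade-off is that a fixed suffix list will still miss some inflections, which is one reason the notebook later switches to lemmatization in `detect_category`.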

In this step, we converted all the questions in the dataset into numerical vectors (embeddings) using the small and fast `all-MiniLM-L6-v2` sentence-embedding model. This means every question is now represented as a list of numbers that captures its meaning.

Then we stored all these vectors in a **FAISS** index (FAISS is a library for fast similarity search over vectors), so when the user asks a question, we can find the few most similar questions in the dataset in a fraction of a second.

Finally, we saved three important files that the chatbot will use later:

* **legalqa_faiss.index** → the fast search database
* **legalqa_embeddings.npy** → all the generated vectors
* **legalqa_with_category.parquet** → the original dataset + the categories we added earlier

With this setup, whenever the user asks something, the chatbot first retrieves a few similar real questions and answers from the dataset and includes them in the model's prompt, which grounds the final answer in real examples and should make it more accurate and reliable.
That's it!
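Conceptually, `faiss.IndexFlatL2` is an exhaustive (brute-force) index: each search compares the query with every stored vector and returns the `k` smallest squared Euclidean distances together with their row ids. A minimal pure-Python sketch of the same computation, using toy 2-D vectors in place of the 384-dimensional `all-MiniLM-L6-v2` embeddings (the function name `flat_l2_search` is illustrative, not part of FAISS):

```python
def flat_l2_search(vectors, query, k):
    """Exhaustive nearest-neighbour search by squared Euclidean distance,
    the same quantity IndexFlatL2 reports (no square root is taken)."""
    scored = []
    for i, v in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(v, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first; in this sketch ties are broken by id
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy 2-D "embeddings"; the real MiniLM vectors have 384 dimensions
vectors = [[0, 0], [1, 0], [0, 2], [3, 3]]
distances, ids = flat_l2_search(vectors, [1, 1], k=2)
print(distances, ids)  # [1, 2] [1, 0]
```

Unlike this sketch, FAISS pads the result with id `-1` when fewer than `k` vectors are stored, which is why the retrieval code filters on `idx >= 0`.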


In [5]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

print("Loading embedder...")
embedder = SentenceTransformer('all-MiniLM-L6-v2')

print("Creating embeddings for questions...")
questions = train_data['question']
embeddings = embedder.encode(questions, batch_size=32, show_progress_bar=True)

print("Building FAISS index...")
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings))

print("Saving FAISS index...")
faiss.write_index(index, "legalqa_faiss.index")

print("Saving embeddings...")
np.save("legalqa_embeddings.npy", embeddings)

print("Saving processed dataset with categories...")
train_data.to_parquet("legalqa_with_category.parquet")

print("Preprocessing completed!")
print("Files created:")
print("   - legalqa_faiss.index")
print("   - legalqa_embeddings.npy")
print("   - legalqa_with_category.parquet")

Loading embedder...
Creating embeddings for questions...


Building FAISS index...
Saving FAISS index...
Saving embeddings...
Saving processed dataset with categories...


Preprocessing completed!
Files created:
   - legalqa_faiss.index
   - legalqa_embeddings.npy
   - legalqa_with_category.parquet


In this step, we reloaded all the files we created during the preprocessing stage so the chatbot can use them:

In [6]:
from datasets import Dataset
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import T5ForConditionalGeneration, T5Tokenizer

print("Loading processed dataset...")
train_data = Dataset.from_parquet("legalqa_with_category.parquet")

print("Loading FAISS index...")
index = faiss.read_index("legalqa_faiss.index")

print("Loading embeddings...")
embeddings = np.load("legalqa_embeddings.npy")

print("Loading embedder model...")
embedder = SentenceTransformer('all-MiniLM-L6-v2')

print("Loading FLAN-T5 model and tokenizer...")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large")
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")

print("All files loaded successfully!")

Loading processed dataset...


Loading FAISS index...
Loading embeddings...
Loading embedder model...
Loading FLAN-T5 model and tokenizer...
All files loaded successfully!


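In this section, we write `detect_category`, which identifies the category of the user's question. It uses the same `categories` keyword table as before, but first lowercases and lemmatizes (with NLTK's WordNet lemmatizer) both the words of the question and the keywords before matching.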

In [7]:
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
lemmatizer = WordNetLemmatizer()

def detect_category(question):
    question_lower = ' '.join(lemmatizer.lemmatize(word) for word in question.lower().split())
    for cat, keywords in categories.items():
        lemmatized_keywords = [lemmatizer.lemmatize(kw) for kw in keywords]
        if any(kw in question_lower for kw in lemmatized_keywords):
            return cat
    return 'other'

print("Test Recognizing Categories")
print(detect_category("What is a breach of contract?"))         
print(detect_category("Can I get divorced without a lawyer?"))   
print(detect_category("Hello, how are you today?"))

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Asus\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


Test Recognizing Categories
contract_law
family_law
other


The `retrieve_similar` function retrieves dataset questions that are similar to the user's question.
It converts the question into an embedding vector and searches the FAISS index for the `k` closest questions (`k=8` by default; the chatbot calls it with `k=4`). If a category is selected, the search is restricted to questions in that category; otherwise only matches whose squared L2 distance is below 0.5 are kept.
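The category filter works by searching only a subset of the embeddings and then translating positions inside that subset back to row numbers in the full dataset (the `indices_in_category[idx]` step in the cell below). A minimal pure-Python sketch of that remapping, with a hypothetical helper `search_within_category` and toy vectors standing in for the real embeddings:

```python
def search_within_category(vectors, labels, query, k, category):
    """Return full-dataset row ids of the k nearest vectors labelled `category`."""
    # Row ids (in the full dataset) that belong to the requested category
    subset_ids = [i for i, label in enumerate(labels) if label == category]
    # Score only the subset; `pos` is the position inside the subset
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(vectors[row], query)), pos)
        for pos, row in enumerate(subset_ids)
    )
    # Map subset positions back to full-dataset row ids
    return [subset_ids[pos] for _, pos in scored[:k]]

vectors = [[0, 0], [1, 0], [0, 1], [5, 5]]
labels = ["contract_law", "family_law", "family_law", "contract_law"]
print(search_within_category(vectors, labels, [0, 0.9], k=2, category="family_law"))  # [2, 1]
```

Without the remapping, the search would return positions 0 and 1 of the subset, which point at the wrong rows of the full dataset.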

In [8]:
def retrieve_similar(sual, k=8, dastbandi=None):
    # Persian variable names: sual = question, dastbandi = category, natije = result
    query_embedding = embedder.encode([sual])
    
    if dastbandi and dastbandi != 'none' and dastbandi in categories:
        # Row ids of all dataset questions labelled with the requested category
        indices_in_category = [i for i, cat in enumerate(train_data['category']) if cat == dastbandi]
        if len(indices_in_category) == 0:
            return []
        # Temporary flat index over this category's embeddings only
        subset_embeddings = embeddings[indices_in_category]
        subset_index = faiss.IndexFlatL2(subset_embeddings.shape[1])
        subset_index.add(subset_embeddings)
        D, I = subset_index.search(query_embedding, k)
        # FAISS pads missing results with -1; map subset positions back to dataset rows
        retrieved_indices = [indices_in_category[idx] for idx in I[0] if idx >= 0]
    else:
        # Search the full index and keep only close matches (squared L2 distance < 0.5)
        D, I = index.search(query_embedding, k)
        retrieved_indices = [int(idx) for dist, idx in zip(D[0], I[0]) if idx >= 0 and dist < 0.5]
    
    natije = []
    for idx in retrieved_indices:
        natije.append({
            'question': train_data[idx]['question'],
            'answer'  : train_data[idx]['answer'],
            'category': train_data[idx]['category']
        })
    return natije

for i, r in enumerate(retrieve_similar("What happens if I breach a contract?", k=3, dastbandi="contract_law"), 1):
    print(f"{i}. {r['question']}")

1. What Is Breach of Contract?
2. Q: Hi. I recently signed a co tract with a mktg company and before I did I was told something else. Can I get out if it. The person for the company told me something v different than was in the contract. 
3. Q: Did my employer breach my employee agreement by failing to give 30-day prior notice?. I'm a salaried, contract employee. My agreement was for four years, with an optional fifth year. That fifth year was at the sole discretion of the company and they would provide "30 days' prior written notice (e-mail sufficing)". I received notice they would pick up that final year 12 days before the contract was set to expire. Is that last year now void? Can I request a new agreement be drawn up? Do I become an at-will employee of I continue with the company (without signing anything new?). What options do I have here; they violated the agreement, correct? Thank you. 
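The fixed threshold of 0.5 in the unfiltered branch is easier to interpret as a cosine similarity. `IndexFlatL2` returns squared distances, and for unit-length vectors `||a - b||^2 = 2 - 2 * cos(a, b)`, so `D < 0.5` corresponds to a cosine similarity above 0.75. This assumes the embeddings are normalized; the `all-MiniLM-L6-v2` pipeline in sentence-transformers ends with a normalization layer, but it is worth confirming with `np.linalg.norm(embeddings, axis=1)`. A small pure-Python check of the identity (helper names are illustrative):

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    # Equals the cosine similarity when both vectors have unit length
    return sum(x * y for x, y in zip(a, b))

def l2_threshold_to_cosine(threshold):
    # From ||a - b||^2 = 2 - 2*cos  =>  cos = 1 - threshold / 2
    return 1 - threshold / 2

a = normalize([3, 4])   # [0.6, 0.8]
b = normalize([4, 3])   # [0.8, 0.6]
print(squared_l2(a, b), 2 - 2 * dot(a, b))  # equal up to floating-point rounding
print(l2_threshold_to_cosine(0.5))  # 0.75
```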


The `is_valid_question` function checks whether a question is a valid input for the legal chatbot. Its logic involves several steps:

Question Length
If the question is shorter than 4 characters, it is considered invalid.

Presence of Vowel Letters (aeiou)
To avoid meaningless inputs like "ghjkl", if the question contains no vowel letters, it is considered invalid.

Checking for Legal-Related Keywords
The function uses detect_category to determine whether the question is related to the legal domain or not.

Semantic Similarity Check
If `detect_category` returns "other" and `retrieve_similar` finds no dataset question within its distance threshold among the two nearest neighbours, the question is considered invalid.

In [9]:
def is_valid_question(q):
    q = q.strip().lower()
    # 1. Reject very short inputs
    if len(q) < 4:
        return False

    # 2. Reject inputs with no vowels (e.g. keyboard mashing such as "ghjkl")
    vowels = "aeiou"
    if not any(v in q for v in vowels):
        return False

    # 3-4. Questions with a legal keyword pass; otherwise at least one
    #      close match must exist in the dataset
    if detect_category(q) == 'other' and not retrieve_similar(q, k=2):
        return False
    return True


If the question has no question mark or contains fewer than five words, it is considered ambiguous.

If the question starts with expressions like “what is”, “how to”, “tell me”, or “explain” and is shorter than 20 characters, it is also classified as ambiguous.

Otherwise, the question is considered clear.

In [10]:
def is_ambiguous(question):
    if "?" not in question or len(question.split()) < 5: 
        return True
    vague_starts = ["what is", "how to", "tell me", "explain"]  
    if any(question.lower().startswith(v) for v in vague_starts) and len(question) < 20:
        return True
    return False
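The rules are easy to check by hand on a few questions from the test set built at the end of this notebook. Because of the five-word minimum, short definitional questions such as "What is a felony?" (four words) are flagged as ambiguous, so `generate_answer` returns the clarification message for them instead of an answer. The function is restated unchanged so the snippet runs on its own:

```python
def is_ambiguous(question):
    # Same rules as the notebook cell above
    if "?" not in question or len(question.split()) < 5:
        return True
    vague_starts = ["what is", "how to", "tell me", "explain"]
    if any(question.lower().startswith(v) for v in vague_starts) and len(question) < 20:
        return True
    return False

# A few questions taken from the evaluation set later in the notebook
for q in ["What is a felony?", "What is a valid contract?",
          "Can a verbal contract be legally binding?", "Explain quantum physics."]:
    print(q, "->", is_ambiguous(q))
```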


In this section, `generate_answer` first rejects empty, ambiguous, or invalid questions. It then checks whether the category we detect matches the category selected by the user.

If the user selected a category and the detected category is a different legal category, the function does not answer and instead returns a note naming the detected category.
Otherwise, we construct the prompt and send it to the model.

The prompt first tells the model that the task is a legal one, then lists similar questions and answers retrieved from the dataset (the retrieval step of RAG, retrieval-augmented generation), and ends with the user's question.

Finally, the result is displayed to the user through an interface built with Gradio. If the generated answer is very short (under 50 characters), the answer of the most similar retrieved example is returned instead.
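One caveat with this prompt layout: the user's question comes last, and the tokenizer truncates from the right by default, so if the retrieved answers are long the question itself is cut off first. A minimal sketch of one way to guard against this, assuming a hypothetical `build_prompt` helper and an arbitrary per-answer character cap (`max_answer_chars`); counting tokens with the tokenizer would be more precise:

```python
def build_prompt(question, examples, max_answer_chars=400):
    """Hypothetical helper: same layout as the prompt in generate_answer,
    but each retrieved answer is capped so the question at the end survives."""
    parts = [
        "You are a knowledgeable legal expert. Use the following examples to inform "
        "your response, but synthesize a clear, accurate answer in your own words. "
        "Focus on key facts and avoid unnecessary details.\n\n"
        "Examples:\n"
    ]
    for e in examples:
        answer = e["answer"]
        if len(answer) > max_answer_chars:
            # Cut at the cap and mark the omission
            answer = answer[:max_answer_chars].rstrip() + " ..."
        parts.append(f"Question: {e['question']}\nAnswer: {answer}\n\n")
    parts.append(f"Question: {question}\nAnswer:")
    return "".join(parts)

examples = [{"question": "What is bail?", "answer": "Bail is money paid to secure release. " * 50}]
prompt = build_prompt("Can bail be refused?", examples, max_answer_chars=100)
print(len(prompt))
print(prompt[-40:])
```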


In [None]:
import gradio as gr

def generate_answer(question, category):
    if not question or not question.strip():
        return "Please enter a question."
    
    if is_ambiguous(question):
        return "Your question seems unclear. Could you provide more details?"

    if not is_valid_question(question):
        return "Please enter a valid legal question."

    real_cat = detect_category(question)

    if category != "none" and real_cat != category and real_cat != "other":
        return f"Note: This question seems to be about {real_cat.replace('_', ' ').title()}."

    if real_cat == "other":
        # No legal keyword: only continue if the dataset has a close match
        if not retrieve_similar(question, k=2):
            return "I can only answer legal questions."

    examples = retrieve_similar(question, k=4, dastbandi=category if category != "none" else None)

    # Built from plain strings so no source-code indentation leaks into the prompt
    prompt = (
        "You are a knowledgeable legal expert. Use the following examples to inform your response, "
        "but synthesize a clear, accurate answer in your own words. Focus on key facts and avoid unnecessary details.\n\n"
        "Examples:\n"
    )
    for e in examples:
        prompt += f"Question: {e['question']}\nAnswer: {e['answer']}\n\n"
    prompt += f"Question: {question}\nAnswer:"

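    # Truncation cuts from the right, i.e. the user's question at the end is lost first
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
    
    outputs = model.generate(
        **inputs,  # passes both input_ids and attention_mask
        max_new_tokens=500,
        num_beams=6,
        temperature=0.5,
        do_sample=True,  # together with num_beams > 1 this is beam-sample decoding
        no_repeat_ngram_size=3,
        early_stopping=True
    )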
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

    if len(answer.strip()) < 50 and examples: 
        answer = "Based on similar cases: " + examples[0]['answer']

    return answer

gr.Interface(
    generate_answer,
    inputs=[
        gr.Textbox(lines=4, label="Your Legal Question", placeholder="e.g. What happens if I breach a contract?"),
        gr.Dropdown(["none", "contract_law", "criminal_law", "intellectual_property", "family_law"], 
                    label="Category", value="none")
    ],
    outputs=gr.Textbox(lines=15, label="Answer"),
    title="Legal Assistant Chatbot (FLAN-T5-Large + RAG)",
).launch(share=True)



* Running on local URL:  http://127.0.0.1:7860

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.


2025/12/08 17:15:10 [E] [service.go:262] fail to parse proxy url
2025/12/08 17:15:10 [W] [service.go:132] login to server failed: parse "127.0.0.1:10809": first path segment in URL cannot contain colon




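In this section, we calculate the evaluation metrics: category-detection accuracy, BLEU and ROUGE-1/ROUGE-L between generated and reference answers, and an "ambiguity handling" score that measures how well irrelevant questions are flagged.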
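For reference, ROUGE-1 and ROUGE-L are F-measures over unigram overlap and over the longest common subsequence (LCS) of tokens, respectively. A minimal pure-Python sketch, assuming whitespace tokenization and no stemming (the `rouge_score` package used below has its own tokenizer and, with `use_stemmer=True`, applies Porter stemming, so its numbers can differ slightly):

```python
from collections import Counter

def rouge1_f(reference, prediction):
    """Unigram-overlap F1 (ROUGE-1 F-measure) on lowercased whitespace tokens."""
    ref, pred = reference.lower().split(), prediction.lower().split()
    overlap = sum((Counter(ref) & Counter(pred)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(pred), overlap / len(ref)
    return 2 * p * r / (p + r)

def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rougeL_f(reference, prediction):
    """LCS-based F1 (ROUGE-L F-measure): rewards words that appear in the same order."""
    ref, pred = reference.lower().split(), prediction.lower().split()
    lcs = lcs_length(ref, pred)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(pred), lcs / len(ref)
    return 2 * p * r / (p + r)

print(rouge1_f("the cat sat on the mat", "the cat is on the mat"))
print(rouge1_f("a b c d", "d c b a"), rougeL_f("a b c d", "d c b a"))  # 1.0 0.25
```

The second example shows the difference between the two: the same four words in reverse order give a perfect ROUGE-1 but a low ROUGE-L.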

In [12]:
import json
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

smooth = SmoothingFunction().method1

def evaluate_category_accuracy(dataset, detect_category_fn):
    total = 0
    correct = 0
    for item in dataset:
        if detect_category_fn(item["question"]) == item["category"]:
            correct += 1
        total += 1
    return correct / total if total > 0 else 0

def evaluate_bleu(dataset, generate_fn):
    scores = []
    for item in dataset:
        ref = item["answer"].split()
        pred = generate_fn(item["question"], "none").split()
        bleu = sentence_bleu([ref], pred, smoothing_function=smooth)
        scores.append(bleu)
    return sum(scores) / len(scores)

def evaluate_rouge(dataset, generate_fn):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    r1 = []
    rl = []
    for item in dataset:
        pred = generate_fn(item["question"], "none")
        scores = scorer.score(item["answer"], pred)
        r1.append(scores["rouge1"].fmeasure)
        rl.append(scores["rougeL"].fmeasure)
    return {"ROUGE-1": sum(r1) / len(r1), "ROUGE-L": sum(rl) / len(rl)}

def evaluate_ambiguity_handling(dataset, detect_irrelevant_fn):
    total = 0
    correct = 0
    for item in dataset:
        irr = detect_irrelevant_fn(item["question"])
        if item.get("is_irrelevant", False):
            if irr: correct += 1
        else:
            if not irr: correct += 1
        total += 1
    return correct / total if total > 0 else 0

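def run_all_evaluations(dataset):
    acc = evaluate_category_accuracy(dataset, detect_category)
    # Generate each answer once, so BLEU and ROUGE score the same (sampled) outputs
    # and the model is not run twice per question
    cache = {item["question"]: generate_answer(item["question"], "none") for item in dataset}

    def cached_generate(question, category):
        return cache[question]

    bleu = evaluate_bleu(dataset, cached_generate)
    rouge = evaluate_rouge(dataset, cached_generate)
    if any("is_irrelevant" in item for item in dataset):
        amb = evaluate_ambiguity_handling(dataset, detect_irrelevant)
    else:
        amb = None
    return {"Accuracy": acc, "BLEU": bleu, "ROUGE": rouge, "Ambiguity": amb}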

def detect_irrelevant(text):
    t = text.strip()
    if len(t) < 4:
        return True
    letters = sum(c.isalpha() for c in t)
    if letters < 3:
        return True
    ratio = letters / max(len(t), 1)
    if ratio < 0.25:
        return True
    return False



def show_evaluation_results(dataset):
    results = run_all_evaluations(dataset)
    print("------ Evaluation Results ------")
    print(f"Accuracy               : {results['Accuracy']:.2f}")
    print(f"BLEU Score             : {results['BLEU']:.2f}")
    print(f"ROUGE-1                : {results['ROUGE']['ROUGE-1']:.2f}")
    print(f"ROUGE-L                : {results['ROUGE']['ROUGE-L']:.2f}")
    if results["Ambiguity"] is not None:
        print(f"Ambiguity Handling     : {results['Ambiguity']:.2f}")
    print("--------------------------------")


Build the Test Dataset and Evaluate

In [13]:
dataset = [
    # Contract Law
    {"question": "What is a valid contract?", "answer": "A valid contract requires offer, acceptance, consideration, and the intention to create legal relations.", "category": "contract_law"},
    {"question": "What happens if a contract is breached?", "answer": "A breach of contract allows the injured party to claim damages or request specific performance.", "category": "contract_law"},
    {"question": "What is consideration in contract law?", "answer": "Consideration is something of value exchanged by the parties to form a binding contract.", "category": "contract_law"},
    {"question": "Can a verbal contract be legally binding?", "answer": "Yes, verbal contracts can be legally binding unless the law requires the agreement to be written.", "category": "contract_law"},
    {"question": "What is a void contract?", "answer": "A void contract has no legal effect and cannot be enforced by either party.", "category": "contract_law"},
    {"question": "What is a unilateral contract?", "answer": "A unilateral contract involves one party making a promise in exchange for an act.", "category": "contract_law"},
    {"question": "What makes a contract unenforceable?", "answer": "A contract may be unenforceable if it lacks capacity, consent, legality, or proper form.", "category": "contract_law"},
    {"question": "What is specific performance?", "answer": "Specific performance is a court order requiring a party to fulfill their contractual obligations.", "category": "contract_law"},

    # Criminal Law
    {"question": "What is a felony?", "answer": "A felony is a serious criminal offense punishable by imprisonment for more than one year.", "category": "criminal_law"},
    {"question": "What is probable cause?", "answer": "Probable cause refers to reasonable grounds for police to make an arrest or conduct a search.", "category": "criminal_law"},
    {"question": "What is the presumption of innocence?", "answer": "The presumption of innocence means that a person is considered innocent until proven guilty.", "category": "criminal_law"},
    {"question": "What is self-defense?", "answer": "Self-defense is the lawful use of force to protect oneself from harm.", "category": "criminal_law"},
    {"question": "What is a misdemeanor?", "answer": "A misdemeanor is a less serious crime, typically punishable by fines or short-term imprisonment.", "category": "criminal_law"},
    {"question": "What is bail?", "answer": "Bail is a sum of money paid to secure a defendant’s release pending trial.", "category": "criminal_law"},
    {"question": "What is plea bargaining?", "answer": "Plea bargaining is an agreement where the defendant pleads guilty in exchange for a reduced sentence.", "category": "criminal_law"},
    {"question": "What is criminal intent?", "answer": "Criminal intent refers to the mental state of knowingly committing a wrongful act.", "category": "criminal_law"},

    # Family Law
    {"question": "What is child custody?", "answer": "Child custody determines the legal and practical relationship between a parent and child.", "category": "family_law"},
    {"question": "What is alimony?", "answer": "Alimony is financial support paid by one spouse to the other after divorce.", "category": "family_law"},
    {"question": "How is child support determined?", "answer": "Child support is calculated based on income, parenting time, and the needs of the child.", "category": "family_law"},
    {"question": "What is legal guardianship?", "answer": "Legal guardianship grants an adult the right to care for a minor or incapacitated person.", "category": "family_law"},
    {"question": "What is a prenuptial agreement?", "answer": "A prenuptial agreement outlines asset and financial arrangements before marriage.", "category": "family_law"},
    {"question": "Can a parent lose custody?", "answer": "Yes, a parent may lose custody due to abuse, neglect, or inability to provide proper care.", "category": "family_law"},
    {"question": "What is domestic violence?", "answer": "Domestic violence includes physical, emotional, or psychological abuse within a household.", "category": "family_law"},
    {"question": "What is joint custody?", "answer": "Joint custody allows both parents to share legal responsibility for the child.", "category": "family_law"},

    # Intellectual Property
    {"question": "What is copyright?", "answer": "Copyright protects original works of authorship, such as books, music, and art.", "category": "intellectual_property"},
    {"question": "What is a patent?", "answer": "A patent gives an inventor exclusive rights to their invention for a limited time.", "category": "intellectual_property"},
    {"question": "What is trademark infringement?", "answer": "Trademark infringement occurs when a mark is used without authorization in a way that causes confusion.", "category": "intellectual_property"},
    {"question": "What is fair use?", "answer": "Fair use allows limited use of copyrighted material without permission for purposes like criticism or education.", "category": "intellectual_property"},
    {"question": "What is a trade secret?", "answer": "A trade secret is confidential business information that provides a competitive advantage.", "category": "intellectual_property"},
    {"question": "What is licensing?", "answer": "Licensing allows others to use intellectual property under agreed conditions.", "category": "intellectual_property"},
    {"question": "What is patent infringement?", "answer": "Patent infringement occurs when a patented invention is used or sold without permission.", "category": "intellectual_property"},
    {"question": "What does plagiarism mean?", "answer": "Plagiarism is using someone else's work without proper acknowledgment.", "category": "intellectual_property"},

    # Irrelevant Questions (Ambiguity + Non-legal)
    {"question": "How do I cook pasta?", "answer": "This is not a legal question.", "category": "other", "is_irrelevant": True},
    {"question": "What is the best video game of 2023?", "answer": "This is not related to legal topics.", "category": "other", "is_irrelevant": True},
    {"question": "Why is the sky blue?", "answer": "This question is not related to law.", "category": "other", "is_irrelevant": True},
    {"question": "Explain quantum physics.", "answer": "This topic is unrelated to legal matters.", "category": "other", "is_irrelevant": True},
    {"question": "What is your favorite color?", "answer": "This is not a legal inquiry.", "category": "other", "is_irrelevant": True},
    {"question": "How to lose weight fast?", "answer": "This is not a legal question.", "category": "other", "is_irrelevant": True},
    {"question": "What is the capital of France?", "answer": "This is not related to legal topics.", "category": "other", "is_irrelevant": True},
    {"question": "How does a rocket work?", "answer": "This is not a legal question.", "category": "other", "is_irrelevant": True}
]

show_evaluation_results(dataset)


------ Evaluation Results ------
Accuracy               : 0.88
BLEU Score             : 0.03
ROUGE-1                : 0.20
ROUGE-L                : 0.19
Ambiguity Handling     : 0.80
--------------------------------
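The Ambiguity Handling score of 0.80 can be reproduced by hand. `detect_irrelevant` only flags very short or mostly non-alphabetic input, so none of the eight well-formed but off-topic questions in the test set is flagged. The 32 legal questions are therefore scored as correct and the 8 irrelevant ones as incorrect, giving 32 / 40 = 0.80. A small sketch that re-applies the same heuristic (restated unchanged) to the eight off-topic questions:

```python
def detect_irrelevant(text):
    # Same heuristic as in the evaluation cell above
    t = text.strip()
    if len(t) < 4:
        return True
    letters = sum(c.isalpha() for c in t)
    if letters < 3:
        return True
    ratio = letters / max(len(t), 1)
    if ratio < 0.25:
        return True
    return False

irrelevant = [
    "How do I cook pasta?", "What is the best video game of 2023?",
    "Why is the sky blue?", "Explain quantum physics.",
    "What is your favorite color?", "How to lose weight fast?",
    "What is the capital of France?", "How does a rocket work?",
]
flagged = sum(detect_irrelevant(q) for q in irrelevant)
print(flagged)  # 0

# The 32 legal questions are also well formed, so none of them is flagged either
n_legal = 32
score = (n_legal + flagged) / (n_legal + len(irrelevant))
print(score)  # 0.8
```

In other words, this metric currently measures only whether legal questions are left unflagged; a topic-aware check (for example, reusing `is_valid_question`) would be needed to reward rejecting off-topic questions.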
