In [2]:
!pip install crewai sentence-transformers rank-bm25 faiss-cpu torch transformers

Collecting crewai
  Downloading crewai-0.203.0-py3-none-any.whl.metadata (35 kB)
Collecting sentence-transformers
  Using cached sentence_transformers-5.1.1-py3-none-any.whl.metadata (16 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.12.0-cp311-cp311-win_amd64.whl.metadata (5.2 kB)
Collecting torch
  Downloading torch-2.8.0-cp311-cp311-win_amd64.whl.metadata (30 kB)
Collecting transformers
  Downloading transformers-4.57.0-py3-none-any.whl.metadata (41 kB)
Collecting appdirs>=1.4.4 (from crewai)
  Using cached appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Collecting blinker>=1.9.0 (from crewai)
  Using cached blinker-1.9.0-py3-none-any.whl.metadata (1.6 kB)
Collecting chromadb~=1.1.0 (from crewai)
  Downloading chromadb-1.1.1-cp39-abi3-win_amd64.whl.metadata (7.4 kB)
Collecting instructor>=1.3.3 (from crewai)
  Using cached instructor-1.11.3-py3-none-any.whl.metadata (11 kB)
Collecting json-repair==0.25.2 (from crewai)
  Using cached json_repair-0.25.2-py3-none-any.whl.metad

In [4]:
pip install ipywidgets

Collecting ipywidgets
  Using cached ipywidgets-8.1.7-py3-none-any.whl.metadata (2.4 kB)
Collecting widgetsnbextension~=4.0.14 (from ipywidgets)
  Using cached widgetsnbextension-4.0.14-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab_widgets~=3.0.15 (from ipywidgets)
  Using cached jupyterlab_widgets-3.0.15-py3-none-any.whl.metadata (20 kB)
Using cached ipywidgets-8.1.7-py3-none-any.whl (139 kB)
Using cached jupyterlab_widgets-3.0.15-py3-none-any.whl (216 kB)
Using cached widgetsnbextension-4.0.14-py3-none-any.whl (2.2 MB)
Installing collected packages: widgetsnbextension, jupyterlab_widgets, ipywidgets


In [13]:
from sentence_transformers import SentenceTransformer
from rank_bm25 import BM25Okapi
import faiss

case_corpus = [
    "The defendant breached their duty of care by failing to maintain safe premises. The plaintiff suffered damages.",
    "Contract interpretation requires examining the intent of the parties.",
    "Negligence claims require proof of duty, breach, causation, and damages.",
    "Corporate liability extends to officers when they personally participate in tortious conduct. Damages were substantial.",
    "Employment contracts must specify termination clauses clearly. The defendant was negligent."
]

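embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # lightweight model for the demo
case_embeddings = embedding_model.encode(case_corpus)

# Sparse index: BM25 over whitespace tokens (case-sensitive, punctuation stays attached).
bm25 = BM25Okapi([d.split() for d in case_corpus])

# Dense index: inner product over L2-normalized vectors equals cosine similarity.
dimension = case_embeddings.shape[1]
faiss_index = faiss.IndexFlatIP(dimension)
faiss.normalize_L2(case_embeddings)  # normalizes in place
faiss_index.add(case_embeddings)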
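
Why normalize before adding to an inner-product index: for unit-length vectors the inner product is the cosine similarity. The sketch below checks this identity in plain Python on toy 3-dimensional vectors. The helper names and the vectors are illustrative assumptions, not part of the notebook or of FAISS.

```python
import math

def l2_normalize(vec):
    """Return a copy of vec scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave the zero vector unchanged
    return [x / norm for x in vec]

def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    denom = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / denom

# Toy vectors standing in for real sentence embeddings.
doc_vec = [3.0, 4.0, 0.0]
query_vec = [6.0, 8.0, 1.0]

# Inner product after normalizing both sides ...
ip_after_normalizing = inner_product(l2_normalize(doc_vec), l2_normalize(query_vec))
# ... matches the cosine similarity of the raw vectors.
cosine = cosine_similarity(doc_vec, query_vec)
```

The same reasoning is why the query embedding also has to be normalized before searching; otherwise the scores are scaled by the query's length.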


In [14]:
from crewai import Agent

semantic_agent = Agent(
    role="Semantic Legal Retriever",
    goal="Retrieve case law using semantic similarity with dense embeddings.",
    backstory="You are a retrieval specialist designed to find the most relevant cases using vector search and FAISS.",
    memory=True
)
bm25_agent = Agent(
    role="Keyword Case Retriever",
    goal="Find cases by legal keywords using BM25.",
    backstory="You specialize in keyword-based search to match statutory language.",
    memory=True
)
summarizer_agent = Agent(
    role="Legal Summarizer",
    goal="Summarize legal cases for clarity.",
    backstory="You use generative AI to produce concise headnotes and summaries for judges and attorneys.",
    memory=True
)


In [15]:
from transformers import pipeline
summarizer_pipeline = pipeline("summarization", model="facebook/bart-large-cnn")

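def semantic_task(query, k=3):
    """Return up to k corpus entries closest to the query by cosine similarity."""
    query_emb = embedding_model.encode([query]).astype("float32")
    faiss.normalize_L2(query_emb)  # in place; FAISS expects float32
    scores, indices = faiss_index.search(query_emb, k)
    # FAISS pads with -1 when fewer than k vectors are indexed; skip those slots.
    return [case_corpus[i] for i in indices[0] if i >= 0]

def bm25_task(query, k=3):
    """Return the k corpus entries with the highest BM25 score for the query."""
    # The query must be tokenized the same way as the indexed documents.
    scores = bm25.get_scores(query.split())
    top_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [case_corpus[i] for i in top_indices]

def summarize_task(text):
    """Summarize text with BART; the length limits are counted in tokens."""
    result = summarizer_pipeline(text, max_length=80, min_length=30, do_sample=False)
    return result[0]["summary_text"]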


Device set to use cpu
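
A note on the BM25 tokens: `str.split()` keeps case and trailing punctuation, so a lowercase query term such as `duty` does not match the corpus token `duty,`, and `negligence` does not match `Negligence`. The sketch below shows one possible normalizing tokenizer. The name `tokenize` and the regular expression are assumptions for illustration, not something the notebook or `rank_bm25` provides.

```python
import re

# Runs of lowercase letters or digits; everything else acts as a separator.
_TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Lowercase text and return its alphanumeric runs as tokens."""
    return _TOKEN_RE.findall(text.lower())

doc = "Negligence claims require proof of duty, breach, causation, and damages."

whitespace_tokens = doc.split()   # what the notebook currently indexes
normalized_tokens = tokenize(doc) # what a normalizing tokenizer would index

query_tokens = tokenize("negligence duty of care")
```

To take effect, the same function would have to be used both when building `BM25Okapi` and on the query inside `bm25_task`.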


In [16]:
def research_workflow(query):
    print(f"Query: {query}\n")
    print("--- Semantic Retrieval Agent ---")
    sem_cases = semantic_task(query, k=3)
    for i, case in enumerate(sem_cases):
        print(f"[Semantic {i+1}]", case)
    print("\n--- BM25 Keyword Agent ---")
    bm_cases = bm25_task(query, k=3)
    for i, case in enumerate(bm_cases):
        print(f"[BM25 {i+1}]", case)
    print("\n--- Summarizer Agent ---")
    for i, case in enumerate(sem_cases):
        summary = summarize_task(case)
        print(f"Summary for Semantic Case {i+1}:", summary)
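

The workflow above prints the semantic and BM25 results as two separate lists. If a single merged ranking is wanted, reciprocal rank fusion is one common option: each item scores the sum of 1 / (k + rank) over the lists it appears in. The following is a minimal sketch under that assumption. The function name, the constant `k=60`, and the placeholder case labels are illustrative and not part of the notebook.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists; each item scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)

# Placeholder labels standing in for the retrieved case texts.
semantic_hits = ["case_a", "case_b", "case_c"]
keyword_hits = ["case_b", "case_c", "case_a"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

In this notebook the inputs would be the lists returned by `semantic_task` and `bm25_task`.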


In [18]:
research_workflow("negligence duty of care")


Your max_length is set to 80, but your input_length is only 18. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=9)


Query: negligence duty of care

--- Semantic Retrieval Agent ---
[Semantic 1] Negligence claims require proof of duty, breach, causation, and damages.
[Semantic 2] The defendant breached their duty of care by failing to maintain safe premises. The plaintiff suffered damages.
[Semantic 3] Contract interpretation requires examining the intent of the parties.

--- BM25 Keyword Agent ---
[BM25 1] The defendant breached their duty of care by failing to maintain safe premises. The plaintiff suffered damages.
[BM25 2] Contract interpretation requires examining the intent of the parties.
[BM25 3] Negligence claims require proof of duty, breach, causation, and damages.

--- Summarizer Agent ---


Your max_length is set to 80, but your input_length is only 21. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=10)


Summary for Semantic Case 1: Negligence claims require proof of duty, breach, causation, and damages. Negligence is a form of negligence that can only be proven with proof of a breach of duty.


Your max_length is set to 80, but your input_length is only 12. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=6)


Summary for Semantic Case 2: The defendant breached their duty of care by failing to maintain safe premises. The plaintiff suffered damages. The case was settled at the High Court in London, where the judge ruled in favour of the plaintiff.
Summary for Semantic Case 3: Contract interpretation requires examining the intent of the parties. Contract interpretation requires looking at the intent, not the content of the contract. The intent of a contract is to protect the interests of both parties.
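
The warnings above arise because `max_length=80` exceeds the 12 to 21 input tokens, and `min_length=30` requires at least 30 generated tokens, which is probably why the summaries append details that are not in the source text (a settlement, a court in London). One way to avoid this is to derive both limits from the input length. The helper below is a sketch; its name, the 0.6 ratio, and the cap of 80 are assumptions chosen for illustration.

```python
def summary_length_bounds(input_tokens, ratio=0.6, cap=80):
    """Pick (min_length, max_length) that stay below the input token count."""
    # Aim for a fraction of the input, never above the cap and never below 2 tokens.
    max_length = max(2, min(cap, int(input_tokens * ratio)))
    # Require at least half of that, but at least 1 token.
    min_length = max(1, max_length // 2)
    return min_length, max_length

short_bounds = summary_length_bounds(18)   # an 18-token input, as in the first warning
long_bounds = summary_length_bounds(400)   # a long input hits the cap
```

Inside `summarize_task`, the token count could be taken from the pipeline's own tokenizer, for example `len(summarizer_pipeline.tokenizer(text)["input_ids"])`, and the two returned values passed as `min_length` and `max_length`.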


In [19]:
from crewai import Agent

precedent_agent = Agent(
    role="Precedent Reasoner",
    goal="Analyze precedents to find supporting or opposing cases and flag outdated legal logic.",
    backstory="You cross-reference new legal arguments with prior case law, highlighting relationships and outdated rulings.",
    memory=True
)

kg_agent = Agent(
    role="Knowledge Graph Builder",
    goal="Construct and highlight legal case relationships, statutes, and influences.",
    backstory="You build knowledge graphs linking cases, statutes, and influential nodes for legal discovery.",
    memory=True
)


In [20]:
def precedent_task(query, corpus):
    """Simulated precedent analysis based on keywords and matches in corpus."""
    q = query.lower()  # lowercase once instead of on every iteration
    supporting, opposing, outdated = [], [], []
    for case in corpus:
        text = case.lower()
        # Placeholder keyword rules; a case can land in more than one bucket.
        if "duty" in q and "duty" in text:
            supporting.append(case)
        if "breach" in q and "award" in text:
            opposing.append(case)
        if "outdated" in text or "overruled" in text:
            outdated.append(case)
    return {
        "supporting_cases": supporting,
        "opposing_cases": opposing,
        "outdated_cases": outdated
    }

def kg_task(corpus, statutes):
    """Simulated knowledge graph: links a case to a statute if they share any whitespace token."""
    # Tokenize each statute once instead of once per case.
    statute_words = [(s, set(s.lower().split())) for s in statutes]
    graph = []
    for case in corpus:
        case_words = set(case.lower().split())
        # Any shared token counts, including common words such as "the".
        linked = [s for s, words in statute_words if case_words & words]
        graph.append({
            "case": case,
            "linked_statutes": linked
        })
    # Highlight the case linked to the most statutes (first one wins ties); None for an empty corpus.
    influential = max(graph, key=lambda x: len(x["linked_statutes"]), default=None)
    return {
        "graph": graph,
        "most_influential": influential
    }
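
Because `kg_task` links on any shared whitespace token, a common word such as "the" is enough to connect a case to a statute. The sketch below shows how filtering stopwords changes that for one case from the corpus and the one statute visible in the notebook output. The stopword list is a tiny hypothetical set and the helper names are assumptions made for illustration.

```python
import re

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"a", "all", "an", "and", "by", "have", "of", "the", "to", "was", "were"}

def content_words(text):
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def shares_content_word(case, statute):
    """True if the two texts share at least one non-stopword token."""
    return bool(content_words(case) & content_words(statute))

case = "Employment contracts must specify termination clauses clearly. The defendant was negligent."
statute = "Section 1: All persons have the right to equal protection."

# The notebook's rule: any shared whitespace token links the pair.
naive_overlap = set(case.lower().split()) & set(statute.lower().split())

# With stopwords removed, this pair no longer links.
linked_after_filter = shares_content_word(case, statute)
```

A production system would more likely use a curated stopword list or statute citations extracted from the case text; this only illustrates why the unfiltered overlap is noisy.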


In [21]:
def extended_research_workflow(query):
    print(f"Query: {query}\n")
    
    print("--- Semantic Retrieval Agent ---")
    sem_cases = semantic_task(query, k=3)
    for i, case in enumerate(sem_cases):
        print(f"[Semantic {i+1}]", case)
        
    print("\n--- BM25 Keyword Agent ---")
    bm_cases = bm25_task(query, k=3)
    for i, case in enumerate(bm_cases):
        print(f"[BM25 {i+1}]", case)
        
    print("\n--- Summarizer Agent ---")
    for i, case in enumerate(sem_cases):
        summary = summarize_task(case)
        print(f"Summary for Semantic Case {i+1}:", summary)
    
    print("\n--- Precedent Reasoner Agent ---")
    precedent_results = precedent_task(query, case_corpus)
    print("Supporting Cases:", precedent_results["supporting_cases"])
    print("Opposing Cases:", precedent_results["opposing_cases"])
    print("Outdated Cases:", precedent_results["outdated_cases"])
    
    print("\n--- Knowledge Graph Builder Agent ---")
    # statute_corpus is not defined in any cell shown here; it must already exist in the session.
    kg_results = kg_task(case_corpus, statute_corpus)
    print("Most Influential Case:", kg_results["most_influential"]["case"])
    print("Linked Statutes:", kg_results["most_influential"]["linked_statutes"])


In [22]:
extended_research_workflow("duty of care negligence")


Your max_length is set to 80, but your input_length is only 18. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=9)


Query: duty of care negligence

--- Semantic Retrieval Agent ---
[Semantic 1] Negligence claims require proof of duty, breach, causation, and damages.
[Semantic 2] The defendant breached their duty of care by failing to maintain safe premises. The plaintiff suffered damages.
[Semantic 3] Corporate liability extends to officers when they personally participate in tortious conduct. Damages were substantial.

--- BM25 Keyword Agent ---
[BM25 1] The defendant breached their duty of care by failing to maintain safe premises. The plaintiff suffered damages.
[BM25 2] Contract interpretation requires examining the intent of the parties.
[BM25 3] Negligence claims require proof of duty, breach, causation, and damages.

--- Summarizer Agent ---


Your max_length is set to 80, but your input_length is only 21. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=10)


Summary for Semantic Case 1: Negligence claims require proof of duty, breach, causation, and damages. Negligence is a form of negligence that can only be proven with proof of a breach of duty.


Your max_length is set to 80, but your input_length is only 21. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=10)


Summary for Semantic Case 2: The defendant breached their duty of care by failing to maintain safe premises. The plaintiff suffered damages. The case was settled at the High Court in London, where the judge ruled in favour of the plaintiff.
Summary for Semantic Case 3: Corporate liability extends to officers when they personally participate in tortious conduct. Damages were substantial, according to the lawsuit. The case was settled out of court, with no admission of liability.

--- Precedent Reasoner Agent ---
Supporting Cases: ['The defendant breached their duty of care by failing to maintain safe premises. The plaintiff suffered damages.', 'Negligence claims require proof of duty, breach, causation, and damages.']
Opposing Cases: []
Outdated Cases: []

--- Knowledge Graph Builder Agent ---
Most Influential Case: Employment contracts must specify termination clauses clearly. The defendant was negligent.
Linked Statutes: ['Section 1: All persons have the right to equal protection.', '