## 1. Load Cleaned Rulings

This notebook loads the pre-cleaned legal rulings from `data/clean/` to build TF-IDF embeddings for similarity-based retrieval.


In [1]:
import os

# Load cleaned ruling texts
clean_dir = os.path.join("data", "clean")
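# Sort the listing so document indices stay stable across runs (os.listdir order is arbitrary)
filepaths = [os.path.join(clean_dir, f) for f in sorted(os.listdir(clean_dir)) if f.endswith(".txt")]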

documents = []
for filepath in filepaths:
    with open(filepath, "r", encoding="utf-8") as file:
        documents.append(file.read().strip())

print(f"Loaded {len(documents)} cleaned rulings.")

Loaded 74 cleaned rulings.
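
A slightly more defensive variant of the loading step is sketched below. It assumes the same layout (one `.txt` file per ruling), keeps the file names alongside the texts so a search hit can be traced back to its source file, and skips files that are empty after stripping. The function name `load_rulings` is hypothetical and not part of the notebook.

```python
from pathlib import Path

def load_rulings(clean_dir):
    """Read every .txt file in clean_dir, sorted by name, skipping empty files."""
    names, texts = [], []
    for path in sorted(Path(clean_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # an empty document would become an all-zero TF-IDF row
            names.append(path.name)
            texts.append(text)
    return names, texts
```

In this notebook it could be called as `names, documents = load_rulings(Path("data") / "clean")`, after which `names[idx]` identifies the file behind `documents[idx]`.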


## 2. Build TF-IDF Embeddings

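This step converts each ruling into a sparse numerical vector using TF-IDF,  
which weights a term higher the more often it appears in a ruling and the rarer it is across the collection. The resulting matrix has one row per ruling and one column per vocabulary term.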


In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize TF-IDF Vectorizer
vectorizer = TfidfVectorizer(stop_words="english")

# Fit and transform the documents
tfidf_matrix = vectorizer.fit_transform(documents)

print(f"TF-IDF matrix shape: {tfidf_matrix.shape}")

TF-IDF matrix shape: (74, 5545)
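
To make the weighting concrete, the sketch below recomputes TF-IDF by hand for a toy corpus, using the smoothed IDF and L2 normalization that scikit-learn documents as the `TfidfVectorizer` defaults. It is an illustration rather than the vectorizer's implementation: tokenization is a plain whitespace split, no stop words are removed, and the toy corpus and the function name `tfidf_vectors` are made up for this example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        # Raw term count times IDF, then scale the vector to unit length
        weights = {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()} if norm else {})
    return vectors

# Hypothetical toy corpus, not taken from the rulings
toy_docs = [
    "tax fraud conviction affirmed",
    "custody order affirmed",
    "tax assessment appeal dismissed",
]
vectors = tfidf_vectors(toy_docs)
# "affirmed" occurs in two documents, so it is weighted below "fraud" in the first one
print(sorted(vectors[0].items(), key=lambda item: -item[1]))
```

Terms that appear in many rulings, such as court names or boilerplate publication notices, receive a low IDF and therefore contribute little to the similarity scores.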


## 3. Search Interface: Find Similar Rulings

The `search` function transforms a query with the fitted vectorizer, computes its cosine similarity against every ruling,  
and returns the `top_n` most similar rulings with their scores.
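
The ranking step can be sketched without scikit-learn. The example below computes cosine similarity (the dot product divided by the product of the vector lengths) between sparse vectors stored as dicts, then sorts documents by score. The helper names `cosine` and `rank` and the toy vectors are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs, top_n=5):
    """Return (index, score) pairs for the top_n most similar documents."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical toy vectors, not taken from the rulings
doc_vecs = [
    {"tax": 1.0, "fraud": 2.0},
    {"custody": 3.0},
    {"tax": 2.0, "appeal": 1.0},
]
query_vec = {"tax": 1.0, "fraud": 1.0}
top = rank(query_vec, doc_vecs, top_n=2)
print(top)
```

With non-negative TF-IDF weights the score ranges from 0 (no shared terms) to 1 (same direction), so a score close to 0 means the query and the ruling share few weighted terms.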


In [7]:
from sklearn.metrics.pairwise import cosine_similarity

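def search(query, top_n=5):
    """Return the top_n rulings most similar to query as (index, score, text) tuples."""
    # Project the query into the same TF-IDF space as the rulings
    query_vec = vectorizer.transform([query])
    similarities = cosine_similarity(query_vec, tfidf_matrix).flatten()
    # Sort by descending similarity and keep the best top_n
    ranked_indices = similarities.argsort()[::-1][:top_n]

    return [(idx, similarities[idx], documents[idx]) for idx in ranked_indices]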

# Example query
query = "criminal tax fraud conviction in New York"
results = search(query, top_n=3)

# Show results
for rank, (idx, score, doc) in enumerate(results, start=1):
    print(f"\n--- Result {rank} | Score: {score:.2f} ---\n")
    print(doc[:500])
    print("\n--- End Result ---\n")



--- Result 1 | Score: 0.06 ---

Matter of Hoovler v Vazquez-Doles (2025 NY Slip Op 02204)





Matter of Hoovler v Vazquez-Doles


2025 NY Slip Op 02204


Decided on April 16, 2025


Appellate Division, Second Department



Published by New York State Law Reporting Bureau pursuant to Judiciary Law § 431.


This opinion is uncorrected and subject to revision before publication in the Official Reports.



Decided on April 16, 2025
SUPREME COURT OF THE STATE OF NEW YORK
Appellate Division, Second Judicial Department

MARK C. DILL

--- End Result ---


--- Result 2 | Score: 0.05 ---

People v Palm (2025 NY Slip Op 02799)





People v Palm


2025 NY Slip Op 02799


Decided on May 7, 2025


Appellate Division, Second Department



Published by New York State Law Reporting Bureau pursuant to Judiciary Law § 431.


This opinion is uncorrected and subject to revision before publication in the Official Reports.



Decided on May 7, 2025
SUPREME COURT OF THE STATE OF NEW YORK
Appellate Division