Wisconsin Law Enforcement Legal Chat RAG System

A proof-of-concept Retrieval-Augmented Generation (RAG) system that enables Wisconsin law enforcement officers to quickly query state statutes, case law, and department policies through a conversational interface.

Project Structure

codefourrag/
├── backend/          # FastAPI backend application
├── frontend/         # Next.js frontend application
├── data/            # Documents and embeddings
├── docs/            # Documentation
└── scripts/         # Utility scripts

Setup Instructions

Prerequisites

Python 3.10+
Node.js 18+ (for frontend)
npm or yarn

Backend Setup

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Copy environment variables:

cp env.example .env
# Edit .env and add your API keys

Run the backend:

cd backend
uvicorn main:app --reload

The API will be available at http://localhost:8000

Frontend Setup

Navigate to frontend directory:

cd frontend

Install dependencies:

npm install

Run the development server:

npm run dev

The frontend will be available at http://localhost:3000

Development Workflow

This project is being built incrementally in 10 separate steps. Each step is implemented and tested independently before moving on to the next.

API Documentation

Once the backend is running, visit:

API Docs: http://localhost:8000/docs
Alternative Docs: http://localhost:8000/redoc

Testing

Run backend tests:

cd backend
pytest

Data Directory

Place your Wisconsin legal documents in the following directories:

data/raw/statutes/ - State statutes (PDF/HTML)
data/raw/case_law/ - Case law summaries (PDF)
data/raw/policies/ - Department policies (DOCX/PDF)
data/raw/training/ - Training materials

You can organize files within subdirectories as needed. The system will recursively scan all subdirectories within data/raw/.
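A minimal Python sketch of the recursive scan, assuming the folder layout above. The `scan_raw_dir` function and the `FOLDER_TO_TYPE` mapping (top-level folder name to a `document_type` label) are hypothetical illustrations, not the backend's actual code.

```python
from pathlib import Path

# File extensions from the "Supported File Formats" list in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".html", ".htm", ".txt", ".md"}

# Hypothetical mapping from a top-level folder under data/raw/ to a document_type label.
FOLDER_TO_TYPE = {
    "statutes": "statute",
    "case_law": "case_law",
    "policies": "policy",
    "training": "training",
}


def scan_raw_dir(raw_dir: Path) -> list[tuple[Path, str]]:
    """Recursively collect supported files and tag each with a document type."""
    found = []
    for path in sorted(raw_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        # The first folder below raw_dir decides the type; files placed directly
        # in raw_dir, or in folders not listed above, are tagged "unknown".
        relative = path.relative_to(raw_dir)
        top = relative.parts[0] if len(relative.parts) > 1 else ""
        found.append((path, FOLDER_TO_TYPE.get(top, "unknown")))
    return found
```

For example, `scan_raw_dir(Path("data/raw"))` would return pairs such as `(Path("data/raw/statutes/940.01.pdf"), "statute")`.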

Supported File Formats

PDF (.pdf) - Uses pdfplumber
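Word Documents (.docx, .doc) - Uses python-docx (python-docx reads only .docx, so legacy .doc files may fail to parse and should be converted to .docx first)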
HTML (.html, .htm) - Uses BeautifulSoup4
Text Files (.txt, .md) - Plain text parsing
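The format list above implies a dispatch on file extension. The sketch below shows one way to wire pdfplumber, python-docx and BeautifulSoup4 together; the function names are hypothetical, and the third-party libraries are imported lazily so only the parsers actually used need to be installed. It rejects legacy `.doc` files rather than passing them to python-docx.

```python
from pathlib import Path
from typing import Callable


def parse_pdf(path: Path) -> str:
    import pdfplumber  # lazy import: only needed when a PDF is parsed

    with pdfplumber.open(str(path)) as pdf:
        # extract_text() can return None for image-only pages
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def parse_docx(path: Path) -> str:
    import docx  # the python-docx package is imported as "docx"

    return "\n".join(p.text for p in docx.Document(str(path)).paragraphs)


def parse_html(path: Path) -> str:
    from bs4 import BeautifulSoup

    html = path.read_text(encoding="utf-8", errors="replace")
    return BeautifulSoup(html, "html.parser").get_text("\n")


def parse_text(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


PARSERS: dict[str, Callable[[Path], str]] = {
    ".pdf": parse_pdf,
    ".docx": parse_docx,
    ".html": parse_html,
    ".htm": parse_html,
    ".txt": parse_text,
    ".md": parse_text,
}


def parse_file(path: Path) -> str:
    """Pick a parser by (case-insensitive) extension and return raw text."""
    parser = PARSERS.get(path.suffix.lower())
    if parser is None:
        # Legacy binary .doc files end up here because python-docx cannot open them.
        raise ValueError(f"Unsupported file type: {path.suffix}")
    return parser(path)
```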

Document Ingestion

Using the API Endpoint

Once the backend is running, you can ingest documents by calling the /api/ingest endpoint:

# Ingest all documents from data/raw/
curl -X POST "http://localhost:8000/api/ingest"

# Or use the interactive API docs at http://localhost:8000/docs
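The same request can be sent from Python using only the standard library. `build_ingest_request` and `ingest` are illustrative names, and the 300-second timeout is an assumption for large document sets.

```python
import json
import urllib.request

# Base URL from the backend setup steps; change it if the backend runs elsewhere.
BASE_URL = "http://localhost:8000"


def build_ingest_request(base_url: str = BASE_URL) -> urllib.request.Request:
    # POST with an empty body, mirroring the curl call above.
    url = f"{base_url.rstrip('/')}/api/ingest"
    return urllib.request.Request(url, data=b"", method="POST")


def ingest(base_url: str = BASE_URL, timeout: float = 300.0) -> dict:
    """Trigger ingestion and return the decoded JSON response."""
    with urllib.request.urlopen(build_ingest_request(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the backend running, `result = ingest()` followed by `print(result["status"])` prints the overall status field described under Response Format.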

The ingestion process will:

Recursively scan data/raw/ and all subdirectories
Parse supported file formats (PDF, DOCX, HTML, TXT, MD)
Normalize text (remove headers/footers, preserve section markers)
Extract metadata (title, jurisdiction, dates, statute numbers, department)
Return a list of normalized Document objects
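One common way to implement the header/footer step is to drop lines that repeat on most pages while always keeping section markers. This sketch assumes text has already been extracted per page; `strip_headers_footers`, the 60% repetition threshold and both regular expressions are assumptions, not necessarily how the backend does it.

```python
import math
import re
from collections import Counter

# Bare page numbers such as "12" or "Page 3 of 40" (assumed footer format).
PAGE_NUMBER = re.compile(r"^(?:page\s+)?\d+(?:\s+of\s+\d+)?$", re.IGNORECASE)
# Section markers to keep: "§ 940.01", "Section 5", or a leading "940.01".
SECTION_MARKER = re.compile(r"^(?:§+\s*\d|section\s+\d|\d{1,3}\.\d{2,4}\b)", re.IGNORECASE)


def _clean(line: str) -> str:
    # Collapse internal whitespace so repeated lines compare equal.
    return re.sub(r"\s+", " ", line).strip()


def strip_headers_footers(pages: list[str], min_fraction: float = 0.6) -> str:
    """Drop running headers/footers and page numbers, keeping section markers."""
    # Count each distinct non-empty line at most once per page.
    counts: Counter = Counter()
    for page in pages:
        counts.update({_clean(line) for line in page.splitlines()} - {""})
    # A line must appear on at least two pages to be treated as a header/footer.
    threshold = max(2, math.ceil(min_fraction * len(pages)))

    kept_pages = []
    for page in pages:
        kept = []
        for raw in page.splitlines():
            line = _clean(raw)
            if not line:
                continue
            is_marker = SECTION_MARKER.match(line) is not None
            is_noise = PAGE_NUMBER.match(line) is not None or counts[line] >= threshold
            if is_marker or not is_noise:  # markers survive even if repeated
                kept.append(line)
        kept_pages.append("\n".join(kept))
    # Pages are separated by a blank line; pages left empty are skipped.
    return "\n\n".join(p for p in kept_pages if p)
```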

Note: This step does NOT chunk or index documents. Chunking and indexing are handled in later steps.

Response Format

The /api/ingest endpoint returns:

status: "success", "partial", or "failed"
documents_processed: Number of successfully processed documents
documents_failed: Number of documents that failed to process
total_documents: Total number of documents found
documents: List of Document objects with text and metadata
failures: List of failed files with error messages
processing_time_seconds: Time taken to process
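A typed view of this response, with an assumed rule for how `status` relates to the counts (every document parsed: success; some parsed: partial; none parsed: failed). The field names come from the list above; `IngestResponse`, `derive_status`, `parse_response` and the processed + failed = total check are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Literal

Status = Literal["success", "partial", "failed"]


@dataclass
class IngestResponse:
    status: Status
    documents_processed: int
    documents_failed: int
    total_documents: int
    documents: list[dict[str, Any]] = field(default_factory=list)
    failures: list[dict[str, str]] = field(default_factory=list)
    processing_time_seconds: float = 0.0


def derive_status(processed: int, failed: int) -> Status:
    """Assumed rule; note that an empty data/raw/ (0 and 0) counts as success here."""
    if failed == 0:
        return "success"
    return "partial" if processed > 0 else "failed"


def parse_response(payload: dict[str, Any]) -> IngestResponse:
    resp = IngestResponse(**payload)  # raises TypeError on unexpected keys
    # Assumed invariant: every document found either succeeded or failed.
    if resp.documents_processed + resp.documents_failed != resp.total_documents:
        raise ValueError("documents_processed + documents_failed != total_documents")
    return resp
```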

Example

{
  "status": "success",
  "documents_processed": 5,
  "documents_failed": 0,
  "total_documents": 5,
  "documents": [
    {
      "text": "Normalized document text...",
      "metadata": {
        "title": "Wisconsin Statute 940.01",
        "jurisdiction": "WI",
        "document_type": "statute",
        "statute_numbers": ["940.01"],
        "dates": ["2023"],
        "source_path": "data/raw/statutes/940.01.pdf"
      },
      "source_path": "data/raw/statutes/940.01.pdf"
    }
  ],
  "failures": [],
  "processing_time_seconds": 2.34
}
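The `title`, `statute_numbers` and `dates` fields in the example can be approximated with regular expressions. This sketch is an assumption about the approach, not the backend's extractor: it treats any `NNN.NN` token as a statute number (so amounts such as 12.50 would also match), takes the first non-empty line as the title, and hard-codes `jurisdiction` to `"WI"`. `extract_metadata` is a hypothetical name.

```python
import re

# Any "NNN.NN" token is treated as a statute number (e.g. 940.01); this also
# catches decimals like 12.50, so real code would need more context.
STATUTE_RE = re.compile(r"\b\d{1,3}\.\d{2,4}\b")
# Four-digit years from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_metadata(text: str, source_path: str) -> dict:
    """Build a metadata dict shaped like the example response above."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {
        "title": lines[0] if lines else "",  # assumption: first line is the title
        "jurisdiction": "WI",  # assumption: every source is Wisconsin material
        # dict.fromkeys removes duplicates while keeping first-seen order
        "statute_numbers": list(dict.fromkeys(STATUTE_RE.findall(text))),
        "dates": sorted(set(YEAR_RE.findall(text))),
        "source_path": source_path,
    }
```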

Performance Evaluation

To evaluate system performance (retrieval accuracy, response time, relevance scoring):

# Make sure backend is running first
python scripts/evaluate_performance.py

This will generate performance metrics and save results to performance_results.json.

See PERFORMANCE_METRICS.md for detailed methodology and expected results.
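For orientation, a sketch of common retrieval and latency metrics: hit rate at k, mean reciprocal rank, and nearest-rank percentile latency. These are standard measures chosen for illustration; `EvalCase` and the metric functions are hypothetical and may differ from what `scripts/evaluate_performance.py` actually computes.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalCase:
    expected_source: str          # document that should be retrieved
    retrieved_sources: list[str]  # ranked retrieval results, best first
    latency_seconds: float        # end-to-end response time


def hit_rate_at_k(cases: list[EvalCase], k: int = 5) -> float:
    """Fraction of queries whose expected document appears in the top k."""
    return sum(c.expected_source in c.retrieved_sources[:k] for c in cases) / len(cases)


def mean_reciprocal_rank(cases: list[EvalCase]) -> float:
    """Average of 1/rank of the expected document (0 when it is not retrieved)."""
    total = 0.0
    for c in cases:
        if c.expected_source in c.retrieved_sources:
            total += 1.0 / (c.retrieved_sources.index(c.expected_source) + 1)
    return total / len(cases)


def latency_percentile(cases: list[EvalCase], pct: float) -> float:
    """Nearest-rank percentile of response times (pct between 0 and 100)."""
    values = sorted(c.latency_seconds for c in cases)
    rank = max(1, math.ceil(pct / 100 * len(values)))
    return values[rank - 1]
```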

Documentation

README.md: Quick start guide and setup instructions
EXPLANATION.md: Complete implementation details and technical documentation
ARCHITECTURE.md: System architecture, design decisions, scalability, and security
PERFORMANCE_METRICS.md: Performance evaluation methodology and metrics

License

This is a take-home assignment project.
