A proof-of-concept Retrieval-Augmented Generation (RAG) system that enables Wisconsin law enforcement officers to quickly query state statutes, case law, and department policies through a conversational interface.
codefourrag/
├── backend/ # FastAPI backend application
├── frontend/ # Next.js frontend application
├── data/ # Documents and embeddings
├── docs/ # Documentation
└── scripts/ # Utility scripts
- Python 3.10+
- Node.js 18+ (for frontend)
- npm or yarn
- Create a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Copy environment variables:

```bash
cp .env.example .env
# Edit .env and add your API keys
```

- Run the backend:

```bash
cd backend
uvicorn main:app --reload
```

The API will be available at http://localhost:8000
- Navigate to frontend directory:
```bash
cd frontend
```

- Install dependencies:

```bash
npm install
```

- Run the development server:

```bash
npm run dev
```

The frontend will be available at http://localhost:3000
This project is being built incrementally through 10 separate processes. Each process is implemented and tested independently before moving to the next.
Once the backend is running, visit:
- API Docs: http://localhost:8000/docs
- Alternative Docs: http://localhost:8000/redoc
Run backend tests:
```bash
cd backend
pytest
```

Place your Wisconsin legal documents in the following directories:

- `data/raw/statutes/` - State statutes (PDF/HTML)
- `data/raw/case_law/` - Case law summaries (PDF)
- `data/raw/policies/` - Department policies (DOCX/PDF)
- `data/raw/training/` - Training materials
You can organize files within subdirectories as needed. The system will recursively scan all subdirectories within data/raw/.
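The recursive scan can be pictured with a short sketch (the function name and extension set here are illustrative; the actual ingest code may differ):

```python
from pathlib import Path

# Extensions matching the supported formats listed in this README (assumed set)
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".html", ".htm", ".txt", ".md"}

def find_documents(root: str = "data/raw") -> list[Path]:
    """Recursively collect supported files under root, in a stable order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Unsupported files (for example, spreadsheets or executables) are simply skipped rather than treated as errors.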
- PDF (`.pdf`) - Uses pdfplumber
- Word Documents (`.docx`, `.doc`) - Uses python-docx
- HTML (`.html`, `.htm`) - Uses BeautifulSoup4
- Text Files (`.txt`, `.md`) - Plain text parsing
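A sketch of how dispatching to these parsers by extension might look (the function name is illustrative, and the real backend may wire the parsers differently):

```python
from pathlib import Path

def extract_text(path: str) -> str:
    """Route a file to a parser based on its extension (illustrative sketch)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        import pdfplumber  # third-party
        with pdfplumber.open(path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix in {".docx", ".doc"}:
        import docx  # third-party: python-docx
        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    if suffix in {".html", ".htm"}:
        from bs4 import BeautifulSoup  # third-party: beautifulsoup4
        return BeautifulSoup(Path(path).read_text(), "html.parser").get_text()
    if suffix in {".txt", ".md"}:
        return Path(path).read_text()
    raise ValueError(f"Unsupported format: {suffix}")
```

Lazy imports keep the optional parsing libraries out of the startup path; only the parser actually needed for a given file is loaded.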
Once the backend is running, you can ingest documents by calling the /api/ingest endpoint:
```bash
# Ingest all documents from data/raw/
curl -X POST "http://localhost:8000/api/ingest"

# Or use the interactive API docs at http://localhost:8000/docs
```

The ingestion process will:

- Recursively scan `data/raw/` and all subdirectories
- Parse supported file formats (PDF, DOCX, HTML, TXT)
- Normalize text (remove headers/footers, preserve section markers)
- Extract metadata (title, jurisdiction, dates, statute numbers, department)
- Return a list of normalized Document objects
Note: This step does NOT chunk or index documents yet. That will be handled in subsequent steps.
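The Document objects and the metadata extraction step can be sketched roughly as follows (the dataclass fields mirror the example response in this README; the regexes are illustrative simplifications, not the backend's actual patterns):

```python
import re
from dataclasses import dataclass, field

# e.g. "940.01" — a simplified Wisconsin statute-number pattern (assumed)
STATUTE_RE = re.compile(r"\b\d{3}\.\d{2,4}\b")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

@dataclass
class Document:
    text: str
    source_path: str
    metadata: dict = field(default_factory=dict)

def extract_metadata(text: str) -> dict:
    """Pull statute numbers and years out of normalized document text."""
    return {
        "statute_numbers": sorted(set(STATUTE_RE.findall(text))),
        "dates": sorted(set(YEAR_RE.findall(text))),
    }
```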
The /api/ingest endpoint returns:
- `status`: "success", "partial", or "failed"
- `documents_processed`: Number of successfully processed documents
- `documents_failed`: Number of documents that failed to process
- `total_documents`: Total number of documents found
- `documents`: List of Document objects with text and metadata
- `failures`: List of failed files with error messages
- `processing_time_seconds`: Time taken to process
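Client code can branch on these fields before touching `documents`. A minimal sketch (the helper name and message wording are illustrative, not part of the API):

```python
def summarize_ingest(resp: dict) -> str:
    """Turn an /api/ingest response dict into a one-line summary (illustrative)."""
    processed = resp["documents_processed"]
    total = resp["total_documents"]
    if resp["status"] == "failed":
        return f"Ingestion failed: 0/{total} documents processed"
    if resp["status"] == "partial":
        return f"Partial ingestion: {processed}/{total} processed, {resp['documents_failed']} failed"
    return f"Ingested {processed}/{total} documents in {resp['processing_time_seconds']}s"
```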
Example response:

```json
{
  "status": "success",
  "documents_processed": 5,
  "documents_failed": 0,
  "total_documents": 5,
  "documents": [
    {
      "text": "Normalized document text...",
      "metadata": {
        "title": "Wisconsin Statute 940.01",
        "jurisdiction": "WI",
        "document_type": "statute",
        "statute_numbers": ["940.01"],
        "dates": ["2023"],
        "source_path": "data/raw/statutes/940.01.pdf"
      },
      "source_path": "data/raw/statutes/940.01.pdf"
    }
  ],
  "failures": [],
  "processing_time_seconds": 2.34
}
```

To evaluate system performance (retrieval accuracy, response time, relevance scoring):
```bash
# Make sure backend is running first
python scripts/evaluate_performance.py
```

This will generate performance metrics and save results to `performance_results.json`.
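Response-time numbers from such a run are typically aggregated into percentiles. A generic sketch (the metric names are assumptions; see PERFORMANCE_METRICS.md for the actual methodology):

```python
import statistics

def latency_percentiles(latencies_ms: list[float]) -> dict:
    """Median and 95th-percentile latency over a list of per-query timings."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {"p50_ms": statistics.median(latencies_ms), "p95_ms": cuts[94]}
```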
See PERFORMANCE_METRICS.md for detailed methodology and expected results.
- README.md: Quick start guide and setup instructions
- EXPLANATION.md: Complete implementation details and technical documentation
- ARCHITECTURE.md: System architecture, design decisions, scalability, and security
- PERFORMANCE_METRICS.md: Performance evaluation methodology and metrics
This is a take-home assignment project.