An AI-powered contract analysis and clause rewriting tool with React frontend and FastAPI backend. The application helps legal professionals identify risky contract clauses, suggests balanced rewrites using AI, and generates comprehensive reports with visual diffs.
- Python 3.8+
- Node.js 16+
- Google Cloud Project with Gemini API access
# Clone and setup
git clone <your-repo-url>
cd Legal_Redline_Sandbox
# Configure environment
# Edit .env with your Google Cloud credentials
# Windows
setup.bat
# Linux/Mac
chmod +x setup.sh
./setup.sh
# Backend
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
# Frontend
cd ../frontend
npm install
# Terminal 1 - Backend
cd backend
source venv/bin/activate # Windows: venv\Scripts\activate
python -m uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Terminal 2 - Frontend
cd frontend
npm run dev
python test_system.py # Validates setup
Access: http://localhost:3000
# Google Cloud (Required)
GOOGLE_CLOUD_PROJECT_ID=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=./path/to/service-account.json
GOOGLE_API_KEY=your-gemini-api-key
# Security (Required)
SECRET_KEY=your-64-character-random-secret
# Basic Config
ENVIRONMENT=development
DEBUG=true
CORS_ORIGINS=["http://localhost:3000"]
- Frontend: React + Vite with Context state management
- Backend: FastAPI with async job processing + JWT auth
- AI: Google Gemini API for clause analysis and rewriting
- Processing: Document AI OCR + Cloud DLP for privacy scanning
- Features: Real-time job polling, export reports, diff visualization
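The async job processing and real-time polling listed above can be sketched with plain `asyncio`: an upload creates a job, a background task moves it through its statuses, and the client polls until the job reaches a terminal state. Everything here (`Job`, `JobStore`, `poll`, the status names, and the placeholder work) is a hypothetical illustration, not the project's code. In the real backend the submit and status lookups would presumably sit behind FastAPI endpoints, and the in-memory store would need to be replaced by shared storage if more than one worker process runs.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Job:
    """State of one background analysis job (hypothetical shape)."""
    id: str
    status: str = "queued"  # queued -> running -> done | failed
    result: Optional[Any] = None
    error: Optional[str] = None


class JobStore:
    """In-memory job registry; only valid within a single process."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        self._tasks: set = set()

    def submit(self, text: str) -> str:
        """Register a job and start it in the background (needs a running loop)."""
        job = Job(id=uuid.uuid4().hex)
        self._jobs[job.id] = job
        task = asyncio.create_task(self._run(job, text))
        self._tasks.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(self._tasks.discard)
        return job.id

    def get(self, job_id: str) -> Job:
        return self._jobs[job_id]

    async def _run(self, job: Job, text: str) -> None:
        job.status = "running"
        try:
            await asyncio.sleep(0.05)  # stand-in for OCR and Gemini calls
            # Placeholder "analysis": count non-empty sentences.
            job.result = {"sentences": sum(1 for s in text.split(".") if s.strip())}
            job.status = "done"
        except Exception as exc:
            job.error = str(exc)
            job.status = "failed"


async def poll(store: JobStore, job_id: str, interval: float = 0.01) -> Job:
    """Client-side polling loop: re-check the status until the job finishes."""
    while True:
        job = store.get(job_id)
        if job.status in ("done", "failed"):
            return job
        await asyncio.sleep(interval)


async def main() -> Job:
    store = JobStore()
    job_id = store.submit("Clause one. Clause two.")
    return await poll(store, job_id)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Returning a job ID immediately and letting the client poll keeps slow OCR and model calls out of the request/response cycle.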
- Streamlit Web Application: Single-page application with sidebar navigation for different workflow stages
- Session State Management: Maintains processed documents, risky clauses, selected clauses, and rewrite history across user interactions
- Component-Based UI: Modular interface with separate sections for document upload, risk analysis, clause rewriting, and export functionality
- Modular Processing Pipeline: Separate utility classes handle distinct responsibilities:
  - PDFProcessor: Document parsing and text extraction using PyMuPDF
  - RiskDetector: Rule-based clause risk analysis with pattern matching
  - ClauseRewriter: AI-powered clause rewriting using the Google Gemini API
  - DiffGenerator: HTML diff generation for comparing original vs. rewritten text
  - ExportManager: Report generation in HTML and PDF formats
- Google Gemini API: Primary AI service for clause rewriting with structured JSON responses
- Rule-Based Risk Detection: Pattern matching system for identifying problematic clauses including auto-renewal, unilateral changes, short notice periods, high penalties, and exclusive jurisdiction clauses
- Contextual Rewriting: AI considers user-specified controls and generates rationale, fallback positions, and risk reduction explanations
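Because the rewriter expects structured JSON back from Gemini, it helps to validate the reply before using it. The sketch below assumes a reply object with four string fields (`rewritten_clause`, `rationale`, `fallback_position`, `risk_reduction`); those field names and `parse_rewrite_response` are hypothetical, chosen to mirror the rationale, fallback, and risk-reduction outputs described above. It also tolerates a reply wrapped in a Markdown code fence, which models sometimes produce.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # a Markdown code fence, built this way so it cannot end this example

# Assumed response schema; adjust to whatever the prompt actually requests.
REQUIRED_FIELDS = ("rewritten_clause", "rationale", "fallback_position", "risk_reduction")


@dataclass(frozen=True)
class RewriteSuggestion:
    rewritten_clause: str
    rationale: str
    fallback_position: str
    risk_reduction: str


def parse_rewrite_response(raw: str) -> RewriteSuggestion:
    """Parse a model reply expected to be a single JSON object.

    Raises ValueError (json.JSONDecodeError is a subclass) on bad input.
    """
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (e.g. one tagged "json"),
        # then everything from the closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in REQUIRED_FIELDS if not isinstance(data.get(f), str)]
    if missing:
        raise ValueError(f"missing or non-string fields: {missing}")
    return RewriteSuggestion(**{f: data[f] for f in REQUIRED_FIELDS})
```

Failing loudly on a malformed reply lets the caller retry the request or fall back to showing the raw text, instead of rendering a half-empty suggestion.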
- PDF Text Extraction: Uses PyMuPDF (fitz) for reliable PDF parsing and page-by-page text extraction
- Clause Identification: Pattern-based clause detection using regex for common legal document structures
- Risk Scoring: Numerical risk assessment system with thresholds for different clause types
- Diff Generation: HTML-based visual comparison between original and rewritten clauses
- Multi-Format Reports: Generates both HTML and PDF reports with comprehensive analysis
- Visual Diff Integration: Includes side-by-side comparisons with syntax highlighting
- Structured Output: Reports contain risk analysis, rewrite suggestions, and implementation recommendations
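For the original-versus-rewrite comparison, Python's built-in `difflib` is enough. This sketch produces a word-level inline redline with `<del>`/`<ins>` tags and escapes the clause text so it is safe to embed in an HTML report; the function name `redline_html` is hypothetical, and the project's `DiffGenerator` may render differently.

```python
import difflib
import html


def redline_html(original: str, rewritten: str) -> str:
    """Word-level redline: removed words in <del>, added words in <ins>."""
    a, b = original.split(), rewritten.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append(html.escape(" ".join(a[i1:i2])))
        if op in ("delete", "replace"):
            out.append(f"<del>{html.escape(' '.join(a[i1:i2]))}</del>")
        if op in ("insert", "replace"):
            out.append(f"<ins>{html.escape(' '.join(b[j1:j2]))}</ins>")
    return " ".join(out)


if __name__ == "__main__":
    print(redline_html(
        "Provider may terminate on 7 days notice.",
        "Either party may terminate on 30 days notice.",
    ))
```

For a side-by-side table instead of an inline redline, `difflib.HtmlDiff().make_table(original_lines, rewritten_lines)` returns an HTML table comparing two lists of lines.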
- Google Gemini API: Primary AI service for clause rewriting and analysis via the google.genai client
- Environment-Based Authentication: Requires the GEMINI_API_KEY environment variable for API access
- PyMuPDF (fitz): PDF parsing and text extraction library for handling legal documents
- Python-docx: Microsoft Word document processing capabilities
- PyPDF: Additional PDF processing utilities
- Streamlit: Primary web framework for building the interactive user interface
- FastAPI/Uvicorn: Backend API framework for potential service expansion
- difflib: Built-in Python library for generating text differences and comparisons
- base64: Encoding utilities for file handling and data transfer
- tempfile: Temporary file management for document processing
- re (regex): Pattern matching for clause identification and risk detection
- JSON: Data serialization for API communication and session state management
- Pandas/NumPy: Data manipulation and analysis capabilities for document statistics
- HTML: Custom HTML generation for reports and diff visualization
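The `base64` and `tempfile` entries above typically meet when an upload arrives as encoded text but a parser wants a file path. A minimal sketch, assuming the upload is base64 text and the consumer opens files by path; `uploaded_file_on_disk` is a hypothetical helper, not part of the project.

```python
import base64
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def uploaded_file_on_disk(b64_payload: str, suffix: str = ".pdf") -> Iterator[str]:
    """Decode a base64 upload into a temporary file and yield its path.

    The file is removed when the with-block exits, even if parsing fails.
    """
    data = base64.b64decode(b64_payload, validate=True)  # reject non-base64 input
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        yield path  # file is closed here, so other code can reopen it by name
    finally:
        os.remove(path)
```

`tempfile.mkstemp` is used instead of `NamedTemporaryFile` because on Windows a still-open named temporary file generally cannot be reopened by name (Python 3.12 added a `delete_on_close` option that addresses this).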
- Google Cloud Services: Architecture supports integration with Document AI, Cloud Storage, and AI Platform for enhanced processing capabilities
- Cloud Logging: Structured logging support for production deployment monitoring
- Python 3.11+
- Node.js 16+
- Google Gemini API Key (get it from Google AI Studio)
Create a .env file in the project root:
GEMINI_API_KEY=your_gemini_api_key_here
JWT_SECRET_KEY=your_secret_key_for_jwt
GOOGLE_CLOUD_PROJECT_ID=your_project_id # optional, for OCR/DLP
# Create virtual environment
python -m venv .venv
# Activate it (Windows PowerShell; on Linux/macOS use: source .venv/bin/activate)
.\.venv\Scripts\Activate.ps1
# Install dependencies
pip install -r backend/requirements.txt
# Run the backend server
uvicorn backend.main:app --reload --port 8000
# In a new terminal, navigate to frontend
cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
The frontend will be available at http://localhost:3000 and the backend API at http://localhost:8000.
- Register a new account at http://localhost:3000/register
- Log in with your credentials
- Upload a contract document (PDF or image)
- Review the risk analysis and use the redline sandbox
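The backend is described as using JWT auth signed with `JWT_SECRET_KEY`. To show what such a token contains, here is a stdlib-only HS256 sketch; `issue_token` and `verify_token` are hypothetical names, and in practice a maintained library such as PyJWT would normally be used instead of hand-rolled code. `secrets.token_hex(32)` also happens to be a simple way to generate a 64-character random secret.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))  # restore padding


def issue_token(subject: str, secret: str, ttl_seconds: int = 3600) -> str:
    """Create a compact HS256 JWT: base64url(header).base64url(payload).signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    sig = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(token: str, secret: str) -> dict:
    """Check signature, algorithm, and expiry; return the payload or raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):  # constant-time compare
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload


if __name__ == "__main__":
    key = secrets.token_hex(32)  # 64 hex characters
    token = issue_token("alice@example.com", key)
    print(verify_token(token, key)["sub"])
```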
- Set up environment variables: create a .env file in the project root containing
GEMINI_API_KEY=your_api_key_here
or export the environment variable:
export GEMINI_API_KEY=your_api_key_here
- Run the application:
streamlit run app.py
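Both setup routes for `GEMINI_API_KEY` (a `.env` file or an exported variable) can be supported with a few lines of startup code. This is a minimal sketch, assuming simple `KEY=VALUE` lines; `load_dotenv_file` and `require_gemini_key` are hypothetical helpers. Variables already exported in the shell win over the file, so an exported key overrides `.env`.

```python
import os
from pathlib import Path
from typing import Dict


def load_dotenv_file(path: str = ".env") -> Dict[str, str]:
    """Minimal .env reader: KEY=VALUE lines, '#' comments, optional quotes.

    Returns what the file defined; already-set environment variables are kept.
    """
    loaded: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # strip matching quotes
        elif " #" in value:
            value = value.split(" #", 1)[0].rstrip()  # strip inline comment
        loaded[key] = value
        os.environ.setdefault(key, value)  # do not override exported variables
    return loaded


def require_gemini_key() -> str:
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or export it")
    return key
```

If adding a dependency is acceptable, the `python-dotenv` package's `load_dotenv()` does the same job more robustly, and by default it also leaves already-set variables alone.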
- Upload your PDF contract documents
- The system automatically processes and analyzes the document
- View identified risky clauses with risk scores
- Understand why certain clauses might be problematic
- Select and rewrite specific risky clauses
- Use AI-powered suggestions for better alternatives
- Generate comprehensive HTML or PDF reports
- Include analysis, rewrites, and side-by-side comparisons
- General Chat: Ask legal questions without a document
- Document-Context Chat: Get answers specific to your uploaded contract
- Features:
  - Conversation history and context awareness
  - Document-aware responses when contracts are uploaded
  - Export chat history
  - Clear chat and statistics
- Without Document: Ask general legal questions like "What is force majeure?"
- With Document: Reference your uploaded contract like "Explain the termination clause"
- Example Queries:
  - "What are the main risks in this contract?"
  - "What's the difference between arbitration and mediation?"
  - "How should I negotiate better contract terms?"
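The general and document-aware chat modes differ mainly in what goes into the prompt. A minimal sketch of that idea, assuming the model is called through some `generate(prompt) -> str` function (passed in so the sketch runs without an API key); `ChatSession`, its limits, and the prompt wording are hypothetical, not the project's `chatbot.py`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List, Optional


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class ChatSession:
    """Chat state: recent history plus optional contract text."""
    document_text: Optional[str] = None
    history: List[Message] = field(default_factory=list)
    max_turns: int = 10        # how many recent messages to resend as context
    max_doc_chars: int = 8000  # crude cap on how much contract text is included

    def build_prompt(self, question: str) -> str:
        parts = ["You are an assistant that explains contract terms."]
        if self.document_text:  # document-context mode
            parts.append("Contract excerpt:\n" + self.document_text[: self.max_doc_chars])
        for msg in self.history[-self.max_turns:]:
            parts.append(f"{msg.role}: {msg.text}")
        parts.append(f"user: {question}")
        return "\n\n".join(parts)

    def ask(self, question: str, generate: Callable[[str], str]) -> str:
        answer = generate(self.build_prompt(question))
        self.history += [Message("user", question), Message("assistant", answer)]
        return answer

    def export_json(self) -> str:
        return json.dumps([asdict(m) for m in self.history], indent=2)

    def clear(self) -> None:
        self.history.clear()
```

Truncating both history and document text keeps prompts within a predictable size; a real implementation might instead select only the clauses relevant to the question.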
Run the test suite:
# Test the chatbot functionality
python test_chatbot.py
# Test the risk detection system
python test_risk_detection.py
IMPORTANT: This tool is for informational purposes only and does not constitute legal advice. Always consult with a qualified attorney for legal matters. The AI analyses and suggestions provided are educational and should not be relied upon as legal counsel.
Gen_AI_Google_Hackathon/
├── app.py # Main Streamlit application
├── utils/
│   ├── chatbot.py # AI Chat Assistant
│   ├── clause_rewriter.py # AI clause rewriting
│   ├── diff_generator.py # Text diff generation
│   ├── export_manager.py # Report export functionality
│   ├── pdf_processor.py # PDF document processing
│   └── risk_detector.py # Risk analysis engine
├── test_chatbot.py # Chatbot functionality tests
├── test_risk_detection.py # Risk detection tests
├── sample_contract.txt # Sample contract for testing
├── pyproject.toml # Project dependencies
└── README.md # This file