MantaYuana/CLARA_AI

⚖️ CLARA

Contract & Legal AI Reasoning Assistant

Software Engineering

🏆 2nd Place: Hackvidia at Arkavidia 10.0

Node.js TypeScript React Neo4j Redis Gemini License: MIT

CLARA is an AI-powered legal assistant purpose-built for Indonesian MSMEs (Micro, Small & Medium Enterprises). It helps business owners understand contracts and employment law, draft legal documents, and detect risky clauses, all without needing a lawyer on retainer.

Features · Architecture · Quick Start · API Docs · Contributing


🎯 About the Project

Indonesian MSMEs frequently sign contracts they don't fully understand, often without access to legal counsel. CLARA bridges this gap by combining:

  • Retrieval-Augmented Generation (RAG) over a curated Indonesian legal knowledge base
  • Self-Consistency Reasoning with confidence scoring based on Jensen-Shannon divergence
  • Knowledge Graph (Neo4j) for symbolic legal reasoning via Cypher traversal
  • AI-powered document drafting for MoU, LoI, and PKS document types
  • OCR + guardrail pipeline that automatically flags illegal contract clauses

CLARA won 2nd Place in the Hackvidia competition at Arkavidia 10.0, a national-level IT competition hosted by HMIF ITB.


✨ Features

📄 Contract Review & Risk Analysis

Upload a contract (PDF or image) and get a structured legal risk report:

  • Clause-by-clause explanation in plain language
  • Severity-tagged violations: CRITICAL, WARNING, INFO
  • Automatic detection of illegal patterns (forced seizure, excessive penalties, illegal wage cuts, etc.)
  • Statutory citations (Indonesian Law, Government Regulations, Ministry Decrees)
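
A minimal sketch of how a rule-based guardrail pass could tag clauses with these severities. The `Rule` and `Finding` types, the rule IDs, and the Indonesian keyword patterns below are hypothetical illustrations, not CLARA's actual rule set.

```typescript
// Severity levels mirroring the CRITICAL / WARNING / INFO tags above.
type Severity = "CRITICAL" | "WARNING" | "INFO";

interface Rule {
  id: string;
  severity: Severity;
  pattern: RegExp; // tested against each clause (case-insensitive, no /g flag)
  message: string;
}

interface Finding {
  clauseIndex: number;
  ruleId: string;
  severity: Severity;
  message: string;
}

// Hypothetical rules for illustration only; not CLARA's real guardrail rules.
const RULES: Rule[] = [
  {
    id: "document-seizure",
    severity: "CRITICAL",
    pattern: /menahan\s+(ijazah|ktp|dokumen)/i, // "withholding diplomas / ID cards / documents"
    message: "Clause lets one party withhold the other's personal documents.",
  },
  {
    id: "wage-deduction",
    severity: "WARNING",
    pattern: /potong(an)?\s+(gaji|upah)/i, // "wage deduction"
    message: "Clause permits wage deductions; check against statutory limits.",
  },
  {
    id: "penalty",
    severity: "INFO",
    pattern: /denda/i, // "fine / penalty"
    message: "Clause contains a penalty; review the amount.",
  },
];

const RANK: Record<Severity, number> = { CRITICAL: 0, WARNING: 1, INFO: 2 };

// Tests every clause against every rule and returns findings, most severe first.
function scanClauses(clauses: string[], rules: Rule[] = RULES): Finding[] {
  const findings: Finding[] = [];
  clauses.forEach((clause, clauseIndex) => {
    for (const rule of rules) {
      if (rule.pattern.test(clause)) {
        findings.push({ clauseIndex, ruleId: rule.id, severity: rule.severity, message: rule.message });
      }
    }
  });
  return findings.sort((a, b) => RANK[a.severity] - RANK[b.severity]);
}
```

Keyword matching like this is only a first pass; numeric checks (for example, a penalty amount above a statutory cap) need the amount parsed out of the clause rather than a regex match.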

🔍 Legal Q&A (RAG Pipeline)

Ask questions about Indonesian employment law and contract regulation:

  • Hybrid Retrieval: Dense vector search + BM25 full-text search + symbolic Neo4j graph traversal, fused via Reciprocal Rank Fusion (RRF)
  • Self-Consistency Loop: Generates multiple reasoning paths, measures how much they diverge (Jensen-Shannon divergence), and maps the result to a green / yellow / red confidence level
  • Answers always cite specific articles and laws (Pasal N UU No. X Tahun YYYY, i.e. Article N of Law No. X of Year YYYY)
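
The README does not spell out how each reasoning path becomes something measurable, so here is a sketch under one assumption: every path yields a probability distribution over the same set of candidate answers. It computes the equal-weight generalized Jensen-Shannon divergence, JSD = H(mixture) - mean of H(P_i), normalizes it to [0, 1], and buckets it. The function names and the 0.2 / 0.5 thresholds are hypothetical, not CLARA's real cut-offs.

```typescript
type Confidence = "green" | "yellow" | "red";

// Shannon entropy in bits; terms with p = 0 contribute 0.
function entropy(p: number[]): number {
  return -p.reduce((acc, x) => (x > 0 ? acc + x * Math.log2(x) : acc), 0);
}

// Equal-weight generalized Jensen-Shannon divergence of n distributions:
// JSD = H(mixture) - average of H(P_i), where mixture is the element-wise mean.
function jensenShannon(dists: number[][]): number {
  const n = dists.length;
  const k = dists[0].length;
  const mixture = Array.from({ length: k }, (_, j) => dists.reduce((acc, p) => acc + p[j], 0) / n);
  const meanEntropy = dists.reduce((acc, p) => acc + entropy(p), 0) / n;
  return Math.max(0, entropy(mixture) - meanEntropy); // clamp tiny negative rounding error
}

// Divides by the upper bound log2(min(n, k)) so the score lies in [0, 1],
// then buckets it. The 0.2 / 0.5 thresholds are hypothetical.
function confidenceLevel(dists: number[][]): { divergence: number; level: Confidence } {
  const bound = Math.log2(Math.min(dists.length, dists[0].length));
  const divergence = bound > 0 ? Math.min(1, jensenShannon(dists) / bound) : 0;
  const level: Confidence = divergence < 0.2 ? "green" : divergence < 0.5 ? "yellow" : "red";
  return { divergence, level };
}
```

If each path returns only a single answer (a one-hot distribution), the mean-entropy term is zero and the score reduces to the entropy of the vote histogram.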

✍️ AI Document Drafter

Conversational smart drafter for legal documents:

  • Supports MoU (Memorandum of Understanding), LoI (Letter of Intent), and PKS (Cooperation Agreement)
  • Multi-turn dialogue: CLARA asks clarifying questions until all required fields are gathered
  • Detects legally binding terms and warns before generating
  • Outputs a structured Markdown document + downloadable PDF
  • Post-generation guardrail scan on the produced draft
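
One way to drive the clarifying-question loop is a per-document checklist of required fields: ask for the first missing field, and only generate once none remain. The field names and question wording below are hypothetical; the drafter's real schema lives in `services/drafter/` and may differ.

```typescript
type DocType = "MoU" | "LoI" | "PKS";

// Hypothetical required fields per document type.
const REQUIRED_FIELDS: Record<DocType, string[]> = {
  MoU: ["partyA", "partyB", "purpose", "duration"],
  LoI: ["sender", "recipient", "intent"],
  PKS: ["partyA", "partyB", "scope", "duration", "paymentTerms", "disputeResolution"],
};

// Hypothetical clarifying question for each field.
const QUESTIONS: Record<string, string> = {
  partyA: "Who is the first party (full name or company name)?",
  partyB: "Who is the second party?",
  purpose: "What is the purpose of this MoU?",
  duration: "How long should the agreement remain in effect?",
  sender: "Who is issuing this letter of intent?",
  recipient: "Who is the letter addressed to?",
  intent: "What do you intend to do together?",
  scope: "What is the scope of the cooperation?",
  paymentTerms: "What are the payment terms?",
  disputeResolution: "How should disputes be resolved?",
};

// Returns the next question to ask, or null when every required field has a non-blank answer.
function nextQuestion(docType: DocType, collected: Record<string, string>): string | null {
  const missing = REQUIRED_FIELDS[docType].find((field) => !collected[field]?.trim());
  if (missing === undefined) return null;
  return QUESTIONS[missing] ?? `Please provide: ${missing}`;
}
```

On each turn the drafter would store the user's answer under the field it just asked about and call `nextQuestion` again; a `null` result means it is safe to move on to the binding-terms check and generation.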

🔐 Authentication & Session Management

  • Google OAuth 2.0 login
  • JWT-based stateless sessions
  • Per-user chat history persisted in Neo4j
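
With stateless JWT sessions the server keeps no session store; each request is authenticated by checking the token's signature and expiry. A minimal sketch of HS256 verification using only Node's built-in `node:crypto`, assuming HS256 (which a single shared JWT_SECRET suggests). The project's `middleware/` auth guard may well use a JWT library instead, and `sub`/`exp` are standard RFC 7519 claims rather than confirmed CLARA fields.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface Claims {
  sub?: string;
  exp?: number; // expiry in seconds since the Unix epoch (RFC 7519)
  [key: string]: unknown;
}

// HMAC-SHA256 over "header.payload", base64url-encoded, as HS256 specifies.
function sign(data: string, secret: string): string {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

// Returns the claims if the token is a valid, unexpired HS256 JWT; otherwise null.
function verifyHs256(token: string, secret: string, nowSec = Math.floor(Date.now() / 1000)): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts;
  try {
    const header = JSON.parse(Buffer.from(headerB64, "base64url").toString("utf8"));
    if (header.alg !== "HS256") return null; // never let the token pick the algorithm
    const expected = Buffer.from(sign(`${headerB64}.${payloadB64}`, secret));
    const actual = Buffer.from(signatureB64);
    // timingSafeEqual requires equal lengths, so check that first.
    if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
    const claims = JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8")) as Claims;
    if (typeof claims.exp === "number" && claims.exp <= nowSec) return null; // expired
    return claims;
  } catch {
    return null; // malformed base64url or JSON
  }
}
```

Pinning the expected `alg` matters: accepting `none`, or whatever algorithm the token header claims, is a classic JWT vulnerability.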

📊 User Dashboard

  • Aggregated view of all uploaded contract reviews and drafting projects
  • File management with source tracing per conversation

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        React (Vite) Frontend                        │
│  Pages: Landing · Login · Chat · Files · Home                       │
└──────────────────────────────┬──────────────────────────────────────┘
                               │ REST / JWT
┌──────────────────────────────▼──────────────────────────────────────┐
│                      Express.js Backend (Node)                      │
│                                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐     │
│  │  /auth   │  │/contract │  │  /query  │  │    /drafter      │     │
│  └──────────┘  └────┬─────┘  └────┬─────┘  └────────┬─────────┘     │
│                     │             │                 │               │
│  ┌──────────────────▼─────────────▼─────────────────▼─────────────┐ │
│  │                    Service Layer                               │ │
│  │  OCR Service  →  Guardrail  →  Hybrid Retrieval  →  Reasoning  │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  ┌─────────────────┐   ┌─────────────┐   ┌──────────────────────┐   │
│  │  BullMQ Worker  │   │  Neo4j DB   │   │  Google Gemini API   │   │
│  │  (async OCR)    │   │  (KG + RAG) │   │  (LLM + Embeddings)  │   │
│  └────────┬────────┘   └─────────────┘   └──────────────────────┘   │
│           │ Redis Queue                                             │
└───────────▼─────────────────────────────────────────────────────────┘
            │
    ┌───────▼────────┐
    │  Google Cloud  │
    │  Vision (OCR)  │
    └────────────────┘

RAG & Reasoning Pipeline

User Query
    │
    ▼
┌────────────────────────────────────────┐
│           Hybrid Retrieval             │
│  ┌──────────┐ ┌────────┐ ┌──────────┐  │
│  │  Dense   │ │  BM25  │ │Symbolic  │  │
│  │  (768d   │ │ (full  │ │(Neo4j    │  │
│  │ vector)  │ │  text) │ │ Cypher)  │  │
│  └──────────┘ └────────┘ └──────────┘  │
│         Reciprocal Rank Fusion         │
└──────────────────┬─────────────────────┘
                   │
                   ▼
          ┌─────────────────┐
          │   Reasoning     │
          │  Service (N=3   │
          │   paths)        │
          │                 │
          │  JS Entropy  →  │
          │  Confidence     │
          │  green/yellow/  │
          │  red            │
          └────────┬────────┘
                   │
                   ▼
           Final Answer
          + Citations
          + Confidence
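
The fusion step can be sketched as weighted Reciprocal Rank Fusion: each leg contributes weight / (k + rank) for every document it returns, and documents are re-sorted by the summed score. The weights here mirror the HYBRID_*_WEIGHT examples in the environment variable table (0.5 / 0.3 / 0.2); k = 60 is the constant commonly used for RRF and is an assumption, as are the type and function names.

```typescript
interface Leg {
  name: string;      // e.g. "dense", "bm25", "symbolic"
  weight: number;    // e.g. HYBRID_DENSE_WEIGHT = 0.5
  results: string[]; // document IDs, best hit first (already cut to TOP_K_*)
}

// Weighted RRF: score(d) = sum over legs of weight / (k + rank(d)), with 1-based ranks.
// Documents a leg did not return get no contribution from that leg.
function weightedRrf(legs: Leg[], k = 60, topN = 5): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const leg of legs) {
    leg.results.forEach((id, index) => {
      const contribution = leg.weight / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```

Because only ranks are used, dense cosine scores, BM25 scores, and graph hits never need to be put on a common scale. A passage ranked well by several legs can beat one ranked first by a single leg: with legs returning `[a, b, c]`, `[b, d]`, and `[b, a]`, the fused order is `b, a, c, d`.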

🛠️ Tech Stack

| Layer | Technology |
|---|---|
| Frontend | React 19, Vite 7, Tailwind CSS 4, Framer Motion, React Router DOM 7 |
| Backend | Node.js 22, Express.js 5, TypeScript 5.7 |
| AI / LLM | Google Gemini 2.5 Flash, Google Gemini Embeddings (gemini-embedding-001, 768d) |
| OCR | Google Cloud Vision API |
| Database | Neo4j 5.18 Community (APOC + Graph Data Science plugins) |
| Queue | BullMQ + Redis 7 |
| Auth | Passport.js, Google OAuth 2.0, JSON Web Tokens |
| PDF Generation | pdf-lib |
| API Docs | Swagger UI (OpenAPI 3.0) |
| Containerization | Docker + Docker Compose |
| Deployment | Vercel (Frontend), Docker (Backend) |

📂 Project Structure

CLARA_AI/
├── docker-compose.yml          # Orchestrates Neo4j, Redis, and Backend
├── backend/
│   ├── src/
│   │   ├── index.ts            # Express app entry point
│   │   ├── config/             # env, Neo4j, Passport, Redis, Swagger
│   │   ├── middleware/         # JWT auth guard
│   │   ├── routes/             # auth, chat, contract, document, drafter, query
│   │   ├── services/
│   │   │   ├── chat/           # Chat history persistence (Neo4j)
│   │   │   ├── dashboard/      # User project aggregation
│   │   │   ├── drafter/        # Multi-turn document drafting + PDF export
│   │   │   ├── embedding/      # Gemini embedding service
│   │   │   ├── guardrail/      # Statutory limit & clause violation checks
│   │   │   ├── ocr/            # Google Cloud Vision OCR
│   │   │   ├── reasoning/      # Self-consistency loop, JS-entropy, citations
│   │   │   ├── retrieval/      # Dense, BM25, Symbolic, Hybrid (RRF) retrieval
│   │   │   └── user/           # User creation & lookup
│   │   ├── workers/
│   │   │   └── analysisWorker.ts  # BullMQ worker for async OCR jobs
│   │   ├── queues/
│   │   │   └── analysisQueue.ts   # BullMQ queue definition
│   │   ├── scripts/            # DB init, PDF seeding, knowledge seeding
│   │   └── utils/              # Response helpers
│   ├── base_knowledge/         # Curated Indonesian legal PDFs for RAG seeding
│   ├── Dockerfile
│   ├── package.json
│   └── tsconfig.json
└── frontend/
    ├── src/
    │   ├── pages/              # Landing, Login, Home, ChatDetail, Files
    │   ├── components/         # ChatBubble, ChatPanel, SourcesPanel, StudioPanel…
    │   ├── hooks/              # useAuth, useChat, useProjects, useSources…
    │   ├── Services/           # Axios service wrappers per domain
    │   └── lib/                # Configured Axios instance
    ├── public/
    ├── vite.config.js
    └── package.json

🚀 Quick Start

Prerequisites


1. Clone the Repository

git clone https://github.com/MantaYuana/CLARA_AI.git
cd CLARA_AI

2. Configure Environment Variables

cp backend/.env.example backend/.env

Fill in all required values in backend/.env (see Environment Variables).

Place your Google Cloud Vision service account JSON at:

backend/clara-google-cloud-vision.json

3. Start Infrastructure (Neo4j + Redis)

docker compose up neo4j redis -d

Wait for both services to be healthy:

docker compose ps   # both should show "(healthy)"

4. Install Backend Dependencies & Initialize the Database

cd backend
npm install
npm run init-schema   # creates Neo4j constraints and indexes
npm run seed:pdf      # seeds base Indonesian legal knowledge into Neo4j

5. Start the Backend

npm run dev           # runs on http://localhost:3001

6. Start the Frontend

cd ../frontend
npm install
npm run dev           # runs on http://localhost:5173

Open http://localhost:5173 in your browser.


Docker (Full Stack)

To run Neo4j, Redis, and the backend together in Docker (the frontend still runs separately with npm run dev):

docker compose up --build

| Service | URL |
|---|---|
| Frontend (dev) | http://localhost:5173 |
| Backend API | http://localhost:3001 |
| API Docs (Swagger) | http://localhost:3001/api/docs |
| Neo4j Browser | http://localhost:7474 |

🔧 Environment Variables

Create backend/.env based on the table below:

| Variable | Description | Example |
|---|---|---|
| PORT | Backend port | 3001 |
| NODE_ENV | Environment | development |
| NEO4J_URI | Neo4j Bolt URI | bolt://localhost:7687 |
| NEO4J_USER | Neo4j username | neo4j |
| NEO4J_PASSWORD | Neo4j password | clara_password |
| GOOGLE_AI_API_KEY | Gemini API key | AIza... |
| GEMINI_MODEL | Gemini model name | gemini-2.5-flash |
| EMBEDDING_MODEL | Embedding model | gemini-embedding-001 |
| EMBEDDING_DIMENSION | Embedding vector size | 768 |
| GOOGLE_APPLICATION_CREDENTIALS | Path to the Google Cloud Vision service account JSON | clara-google-cloud-vision.json |
| JWT_SECRET | Secret for signing JWTs | <long random string> |
| OAUTH_GOOGLE_CLIENT_ID | Google OAuth client ID | 123...apps.googleusercontent.com |
| OAUTH_GOOGLE_CLIENT_SECRET | Google OAuth client secret | GOCSPX-... |
| REASONING_PATHS | Number of self-consistency paths | 3 |
| TEMPERATURE_LOW | Temperature for conservative reasoning | 0.1 |
| TEMPERATURE_HIGH | Temperature for exploratory reasoning | 0.7 |
| MAX_CONTEXT_TOKENS | Max tokens in Gemini context | 8192 |
| TOP_K_DENSE | Top-K for dense retrieval | 5 |
| TOP_K_BM25 | Top-K for BM25 retrieval | 5 |
| TOP_K_SYMBOLIC | Top-K for symbolic/graph retrieval | 5 |
| HYBRID_DENSE_WEIGHT | RRF weight for the dense leg | 0.5 |
| HYBRID_BM25_WEIGHT | RRF weight for the BM25 leg | 0.3 |
| HYBRID_SYMBOLIC_WEIGHT | RRF weight for the symbolic leg | 0.2 |
| MAX_FILE_SIZE_MB | Maximum upload file size (MB) | 10 |
| UPLOAD_DIR | Local upload directory | ./uploads |
| VITE_API_URL | Frontend → Backend base URL (read by the Vite frontend, which normally loads it from the frontend's own environment rather than backend/.env) | http://localhost:3001 |
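
Parsing these variables once at startup, with numeric conversion and fail-fast checks for required secrets, keeps misconfiguration from surfacing deep inside a request. A minimal sketch covering a subset of the table; the `ClaraConfig` shape and the choice of which variables are required are hypothetical, and the fallbacks simply reuse the example values above.

```typescript
// Hypothetical typed view of part of backend/.env; the real src/config/ module may differ.
interface ClaraConfig {
  port: number;
  neo4jUri: string;
  jwtSecret: string;
  reasoningPaths: number;
  hybridWeights: { dense: number; bm25: number; symbolic: number };
  maxFileSizeBytes: number;
}

type Env = Record<string, string | undefined>;

// Throws if a required variable is missing or empty.
function required(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required environment variable ${key}`);
  return value;
}

// Parses a numeric variable; falls back to the table's example value when unset or blank.
function num(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) throw new Error(`${key} must be a number, got "${raw}"`);
  return value;
}

function loadConfig(env: Env): ClaraConfig {
  return {
    port: num(env, "PORT", 3001),
    neo4jUri: required(env, "NEO4J_URI"),
    jwtSecret: required(env, "JWT_SECRET"),
    reasoningPaths: num(env, "REASONING_PATHS", 3),
    hybridWeights: {
      dense: num(env, "HYBRID_DENSE_WEIGHT", 0.5),
      bm25: num(env, "HYBRID_BM25_WEIGHT", 0.3),
      symbolic: num(env, "HYBRID_SYMBOLIC_WEIGHT", 0.2),
    },
    maxFileSizeBytes: num(env, "MAX_FILE_SIZE_MB", 10) * 1024 * 1024,
  };
}
```

In the backend this would be called once, e.g. `const config = loadConfig(process.env);`, and the resulting object passed to the services that need it.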

📚 API Documentation

After starting the backend, interactive Swagger docs are available at:

http://localhost:3001/api/docs

Key Endpoints

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /health | None | Service health check |
| GET | /api/v1/auth/google | None | Initiate Google OAuth flow |
| GET | /api/v1/auth/google/callback | None | OAuth callback, returns JWT |
| POST | /api/v1/document/analyze | Optional | Upload a contract PDF/image for async OCR analysis (returns 202) |
| GET | /api/v1/document/:id/status | Optional | Poll OCR job status |
| POST | /api/v1/contract/review | ✅ JWT | Review an already-analyzed contract; runs guardrail + reasoning |
| POST | /api/v1/query | ✅ JWT | Ask a legal question via hybrid RAG + self-consistency |
| POST | /api/v1/drafter/chat | ✅ JWT | Multi-turn document drafting conversation |
| GET | /api/v1/chat/sessions | ✅ JWT | List the user's chat sessions |
| GET | /api/v1/chat/sessions/:id | ✅ JWT | Get message history for a session |
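
Because /api/v1/document/analyze returns 202 and the OCR runs in a BullMQ worker, a client has to poll the status endpoint until the job finishes. A minimal client-side sketch; the `status` field and its values are hypothetical (check the Swagger docs at /api/docs for the real response schema), and the fetcher is injected so the loop can be tested without a server.

```typescript
// Hypothetical shape of the status response.
interface JobStatus {
  status: "queued" | "processing" | "completed" | "failed";
  [key: string]: unknown;
}

type StatusFetcher = (documentId: string) => Promise<JobStatus>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the job reaches a terminal state, or throws after maxAttempts polls.
async function waitForAnalysis(
  documentId: string,
  fetchStatus: StatusFetcher,
  { intervalMs = 2000, maxAttempts = 30 } = {},
): Promise<JobStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchStatus(documentId);
    if (result.status === "completed" || result.status === "failed") return result;
    await sleep(intervalMs);
  }
  throw new Error(`Analysis ${documentId} did not finish after ${maxAttempts} attempts`);
}

// Fetcher for the documented status endpoint, using the global fetch (Node 18+ or a browser).
function httpStatusFetcher(baseUrl: string, token?: string): StatusFetcher {
  return async (documentId) => {
    const res = await fetch(`${baseUrl}/api/v1/document/${encodeURIComponent(documentId)}/status`, {
      headers: token ? { Authorization: `Bearer ${token}` } : {},
    });
    if (!res.ok) throw new Error(`Status request failed with HTTP ${res.status}`);
    return (await res.json()) as JobStatus;
  };
}
```

Against a local backend this would look like `await waitForAnalysis(id, httpStatusFetcher("http://localhost:3001", jwt))`, where `id` comes from the 202 response of the upload call.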

🧪 Running Tests

cd backend
npm test

The test suite uses Jest + ts-jest. Test files follow the *.test.ts convention.

Notable test files:

  • src/services/guardrail/guardrailService.test.ts
  • src/services/retrieval/hybridRetrieval.test.ts

🚒 Deployment

Backend

The backend is fully containerized. For production, deploy via Docker Compose on any VPS or cloud VM:

docker compose up --build -d

Set NODE_ENV=production, and if you use managed Neo4j or Redis instances instead of the containers in docker-compose.yml, point NEO4J_URI and REDIS_URL at them.

Frontend

The frontend is configured for Vercel deployment (vercel.json is included), with all routes rewritten to index.html for SPA routing.

cd frontend
npm run build       # outputs to dist/
vercel --prod       # or connect your GitHub repo in the Vercel dashboard

Set VITE_API_URL in Vercel's Environment Variables to point to your backend URL.


🀝 Contributing

We welcome contributions! Please follow these steps:

1. Fork & Branch

gh repo fork MantaYuana/CLARA_AI --clone   # or fork in the GitHub UI, then git clone your fork
cd CLARA_AI
git checkout -b feat/your-feature-name

2. Development Workflow

# Backend
cd backend && npm run dev

# Frontend (separate terminal)
cd frontend && npm run dev

3. Code Style

  • Backend: TypeScript strict mode. Follow existing service patterns (service class → route handler separation).
  • Frontend: React functional components with hooks. Keep service calls in src/Services/.
  • All new backend endpoints must include a Swagger JSDoc comment block.

4. Commit Convention

Follow Conventional Commits:

feat: add penalty clause detection to guardrail
fix: resolve RRF weight normalization bug
docs: update retrieval architecture section
refactor: extract confidence label mapping to util
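
To catch malformed messages before they reach a pull request, a commit-msg hook could run a check like this minimal sketch. The allowed type list extends the four types shown above with other commonly used Conventional Commits types; that list is an assumption, not a rule documented for this repository.

```typescript
// Types from the examples above plus other common ones (assumed, not repo policy).
const COMMIT_TYPES = ["feat", "fix", "docs", "refactor", "test", "chore", "perf", "build", "ci", "style", "revert"];

// Header format: <type>[(scope)][!]: <description>
const HEADER = new RegExp(`^(${COMMIT_TYPES.join("|")})(\\([\\w./-]+\\))?(!)?: \\S.*$`);

// Only the first line (the header) is checked; the body and footers are free-form.
function isConventionalHeader(message: string): boolean {
  const header = message.split("\n", 1)[0];
  return HEADER.test(header);
}
```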

5. Open a Pull Request

  • Target the main branch
  • Include a short description of what and why
  • Reference any related issues

6. Reporting Issues

Use GitHub Issues. Include:

  • Steps to reproduce
  • Expected vs actual behavior
  • Relevant logs or screenshots

👥 Team

CLARA was built with ❤️ by a team of 4 engineers competing at Arkavidia 10.0 Hackvidia:

| Name | Role |
|---|---|
| Manta Yuana | Backend & Project Manager |
| Kadek Pindra | Frontend & Integration |
| Rama Dita | Fullstack & Data Engineering |
| Dewa Surya | Frontend & UI/UX Designer |

📄 License

This project is licensed under the MIT License. See LICENSE for details.


Made for Indonesian MSMEs · Built at Arkavidia 10.0 Hackvidia · 🏆 2nd Place

https://clara-ai-nine.vercel.app