A production-grade, multi-tenant platform that centralizes business data and enables AI agents to analyze, query, and act on it.
Built from scratch as a solo full-stack engineer: 20+ shared libraries, 6 microservices in Python and TypeScript.
Small and medium businesses generate data across multiple platforms (ERPs, e-commerce, spreadsheets) but lack the tools to centralize, analyze, and act on it. Hiring data teams is expensive. Generic BI tools require technical expertise.
The solution is a data-centralization and analysis platform that creates a context layer so AI agents can perform tasks effectively: answering natural-language questions about sales data, generating reports, and managing knowledge bases, all scoped per tenant with strict data isolation.
Ingest data from multiple sources (BigQuery, Shopify, VTEX, CSV/XLSX uploads), transform it into a star-schema analytics layer, and visualize it through interactive dashboards with scorecards, bar charts, and detail views.
Users ask questions in plain language; the platform converts them to safe, validated SQL queries. A defense-in-depth pipeline ensures security:
- Parse – AST validation via `sqlglot` (only `SELECT` allowed)
- Validate – table/column allowlists, mandatory filters, PII masking
- Rewrite – expand `SELECT *`, inject `LIMIT`, enforce `client_id` filter
- Execute – via PostgREST with RLS enforcement
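The validate and rewrite layers can be pictured with a minimal, stdlib-only sketch. This is illustrative, not the project's `_sql_factory` code: the real pipeline walks a `sqlglot` AST rather than using regexes, and the table allowlist below is hypothetical.

```python
# Simplified sketch of the validate/rewrite layers (illustrative only;
# the real implementation operates on a sqlglot AST, not regexes).
import re

ALLOWED_TABLES = {"fact_sales", "dim_product"}  # hypothetical allowlist

def validate_and_rewrite(sql: str, client_id: str, limit: int = 100) -> str:
    stmt = sql.strip().rstrip(";")
    # Layer 1 - parse: only SELECT statements are accepted
    if not re.match(r"(?i)^select\b", stmt):
        raise ValueError("only SELECT is allowed")
    # Layer 2 - validate: every referenced table must be on the allowlist
    for table in re.findall(r"(?i)\bfrom\s+(\w+)", stmt):
        if table.lower() not in ALLOWED_TABLES:
            raise ValueError(f"table not allowed: {table}")
    # Layer 3 - rewrite: enforce the tenant filter and a row limit
    clause = f"client_id = '{client_id}'"
    if re.search(r"(?i)\bwhere\b", stmt):
        stmt = re.sub(r"(?i)\bwhere\b", f"WHERE {clause} AND", stmt, count=1)
    else:
        stmt += f" WHERE {clause}"
    if not re.search(r"(?i)\blimit\b", stmt):
        stmt += f" LIMIT {limit}"
    # Layer 4 - execute: the rewritten query is sent through PostgREST,
    # where RLS provides the final backstop (not shown here)
    return stmt
```

Even if a query slips past layers 1–3, RLS at layer 4 still prevents cross-tenant reads.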
Upload documents (PDF, DOCX, TXT, CSV) to build per-tenant knowledge bases. The retrieval pipeline combines multiple strategies for high-quality answers:
- Semantic search – pgvector cosine similarity with multilingual embeddings
- Keyword search – PostgreSQL full-text search (BM25)
- Reciprocal Rank Fusion – merges semantic + keyword results
- Reranking – Cohere, CrossEncoder, or LLM-based reranking
- MMR diversification – Maximal Marginal Relevance to avoid redundant results
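The fusion step is small enough to show in full. This is a generic Reciprocal Rank Fusion sketch, not the `_rag_factory` implementation; document IDs are made up.

```python
def rrf_merge(rankings, k=60):
    """Merge ranked result lists with Reciprocal Rank Fusion.

    Each document's fused score is sum(1 / (k + rank)) over every list
    it appears in; k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d1", "d2", "d3"]   # pgvector cosine-similarity order
keyword  = ["d3", "d1", "d4"]   # full-text search order
fused = rrf_merge([semantic, keyword])  # d1 and d3 rise to the top
```

Documents found by both retrievers get score contributions from both lists, which is why RRF rewards agreement without needing score normalization.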
A centralized FastMCP server exposes tools that agents can invoke at runtime. Tools are registered as modular packages, each with its own auth, validation, and tier gating:
| Module | Tools | Description |
|---|---|---|
| `rag_module` | `executar_rag_cliente` | Hybrid semantic + BM25 document search |
| `sql_module` | `executar_sql_agent` | Safe text-to-SQL with defense-in-depth |
| `csv_module` | CSV analysis | Statistics, distributions, column profiling |
| `google_module` | Sheets, Gmail, Calendar | Full Google Workspace integration via OAuth |
| `common_module` | File retrieval, context | Utility tools for agent context |
| `web_monitor_module` | URL monitoring | Track website changes |
| `prompt_module` | MCP prompts | Langfuse-versioned prompt resources |
| `structured_data_formatter` | Output formatting | Deterministic formatting for reports |
| `config_helper_module` | Tool validation | Availability checks per tier |
The platform runs specialized agents built with LangGraph, orchestrated through a supervisor pattern:
- Orchestrator (Atendente Core) – LangGraph state machine with 4 nodes: `init` → `supervisor` → `execute_tools` → `elicit`. Routes between tool execution, knowledge retrieval, data analysis, and clarification requests.
- Standalone Agents – Catalog-driven factory that dynamically builds agents from database definitions. Each agent gets its own session, tools, and context.
- Sales Agent / Support Agent – Specialized lightweight agents using the shared `AgentBuilder` fluent API.
```
User message → Supervisor Node → Route decision
                 ├── execute_tools → MCP Server → Tool result → Response
                 ├── elicit → Clarification question → User
                 └── respond → Direct LLM response
```
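The routing decision above reduces to a small function over the conversation state. This is a deliberately simplified stand-in (plain dict, invented field names); the actual orchestrator is a LangGraph state machine with checkpointing.

```python
# Hypothetical, simplified version of the supervisor's routing decision.
def route(state: dict) -> str:
    """Pick the next node from the current conversation state."""
    if state.get("pending_tool_calls"):
        return "execute_tools"   # the agent requested an MCP tool
    if state.get("needs_clarification"):
        return "elicit"          # ask the user a clarifying question
    return "respond"             # plain LLM answer
```

In LangGraph terms, `route` would be the condition function attached to the supervisor node's conditional edges.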
Every layer enforces tenant isolation:
- PostgreSQL Row-Level Security (RLS) on all tables – 62 migrations maintain the schema
- JWT validation supporting HS256, ES256, and RS256 (Supabase Auth)
- Per-request context injection – `ClientContext` carries tenant config, enabled tools, tier, and brand voice
- Tool-level auth – each MCP tool extracts and validates JWT independently
- Tier-based access control – tools, agents, and features gated by subscription tier (BASIC → PRO → ENTERPRISE → ADMIN)
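Because the tiers are strictly ordered, the gating check reduces to an integer comparison. Sketch only; the tool-to-tier mapping below is invented for illustration.

```python
from enum import IntEnum

class Tier(IntEnum):
    BASIC = 1
    PRO = 2
    ENTERPRISE = 3
    ADMIN = 4

# hypothetical minimum tier per tool
TOOL_MIN_TIER = {
    "executar_rag_cliente": Tier.BASIC,
    "executar_sql_agent": Tier.PRO,
}

def tool_allowed(tool: str, tier: Tier) -> bool:
    # unknown tools default to ADMIN-only, failing closed
    return tier >= TOOL_MIN_TIER.get(tool, Tier.ADMIN)
```

Defaulting unknown tools to the highest tier means a misconfigured registry denies access rather than granting it.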
- OpenTelemetry traces exported to Grafana Cloud (Tempo, Loki, Mimir)
- Langfuse as the prompt management system – version-controlled prompts with A/B testing labels, Redis-cached with builtin fallbacks
- One-line bootstrap – `setup_observability(app, service_name)` instruments any service
- End-to-end tracing – from HTTP request → agent graph → tool call → LLM invocation → database query
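The prompt lookup order (cache → remote → builtin fallback) can be sketched as follows. This is an illustration of the fallback strategy only; the real `_prompt_management` package uses Redis and the Langfuse SDK, and the prompt name here is hypothetical.

```python
# Sketch of the cache -> remote -> builtin-fallback lookup order.
BUILTIN_PROMPTS = {"greeting": "You are a helpful assistant."}  # hypothetical

def get_prompt(name, cache, fetch_remote):
    if name in cache:
        return cache[name]            # fast path: cached copy
    try:
        prompt = fetch_remote(name)   # e.g. a Langfuse API call
    except Exception:
        return BUILTIN_PROMPTS[name]  # degrade gracefully when remote is down
    cache[name] = prompt              # populate the cache for next time
    return prompt
```

The builtin fallback means an outage of the prompt service degrades prompt freshness, not availability.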
A factory-based connector system integrates with external data sources:
- BigQuery – federated queries via Foreign Data Wrappers
- Shopify / VTEX / Loja Integrada – e-commerce platform connectors
- CSV/XLSX uploads – automatic parsing, column detection, and schema inference
- Column mapping – AI-assisted mapping of source columns to the star schema
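A registry-based factory like this keeps connectors pluggable. Minimal sketch in the spirit of the platform's `ConnectorFactory`; the connector class and its parameters are invented for illustration.

```python
# Minimal registry-based connector factory (names are illustrative).
_CONNECTORS = {}

def register(name):
    """Class decorator that adds a connector to the registry."""
    def wrap(cls):
        _CONNECTORS[name] = cls
        return cls
    return wrap

def create_connector(name, **config):
    try:
        cls = _CONNECTORS[name]
    except KeyError:
        raise ValueError(f"unknown connector: {name}") from None
    return cls(**config)

@register("shopify")
class ShopifyConnector:
    def __init__(self, shop_url):
        self.shop_url = shop_url
```

New sources (e.g. another e-commerce platform) then only need a class plus a `@register` line, with no changes to call sites.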
An elicitation service handles cases where the agent needs clarification or human approval:
- Multiple elicitation types: `yes_no`, `multiple_choice`, `free_text`
- Priority queue for human review (Streamlit UI)
- Audit trail for all decisions
- Integrated into the agent graph as a first-class node
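The shape of an elicitation request might look like the dataclass below. Field names are assumptions for illustration, not the `_elicitation_service` schema.

```python
from dataclasses import dataclass, field

# Illustrative shape of an elicitation request (field names are assumed).
@dataclass
class ElicitationRequest:
    kind: str                 # "yes_no" | "multiple_choice" | "free_text"
    question: str
    options: list = field(default_factory=list)
    priority: int = 0         # higher = reviewed first in the HITL queue

    def validate(self) -> bool:
        if self.kind not in ("yes_no", "multiple_choice", "free_text"):
            raise ValueError(f"unknown elicitation kind: {self.kind}")
        if self.kind == "multiple_choice" and not self.options:
            raise ValueError("multiple_choice requires options")
        return True
```

Validating at construction time keeps malformed clarification requests out of the review queue.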
```
┌─────────────────────────────────────────────────────────────────────┐
│                           FRONTEND LAYER                            │
│               React 18 + TypeScript + Vite + Chakra UI              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐             │
│  │  Dashboard   │   │  Chat Panel  │   │ HITL Review  │             │
│  │ (Scorecards, │   │ (SSE Stream) │   │ (Streamlit)  │             │
│  │   Charts)    │   │              │   │              │             │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘             │
└─────────┼──────────────────┼──────────────────┼─────────────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       SERVICE LAYER (FastAPI)                       │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐             │
│  │  Atendente   │   │  Standalone  │   │ File Upload  │             │
│  │     Core     │   │  Agent API   │   │     API      │             │
│  │ (LangGraph)  │   │  (Catalog)   │   │ (Ingestion)  │             │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘             │
│         │                  │                  │                     │
│         ▼                  ▼                  │                     │
│  ┌──────────────────────────────┐             │                     │
│  │   Tool Pool API (FastMCP)    │◄────────────┘                     │
│  │   20+ tools, JWT per-tool    │                                   │
│  └───────────────┬──────────────┘                                   │
└──────────────────┼──────────────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     LIBRARY LAYER (20 packages)                     │
│  ┌──────────────┐ ┌──────────┐ ┌────────────┐ ┌────────────────┐    │
│  │    Agent     │ │   RAG    │ │    SQL     │ │  LLM Service   │    │
│  │  Framework   │ │ Factory  │ │  Factory   │ │(multi-provider)│    │
│  ├──────────────┤ ├──────────┤ ├────────────┤ ├────────────────┤    │
│  │  Auth (JWT)  │ │ Context  │ │   Prompt   │ │ Observability  │    │
│  │              │ │ Service  │ │ Management │ │   Bootstrap    │    │
│  ├──────────────┤ ├──────────┤ ├────────────┤ ├────────────────┤    │
│  │ MCP Commons  │ │ Parsers  │ │    Tool    │ │      Data      │    │
│  │              │ │          │ │  Registry  │ │   Connectors   │    │
│  └──────────────┘ └──────────┘ └────────────┘ └────────────────┘    │
└──────────────────┬──────────────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                             DATA LAYER                              │
│  ┌──────────────────┐ ┌──────────┐ ┌───────────┐ ┌─────────────┐    │
│  │    PostgreSQL    │ │ pgvector │ │   Redis   │ │  Supabase   │    │
│  │ (RLS, analytics  │ │  (RAG    │ │  (cache,  │ │ (Auth, Edge │    │
│  │  star-schema)    │ │  chunks) │ │ checkpts) │ │ Functions)  │    │
│  └──────────────────┘ └──────────┘ └───────────┘ └─────────────┘    │
└─────────────────────────────────────────────────────────────────────┘
```
One of the core engineering decisions: every reusable capability is a library, not duplicated code. All services depend on the same shared packages:
| Library | Purpose |
|---|---|
| `_agent_framework` | LangGraph builder pattern, state machines, node registry |
| `_auth` | JWT decode (HS256/ES256/RS256), RLS context injection |
| `_context_service` | Per-tenant context loading with Redis cache (5-min TTL) |
| `_data_connectors` | Factory for BigQuery, Shopify, VTEX, Loja Integrada |
| `_db_connector` | SQLAlchemy async engine management |
| `_elicitation_service` | Agent clarification requests (yes/no, multiple choice, free text) |
| `_experiment_service` | Experiment manifests, batch evaluation, classification |
| `_google_suite_client` | Google Sheets, Gmail, Calendar with OAuth token management |
| `_hitl_service` | Human-in-the-loop review queue with Streamlit UI |
| `_llm_service` | Provider abstraction (OpenAI, Anthropic, Google, Ollama) with tier budgets |
| `_mcp_commons` | MCP tool dataclasses, executor with parallel invocation |
| `_models` | Shared Pydantic/SQLModel domain models |
| `_observability_bootstrap` | One-line OpenTelemetry + Langfuse + Grafana setup |
| `_parsers` | PDF, DOCX, CSV, TXT parsing + semantic chunking |
| `_prompt_management` | Langfuse prompt fetching with Redis cache and builtin fallbacks |
| `_rag_factory` | Hybrid retrieval (semantic + BM25 + RRF + reranking + MMR) |
| `_shared_utils` | Common utilities across all services |
| `_sql_factory` | Text-to-SQL with AST validation, allowlists, PII masking |
| `_supabase_client` | Typed Supabase SDK wrapper |
| `_tool_registry` | Tool discovery, tier validation, Docker MCP bridge |
| `_twilio_client` | WhatsApp webhook integration |
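The `_context_service` entry above mentions a Redis cache with a 5-minute TTL; the access pattern can be sketched with a tiny in-process stand-in. Illustrative only: the real cache lives in Redis, and the loader shown here is hypothetical.

```python
import time

class TTLCache:
    """Tiny in-process stand-in for the Redis-backed context cache
    (the real _context_service uses a 5-minute TTL in Redis)."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def get(self, key, loader):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]                 # fresh cached value
        value = loader(key)               # e.g. load tenant config from Postgres
        self._store[key] = (value, now)
        return value
```

A short TTL bounds how stale a tenant's config can be after an admin changes it, while still absorbing the per-request load.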
| Practice | Implementation |
|---|---|
| Monorepo structure | Single repo with `libs/`, `services/`, `apps/`, `supabase/` – shared dependencies via path imports |
| Factory patterns | `ConnectorFactory`, `StandaloneAgentFactory`, `RAGFactory` – pluggable components |
| Builder pattern | `AgentBuilder` fluent API: `.with_llm().with_mcp().with_checkpointer().build()` |
| Dependency injection | FastAPI `Depends()` for auth, context, and services |
| Defense-in-depth | SQL validation has 4 security layers; tools validate JWT independently |
| 12-factor config | All config via environment variables and `.env` files; no hardcoded secrets |
| Database migrations | 62 Alembic/Supabase migrations – versioned schema evolution |
| Code quality | `ruff` for formatting + linting, enforced via `make fmt` / `make lint` |
| Testing | Unit tests, E2E smoke tests, persona tests, batch evaluation with Langfuse traces |
| Streaming | Server-Sent Events (SSE) for real-time agent responses |
| Caching | Redis for context (5-min TTL), prompts, agent checkpoints, tool results |
| Observability | OpenTelemetry → Grafana Cloud; Langfuse for LLM traces; structured logging |
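The fluent builder pattern in the table above works by having each `with_*` method return `self`. A hedged sketch of the idea; the method names mirror the table, but the class body, fields, and return type are illustrative, not the shared `AgentBuilder`.

```python
# Illustrative fluent-builder sketch (not the actual AgentBuilder class).
class AgentBuilder:
    def __init__(self, name):
        self.name = name
        self.llm = None
        self.mcp_url = None
        self.checkpointer = None

    def with_llm(self, llm):
        self.llm = llm
        return self                  # returning self enables chaining

    def with_mcp(self, url):
        self.mcp_url = url
        return self

    def with_checkpointer(self, cp):
        self.checkpointer = cp
        return self

    def build(self):
        if self.llm is None:
            raise ValueError("an LLM is required")
        return {"name": self.name, "llm": self.llm,
                "mcp": self.mcp_url, "checkpointer": self.checkpointer}
```

Validation in `build()` is what makes the pattern safe: optional steps may be skipped, but an incomplete required configuration fails loudly instead of producing a half-wired agent.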
Backend: Python 3.11+, FastAPI, Pydantic, SQLModel, LangGraph, LangChain, FastMCP
Frontend: React 18, TypeScript, Vite, Chakra UI, Recharts
AI/ML: LangGraph agents, pgvector embeddings, hybrid RAG (BM25 + semantic + RRF), Cohere reranking, multi-provider LLM (OpenAI, Anthropic, Google, Ollama)
Database: PostgreSQL with RLS, pgvector, Supabase (Auth, Edge Functions, Storage, PostgREST)
Infrastructure: Docker Compose (dev), Google Cloud Run (prod), Artifact Registry, Redis, Nginx
Observability: OpenTelemetry, Grafana Cloud (Tempo, Loki, Mimir, Faro), Langfuse
Auth: Supabase Auth, JWT (HS256/ES256/RS256), PostgreSQL RLS, per-tool tier gating
```
apps/
├── _dashboard/            # React 18 + TypeScript admin dashboard
├── hitl_dashboard/        # Streamlit HITL review interface
└── landing/               # Marketing landing page
services/
├── atendente_core/        # Main LangGraph agent orchestrator
├── tool_pool_api/         # FastMCP server (20+ tools)
├── standalone_agent_api/  # Catalog-driven agent builder
├── file_upload_api/       # Document ingestion + vector pipeline
├── vendas_agent/          # Sales-specialized agent
└── support_agent/         # Support-specialized agent
libs/                      # 20 shared Python packages (see table above)
supabase/
├── migrations/            # 62 SQL migrations (RLS, star-schema, vector DB)
└── functions/             # 5 Edge Functions (search, process, sync, enrich, match)
scripts/                   # Evaluation, seeding, and utility scripts
docs/                      # Architecture documentation
```
```bash
# 1. Clone and configure
git clone https://github.com/ br/ -mono.git
cd -mono
cp .env.example .env   # fill in your keys

# 2. Start the development stack
make dev

# 3. Open the dashboard
open http://localhost:8080
```

Services run with hot reload and connect to a remote Supabase instance – no local database setup required.
```bash
# Development
make dev              # Start core stack (dashboard + backend + tools + redis)
make dev-logs         # Tail all service logs
make dev-rebuild      # Rebuild after dependency changes

# Testing & Evaluation
make test             # Unit tests
make smoke-test       # End-to-end integration
make batch-run        # Batch test with Langfuse traces
make experiment-run   # Run evaluation experiments

# Database
make migrate          # Apply Alembic migrations
make migrate-prod     # Apply to production (with confirmation)

# Code Quality
make fmt              # Format with ruff
make lint             # Lint with ruff
make lint-fix         # Auto-fix lint issues

# Deployment
make cloudrun-build    # Build Docker images
make cloudrun-push-all # Push to GCP Artifact Registry
```

This platform was designed and implemented by me as the sole engineer, using Copilot as an assistant. The idea is to deliver business management and productivity solutions for SMBs.
The goal: enable non-technical business users to ask questions, get reports, and manage their data through natural conversation β with AI doing the heavy lifting, securely scoped to each tenant's data.
Architected and built by Lucas Cruz