A modern data stack that can be quickly deployed and is ready to scale. This repository includes:
- PostgreSQL: Database and storage
- Adventure Works: Sample dataset from Microsoft
- dbt Core: Data transformation
- Streamlit: Interactive dashboards with AI Analytics Assistant
- Airbyte: Extract and Load (via
abctl- see Airbyte Setup) - AI Assistant: Natural language queries powered by OpenAI GPT-4 (see AI Setup)
- OPA Access Control: Fine-grained authorization with OPA/OPAL (see OPAL Setup)
- Docker and Docker Compose installed
- Git (for Adventure Works installation)
- For Airbyte: 8GB+ RAM, 4+ CPUs recommended
Start all infrastructure services (PostgreSQL, SQL Server, Streamlit, dbt-docs, Airbyte):
./start.shThis script will:
- Start PostgreSQL, Streamlit, dbt-docs, and SQL Server
- Automatically install AdventureWorks database on SQL Server if it doesn't exist
- Automatically install and start Airbyte if not already installed (takes ~30 minutes on first run)
Use Airbyte to ingest data from SQLServer AdventureWorks database to PostgresSQL data_warehouse database. Remember to bring schema.table from source to data_warehouse database to match with dbt source. Once data has been ingested into PostgreSQL, run the analytics stack:
./launch.shThis script will:
- Run dbt models to transform AdventureWorks data
- Generate dbt documentation
- Build and launch Streamlit dashboard
- Build and launch dbt documentation
- Open both applications in your browser
Stop all services:
./stop.shThis will stop all services including Airbyte (if running).
Once started, access the services at:
| Service | URL | Credentials |
|---|---|---|
| PostgreSQL | localhost:5432 |
postgres/postgres |
| Streamlit Dashboard | http://localhost:8501 | - |
| dbt Documentation | http://localhost:8080 | - |
| SQL Server | localhost:1433 |
sa/YourStrong@Passw0rd |
| Airbyte Web UI | http://localhost:8000 | See Airbyte Setup |
| OPA (Access Control) | http://localhost:8181 | - |
| OPAL Server | http://localhost:7002 | (when Git configured) |
| OPAL Client | http://localhost:7001 | (when Git configured) |
# Check all containers
docker-compose ps
# View logs
docker-compose logs -f [service_name]
# Check specific service
docker-compose logs -f postgres
docker-compose logs -f streamlit
docker-compose logs -f dbt-docs
docker-compose logs -f sqlserverOption 1: Using Docker Compose directly
# Start all services
docker-compose up -d
# Or start services individually
docker-compose up -d postgres # PostgreSQL database
docker-compose up -d streamlit # Streamlit dashboard
docker-compose up -d dbt-docs # dbt documentation server
docker-compose up -d sqlserver # SQL Server databaseThe ./start.sh script automates all of this (including Airbyte installation), but here's the manual process:
docker-compose up -d postgresWait for PostgreSQL to be ready (healthcheck will verify):
docker-compose ps postgresConnection Details:
- Host:
localhost - Port:
5432(or value fromPOSTGRES_PORTenv var) - User:
postgres(or value fromPOSTGRES_USERenv var) - Password:
postgres(or value fromPOSTGRES_PASSWORDenv var) - Database:
data_warehouse(or value fromPOSTGRES_DBenv var)
Quick Connection Test:
docker exec -it data_warehouse_postgres psql -U postgres -d data_warehousedocker-compose up -d streamlitAccess at http://localhost:8501 (or the port specified in STREAMLIT_PORT).
docker-compose up -d dbt-docsAccess at http://localhost:8080 (or the port specified in DBT_DOCS_PORT).
Note: The dbt-docs service automatically generates documentation on startup.
Airbyte is not included in docker-compose.yml because:
- Docker Compose is deprecated: Airbyte deprecated Docker Compose in version 1.0 (September 2024)
- Images not available: Airbyte doesn't publish images to Docker Hub - they're only available via Helm charts
- Official method:
abctlis the only officially supported local deployment method - Better experience:
abctlhandles everything automatically (Kubernetes, Helm charts, image management)
Automatic Installation (Recommended):
The ./start.sh script automatically installs and starts Airbyte if it's not already installed:
- Installs
abctlif needed - Installs Airbyte (takes ~30 minutes on first run)
- Starts the Airbyte server
Manual Installation:
If you prefer to install Airbyte manually:
cd airbyte
./setup_with_abctl.shThis will:
- Install
abctlif needed - Set up Airbyte using Kubernetes (via
kind) - Deploy using Helm charts
- Provide you with credentials
The Airbyte web UI will be available at http://localhost:8000.
Note: Airbyte requires significant resources (recommended: 8GB+ RAM, 4+ CPUs). See airbyte/README.md for detailed setup and troubleshooting.
The ./start.sh script automatically installs AdventureWorks on SQL Server if it doesn't exist. To install manually:
./adventureworks/install_adventureworks_sqlserver.shThis will:
- Start SQL Server if not running
- Download and restore the AdventureWorks2022 database
- Verify the installation
See adventureworks/README.md for detailed installation instructions.
You can customize the configuration using environment variables or a .env file:
# PostgreSQL
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=data_warehouse
POSTGRES_PORT=5432
# Streamlit
STREAMLIT_PORT=8501
# dbt Docs
DBT_DOCS_PORT=8080
# SQL Server
SQLSERVER_SA_PASSWORD=YourStrong@Passw0rd
SQLSERVER_PORT=1433Option 1: Using the stop script (recommended)
./stop.shThis will stop all services including Airbyte (if running).
Option 2: Using Docker Compose directly
# Stop all services
docker-compose down
# Stop specific service
docker-compose stop [service_name]
# Stop and remove volumes (⚠️ deletes data)
docker-compose down -vTo stop Airbyte separately:
abctl local stopdata_warehouse/
├── docker-compose.yml # Core services (postgres, streamlit, dbt-docs, sqlserver)
├── start.sh # Startup script for all services
├── launch.sh # Launch analytics stack (dbt, docs, streamlit)
├── stop.sh # Shutdown script for all services
├── opal/ # OPAL access control
│ ├── policies/ # Rego policy files
│ ├── data/ # Authorization data (roles, users)
│ ├── docker-compose.yml # OPAL service definitions
│ ├── setup.sh # Setup and management script
│ ├── client.py # Python client library
│ └── README.md # OPAL documentation
├── dbt/ # dbt project
│ ├── models/ # SQL models (staging, intermediate, marts)
│ │ └── schema_ai.md # Auto-generated AI context (by run_dbt.sh)
│ ├── scripts/
│ │ └── generate_ai_schema.py # AI sync script (auto-run on dbt run)
│ ├── profiles.yml # Database connection config
│ ├── run_dbt.sh # Convenience script for running dbt commands
│ ├── setup_venv.sh # Virtual environment setup script
│ ├── setup_schema.sh # Database schema setup script
│ ├── generate_docs.sh # Documentation generation script
│ └── README.md # dbt-specific documentation
├── streamlit/ # Streamlit dashboard
│ ├── app.py # Main dashboard application
│ ├── pages/ # Multi-page analytics modules
│ ├── ai/ # AI Analytics Assistant module
│ │ └── allowed_tables.json # Auto-generated table whitelist (by dbt run)
│ ├── run.sh # Convenience script for local development
│ ├── requirements.txt # Python dependencies
│ └── README.md # Streamlit-specific documentation
├── adventureworks/ # AdventureWorks installation files
│ ├── install_adventureworks_sqlserver.sh # SQL Server installation script
│ └── README.md # AdventureWorks and SQL Server installation documentation
└── airbyte/ # Airbyte Core setup
├── README.md # Airbyte setup instructions
├── setup_with_abctl.sh # Official Airbyte setup script
└── troubleshooting.md # Troubleshooting guide
The Streamlit dashboard includes an AI Analytics Assistant that lets you query your data using natural language (e.g., "What is our revenue by territory?").
- Go to OpenAI Platform
- Create an account or sign in
- Click "Create new secret key"
- Copy the key (starts with
sk-)
Create a .env file in the streamlit/ directory:
cd streamlit
cat > .env << 'EOF'
OPENAI_API_KEY=sk-proj-your-actual-key-here
EOF
⚠️ Security: The.envfile is already in.gitignoreand will NOT be committed to Git.
- Start the dashboard:
http://localhost:8501 - Navigate to 🤖 AI Assistant in the sidebar
- Ask questions in plain English!
For more details, see streamlit/ai/README.md.
The data warehouse includes OPA (Open Policy Agent) for fine-grained access control, with optional OPAL for Git-based policy management.
OPA is enabled by default when you run ./start.sh. No additional configuration needed!
./start.sh # Starts all services including OPA| Service | URL | Description |
|---|---|---|
| OPA (standalone) | http://localhost:8181 | Policy evaluation endpoint |
| OPAL Server | http://localhost:7002 | Policy admin (when Git configured) |
| OPAL Client | http://localhost:7001 | OPAL client status |
| OPA (via OPAL) | http://localhost:8183 | Policy evaluation via OPAL |
| Mode | Command | Use Case |
|---|---|---|
| Standalone OPA | ./setup.sh start |
Local policies, simple setup |
| OPAL + OPA | ./setup.sh start-opal |
Git-synced policies, production |
cd opal
./setup.sh testcurl -X POST http://localhost:8181/v1/data/datawarehouse/authz/allow \
-H "Content-Type: application/json" \
-d '{
"input": {
"user": "admin@company.com",
"action": "read",
"resource": "mart_sales"
}
}'
# Response: {"result": true}To enable automatic policy updates from Git:
cd opal
cp env.example .env
# Edit .env and set:
# OPAL_POLICY_REPO_URL=https://github.com/YOUR_USERNAME/data_warehouse.git
# OPAL_REPO_POLICY_PATHS=opal/policiesThen restart with ./start.sh - OPAL will automatically sync policies from Git.
| Role | Description |
|---|---|
admin |
Full system access |
data_engineer |
Full data pipeline access |
senior_analyst |
Read all marts, dims, facts |
analyst |
Read marts and dims |
viewer |
Limited read-only access |
executive |
All dashboards and reports |
sales_manager |
Sales data (territory-scoped) |
hr_manager |
Employee data (department-scoped) |
operations_manager |
Inventory and production data |
OPAL_ENABLED=false ./start.shFor detailed documentation, see opal/README.md.
When you add new dbt models, the AI components are automatically synchronized:
cd dbt
./run_dbt.sh run # Runs dbt AND syncs AI componentsThis auto-generates:
dbt/models/schema_ai.md- Optimized LLM contextstreamlit/ai/allowed_tables.json- SQL validator whitelist
No manual updates needed! New tables are automatically available to the AI assistant.
Each component has its own README with component-specific details:
- dbt/README.md - dbt project setup, local development, and usage
- streamlit/README.md - Streamlit dashboard development
- streamlit/ai/README.md - AI Analytics Assistant setup
- adventureworks/README.md - AdventureWorks database installation
- airbyte/README.md - Airbyte setup and troubleshooting
- opal/README.md - OPAL access control setup and policies
-
Check if ports are already in use:
lsof -i :5432 # PostgreSQL lsof -i :8501 # Streamlit lsof -i :8080 # dbt-docs lsof -i :1433 # SQL Server
-
Check container logs:
docker-compose logs [service_name]
-
Verify Docker is running:
docker ps
# Rebuild specific service
docker-compose build [service_name]
# Rebuild and restart
docker-compose up -d --build [service_name]- dbt Documentation - dbt project details
- Streamlit Documentation - Dashboard development
- Adventure Works Installation - Sample database setup
- Airbyte Setup - Data integration platform
See LICENSE file for details.