Data Analyst Agent (Django)

Minimal Django app to upload CSV/Excel/JSON datasets, generate an initial data profile with an OpenAI agent (Code Interpreter), and chat for exploratory analysis and charts.

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Copy env file and set keys:

cp .env.example .env

Set OPENAI_API_KEY and optionally DJANGO_SECRET_KEY in .env.
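For example, a minimal .env might look like this (placeholder values shown — substitute your own secrets):

```shell
# .env — placeholder values, not real credentials
OPENAI_API_KEY=sk-your-openai-key
DJANGO_SECRET_KEY=change-me-to-a-long-random-string
```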

Database backend

Default (no DB URL configured): local SQLite.

Recommended for staging/production: Supabase Postgres.

Set one of these in .env:

SUPABASE_DB_URL=postgresql://postgres.<project-ref>:<password>@<host>:6543/postgres
# or
DATABASE_URL=postgresql://...

Optional DB flags:

DB_SSL_REQUIRE=true
DB_CONN_MAX_AGE=600
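As an illustration of how these variables typically map onto Django's DATABASES setting — this is a sketch, not the app's actual settings.py, and the helper name build_db_config is hypothetical:

```python
import os
from urllib.parse import urlparse


def build_db_config(url, ssl_require=False, conn_max_age=0):
    """Translate a postgresql:// URL into a Django DATABASES entry (sketch)."""
    parsed = urlparse(url)
    config = {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parsed.path.lstrip("/"),
        "USER": parsed.username or "",
        "PASSWORD": parsed.password or "",
        "HOST": parsed.hostname or "",
        "PORT": parsed.port or 5432,
        "CONN_MAX_AGE": conn_max_age,
    }
    if ssl_require:
        config["OPTIONS"] = {"sslmode": "require"}
    return config


# Prefer SUPABASE_DB_URL, then DATABASE_URL; otherwise fall back to SQLite.
url = os.environ.get("SUPABASE_DB_URL") or os.environ.get("DATABASE_URL")
if url:
    default_db = build_db_config(
        url,
        ssl_require=os.environ.get("DB_SSL_REQUIRE", "").lower() == "true",
        conn_max_age=int(os.environ.get("DB_CONN_MAX_AGE", "0")),
    )
else:
    default_db = {"ENGINE": "django.db.backends.sqlite3", "NAME": "db.sqlite3"}
```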

Optional Stripe placeholders:

STRIPE_SECRET_KEY=
STRIPE_PUBLISHABLE_KEY=
STRIPE_WEBHOOK_SECRET=
STRIPE_PRICE_PRO_MONTHLY=
STRIPE_PRICE_TEAM_MONTHLY=

Optional async/email settings:

ANALYSIS_JOB_TIMEOUT_SECONDS=900
ANALYSIS_JOB_MAX_ATTEMPTS=3
EMAIL_BACKEND=django.core.mail.backends.console.EmailBackend
DEFAULT_FROM_EMAIL=noreply@dataanalystagent.local
EMAIL_HOST=localhost
EMAIL_PORT=1025
EMAIL_HOST_USER=
EMAIL_HOST_PASSWORD=
EMAIL_USE_TLS=false
EMAIL_USE_SSL=false
EMAIL_TIMEOUT=30

Run

python manage.py migrate
python manage.py runserver

Open http://127.0.0.1:8000/ and upload a dataset.

The app requires authentication (/signup, /login) and isolates datasets by user.

  • The pricing page supports plan switching in demo mode (Free/Pro/Team) with monthly usage limits.
  • If Stripe keys/prices are configured, paid plans can use Stripe Checkout.
  • A webhook endpoint is available at /billing/webhook/ (placeholder-friendly for local testing). When DEBUG=false, webhook processing requires STRIPE_WEBHOOK_SECRET.
  • A billing management endpoint is available at /billing/portal/ (when Stripe is configured).
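For background, Stripe signs each webhook delivery with an HMAC-SHA256 over "{timestamp}.{body}", sent in the Stripe-Signature header. The sketch below shows the scheme conceptually; it is not this app's handler, and real code should prefer the official stripe library's construct_event helper:

```python
import hashlib
import hmac
import time


def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance: int = 300) -> bool:
    """Check a Stripe-Signature header ("t=...,v1=...") against the raw body."""
    parts = dict(item.split("=", 1) for item in sig_header.split(","))
    timestamp, candidate = parts.get("t", "0"), parts.get("v1", "")
    # Stripe signs the timestamp joined to the raw payload with a period.
    signed_payload = f"{timestamp}.".encode() + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Reject stale timestamps to limit replay, then compare in constant time.
    fresh = abs(time.time() - int(timestamp)) <= tolerance
    return fresh and hmac.compare_digest(expected, candidate)
```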

Tests

python manage.py test

Notes

  • Uploaded files are stored in media/.
  • Metadata, chat sessions, and artifacts are stored in the configured DB backend (SQLite by default, Supabase Postgres when DB URL is set).
  • The workspace now lists only the authenticated user's datasets and supports owner-only open/download/rename/replace/retention/delete actions.
  • User datasets are intentionally not exposed through Django admin.
  • Uploads and generated artifacts are now encrypted at rest in application storage and decrypted only through owner-authorized app flows.
  • Set FILE_ENCRYPTION_KEY to a Fernet-compatible key for a dedicated storage encryption secret. If omitted, the app derives one from DJANGO_SECRET_KEY.
  • When DJANGO_DEBUG=false, the app now expects FILE_ENCRYPTION_KEY to be configured explicitly via Django system checks.
  • Direct media URLs are no longer used for user files; charts are served through authenticated artifact routes and exported HTML embeds chart data inline.
  • Dataset management actions are recorded in an owner-scoped audit trail.
  • Replace and delete actions attempt best-effort cleanup of the uploaded source file previously sent to OpenAI (openai_file_id).
  • This is not zero-knowledge privacy: the backend can still decrypt files in order to process them and send analysis requests to OpenAI.
  • Default retention mode is ephemeral. Use save_analysis at upload time to persist history.
  • Monthly usage quotas are enforced for analyze and chat operations based on current plan.
  • Paid plan quotas require entitlement status (demo, active, or trialing); otherwise limits fall back to Free.
  • Dataset detail includes a downloadable executive HTML export.
  • Dataset detail now includes executive highlights and suggested questions for faster analysis.
  • Dataset detail includes domain playbooks (finance/retail/e-commerce one-click guided analyses).
  • Dataset detail includes metric dictionary coverage (mapped vs missing metrics).
  • Dataset detail supports manual metric mapping overrides persisted per dataset.
  • Dataset detail supports recurring schedules for playbook runs (weekly/monthly) with async queueing.
  • Dataset detail supports custom playbooks created by users and runnable on demand.
  • Analysis, chat, and playbook execution now run as background jobs with status polling and cancellation support.
  • Scheduled report runs are dispatched asynchronously and can send email via the configured Django email backend.
  • Cleanup command for expired ephemeral uploads:
python manage.py cleanup_ephemeral_uploads --hours 24
  • Run due scheduled reports locally:
python manage.py run_scheduled_reports --limit 20
  • Run the async job worker locally:
python manage.py run_job_worker --once --limit 20
# or continuously
python manage.py run_job_worker --poll-interval 2
  • Run the scheduler loop locally:
python manage.py run_scheduler --poll-interval 30
  • Encrypt legacy plaintext uploads/artifacts already stored on disk:
python manage.py encrypt_stored_files
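To make the FILE_ENCRYPTION_KEY note above concrete: a Fernet key is 32 random bytes, urlsafe-base64-encoded (44 characters). One plausible way to derive such a key from DJANGO_SECRET_KEY is a SHA-256 digest — a sketch only, since the app's actual derivation (KDF, salt) is not documented here:

```python
import base64
import hashlib


def derive_fernet_key(secret_key: str) -> bytes:
    """Derive a Fernet-format key from a secret string.

    Sketch only: the app's real derivation may use a different KDF or a salt.
    """
    digest = hashlib.sha256(secret_key.encode("utf-8")).digest()  # 32 raw bytes
    return base64.urlsafe_b64encode(digest)  # 44-char urlsafe base64


key = derive_fernet_key("django-insecure-example-secret")
```

A value in this format can be passed directly to cryptography.fernet.Fernet, which expects exactly this urlsafe-base64 encoding of 32 bytes.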
