A multi-tenant Flask app that turns a stack of PDFs into a branded, embeddable chat widget. Upload your documents, get a one-line <script> tag you can paste into any website, and ship a grounded, retrieval-based assistant trained on your own content.
Built October 2023 while exploring how to package a small RAG pipeline behind a real product surface (auth, per-user state, CDN-served widget) rather than as a notebook demo.
I wanted to learn what it takes to wrap a language model into something a non-technical user could actually deploy. The interesting parts were not the model call. They were the boring product seams: per-user document isolation, persisting assistants across restarts, generating a customizable widget per tenant, and serving it from a CDN so the host site does not need to know anything about Flask.
The result is a small but end-to-end SaaS pattern for a "chatbot for your docs" service.
- User registers, logs in, gets a unique PIN (used as the widget's tenant key).
- User uploads one or more PDFs.
- The app extracts text with PyPDF2, sentence-splits it, and builds a TF-IDF index per assistant.
- User clicks "Create Chatbot" and gets a flash-message containing a
<script>tag. - That script (served from S3 via CloudFront) injects a floating chat widget into the host page.
- When a visitor sends a message, the widget POSTs to
/chat/with the tenant PIN; the server retrieves the best-matching passage via cosine similarity and asks GPT-3.5-turbo to answer using only that passage.
A strict system prompt forces the model to refuse anything outside the uploaded corpus, which is how grounding is enforced (no embeddings DB, no vector store, just TF-IDF retrieval and a guarded prompt).
Browser (host site)
|
| <script defer src="cloudfront/<pin>_embedChatbot.js">
v
+---------------------------------------------------+
| CloudFront <---- S3 (per-tenant widget JS) |
+---------------------------------------------------+
|
| POST /chat/ {message, userPin}
v
+---------------------------------------------------+
| Flask app (app.py) |
| - Flask-Login auth, Postgres user store |
| - per-PIN Assistant dict (in-process) |
| - pickled to S3 on create/delete |
+---------------------------------------------------+
|
v
+---------------------------------------------------+
| Assistant -> PDFProcessor (PyPDF2 + regex) |
| -> Chatbot (TF-IDF + cosine sim) |
| -> OpenAI chat completions API |
+---------------------------------------------------+
Key files:
app.py(480 lines): routes, auth, S3/CloudFront glue, assistant lifecycleassistant.py: orchestrates PDF -> sentences -> retrieval -> LLMchatbot.py: TF-IDF index + GPT-3.5 call with grounding promptpdf_processor.py: PyPDF2 text extraction + basic cleanupstatic/embedChatbot.js: ~12KB self-contained widget template (CSS, HTML, fetch loop, all scoped)templates/: login, register, password reset, create-chatbot flows
You will need: Python 3.10+, Postgres, an S3 bucket fronted by CloudFront, and an OpenAI API key.
git clone https://github.com/dparikh79/AI-Chatbot.git
cd AI-Chatbot
python -m venv aichatbot_env
source aichatbot_env/bin/activate
pip install -r requirements.txtCreate a .env file at the repo root (do not commit it, see .gitignore):
# OpenAI
API_ENDPOINT=https://api.openai.com/v1/chat/completions
API_KEY=sk-...
# Flask
FLASK_SECRET_KEY=<long-random-hex>
DATABASE_URL=postgresql://user:pass@host:5432/aichatbot
# Local storage
BASE_UPLOAD_FOLDER=/absolute/path/to/uploads
ALLOWED_EXTENSIONS=pdf
# AWS (S3 + CloudFront for the embeddable widget)
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
AWS_BUCKET_NAME=your-bucket
CLOUDFRONT_DOMAIN=https://dxxxxxx.cloudfront.netThen:
flask --app app.py runVisit http://127.0.0.1:5000/, register, upload a PDF, click "Create Test Chatbot", and copy the generated <script> tag into any HTML page to see the widget appear.
- Backend: Flask, Flask-Login, Flask-SQLAlchemy, Flask-CORS, gunicorn
- Data: PostgreSQL (users, reset tokens), pickle on S3 (assistant cache)
- Retrieval: scikit-learn
TfidfVectorizer+ cosine similarity - LLM: OpenAI
gpt-3.5-turbovia raw HTTP (no SDK) - PDF: PyPDF2
- Storage / CDN: AWS S3 + CloudFront (per-tenant widget JS)
- Frontend: vanilla JS widget template, Jinja for the admin pages
This was an exploration, not a hardened product. If I rebuilt it now:
- Swap TF-IDF for a real embedding model (OpenAI
text-embedding-3-smallor a local sentence-transformer) and a small vector store (FAISS, pgvector, or Chroma). Cosine sim over TF-IDF is brittle for paraphrased queries. - Move the assistant cache from a pickled S3 blob to Postgres or DynamoDB. Pickle + global dict + atexit save is fine for a prototype, not for concurrent workers.
- Use a managed auth provider (Clerk, Auth0, or Cognito) instead of hand-rolled Flask-Login + SHA-256 password hashing. The current
generate_password_hash(..., method="sha256")should be bcrypt or argon2. - Move conversation history out of a per-request local variable (currently each
/chat/call starts a fresh history, so the bot has no memory). Persist per-PIN history in Redis or Postgres with a TTL. - Containerize and deploy on Fargate or Lambda instead of a single gunicorn box.
- Add streaming responses and a rate limiter (Flask-Limiter is already in
requirements.txtbut unused).
- No multi-turn memory:
process_questioncreates a newconversation_history = []every call. The bot forgets immediately. This was on my list to fix but the project moved on. - AWS coupling: the embeddable-widget path requires S3 + CloudFront. You can run the upload-and-chat flow locally without AWS by skipping
/chatbot/create/and using only/chat/directly. - Single-process assumption: the
assistantsdict lives in process memory and is reloaded from a single pickle on S3. Running multiple gunicorn workers will see stale state. - Python pickle on S3 is a known footgun: anyone with write access to the bucket could ship arbitrary code into the app. Fine for solo dev, not for production.
MIT. See LICENSE.
Built by Dhruvil Parikh while learning how a RAG prototype becomes a deployable product surface.