A full-stack Retrieval-Augmented Generation (RAG) application that lets you chat with an AI about any GitHub developer's public repositories. Load a user's profile, and the system fetches, chunks, embeds, and indexes their repo data — then answers your questions using only that context.
- Profile Ingestion — Fetches all public repos + READMEs for any GitHub user via the GitHub API.
- RAG Pipeline — Chunks text, generates embeddings with all-MiniLM-L6-v2, and stores them in an in-memory FAISS index (see the sketch after this list).
- Context-Aware Chat — Questions are answered strictly from retrieved repository context — no hallucination.
- Ollama-Powered LLM — Uses a locally-running Ollama model for answer generation (no cloud API keys needed).
- Modern Frontend — Clean, responsive chat UI built with vanilla HTML/CSS/JS.
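
The chunk → embed → index → retrieve flow is small enough to sketch end to end. The snippet below is an illustration, not the project's actual `backend/rag/` code; it assumes `sentence-transformers` and `faiss-cpu` are installed and uses the defaults from the configuration table further down:

```python
from sentence_transformers import SentenceTransformer
import faiss

CHUNK_SIZE, CHUNK_OVERLAP, TOP_K = 500, 50, 5  # defaults from the config table below

def chunk(text: str) -> list[str]:
    # Fixed-size sliding window; the overlap preserves context across chunk borders.
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

readme_text = "Example README contents fetched from GitHub ..."  # placeholder input
model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = chunk(readme_text)
embeddings = model.encode(chunks)               # float32 array, shape (n_chunks, 384)

index = faiss.IndexFlatL2(embeddings.shape[1])  # in-memory index, nothing persisted
index.add(embeddings)

query = model.encode(["What does this project do?"])
_, ids = index.search(query, min(TOP_K, len(chunks)))
context = "\n\n".join(chunks[i] for i in ids[0])  # context handed to the LLM prompt
```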
```
frontend/   → Vanilla JS chat interface (served via static file server)
backend/
├── api/    → FastAPI route handlers (profile loading, Q&A)
├── config/ → App settings & environment config
├── github/ → GitHub API client & data parser
├── models/ → Pydantic request/response schemas
├── rag/    → RAG pipeline (chunker, embedder, retriever, LLM client, prompt builder)
├── utils/  → Logging utilities
└── vector/ → FAISS vector store (in-memory, per-user indexes)
```
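
To make the layout concrete, here is one hypothetical shape for the two route handlers in `backend/api/`; the actual paths, schemas, and handler names in the project may differ:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):  # hypothetical schema, cf. backend/models/
    username: str
    question: str

@app.post("/profile/{username}")
def load_profile(username: str) -> dict:
    # Would fetch repos + READMEs, chunk, embed, and build the user's FAISS index.
    return {"status": "indexed", "username": username}

@app.post("/ask")
def ask(req: AskRequest) -> dict:
    # Would retrieve top-k chunks for req.username and generate a grounded answer.
    return {"answer": "(placeholder)"}
```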
- Python 3.10+
- Ollama — Install from ollama.com and pull a model: `ollama pull llama3`
- GitHub Token (optional) — Increases API rate limits. Generate one at github.com/settings/tokens.
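
For context: the unauthenticated GitHub REST API allows 60 requests per hour, versus 5,000 with a token. A minimal repo fetch looks roughly like this (an assumption, not the project's `backend/github/` client; `octocat` is just an example user):

```python
import os
import requests

headers = {"Accept": "application/vnd.github+json"}
token = os.getenv("GITHUB_TOKEN")
if token:  # optional: raises the rate limit from 60 to 5,000 requests/hour
    headers["Authorization"] = f"Bearer {token}"

resp = requests.get(
    "https://api.github.com/users/octocat/repos",
    headers=headers,
    params={"per_page": 100},
)
resp.raise_for_status()
print([repo["full_name"] for repo in resp.json()])
```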
```
git clone <your-repo-url>
cd github-chat
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Create a `.env` file in the project root:
```
# GitHub (optional — increases rate limits)
GITHUB_TOKEN=ghp_your_token_here

# Ollama LLM Configuration
OPENAI_API_KEY=ollama
OPENAI_API_BASE=http://localhost:11434/v1
LLM_MODEL=llama3
```

Note: Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1, so no code changes are required — just set the environment variables above.
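
For example, the stock `openai` Python client works against Ollama with only the base URL and a dummy key changed (a quick sketch, assuming the v1 `openai` package is installed):

```python
from openai import OpenAI

# Ollama accepts any api_key value; "ollama" is just a conventional placeholder.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(resp.choices[0].message.content)
```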
Start the backend:

```
uvicorn backend.main:app --reload --port 8000
```

Serve the frontend:

```
cd frontend
python3 -m http.server 3000
```

Open http://localhost:3000 in your browser.
- Enter a GitHub username in the sidebar and click Load Context.
- Wait for the system to fetch repos, chunk text, and build the vector index.
- Ask questions in the chat — the AI answers based only on the loaded repository data.
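
The "answers based only on the loaded data" behavior comes from how the prompt builder in `backend/rag/` pins the model to the retrieved chunks. A sketch of what such a prompt might look like (the project's actual wording may differ):

```python
def build_prompt(context: str, question: str) -> str:
    # Instruct the model to stay inside the retrieved repository context.
    return (
        "Answer using ONLY the repository context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```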
All settings are managed via environment variables or a .env file:
| Variable | Default | Description |
|---|---|---|
| `GITHUB_TOKEN` | `""` | GitHub personal access token (optional) |
| `OPENAI_API_KEY` | `"sk-placeholder"` | API key (set to `ollama` for Ollama) |
| `OPENAI_API_BASE` | `""` | LLM API base URL (`http://localhost:11434/v1` for Ollama) |
| `LLM_MODEL` | `"gpt-3.5-turbo"` | Model name (`llama3`, `mistral`, etc.) |
| `EMBEDDING_MODEL` | `"all-MiniLM-L6-v2"` | Sentence-transformer model for embeddings |
| `CHUNK_SIZE` | `500` | Characters per text chunk |
| `CHUNK_OVERLAP` | `50` | Overlap between consecutive chunks |
| `TOP_K_RETRIEVAL` | `5` | Number of context chunks to retrieve |
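
A `backend/config/` module could load these along the following lines (a sketch under the assumption that `pydantic-settings` is used; the field names mirror the variables above):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    github_token: str = ""
    openai_api_key: str = "sk-placeholder"
    openai_api_base: str = ""
    llm_model: str = "gpt-3.5-turbo"
    embedding_model: str = "all-MiniLM-L6-v2"
    chunk_size: int = 500
    chunk_overlap: int = 50
    top_k_retrieval: int = 5

settings = Settings()  # values from the environment or .env override these defaults
```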
| Layer | Technology |
|---|---|
| Backend | FastAPI, Uvicorn |
| LLM | Ollama (OpenAI-compatible API) |
| Embeddings | Sentence-Transformers (all-MiniLM-L6-v2) |
| Vector Store | FAISS (in-memory) |
| Frontend | Vanilla HTML / CSS / JavaScript |
| Data Source | GitHub REST API |
This project is for educational and portfolio purposes.