Complete Sinhala chatbot system using Hybrid RAG (JSON + Text), Streamlit UI, FAISS retrieval, and Ollama local inference.
- Fully offline-capable at runtime
- Ollama local LLM inference (http://localhost:11434/api/generate)
- Streamlit chat interface
- Sinhala Unicode input/output
- In-session memory for recent conversation turns (last 10)
- Hybrid retrieval flow:
- JSON semantic retrieval (Top 2)
- Text semantic retrieval from selected topics (Top 3)
- Context merge and grounded generation
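The hybrid flow above can be sketched as a small merge step. This is a minimal illustration with hypothetical helper names (`json_hits` / `text_hits` stand in for results from the FAISS retrievers in `json_retriever.py` and `text_retriever.py`); the real project's merge logic may differ:

```python
# Sketch of the hybrid context merge: top-2 JSON hits plus top-3 text hits,
# deduplicated in order, joined into one grounding context for the prompt.
def merge_context(json_hits, text_hits):
    parts = [hit["text"] for hit in json_hits[:2]]
    parts += [hit["text"] for hit in text_hits[:3]]
    seen, merged = set(), []
    for p in parts:
        if p not in seen:          # keep first occurrence only
            seen.add(p)
            merged.append(p)
    return "\n\n".join(merged)
```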
project/
├── app.py
├── chatbot/
│ ├── hybrid_retriever.py
│ ├── json_retriever.py
│ ├── text_retriever.py
│ ├── embeddings.py
│ ├── prompt.py
│ ├── ollama.py
│ ├── memory.py
│ ├── build_indexes.py
│ └── test_queries.py
├── data/
│ ├── knowledge.json
│ └── documents/
│ ├── headache.txt
│ ├── stress.txt
│ └── ...
├── vectorstore/
│ ├── json_index.faiss
│ └── text_index.faiss
├── requirements.txt
└── README.md
- Ollama must be installed locally.
- Gemma model must already exist locally.
- Sentence-transformers embedding model must be locally available.
Runtime code uses local-only embedding loading by default (EMBEDDING_LOCAL_ONLY=1).
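One way `EMBEDDING_LOCAL_ONLY=1` can be honored is by switching the Hugging Face libraries into offline mode before any model is loaded. This is an assumed implementation sketch (the env vars `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are real Hugging Face settings; the gating function name is hypothetical):

```python
import os

def configure_offline_embeddings():
    # If EMBEDDING_LOCAL_ONLY=1 (the default here), force Hugging Face
    # libraries into offline mode so no download is attempted at runtime.
    if os.environ.get("EMBEDDING_LOCAL_ONLY", "1") == "1":
        os.environ["HF_HUB_OFFLINE"] = "1"
        os.environ["TRANSFORMERS_OFFLINE"] = "1"
    # Report whether offline mode is active.
    return os.environ.get("HF_HUB_OFFLINE") == "1"
```

Call this once at startup, before constructing the sentence-transformers model, so a missing local model fails fast instead of triggering a network fetch.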
pip install -r requirements.txt
ollama serve
ollama pull gemma

After the model has been pulled once, usage is fully local/offline.
python -m chatbot.build_indexes

This generates:
vectorstore/json_index.faiss
vectorstore/text_index.faiss
streamlit run app.py --server.port 8501

Open: http://localhost:8501
The exact assignment prompt is used in chatbot/prompt.py with a strict fallback response:
"මට ඒ පිළිබඳ ප්රමාණවත් තොරතුරු නොමැත" ("I do not have sufficient information about that")
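A grounded prompt with this strict fallback might be assembled as below. This is a hypothetical sketch only; the exact assignment prompt lives in `chatbot/prompt.py`, and the function name and wording here are illustrative:

```python
# Sketch of grounded prompt construction with the strict fallback sentence
# and the last-10-turn in-session memory described above.
FALLBACK = "මට ඒ පිළිබඳ ප්රමාණවත් තොරතුරු නොමැත"

def build_prompt(context: str, history: list, question: str) -> str:
    # Keep only the 10 most recent (role, text) conversation turns.
    turns = "\n".join(f"{role}: {text}" for role, text in history[-10:])
    return (
        "Answer in Sinhala using ONLY the context below. "
        f"If the context is insufficient, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\nHistory:\n{turns}\n\n"
        f"Question: {question}\nAnswer:"
    )
```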
Run retrieval tests:
python -m chatbot.test_queries

The script includes 20 Sinhala test cases and checks retrieval correctness against each query's expected topic.
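The expected-topic check can be sketched as a simple scoring loop. The names here (`evaluate`, `retrieve_topic`) are hypothetical stand-ins for what `test_queries.py` actually does:

```python
# Sketch of retrieval-correctness checking: each case pairs a Sinhala query
# with the topic the retriever is expected to surface.
def evaluate(cases, retrieve_topic):
    """cases: list of (query, expected_topic). Returns (passed, total)."""
    passed = 0
    for query, expected in cases:
        if retrieve_topic(query) == expected:
            passed += 1
    return passed, len(cases)
```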
- No keyword matching is used for retrieval.
- The full dataset is never sent to the model.
- Only the retrieved context is sent to Ollama.
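The call to the local endpoint can be sketched as follows, using only the Python standard library. Function names are hypothetical (the real logic is in `chatbot/ollama.py`); the endpoint and the non-streaming `/api/generate` response shape (`"response"` field) follow Ollama's REST API:

```python
import json
import urllib.request

def build_payload(model: str, prompt: str) -> dict:
    # stream=False makes Ollama return one JSON object instead of a stream.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "gemma",
             url: str = "http://localhost:11434/api/generate") -> str:
    # Only the prompt built from the retrieved context is sent, never the
    # full dataset.
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```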