A Streamlit-based chatbot powered by local LLMs via LangChain and Ollama. This chatbot lets you chat with multiple AI models, choose their persona (e.g., Astronaut, Doctor, Engineer), and control their creativity.
- Multiple Models: Choose from available models (llama3.2:1b, deepseek-r1:1.5b, qwen3:1.7b).
- Custom Personas: Chat with the bot as an AI Assistant, Astronaut, Banker, Content Creator, Doctor, or Engineer.
- Creativity Control: Adjust temperature to make responses more creative or focused.
- Chat Memory: Keeps conversation history during a session.
- Streamlit UI: Lightweight, interactive, and user-friendly interface.
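The persona and chat-memory features could be wired together as sketched below. This is a minimal illustration, not the app's actual code: the persona names come from the list above, but the prompt wording and the `build_messages` helper are assumptions.

```python
# Hypothetical persona-to-system-prompt mapping; the prompt texts are
# illustrative, only the persona names come from the feature list.
PERSONAS = {
    "AI Assistant": "You are a helpful AI assistant.",
    "Astronaut": "You are an astronaut who explains things with space analogies.",
    "Banker": "You are a banker who gives clear financial explanations.",
    "Content Creator": "You are a content creator with an engaging style.",
    "Doctor": "You are a doctor who explains medical topics carefully.",
    "Engineer": "You are an engineer who favors precise, technical answers.",
}

def build_messages(persona: str, history: list[dict], user_input: str) -> list[dict]:
    """Prepend the selected persona's system prompt to the running chat
    history, then append the new user turn."""
    system = {"role": "system", "content": PERSONAS[persona]}
    return [system, *history, {"role": "user", "content": user_input}]
```

The resulting message list can then be passed to any chat model; keeping `history` in the session is what gives the bot its in-session memory.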
git clone https://github.com/NOHA-Projects/localChatbot.git
cd localChatbot
You can either create a conda environment or set up a virtual environment:
# Conda Environment
conda create -n chatbot python==3.10 -y
conda activate chatbot
# Virtual Environment
python -m venv venv
source venv/bin/activate # On Linux
venv\Scripts\activate # On Windows
pip install -r requirements.txt
# Set up Ollama
curl -fsSL https://ollama.com/install.sh | sh # For Linux
https://ollama.com/download/windows # For Windows
# Run the following commands to install the models used by the chatbot
ollama run llama3.2:1b
ollama run qwen3:1.7b
ollama run deepseek-r1:1.5b
# Start the Ollama server using the following command in a separate terminal
ollama serve
# Command to run the application
streamlit run app.py
# Command to run the evaluation
python evaluation.py
The evaluation output is written to a .json file once the script has finished.
- Exact Match (EM): A strict, all-or-nothing metric that checks whether the model's prediction matches the ground-truth answer character for character. This score is computed only over questions where an exact comparison is meaningful and possible.
- Flesch–Kincaid Grade Level: A readability test that rates a piece of text on a U.S. school grade level, indicating the number of years of education a person needs to understand it.
- BERTScore: An evaluation metric that measures semantic similarity between a generated text and a reference text by leveraging contextual embeddings from a pre-trained BERT model.
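Exact Match is typically computed after light normalization (lowercasing, stripping punctuation and extra whitespace). A minimal sketch of that convention, not necessarily the normalization `evaluation.py` uses:

```python
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace
    (a common EM preprocessing step)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions that equal their reference after normalization."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)
```

For example, `exact_match(["Paris.", "42"], ["paris", "41"])` scores the first pair as a match and the second as a miss, giving 50.0.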
| Model | Exact Match | Flesch–Kincaid Grade | BERTScore |
|---|---|---|---|
| llama3.2:1b | 33 % | 9 | 88.63 % |
| deepseek-r1:1.5b | 100 % | 9 | 86.94 % |
| qwen3:1.7b | 100 % | 11 | 89.82 % |
This experiment was designed to demonstrate the integration of multiple language models within a single web application. The platform allows users to easily plug in their own models and run inference across diverse tasks—such as text summarization, instruction following, logical reasoning, and more—to compare performance and select the best-suited model for their needs.
All evaluations are conducted on a fixed set of benchmark datasets targeting specific capabilities (e.g., summarization quality, instruction following, or logical reasoning), ensuring consistent and reproducible comparisons. While the current setup includes a few pre-integrated models, the architecture is modular and extensible, encouraging experimentation with custom or fine-tuned models.
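Plugging in another model could be as simple as extending a model registry. The sketch below is hypothetical: the model names mirror this README, but `register_model` and the list-based registry are assumptions about how such extensibility might look, not the app's actual mechanism.

```python
# Models currently listed in the README; any tag pulled via Ollama
# (including a custom or fine-tuned one) could be appended here.
AVAILABLE_MODELS = ["llama3.2:1b", "deepseek-r1:1.5b", "qwen3:1.7b"]

def register_model(name: str) -> list[str]:
    """Add a model tag to the selectable list, ignoring duplicates."""
    if name not in AVAILABLE_MODELS:
        AVAILABLE_MODELS.append(name)
    return AVAILABLE_MODELS
```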