
🤖 NOHA Chatbot

A Streamlit-based chatbot powered by LLMs via LangChain and Ollama. It lets you chat with multiple AI models, choose their persona (e.g., Astronaut, Doctor, Engineer), and control how creative the responses are.

[Chatbot screenshot]

Features

  • Multiple Models: Choose from available models (llama3.2:1b, deepseek-r1:1.5b, qwen3:1.7b).
  • Custom Personas: Chat with the bot as an AI Assistant, Astronaut, Banker, Content Creator, Doctor, or Engineer.
  • Creativity Control: Adjust temperature to make responses more creative or focused.
  • Chat Memory: Keeps conversation history during a session.
  • Streamlit UI: Lightweight, interactive, and user-friendly interface.
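As a rough sketch of how a persona and a temperature setting could be combined into a chat request: the `PERSONAS` table, the system-prompt wording, and the `build_request` helper below are illustrative assumptions, not code from `app.py`.

```python
# Minimal sketch: map a persona + temperature to an Ollama-style chat payload.
# PERSONAS and build_request are illustrative helpers, not the app's actual code.

PERSONAS = {
    "AI Assistant": "You are a helpful AI assistant.",
    "Astronaut": "You are an astronaut. Answer from an astronaut's perspective.",
    "Doctor": "You are a doctor. Answer from a medical perspective.",
}

def build_request(model: str, persona: str, user_message: str, temperature: float) -> dict:
    """Assemble a chat request dict, injecting the persona as a system prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": PERSONAS[persona]},
            {"role": "user", "content": user_message},
        ],
        # Higher temperature = more creative, lower = more focused.
        "options": {"temperature": temperature},
    }

req = build_request("llama3.2:1b", "Doctor", "What causes headaches?", 0.2)
print(req["messages"][0]["content"])
```

Keeping the persona in a system message (rather than prepending it to every user turn) lets the chat history stay clean while the persona persists across the session.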

1. Clone the Repository

git clone https://github.com/NOHA-Projects/localChatbot.git
cd localChatbot

2. Environment Setup

You can either create a conda environment or set up a Python virtual environment:

# Conda Environment
conda create -n chatbot python=3.10 -y
conda activate chatbot

# Virtual Environment
python -m venv venv
source venv/bin/activate   # On Linux/macOS
venv\Scripts\activate      # On Windows

3. Install the dependencies

pip install -r requirements.txt

# Set up Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# On Windows, download the installer from https://ollama.com/download/windows

# Download the models used by the chatbot
ollama pull llama3.2:1b
ollama pull qwen3:1.7b
ollama pull deepseek-r1:1.5b

4. Run the application and evaluation

# Start the Ollama server in a separate terminal
ollama serve

# Command to run the application
streamlit run app.py

# Command to run the evaluation
python evaluation.py

The evaluation results are written to a .json file once the script finishes.
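Once the JSON file exists, the scores can be post-processed with a few lines of Python. The schema below (a list of per-model score records with `exact_match` and `bert_score` keys) is an assumption for illustration; adapt the keys to whatever `evaluation.py` actually emits.

```python
# Sketch of post-processing the evaluation output. The JSON schema here is an
# assumption, not the script's guaranteed format -- adjust keys as needed.
import json

sample = '''
[
  {"model": "llama3.2:1b", "exact_match": 0.33, "bert_score": 0.8863},
  {"model": "qwen3:1.7b", "exact_match": 1.0, "bert_score": 0.8982}
]
'''

results = json.loads(sample)  # in practice: json.load(open("results.json"))
best = max(results, key=lambda r: r["bert_score"])
print(f"Best BERT score: {best['model']} ({best['bert_score']:.2%})")
```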

5. Benchmarking Llama, Qwen & DeepSeek: Head-to-Head Chatbot Evaluation

Evaluation Metrics

  • Exact Match (EM): A strict, all-or-nothing metric that checks whether the model's prediction matches the ground-truth answer character for character. The score is computed only over questions where an exact comparison is meaningful.
  • Flesch–Kincaid Grade Level: A readability test that rates text at a U.S. school grade level, indicating the number of years of education a reader needs to understand it.
  • BERTScore: Measures semantic similarity between generated text and a reference text using contextual embeddings from a pre-trained BERT model.
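A minimal sketch of the Exact Match metric: a prediction scores 1 only if it matches the reference character for character after light normalization. The normalization rules here (lowercasing, stripping punctuation, collapsing whitespace) are illustrative assumptions, not necessarily what `evaluation.py` does.

```python
# Illustrative Exact Match metric with a simple normalization step.
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(reference))

print(exact_match("Paris.", "paris"))    # 1 -- differs only in case/punctuation
print(exact_match("In Paris", "paris"))  # 0 -- extra words fail a strict match
```

The second call shows why EM is "all-or-nothing": an answer that merely contains the reference still scores 0, which is why EM is paired with a softer metric like BERTScore below.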
| Model            | Exact Match | Flesch–Kincaid Grade | BERT Score |
|------------------|-------------|----------------------|------------|
| llama3.2:1b      | 33 %        | 9                    | 88.63 %    |
| deepseek-r1:1.5b | 100 %       | 9                    | 86.94 %    |
| qwen3:1.7b       | 100 %       | 11                   | 89.82 %    |

This experiment was designed to demonstrate the integration of multiple language models within a single web application. The platform allows users to easily plug in their own models and run inference across diverse tasks—such as text summarization, instruction following, logical reasoning, and more—to compare performance and select the best-suited model for their needs.

All evaluations are conducted on a fixed set of benchmark datasets targeting specific capabilities (e.g., summarization quality, instruction following, or logical reasoning), ensuring consistent and reproducible comparisons. While the current setup includes a few pre-integrated models, the architecture is modular and extensible, encouraging experimentation with custom or fine-tuned models.

About

Developing a chatbot with LangChain and Streamlit.
