
🤖 NOHA Chatbot

A Streamlit-based chatbot powered by LLMs via LangChain and Ollama. It lets you chat with multiple AI models, choose their persona (e.g., Astronaut, Doctor, Engineer), and control how creative the responses are.

[Chatbot screenshot]

Features

  • Multiple Models: Choose from available models (llama3.2:1b, deepseek-r1:1.5b, qwen3:1.7b).
  • Custom Personas: Chat with the bot as an AI Assistant, Astronaut, Banker, Content Creator, Doctor, or Engineer.
  • Creativity Control: Adjust temperature to make responses more creative or focused.
  • Chat Memory: Keeps conversation history during a session.
  • Streamlit UI: Lightweight, interactive, and user-friendly interface.
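As a rough sketch of how a persona and a temperature setting could be combined into a chat request: the `PERSONAS` table, the system-prompt wording, and the `build_request` helper below are illustrative assumptions, not code from `app.py`.

```python
# Minimal sketch: map a persona + temperature to an Ollama-style chat payload.
# PERSONAS and build_request are illustrative helpers, not the app's actual code.

PERSONAS = {
    "AI Assistant": "You are a helpful AI assistant.",
    "Astronaut": "You are an astronaut. Answer from an astronaut's perspective.",
    "Doctor": "You are a doctor. Answer from a medical perspective.",
}

def build_request(model: str, persona: str, user_message: str, temperature: float) -> dict:
    """Assemble a chat request dict, injecting the persona as a system prompt."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": PERSONAS[persona]},
            {"role": "user", "content": user_message},
        ],
        # Higher temperature = more creative, lower = more focused.
        "options": {"temperature": temperature},
    }

req = build_request("llama3.2:1b", "Doctor", "What causes headaches?", 0.2)
print(req["messages"][0]["content"])
```

Keeping the persona in a system message (rather than prepending it to every user turn) lets the chat history stay clean while the persona persists across the session.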

1. Clone the Repository

git clone https://github.com/NOHA-Projects/localChatbot.git
cd localChatbot

2. Environment Setup

You can either create a conda environment or set up a Python virtual environment:

# Conda Environment
conda create -n chatbot python=3.10 -y
conda activate chatbot

# Virtual Environment
python -m venv venv
source venv/bin/activate   # On Linux/macOS
venv\Scripts\activate      # On Windows

3. Install the dependencies

pip install -r requirements.txt

# Set up Ollama (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# On Windows, download the installer from https://ollama.com/download/windows

# Download the models used by the chatbot
ollama pull llama3.2:1b
ollama pull qwen3:1.7b
ollama pull deepseek-r1:1.5b

4. Run the application and evaluation

# Start the Ollama server in a separate terminal
ollama serve

# Command to run the application
streamlit run app.py

# Command to run the evaluation
python evaluation.py

The evaluation results are written to a .json file once the script finishes.
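Once the JSON file exists, the scores can be post-processed with a few lines of Python. The schema below (a list of per-model score records with `exact_match` and `bert_score` keys) is an assumption for illustration; adapt the keys to whatever `evaluation.py` actually emits.

```python
# Sketch of post-processing the evaluation output. The JSON schema here is an
# assumption, not the script's guaranteed format -- adjust keys as needed.
import json

sample = '''
[
  {"model": "llama3.2:1b", "exact_match": 0.33, "bert_score": 0.8863},
  {"model": "qwen3:1.7b", "exact_match": 1.0, "bert_score": 0.8982}
]
'''

results = json.loads(sample)  # in practice: json.load(open("results.json"))
best = max(results, key=lambda r: r["bert_score"])
print(f"Best BERT score: {best['model']} ({best['bert_score']:.2%})")
```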

5. Benchmarking Llama, Qwen & DeepSeek: Head-to-Head Chatbot Evaluation

Evaluation Metrics

  • Exact Match (EM): A strict, all-or-nothing metric that checks whether the model's prediction matches the ground-truth answer character for character. The score is computed only over questions where an exact comparison is meaningful.
  • Flesch–Kincaid Grade Level: A readability test that rates text at a U.S. school grade level, indicating the number of years of education a reader needs to understand it.
  • BERTScore: Measures semantic similarity between generated text and a reference text using contextual embeddings from a pre-trained BERT model.
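A minimal sketch of the Exact Match metric: a prediction scores 1 only if it matches the reference character for character after light normalization. The normalization rules here (lowercasing, stripping punctuation, collapsing whitespace) are illustrative assumptions, not necessarily what `evaluation.py` does.

```python
# Illustrative Exact Match metric with a simple normalization step.
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(reference))

print(exact_match("Paris.", "paris"))    # 1 -- differs only in case/punctuation
print(exact_match("In Paris", "paris"))  # 0 -- extra words fail a strict match
```

The second call shows why EM is "all-or-nothing": an answer that merely contains the reference still scores 0, which is why EM is paired with a softer metric like BERTScore below.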
| Model            | Exact Match | Flesch–Kincaid Grade | BERT Score |
|------------------|-------------|----------------------|------------|
| llama3.2:1b      | 33 %        | 9                    | 88.63 %    |
| deepseek-r1:1.5b | 100 %       | 9                    | 86.94 %    |
| qwen3:1.7b       | 100 %       | 11                   | 89.82 %    |

This experiment was designed to demonstrate the integration of multiple language models within a single web application. The platform allows users to easily plug in their own models and run inference across diverse tasks—such as text summarization, instruction following, logical reasoning, and more—to compare performance and select the best-suited model for their needs.

All evaluations are conducted on a fixed set of benchmark datasets targeting specific capabilities (e.g., summarization quality, instruction following, or logical reasoning), ensuring consistent and reproducible comparisons. While the current setup includes a few pre-integrated models, the architecture is modular and extensible, encouraging experimentation with custom or fine-tuned models.

About

Developing a chatbot with LangChain and Streamlit.
