This midterm project explores the capabilities and limitations of open-source Large Language Models (LLMs) running locally via Ollama. The project focuses on:
- Basic model exploration across four types of tasks.
- Focused experimentation with prompt engineering techniques.

To ensure reproducibility and avoid environment issues, all experiments were conducted inside Docker containers.
```bash
# Clone the repository
git clone https://github.com/Gitlio11/CS5393-Midterm
cd CS5393-Midterm

# Build and start the Docker containers
docker-compose up
```
In a separate terminal:
```bash
# Run one of the available models
ollama run llama2
ollama run mistral
ollama run tinyllama
```
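The same models can also be queried programmatically through Ollama's HTTP API, which is how prompt runs can be automated from `app/main.py`. Below is a minimal sketch of such a call; it assumes the compose file maps Ollama's default port 11434 to the host and that `requests` is available (e.g., via `app/requirements.txt`). The `generate` helper name is illustrative, not taken from the repo.

```python
import requests

# Ollama's default local endpoint (assumes port 11434 is mapped to the host)
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate(model: str, prompt: str) -> str:
    """Send one prompt to a locally running Ollama model and return the full reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # the first call can be slow while the model loads into memory
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("tinyllama", "What is the capital of Sweden?"))
```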
```
CS5393-Midterm/
├── app/
│   ├── main.py
│   ├── requirements.txt
│   └── results/          # Experiment results, organized by technique and model
├── model_outputs/        # Sample model responses
│   ├── llama2/
│   ├── mistral/
│   └── tinyllama/
├── report/               # Final analysis and report
├── docker-compose.yml
├── Dockerfile
└── README.md
```
This project tests four different prompt engineering techniques across three models:
- Llama2: Meta's open-source LLM known for general-purpose capabilities
- Mistral: A newer model with strong reasoning capabilities
- TinyLlama: A smaller, more efficient model
Zero-shot prompting (direct questions without examples):
- "What is the capital of Sweden?"
- "Explain quantum entanglement in simple terms."
- "How do you calculate compound interest?"
Few-shot prompting (questions preceded by example Q&A pairs that guide the model):

```
Question: What is the capital of France?
Answer: Paris

Question: What is the capital of Japan?
Answer: Tokyo

Question: What is the capital of Brazil?
Answer: Brasília

Question: What is the capital of Sweden?
Answer:
```
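A few-shot prompt like the one above can be assembled mechanically from a list of example pairs. The sketch below shows one way to do it; `few_shot_prompt` is an illustrative helper, not necessarily how `app/main.py` implements it.

```python
# Few-shot: worked Q&A pairs are prepended so the model infers the expected format.
FEW_SHOT_EXAMPLES = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
    ("What is the capital of Brazil?", "Brasília"),
]

def few_shot_prompt(question: str) -> str:
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in FEW_SHOT_EXAMPLES]
    blocks.append(f"Question: {question}\nAnswer:")  # trailing "Answer:" cues the reply
    return "\n\n".join(blocks)

print(generate("mistral", few_shot_prompt("What is the capital of Sweden?")))
```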
Chain-of-thought prompting (questions that encourage step-by-step reasoning):
- "If I have 5 apples and give 2 to my friend, then buy 3 more and eat 1, how many apples do I have left? Let's think step-by-step."
- "A train travels at 60 mph. How far will it travel in 2.5 hours? Let's think step-by-step."
- "If a shirt costs $25 and is on sale for 20% off, then there's an additional 10% discount at checkout, what is the final price? Let's think step-by-step."
Self-consistency (running the same reasoning questions multiple times to check whether the answers agree):
- "What is 15 × 27? Think carefully and solve this step-by-step."
- "If today is Tuesday, what day will it be after 19 days? Think carefully and solve this step-by-step."
- "John has twice as many marbles as Tom. Tom has 5 fewer marbles than Sarah. Sarah has 15 marbles. How many marbles does John have? Think carefully and solve this step-by-step."
The experiment results are stored in the app/results/ directory, organized by technique and model. A comprehensive analysis can be found in the report/ollama-report.md file.
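A record-keeping helper along these lines is enough to produce that layout; the JSONL file naming below is illustrative, and the actual files live under `app/results/`.

```python
import json
from pathlib import Path

def save_result(technique: str, model: str, prompt: str, reply: str) -> None:
    """Append one experiment record under app/results/<technique>/<model>.jsonl."""
    out_dir = Path("app/results") / technique
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / f"{model}.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"prompt": prompt, "reply": reply}) + "\n")
```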
Key observations:
- Smaller models answered noticeably faster
- Larger models were generally more accurate
- On some questions, all three models answered incorrectly
- Docker and Docker Compose
- Git
- At least 8GB of RAM for running the models
- Approximately 10GB of disk space for model storage
The project uses models that are downloaded automatically via Ollama on first run. Approximate sizes:
- Llama2: ~3.8GB
- Mistral: ~4.1GB
- TinyLlama: ~1.1GB

Together the three models total roughly 9GB, which is why ~10GB of free disk space is recommended.
- Models running locally are more limited than cloud-based LLMs
- The first inference can be slow while a model loads into memory
Future improvements could include:
- Expanding to include additional open-source models
- Quantitative analysis of response quality