
CS5393 Midterm Project – Exploring Open-Source LLMs with Ollama

Project Overview

This midterm project explores the capabilities and limitations of open-source Large Language Models (LLMs) running locally via Ollama. The project focuses on:

  1. Basic model exploration across four types of tasks.
  2. Focused experimentation on prompt engineering techniques.

To ensure reproducibility and avoid environment issues, all experiments were conducted inside Docker containers.

How to Run the Project

1. Clone the Repository

git clone https://github.com/Gitlio11/CS5393-Midterm.git
cd CS5393-Midterm

2. Set Up Docker Environment

# Build and start the Docker containers
docker-compose up

3. Run the Models

In a separate terminal:

# Run one of the available models
ollama run llama2

ollama run mistral

ollama run tinyllama

4. Project Structure

CS5393-MIDTERM/
├── app/
│   ├── main.py
│   ├── requirements.txt
│   └── results/        # Experiment results, organized by technique and model
├── model_outputs/      # Sample model responses
│   ├── llama2/
│   ├── mistral/
│   └── tinyllama/
├── report/             # Final analysis and report
├── docker-compose.yml
├── Dockerfile
└── README.md

Prompt Engineering Experiments

This project tests four different prompt engineering techniques across three models:

Models Tested

  • Llama2: Meta's open-source LLM known for general-purpose capabilities
  • Mistral: A newer model with strong reasoning capabilities
  • TinyLlama: A smaller, more efficient model

Prompt Engineering Techniques

1. Zero-Shot Prompting

Direct questions without examples:

  • "What is the capital of Sweden?"
  • "Explain quantum entanglement in simple terms."
  • "How do you calculate compound interest?"
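Prompts like these can also be sent programmatically through Ollama's local REST API, which serves `/api/generate` on port 11434 by default. A minimal sketch in Python using only the standard library (the helper names `build_payload` and `ask` are ours, not part of the project code, and a running Ollama server is assumed):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming generate request body for the Ollama API."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask(model: str, prompt: str) -> str:
    """Send a zero-shot prompt and return the model's response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server):
#   print(ask("llama2", "What is the capital of Sweden?"))
```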

2. Few-Shot Prompting

Questions with example Q&A pairs provided to guide the model:

Question: What is the capital of France?
Answer: Paris

Question: What is the capital of Japan?
Answer: Tokyo

Question: What is the capital of Brazil?
Answer: Brasília

Question: What is the capital of Sweden?

3. Chain-of-Thought Prompting

Questions that encourage step-by-step reasoning:

  • "If I have 5 apples and give 2 to my friend, then buy 3 more and eat 1, how many apples do I have left? Let's think step-by-step."
  • "A train travels at 60 mph. How far will it travel in 2.5 hours? Let's think step-by-step."
  • "If a shirt costs $25 and is on sale for 20% off, then there's an additional 10% discount at checkout, what is the final price? Let's think step-by-step."
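Since every chain-of-thought prompt above ends with the same cue, the wrapping can be factored into a one-line helper (a sketch; the function name is ours):

```python
COT_SUFFIX = " Let's think step-by-step."


def make_cot_prompt(question: str) -> str:
    """Turn a plain question into a chain-of-thought prompt."""
    return question.rstrip() + COT_SUFFIX


cot_prompt = make_cot_prompt("A train travels at 60 mph. How far will it travel in 2.5 hours?")
```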

4. Self-Consistency

Running the same reasoning questions multiple times to check for consistency:

  • "What is 15 × 27? Think carefully and solve this step-by-step."
  • "If today is Tuesday, what day will it be after 19 days? Think carefully and solve this step-by-step."
  • "John has twice as many marbles as Tom. Tom has 5 fewer marbles than Sarah. Sarah has 15 marbles. How many marbles does John have? Think carefully and solve this step-by-step."
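Self-consistency can be sketched as sampling the same prompt several times (with nonzero temperature) and taking a majority vote over the final answers. The sampling step is only illustrated with hard-coded responses here; the answer extraction and vote are pure Python (helper names are ours):

```python
import re
from collections import Counter


def extract_final_number(text: str):
    """Take the last number in a model response as its final answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None


def majority_vote(answers):
    """Return the most common non-empty answer across samples."""
    counts = Counter(a for a in answers if a is not None)
    return counts.most_common(1)[0][0] if counts else None


# Example: five hypothetical sampled responses to "What is 15 × 27?"
samples = [
    "Working step by step... the answer is 405.",
    "15 * 27 = 405",
    "I get 415.",  # one inconsistent sample, outvoted below
    "The product is 405.",
    "So the final answer is 405.",
]
answers = [extract_final_number(s) for s in samples]
print(majority_vote(answers))  # → 405
```

The vote smooths over occasional reasoning slips, which is exactly what the repeated runs in this experiment are checking for.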

Results and Analysis

The experiment results are stored in the app/results/ directory, organized by technique and model. A comprehensive analysis can be found in the report/ollama-report.md file.

Key Findings

  • Smaller models answered noticeably faster.
  • Larger models were generally more accurate.
  • Some questions were answered incorrectly by all three models.

Requirements and Dependencies

Prerequisites

  • Docker and Docker Compose
  • Git
  • At least 8GB of RAM for running the models
  • Approximately 10GB of disk space for model storage

Models

The project uses models that will be automatically downloaded via Ollama when first run. Each model has different size requirements:

  • Llama2: ~3.8GB
  • Mistral: ~4.1GB
  • TinyLlama: ~1.1GB

Limitations and Future Work

  • Models run locally and have more limited capabilities compared to cloud-based LLMs
  • First inference can be slow as models load into memory

Future improvements could include:

  • Expanding to include additional open-source models
  • Quantitative analysis of response quality
