Problem.
Experiment with running and fine-tuning lightweight large language models locally, without depending on cloud-based APIs.
Approach.
- Set up a Conda environment for reproducibility and CUDA compatibility.
- Loaded unsloth.Q4_K_M.gguf in GGUF format using llama.cpp bindings (llama-cpp-python).
- Implemented a text generation pipeline within Jupyter, testing interactive prompts (see the sketch after this list).
- Explored model quantization (Q4_K_M) for efficient inference on limited hardware.
- Experimented with temperature, max_tokens, and context length for response variety.
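A minimal sketch of this pipeline with llama-cpp-python, assuming the model file sits at ./models/unsloth.Q4_K_M.gguf (the location used by the download command at the end of this page); the context length, GPU offload setting, and sampling values are illustrative, not the notebook's exact configuration.

from llama_cpp import Llama

# load the quantized GGUF model; n_ctx sets the context window,
# n_gpu_layers offloads layers to the GPU when CUDA is available
llm = Llama(
    model_path="./models/unsloth.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,
)

# interactive-style generation with tunable sampling parameters
output = llm(
    "Q: What can a small local LLM be used for? A:",
    max_tokens=128,
    temperature=0.7,
    stop=["Q:"],
)
print(output["choices"][0]["text"])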
Tools & Environment.
This project uses the Llama 3 (Unsloth Quantized) model for local inference.
- Model: unsloth.Q4_K_M.gguf
- Size: ~4.9 GB
- Framework: llama-cpp-python
- Environment management: Conda (environment.yml)
- Libraries: transformers, llama-cpp-python, torch, accelerate, sentencepiece (a quick import check is sketched after this list)
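A short sanity check, run inside the activated Conda environment, confirms that the key libraries resolve and whether CUDA is visible; this is an illustrative snippet, not part of the notebook.

import torch
import llama_cpp

# verify the main inference libraries import inside the tinyllama environment
print("llama-cpp-python:", llama_cpp.__version__)
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())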
Results (qualitative).
- Successfully ran conversational inference locally with controllable generation parameters.
- Quantized model balanced speed and coherence well on consumer-grade GPU/CPU.
- Demonstrated how small LLMs can power local chatbots or assistive apps.
What I Learned.
- Model loading from GGUF snapshots via Hugging Face Hub.
- Environment isolation using Conda for LLM workloads.
- Balancing performance vs quality through quantization and prompt tuning.
# 1. clone the repository (environment.yml lives inside it)
git clone https://github.com/Joe-Naz01/llm_llama.git
cd llm_llama

# 2. create and activate the conda environment
conda env create -f environment.yml
conda activate tinyllama

# 3. download the quantized model
huggingface-cli download andriJulian/gguf_llama3_classification unsloth.Q4_K_M.gguf --local-dir ./models

# 4. launch the notebook
jupyter notebook llm_llama.ipynb
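As an alternative to the huggingface-cli step, the same file can be fetched from inside Python with the Hugging Face Hub client; this is a minimal sketch that reuses the repo ID, filename, and target directory from the command above.

from huggingface_hub import hf_hub_download

# download the same GGUF snapshot programmatically
model_path = hf_hub_download(
    repo_id="andriJulian/gguf_llama3_classification",
    filename="unsloth.Q4_K_M.gguf",
    local_dir="./models",
)
print(model_path)  # the notebook can then load the model from this path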