# Generative AI Assignment: Ollama Setup & Usage
**Course:** Gen AI  
**Author:** Mehdy Mokhtari  
**Date:** 1/8/1404

---

## 📘 Introduction

In this assignment, we explore **Ollama**, an open-source platform for running and serving large language models (LLMs) locally or in Google Colab. The goal is to understand how to **install, configure, and interact with a local language model (Llama 3.1, 8B)** through **Ollama** and **LangChain**.

By the end of this part, you will:
- Install and run **Ollama** within Google Colab.
- Set up and serve a **local LLM instance** (Llama 3.1, 8B parameters).
- Connect to this instance using **LangChain's `langchain_ollama`** library.
- Send and analyze **prompts in English and Persian**.
- Understand how local LLM serving works compared to cloud-hosted APIs.

---

## What We'll Learn

- Basics of **Ollama installation** and running models locally.
- How to **serve models** and keep them active in the Colab environment.
- How to **connect to Ollama** through the **LangChain** interface.
- How to **test model performance** by sending prompts and observing outputs.
- Foundational skills for working with **self-hosted AI models** in constrained environments.



## 1. Setup & Install Ollama in Colab

In [4]:
# Update the package list and install system utilities that help detect hardware (GPU)
!apt update && apt install -y pciutils lshw

[33m0% [Working][0m            Hit:1 https://cli.github.com/packages stable InRelease
[33m0% [Connecting to archive.ubuntu.com (185.125.190.82)] [Connecting to security.[0m                                                                               Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:3 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:5 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:8 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:9 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:11 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Reading package lists... Do

In [5]:
# Check if a GPU (like T4) is available in your Colab runtime
!nvidia-smi

Fri Oct 24 17:41:14 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   39C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
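
The table above reports GPU memory as `used / total` in MiB. As a small sketch, assuming the row layout shown here (the function name and regex are illustrative and not part of any NVIDIA tooling), those two numbers can be pulled out in Python to check how much room is left before pulling a model:

```python
import re

def parse_gpu_memory(nvidia_smi_text):
    """Return (used_mib, total_mib) from the first 'XMiB / YMiB' field, or None."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", nvidia_smi_text)
    if match is None:
        return None  # no GPU row found (for example, a CPU-only runtime)
    return int(match.group(1)), int(match.group(2))

# Row copied from the nvidia-smi output above.
sample_row = "| N/A   39C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |"
used, total = parse_gpu_memory(sample_row)
print(f"{total - used} MiB free of {total} MiB")
```

In a notebook, the text could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.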
                                                

In [6]:
# Download and install Ollama on the current Colab environment
!curl -fsSL https://ollama.com/install.sh | sh

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [7]:
# Start the Ollama server in the background so it can handle requests
# !nohup ollama serve &


# Start the Ollama service so other cells can talk to it
!nohup ollama serve > /dev/null 2>&1 &
# Give it a second to boot up
!sleep 2
# Check that the API is live (returns JSON, maybe empty)
!curl -s http://localhost:11434/api/tags


{"models":[{"name":"llama3.1:latest","model":"llama3.1:latest","modified_at":"2025-10-24T17:32:47.547414907Z","size":4920753328,"digest":"46e0c10c039e019119339687c3c1757cc81b9da49709a3b3924863ba87ca666e","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"8.0B","quantization_level":"Q4_K_M"}}]}
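
A fixed `sleep 2` may be too short on a cold runtime. One alternative, sketched here with the standard library only, is to poll the `/api/tags` endpoint until it answers. The helper names `ollama_is_up` and `wait_until_ready` are hypothetical, and the attempt count and delay are arbitrary defaults:

```python
import time
import urllib.error
import urllib.request

def ollama_is_up(base_url="http://localhost:11434"):
    """Return True if the server answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused or timed out: not ready yet

def wait_until_ready(probe, attempts=30, delay=1.0):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

Calling `wait_until_ready(ollama_is_up)` after starting the server would return `True` as soon as the API responds, or `False` if it never does within the allotted attempts.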

In [8]:
# # Download (pull) the Llama 3.1 model (8B) so it's available locally for inference
# !ollama pull llama3.1

# Trigger model download without entering chat mode
!curl -s http://localhost:11434/api/pull -d '{"name":"llama3.1"}' | tail -n 1


{"status":"success"}
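
The pull endpoint streams one JSON object per line while it downloads, which is why the cell keeps only the last line with `tail -n 1`. A minimal sketch of the same idea in Python follows; the sample stream is illustrative and was not captured from this run:

```python
import json

def final_pull_status(ndjson_text):
    """Return the 'status' of the last non-empty JSON line, or None if there is none."""
    lines = [line for line in ndjson_text.splitlines() if line.strip()]
    if not lines:
        return None
    return json.loads(lines[-1]).get("status")

# Illustrative stream; a real pull emits many more progress lines.
sample_stream = '{"status":"pulling manifest"}\n{"status":"success"}\n'
print(final_pull_status(sample_stream))
```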


In [9]:
# Lists all downloaded models
!ollama list

NAME               ID              SIZE      MODIFIED               
llama3.1:latest    46e0c10c039e    4.9 GB    Less than a second ago    


In [10]:
# Show details about llama3.1
!ollama show llama3.1

  Model
    architecture        llama     
    parameters          8.0B      
    context length      131072    
    embedding length    4096      
    quantization        Q4_K_M    

  Capabilities
    completion    
    tools         

  Parameters
    stop    "<|start_header_id|>"    
    stop    "<|end_header_id|>"      
    stop    "<|eot_id|>"             

  License
    LLAMA 3.1 COMMUNITY LICENSE AGREEMENT            
    Llama 3.1 Version Release Date: July 23, 2024    
    ...                                              
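
The on-disk size follows roughly from the parameter count and the quantization level. As a back-of-the-envelope check, treating the whole file as weights (an approximation, since the file also carries metadata and the tokenizer), the numbers reported for this model work out as follows:

```python
size_bytes = 4_920_753_328   # "size" reported by /api/tags for llama3.1:latest
parameters = 8.0e9           # "parameters 8.0B" from `ollama show`

bits_per_weight = size_bytes * 8 / parameters
size_gb = size_bytes / 1e9        # decimal gigabytes, as printed by `ollama list`
size_mib = size_bytes / 2**20     # mebibytes, the unit nvidia-smi uses

print(f"{bits_per_weight:.2f} bits per weight")
print(f"{size_gb:.1f} GB on disk, about {size_mib:.0f} MiB")
```

That is a little under 5 bits per weight, which is consistent with a 4-bit quantization that keeps some tensors at higher precision, and the weights alone fit comfortably within the T4's 15360 MiB. Inference needs additional memory for the context cache, so this is a lower bound rather than a full estimate.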



In [11]:
# Confirm via API
!curl -s http://localhost:11434/api/tags

{"models":[{"name":"llama3.1:latest","model":"llama3.1:latest","modified_at":"2025-10-24T17:42:03.721136849Z","size":4920753328,"digest":"46e0c10c039e019119339687c3c1757cc81b9da49709a3b3924863ba87ca666e","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"8.0B","quantization_level":"Q4_K_M"}}]}
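
The raw JSON is hard to scan. A short sketch, assuming the response shape shown above (the function name is illustrative), reduces it to the fields that matter for this assignment:

```python
import json

def summarize_models(tags_json):
    """Return (name, parameter_size, size_in_GB) tuples from an /api/tags response body."""
    payload = json.loads(tags_json)
    return [
        (m["name"], m["details"]["parameter_size"], round(m["size"] / 1e9, 1))
        for m in payload.get("models", [])
    ]

# Trimmed copy of the response shown above.
sample = (
    '{"models":[{"name":"llama3.1:latest","size":4920753328,'
    '"details":{"parameter_size":"8.0B","quantization_level":"Q4_K_M"}}]}'
)
print(summarize_models(sample))
```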

In [12]:
# Run the Llama 3.1 model interactively to test it; this also ensures it's properly loaded
# !ollama run llama3.1


# Run the model once with a single prompt
!ollama run llama3.1 "Say hello from Llama 3.1 in one sentence."


[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h

## 2. Connect to Ollama & Use it

In [13]:
# !pip install -q langchain langchain-core langchain-community langchain-ollama

In [14]:
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

In [15]:
# Create a connection to the locally served Ollama model
llm = ChatOllama(
    model="llama3.1",
    base_url="http://localhost:11434"
)
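
`ChatOllama` talks to the same HTTP server that `curl` reached earlier. As a rough picture of that exchange, the sketch below builds a comparable request against Ollama's `/api/chat` endpoint using only the standard library. The helper names are hypothetical, and this is not how LangChain itself is implemented:

```python
import json
import urllib.request

def build_chat_request(prompt, model="llama3.1", base_url="http://localhost:11434"):
    """Build a non-streaming POST to Ollama's /api/chat endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a line-by-line stream
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_chat(request):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, `send_chat(build_chat_request("Hello"))` would return the model's reply as a plain string.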

In [16]:
# --- Simple test (English prompt) ---
response_en = llm.invoke("Give me a short description of what Generative AI is.")
print("English Response:\n", response_en, "\n")

English Response:
 content='Generative AI refers to a subset of Artificial Intelligence (AI) that uses algorithms and mathematical models to generate new, original content such as images, music, videos, text, or even code. This type of AI is trained on large datasets and learns patterns, relationships, and structures within the data, allowing it to create novel outputs that are often indistinguishable from human-created ones.\n\nGenerative AI has applications in various fields, including:\n\n* Art: generating realistic images, paintings, or sculptures\n* Music: composing original music pieces\n* Writing: creating short stories, articles, or even entire books\n* Design: generating new product designs or visual concepts\n\nThe goal of Generative AI is to create something new and valuable from scratch, rather than simply manipulating or transforming existing content.' additional_kwargs={} response_metadata={'model': 'llama3.1', 'created_at': '2025-10-24T17:42:13.810841072Z', 'done': True,

In [17]:
# --- Simple test (Persian prompt) ---
# Persian prompt; in English: "Give a short explanation of generative AI."
response_fa = llm.invoke("توضیح کوتاهی درباره هوش مصنوعی مولد  بده.")
print("Persian Response:\n", response_fa, "\n")

Persian Response:
 content='هوش مصنوعی مولد یک زیر شاخه از هوش مصنوعی است که با تولید محتوایใหม و شبیه به واقعیت برای انسانها روبرو است .' additional_kwargs={} response_metadata={'model': 'llama3.1', 'created_at': '2025-10-24T17:42:15.07943419Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1254381756, 'load_duration': 188552385, 'prompt_eval_count': 25, 'prompt_eval_duration': 38076354, 'eval_count': 38, 'eval_duration': 930497964, 'model_name': 'llama3.1'} id='run--8e7f05dd-7719-48e2-9cc9-64bf2645f489-0' usage_metadata={'input_tokens': 25, 'output_tokens': 38, 'total_tokens': 63} 
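
The duration fields in `response_metadata` are reported in nanoseconds, so they can be turned into a throughput figure. A short sketch using the values from the Persian response above:

```python
def tokens_per_second(eval_count, eval_duration_ns):
    """Generation throughput; Ollama reports durations in nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)

# Values copied from the Persian response's response_metadata.
metadata = {
    "total_duration": 1254381756,
    "load_duration": 188552385,
    "prompt_eval_duration": 38076354,
    "eval_count": 38,
    "eval_duration": 930497964,
}

rate = tokens_per_second(metadata["eval_count"], metadata["eval_duration"])
total_seconds = metadata["total_duration"] / 1e9
print(f"{rate:.1f} tokens/s over {total_seconds:.2f} s total")
```

For this run that comes to roughly 41 generated tokens per second on the T4. To print only the text of a reply rather than the whole message object, `response_fa.content` can be used in place of `response_fa`.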



In [18]:
# --- Optional: use a LangChain prompt template for more structured interaction ---
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise and knowledgeable AI assistant."),
    ("human", "{question}")
])
chain = prompt | llm | StrOutputParser()
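
The `|` operator chains the steps so that each one's output becomes the next one's input. The toy stand-in below illustrates that composition idea only. The `Step` class is hypothetical, `fake_llm` replaces the real model, and none of this reflects LangChain's actual implementation:

```python
class Step:
    """Toy runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose: run self first, then feed its result to `other`.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Stand-ins for the prompt template, the model, and the output parser.
format_prompt = Step(lambda inputs: f"system: be concise\nhuman: {inputs['question']}")
fake_llm = Step(lambda text: {"content": f"(reply to) {text.splitlines()[-1]}"})
to_text = Step(lambda message: message["content"])

toy_chain = format_prompt | fake_llm | to_text
print(toy_chain.invoke({"question": "What is Ollama?"}))
```

The real chain follows the same flow: the template fills in `{question}`, the model produces a message object, and `StrOutputParser` reduces that message to a plain string.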

In [19]:
print("Example (LangChain pipeline):")
print(chain.invoke({"question": "Compare local and cloud-based LLMs in 3 bullet points."}))

Example (LangChain pipeline):
Here's a comparison of local and cloud-based Large Language Models (LLMs) in 3 bullet points:

• **Processing Power**: Cloud-based LLMs have access to vast amounts of processing power, making them capable of handling complex tasks and large datasets. Local LLMs, on the other hand, are limited by the hardware specifications of their host device.

• **Data Storage and Management**: Cloud-based LLMs store data externally, which allows for easier management, scalability, and collaboration. Local LLMs require local storage, which can lead to issues with data synchronization and access control.

• **Connectivity and Security**: Cloud-based LLMs rely on internet connectivity, which introduces security risks if not properly managed. Local LLMs, being self-contained, are more secure but may face limitations in terms of model updates, maintenance, and integration with external services.
