# Run an LLM with Ollama server in a Colab Notebook

This Colab notebook demonstrates how to run an LLM using Ollama. We’ll set up an Ollama server within Colab, allowing you to interact with powerful language models directly from your browser.

The setup combines Ollama, which downloads and serves large language models (LLMs), with Colab’s hosted cloud runtimes, so you can run these models from the browser without needing local hardware.

Let’s get started!

By: Sebastian Bassi [DNALinux.com](https://dnalinux.com)

<br>
<br>
<br>

**Note**: If you are viewing this notebook on GitHub or on a non-Colab server, press the following button to run it in Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DNALinux/ai/blob/main/notebooks/Ollama_notebook.ipynb)

# Preparation work (install dependencies)

In [None]:
!apt install -y pciutils lshw
!curl -fsSL https://ollama.com/install.sh | sh


# Start the LLM server (Ollama)

In [2]:
import os
import threading
import subprocess
import requests
import json

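def ollama():
    # Listen on all interfaces on port 11434 and accept requests from any origin
    os.environ['OLLAMA_HOST'] = '0.0.0.0:11434'
    os.environ['OLLAMA_ORIGINS'] = '*'
    # Launch the server as a background process so the notebook stays responsive
    subprocess.Popen(["ollama", "serve"])


# Start the server from a separate thread
ollama_thread = threading.Thread(target=ollama)
ollama_thread.start()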
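
Because `ollama serve` starts in the background, a following cell can run before the server accepts connections. A minimal sketch of a readiness check, assuming the server answers HTTP requests at `http://localhost:11434` (the port set above); the helper names `http_probe` and `wait_for_ollama` are hypothetical and not part of Ollama.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if a GET request to `url` succeeds with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: server not ready yet
        return False


def wait_for_ollama(url="http://localhost:11434", timeout=30.0, interval=0.5, probe=http_probe):
    """Poll `url` until `probe` succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_ollama()` before the first `ollama pull` would make the notebook safer to run with "Run all"; it returns `False` instead of raising if the server never comes up.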

# Download an LLM model


Your choice of model depends on your GPU's capabilities and your tolerance for wait times.

## Key Considerations

Larger models (e.g., **Llama4** or **Llama3.3:70B**) require significant GPU RAM (over 50 GB). On platforms like Google Colab, this is only feasible with an A100 GPU. Don't use a T4 GPU for these models: its 16 GB of memory is not enough to hold them.

Speed vs. Quality: Larger models deliver higher-quality outputs but respond more slowly. Smaller models are faster but produce less accurate results.

Balance: For a middle ground between speed, resource usage, and output quality, consider **Gemma3:12B** or **Llama3.1:8B**.

T4 Compatibility: **Phi4** offers strong performance and, despite its size, is compatible with T4 GPUs.

Avoid the v2 TPU runtime on Google Colab, since it is slow for most tasks in this notebook.
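
To see which of these tiers the current runtime falls into, the GPU memory can be read from `nvidia-smi`. The sketch below is illustrative only: the helper names are hypothetical, and the memory thresholds in `suggest_model` are rough assumptions derived from the guidance above, not measured requirements.

```python
import subprocess


def parse_vram_gib(smi_output: str) -> float:
    """Convert `nvidia-smi` memory.total output (MiB, one line per GPU) to total GiB."""
    values = [int(line) for line in smi_output.splitlines() if line.strip()]
    return sum(values) / 1024


def suggest_model(vram_gib: float) -> str:
    """Map GPU memory to one of the models listed in this notebook.

    The thresholds are assumptions for illustration only.
    """
    if vram_gib >= 50:
        return "llama3.3:70b"
    if vram_gib >= 14:
        return "phi4"
    if vram_gib >= 8:
        return "llama3.1:8b"
    return "llama3.2:3b"


def detect_vram_gib() -> float:
    """Query nvidia-smi; return 0.0 when no NVIDIA GPU or driver is available."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return 0.0
    return parse_vram_gib(out)
```

For example, `suggest_model(detect_vram_gib())` could be used to pick a default before the download cell; the result is only a starting point and should be checked against the model sizes listed on the Ollama site.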

In [3]:
# @title Select models to download
import subprocess
# @markdown ---

# @markdown Select an LLM:

LLM_model = "gemma3:4b" # @param ["phi4", "deepseek-r1:7b", "gemma3:4b", "gemma3:12b", "llama4", "llama3.3:70b", "llama3.2:3b", "llama3.1:8b"]
!ollama pull {LLM_model}
# @markdown ---
# @markdown ### For more information on available models, check [Ollama](https://ollama.com/search)

!ollama list

[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G

The next cells send a prompt to the selected model with `ollama run` and print the model's response.

In [4]:
# @title Run a query against the LLM
# @markdown Select the model and enter the prompt

# @markdown ---
# @markdown ### Select a model:
LLM_model = "gemma3:4b" # @param ["phi4", "gemma3:4b", "gemma3:12b", "llama4", "llama3.3:70b", "llama3.2:3b", "llama3.1:8b"]
# @markdown ### Enter a prompt:
query = "If a train travels at 60 mph for 2 hours and then 40 mph for 3 hours, what is the total distance traveled?" # @param {type:"string"}
import shlex
escaped_input = shlex.quote(query)  # quote the whole prompt so the shell passes it through unchanged
# @markdown ---

!ollama run {LLM_model} {escaped_input}


[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h

In [5]:
# @title Run a query against the LLM
# @markdown Select the model and enter the prompt

# @markdown ---
# @markdown ### Select a model:
LLM_model = "gemma3:4b" # @param ["phi4", "gemma3:4b", "gemma3:12b", "llama4", "llama3.3:70b", "llama3.2:3b", "llama3.1:8b"]
# @markdown ### Enter a prompt:
query = "Generate a haiku about artificial intelligence" # @param {type:"string"}
import shlex
escaped_input = shlex.quote(query)  # quote the whole prompt so the shell passes it through unchanged
# @markdown ---

!ollama run {LLM_model} {escaped_input}


[?2026h[?25l[1G[?25h[?2026l[?25l[?2026h[?25l[1G[?25h[?2026l[2K[1G[?25hCode[?25l[?25h learns[?25l[?25h and[?25l[?25h evolves[?25l[?25h,[?25l[?25h
[?25l[?25hMim[?25l[?25hicking[?25l[?25h a[?25l[?25h human[?25l[?25h mind[?25l[?25h,[?25l[?25h
[?25l[?25hFuture[?25l[?25h'[?25l[?25hs[?25l[?25h logic[?25l[?25h blooms[?25l[?25h.[?25l[?25h

[?25l[?25h

In [6]:
# @title Run a query against the LLM
# @markdown Select the model and enter the prompt

# @markdown ---
# @markdown ### Select a model:
LLM_model = "gemma3:4b" # @param ["phi4", "gemma3:4b", "gemma3:12b", "llama4", "llama3.3:70b", "llama3.2:3b", "llama3.1:8b"]
# @markdown ### Enter a prompt:
query = "What is the difference between mitosis and meiosis in cell biology?" # @param {type:"string"}
import shlex
escaped_input = shlex.quote(query)  # quote the whole prompt so the shell passes it through unchanged
# @markdown ---

!ollama run {LLM_model} {escaped_input}


[?2026h[?25l[1G[?25h[?2026l[?25l[?2026h[?25l[1G[?25h[?2026l[2K[1G[?25hOkay[?25l[?25h,[?25l[?25h let[?25l[?25h'[?25l[?25hs[?25l[?25h break[?25l[?25h down[?25l[?25h the[?25l[?25h differences[?25l[?25h between[?25l[?25h mitosis[?25l[?25h and[?25l[?25h meiosis[?25l[?25h –[?25l[?25h they[?25l[?25h'[?25l[?25hre[?25l[?25h both[?25l[?25h processes[?25l[?25h of[?25l[?25h cell[?25l[?25h division[?25l[?25h,[?25l[?25h but[?25l[?25h they[?25l[?25h serve[?25l[?25h dramatically[?25l[?25h different[?25l[?25h purposes[?25l[?25h.[?25l[?25h Here[?25l[?25h'[?25l[?25hs[?25l[?25h a[?25l[?25h detailed[?25l[?25h comparison[?25l[?25h:[?25l[?25h

[?25l[?25h**[?25l[?25h1[?25l[?25h.[?25l[?25h Mit[?25l[?25hosis[?25l[?25h:[?25l[?25h Cell[?25l[?25h Division[?25l[?25h for[?25l[?25h Growth[?25l[?25h &[?25l[?25h Repair[?25l[?25h**[?25l[?25h

[?25l[?25h*[?25l[?25h **[?25l[?25hPurpose[?25l[?25h:**[

# Advanced options

In [None]:
import tempfile


# @title Adjust model parameters
# @markdown To adjust model parameters in Ollama, you need to create a derived model. This cell generates one.

# @markdown **Warning**: Not all models support all parameters. Using unsupported parameters may produce a degraded model.

# @markdown ---
# @markdown Select a model to change (input model):
LLM_model = "gemma3:4b" # @param ["phi4", "gemma3:4b", "gemma3:12b", "llama4", "llama3.3:70b", "llama3.2:3b", "llama3.1:8b"]
# @markdown Enter new model name:
new_model = "gemma3T07" # @param {type:"string"}
# @markdown Enter new template file name (if blank will use a random name):
tpl_fn = "" # @param {type:"string","placeholder":"Modelfile"}

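if tpl_fn == "":
    # Create a uniquely named temporary file (tempfile.mktemp is deprecated and unsafe)
    with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as tmp:
        tpl_fn = tmp.name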

# @markdown Temperature:  Increasing the temperature will make the model answer more creatively.
temp = 0.7 # @param {"type":"slider","min":0,"max":1,"step":0.05}
# @markdown num_ctx: Sets the size of the context window used to generate the next token
num_ctx = 4018 # @param {"type":"slider","min":1024,"max":18000,"step":1}
# @markdown seed: Sets the random number seed used for generation. Using the same seed makes the model generate the same text for the same prompt.
seed = 4482 # @param {"type":"slider","min":0,"max":10000,"step":1}
# @markdown Maximum number of tokens to predict when generating text. (Default: -1, infinite generation)
num_predict = 4790 # @param {"type":"slider","min":-1,"max":10000,"step":1}
# @markdown Top K: Reduces the probability of generating nonsense. A higher value will give more diverse answers, while a lower value will be more conservative.
top_k = 20 # @param {"type":"slider","min":1,"max":100,"step":1}
# @markdown Top P: Works together with top-k. A higher value will lead to more diverse text, while a lower value will generate more focused and conservative text.
top_p = 0.47 # @param {"type":"slider","min":0,"max":1,"step":0.01}
# @markdown Min P: Alternative to the top_p, and aims to ensure a balance of quality and variety. The parameter p represents the minimum probability for a token to be considered, relative to the probability of the most likely token. For example, with p=0.05 and the most likely token having a probability of 0.9, logits with a value less than 0.045 are filtered out.
min_p = 0 # @param {"type":"slider","min":0,"max":1,"step":0.01}


# @markdown ---

# @markdown [More about model parameters](https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values)

# @markdown ---
model_file = f"""FROM {LLM_model}
PARAMETER temperature {temp}
PARAMETER num_ctx {num_ctx}
PARAMETER seed {seed}
PARAMETER num_predict {num_predict}
PARAMETER top_k {top_k}
PARAMETER top_p {top_p}
PARAMETER min_p {min_p}
"""

#mdir = "/content/miniforge3/envs/ml/lib/python3.10/site-packages/ollama/models/"

with open(tpl_fn, "w") as f:
    f.write(model_file)


!ollama create {new_model} -f {tpl_fn}
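
The Min P description in the cell above can be checked with a few lines of arithmetic. The sketch below illustrates the filtering rule only; it is not Ollama's implementation, and the function name `min_p_filter` and the example probabilities are made up for illustration.

```python
def min_p_filter(probs: dict[str, float], min_p: float) -> dict[str, float]:
    """Keep tokens whose probability is at least min_p times the top probability,
    then renormalize the survivors so they sum to 1."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# The top token has probability 0.9, so with min_p=0.05 the cutoff is 0.9 * 0.05 = 0.045.
example = {"the": 0.9, "a": 0.06, "an": 0.03, "this": 0.01}
filtered = min_p_filter(example, min_p=0.05)
print(sorted(filtered))  # tokens that survive the cutoff
```

Here "an" (0.03) and "this" (0.01) fall below the 0.045 cutoff and are dropped, while "the" and "a" remain. With `min_p = 0`, the default in the cell above, nothing is filtered.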

