Running Ollama on Google Colab: A Step-by-Step Guide
https://techxplainator.com/running-ollama-on-google-colab-a-step-by-step-guide/

Learning to-do:
* Run Ollama on Colab, with ngrok on Colab forwarding traffic to a public URL.
* Then use that public URL from the local machine to reach the Ollama instance on Colab.
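
The second step can be sketched from the local side as follows. This is a minimal sketch, assuming the tunnel is already up and that the model has been pulled on Colab. `BASE_URL` is a hypothetical placeholder, and the helper names `build_generate_request` and `generate` are made up for illustration. It posts to Ollama's `/api/generate` endpoint with streaming turned off.

```python
import json
import urllib.request

# Assumption: replace this with the public URL that ngrok reports for your tunnel.
BASE_URL = "https://example.ngrok-free.app"  # hypothetical placeholder


def build_generate_request(base_url, model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=120):
    """Send the request and return the generated text."""
    req = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call from a local machine (needs the Colab tunnel to be running):
# print(generate(BASE_URL, "llama3:instruct", "Why is the sky blue?"))
```

Only the standard library is used here, so nothing extra needs to be installed locally.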

In [1]:
# Download and run the Ollama Linux install script
!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [2]:
# Get Ngrok authentication token from colab secrets environment
from google.colab import userdata
NGROK_AUTH_TOKEN = userdata.get('NGROK_AUTH_TOKEN')
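
If the same notebook is sometimes run outside Colab, the secret lookup above fails at import time. One possible fallback is sketched below. It assumes the token is also exported as an environment variable of the same name, and `load_secret` is a hypothetical helper name, not part of any library.

```python
import os


def load_secret(name):
    """Return a secret from Colab's secret store, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Secret missing or notebook access not granted: fall back to the environment.
        return os.environ.get(name)


# Usage: NGROK_AUTH_TOKEN = load_secret('NGROK_AUTH_TOKEN')
```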

In [3]:
# Install:
#  1. aiohttp, as in the original guide (the subprocess helper below itself
#     only needs asyncio from the standard library)
#  2. pyngrok, a Python wrapper that also provides the ngrok binary
!pip install aiohttp pyngrok

import asyncio
import os

# Set LD_LIBRARY_PATH so the system NVIDIA library becomes preferred
# over the built-in library. This is particularly important for
# Google Colab which installs older drivers
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})

# Define run, a helper that runs a subcommand asynchronously.
# It takes one argument: cmd, the command and its arguments as a list of strings.
async def run(cmd):
  print('>>> starting', *cmd)
  p = await asyncio.subprocess.create_subprocess_exec(
      *cmd,
      stdout=asyncio.subprocess.PIPE,
      stderr=asyncio.subprocess.PIPE
  )

  # Asynchronously iterate over the lines of a stream and print each one
  # after stripping whitespace and decoding it as UTF-8.
  async def pipe(lines):
    async for line in lines:
      print(line.strip().decode('utf-8'))

  # Read the subprocess's stdout and stderr concurrently, so both regular
  # output and error messages are printed as they arrive.
  await asyncio.gather(
      pipe(p.stdout),
      pipe(p.stderr),
  )


# Authenticate with Ngrok
await asyncio.gather(
  run(['ngrok', 'config', 'add-authtoken', NGROK_AUTH_TOKEN])
)

Collecting pyngrok
  Downloading pyngrok-7.2.11-py3-none-any.whl.metadata (9.4 kB)
Downloading pyngrok-7.2.11-py3-none-any.whl (25 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.11
>>> starting ngrok config add-authtoken <NGROK_AUTH_TOKEN redacted>
Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


[None]
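
Because `run` prints every argument it receives, any secret passed on the command line ends up in the cell output. A minimal sketch of masking such arguments before printing is shown here. `mask_secrets` is a hypothetical helper, and `"my-token"` is a dummy value standing in for the real token.

```python
def mask_secrets(cmd, secrets, placeholder="****"):
    """Return a copy of cmd with any argument found in secrets replaced."""
    hidden = {s for s in secrets if s}
    return [placeholder if arg in hidden else arg for arg in cmd]


cmd = ["ngrok", "config", "add-authtoken", "my-token"]  # dummy token value
print(">>> starting", *mask_secrets(cmd, ["my-token"]))
# prints: >>> starting ngrok config add-authtoken ****
```

Inside `run`, the same idea would mean printing `mask_secrets(cmd, [NGROK_AUTH_TOKEN])` instead of `cmd`, while still passing the unmasked `cmd` to the subprocess.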

In [4]:
import subprocess, atexit, time

ollama_proc = subprocess.Popen(['ollama', 'serve'])
ngrok_proc  = subprocess.Popen(
    ['ngrok', 'http', '11434', '--host-header', 'localhost:11434',
     '--domain', 'nearby-adequately-python.ngrok-free.app']
)

# Ensure they’re killed when the notebook detaches
atexit.register(ollama_proc.kill)
atexit.register(ngrok_proc.kill)

print("Servers launched; you can now run more cells.")

Servers launched; you can now run more cells.
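
`subprocess.Popen` returns immediately, so the message above does not guarantee that the Ollama API is already accepting connections. A possible readiness check is sketched below. `wait_for_port` is a hypothetical helper, and the timeout values are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage in the notebook, after launching the servers:
# if not wait_for_port("127.0.0.1", 11434):
#     print("Ollama did not come up in time")
```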


In [10]:
!ollama list

NAME               ID              SIZE      MODIFIED           
llama3:instruct    365c0bd3c000    4.7 GB    About a minute ago    


In [11]:
!curl -s http://127.0.0.1:11434/api/tags | jq .

{
  "models": [
    {
      "name": "llama3:instruct",
      "model": "llama3:instruct",
      "modified_at": "2025-06-23T18:00:36.905612106Z",
      "size": 4661224676,
      "digest": "365c0bd3c000a25d28ddbf732fe1c6add414de7275464c4e4d1c3b5fcb5d8ad1",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "llama",
        "families": [
          "llama"
        ],
        "parameter_size": "8.0B",

In [7]:
!ollama pull llama3:instruct

[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l[A[1G[?25h[?2026l[?2026h[?25l

In [12]:
# This was the original command, but it started without serving any model.
# With the last commands above, I download the model and check that Ollama
# is serving it.

# Run multiple tasks concurrently:
#  1. Start the Ollama server.
#  2. Start ngrok to forward HTTP traffic from the local ollama api running on localhost:11434.
#     Instructions come from Ollama doc: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-use-ollama-with-ngrok
await asyncio.gather(
    run(['ollama', 'serve']),

    # If you don't want to map to a static URL in ngrok, uncomment the first ngrok line below and comment out the second before running this cell
    # run(['ngrok', 'http', '--log', 'stderr', '11434', '--host-header', 'localhost:11434']),
    run(['ngrok', 'http', '--log', 'stderr', '11434', '--host-header', 'localhost:11434', '--domain', 'nearby-adequately-python.ngrok-free.app']),
)

>>> starting ollama serve
>>> starting ngrok http --log stderr 11434 --host-header localhost:11434 --domain nearby-adequately-python.ngrok-free.app
Error: listen tcp 127.0.0.1:11434: bind: address already in use
t=2025-06-23T18:02:37+0000 lvl=info msg="no configuration paths supplied"
t=2025-06-23T18:02:37+0000 lvl=info msg="using configuration at default config path" path=/root/.config/ngrok/ngrok.yml
t=2025-06-23T18:02:37+0000 lvl=info msg="open config file" path=/root/.config/ngrok/ngrok.yml err=nil
t=2025-06-23T18:02:37+0000 lvl=warn msg="can't bind default web address, trying alternatives" obj=web addr=127.0.0.1:4040
t=2025-06-23T18:02:37+0000 lvl=info msg="starting web service" obj=web addr=127.0.0.1:4041 allow_hosts=[]
t=2025-06-23T18:02:37+0000 lvl=eror msg="failed to reconnect session" obj=tunnels.session err="authentication failed: Your account is limited to 1 simultaneous ngrok agent sessions.\nYou can run multiple simultaneous tunnels from a single agent session by defining

[None, None]
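
Both errors above come from starting a second copy of something that is already running: `ollama serve` cannot bind port 11434 again, and the ngrok account allows only one agent session. A small pre-flight check for the port case is sketched below. `port_is_free` is a hypothetical helper that simply tries to bind the address.

```python
import socket


def port_is_free(host, port):
    """Return True if nothing is bound to host:port, by trying to bind it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


# Usage: only start a new server when the port is not taken.
# if port_is_free("127.0.0.1", 11434):
#     ollama_proc = subprocess.Popen(['ollama', 'serve'])
```

This does not address the ngrok session limit. For that, the earlier ngrok process has to be stopped before a new one is started.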