<a href="https://colab.research.google.com/github/TouseeqZ/LLM-Collab-deployment/blob/main/ollama_collab_ngrok.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install Ollama

In [1]:
# Download and run the Ollama Linux install script
!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


Retrieve Ngrok Token

In [2]:
# Get Ngrok authentication token from Colab secrets environment
from google.colab import userdata
NGROK_AUTH_TOKEN = userdata.get('NGROK_AUTH_TOKEN')
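
If the secret has not been added to the Colab secrets panel, the lookup above fails. The following is a minimal sketch of a more defensive lookup, not part of the original notebook. The helper name `get_ngrok_token` and the fallback to an environment variable of the same name are assumptions made for illustration.

```python
import os

def get_ngrok_token(name='NGROK_AUTH_TOKEN'):
    """Return the ngrok token from Colab secrets, else from the environment."""
    try:
        # Only importable inside a Colab runtime
        from google.colab import userdata
    except ImportError:
        userdata = None

    token = None
    if userdata is not None:
        try:
            token = userdata.get(name)
        except Exception:
            # Assumption: a missing secret or denied notebook access raises here
            token = None

    # Fallback (hypothetical convention): an environment variable of the same name
    token = token or os.environ.get(name)
    if not token:
        raise RuntimeError(f'{name} is not set in Colab secrets or the environment')
    return token
```

In a notebook cell this would be used as `NGROK_AUTH_TOKEN = get_ngrok_token()`, which stops early with a readable error instead of passing an empty token to ngrok.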

Install Dependencies and Set Up Environment

In [3]:
# Install packages: pyngrok for Ngrok integration, and aiohttp (an async HTTP client).
# The subprocess handling below uses the standard-library asyncio module, not aiohttp.
!pip install aiohttp pyngrok

import asyncio
import os

# Set LD_LIBRARY_PATH to prioritize system NVIDIA libraries over built-in ones
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})

# Define an async helper that runs a command and streams its output as it arrives
async def run(cmd):
    print('>>> starting', *cmd)
    p = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE
    )

    # Read stdout and stderr concurrently so neither pipe fills up and blocks the child
    async def pipe(lines):
        async for line in lines:
            print(line.strip().decode('utf-8'))

    await asyncio.gather(
        pipe(p.stdout),
        pipe(p.stderr),
    )

    # Both streams are closed; wait for the process to exit so it is reaped
    await p.wait()

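# Authenticate with Ngrok using the token
# Note: run() echoes the full command, so the token appears in the cell output; clear the output before sharing the notebook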
await run(['ngrok', 'config', 'add-authtoken', NGROK_AUTH_TOKEN])

Collecting pyngrok
  Downloading pyngrok-7.2.0-py3-none-any.whl.metadata (7.4 kB)
Downloading pyngrok-7.2.0-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.0
>>> starting ngrok config add-authtoken <redacted>
Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
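
The `run` helper reads stdout and stderr at the same time, which is what lets long-running processes stream their logs into the cell. The sketch below illustrates the same pattern in a form that can be tried without ngrok or Ollama. It is a hypothetical variant, `run_collect`, that returns the exit code and the collected lines instead of printing them, and it uses a short Python child process as a stand-in command.

```python
import asyncio
import sys

async def run_collect(cmd):
    """Variant of run() that returns (exit_code, lines) instead of printing."""
    p = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    collected = []

    async def pipe(stream, label):
        # Tag each decoded line with the stream it came from
        async for line in stream:
            collected.append((label, line.strip().decode('utf-8')))

    # Drain both pipes concurrently, then wait for the process to exit
    await asyncio.gather(pipe(p.stdout, 'out'), pipe(p.stderr, 'err'))
    return await p.wait(), collected

def demo():
    # Stand-in child process that writes one line to each stream
    child = [sys.executable, '-c',
             "import sys; print('hello'); print('oops', file=sys.stderr)"]
    return asyncio.run(run_collect(child))
```

Inside a notebook an event loop is already running, so call `await run_collect([...])` directly there rather than `demo()`, which relies on `asyncio.run` and is meant for a plain Python script.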


Start Ollama Server and Ngrok Tunnel

In [None]:
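# Run both tasks concurrently; this cell keeps running until it is interrupted:
# 1. Start the Ollama server.
# 2. Start Ngrok to forward HTTP traffic to the local Ollama API on localhost:11434.
#    --host-header rewrites the Host header so forwarded requests look local to Ollama.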
await asyncio.gather(
    run(['ollama', 'serve']),
    run(['ngrok', 'http', '--log', 'stderr', '11434', '--host-header', 'localhost:11434']),
    # Uncomment the next line and replace with your Ngrok domain if using a static URL
    # run(['ngrok', 'http', '--log', 'stderr', '11434', '--host-header', 'localhost:11434', '--domain', 'insert-your-static-ngrok-domain-here']),
)
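
While this cell is running, the Ollama API can be reached from another machine through the public URL that ngrok reports in its log. The following is a minimal client sketch, assuming Ollama's `/api/generate` endpoint with streaming disabled. The function names, the base URL, and the model name are placeholders for illustration, and the model must already have been pulled on the server.

```python
import json
import urllib.request

def build_generate_request(base_url, model, prompt):
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    payload = {'model': model, 'prompt': prompt, 'stream': False}
    return urllib.request.Request(
        url=base_url.rstrip('/') + '/api/generate',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

def generate(base_url, model, prompt, timeout=120):
    """Send the request and return the generated text from the JSON reply."""
    req = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))['response']
```

A call would look like `generate('https://<your-ngrok-url>', '<model-name>', 'Why is the sky blue?')`, run from a separate machine or notebook because the serving cell above does not return.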

>>> starting ollama serve
>>> starting ngrok http --log stderr 11434 --host-header localhost:11434
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIN7Bi8EU1hMgsSPu4z1YwatZP7rdiN1lL5jhcH2ldg6d

2024/09/25 10:38:41 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.