## Run Ollama in Colab
<a target="_blank" href="https://colab.research.google.com/github/LiorGazit/agentic_actions_locally_hosted/blob/main/run_ollama_in_colab.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

[List of available Ollama LLMs.](https://ollama.com/library)  
Note: This code runs in Colab but not on Windows, because of how Ollama is installed here. It should also run on Linux in general, but I have not tested it outside of Google Colab.  

#### Here's a list of issues to take care of:
1. Insert a Colab badge.  
2. Break this notebook down into separate .py files to be sourced.  
3. **Managing Ollama Server Lifecycles:**
    The server currently runs as a background process (`ollama serve`). Consider a controlled lifecycle using Docker containers or managed processes (e.g., via supervisord or systemd).  
4. [x] Add a `.gitignore` that ignores `*.log`.  
5. [x] Apply "**Explicit Error Handling**" to each of the shell commands.  
6. [x] **Resource Monitoring & Logging:**  
    Capture and monitor resource utilization (CPU/GPU, memory usage) to ensure sustainable performance.  
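
One lightweight way to approach the server-lifecycle item, short of Docker or systemd, is to wrap the server in a context manager that always stops it on exit. The sketch below is only an illustration under that assumption: `managed_process` and its parameters are hypothetical names, not part of this notebook or of Ollama.

```python
import contextlib
import os
import subprocess


@contextlib.contextmanager
def managed_process(command, log_path="serve.log", extra_env=None, stop_timeout=10):
    """Run `command` as a child process and stop it when the `with` block exits.

    `command` is an argument list, e.g. ["ollama", "serve"] (hypothetical usage).
    """
    env = {**os.environ, **(extra_env or {})}
    with open(log_path, "w") as log_file:
        process = subprocess.Popen(
            command, stdout=log_file, stderr=subprocess.STDOUT, env=env
        )
        try:
            yield process
        finally:
            process.terminate()  # ask politely first (SIGTERM on Linux)
            try:
                process.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                process.kill()  # force-stop if it did not exit in time
                process.wait()


# Hypothetical usage with the server from this notebook:
# with managed_process(["ollama", "serve"],
#                      extra_env={"OLLAMA_HOST": "127.0.0.1:11434"}) as server:
#     ...  # pull models, run chains; the server is stopped afterwards
```

Because the command is passed as a list rather than through a shell, `process.pid` refers to the server itself, which is what makes a clean shutdown possible.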

In [3]:
monitor_resources = True

In [4]:
# This is a .py script to be sourced: (I will need to format it as a separate file to be called with a variable llm_name)
import shutil
import subprocess
from time import sleep
import requests

# Choice of LLM:
llm_name = "gemma3"  # "mistral-small"  # "mistral"

# Install Ollama via shell
if shutil.which('ollama') is None:
    print("Ollama not found, installing...")
    shell_output_curl_command = subprocess.run(
        'curl -fsSL https://ollama.com/install.sh | sh',
        capture_output=True, text=True, shell=True
    )
    if shell_output_curl_command.returncode != 0:
        raise RuntimeError(f"Error installing Ollama: {shell_output_curl_command.stderr}")
else:
    print("Ollama is already installed.")

# Start Ollama server in background
print("Starting Ollama server...")
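# `exec` replaces the launching shell with `ollama serve`, so the PID printed
# below belongs to the server itself; Popen already returns without blocking.
process_serve = subprocess.Popen(
    'exec env OLLAMA_HOST=127.0.0.1:11434 ollama serve > serve.log 2>&1',
    shell=True
)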
print(f"Started Ollama process with PID: {process_serve.pid}")

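# Function to ensure Ollama is ready
def wait_for_ollama_ready(timeout=15):
    """Poll the server about once per second until it answers or `timeout` attempts pass."""
    print("Waiting for Ollama server to be ready...")
    for _ in range(timeout):
        try:
            response = requests.get("http://localhost:11434", timeout=1)
            if response.status_code == 200:
                print("Ollama server is ready.")
                return
        except requests.exceptions.RequestException:
            pass  # server is not accepting connections yet
        sleep(1)  # wait between attempts, also after a non-200 response
    raise RuntimeError("Ollama server failed to start within timeout.")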

wait_for_ollama_ready()

# Pull LLM model
print(f"Pulling '{llm_name}' LLM model...")
shell_output_pull_LLM = subprocess.run(
    f'ollama pull {llm_name}', capture_output=True, text=True, shell=True
)
if shell_output_pull_LLM.returncode != 0:
    raise RuntimeError(f"Error pulling '{llm_name}': {shell_output_pull_LLM.stderr}")

# Verify available models
shell_output_models_list = subprocess.run(
    'ollama list', capture_output=True, text=True, shell=True
)
if shell_output_models_list.returncode != 0:
    raise RuntimeError(f"Error listing models: {shell_output_models_list.stderr}")
else:
    print(f"Available models:\n{shell_output_models_list.stdout}")

# Install LangChain Ollama integration
print("Installing langchain-ollama via pip.")
pip_langchainollama_command = subprocess.run(
    'pip install -U langchain-ollama', capture_output=True, text=True, shell=True
)
if pip_langchainollama_command.returncode != 0:
    raise RuntimeError(f"Error installing 'langchain-ollama': {pip_langchainollama_command.stderr}")

# Import and configure LLM
from langchain_ollama.llms import OllamaLLM
model = OllamaLLM(model=llm_name)

print("LLM setup complete and ready for use.")


Ollama is already installed.
Starting Ollama server...
Started Ollama process with PID: 1757
Waiting for Ollama server to be ready...
Ollama server is ready.
Pulling 'gemma3' LLM model...
Available models:
NAME             ID              SIZE      MODIFIED               
gemma3:latest    a2af6cc3eb7f    3.3 GB    Less than a second ago    

Installing langchain-ollama via pip.
LLM setup complete and ready for use.
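
The cell above verifies the pull by printing `ollama list`. As a hedged alternative, the same check can be done programmatically against the JSON that Ollama's `/api/tags` endpoint returns. The helper name `model_is_available` and the sample payload are illustrative assumptions; the payload only mimics the `{"models": [{"name": ...}]}` shape and is not captured from this run.

```python
def model_is_available(tags_payload, llm_name):
    """Return True if `llm_name` appears in an /api/tags-style payload.

    A bare name such as "gemma3" is treated as "gemma3:latest".
    """
    wanted = llm_name if ":" in llm_name else f"{llm_name}:latest"
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return wanted in names


# Illustrative payload (assumed shape, not real output from this notebook):
sample_payload = {"models": [{"name": "gemma3:latest"}]}

print(model_is_available(sample_payload, "gemma3"))
print(model_is_available(sample_payload, "mistral"))

# Against a running server, the payload would come from something like:
# tags_payload = requests.get("http://localhost:11434/api/tags", timeout=5).json()
```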


In [5]:
# This is a .py script to be sourced:
# ------------- Resource Monitoring and Logging -------------
from datetime import datetime
from time import sleep
import subprocess
import threading

import psutil

def log_resource_usage(logfile='resource_usage.log', duration=60, interval=5):
    """
    Logs CPU, Memory, and GPU utilization every `interval` seconds for `duration` seconds.
    """
    print(f"Starting resource monitoring for {duration} seconds (logged to '{logfile}')...")
    end_time = datetime.now().timestamp() + duration
    with open(logfile, 'w') as f:
        f.write("Timestamp,CPU_%,Memory_%,GPU_%\n")
        while datetime.now().timestamp() < end_time:
            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            cpu_usage = psutil.cpu_percent(interval=1)
            memory_usage = psutil.virtual_memory().percent
            gpu_usage = get_gpu_usage()
            log_line = f"{timestamp},{cpu_usage},{memory_usage},{gpu_usage}\n"
            f.write(log_line)
            f.flush()  # Flush each line so the log can be read while monitoring runs
            print(log_line.strip())
            sleep(max(0, interval - 1))

def get_gpu_usage():
    """
    Returns GPU utilization (%) if an NVIDIA GPU is present, else returns 'N/A'.
    Requires NVIDIA GPUs and nvidia-smi installed.
    """
    try:
        gpu_query = subprocess.check_output(
            "nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits",
            shell=True, stderr=subprocess.DEVNULL, text=True
        )
        gpu_usage = gpu_query.strip().split("\n")[0]  # first GPU
        return gpu_usage
    except Exception:
        return "N/A"


# Start resource logging (in parallel, won't block):
monitor_thread = threading.Thread(
    target=log_resource_usage,
    kwargs={'duration': 3600, 'interval': 10},
    daemon=True
)

if monitor_resources:
    monitor_thread.start()

Starting resource monitoring for 3600 seconds (logged to 'resource_usage.log')...
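
Once monitoring has run for a while, the CSV written by `log_resource_usage` can be summarized. The following is a minimal sketch, assuming the header `Timestamp,CPU_%,Memory_%,GPU_%` used above; `summarize_usage` is a hypothetical helper and the sample rows are only for illustration.

```python
import csv
import io


def summarize_usage(csv_text):
    """Return sample count, mean/max CPU % and max memory % from the log text."""
    cpu, mem = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cpu.append(float(row["CPU_%"]))
        mem.append(float(row["Memory_%"]))
    if not cpu:
        return None  # header only, nothing logged yet
    return {
        "samples": len(cpu),
        "cpu_mean": sum(cpu) / len(cpu),
        "cpu_max": max(cpu),
        "mem_max": max(mem),
    }


# Illustrative rows in the same format as resource_usage.log:
sample_log = (
    "Timestamp,CPU_%,Memory_%,GPU_%\n"
    "2025-05-11 14:24:40,95.5,8.8,0\n"
    "2025-05-11 14:24:50,57.5,42.8,0\n"
    "2025-05-11 14:25:00,52.5,42.8,0\n"
)
summary = summarize_usage(sample_log)
print(summary)

# With the real file: summarize_usage(open("resource_usage.log").read())
```

The GPU column is skipped here because it may contain `N/A` when no NVIDIA GPU is present.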


In [6]:
from langchain_core.prompts import ChatPromptTemplate

template = """Question: {question}

Answer: Provide concise and simple answer!"""

prompt = ChatPromptTemplate.from_template(template)

chain = prompt | model

print(chain.invoke({"question": "What is a good way to continue this sentence: 'you is a ...'? It has to be syntactically correct!"}))

2025-05-11 14:24:40,95.5,8.8,0
2025-05-11 14:24:50,57.5,42.8,0
2025-05-11 14:25:00,52.5,42.8,0
2025-05-11 14:25:10,54.0,42.9,0
2025-05-11 14:25:20,53.2,42.8,0
2025-05-11 14:25:30,99.5,42.9,0
2025-05-11 14:25:40,52.5,42.7,0
2025-05-11 14:25:50,52.5,42.5,0
2025-05-11 14:26:00,52.8,42.6,0
2025-05-11 14:26:10,52.8,42.6,0
2025-05-11 14:26:20,99.5,42.7,0
2025-05-11 14:26:30,52.5,42.5,0
2025-05-11 14:26:40,52.0,42.6,0
2025-05-11 14:26:50,53.2,42.2,0
2025-05-11 14:27:00,53.0,42.4,0
You are a friend.
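
The timestamped lines printed before the answer come from the monitoring thread, which keeps running for its full duration. If stopping it on demand is useful, one option is a `threading.Event` that the loop checks between samples. This is a sketch under that assumption: `run_until_stopped` is a hypothetical stand-in for the logging loop, not the notebook's `log_resource_usage`.

```python
import threading
import time


def run_until_stopped(stop_event, sample, interval=10.0, duration=3600.0):
    """Call `sample()` every `interval` seconds until stopped or `duration` elapses."""
    deadline = time.monotonic() + duration
    count = 0
    while not stop_event.is_set() and time.monotonic() < deadline:
        sample()
        count += 1
        stop_event.wait(interval)  # sleeps, but wakes up early once the event is set
    return count


# Small demonstration with a short interval so it finishes quickly.
stop_event = threading.Event()
samples = []
thread = threading.Thread(
    target=run_until_stopped,
    args=(stop_event, lambda: samples.append(time.time())),
    kwargs={"interval": 0.05, "duration": 5},
    daemon=True,
)
thread.start()
time.sleep(0.2)
stop_event.set()        # request a stop
thread.join(timeout=2)  # the loop exits at its next check
print(f"collected {len(samples)} samples, thread alive: {thread.is_alive()}")
```

Using `stop_event.wait(interval)` instead of `sleep(interval)` means a stop request takes effect right away rather than after the current sleep finishes.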
