## Run Ollama in Colab
<a target="_blank" href="https://colab.research.google.com/github/LiorGazit/agentic_actions_locally_hosted/blob/main/run_ollama_in_colab.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

[List of available Ollama LLMs.](https://ollama.com/library)  
Note: This code runs in Colab but not on Windows, because of how Ollama is installed here. I believe it would run on Linux in general, but I haven't tested it outside Google Colab.  

#### Here's a list of issues to take care of:
3. Make the monitoring chunk a .py file as well  
4. Enhance the `spin_up_LLM()` function to accommodate a remote LLM by OpenAI  
5. **Managing Ollama Server Lifecycles:**
    Currently, the server runs as a background process (`ollama serve`). Consider a controlled lifecycle using Docker containers or managed processes (e.g., via supervisord or systemd).  
6. [x] Break this notebook down to separate .py files to be sourced.  
7. [x] Insert a Colab badge.  
8. [x] Add a `.gitignore` containing `*.log`  
9. [x] Apply "**Explicit Error Handling**" for each of the shell commands (see chat)  
10. [x] **Resource Monitoring & Logging:**  
    Capture and monitor resource utilization (CPU/GPU, memory usage) to ensure sustainable performance.  
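
For the server-lifecycle item above, one lightweight option short of Docker or systemd is to wrap the process in a context manager so it is always stopped when the block exits. This is only a sketch under assumptions: `managed_process` is a hypothetical helper (it is not part of `spin_up_LLM`), and the demo uses a stand-in Python child process so it runs without Ollama installed. In the notebook the command would presumably be `["ollama", "serve"]`.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_process(cmd, stop_timeout=10):
    """Start `cmd` as a child process and make sure it is stopped on exit.

    Hypothetical helper for illustration; not part of the notebook's code.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        yield proc
    finally:
        if proc.poll() is None:          # process is still running
            proc.terminate()             # ask politely first
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()              # force-stop if it does not exit
                proc.wait()


# Stand-in command so the sketch runs anywhere (assumption: replace with
# ["ollama", "serve"] when Ollama is installed).
demo_cmd = [sys.executable, "-c", "import time; time.sleep(30)"]

with managed_process(demo_cmd) as proc:
    print("running:", proc.poll() is None)
print("stopped:", proc.poll() is not None)
```

The `finally` clause runs even if the body raises, which is the main advantage over a bare background process whose PID has to be tracked by hand.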

In [None]:
monitor_resources = True

In [4]:
# 1) import our helper
from spin_up_LLM import spin_up_LLM

# 2) choose your model name and mode
llm_name = "gemma3"
mode    = "local"   # or "remote" in future

# 3) spin it up
model = spin_up_LLM(chosen_llm=llm_name, local_or_remote=mode)

# 4) use it immediately!
from langchain_core.prompts import ChatPromptTemplate

template = """Question: {question}

Answer: Let's think step by step. Provide your answer in concise bullet points!"""

prompt = ChatPromptTemplate.from_template(template)
chain  = prompt | model

print(chain.invoke({"question": "solve 1+2"}))


🚀 Installing Ollama...
🚀 Starting Ollama server...
→ Ollama PID: 5610
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'gemma3'…
Available models:
NAME             ID              SIZE      MODIFIED               
gemma3:latest    a2af6cc3eb7f    3.3 GB    Less than a second ago    

🚀 Installing langchain-ollama…
*   1 + 2
*   3
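
The "Waiting for Ollama to be ready" step in the output above can be approximated by polling the server's HTTP endpoint until it answers. The sketch below is an assumption about how such a wait could be written, not the actual code inside `spin_up_LLM`; `wait_until_ready` is a hypothetical name, and the URL is passed in by the caller (Ollama listens on `http://localhost:11434` by default).

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url, timeout=60.0, interval=1.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass.

    Returns True if the server answered in time, False otherwise.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass                      # server not up yet; retry below
        time.sleep(interval)
    return False
```

A call such as `wait_until_ready("http://localhost:11434", timeout=60)` would then gate the model pull, and a `False` result is a natural place to raise an explicit error instead of continuing.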



In [None]:
# This is a .py script to be sourced:
# ------------- Resource Monitoring and Logging -------------
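import subprocess  # used by get_gpu_usage() to call nvidia-smi
import threading
from datetime import datetime
from time import sleep  # used by log_resource_usage() between samples

import psutil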

def log_resource_usage(logfile='resource_usage.log', duration=60, interval=5):
    """
    Logs CPU, Memory, and GPU utilization every `interval` seconds for `duration` seconds.
    """
    print(f"Starting resource monitoring for {duration} seconds (logged to '{logfile}')...")
    end_time = datetime.now().timestamp() + duration
    with open(logfile, 'w') as f:
        f.write("Timestamp,CPU_%,Memory_%,GPU_%\n")
        while datetime.now().timestamp() < end_time:
            timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
            cpu_usage = psutil.cpu_percent(interval=1)
            memory_usage = psutil.virtual_memory().percent
            gpu_usage = get_gpu_usage()
            log_line = f"{timestamp},{cpu_usage},{memory_usage},{gpu_usage}\n"
            f.write(log_line)
            f.flush()  # Flush after every row so the log can be read while monitoring runs
            print(log_line.strip())
            sleep(max(0, interval - 1))

def get_gpu_usage():
    """
    Returns GPU utilization (%) if an NVIDIA GPU is present, else returns 'N/A'.
    Requires NVIDIA GPUs and nvidia-smi installed.
    """
    try:
        gpu_query = subprocess.check_output(
            "nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits",
            shell=True, stderr=subprocess.DEVNULL, text=True
        )
        gpu_usage = gpu_query.strip().split("\n")[0]  # first GPU
        return gpu_usage
    except Exception:
        return "N/A"


# Start resource logging (in parallel, won't block):
monitor_thread = threading.Thread(
    target=log_resource_usage,
    kwargs={'duration': 3600, 'interval': 10},
    daemon=True
)

if monitor_resources:
    monitor_thread.start()

Starting resource monitoring for 3600 seconds (logged to 'resource_usage.log')...
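
Once the monitor has written some rows, the log can be summarized with the standard library alone. The sketch below assumes the CSV header written by `log_resource_usage()` (`Timestamp,CPU_%,Memory_%,GPU_%`) and treats `N/A` GPU readings as missing; `summarize_usage` is a hypothetical helper, not part of the notebook.

```python
import csv
import os
from statistics import mean


def summarize_usage(lines):
    """Summarize rows in the format written by log_resource_usage().

    `lines` is any iterable of CSV lines (an open file works).
    Returns None when there are no data rows.
    """
    cpu, mem, gpu = [], [], []
    for row in csv.DictReader(lines):
        cpu.append(float(row["CPU_%"]))
        mem.append(float(row["Memory_%"]))
        if row["GPU_%"] != "N/A":        # skip samples without a GPU reading
            gpu.append(float(row["GPU_%"]))
    if not cpu:
        return None
    return {
        "samples": len(cpu),
        "cpu_mean": mean(cpu),
        "mem_mean": mean(mem),
        "gpu_mean": mean(gpu) if gpu else None,
    }


# Only runs when the log file from the cell above exists.
if os.path.exists("resource_usage.log"):
    with open("resource_usage.log", newline="") as f:
        print(summarize_usage(f))
```

Because the monitor thread may still be appending, the last line can be incomplete when the file is read; rerunning the summary after the monitor finishes avoids that.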


In [5]:
from langchain_core.prompts import ChatPromptTemplate

template = """Question: {question}

Answer: Provide concise and simple answer!"""

prompt = ChatPromptTemplate.from_template(template)

chain = prompt | model

print(chain.invoke({"question": "What is a good way to continue this sentence: 'you is a ...'? It has to be syntactically correct!"}))

You are a friend.
