## Run Ollama in Colab
<a target="_blank" href="https://colab.research.google.com/github/LiorGazit/agentic_actions_locally_hosted/blob/main/run_ollama_in_colab.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

[List of available Ollama LLMs.](https://ollama.com/library)  
Note: This code runs in Google Colab but not on Windows, because of how Ollama is set up. I believe it would run on Linux in general, but I haven't tested it outside of Google Colab.  

#### Here's a list of issues to take care of:
3. Make the monitoring chunk a .py file as well  
4. Enhance the `spin_up_LLM()` function to accommodate a remote LLM from OpenAI  
5. **Managing Ollama Server Lifecycles:**
    The Ollama server currently runs as a background process (`ollama serve`). Consider a controlled lifecycle using Docker containers or managed processes (e.g., via supervisord or systemd).  
6. [x] Break this notebook down to separate .py files to be sourced.  
7. [x] Insert a Colab badge.  
8. [x] Add a `.gitignore` containing `*.log`  
9. [x] Apply "**Explicit Error Handling**" for each of the shell commands (see chat)  
10. [x] **Resource Monitoring & Logging:**  
    Capture and monitor resource utilization (CPU/GPU, memory usage) to ensure sustainable performance.  
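
One way to approach the lifecycle item above without Docker or systemd is to wrap the server process in a context manager, so that it is always stopped when the block exits. This is a minimal sketch, not the code in `spin_up_LLM.py`: `managed_process` is a hypothetical helper, and the demo uses a sleeping Python process as a stand-in for `["ollama", "serve"]`.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def managed_process(cmd, stop_timeout=10.0):
    """Start `cmd` in the background and guarantee it is stopped on exit."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        yield proc
    finally:
        if proc.poll() is None:          # still running
            proc.terminate()             # ask politely first
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()              # force-stop if it ignores the request
                proc.wait()


# Stand-in command (assumption): replace with ["ollama", "serve"] in Colab.
demo_cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
with managed_process(demo_cmd) as proc:
    print("running, PID:", proc.pid)
print("stopped:", proc.poll() is not None)
```

The `finally` branch runs even if the body raises, which is the main advantage over a bare background process that can outlive the notebook cell.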

In [1]:
from spin_up_LLM import spin_up_LLM
from monitor_resources import start_resource_monitoring

In [2]:
# Choose your model name and mode
llm_name = "gemma3"
mode = "local"   # or "remote" in future

# Toggle monitoring on/off:
monitor_resources = True

In [3]:
# Spin up an LLM:
model = spin_up_LLM(chosen_llm=llm_name, local_or_remote=mode)

🚀 Starting Ollama server...
→ Ollama PID: 9603
⏳ Waiting for Ollama to be ready…
🚀 Pulling model 'gemma3'…
Available models:
NAME             ID              SIZE      MODIFIED               
gemma3:latest    a2af6cc3eb7f    3.3 GB    Less than a second ago    

🚀 Installing langchain-ollama…
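
The "Waiting for Ollama to be ready" step in the output above can be implemented by polling the server's HTTP port until it answers. The sketch below is an assumption about how such a wait could work, not the contents of `spin_up_LLM.py`: `wait_until_ready` is a hypothetical name, and the default URL assumes Ollama's default address `http://127.0.0.1:11434`.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the server is local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_ready(url="http://127.0.0.1:11434", timeout=60.0, interval=1.0):
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=2.0):
                return True
        except urllib.error.HTTPError:
            return True            # the server answered, even with an error status
        except OSError:            # connection refused, timed out, etc.
            time.sleep(interval)
    return False
```

A caller would typically raise an error when the function returns `False` instead of going on to pull a model from a server that never came up.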


In [4]:
# Resource monitoring (via monitor_resources.py)
if monitor_resources:
    # logs for 1h, every 10s, into 'resource_usage.log'
    monitor_thread = start_resource_monitoring(
        duration=3600,
        interval=10,
        logfile='resource_usage.log'
    )


Starting resource monitoring for 3600s (logging every 10s to 'resource_usage.log')
→ Resource monitoring started (daemon thread).
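
For readers without access to `monitor_resources.py`, the behaviour announced above (a daemon thread appending a row every `interval` seconds for `duration` seconds) can be sketched with the standard library alone. This is not the author's implementation: `start_monitoring_sketch` and `sample_load` are hypothetical names, and the default sampler records only the 1-minute load average (Unix only) as a stand-in for real CPU, memory, and GPU readings.

```python
import csv
import os
import threading
import time
from datetime import datetime


def sample_load():
    """Return the 1-minute load average (Unix only); a stand-in metric."""
    return [os.getloadavg()[0]]


def start_monitoring_sketch(duration, interval, logfile, sampler=sample_load):
    """Append one timestamped CSV row per `interval` seconds for `duration` seconds."""

    def worker():
        deadline = time.monotonic() + duration
        with open(logfile, "a", newline="") as fh:
            writer = csv.writer(fh)
            while time.monotonic() < deadline:
                stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                writer.writerow([stamp, *sampler()])
                fh.flush()               # keep the log readable while running
                time.sleep(interval)

    # A daemon thread does not block interpreter shutdown.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Passing a different `sampler` callable is how real metrics would be plugged in, for example one that reads CPU and memory percentages from a library such as `psutil`.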


In [5]:
from langchain_core.prompts import ChatPromptTemplate

template = """Question: {question}

Answer: Provide concise and simple answer!"""

prompt = ChatPromptTemplate.from_template(template)

chain = prompt | model

print(chain.invoke({"question": "What is a good way to continue this sentence: 'you is a ...'? It has to be syntactically correct!"}))

2025-05-24 21:00:33,81.5,10.7,0
You are a friend.
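
The first line of the output above appears to be a row written by the resource monitor rather than text from the model, since the monitoring thread is still running while the chain executes. Assuming the row is a timestamp followed by comma-separated numeric readings (the column meanings are not stated in this notebook), it can be parsed as in the sketch below; `parse_log_row` is a hypothetical helper.

```python
from datetime import datetime


def parse_log_row(line):
    """Split a monitor row into its timestamp and a list of numeric readings."""
    stamp, *values = line.strip().split(",")
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return when, [float(v) for v in values]


when, readings = parse_log_row("2025-05-24 21:00:33,81.5,10.7,0")
print(when.isoformat(), readings)
# 2025-05-24T21:00:33 [81.5, 10.7, 0.0]
```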
