### "Hello World" example: connecting to a local Ollama model running in a container ###

In [4]:
import os
import ollama
import logging
import time

logger = logging.getLogger(__name__)

# Load autoreload so that edits to llmtools take effect without restarting the kernel
%load_ext autoreload
%autoreload 2
import llmtools
from llmtools.ollamamodel import Ollama as OllamaModel

print(f'Package version: {llmtools.__version__}')
print(f'Authors: {llmtools.__authors__}')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Package version: v0.0.1
Authors: The Core for Computational Biomedicine at Harvard Medical School
https://dbmi.hms.harvard.edu/about-dbmi/core-computational-biomedicine


In [5]:
# Data directory, read from the DATA environment variable (None if the variable is unset)
data_dir = os.environ.get('DATA')
print(f'Data directory: {data_dir}')

Data directory: /app/data
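
`os.environ.get('DATA')` returns `None` when the variable is unset, and that tends to surface only later as a confusing path error. A minimal sketch of a more defensive lookup follows. The helper name `resolve_data_dir` and the `./data` fallback are assumptions made for illustration; they are not part of `llmtools`.

```python
import os
from pathlib import Path


def resolve_data_dir(env_var: str = 'DATA', fallback: str = './data') -> Path:
    """Return the directory named by env_var, or fallback if it is unset or empty."""
    # `or` also covers the case where the variable is set to an empty string
    raw = os.environ.get(env_var) or fallback
    return Path(raw)


data_dir = resolve_data_dir()
print(f'Data directory: {data_dir}')
```

Returning a `Path` rather than a string makes later joins (`data_dir / 'file.csv'`) less error-prone, though a plain string would work just as well here.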


In [6]:
# List the models that are already available locally
print(OllamaModel().list_models())

# Pull a model for demonstration
success = OllamaModel().pull_model(model_name='gemma3:latest')
print(success)

['gemma3:latest']


gemma3:latest: 100%|██████████| 489/489 [00:00<00:00, 2.56kB/s, success]


True
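
In the cell above the model is pulled even though the listing already shows it. One way to skip the redundant call is to check the listing first, sketched below. `ensure_model` is a hypothetical helper, not part of `llmtools`; it assumes only that the listing yields model-name strings and that the pull call returns a truthy value on success, as the output above suggests. Stand-in callables are used so the sketch runs without an Ollama server.

```python
from typing import Callable, Iterable


def ensure_model(model_name: str,
                 list_models: Callable[[], Iterable[str]],
                 pull_model: Callable[[str], bool]) -> bool:
    """Return True if model_name is available, pulling it only when it is missing."""
    if model_name in list_models():
        return True  # already present, no download needed
    return bool(pull_model(model_name))


# Stand-in callables (assumptions) that mimic a server holding one model
available = ['gemma3:latest']
pulled = []


def fake_pull(name: str) -> bool:
    """Record the pull request and pretend it succeeded."""
    pulled.append(name)
    available.append(name)
    return True


print(ensure_model('gemma3:latest', lambda: available, fake_pull))  # True
print(pulled)  # [] because nothing had to be pulled
```

With the wrapper used in this notebook, the same helper could presumably be called as `ensure_model('gemma3:latest', OllamaModel().list_models, lambda name: OllamaModel().pull_model(model_name=name))`.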


In [7]:
# Run a simple prompt
system_prompt = 'You are a powerful AI system.'
user_prompt = 'Explain in a short paragraph how a transformer-based language model works.'

# Package messages
messages = OllamaModel.create_messages(system_prompt=system_prompt, user_prompt=user_prompt)
display(*messages)

# Send messages to model
model_name = 'gemma3:latest'
temperature = 0.7
client = OllamaModel().create_client()
response = client.chat(model=model_name,
                       messages=messages,
                       options={'temperature': temperature})
print()
print(response.message.content)

{'role': 'system', 'content': 'You are a powerful AI system.'}

{'role': 'user',
 'content': 'Explain in a short paragraph how a transformer-based language model works.'}


Okay, here’s a breakdown of how transformer-based language models work, in a nutshell:

At their core, transformers rely on a mechanism called “attention.” Instead of processing words sequentially like older models, they look at *all* words in a sentence simultaneously. This “attention” allows the model to understand the relationships between words, regardless of their distance. The model learns to assign weights to these relationships, indicating how important each word is to the context of others. It then uses these weights to predict the next word in a sequence, constantly refining its understanding of the text as it goes. Essentially, they’re exceptionally good at capturing long-range dependencies within language. 

---

Do you want me to delve deeper into a specific aspect, like attention mechanisms, training, or applications?
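
The messages displayed in the cell above are plain dictionaries with `role` and `content` keys, which is the structure the chat call receives. A minimal sketch of how such a list can be assembled is shown here. `build_messages` is a hypothetical stand-in written for illustration; it is not the actual implementation of `OllamaModel.create_messages`, which may differ.

```python
from typing import Dict, List, Optional


def build_messages(user_prompt: str,
                   system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a chat message list: an optional system message, then the user message."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        # The system message goes first so it frames the user request
        messages.append({'role': 'system', 'content': system_prompt})
    messages.append({'role': 'user', 'content': user_prompt})
    return messages


messages = build_messages(
    user_prompt='Explain in a short paragraph how a transformer-based language model works.',
    system_prompt='You are a powerful AI system.',
)
for message in messages:
    print(message)
```

A list built this way can be passed as the `messages` argument of `client.chat(...)` exactly as in the cell above.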
