<a href="https://colab.research.google.com/github/beatriz-fulgencio/LLM-workshop/blob/main/exercises/class1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Part One: Environment Setup & Ollama Basics

**Goals**
- Prepare Google Colab (or local) environment
- Install pinned versions of required libraries (LlamaIndex, embeddings, etc.)
- Install and run **Ollama** locally on Colab
- Sanity‑check GPU/CPU and tokenization

> ✅ You **only** need to run this once per runtime. Subsequent notebooks assume these deps are installed.


In [None]:
#@title 🔧 Runtime checks
!nvidia-smi || echo "No GPU found (that's fine, just slower)"

In [None]:
!pip install colab-xterm

In [None]:
%reload_ext colabxterm

## Start a Colab terminal & install Ollama
If you want to actually run models **locally** with Ollama in Colab, you can try the script below. Note that Colab VMs may kill long-running processes.

This is just an example of how LLMs can be run from a command-line interface (CLI).


### Launch a CLI in Colab and serve Ollama

Run the following commands in the CLI:


```
curl -fsSL https://ollama.com/install.sh | sh
ollama serve &
ollama pull orca-mini
ollama pull mistral
```

After the models are downloaded, run:


```
ollama run orca-mini
ollama run mistral
```
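
Before later cells depend on Ollama, it can help to confirm from the notebook that `ollama serve` is actually reachable. The following is a minimal sketch, assuming the server listens on the default address `http://localhost:11434` and exposes the `/api/tags` endpoint that lists pulled models; the helper name `list_ollama_models` is hypothetical.

```python
import json
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=5.0):
    """Return the names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON
        return None
    return [model.get("name") for model in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable; is `ollama serve` still running?")
else:
    print("Models available:", models)
```

If the helper returns `None`, the Colab VM may have killed the background process, and `ollama serve &` needs to be started again in the terminal.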

### Run this to launch the CLI


In [None]:
%xterm

## Question instructions
1. Create a text field after each question to answer it.
2. You may use the provided material or other resources, even well-known LLMs (ChatGPT), to complete these tasks.
  

### Question 1
  Give a brief description of how LLM architectures work and, specifically, how Ollama's architecture works.

### Question 2
  What do the 1b and 7b labels indicate about the models?

### Question 3

Try running both models. In a note, write down their CPU, GPU, and RAM usage.
Try asking `What are Large Language Models` and compare the answers. What are the main differences you encountered?
Now, try asking `What were the brazilian movies seen in cannes in 2025?`. What does the model answer?
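
To collect the resource numbers for the note, memory usage can be read from a notebook cell while a model is loaded. This is a rough sketch, assuming a Linux runtime (such as Colab) that provides `/proc/meminfo` and, when a GPU is attached, `nvidia-smi` on the PATH; all function names are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB lines from nvidia-smi CSV output into tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib():
    """Return [(used, total)] per GPU, or [] when nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_memory(result.stdout) if result.returncode == 0 else []


def ram_used_mib(meminfo_text):
    """Compute used system RAM in MiB from the contents of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key] = int(rest.split()[0])  # values are reported in kB
    return (fields["MemTotal"] - fields["MemAvailable"]) // 1024


print("GPU (used, total) MiB:", gpu_memory_mib())
try:
    with open("/proc/meminfo") as fh:
        print("RAM used MiB:", ram_used_mib(fh.read()))
except (OSError, KeyError):
    print("/proc/meminfo is not available on this system")
```

Running the cell once before `ollama run` and once while each model is answering gives a before/after comparison per model.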

### Question 4

Now run

```
ollama pull llama3.2-vision:11b
```

in the terminal. After it is pulled, run the model and feed it an image with the following command:


```
ollama run llama3.2-vision:11b

/[path_to_img] [question]
```

What processing was applied to the image so that the model can understand it?

# Part Two: Quick LLM Sanity Test and Testing LlamaIndex with Simple RAG
Below we will:
1. Install the dependencies
2. Configure LlamaIndex global `Settings`
3. Run a simple prompt to ensure everything works


In [None]:
#@title install pip dependencies
!pip install llama-index llama-index-llms-ollama llama-index-embeddings-huggingface llama_index-readers-file langchain_community

### Load Data
1. Download the simple HTML file from https://letterboxd.com/journal/best-of-cannes-2025/ and load it into this Colab

In [None]:
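from llama_index.core import SimpleDirectoryReader

def loader(path):
    """Load every .html file found in `path` as LlamaIndex documents."""
    documents = SimpleDirectoryReader(path, required_exts=['.html']).load_data()
    return documents

# /content is the default Colab working directory
documents = loader("/content")
doc = documents[0]  # keep the first loaded document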

### Question 5:
What other data loader could have been used to avoid downloading and storing the HTML file in a directory beforehand?

In [None]:
#insert new code here

### Question 6:
Look at the generated document and its attributes. What can you say about the quality of the text? Is all the information useful for answering future prompts? What are the costs of having useless data in our document? Can we give this directly to an LLM as input? What step is missing?

In [None]:
#insert new code here

Now we are going to chunk the text before embedding to make it easier to process.

In [None]:
from llama_index.core.node_parser import SentenceSplitter

# Tune chunk sizes to balance context vs. retrieval precision
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents([doc])  # you can change doc to the name of your saved document
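
To build intuition for what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window sketch in plain Python. It is not how `SentenceSplitter` is implemented: the real splitter counts tokens and prefers sentence boundaries, while this hypothetical `chunk_words` helper counts words and ignores sentences.

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Split a list of words into windows of chunk_size that share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


words = [f"w{i}" for i in range(10)]
for chunk in chunk_words(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window repeats the last word of the previous one, so a sentence cut at a boundary still appears with some of its context in the next chunk. Larger chunks carry more context per retrieval hit; smaller chunks make matches more precise.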



In [None]:
#@title Embedding
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import VectorStoreIndex

Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
index = VectorStoreIndex(nodes)


INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: BAAI/bge-base-en-v1.5
INFO:sentence_transformers.SentenceTransformer:2 prompts are loaded, with the keys: ['query', 'text']
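
The index stores one embedding vector per node, and at query time the question is embedded too and compared against those vectors. The sketch below illustrates that comparison with cosine similarity on made-up three-dimensional vectors. The vectors, chunk names, and helper functions are hypothetical; real `bge-base-en-v1.5` embeddings have many more dimensions, and the exact scoring used by `VectorStoreIndex` depends on the configured vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]


# Toy embeddings standing in for real model output
chunk_vecs = {
    "chunk_a": [0.9, 0.1, 0.0],
    "chunk_b": [0.1, 0.8, 0.1],
    "chunk_c": [0.7, 0.3, 0.0],
}
print(top_k([1.0, 0.0, 0.0], chunk_vecs, k=2))
```

Only the best-matching chunks are passed to the LLM as context, which is why noisy or irrelevant text in the document can crowd out the useful passages.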


### Question 7:
What is the embedder used in our case? What other embedders can be used, and how do they differ?

In [None]:
#@title Query
from llama_index.llms.ollama import Ollama
Settings.llm = Ollama(model="mistral", request_timeout=360.0)
qe = index.as_query_engine()
print(qe.query("What were the brazilian movies seen in cannes in 2025?"))

INFO:httpx:HTTP Request: POST http://localhost:11434/api/show "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"


 In the provided context, two Brazilian films were showcased at Cannes in 2025. These are "The Secret Agent" directed by Kleber Mendonça Filho and "Sentimental Value" directed by Joachim Trier.


### Question 8:
What does the model answer? Is it more accurate than before?

# Part 3: Workflow
The goal was to create a system that automatically recommends books on a specific topic and then analyzes those books further by categorizing them into genres. A large language model (LLM) handles both tasks.
For this, I defined a workflow structure, `BookFlow`, consisting of two steps:
- `generate_book_list`: uses the LLM to generate a list of books about a given topic.
- `classify_books`: takes the generated book list and uses the LLM to classify the books into different genres.
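
The data flow between the two steps can be sketched in plain Python before wiring it into a workflow framework. This is only an illustration of the structure, not the original `BookFlow` implementation, which is not shown here. The prompts, the `book_flow` function, and the `fake_llm` stand-in are all hypothetical; the LLM is modeled as any callable that maps a prompt string to a reply string.

```python
from typing import Callable, Dict, List

LLM = Callable[[str], str]  # assumption: an LLM is any function from prompt to reply text


def generate_book_list(llm: LLM, topic: str) -> List[str]:
    """Step 1: ask the LLM for books on a topic and split the reply into titles."""
    reply = llm(f"List three books about {topic}, one title per line.")
    return [line.strip("-• ").strip() for line in reply.splitlines() if line.strip()]


def classify_books(llm: LLM, books: List[str]) -> Dict[str, str]:
    """Step 2: ask the LLM for one genre per book."""
    return {
        book: llm(f"Name one genre for the book '{book}'. Answer with the genre only.").strip()
        for book in books
    }


def book_flow(llm: LLM, topic: str) -> Dict[str, str]:
    """Chain the two steps: the output of step 1 is the input of step 2."""
    return classify_books(llm, generate_book_list(llm, topic))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("List"):
        return "- Book A\n- Book B"
    return "Non-fiction"


print(book_flow(fake_llm, "astronomy"))
```

To try it against the model configured earlier, `fake_llm` could be swapped for something like `lambda p: str(Settings.llm.complete(p))`, assuming `Settings.llm` is set to the Ollama model and the server is running.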


In [None]:
#add your code here