## Exercise 1 - Get started with Ollama


Download and install Ollama from https://ollama.com/download/.

As part of setting up the Python environment, you already installed the [ollama](https://pypi.org/project/ollama/) Python package.

Run the following commands in a terminal.

First, download the llama3 model:
```
ollama pull llama3
```
Run the llama3 model and write some prompts:
```
ollama run llama3
```

### Interact with Ollama in Python

In [None]:
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

We can also interact with Ollama from Python, e.g. using the LlamaIndex Framework.

In [None]:
# Language model served by the local Ollama instance
# (uncomment the next line to use the smaller llama3.2:1b model instead)
# llm = Ollama(model="llama3.2:1b", request_timeout=120.0)
llm = Ollama(model="llama3", request_timeout=120.0)

# Set it as the default LLM in LlamaIndex
Settings.llm = llm

In [None]:
prompt = "What is EPFL?"
response = llm.stream_complete(prompt)

for r in response:
    print(r.delta, end="")
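
The loop above prints each piece of the answer as it arrives. As a minimal sketch of that pattern, which runs without Ollama, the hypothetical `FakeChunk` and `fake_stream` below stand in for the streamed response (they are illustrations, not part of LlamaIndex). Joining all `delta` values gives back the full text.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class FakeChunk:
    """Hypothetical stand-in for one streamed response chunk."""
    delta: str  # only the newly generated piece of text


def fake_stream(text: str, size: int = 4) -> Iterator[FakeChunk]:
    """Yield the text in small pieces, imitating a token stream."""
    for start in range(0, len(text), size):
        yield FakeChunk(delta=text[start:start + size])


pieces: List[str] = []
for chunk in fake_stream("EPFL is a university in Lausanne."):
    print(chunk.delta, end="")  # same printing pattern as the cell above
    pieces.append(chunk.delta)
print()

# Concatenating the deltas reconstructs the complete answer
full_text = "".join(pieces)
```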

## Exercise 2 - Create a Vector Database

### Embedding model

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
import pandas as pd
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore


First, let's load a text embedding model from Hugging Face. We will use BAAI's `bge-base-en-v1.5` for this workshop, but any other embedding model would work.

In [None]:
# Embeddings model from HuggingFace
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# Set it as the default embedding model in LlamaIndex
Settings.embed_model = embed_model

### Reading documents with LlamaIndex

Let's now use LlamaIndex to read and parse our documents. Make sure the Gospel of the Flying Spaghetti Monster PDF is in the `./docs` folder.

In [None]:
documents = SimpleDirectoryReader("./docs", recursive=True).load_data()

In [None]:
documents[:5]

### ChromaDB

We can now embed our documents and store them in a ChromaDB collection.

In [None]:
# Ephemeral (in-memory) client for Chroma
chroma_client = chromadb.EphemeralClient()
# chroma_collection = chroma_client.create_collection("mydocs")
chroma_collection = chroma_client.get_or_create_collection("mydocs")

# Vector store
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# Storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# To set a custom chunk size and splitting strategy, see the node parser guide:
# https://docs.llamaindex.ai/en/stable/module_guides/loading/node_parsers/
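
Before documents are embedded, they are split into chunks, and the chunk size and overlap are the main knobs. The sketch below only illustrates the idea with plain Python: `split_with_overlap` is a hypothetical helper that splits on characters, whereas LlamaIndex's node parsers are more careful about sentence and token boundaries.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 10, overlap: int = 3) -> List[str]:
    """Split text into character chunks where neighbours share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=3)
for c in chunks:
    print(repr(c))
```

A larger overlap keeps more shared context between neighbouring chunks, at the cost of storing and embedding more text.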

In [None]:
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    show_progress=True,
)

In [None]:
result = chroma_collection.get()
result.keys()

In [None]:
data = {
    "IDs": result["ids"],
    "Documents": result["documents"],
    "Metadata": result["metadatas"],
}

df = pd.DataFrame(data)

In [None]:
df.head()

In [None]:
result["metadatas"][0].keys()
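
Once the metadata keys are known, a quick sanity check is to count how many chunks each source file produced. The sketch below uses toy data in place of `result["metadatas"]`; the `file_name` and `page_label` keys are assumptions, so check them against the keys printed above before reusing this.

```python
from collections import Counter

# Toy stand-in for result["metadatas"]; the key names are assumptions
metadatas = [
    {"file_name": "a.pdf", "page_label": "1"},
    {"file_name": "a.pdf", "page_label": "2"},
    {"file_name": "b.pdf", "page_label": "1"},
]

# Count chunks per source file, tolerating entries without the key
chunks_per_file = Counter(m.get("file_name", "unknown") for m in metadatas)

for name, count in chunks_per_file.most_common():
    print(f"{name}: {count} chunk(s)")
```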

## Exercise 3 - Your First RAG!

### Retrieve

Let's first practice retrieving documents based on their similarity to a query.

In [None]:
retriever = index.as_retriever(
    similarity_top_k=3,
)

retriever.retrieve("What is Red Teaming?")
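
Conceptually, the retriever embeds the query, scores every stored chunk against it, and keeps the `similarity_top_k` best matches. The following is a minimal sketch of that idea with hand-written toy vectors. Cosine similarity is one common choice of score; the metric actually used by the vector store may differ, and `cosine_similarity` and `top_k` are hypothetical helpers, not LlamaIndex functions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (name, score) pairs with the highest similarity to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors have hundreds of dimensions
toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, toy_vectors, k=3)
for name, score in results:
    print(f"{name}: {score:.3f}")
```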

### Query

We can also query the LLM, using the retrieved documents as context for the answer.

In [None]:
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
)

response = query_engine.query("What is Red Teaming?")

In [None]:
print(response)

We can check which sources were identified as the most relevant:

In [None]:
print(response.get_formatted_sources())

In [None]:
query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    streaming=True,
)

response = query_engine.query("What is Red Teaming?")

In [None]:
response.print_response_stream()

Finally, let's prompt the LLM through a custom template, into which the retrieved context and the question are inserted.

In [None]:
from llama_index.core import PromptTemplate

# custom prompt template
template = (
    "Imagine you are an advanced AI expert in cyber security laws, with access to all current and relevant legal documents, "
    "case studies, and expert analyses. Your goal is to provide insightful, accurate, and concise answers to questions in this domain.\n\n"
    "Here is some context related to the query:\n"
    "-----------------------------------------\n"
    "{context_str}\n"
    "-----------------------------------------\n"
    "Considering the above information, please respond to the following inquiry with detailed references to applicable laws, "
    "precedents, or principles where appropriate:\n\n"
    "Question: {query_str}\n\n"
    "Answer succinctly, starting with the phrase 'According to cyber security law,' and ensure your response is understandable to someone without a legal background."
)
qa_template = PromptTemplate(template)


query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,
    streaming=True,
    text_qa_template=qa_template,
)

response = query_engine.query("What is Red Teaming?")

In [None]:
response.print_response_stream()
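
To see what the LLM actually receives, it helps to fill a template by hand. The sketch below uses plain `str.format` on a shortened toy template with the same `{context_str}` and `{query_str}` placeholders. The chunk texts are made up, and joining them with blank lines is an assumption about how the context is assembled, not something this notebook shows.

```python
# Shortened toy template with the same two placeholders as the custom template
toy_template = (
    "Here is some context related to the query:\n"
    "-----------------------------------------\n"
    "{context_str}\n"
    "-----------------------------------------\n"
    "Question: {query_str}\n"
)

# Made-up stand-ins for the text of the retrieved chunks
retrieved_chunks = ["First retrieved chunk.", "Second retrieved chunk."]

# Assumption: chunks are joined with a blank line to form the context
filled_prompt = toy_template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="What is Red Teaming?",
)

print(filled_prompt)
```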