## Different demos with Local LLM

### LlamaIndex with Ollama

Examples of how to use LlamaIndex with a local LLM (Ollama, gemma3).

#### Install necessary packages

In [None]:
!pip install -U llama-index-llms-ollama llama-index

#### Basic usage

1. Completion

In [1]:
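from llama_index.llms.ollama import Ollama

# IP of the Windows host where the Ollama server runs; change this for your machine.
windows_host_ip = "172.17.80.1"
ollama_port = 11434  # Ollama's default port

gemma3 = Ollama(
    model="gemma3",
    base_url=f"http://{windows_host_ip}:{ollama_port}",
    request_timeout=100.0,  # seconds; local models can be slow on the first call
)

# complete() sends a single prompt and returns the generated text.
resp = gemma3.complete("Summarize in 2 sentences who Paul Graham is.")
print(resp)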

Paul Graham is a highly influential American computer scientist, philosopher, and venture capitalist best known for his role in founding Y Combinator, a startup accelerator. He’s also a prolific writer and essayist, particularly known for his ideas on technology, hacking, and the importance of building simple, elegant systems.
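
The host IP in the completion cell is hard-coded, which tends to break when the notebook moves to another machine. A minimal sketch of one alternative is to read the address from environment variables. The variable names `OLLAMA_HOST_IP` and `OLLAMA_PORT` and the helper `ollama_base_url` are hypothetical names chosen for this example; neither Ollama nor LlamaIndex reads them.

```python
import os


def ollama_base_url(default_host: str = "localhost", default_port: int = 11434) -> str:
    """Build the Ollama base URL from environment variables, with fallbacks.

    OLLAMA_HOST_IP and OLLAMA_PORT are hypothetical names used only by this
    sketch; they are not part of Ollama or LlamaIndex.
    """
    host = os.environ.get("OLLAMA_HOST_IP", default_host)
    port = int(os.environ.get("OLLAMA_PORT", default_port))
    return f"http://{host}:{port}"


# Example: point the helper at the host address used in this notebook.
os.environ["OLLAMA_HOST_IP"] = "172.17.80.1"
print(ollama_base_url())
```

The result can then be passed to the constructor, for example `Ollama(model="gemma3", base_url=ollama_base_url(), request_timeout=100.0)`.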


2. Chat message

In [2]:
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name?"),
]

resp = gemma3.chat(messages)
print(resp)

assistant: Blast yer eyes, lad! Ye want to know me name, do ye? Well, they call me One-Eyed Silas, though some just shout “Silas, ye scurvy dog!” which, frankly, I’ve grown rather fond of.  It's a name earned, ye see. Lost me eye to a particularly grumpy kraken – long story, best told with a tankard o' rum!  

So, aye, Silas it is.  Silas Blackheart, at yer service... or perhaps at yer demise, depending on how ye treat a pirate!  Now, what brings ye to these waters, eh?  Don’t be standin' there gawkin', a fresh face!
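
The chat call above sends a single exchange. To continue the conversation, the usual pattern is to append the assistant's reply to the message list before sending the next user turn. Below is a rough sketch of that loop. The helper `ask` is a hypothetical name, and `FakeMessage`, `FakeResponse`, and `EchoLLM` are stand-ins invented here so the cell runs without a model; in the notebook you would pass `gemma3` and `ChatMessage` instead. It relies on `chat()` returning a response whose `.message` holds the assistant's reply.

```python
from dataclasses import dataclass
from typing import Any, List


def ask(llm: Any, history: List[Any], user_text: str, message_cls: Any) -> str:
    """Send one user turn and record both sides of the exchange in `history`."""
    history.append(message_cls(role="user", content=user_text))
    resp = llm.chat(history)       # the whole history is sent on every turn
    history.append(resp.message)   # keep the assistant's reply for the next turn
    return resp.message.content


# Stand-ins (assumptions for this sketch) so it runs without an Ollama server.
@dataclass
class FakeMessage:
    role: str
    content: str


@dataclass
class FakeResponse:
    message: FakeMessage


class EchoLLM:
    def chat(self, messages):
        # Reply by echoing the most recent message.
        return FakeResponse(FakeMessage("assistant", f"echo: {messages[-1].content}"))


history = [FakeMessage(role="system", content="You are a pirate with a colorful personality")]
reply = ask(EchoLLM(), history, "What is your name?", FakeMessage)
print(reply)
print([m.role for m in history])
```

With the real model, the equivalent call would be `ask(gemma3, messages, "Where did you lose your eye?", ChatMessage)`, reusing the `messages` list from the cell above.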


3. Streaming

In [7]:
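# stream_complete() returns a generator that yields partial responses as they arrive.
resp = gemma3.stream_complete("Using Gen AI SDK or Llmaindex how to decide When to use complete vs chat?")

for r in resp:
    # r.delta holds only the newly generated text since the previous chunk.
    print(r.delta, end="")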

Okay, let's break down how to decide between using a complete retrieval vs. a chat-based approach with Gen AI SDKs (like Cohere's) or LLM indexes (like LlamaIndex). The choice depends heavily on the *type of question* you're trying to answer and the *desired level of control and nuance*.

**1. Understanding the Approaches**

* **Complete Retrieval (Vector Search + LLM):**
    * **How it Works:** You create vector embeddings of your documents. These embeddings represent the *meaning* of the text. You then use a vector search engine (like Pinecone, ChromaDB, Weaviate) to find the *most similar* embeddings to a user's query.  Finally, the most relevant documents are fed to a Large Language Model (LLM) to generate a response.
    * **Strengths:**
        * **Precision:** Excellent for factual questions where the answer is directly contained within the retrieved documents.
        * **Scalability:** Vector search is highly efficient for large document sets.
        * **Contextualized by Spe