# llama-ipfs Example in Google Colab

This notebook demonstrates how to load IPFS-hosted models in llama-cpp-python through the llama-ipfs integration.

## Step 1: Install Required Packages

In [None]:
# Install llama-cpp-python, huggingface-hub, and llama-ipfs
!pip install llama-cpp-python huggingface-hub llama-ipfs --quiet

## Step 2: Activate IPFS Integration

In most environments, you would run the `llama-ipfs activate` command. In Google Colab, the patch has to be applied explicitly from Python:

In [None]:
# Standard activation from a shell (not used in Colab):
# llama-ipfs activate

# In Google Colab, apply the patch explicitly from Python instead
import llama_ipfs

llama_ipfs.activate()

# Verify that the patch was applied successfully
print(f"IPFS patch active: {llama_ipfs.status()}")

## Step 3: Load and Use an IPFS-hosted Model

Now we can load llama-cpp models directly from IPFS using their CID:

In [None]:
from llama_cpp import Llama

# Load model directly from IPFS
repo_id = "ipfs://bafybeie7quk74kmqg34nl2ewdwmsrlvvt6heayien364gtu2x6g2qpznhq"
# Equivalent Hugging Face model: repo_id = "aisuko/gpt2-117M-gguf"
filename = "ggml-model-Q4_K_M.gguf"

# Load the model with minimal verbosity
llm = Llama.from_pretrained(repo_id=repo_id, filename=filename, verbose=False)

# Prepare a simple Q&A example
context = (
    "France is a country in Western Europe known for its rich history and culture. "
    "It is home to many famous landmarks including the Eiffel Tower. Its capital is Paris."
)
question = "What is the capital of France?"
prompt = f"Context: {context}\nQuestion: {question}\nBased solely on the above context, answer the question in one word:"

# Generate the answer
output = llm(
    prompt,
    max_tokens=10,
    temperature=0.0,
    top_p=0.95,
    repeat_penalty=1.0,
    stop=["\n"]
)

# Extract and print the answer
answer = output['choices'][0]['text'].strip()
print("Answer:", answer)
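
For reference, the call above returns an OpenAI-style completion dictionary, and the answer is read from the first entry of `choices`. The sketch below shows the fields that extraction relies on. It uses a hand-written stand-in dictionary rather than real model output, and `extract_answer` is a hypothetical helper, not part of llama-cpp-python.

```python
def extract_answer(completion: dict) -> tuple[str, bool]:
    """Return the stripped text of the first choice and a truncation flag.

    The flag is True when generation stopped because max_tokens was
    reached (finish_reason == "length") rather than at a stop sequence.
    """
    choice = completion["choices"][0]
    text = choice["text"].strip()
    truncated = choice.get("finish_reason") == "length"
    return text, truncated


# Hand-written stand-in for a completion response; NOT real model output.
sample_completion = {
    "choices": [{"text": " Paris", "index": 0, "finish_reason": "stop"}],
}

answer_text, was_truncated = extract_answer(sample_completion)
print("Answer:", answer_text, "| truncated:", was_truncated)
```

Checking the truncation flag is useful with a small `max_tokens` such as 10, because a cut-off answer can otherwise look like a complete one.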

## How It Works

The `llama-ipfs` package patches the llama-cpp-python library to:

1. Recognize `ipfs://` URIs as valid model identifiers
2. Download model files from IPFS nodes or gateways
3. Cache models locally for faster loading in subsequent runs

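This allows you to load models from a decentralized network with the same `Llama.from_pretrained` call you already use. Only the `repo_id` changes, from a Hugging Face repository name to an `ipfs://` URI.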
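
The three steps listed above can be sketched in plain Python. This is an illustration of the idea only, not the actual llama-ipfs implementation: the helper names, the choice of the public `https://ipfs.io/ipfs` gateway, and the cache directory layout are all assumptions made for the example, and no download is performed.

```python
from pathlib import Path

IPFS_PREFIX = "ipfs://"
# Assumption: a public HTTP gateway; the real package may use a local node
# or a different gateway.
DEFAULT_GATEWAY = "https://ipfs.io/ipfs"


def is_ipfs_uri(repo_id: str) -> bool:
    """Step 1: recognize ipfs:// URIs as model identifiers."""
    return repo_id.startswith(IPFS_PREFIX)


def extract_cid(repo_id: str) -> str:
    """Strip the scheme and return the bare CID."""
    if not is_ipfs_uri(repo_id):
        raise ValueError(f"not an ipfs:// URI: {repo_id!r}")
    cid = repo_id[len(IPFS_PREFIX):].strip("/")
    if not cid:
        raise ValueError("ipfs:// URI contains no CID")
    return cid


def gateway_url(repo_id: str, filename: str, gateway: str = DEFAULT_GATEWAY) -> str:
    """Step 2: build the gateway URL a file could be fetched from."""
    return f"{gateway.rstrip('/')}/{extract_cid(repo_id)}/{filename}"


def cache_path(repo_id: str, filename: str, cache_dir: str) -> Path:
    """Step 3: choose a local path so later runs can skip the download.

    The <cache_dir>/<cid>/<filename> layout is hypothetical.
    """
    return Path(cache_dir) / extract_cid(repo_id) / filename


repo_id = "ipfs://bafybeie7quk74kmqg34nl2ewdwmsrlvvt6heayien364gtu2x6g2qpznhq"
filename = "ggml-model-Q4_K_M.gguf"

print(is_ipfs_uri(repo_id))
print(gateway_url(repo_id, filename))
print(cache_path(repo_id, filename, "/tmp/ipfs-model-cache"))
```

Because a CID is derived from the content it addresses, keying the cache on the CID means a cached file never goes stale for that identifier.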