
```bash
# 1. Download the TinyLlama model packaged as a llamafile
wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

# 2. Make the file executable
chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

# 3. Run in server mode: no browser window, embeddings enabled, listening on port 8080
./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding --port 8080
```
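With `--embedding`, the server also answers embedding requests, and `LlamafileEmbedding` relies on that later. Below is a minimal stdlib sketch of one such request. It assumes the llama.cpp-style `POST /embedding` endpoint, which takes a JSON body `{"content": ...}` and returns `{"embedding": [...]}`. Treat that shape as an assumption, not a spec. The helper names `build_embedding_request` and `fetch_embedding` are hypothetical. The first one only builds the request, so you can inspect it without a running server.

```python
import json
import urllib.request

def build_embedding_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for one text's embedding.

    Assumption: a llama.cpp-style server exposing POST /embedding that
    accepts a JSON body {"content": <text>}.
    """
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embedding",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_embedding(req: urllib.request.Request, timeout: float = 30.0) -> list[float]:
    """Send the request and return the vector (requires the llamafile server to be running)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"embedding": [float, ...]}
    return payload["embedding"]

req = build_embedding_request("http://localhost:8080/", "What is revenue?")
print(req.full_url, req.get_method())
```

Once the server from step 3 is up, `fetch_embedding(req)` should return the vector. Its length depends on the model.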

In [8]:
# Install the LlamaIndex core package
!pip install llama-index-core
# Install the llamafile embedding and LLM integrations, plus the web reader (SimpleWebPageReader)
!pip install llama-index-embeddings-llamafile llama-index-llms-llamafile llama-index-readers-web
# Install the full llama-index bundle
!pip install llama-index

In [9]:
# Configure LlamaIndex
from llama_index.core import Settings
from llama_index.embeddings.llamafile import LlamafileEmbedding
from llama_index.llms.llamafile import Llamafile
from llama_index.core.node_parser import SentenceSplitter

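# Embeddings and completions both go to the local llamafile server
Settings.embed_model = LlamafileEmbedding(base_url="http://localhost:8080")

# temperature=0 plus a fixed seed keeps answers as repeatable as possible
Settings.llm = Llamafile(
    base_url="http://localhost:8080",
    temperature=0,
    seed=0
)

# Split documents into chunks of up to 256 tokens, overlapping by 5 tokens
Settings.transformations = [
    SentenceSplitter(
        chunk_size=256,
        chunk_overlap=5
    )
]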
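`SentenceSplitter` cuts each document into chunks of at most `chunk_size` tokens. It repeats `chunk_overlap` tokens between neighbouring chunks and prefers to break at sentence boundaries. The sketch below shows only the size/overlap arithmetic, using a plain sliding window over whitespace-separated words. It is a simplification (no tokenizer, no sentence awareness), not LlamaIndex's algorithm, and `window_chunks` is a hypothetical name.

```python
def window_chunks(text: str, chunk_size: int = 256, chunk_overlap: int = 5) -> list[list[str]]:
    """Split text into windows of at most chunk_size words.

    Consecutive windows share chunk_overlap words. Words stand in for tokens.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(600))
chunks = window_chunks(sample)
print([len(c) for c in chunks])
```

With `chunk_overlap=5` against `chunk_size=256`, each chunk repeats only about 2% of its neighbour. A sentence split across a boundary therefore keeps very little shared context.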

In [10]:
# Load local data
from llama_index.core import SimpleDirectoryReader
local_doc_reader = SimpleDirectoryReader(input_dir='./data')
docs = local_doc_reader.load_data(show_progress=True)

Loading files: 100%|██████████| 2/2 [00:00<00:00, 151.09file/s]
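`SimpleDirectoryReader` loads the files under `input_dir` (two here, per the progress bar) as `Document` objects and attaches file metadata such as the file name and path. The following is a stdlib sketch of that idea for plain-text files only. `read_text_dir` and its dict layout are hypothetical. The real reader also handles formats such as PDF through dedicated parsers.

```python
import tempfile
from pathlib import Path

def read_text_dir(input_dir: str, suffixes: tuple[str, ...] = (".txt", ".md")) -> list[dict]:
    """Load each matching file as {"text": ..., "metadata": {...}}, in sorted path order."""
    docs = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            docs.append({
                "text": path.read_text(encoding="utf-8"),
                "metadata": {"file_name": path.name, "file_path": str(path)},
            })
    return docs

# Toy directory: two text files are loaded, the binary file is skipped
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Revenue grew.", encoding="utf-8")
    Path(tmp, "b.md").write_text("# Notes", encoding="utf-8")
    Path(tmp, "skip.bin").write_bytes(b"\x00")
    loaded = read_text_dir(tmp)

print([d["metadata"]["file_name"] for d in loaded])
```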


In [11]:
# Add a Wikipedia page (the Polish-language article on the novel Nad Niemnem)
from llama_index.readers.web import SimpleWebPageReader
urls = [
    'https://pl.wikipedia.org/wiki/Nad_Niemnem',
]
web_reader = SimpleWebPageReader(html_to_text=True)
docs.extend(web_reader.load_data(urls))
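With `html_to_text=True`, `SimpleWebPageReader` fetches each URL and converts the HTML to plain text before wrapping it in a `Document`. The stdlib sketch below shows what that conversion step does on a small string. It uses `html.parser` as a stand-in and is not the converter the reader uses internally. `HtmlToText` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser

class HtmlToText(HTMLParser):
    """Collect visible text, skipping <script>/<style> and breaking lines at block tags."""

    SKIP = {"script", "style"}
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def html_to_text(html: str) -> str:
    parser = HtmlToText()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)

page = ("<html><body><h1>Nad Niemnem</h1>"
        "<p>A novel by <b>Eliza Orzeszkowa</b>.</p>"
        "<script>var x = 1;</script></body></html>")
print(html_to_text(page))
```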

In [12]:
# Build the vector index (embeds every chunk via the llamafile server) and persist it to disk
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(
    docs,
    show_progress=True,
)

index.storage_context.persist(persist_dir="./storage")

Parsing nodes:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/94 [00:00<?, ?it/s]
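`VectorStoreIndex.from_documents` works in three steps:

1. It splits the documents into nodes. According to the progress bars, 3 documents become 94 chunks.
2. It embeds each node through `Settings.embed_model`.
3. It keeps the vectors in a store, which `persist` writes to `./storage`.

At query time, the nodes closest to the query embedding are retrieved. The toy index below illustrates that store-search-persist cycle with brute-force cosine similarity over hand-written vectors. It is a hypothetical stand-in (`ToyVectorIndex` and its JSON layout), not LlamaIndex's storage format.

```python
import json
import math
import tempfile
from pathlib import Path

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ToyVectorIndex:
    """In-memory (text, vector) pairs with brute-force cosine search and JSON persistence."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.texts.append(text)
        self.vectors.append(vector)

    def query(self, vector: list[float], top_k: int = 2) -> list[tuple[str, float]]:
        scored = [(t, cosine(vector, v)) for t, v in zip(self.texts, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
        return scored[:top_k]

    def persist(self, path: str) -> None:
        data = {"texts": self.texts, "vectors": self.vectors}
        Path(path).write_text(json.dumps(data), encoding="utf-8")

    @classmethod
    def load(cls, path: str) -> "ToyVectorIndex":
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        toy = cls()
        for text, vector in zip(data["texts"], data["vectors"]):
            toy.add(text, vector)
        return toy

toy_index = ToyVectorIndex()
toy_index.add("Revenue is income from sales.", [0.9, 0.1, 0.0])
toy_index.add("Nad Niemnem is a novel.", [0.0, 0.2, 0.9])
toy_index.add("Profit is revenue minus costs.", [0.7, 0.3, 0.1])

with tempfile.TemporaryDirectory() as tmp:
    store = str(Path(tmp, "toy_index.json"))
    toy_index.persist(store)
    reloaded = ToyVectorIndex.load(store)

hits = reloaded.query([1.0, 0.0, 0.0], top_k=2)
print(hits)
```

LlamaIndex's query engines expose the same knob as `similarity_top_k`. To reload the real index from `./storage`, LlamaIndex provides `load_index_from_storage` together with `StorageContext.from_defaults(persist_dir="./storage")`.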

In [14]:
# Retrieve relevant chunks and generate an answer, both via the local llamafile server
query_engine = index.as_query_engine()
print(query_engine.query("What is revenue?"))

Revenue is the amount of money earned by a business or organization from its activities. It is the profit earned from selling goods or services, or from other sources of income.
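`as_query_engine()` answers in two steps. First it retrieves the chunks most similar to the question. Then it asks the LLM to answer with those chunks pasted into a prompt. Below is a simplified sketch of the second step. The template wording and the `build_qa_prompt` name are illustrative assumptions, not LlamaIndex's exact default prompt. Seen this way, a generic answer becomes easier to explain: if the retrieved chunks say little about the question, a small model may answer from general knowledge instead.

```python
def build_qa_prompt(question: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Paste retrieved chunks (best first) into a QA prompt, truncating the context to max_chars."""
    context = "\n\n".join(chunks)[:max_chars]
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer: "
    )

qa_prompt = build_qa_prompt(
    "What is revenue?",
    ["Revenue is income from sales.", "Profit is revenue minus costs."],
)
print(qa_prompt)
```

With the real engine, `response = query_engine.query(...)` exposes the retrieved chunks as `response.source_nodes`. That is a quick way to check whether an answer is grounded in `./data` or the Wikipedia page.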
