# Example of using the LlamaIndex framework for Retrieval-Augmented Generation

## LlamaIndex setup
### Download llamafile
Download a llamafile that bundles the model. A llamafile is a single executable that can package any LLM, run it as a local server, and expose it through an HTTP API.
This example uses the TinyLlama-1.1B-Chat-v1.0 model.

`wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile`

Make it executable:

`chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile`

Run it in server mode:

`./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding --port 8080`
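Before running the notebook cells, it can help to confirm that something is actually listening on the port passed to `--port`. The sketch below is one minimal way to do that with the Python standard library. The helper name `is_server_up` is hypothetical and not part of llamafile or LlamaIndex, and it only checks that an HTTP server answers at the base URL, not that the model has finished loading.

```python
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the llamafile server runs locally.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_server_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP requests at base_url.

    Hypothetical helper for illustration only.
    """
    try:
        with _opener.open(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, and similar failures.
        return False
```

With the server started as above, `is_server_up("http://localhost:8080")` should return `True`.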

### Install the LlamaIndex Python library

In [16]:
# Install llama-index
!pip install llama-index 
# Install llamafile integrations and SimpleWebPageReader
!pip install llama-index-embeddings-llamafile llama-index-llms-llamafile llama-index-readers-web



## Configuration

In [17]:
# Configure LlamaIndex
from llama_index.core import Settings
from llama_index.embeddings.llamafile import LlamafileEmbedding
from llama_index.llms.llamafile import Llamafile
from llama_index.core.node_parser import SentenceSplitter

# configure the embedding model that encodes text into vectors via the running llamafile endpoint
Settings.embed_model = LlamafileEmbedding(base_url="http://localhost:8080")

# configure the LLM client that sends prompts to the same llamafile endpoint
Settings.llm = Llamafile(
    base_url="http://localhost:8080",
    temperature=0,
    seed=0
)

# configure how documents are split into chunks before embedding
Settings.transformations = [
    SentenceSplitter(
        chunk_size=256, 
        chunk_overlap=5
    )
]
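To build intuition for what `chunk_size` and `chunk_overlap` control, here is a deliberately simplified sketch of overlapping chunking. It is not `SentenceSplitter`'s algorithm: the real splitter counts tokens and tries to respect sentence boundaries, while this hypothetical `chunk_tokens` helper just slides a fixed window over a list of words.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap items.

    Simplified illustration only, not the SentenceSplitter implementation.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the input
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = chunk_tokens(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last word of the previous one, which is the role the overlap plays in the configuration above: text near a chunk boundary appears in both neighbouring chunks.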

In [18]:
# Load local documents describing several cryptocurrencies
from llama_index.core import SimpleDirectoryReader
# create reader for local documents
local_doc_reader = SimpleDirectoryReader(input_dir='./data/cryptocurrency_wikipedia')
# create collection that will contain all documents used for retrieval
docs = local_doc_reader.load_data(show_progress=True)


Loading files: 100%|██████████| 3/3 [00:00<00:00, 2713.59file/s]


In [19]:
# Add Wikipedia pages
from llama_index.readers.web import SimpleWebPageReader
urls = [
    'https://en.wikipedia.org/wiki/Bitcoin',
    'https://en.wikipedia.org/wiki/Ethereum',
    'https://en.wikipedia.org/wiki/Dogecoin'
]
# create reader that can fetch web page content
web_reader = SimpleWebPageReader(html_to_text=True)
# add fetched content to docs collection
docs.extend(web_reader.load_data(urls))

In [None]:
from llama_index.core import VectorStoreIndex

# build a vector index by chunking and embedding the documents
index = VectorStoreIndex.from_documents(
    docs,
    show_progress=True,
)

# persist the index to local disk
index.storage_context.persist(persist_dir="./storage")

Parsing nodes:   0%|          | 0/6 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/545 [00:00<?, ?it/s]
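Because the index is persisted to `./storage`, it can be reloaded in a later session (for example after a kernel restart) instead of re-embedding every document. The sketch below assumes the same `Settings.embed_model` is configured as when the index was built. `load_persisted_index` is a hypothetical helper name, while `StorageContext.from_defaults` and `load_index_from_storage` come from `llama_index.core`.

```python
import os


def load_persisted_index(persist_dir="./storage"):
    """Reload a persisted index, or return None if nothing was persisted.

    Hypothetical helper for illustration only.
    """
    if not os.path.isdir(persist_dir):
        return None
    # Imported lazily so the directory check works without llama-index installed.
    from llama_index.core import StorageContext, load_index_from_storage

    storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
    return load_index_from_storage(storage_context)
```

A later cell could then call `index = load_persisted_index()` and fall back to `VectorStoreIndex.from_documents(docs)` when the result is `None`.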

In [10]:
# create a query engine that answers questions with the LLM, using chunks retrieved from the index as context
query_engine = index.as_query_engine()
# ask questions about data from storage
print(query_engine.query("What is coinye?"))

NameError: name 'index' is not defined

In [None]:
print(query_engine.query("Who created Dogecoin?"))

In [None]:
print(query_engine.query("Is bitcoin stable?"))
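Conceptually, each query above is embedded, compared against the stored chunk embeddings, and the closest chunks are passed to the LLM as context. The toy sketch below illustrates that retrieval step with cosine similarity over hand-made 3-dimensional vectors. The vectors, chunk names, and the `top_k` helper are invented for illustration and are not LlamaIndex internals. In LlamaIndex itself, the number of retrieved chunks can be set with `index.as_query_engine(similarity_top_k=...)`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query vector."""
    scored = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Made-up embeddings standing in for real chunk vectors.
toy_chunks = {
    "bitcoin": [1.0, 0.0, 0.0],
    "ethereum": [0.0, 1.0, 0.0],
    "dogecoin": [0.7, 0.0, 0.7],
}
nearest = top_k([0.9, 0.0, 0.1], toy_chunks, k=2)
print(nearest)
```

Real embeddings have hundreds or thousands of dimensions, but the ranking idea is the same.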