# Example of using the LlamaIndex framework for Retrieval Augmented Generation
This notebook shows how to run the LlamaIndex framework locally to create a virtual AI assistant based on RAG (Retrieval Augmented Generation).
Wikipedia articles about cryptocurrencies are used as the dataset in which the assistant searches for source information.

## LlamaIndex setup
### Download llamafile
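Download the llamafile containing the model. A llamafile can package any supported LLM into a single executable, which can run as a local server and be used via an API.
This example uses the TinyLlama-1.1B-Chat-v1.0 model:

`wget https://huggingface.co/Mozilla/TinyLlama-1.1B-Chat-v1.0-llamafile/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile`

Make the file executable:

`chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile`

Run it in server mode. `--nobrowser` stops it from opening a browser tab, `--embedding` enables the embedding endpoint used later for indexing, and `--port 8081` sets the port the notebook connects to: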

`./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding --port 8081`
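The server needs some time to load the model after it starts, and requests sent before then can fail. A minimal stdlib sketch that polls the server until it answers with HTTP 200 is shown below. It assumes the server exposes a llama.cpp-style `/health` endpoint (adjust `url` if your llamafile version does not serve it); `wait_for_server` and its parameters are hypothetical, and `opener` is injectable only to make the helper easy to test.

```python
import time
import urllib.request
from typing import Callable


def wait_for_server(
    url: str = "http://localhost:8081/health",  # ASSUMPTION: llama.cpp-style health endpoint
    timeout: float = 120.0,
    interval: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Poll `url` until it answers HTTP 200 (True) or `timeout` seconds pass (False)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (e.g. 503 while the model loads), refused
            # connections and socket timeouts are all OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage in the notebook, before building the index:
# if not wait_for_server():
#     raise RuntimeError("llamafile server is not ready on port 8081")
```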

### Install the LlamaIndex Python library

In [1]:
# Install llama-index
!pip install llama-index 
# Install llamafile integrations and SimpleWebPageReader
!pip install llama-index-embeddings-llamafile llama-index-llms-llamafile llama-index-readers-web

Collecting llama-index
  Downloading llama_index-0.12.3-py3-none-any.whl.metadata (11 kB)
Collecting llama-index-agent-openai<0.5.0,>=0.4.0 (from llama-index)
  Downloading llama_index_agent_openai-0.4.0-py3-none-any.whl.metadata (726 bytes)
Collecting llama-index-cli<0.5.0,>=0.4.0 (from llama-index)
  Downloading llama_index_cli-0.4.0-py3-none-any.whl.metadata (1.5 kB)
Collecting llama-index-core<0.13.0,>=0.12.3 (from llama-index)
  Downloading llama_index_core-0.12.3-py3-none-any.whl.metadata (2.5 kB)
Collecting llama-index-embeddings-openai<0.4.0,>=0.3.0 (from llama-index)
  Downloading llama_index_embeddings_openai-0.3.1-py3-none-any.whl.metadata (684 bytes)
Collecting llama-index-indices-managed-llama-cloud>=0.4.0 (from llama-index)
  Downloading llama_index_indices_managed_llama_cloud-0.6.3-py3-none-any.whl.metadata (3.8 kB)
Collecting llama-index-legacy<0.10.0,>=0.9.48 (from llama-index)
  Downloading llama_index_legacy-0.9.48.post4-py3-none-any.whl.metadata (8.5 kB)

## Configuration

In [34]:
# Configure LlamaIndex
from llama_index.core import Settings
from llama_index.embeddings.llamafile import LlamafileEmbedding
from llama_index.llms.llamafile import Llamafile
from llama_index.core.node_parser import SentenceSplitter

# Embedding model: encode text into vectors using the llamafile server started above
Settings.embed_model = LlamafileEmbedding(base_url="http://localhost:8081")

# LLM: generate answers with the same llamafile server
# (temperature=0 and a fixed seed make the answers more repeatable)
Settings.llm = Llamafile(
    base_url="http://localhost:8081",
    temperature=0,
    seed=0
)

# Split documents into chunks of up to 256 tokens, with 5 tokens of overlap
Settings.transformations = [
    SentenceSplitter(
        chunk_size=256,
        chunk_overlap=5
    )
]
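`chunk_size` and `chunk_overlap` control how documents are cut before they are embedded. The sketch below shows the sliding-window idea on whitespace-separated words. It is a simplification: `SentenceSplitter` counts tokenizer tokens and prefers to break at sentence boundaries, and `chunk_words` is a hypothetical helper, not part of LlamaIndex.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 256, chunk_overlap: int = 5) -> List[str]:
    """Split text into windows of `chunk_size` words; neighbours share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# With chunk_size=4 and chunk_overlap=1, each chunk repeats the last word of the previous one.
print(chunk_words("a b c d e f g h i j", chunk_size=4, chunk_overlap=1))
# ['a b c d', 'd e f g', 'g h i j']
```

A small overlap such as the 5 tokens used above keeps a little shared context between neighbouring chunks without duplicating much text in the index.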

In [35]:
# Load local files with cryptocurrency descriptions
from llama_index.core import SimpleDirectoryReader
# create a reader for the local documents
local_doc_reader = SimpleDirectoryReader(input_dir='./data/cryptocurrency_wikipedia')
# load the documents into a list that will hold every document used for retrieval
docs = local_doc_reader.load_data(show_progress=True)

Loading files: 100%|██████████| 3/3 [00:00<00:00, 2289.88file/s]
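`SimpleDirectoryReader` reads each file in `input_dir` and returns documents holding the file's text plus metadata such as the file name; by default it does not descend into subdirectories, and it picks a parser per file type. A minimal stdlib sketch of the same idea, limited to plain-text files and using a hypothetical `load_text_files` helper, could look like this. The `.txt`/`.md` suffixes are an assumption, since the notebook does not show the format of the files in `./data/cryptocurrency_wikipedia`.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def load_text_files(input_dir: str, suffixes: Sequence[str] = (".txt", ".md")) -> List[Dict]:
    """Hypothetical helper: read top-level text files of `input_dir` into simple records."""
    records = []
    for path in sorted(Path(input_dir).iterdir()):  # top level only, not recursive
        if path.is_file() and path.suffix.lower() in suffixes:
            records.append({
                "text": path.read_text(encoding="utf-8"),
                "metadata": {"file_name": path.name, "file_path": str(path)},
            })
    return records


# e.g. records = load_text_files("./data/cryptocurrency_wikipedia")
```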


In [36]:
# Add Wikipedia pages
from llama_index.readers.web import SimpleWebPageReader
urls = [
    'https://en.wikipedia.org/wiki/Bitcoin',
    'https://en.wikipedia.org/wiki/Ethereum',
    'https://en.wikipedia.org/wiki/Dogecoin'
]
# create a reader that fetches web pages and converts their HTML to plain text
web_reader = SimpleWebPageReader(html_to_text=True)
# append the fetched pages to the docs list
docs.extend(web_reader.load_data(urls))
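With `html_to_text=True` the reader converts each fetched page from HTML to plain text before it is indexed (to our knowledge via the `html2text` package). The stripped-down stdlib approximation below is only meant to illustrate the idea: it keeps visible text nodes and drops the contents of `<script>` and `<style>` tags. `html_to_plain_text` is a hypothetical name, and real pages need far more careful handling than this.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style> tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_plain_text(html: str) -> str:
    """Return the visible text of `html`, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```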

In [37]:
from llama_index.core import VectorStoreIndex

# build the vector index: split the documents into chunks and embed each chunk
index = VectorStoreIndex.from_documents(
    docs,
    show_progress=True,
)

# persist the index to ./storage so it can be reloaded without re-embedding
index.storage_context.persist(persist_dir="./storage")

Parsing nodes:   0%|          | 0/6 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/547 [00:00<?, ?it/s]

HTTPStatusError: Server error '503 Service Unavailable' for url 'http://localhost:8081/embedding'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/503
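The `503 Service Unavailable` above was returned by the llamafile server's `/embedding` endpoint. A 503 means the server was temporarily unable to handle the request (for example, it may still have been loading the model or was busy), so waiting or retrying is one option. A generic retry-with-exponential-backoff wrapper is sketched below; `retry_on_error` is hypothetical, and which exception types to retry on is an assumption you would adapt (the error above was an `httpx.HTTPStatusError`).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_on_error(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a matching exception wait base_delay * 2**i seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Hypothetical usage in the notebook:
# import httpx
# index = retry_on_error(
#     lambda: VectorStoreIndex.from_documents(docs, show_progress=True),
#     retry_on=(httpx.HTTPStatusError,),
# )
```

Retrying the whole `from_documents` call re-embeds every chunk, so making sure the server is ready before running the cell is usually the cheaper fix.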

In [38]:
# create a query engine that retrieves relevant chunks from the index and passes them to the LLM
query_engine = index.as_query_engine()
# ask a question about the indexed data
print(query_engine.query("What is Coinye?"))

NameError: name 'index' is not defined
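`query_engine.query(...)` first embeds the question and retrieves the chunks whose embeddings are most similar to it from the index built by `VectorStoreIndex.from_documents`. By default that index keeps its vectors in memory, and `persist` writes them to JSON files in `./storage`. The toy sketch below illustrates the retrieval step with cosine similarity over hand-written 3-dimensional vectors; `TinyVectorStore` and the vectors are made up for illustration, whereas in the notebook the vectors come from the llamafile embedding endpoint.

```python
import json
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: chunk id -> (embedding, text)."""

    def __init__(self):
        self.items: Dict[str, Tuple[List[float], str]] = {}

    def add(self, chunk_id: str, embedding: List[float], text: str) -> None:
        self.items[chunk_id] = (embedding, text)

    def query(self, query_embedding: List[float], top_k: int = 2) -> List[Tuple[str, float]]:
        # Score every stored chunk and keep the top_k most similar ones.
        scored = [(cid, cosine(query_embedding, emb)) for cid, (emb, _) in self.items.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

    def persist(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.items, f)  # tuples are written as JSON lists

    @classmethod
    def load(cls, path: str) -> "TinyVectorStore":
        store = cls()
        with open(path, encoding="utf-8") as f:
            for cid, (emb, text) in json.load(f).items():
                store.add(cid, emb, text)
        return store


store = TinyVectorStore()
store.add("btc", [0.9, 0.1, 0.0], "Bitcoin chunk")
store.add("eth", [0.1, 0.9, 0.0], "Ethereum chunk")
store.add("doge", [0.7, 0.0, 0.7], "Dogecoin chunk")
print([cid for cid, _ in store.query([1.0, 0.0, 0.0], top_k=2)])  # ['btc', 'doge']
```

The number of chunks the real query engine retrieves can be changed with `index.as_query_engine(similarity_top_k=...)`.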

In [39]:
print(query_engine.query("Is Dogecoin stable?"))

NameError: name 'query_engine' is not defined
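After retrieval, the query engine sends the retrieved chunks and the question to the LLM inside a prompt template. The sketch below assembles such a prompt. The template text is modeled on LlamaIndex's default question-answering prompt but is an approximation, not a verbatim copy, and `build_qa_prompt` is a hypothetical helper.

```python
from typing import List

# Approximation of a "context + question" RAG prompt (not copied verbatim from LlamaIndex).
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_qa_prompt(question: str, retrieved_chunks: List[str]) -> str:
    # Join the retrieved chunks with blank lines so the model sees them as separate passages.
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return QA_TEMPLATE.format(context_str=context, query_str=question)


print(build_qa_prompt("What is Coinye?", ["<retrieved chunk 1>", "<retrieved chunk 2>"]))
```

A small model such as TinyLlama may still ignore the context and answer from its own knowledge; the `source_nodes` attribute of the returned response shows which chunks were actually retrieved.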

In [28]:
print(query_engine.query("Is Bitcoin good for environment?"))

Yes, Bitcoin is considered to be a good option for environment as it does not require any physical infrastructure, such as mining facilities, to operate. It is a decentralized and secure cryptocurrency that uses blockchain technology to verify transactions and maintain a decentralized ledger. This means that Bitcoin does not require any intermediaries or third-party entities to process transactions, which reduces the carbon footprint associated with traditional financial systems. Additionally, Bitcoin's energy consumption is significantly lower than that of traditional financial systems, making it a more sustainable option for the environment.</s>
