# Basic LlamaIndex usage

- Install / Setup
- Load Documents
- Create & Store Index
- Query Index

## install libraries

- Install the libraries needed for the data loaders and the index implementations.

### what is llama-hub?

llama-hub is a collection of data loaders (data integrations) for LlamaIndex, a library/framework for LLM applications with a focus on data integration.
LlamaIndex uses langchain (another well-known library) and the OpenAI client libraries under the hood.

In [None]:
!pip install llama_index
!pip install llama_hub

## setup

- set logging to stream to stdout and loglevel to info
- set openai api key

### why openai api key?

LlamaIndex (with llama-hub) and many other LLM app frameworks and libraries use the OpenAI API by default, both for creating embeddings and for interacting with the language model.

![](images/chatbot_graph-white-bg-v3.png)

In [None]:
import os
import logging
import sys
import getpass

# stream INFO-level logs to stdout; force=True (Python 3.8+) replaces any
# handlers that are already installed, so each message is printed only once
logging.basicConfig(stream=sys.stdout, level=logging.INFO, force=True)

# using this to connect to openai
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
#os.environ["OPENAI_API_BASE"] = "https://api.openai.com/v1"

## load data from website via sitemap

### what is a sitemap?

A sitemap is a file where you provide information about the pages, videos, and other files on your site, and the relationships between them. 

Learn more:
https://developers.google.com/search/docs/crawling-indexing/sitemaps/overview
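
To make the format concrete, here is a minimal sketch that parses a small inline sitemap with Python's standard library and keeps only the URLs under a given prefix, which is roughly what a sitemap loader with a filter has to do. The sample XML, the `example.com` URLs and the helper name `urls_from_sitemap` are made up for illustration; this is not the llama_hub implementation.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol (version 0.9)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# hypothetical sample sitemap; a real one would be downloaded from the website
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/knowledge/article-1</loc></url>
  <url><loc>https://example.com/knowledge/article-2</loc></url>
  <url><loc>https://example.com/contact</loc></url>
</urlset>
"""


def urls_from_sitemap(xml_text, prefix=""):
    """Return all <loc> URLs of the sitemap that start with `prefix`."""
    root = ET.fromstring(xml_text)
    locs = (
        loc.text.strip()
        for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    )
    return [url for url in locs if url.startswith(prefix)]


# keep only the pages below /knowledge
knowledge_urls = urls_from_sitemap(SAMPLE_SITEMAP, prefix="https://example.com/knowledge")
print(knowledge_urls)
```

With an empty `prefix` the helper returns every URL listed in the sitemap.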

### why use a sitemap in this case?

Many websites with interesting content already offer a sitemap.xml, so I can choose my input simply by picking a website that interests me.
This is how I create a custom pool of knowledge for my chatbot.

### how do we get the data?

We use the [llama_hub loader for sitemaps](https://llama-hub-ui.vercel.app/l/web-sitemap).

In [None]:
from llama_hub.web.sitemap.base import SitemapReader

# needed in jupyter notebooks, which already run an event loop;
# outside of a notebook these two lines can be commented out
import nest_asyncio
nest_asyncio.apply()

loader = SitemapReader(html_to_text=True)
documents = loader.load_data(sitemap_url='https://deepshore.de/sitemap.xml', filter='https://deepshore.de/knowledge')

print(len(documents))

## Create Index from Documents

### why
A vector index stores vector embeddings so that the entries most similar to a query can be retrieved quickly (similarity search).

![](images/how-does-vector-store-work.png)
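
The idea behind similarity search can be sketched in a few lines of plain Python. The three-dimensional vectors, the texts and the helper names `cosine_similarity` and `top_k` below are made-up assumptions for illustration; real embeddings have hundreds or thousands of dimensions, and this brute-force loop is not how llama_index stores or searches vectors internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# hypothetical toy "index": text chunk -> embedding vector
toy_index = {
    "load testing with k6": [0.9, 0.1, 0.0],
    "kubernetes basics": [0.1, 0.9, 0.1],
    "baking bread": [0.0, 0.1, 0.9],
}


def top_k(query_vector, index, k=2):
    """Return the k texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# hypothetical embedding of a question about load testing
query_vector = [0.8, 0.2, 0.0]
results = top_k(query_vector, toy_index, k=2)
print(results)
```

The retrieved chunks are what gets passed to the language model as context when the index is queried.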

In [None]:
from llama_index import VectorStoreIndex

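# build the index; this creates an embedding for every document chunk
index = VectorStoreIndex.from_documents(documents)

# write the index to disk ("./storage" is also the default directory)
index.storage_context.persist(persist_dir="./storage")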

## load index from disk storage

In [None]:
from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage")
# load index
index = load_index_from_storage(storage_context)
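
Because building the index calls the embedding API for every document, it can be worth loading from disk whenever a persisted index already exists. The following is a minimal sketch of that pattern under stated assumptions: `has_persisted_index`, `load_or_build`, `demo_build` and `demo_load` are hypothetical helpers, and the two demo functions are stand-ins for the build and load cells of this notebook.

```python
import os
import tempfile


def has_persisted_index(persist_dir):
    """Return True if `persist_dir` exists and contains at least one entry."""
    return os.path.isdir(persist_dir) and len(os.listdir(persist_dir)) > 0


def load_or_build(persist_dir, load_fn, build_fn):
    """Call `load_fn` if something was persisted, otherwise `build_fn`."""
    if has_persisted_index(persist_dir):
        return load_fn(persist_dir)
    return build_fn(persist_dir)


# stand-in for: build the index from documents and persist it to persist_dir
def demo_build(persist_dir):
    os.makedirs(persist_dir, exist_ok=True)
    with open(os.path.join(persist_dir, "index.json"), "w") as f:
        f.write("{}")
    return "built"


# stand-in for: rebuild the storage context and load the index from it
def demo_load(persist_dir):
    return "loaded"


with tempfile.TemporaryDirectory() as tmp:
    storage_dir = os.path.join(tmp, "storage")
    first = load_or_build(storage_dir, demo_load, demo_build)   # nothing on disk yet
    second = load_or_build(storage_dir, demo_load, demo_build)  # found on disk

print(first, second)
```

In the notebook, the load function would wrap `StorageContext.from_defaults(persist_dir=...)` and `load_index_from_storage(...)`, and the build function would wrap `VectorStoreIndex.from_documents(...)` followed by `persist(...)`.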

## query index

Simply query the index and look at the response.

In [None]:
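query_engine = index.as_query_engine()
# the indexed website is German, so the question is asked in German:
# "What is k6.io? What is it used for?"
response = query_engine.query("Was ist k6.io? Wofür benutzt man es?")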

print(response)