# LlamaIndex Episode 1 🦙

## Overview

* What is LlamaIndex?
    * LlamaHub (data loaders)
* How to set up Weaviate
    * Create schema
* Adding data to Weaviate using LlamaIndex
    * Data loader examples
* Chunking up your data
* Connecting a Weaviate instance to LlamaIndex
* Simple query engine

## What is [LlamaIndex](https://www.llamaindex.ai/)?

#### A framework that lets you connect LLMs to storage providers.
#### LlamaIndex 🤝 Weaviate ➡ Ultimate RAG stack

#### [LlamaHub](https://llama-hub-ui.vercel.app/): Enables you to connect to a number of external data sources (Notion, Slack, Web pages, and more!)

## Setting up Weaviate

1. Embedded 

2. WCS (Weaviate Cloud Services)

3. Docker

### Embedded

In [None]:
import weaviate

# Requires an up-to-date Weaviate Python client (3.21 at the time of writing)

client = weaviate.Client(embedded_options=weaviate.EmbeddedOptions())

### WCS

In [None]:
import weaviate

client = weaviate.Client(
  url="https://llamaindex-episode1-samtusdu.weaviate.network",  # URL of your Weaviate instance
  additional_headers={
    "X-OPENAI-Api-Key": "sk-key", # Replace with your OpenAI key
  }
)

client.schema.get()  # Get the schema to test connection
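Hard-coding the OpenAI key in a notebook cell makes it easy to leak. Below is a minimal sketch of reading it from the environment instead. It assumes the key is stored in `OPENAI_API_KEY`, and `build_weaviate_headers` is a hypothetical helper, not part of the Weaviate client.

```python
import os


def build_weaviate_headers(env=None):
    """Return the extra headers for weaviate.Client, reading the key from the environment.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")  # assumed variable name
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before connecting to Weaviate.")
    # Same header name as in the cell above.
    return {"X-OPENAI-Api-Key": api_key}
```

The result can then be passed as `additional_headers=build_weaviate_headers()` when creating the client.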

### Docker

In [4]:
import weaviate 

client = weaviate.Client("http://localhost:8081")  # use the port mapped in your docker-compose file (Weaviate's default is 8080)

### Schema

In [5]:
schema = {
   "classes": [
       {
           "class": "BlogPost",
           "description": "Blog post from the Weaviate website.",
           "vectorizer": "text2vec-openai",
           "moduleConfig": {
               "generative-openai": { 
                    "model": "gpt-3.5-turbo"
                }
           },
           "properties": [
               {
                  "name": "content",
                  "dataType": ["text"],
                  "description": "Content from the blog post",
               }
            ]
        }
    ]
}

# Warning: this deletes every class (and all of its data) in the instance
client.schema.delete_all()

client.schema.create(schema)

print("Schema was created.")

Schema was created.
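The `text_key` handed to the vector store later in this notebook has to name a property of this class, so it can help to list the property names before ingesting anything. This is a rough sketch that only inspects a plain schema dictionary. `property_names` and `example_schema` are hypothetical names made up for illustration.

```python
def property_names(schema, class_name):
    """Return the property names declared for class_name in a Weaviate-style schema dict."""
    for cls in schema.get("classes", []):
        if cls.get("class") == class_name:
            return [prop["name"] for prop in cls.get("properties", [])]
    raise KeyError(f"class {class_name!r} is not in the schema")


# A trimmed, made-up schema with the same overall shape as a Weaviate schema.
example_schema = {
    "classes": [
        {
            "class": "BlogPost",
            "properties": [{"name": "content", "dataType": ["text"]}],
        }
    ]
}

print(property_names(example_schema, "BlogPost"))
```

Against a live instance, `client.schema.get()` should return a dictionary of this shape, so the same helper could be pointed at it.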


## Adding Data to Weaviate using LlamaIndex

### SimpleDirectoryReader: Read files in your filesystem

In [6]:
from llama_index import SimpleDirectoryReader

blogs = SimpleDirectoryReader('../data').load_data()


### SimpleWebPageReader: Web scraper that turns HTML to text

In [7]:
from llama_index import download_loader

SimpleWebPageReader = download_loader("SimpleWebPageReader")

loader = SimpleWebPageReader()
documents = loader.load_data(urls=['https://weaviate.io/blog/llamaindex-and-weaviate'])
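To make "turns HTML to text" concrete, here is a small standard-library sketch of the idea. It is not how `SimpleWebPageReader` is implemented. `TextExtractor` and `html_to_text` are hypothetical names, and a real loader handles far more edge cases.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
print(html_to_text(sample))
```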

### NotionPageReader: Loads documents from Notion

In [9]:
from llama_index import download_loader

NotionPageReader = download_loader('NotionPageReader')

integration_token = "secret_key"  # Replace with your Notion integration token
page_ids = ["40be241cac924a5aa887fa85e945dbf6"]
reader = NotionPageReader(integration_token=integration_token)
documents = reader.load_data(page_ids=page_ids)

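KeyError: 'results'

This run used the placeholder token, so the Notion API response most likely had no `results` field. Pass a real integration token and the ID of a page shared with that integration.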

### Creating Nodes

In [15]:
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults()
nodes = parser.get_nodes_from_documents(blogs)
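Chunking means cutting each document into smaller, slightly overlapping pieces so that each piece can be embedded and retrieved on its own. The sketch below shows only the sliding-window idea, using whitespace-separated words. `chunk_text` is a hypothetical helper, and the library's parser is more careful about token counts and boundaries.

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into word-based chunks where neighbouring chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


words = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(words, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary from being split so that neither chunk contains it in full.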

### Nodes to Weaviate

In [16]:
from llama_index.vector_stores import WeaviateVectorStore
from llama_index import VectorStoreIndex, StorageContext
import os

# LlamaIndex reads the OpenAI key from this environment variable
#os.environ["OPENAI_API_KEY"] = "sk-key"

# construct the vector store on top of the BlogPost class
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="BlogPost", text_key="content")

# set up the storage context that holds the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# build the index directly from the nodes; from_documents() expects Document objects,
# and passing nodes to it raises "AttributeError: 'TextNode' object has no attribute 'get_doc_id'"
index = VectorStoreIndex(nodes, storage_context=storage_context)

### Query in LlamaIndex

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("What is the intersection between LLMs and search?")
print(response)
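Conceptually, the query engine embeds the question, retrieves the most similar nodes from the vector store, and hands them to the LLM as context. The sketch below covers only the retrieval step, using made-up two-dimensional vectors. In the notebook this search is performed by Weaviate, and `cosine_similarity`, `top_k` and `toy` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, node_vectors, k=2):
    """Return the ids of the k nodes whose vectors are most similar to the query."""
    scored = sorted(
        node_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [node_id for node_id, _ in scored[:k]]


# Toy embeddings; real ones have hundreds or thousands of dimensions.
toy = {"node-a": [1.0, 0.0], "node-b": [0.7, 0.7], "node-c": [0.0, 1.0]}
print(top_k([1.0, 0.1], toy, k=2))
```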

## Connecting Weaviate Instance to LlamaIndex

In [11]:
import weaviate
from llama_index.readers.weaviate.reader import WeaviateReader

# WCS
resource_owner_config = weaviate.AuthClientPassword(
  username = "erika@weaviate.io", 
  password = "<password>"
)

# initialize reader
reader = WeaviateReader("https://llamaindex-episode1-samtusdu.weaviate.network", auth_client_secret=resource_owner_config)


documents = reader.load_data(
    class_name="BlogPost", 
    properties=["content"], 
    separate_documents=True
)


# localhost
# reader = WeaviateReader("http://localhost:8080")

# documents = reader.load_data(
#     class_name="BlogPost", 
#     properties=["content"], 
#     separate_documents=True
# )

### Querying the existing class

In [None]:
import weaviate
from llama_index import ListIndex
from llama_index.readers.weaviate.reader import WeaviateReader

client = weaviate.Client(url="https://llamaindex-episode1-samtusdu.weaviate.network")

reader = WeaviateReader("https://llamaindex-episode1-samtusdu.weaviate.network")

query = """
{
  Get {
    BlogPost (
      bm25: {
        query: "What is ref2vec"
        properties: ["content"]
      },
      limit: 2
    ) {
      content
    }
  }
}
"""

documents = reader.load_data(graphql_query=query, separate_documents=True)

index = ListIndex.from_documents(documents)


query_engine = index.as_query_engine(response_mode="compact")
response = query_engine.query("what is ref2vec")
print(response)
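Writing the GraphQL string by hand gets error-prone once the search text or the limit changes. Here is a small sketch that builds the same kind of BM25 query from parameters. `bm25_query` is a hypothetical helper, and it relies on `json.dumps` to quote and escape strings, which should be fine for typical search text.

```python
import json


def bm25_query(class_name, search_text, properties, limit=2):
    """Build a Weaviate GraphQL Get query that runs a BM25 keyword search."""
    # json.dumps yields double-quoted, escaped strings.
    props = ", ".join(json.dumps(p) for p in properties)
    fields = "\n      ".join(properties)
    return f"""
{{
  Get {{
    {class_name} (
      bm25: {{
        query: {json.dumps(search_text)}
        properties: [{props}]
      }},
      limit: {int(limit)}
    ) {{
      {fields}
    }}
  }}
}}
"""


print(bm25_query("BlogPost", "What is ref2vec", ["content"]))
```

The returned string can be passed to `reader.load_data(graphql_query=...)` in place of the hand-written query.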