## Configuration 

At the time of writing, this notebook requires LlamaIndex version `0.8.0`.
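One way to confirm the installed version before running the cells is sketched below. It assumes the package is installed under the distribution name `llama-index`; `installed_version` and `EXPECTED_VERSION` are hypothetical names introduced for illustration and are not part of LlamaIndex.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED_VERSION = "0.8.0"  # version named in this notebook


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("llama-index")  # assumed distribution name
if found is None:
    print("llama-index is not installed")
elif found != EXPECTED_VERSION:
    print(f"llama-index {found} found; this notebook was written for {EXPECTED_VERSION}")
else:
    print(f"llama-index {found} matches the expected version")
```

If the versions differ, the import paths used in the later cells may have moved, so pinning the version is the safer option.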

## Connect to Weaviate

In [None]:
import weaviate

client = weaviate.Client(
    embedded_options=weaviate.embedded.EmbeddedOptions()
)

## Create Schema

In [2]:
schema = {
    "classes": [
        {
            "class": "BlogPost",
            "description": "Blog post from the Weaviate website.",
            "vectorizer": "text2vec-openai",
            "moduleConfig": {
                "generative-openai": {
                    "model": "gpt-3.5-turbo"
                }
            },
            "properties": [
                {
                    "name": "Content",
                    "dataType": ["text"],
                    "description": "Content from the blog post",
                }
            ]
        }
    ]
}

client.schema.delete_all()

client.schema.create(schema)

print("Schema was created.")

Schema was created.




## Add Data

In [3]:
from llama_index import SimpleDirectoryReader

blogs = SimpleDirectoryReader('./data').load_data()



## Setup Weaviate Vector Store

In [4]:
from llama_index.vector_stores import WeaviateVectorStore
from llama_index import VectorStoreIndex
from llama_index.storage.storage_context import StorageContext
import os

openai_api_key = os.environ["OPENAI_API_KEY"]

# Construct the vector store on top of the "BlogPost" class
vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="BlogPost", text_key="content"
)

# Set up the storage context that holds the embeddings
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build the index from the loaded blog posts
index = VectorStoreIndex.from_documents(blogs, storage_context=storage_context)

## Query without Self-Correcting

In [5]:
base_query_engine = index.as_query_engine()
query = "What is Ref2Vec?"

response = base_query_engine.query(query)
print(response)


Ref2Vec is a method of representing a data object based on the objects it references. It uses the average, or centroid vector, of the cross-referenced vectors to represent the referencing object. This way, it can be used to find more relevant objects.
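The centroid idea in the answer above can be illustrated with a few lines of plain Python. This is only a sketch of the averaging step, not Weaviate's implementation; the `centroid` helper and the three-dimensional vectors are made up for the example.

```python
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Made-up vectors for the objects that one referencing object points to.
referenced = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
]
referencing_vector = centroid(referenced)
print(referencing_vector)  # [2.0, 2.0, 1.0]
```

The referencing object is then searched with this averaged vector, which is why adding or removing a cross-reference changes what it matches.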


## Configure Self-Correcting Query Engine

In [14]:
from llama_index.evaluation.guideline_eval import GuidelineEvaluator, DEFAULT_GUIDELINES
from llama_index.response.schema import Response
from llama_index.indices.query.query_transform.feedback_transform import (
    FeedbackQueryTransformation,
)
from llama_index.query_engine.retry_query_engine import (
    RetryGuidelineQueryEngine,
)

# Guideline evaluator: the default guidelines plus two custom ones
guideline_eval = GuidelineEvaluator(
    guidelines=DEFAULT_GUIDELINES
    + "\nThe response should try to summarize where possible.\n"
    "The response should mention Weaviate and not be too vague.\n"
)

In [15]:
# Evaluate the earlier response against the guidelines
typed_response = response if isinstance(response, Response) else response.get_response()
evaluation = guideline_eval.evaluate_response(query, typed_response)  # avoid shadowing the built-in eval
print(f"Guideline eval evaluation result: {evaluation.feedback}")

# Rewrite the query so that it carries the evaluator's feedback
feedback_query_transform = FeedbackQueryTransformation(resynthesize_query=True)
transformed_query = feedback_query_transform.run(query, {"evaluation": evaluation})
print(f"Transformed query: {transformed_query.query_str}")

Guideline eval evaluation result: The response does not mention Weaviate and is too vague. It should provide more specific information and use statistics or numbers when possible. It should also try to summarize where possible.
Transformed query: Here is a previous bad answer.

Ref2Vec is a method of representing a data object based on the objects it references. It uses the average, or centroid vector, of the cross-referenced vectors to represent the referencing object. This way, it can be used to find more relevant objects.
Here is some feedback from the evaluator about the response given.
The response does not mention Weaviate and is too vague. It should provide more specific information and use statistics or numbers when possible. It should also try to summarize where possible.
Now answer the question.

What is Ref2Vec and how does it work with Weaviate?
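The layout of the transformed query printed above can be reproduced with a small string template. The sketch below is an illustration of that layout only, not LlamaIndex's implementation; `build_feedback_query` is a hypothetical helper, and it does not cover the extra step where `resynthesize_query=True` has the LLM rewrite the question itself.

```python
def build_feedback_query(question: str, previous_answer: str, feedback: str) -> str:
    """Assemble a retry prompt in the same layout as the printed output."""
    return (
        "Here is a previous bad answer.\n"
        f"{previous_answer}\n"
        "Here is some feedback from the evaluator about the response given.\n"
        f"{feedback}\n"
        "Now answer the question.\n"
        f"{question}"
    )


# Shortened placeholder strings, for illustration only.
prompt = build_feedback_query(
    question="What is Ref2Vec and how does it work with Weaviate?",
    previous_answer="Ref2Vec is a method of representing a data object.",
    feedback="The response does not mention Weaviate and is too vague.",
)
print(prompt)
```

Because the bad answer and the feedback travel with the question, the next attempt can address the specific shortcomings instead of repeating the first response.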


In [16]:
retry_guideline_query_engine = RetryGuidelineQueryEngine(
    base_query_engine, guideline_eval, resynthesize_query=True
)
retry_guideline_response = retry_guideline_query_engine.query(query)
print(retry_guideline_response)


Ref2Vec is a method of representing a data object based on the objects it references, developed by Weaviate. It uses the average, or centroid vector, of the cross-referenced vectors to represent the referencing object. This way, it can be used to find more relevant objects, such as in recommendation, knowledge graph representation, and representing long or complex multimodal objects. Ref2Vec combines vector search with the ability to link classes to other classes through cross-references, allowing for a better search experience.
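The control flow behind a retry engine of this kind can be sketched without any LLM calls. The code below is a simplified illustration under stated assumptions, not the `RetryGuidelineQueryEngine` source: `Evaluation`, `query_with_retries`, `fake_answer`, and `fake_evaluate` are hypothetical stand-ins, and the retry limit of 3 is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    passing: bool
    feedback: str


def query_with_retries(
    answer_fn: Callable[[str], str],
    evaluate_fn: Callable[[str, str], Evaluation],
    question: str,
    max_retries: int = 3,
) -> str:
    """Answer, evaluate, and re-ask with feedback until the answer passes."""
    answer = answer_fn(question)
    for _ in range(max_retries):
        evaluation = evaluate_fn(question, answer)
        if evaluation.passing:
            break
        # Feed the failed answer and the evaluator's feedback into the next attempt.
        prompt = (
            f"Previous answer: {answer}\n"
            f"Feedback: {evaluation.feedback}\n"
            f"Question: {question}"
        )
        answer = answer_fn(prompt)
    return answer


# Stand-ins for the query engine and the evaluator (no LLM involved).
def fake_answer(prompt: str) -> str:
    if "Feedback:" in prompt:
        return "Ref2Vec is a method developed by Weaviate."
    return "Ref2Vec is a method."


def fake_evaluate(question: str, answer: str) -> Evaluation:
    ok = "Weaviate" in answer
    return Evaluation(passing=ok, feedback="" if ok else "Mention Weaviate.")


print(query_with_retries(fake_answer, fake_evaluate, "What is Ref2Vec?"))
```

Capping the number of retries keeps the loop from running indefinitely when the evaluator never accepts an answer; in that case the last attempt is returned as-is.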


