<a href="https://colab.research.google.com/github/d-kleine/Advent_of_HayStack/blob/main/2_Challenge_Haystack_Advent_Weaviate_Day.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advent of Haystack: Day 2

_Make a copy of this Colab to start_

In this challenge, your mission is to help a couple of fictional elves in the film "A Very Weaviate Christmas".
1. Find out what's happening in the film "A Very Weaviate Christmas"
2. This will lead you to a clue that will let you discover which Weaviate Collection to peek into.
3. While submitting the challenge, tell us what you find there!


### Components to use:
1. [`OpenAITextEmbedder`](https://docs.haystack.deepset.ai/docs/openaitextembedder)
2. [`OpenAIGenerator`](https://docs.haystack.deepset.ai/docs/openaigenerator)
3. [`PromptBuilder`](https://docs.haystack.deepset.ai/docs/promptbuilder)
4. [`WeaviateDocumentStore`](https://docs.haystack.deepset.ai/docs/weaviatedocumentstore)
5. [`WeaviateEmbeddingRetriever`](https://docs.haystack.deepset.ai/reference/integrations-weaviate#weaviateembeddingretriever)


🎄 **Your task is to complete steps 3 and 4**. Run the preceding code cells first, and make sure you understand what each prior step does.

## 1) Setup and Installation

In [1]:
# pip install haystack-ai weaviate-haystack

To get started, first provide your API keys below. We're providing you with a read-only API Key for Weaviate.

For this challenge, we've prepared a Weaviate Collection for you which contains lots of movies and their overviews.

In [2]:
import os
from getpass import getpass

os.environ["WEAVIATE_API_KEY"] = "b3jhGwa4NkLGjaq3v1V1vh1pTrlKjePZSt91"

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

## 2) Weaviate Setup

Next, you can connect to the right `WeaviateDocumentStore` (we've already added the right code for you below with the client URL in place).

This document store contains many movies, with their titles and overviews.

In [3]:
from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore, AuthApiKey

# AuthApiKey() reads the key from the WEAVIATE_API_KEY environment variable set above
auth_client_secret = AuthApiKey()
document_store = WeaviateDocumentStore(url="https://zgvjwlycsr6p5j1ziuyea.c0.europe-west3.gcp.weaviate.cloud",
                                       auth_client_secret=auth_client_secret)

## 3) The RAG Pipeline

Now, you're on your own. Complete the code blocks below.

First, create a RAG pipeline that can answer questions based on the overviews of the movies in your `document_store`.

⭐️ You should then be able to run the pipeline and answer the question "What happens in the film 'A Very Weaviate Christmas'?"

**💚 Hint 1:** The embedding model that was used to populate the vectors was `text-embedding-3-small` by OpenAI.

**💙 Hint 2:** We've added an import for the `OpenAIGenerator`, but feel free to use something else!
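
Hint 1 matters because the retriever compares the query vector against the stored vectors, so both need to come from the same embedding model. Here is a minimal sketch of that comparison in plain Python. The 3-dimensional vectors and the two overview strings are made up for illustration (real `text-embedding-3-small` vectors have 1536 dimensions by default), and cosine similarity is assumed as the metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        # Vectors from different embedding models often differ in size
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (invented values, not real model output)
stored_vectors = {
    "elves chase a vector thief": [0.9, 0.1, 0.0],
    "a couple falls in love in Los Angeles": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

# Rank the stored texts by similarity to the query, best match first
ranked = sorted(
    stored_vectors,
    key=lambda text: cosine_similarity(query_vector, stored_vectors[text]),
    reverse=True,
)
print(ranked[0])
```

A query embedded with a different model would either fail on the dimension check or, worse, produce similarities that carry no meaning.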

In [4]:
from haystack import Pipeline
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.generators import HuggingFaceLocalGenerator
from haystack.components.builders import PromptBuilder
from haystack_integrations.components.retrievers.weaviate import WeaviateEmbeddingRetriever
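from transformers import AutoTokenizer  # added: used to look up the EOS token id

model = "meta-llama/Llama-3.2-1B-Instruct"
# The EOS token id is reused as pad_token_id in the generator below
eos_token_id = AutoTokenizer.from_pretrained(model).eos_token_id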

template = """Given the information below, answer the query. Only use the 
provided context to generate the answer

    Context:

    {% for document in documents %}
        {{ document.content }}
    {% endfor %}

    Question: {{ query }}
    
    Answer:
"""

text_embedder = OpenAITextEmbedder(model="text-embedding-3-small")
retriever = WeaviateEmbeddingRetriever(document_store=document_store, top_k=5)
prompt_builder = PromptBuilder(template=template)
generator = HuggingFaceLocalGenerator(model=model,
                                      task="text-generation",
                                      generation_kwargs={
                                          "do_sample": False,  # greedy decoding for repeatable output
                                          "top_p": None,  # sampling parameters are unset for greedy decoding
                                          "temperature": None,
                                          "pad_token_id": eos_token_id,
                                      })

In [5]:
query = "What happens in the film 'A Very Weaviate Christmas'?"

rag = Pipeline()

rag.add_component("text_embedder", text_embedder)
rag.add_component("retriever", retriever)
rag.add_component("prompt_builder", prompt_builder)
rag.add_component("llm", generator)

rag.connect("text_embedder.embedding", "retriever.query_embedding")
rag.connect("retriever", "prompt_builder.documents")
rag.connect("prompt_builder", "llm") 

reply = rag.run({"text_embedder": {"text": query}, "prompt_builder": {"query": query}})

print(reply["llm"]["replies"][0])

Device set to use cuda:0


 In the film 'A Very Weaviate Christmas', Daniel and Philip, two elves, are on a mission to recover stolen vectors from an intruder in Santa's Grotto. Meanwhile, Jonah, the son of Sam Baldwin, is trying to find a new wife for his dad, and Annie Reed is having doubts about her relationship. The Grinch tries to rob Whoville of Christmas, but a dash of kindness from Cindy Lou Who helps him melt his heart. Charlie Simms is trying to earn money for his flight home to Gresham, and Mia and Sebastian are facing decisions that threaten to fray their love affair.
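
The reply above mixes the target film with plots that appear to come from other movies. A likely reason is that `top_k=5` sends five overviews into one undifferentiated context block. The sketch below mimics the template in plain Python to show this. `render_prompt` is a hypothetical helper, not part of Haystack, and the three overviews are invented stand-ins for the retriever output.

```python
def render_prompt(documents, query):
    """Plain-Python stand-in for the Jinja template: all documents share one context block."""
    context = "\n".join(f"    {doc}" for doc in documents)
    return (
        "Given the information below, answer the query. "
        "Only use the provided context to generate the answer\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )


# Invented overviews standing in for retrieved documents (not the real collection contents)
retrieved = [
    "Two elves set out to recover stolen vectors.",
    "A green recluse plots to steal a town's Christmas.",
    "A boy looks for a new wife for his widowed father.",
]
query = "What happens in the film 'A Very Weaviate Christmas'?"

prompt_all = render_prompt(retrieved, query)       # what the model sees with several hits
prompt_top1 = render_prompt(retrieved[:1], query)  # what it would see with a single hit
print(prompt_all)
```

Nothing in the rendered prompt tells the model which overview belongs to which film, so lowering `top_k` on the retriever is one option to try if the answer should stay on a single movie.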


## 4) Solve the Mystery

By this point, you should know what's happening. There is a Collection where everything has been hidden.

Complete the code cell below by providing the right Collection name, and tell us the following:

1. Who is the culprit? Watch out, because there may be `decoys`.
2. What have they stolen?

**💚 Hint:** Once you've connected to the right collection, take a look at all the Objects in there. Then, you may be able to use filters to avoid the decoys!

- [Weaviate Documentation: Read all Objects](https://weaviate.io/developers/weaviate/manage-data/read-all-objects)
- [Weaviate Documentation: Filters](https://weaviate.io/developers/weaviate/search/filters)

In [6]:
import weaviate

from weaviate.classes.init import Auth

headers = {"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")}
client = weaviate.connect_to_weaviate_cloud(cluster_url="https://zgvjwlycsr6p5j1ziuyea.c0.europe-west3.gcp.weaviate.cloud",
                                            auth_credentials=Auth.api_key(os.getenv("WEAVIATE_API_KEY")),
                                            headers=headers)

In [7]:
col_configs = client.collections.list_all()

collection_names = sorted(col_configs)  # sorted() over a dict yields its keys
print("Collection Names:", collection_names)

Collection Names: ['Default', 'Santas_Grotto']


In [8]:
collection_name = "Santas_Grotto"  # named explicitly rather than by list position; matches the clue in the film's plot

In [9]:
collection_details = client.collections.get(collection_name)
# Iterate over every object in the collection and print its properties
for item in collection_details.iterator():
    print(item.properties)

{'plot': 'Tuana is here with not just all the vectors but also all the presents that are supposed to be delivered around the World!', 'decoy': False}
{'plot': "Sebastian is here, but he seems unsure what's going on", 'decoy': True}
{'plot': "JP is here, looks like he's feasting on cookies", 'decoy': True}


In [10]:
from weaviate.classes.query import Filter

response = collection_details.query.fetch_objects(
    filters=Filter.by_property("decoy").equal(False)) # setting decoy = False to avoid the decoys

In [11]:
print(response)

QueryReturn(objects=[Object(uuid=_WeaviateUUIDInt('2713b638-12fd-48ea-99d1-0a852a7cf241'), metadata=MetadataReturn(creation_time=None, last_update_time=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=None, rerank_score=None), properties={'plot': 'Tuana is here with not just all the vectors but also all the presents that are supposed to be delivered around the World!', 'decoy': False}, references=None, vector={}, collection='Santas_Grotto')])


In [12]:
for obj in response.objects:
    print(obj.properties.get('plot'))  # Tuana is the culprit; the loot is all the vectors and all the presents!

Tuana is here with not just all the vectors but also all the presents that are supposed to be delivered around the World!
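
With only three objects in the collection, the server-side filter can be cross-checked on the client. This sketch copies the three property dicts printed by the iterator cell and applies the same `decoy == False` condition in plain Python. It is meant as a sanity check under that assumption, not as a replacement for the Weaviate filter.

```python
# The three objects printed by the iterator cell, copied verbatim
objects = [
    {"plot": "Tuana is here with not just all the vectors but also all the presents that are supposed to be delivered around the World!", "decoy": False},
    {"plot": "Sebastian is here, but he seems unsure what's going on", "decoy": True},
    {"plot": "JP is here, looks like he's feasting on cookies", "decoy": True},
]

# Client-side equivalent of Filter.by_property("decoy").equal(False)
non_decoys = [obj for obj in objects if not obj["decoy"]]

for obj in non_decoys:
    print(obj["plot"])
```

Filtering on the server scales better for large collections, since only the matching objects are sent back.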
