In [19]:
import weaviate
import os
from weaviate.classes.init import AdditionalConfig, Timeout

In [20]:
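# Close the client left over from an earlier run before reconnecting (assumes `client` already exists).
client.close()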

In [21]:
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

# In newer releases of the v4 client this helper is named connect_to_weaviate_cloud.
client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCD_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCD_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]
    },
    additional_config=AdditionalConfig(timeout=Timeout(init=10, query=45, insert=120))
)
client.is_ready()

True
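
The connection cell reads three environment variables, and a missing one shows up either as a `KeyError` or as a `None` passed into the client. A minimal sketch of a pre-flight check, assuming the same variable names as in the cell above; the helper `missing_env_vars` and the example mapping are hypothetical and not part of the Weaviate client:

```python
import os

# Variable names taken from the connection cell above.
REQUIRED_VARS = ("WCD_URL", "WCD_API_KEY", "OPENAI_APIKEY")

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example with an explicit, made-up mapping so the check does not depend on the real environment.
example_env = {"WCD_URL": "https://example.invalid", "WCD_API_KEY": ""}
print(missing_env_vars(env=example_env))
```

Calling `missing_env_vars()` with no arguments checks the real environment, which is the form that would run right after `load_dotenv`.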

In [22]:
# Get a handle to the existing "Documents" collection (this does not create it).
collection_of_docs = client.collections.get("Documents")

In [23]:
# Count the chunks (objects) stored in the collection.
collection_of_docs.aggregate.over_all()

AggregateReturn(properties={}, total_count=104)

In [24]:
# Semantic search: print the properties of each returned object.
# response.objects holds at most `limit` items.
response = collection_of_docs.query.near_text(
    query="error handling",
    limit=2
)
for item in response.objects:
    print(item.properties)

{'content': 'management systems, includ-\ning NoSQL and relational databases, are not designed for these\ndatasets and workloads. First, vector queries rely on the concept of\nsimilarity which can be vague for different applications, requiring\na different query specification. Second, similarity computation is\nmore expensive than other types of comparisons seen in relational\npredicates, requiring efficient techniques. Third, processing a vector\nquery often requires retrieving full vectors from the collection.', 'source': 'Quickstart Competitive Analysis.pdf'}
{'content': 'Vector Database Management Techniques and Systems\nJianguo Wang\ncsjgwang@purdue.edu\nPurdue University\nWest Lafayette, Indiana, USA\nJames Jie Pan\njamesjpan@tsinghua.edu.cn\nTsinghua University\nBeijing, China\nGuoliang Li\nliguoliang@tsinghua.edu.cn\nTsinghua University\nBeijing, China\nABSTRACT\nFeature vectors are now mission-critical for many applications, in-\ncluding retrieval-based large language models (
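
Raw `properties` dicts are hard to scan because each chunk carries embedded newlines and can run to hundreds of characters. A small sketch of a preview helper, assuming the `content` and `source` property names seen in the output above; `preview` is a hypothetical helper and the sample dict is made up for illustration:

```python
def preview(properties, width=80):
    """Return a one-line 'source: text' preview of a chunk's properties dict."""
    # Collapse newlines and repeated whitespace into single spaces.
    text = " ".join(properties.get("content", "").split())
    if len(text) > width:
        # Truncate so the text part is exactly `width` characters, ellipsis included.
        text = text[: width - 3] + "..."
    return f"{properties.get('source', '?')}: {text}"

# Made-up sample shaped like the objects printed above.
sample = {"content": "Vector Database Management\nTechniques and Systems", "source": "example.pdf"}
print(preview(sample, width=30))
```

In the search loop this would be used as `print(preview(item.properties))` in place of `print(item.properties)`.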

In [26]:
# Generative search with a single prompt: the prompt is filled in and run once per
# retrieved chunk, so loop over response.objects (up to `limit` items) to read each result.
response = collection_of_docs.generate.near_text(
    query="high dimensional vector",
    single_prompt="Explain {content} very directly",
    limit=2
)
for item in response.objects:
    print(item.generated)


 managing and querying large amounts of data efficiently. The Management of Data conference, organized by SIGMOD, focuses on addressing the challenges and opportunities in managing and querying large-scale data, particularly high-dimensional feature vectors.

The conference will take place from June 9–15, 2024, in Santiago, AA, Chile. The event will feature presentations, workshops, and discussions on the latest research and developments in data management. Researchers, practitioners, and industry experts from around the world will come together to share their insights and experiences in handling large datasets.

Topics that will be covered at the conference include but are not limited to:

- Efficient indexing and querying of high-dimensional feature vectors
- Scalable storage and retrieval techniques for large datasets
- Machine learning and deep learning approaches for data management
- Data cleaning, preprocessing, and transformation
- Query optimization and performance tuning
- 

In [None]:
# Generative search with a grouped task: the `limit` retrieved chunks are combined into
# one prompt, so there is a single generation, read from the response itself.
response = collection_of_docs.generate.near_text(
    query="error handling",
    grouped_task="Please summarize this content",
    limit=2
)
print(response.generated)



The content discusses how traditional database management systems, including NoSQL and relational databases, are not designed to handle datasets and workloads that involve vector queries. Vector queries rely on the concept of similarity, which can be vague and require different query specifications. Similarity computation is more expensive than other types of comparisons, and processing a vector query often requires retrieving full vectors from the collection. The abstract of the source paper mentions that feature vectors are crucial for many applications, such as retrieval-based large language models, and traditional database management systems are not equipped to handle them.
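
The difference between the two generative cells above comes down to how many prompts reach the language model: `single_prompt` yields one prompt per retrieved object, with `{content}` replaced by that object's property, while `grouped_task` yields one prompt for the whole result set. The sketch below imitates that behavior in plain Python. Both helper functions are hypothetical, and the way the grouped prompt joins the objects together is an assumption for illustration, not Weaviate's actual prompt construction.

```python
def render_single_prompts(template, objects):
    """One prompt per object: fill {property} placeholders from that object's properties."""
    return [template.format(**props) for props in objects]

def render_grouped_task(task, objects):
    """One prompt in total: the task followed by all retrieved objects (assumed layout)."""
    context = "\n".join(str(props) for props in objects)
    return f"{task}\n{context}"

# Two stand-in chunks, mirroring limit=2 in the cells above.
chunks = [{"content": "chunk A"}, {"content": "chunk B"}]

singles = render_single_prompts("Explain {content} very directly", chunks)
grouped = render_grouped_task("Please summarize this content", chunks)

for prompt in singles:
    print(prompt)
print(grouped)
```

This is why the single-prompt cell loops over `response.objects` and reads `item.generated`, while the grouped-task cell reads one `response.generated`.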
