# **Weaviate**

Weaviate is an open-source vector database that stores, manages, and retrieves data represented as vectors. It supports efficient, scalable similarity search, which makes it a good fit for applications built on machine learning models or any other system that works with vector representations.

# **Top 3 Use Cases for Weaviate:**

**Semantic Search:** Enables context-based search for documents, products, or data, beyond simple keyword matching.
    
**Recommendation Systems:** Delivers personalized suggestions by comparing user preferences and item embeddings.
    
**Generative AI Applications:** Acts as a memory layer for AI chatbots and assistants to store and retrieve embeddings efficiently.
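
To make the idea of similarity search concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the `cosine_similarity` and `top_k` helpers are illustrative assumptions, not part of Weaviate; real embeddings have far more dimensions and Weaviate uses an index instead of a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, items, k=2):
    """Return the k item names whose vectors are most similar to query_vec."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query_vec, items[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings", made up for illustration only.
items = {
    "object detection paper": [0.9, 0.1, 0.0],
    "cooking recipe": [0.0, 0.2, 0.9],
    "image classification notes": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]
print(top_k(query_vec, items))
```

The two vision-related items rank above the recipe because their vectors point in nearly the same direction as the query vector.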

Weaviate Cloud console (used here to get the cluster URL and API key): https://console.weaviate.cloud/

In [1]:
!pip install weaviate-client
!pip install langchain
!pip install openai

In [None]:
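# Fill in your own credentials; do not share a notebook that contains real keys.
OPENAI_API_KEY = ""    # OpenAI API key
WEAVIATE_API_KEY = ""  # API key of the Weaviate cluster
WEAVIATE_CLUSTER = ""  # URL of the Weaviate cluster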
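
As an alternative to pasting keys into the notebook, the values could be read from environment variables. This is only a sketch: the `read_secret` helper is hypothetical, and the environment variable names are assumed to match the Python variable names used here.

```python
import os

def read_secret(name, default=""):
    """Return the environment variable `name`, or `default` if it is unset."""
    return os.environ.get(name, default)

# Assumed variable names; set them in the shell before starting the notebook.
OPENAI_API_KEY = read_secret("OPENAI_API_KEY")
WEAVIATE_API_KEY = read_secret("WEAVIATE_API_KEY")
WEAVIATE_CLUSTER = read_secret("WEAVIATE_CLUSTER")
```

Keeping the keys out of the cells also keeps them out of printed cell outputs that echo configuration objects.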

## Data Reading

In [2]:
# Create the folder that will hold the PDFs to index (-p avoids an error if it already exists)
!mkdir -p data

In [3]:
!pip install unstructured
!pip install "unstructured[pdf]"

In [None]:
from langchain.document_loaders import DirectoryLoader

# Load every PDF under ./data (including subfolders) as LangChain Document objects
loader = DirectoryLoader("./data", glob="**/*.pdf")
data = loader.load()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [None]:
data

[Document(page_content='You Only Look Once (YOLO): Unified, Real-Time Object Detection\n\nPresenter: Shivang Singh\n\nSept 2nd, 2021\n\nCS391R: Robot Learning (Fall 2021)\n\n1\n\nProblem Addressed: Object Detection\n\n❖ Object detection is the problem of both\n\nlocating AND classifying objects\n\n❖ Goal of YOLO algorithm is to do object\n\ndetection both fast AND with high\n\naccuracy\n\n“Deep Learning for Vision Systems” (Elgendy)\n\nCS391R: Robot Learning (Fall 2021)\n\nObject Detection vs Classification\n\n2\n\nImportance of Object Detection for Robotics\n\n❖ Visual modality is very powerful\n\n❖ Humans are able to detect objects and do\n\nVision based vs LIDAR (self driving)\n\nperception using just this modality in real time\n\n(not needing radar)\n\n❖ If we want responsive robot systems that\n\nwork in real time (without specialized\n\nsensors) almost real time vision based object\n\ndetection can help greatly\n\nTesla Investor Day Presentation\n\nCS391R: Robot Learning (Fall 20

## Text Splitting

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split into chunks of at most 1000 characters, with 20 characters of overlap between neighbors
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
docs = text_splitter.split_documents(data)

In [None]:
docs

[Document(page_content='You Only Look Once (YOLO): Unified, Real-Time Object Detection\n\nPresenter: Shivang Singh\n\nSept 2nd, 2021\n\nCS391R: Robot Learning (Fall 2021)\n\n1\n\nProblem Addressed: Object Detection\n\n❖ Object detection is the problem of both\n\nlocating AND classifying objects\n\n❖ Goal of YOLO algorithm is to do object\n\ndetection both fast AND with high\n\naccuracy\n\n“Deep Learning for Vision Systems” (Elgendy)\n\nCS391R: Robot Learning (Fall 2021)\n\nObject Detection vs Classification\n\n2\n\nImportance of Object Detection for Robotics\n\n❖ Visual modality is very powerful\n\n❖ Humans are able to detect objects and do\n\nVision based vs LIDAR (self driving)\n\nperception using just this modality in real time\n\n(not needing radar)\n\n❖ If we want responsive robot systems that\n\nwork in real time (without specialized\n\nsensors) almost real time vision based object\n\ndetection can help greatly\n\nTesla Investor Day Presentation\n\nCS391R: Robot Learning (Fall 20

In [None]:
len(docs)

10
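
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified sliding window. This sketch is an assumption-laden stand-in: `split_with_overlap` is a hypothetical helper that cuts at fixed character positions, whereas `RecursiveCharacterTextSplitter` prefers to cut at separators such as paragraph and line breaks.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
print(chunks)
```

Each chunk starts with the last two characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.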

## Embedding Conversion

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [None]:
embeddings

OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version='', openai_api_base='', openai_api_type='', openai_proxy='', embedding_ctx_length=8191, openai_api_key='<redacted>', openai_organization='', allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False)

## Vector Database Storage

In [None]:
import weaviate
from langchain.vectorstores import Weaviate

# Connect to the Weaviate cluster
auth_config = weaviate.auth.AuthApiKey(api_key=WEAVIATE_API_KEY)
WEAVIATE_URL = WEAVIATE_CLUSTER

client = weaviate.Client(
    url=WEAVIATE_URL,
    # The text2vec-openai module reads the OpenAI key from this request header
    additional_headers={"X-OpenAI-Api-Key": OPENAI_API_KEY},
    auth_client_secret=auth_config,
    startup_period=10,
)

In [None]:
client.is_ready()

True

In [None]:
# Define the input structure.
# Warning: delete_all() removes every class in the cluster, including its data.
client.schema.delete_all()
schema = {
    "classes": [
        {
            "class": "Chatbot",
            "description": "Documents for chatbot",
            "vectorizer": "text2vec-openai",
            "moduleConfig": {"text2vec-openai": {"model": "ada", "type": "text"}},
            "properties": [
                {
                    "dataType": ["text"],
                    "description": "The content of the paragraph",
                    "moduleConfig": {
                        "text2vec-openai": {
                            "skip": False,
                            "vectorizePropertyName": False,
                        }
                    },
                    "name": "content",
                },
            ],
        },
    ]
}

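client.schema.create(schema)
# Wrap the "Chatbot" class as a LangChain vector store: text is stored in "content",
# and the "source" metadata field is returned with each result.
vectorstore = Weaviate(client, "Chatbot", "content", attributes=["source"])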
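
If several classes share this shape, the schema dictionary could be generated by a small function. The sketch below assumes a hypothetical helper named `build_class_schema`; it only builds the dictionary and does not talk to Weaviate.

```python
def build_class_schema(class_name, text_property="content", model="ada"):
    """Build a one-class schema dict in the same shape as the schema cell above."""
    return {
        "classes": [
            {
                "class": class_name,
                "vectorizer": "text2vec-openai",
                "moduleConfig": {"text2vec-openai": {"model": model, "type": "text"}},
                "properties": [
                    {
                        "name": text_property,
                        "dataType": ["text"],
                        "moduleConfig": {
                            "text2vec-openai": {
                                "skip": False,
                                "vectorizePropertyName": False,
                            }
                        },
                    }
                ],
            }
        ]
    }

chatbot_schema = build_class_schema("Chatbot")
print(chatbot_schema["classes"][0]["class"])
```

The result could then be passed to `client.schema.create(...)` in place of the hand-written dictionary.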

In [None]:
# load text into the vectorstore
text_meta_pair = [(doc.page_content, doc.metadata) for doc in docs]
texts, meta = list(zip(*text_meta_pair))
vectorstore.add_texts(texts, meta)

['9c496404-7515-4b3c-8b8e-88e2dc09dfc8',
 '0c6883df-8242-4011-a51d-13efd4d6dd39',
 '3ad7f0e4-d5b9-4ffc-9fb0-1d4a4b31f610',
 '4845d89b-4e65-4ec3-ac46-e182793e9b68',
 '32df9a02-fdf1-4e71-81b5-1103264f7499',
 '6e2be198-6bb7-4374-8bf8-bc3f42aefd17',
 '5074cbc0-870c-416e-a37c-63a8f28a06bd',
 '59fd3d13-29f3-4001-af59-7fc48a4e1bec',
 'cb52baa4-bd78-4491-801e-21b074f6aa8d',
 '631ccd1a-e35d-4eb4-94ae-8beb3c1f99e4']
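
The `zip(*...)` idiom in the cell above transposes a list of pairs into two parallel tuples. A small stand-alone sketch, using made-up chunk texts and hypothetical file paths in place of the real `docs`:

```python
# Stand-in (text, metadata) pairs; the real ones come from `docs`.
pairs = [
    ("first chunk", {"source": "data/a.pdf"}),
    ("second chunk", {"source": "data/b.pdf"}),
]

# zip(*pairs) transposes the list: one tuple of texts, one tuple of metadata dicts.
texts, metadatas = zip(*pairs)
print(texts)
print(metadatas)
```

Note that unpacking `zip(*pairs)` into two names raises a `ValueError` when `pairs` is empty, so it is worth checking that the loader actually returned documents first.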

## Similarity Measurement

In [None]:
query = "what is a yolo?"

# Retrieve the chunks most similar to the query (`k` is the maximum number of results)
docs = vectorstore.similarity_search(query, k=20)

In [None]:
docs

[Document(page_content='You Only Look Once (YOLO): Unified, Real-Time Object Detection\n\nPresenter: Shivang Singh\n\nSept 2nd, 2021\n\nCS391R: Robot Learning (Fall 2021)\n\n1\n\nProblem Addressed: Object Detection\n\n❖ Object detection is the problem of both\n\nlocating AND classifying objects\n\n❖ Goal of YOLO algorithm is to do object\n\ndetection both fast AND with high\n\naccuracy\n\n“Deep Learning for Vision Systems” (Elgendy)\n\nCS391R: Robot Learning (Fall 2021)\n\nObject Detection vs Classification\n\n2\n\nImportance of Object Detection for Robotics\n\n❖ Visual modality is very powerful\n\n❖ Humans are able to detect objects and do\n\nVision based vs LIDAR (self driving)\n\nperception using just this modality in real time\n\n(not needing radar)\n\n❖ If we want responsive robot systems that\n\nwork in real time (without specialized\n\nsensors) almost real time vision based object\n\ndetection can help greatly\n\nTesla Investor Day Presentation\n\nCS391R: Robot Learning (Fall 20

# **Using an LLM**

In [None]:
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

In [None]:
# Define the chain: "stuff" puts all retrieved documents into a single prompt
chain = load_qa_chain(
    OpenAI(openai_api_key=OPENAI_API_KEY, temperature=0),
    chain_type="stuff",
)

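**Setting `temperature=0` makes the model's output as deterministic as possible, which suits question answering over retrieved documents; it does not by itself guarantee a correct answer.**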

In [None]:
# create answer
chain.run(input_documents=docs, question=query)

' YOLO is an algorithm for object detection that is unified, real-time, and has high accuracy. It is presented by Shivang Singh in the CS391R: Robot Learning (Fall 2021) course on Sept 2nd, 2021.'
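
Conceptually, the `"stuff"` chain type places all retrieved chunks into one prompt together with the question. The sketch below illustrates that idea only: `stuff_prompt` is a hypothetical helper, the prompt wording is an assumption and not LangChain's actual template, and the two chunk strings are made up.

```python
def stuff_prompt(chunks, question):
    """Concatenate all chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = stuff_prompt(
    ["YOLO is a real-time object detector.", "It predicts boxes and classes in one pass."],
    "what is a yolo?",
)
print(prompt)
```

Because every chunk goes into a single prompt, this approach only works while the combined text fits within the model's context window.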