# Weaviate Quickstart Tutorial

## Step 0: Install Python client library

In [47]:
#!pip install weaviate-client==3.25.3 --quiet
#!pip install rich --quiet
#!pip install sentence-transformers --quiet  # needed in Step 5 to embed queries

## Step 1: Create a Weaviate database (using the Embedded option)
For this tutorial we'll use Weaviate Embedded as our datastore, but for the actual course we'll switch to a cloud-hosted instance on Weaviate Cloud Services (WCS).

In [46]:
import weaviate
from weaviate.embedded import EmbeddedOptions
from rich import print # nice library for pretty printing

# for an explanation of what's going on here see: https://weaviate.io/developers/weaviate/installation/embedded#starting-embedded-weaviate-under-the-hood
client = weaviate.Client(embedded_options=EmbeddedOptions())

embedded weaviate is already listening on port 8079


## Step 2: Create a Class aka an "Index"

In [9]:
class_name = "Question"

class_obj = {
    "class": class_name,
    "vectorizer": "none",  # If set to "none" you must always provide vectors yourself.
}

client.schema.create_class(class_obj)

Embedded weaviate wasn't listening on port 8079, so starting embedded weaviate again
Started /home/elastic/.cache/weaviate-embedded: process ID 1370687


{"action":"startup","default_vectorizer_module":"none","level":"info","msg":"the default vectorizer modules is set to \"none\", as a result all new schema classes without an explicit vectorizer setting, will use this vectorizer","time":"2023-11-25T00:22:12Z"}
{"action":"startup","auto_schema_enabled":true,"level":"info","msg":"auto schema enabled setting is set to \"true\"","time":"2023-11-25T00:22:12Z"}
{"action":"hnsw_vector_cache_prefill","count":3000,"index_id":"wine_NsJ5EuCxvoPp","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-11-25T00:22:12Z","took":72497}
{"action":"grpc_startup","level":"info","msg":"grpc server listening at [::]:50060","time":"2023-11-25T00:22:12Z"}
{"action":"restapi_management","level":"info","msg":"Serving weaviate at http://127.0.0.1:8079","time":"2023-11-25T00:22:12Z"}
{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"question_htVCwMtsXzQ7","level":"info","limit":1000000000000,"msg":"prefilled vector cache","t
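The class above declares no properties, so it relies on auto schema (shown as enabled in the startup log) to infer property types at import time. A minimal sketch of declaring them explicitly instead, assuming the three fields imported later in this tutorial are all plain text. The name `class_obj_explicit` is hypothetical; the dictionary would be passed to `client.schema.create_class` in place of `class_obj`.

```python
# Hypothetical explicit schema for the "Question" class.
# Assumption: "question", "answer" and "category" are all plain-text fields.
class_obj_explicit = {
    "class": "Question",
    "vectorizer": "none",  # vectors are still supplied by us at import time
    "properties": [
        {"name": "question", "dataType": ["text"]},
        {"name": "answer", "dataType": ["text"]},
        {"name": "category", "dataType": ["text"]},
    ],
}

# Quick sanity check of what would be created.
property_names = [p["name"] for p in class_obj_explicit["properties"]]
print(property_names)
```

Declaring properties up front avoids surprises when auto schema guesses a type from the first object it sees.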

## Step 3: Import Data from GitHub

Import a very tiny dataset (n=10) of Jeopardy questions and their associated embeddings.

In [34]:
import requests
import json

fname = "jeopardy_tiny_with_vectors.json"  # This file includes pre-generated vectors
url = f'https://raw.githubusercontent.com/americanthinker/vector_search_applications_PUBLIC/master/{fname}'
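resp = requests.get(url, timeout=30)
resp.raise_for_status()  # Fail early on a bad HTTP status instead of parsing an error page
data = json.loads(resp.text)  # Parse the JSON payload into a list of records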
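Before indexing, it can help to confirm that every record carries the fields the import step reads (`Question`, `Answer`, `Category`, `vector`). A minimal sketch of such a check; the helper `find_invalid_records` and the sample records are hypothetical and not part of the dataset or the Weaviate client.

```python
# Keys that the import loop in Step 4 reads from each record.
REQUIRED_KEYS = {"Question", "Answer", "Category", "vector"}


def find_invalid_records(records):
    """Return (index, missing_keys) for each record lacking a required key."""
    problems = []
    for i, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append((i, sorted(missing)))
    return problems


# Made-up sample records for illustration, not taken from the real dataset.
sample = [
    {"Question": "q1", "Answer": "a1", "Category": "c1", "vector": [0.1, 0.2]},
    {"Question": "q2", "Answer": "a2"},
]
print(find_invalid_records(sample))
```

On the real data, `find_invalid_records(data)` returning an empty list would mean every record has the expected fields.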

## Step 4: Index Data on Weaviate DB
Use the client's batch interface to quickly index the data.

In [35]:
client.batch.configure(batch_size=100)  # Send objects in batches of up to 100
with client.batch as batch:  # Remaining objects are flushed when the block exits
    for i, d in enumerate(data, 1):  # Batch import all Questions
        print(f"importing question: {i}")
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        batch.add_data_object(
            data_object=properties,
            class_name="Question",
            vector=d["vector"]  # Add custom vector
        )

importing question: 1
importing question: 2
importing question: 3
importing question: 4
importing question: 5
importing question: 6
importing question: 7
importing question: 8
importing question: 9
importing question: 10
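Conceptually, batching groups objects into chunks of at most `batch_size` and sends each chunk in one request, with any remainder sent at the end. The sketch below illustrates only that grouping idea in plain Python; it is not the client's actual implementation, and `chunked` is a hypothetical helper.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Ten stand-in objects split with a small batch size to make the grouping visible.
batches = list(chunked(list(range(10)), batch_size=4))
print([len(b) for b in batches])
```

With `batch_size=100` and only 10 objects, as in this tutorial, everything fits in a single batch.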


### View data on Weaviate host
You should see an `objectCount` of 10 and a `vectorIndexingStatus` of `READY`.

In [42]:
client.cluster.get_nodes_status()[0]['shards']

[{'class': 'Question',
  'name': 'htVCwMtsXzQ7',
  'objectCount': 10,
  'vectorIndexingStatus': 'READY',
  'vectorQueueLength': 0}]
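The same check can be done programmatically rather than by eye. A minimal sketch, assuming the shard list has the shape shown in the output above; `summarize_shards` is a hypothetical helper, and the `shards` literal simply copies that output.

```python
def summarize_shards(shards):
    """Return (total object count, whether every shard reports READY)."""
    total = sum(s["objectCount"] for s in shards)
    all_ready = all(s["vectorIndexingStatus"] == "READY" for s in shards)
    return total, all_ready


# Shard list copied from the output above.
shards = [{'class': 'Question',
           'name': 'htVCwMtsXzQ7',
           'objectCount': 10,
           'vectorIndexingStatus': 'READY',
           'vectorQueueLength': 0}]

total, all_ready = summarize_shards(shards)
print(total, all_ready)
```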

## Step 5: Queries

### Download Embedding Model to embed queries

In [43]:
from sentence_transformers import SentenceTransformer
model_path = 'sentence-transformers/all-MiniLM-L6-v2'
model = SentenceTransformer(model_path)

### Execute Vector/Semantic search

In [62]:
# Create the text query
query = 'biological'
# Embed the query as a vector for comparison with the vectors in the datastore
query_vector = model.encode(query)

response = (
    client.query
    .get(class_name, ["question", "answer", "category"])  # Query the class created earlier and choose the properties to return
    .with_near_vector({"vector": query_vector})  # Pass in the query vector
    .with_limit(3)  # Limit the number of results to 3
    .do()  # Execute the query
)

In [63]:
print(response)
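To build intuition for what a near-vector query computes, here is a brute-force sketch that ranks stored vectors by cosine similarity to a query vector. It assumes cosine as the similarity measure and uses made-up two-dimensional vectors; Weaviate itself answers the query with an approximate HNSW index rather than an exhaustive scan, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Return the names of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in stored.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy vectors for illustration only; real embeddings have hundreds of dimensions.
toy_vectors = {"doc_a": [1.0, 0.0], "doc_b": [0.7, 0.7], "doc_c": [0.0, 1.0]}
print(top_k([0.9, 0.1], toy_vectors, k=2))
```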

### Execute BM25 search

In [72]:
response = (
    client.query
    .get(class_name, ["question", "answer", "category"])
    .with_bm25(
      query="glycogen"
    )
    .with_limit(3)
    .do()
)

In [75]:
print(response)
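Unlike vector search, BM25 is a keyword method: a document scores above zero only if it contains a query term, with rarer terms and shorter documents weighted higher. The following is a simplified sketch of the classic Okapi BM25 formula over whitespace tokens, assuming the commonly used defaults `k1=1.2` and `b=0.75`. It is not Weaviate's implementation, and the three sample documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a basic Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)  # documents containing the term
            if df == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = counts[term]  # term frequency in this document
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Made-up documents for illustration only.
docs = [
    "glycogen is stored in the liver",
    "the liver filters blood",
    "dna carries genetic information",
]
scores = bm25_scores("glycogen", docs)
print(scores)
```

Only the first document mentions the query term, so it is the only one with a non-zero score.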

## Conclusion
This was a quick overview of Weaviate's capabilities. The purpose of this tutorial is simply to familiarize you with the Weaviate API if you've never used it before or haven't touched it in a while, in preparation for the [Vector Search Applications with LLMs course](https://uplimit.com/course/vector-search-apps).