## Dependencies

In [None]:
!pip install weaviate-client

## Connect to Weaviate

In [None]:
import weaviate
from weaviate import WeaviateClient

def connect_to_weaviate(palm_key) -> WeaviateClient:
    # Option 1
    # Connect to your local Weaviate instance deployed with Docker
    client = weaviate.connect_to_local(
        headers={
            "X-PALM-Api-Key": palm_key,
        }
    )

    # Option 2
    # Connect to your Weaviate Cloud Services (WCS) cluster
    # client = weaviate.connect_to_wcs(
    #     cluster_id="WCS-CLUSTER-ID", # Replace with your WCS cluster ID
    #     auth_credentials=weaviate.AuthApiKey(
    #       api_key="WCS-API-KEY" # Replace with your WCS API KEY
    #     ),
    #     headers={
    #       "X-PALM-Api-Key": palm_key
    #     }
    # )

    print(client.is_ready())

    return client

### Expired Google Cloud Token

Google Cloud OAuth 2.0 access tokens have a lifetime of only **one** hour. Once a token expires, you have to obtain a valid one and pass it to Weaviate by re-instantiating the client.

#### Option 1: With Google Cloud CLI

In [None]:
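import subprocess
from typing import Optional

def refresh_token_with_GC_CLI() -> Optional[str]:
    # Ask the gcloud CLI for a fresh access token
    result = subprocess.run(["gcloud", "auth", "print-access-token"], capture_output=True, text=True)
    if result.returncode != 0:
        # gcloud failed (for example, you are not logged in), so no token is available
        print(f"Error refreshing token: {result.stderr}")
        return None
    return result.stdout.strip()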

Then run the cell below periodically to fetch a fresh token and reconnect.

In [None]:
# Run every 60 minutes
token = refresh_token_with_GC_CLI()
client = connect_to_weaviate(token)

#### Option 2: With `google-auth`

See the `google-auth` library documentation for [Python](https://google-auth.readthedocs.io/en/master/index.html) and [Node.js](https://cloud.google.com/nodejs/docs/reference/google-auth-library/latest).

In [None]:
from google.auth.transport.requests import Request
from google.oauth2.service_account import Credentials

def get_credentials() -> Credentials:
    credentials = Credentials.from_service_account_file('path/to/your/service-account.json', scopes=['openid'])
    request = Request()
    credentials.refresh(request)
    return credentials

Then run the cell below periodically:

In [None]:
# Run every 60 minutes
credentials = get_credentials()
client = connect_to_weaviate(credentials.token)
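
Instead of rerunning a cell by hand, either option can be wrapped in a small helper that fetches a new token only once the cached one is close to expiring. The sketch below is an illustration, not part of the Weaviate client: `TokenCache`, its `max_age_seconds` default of 55 minutes, and the injectable `clock` are hypothetical names and values chosen for this example.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches an access token and re-fetches it once it is older than max_age_seconds."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        max_age_seconds: float = 55 * 60,  # assumption: refresh a little before the one-hour expiry
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_token = fetch_token
        self._max_age = max_age_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one if it is missing or too old."""
        now = self._clock()
        if self._token is None or now - self._fetched_at >= self._max_age:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token


# Hypothetical usage with the functions defined in the cells above:
# cache = TokenCache(refresh_token_with_GC_CLI)
# client = connect_to_weaviate(cache.get())
```

Because the token is sent as a header when the client is created, a refreshed token still requires a new call to `connect_to_weaviate`; the cache only decides when a new token is needed.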

## Create a collection
> A collection stores your data and vector embeddings.

In [None]:
# Note: in practice, you shouldn't rerun this cell, as it deletes your data
# in "JeopardyQuestion", and then you need to re-import it again.
import weaviate.classes.config as wc

# Delete the collection if it already exists
if client.collections.exists("JeopardyQuestion"):
    client.collections.delete("JeopardyQuestion")

client.collections.create(
    name="JeopardyQuestion",

    vectorizer_config=wc.Configure.Vectorizer.text2vec_palm( # specify the vectorizer and model type you're using
        project_id="YOUR-GOOGLE-CLOUD-PROJECT-ID", # required. replace with your value: (e.g. "cloud-large-language-models")
        api_endpoint="YOUR-API-ENDPOINT", # optional. defaults to "us-central1-aiplatform.googleapis.com".
        # api_endpoint="generativelanguage.googleapis.com", # This is the endpoint for Google MakerSuite
        model_id="YOUR-GOOGLE-CLOUD-MODEL-ID" # optional. defaults to "textembedding-gecko".
    ),

    properties=[ # defining properties (data schema) is optional
        wc.Property(name="Question", data_type=wc.DataType.TEXT), 
        wc.Property(name="Answer", data_type=wc.DataType.TEXT),
        wc.Property(name="Category", data_type=wc.DataType.TEXT, skip_vectorization=True), 
    ]
)

print("Successfully created collection: JeopardyQuestion.")

## Import the Data

In [None]:
import requests, json
url = 'https://raw.githubusercontent.com/weaviate/weaviate-examples/main/jeopardy_small_dataset/jeopardy_tiny.json'
resp = requests.get(url)
data = json.loads(resp.text)

# Get a collection object for "JeopardyQuestion"
jeopardy = client.collections.get("JeopardyQuestion")

# Insert data objects
response = jeopardy.data.insert_many(data)

# Note: the `data` array contains only 10 objects, so a single insert_many call is fine.
# However, if you have a million objects to insert, you should split them into smaller batches (e.g. 100-1000 per insert).

if response.has_errors:
    print(response.errors)
else:
    print("Insert complete.")
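
For larger datasets, the batching advice in the cell above could be sketched as follows. `chunked` is a hypothetical helper written for this example (it is not part of the Weaviate client), and the batch size of 200 in the usage comment is an arbitrary value within the suggested 100-1000 range.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical usage with `data` and `jeopardy` from the cell above:
# for batch in chunked(data, 200):
#     response = jeopardy.data.insert_many(batch)
#     if response.has_errors:
#         print(response.errors)
```

Checking `has_errors` after each batch keeps a failure confined to the batch that caused it.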

## Hybrid Search

The `alpha` parameter determines the weight given to the sparse and dense search methods. `alpha = 0` is pure sparse (bm25) search, whereas `alpha = 1` is pure dense (vector) search. 

Alpha is an optional parameter. The default is set to `0.75`.
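
To build intuition for how `alpha` shifts the balance, here is a deliberately simplified sketch of a weighted blend. It assumes both scores are already normalized to the range 0 to 1, and `blend_scores` is a hypothetical function for illustration only; Weaviate's own fusion algorithms (ranked fusion and relative score fusion) derive the per-result scores differently, so this is not how the server computes its final ranking.

```python
def blend_scores(bm25_score: float, vector_score: float, alpha: float = 0.75) -> float:
    """Weighted blend: alpha=0 keeps only the bm25 score, alpha=1 keeps only the vector score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1.0 - alpha) * bm25_score + alpha * vector_score


# An object that matches keywords weakly (0.2) but is semantically close (0.9):
for alpha in (0.0, 0.25, 0.75, 1.0):
    print(alpha, round(blend_scores(0.2, 0.9, alpha), 3))
```

With a higher `alpha`, the same object scores higher here because its vector score dominates its weaker keyword score.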

### Hybrid Search only

The query below searches for Jeopardy questions matching "northern beast" and limits the output to three results. Notice that `alpha` is set to `0.8`, which means the vector search results are weighted more heavily than the bm25 results. If you set `alpha = 0.25` instead, you would get different results.

In [None]:
# note, you can reuse the collection object from the previous cell.
# Get a collection object for "JeopardyQuestion"
jeopardy = client.collections.get("JeopardyQuestion")

response = jeopardy.query.hybrid(
    query="northern beast",
    alpha=0.8,
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

### Hybrid Search on a specific property

The `query_properties` parameter lets you list the properties that the bm25 part of the search should run on.

In [None]:
response = jeopardy.query.hybrid(
    query="northern beast",
    query_properties=["question"],
    alpha=0.8,
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

### Hybrid Search with a `where` filter

Find Jeopardy questions matching "northern beast", restricted to objects whose category is Animals.

In [None]:
import weaviate.classes.query as wq # wq is an alias to save us from typing weaviate.classes.query everywhere ;)

response = jeopardy.query.hybrid(
    query="northern beast",
    alpha=0.8,
    filters=wq.Filter.by_property("category").equal("Animals"),
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

### Hybrid Search with a custom vector

You can pass your own vector to the hybrid query by using the `vector` parameter.

In [None]:
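# Truncated for brevity: supply the full vector, with the same dimensionality as the collection's embeddings
vector = [-0.0125526935, -0.021168863, ...]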

response = jeopardy.query.hybrid(
    query="animal",
    vector=vector,
    limit=2
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")