# Initial Setup

## Install Weaviate Python Client v4
> This notebook was created with Weaviate `1.24` and the Weaviate Python Client `4.5`.

Run the command below to install the latest version of the Weaviate Python Client v4.

In [None]:
!pip install -U weaviate-client
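
After installing, it can help to confirm which client version the notebook kernel actually sees, since the cells below rely on the v4 API. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of the Weaviate client.

```python
from importlib import metadata


def installed_version(package: str = "weaviate-client"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("weaviate-client is not installed in this environment")
else:
    print(f"weaviate-client {version}")
```

If the major version printed is lower than 4, the `connect_to_*` helpers used later are likely unavailable.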

## Deploy Weaviate

Weaviate offers three deployment options:
* Embedded
* Self-hosted - with Docker Compose
* Cloud deployment - [Weaviate Cloud Service](https://console.weaviate.cloud/)

# Time to Build

## Connect to Weaviate

* If you are new to OpenAI, register at [https://platform.openai.com](https://platform.openai.com/) and head to [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys) to create your API key.
* If you are new to Cohere, register at [https://cohere.com](https://cohere.com) and head to [https://dashboard.cohere.com/api-keys](https://dashboard.cohere.com/api-keys) to create your API key.
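
The connection cells below read these keys with `os.getenv`, which returns `None` for any variable that is not set. One way to avoid passing empty header values is to collect only the keys that are present. This is a sketch, not the notebook's own approach: `build_headers` and `HEADER_ENV_VARS` are hypothetical names, and the environment variable names are the ones used in the cells below.

```python
import os

# Header name -> environment variable, mirroring the connection cells.
HEADER_ENV_VARS = {
    "X-OpenAI-Api-Key": "OPENAI_API_KEY",
    "X-Cohere-Api-Key": "COHERE_API_KEY",
    "X-AWS-Access-Key": "AWS_ACCESS_KEY",
    "X-AWS-Secret-Key": "AWS_SECRET_KEY",
}


def build_headers(env=None):
    """Return a headers dict containing only the keys that are set."""
    env = os.environ if env is None else env
    headers = {}
    for header, var in HEADER_ENV_VARS.items():
        value = env.get(var)
        if value:
            headers[header] = value
        else:
            # Report the gap early instead of sending an empty header.
            print(f"{var} is not set; skipping {header}")
    return headers


headers = build_headers()
print(f"Prepared {len(headers)} inference header(s)")
```

The resulting dict could then be passed as `headers=headers` to any of the `weaviate.connect_to_*` calls.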

In [None]:
import weaviate, os, json

# Connect with Weaviate Embedded
# client = weaviate.connect_to_embedded(
#     version="1.24.0",
#     headers={
#         "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"), # Replace with your inference API key
#         # "X-Cohere-Api-Key": os.getenv("COHERE_API_KEY"), # Replace with your inference API key
#     })

# Connect to a cloud instance of Weaviate (with WCS)
# client = weaviate.connect_to_wcs(
#     cluster_url=os.getenv("WCS_MM_DEMO_URL"),
#     auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_MM_DEMO_KEY")),
#     headers={
#         "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"), # Replace with your inference API key
#         "X-Cohere-Api-Key": os.getenv("COHERE_API_KEY"), # Replace with your inference API key
#     }
# )

# Connect to the local instance deployed with Docker Compose
client = weaviate.connect_to_local(
    headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"), # Replace with your inference API key
        "X-Cohere-Api-Key": os.getenv("COHERE_API_KEY"), # Replace with your inference API key

        "X-AWS-Access-Key": os.getenv("AWS_ACCESS_KEY"),
        "X-AWS-Secret-Key": os.getenv("AWS_SECRET_KEY"),
    }
)

client.is_ready()

In [None]:
import weaviate, os, json

# Connect to a cloud instance of Weaviate (with WCS)
client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WORKSHOP_DEMO_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WORKSHOP_DEMO_KEY_ADMIN")),

    headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"), # Replace with your inference API key
        "X-Cohere-Api-Key": os.getenv("COHERE_API_KEY"), # Replace with your inference API key
        
        "X-AWS-Access-Key": os.getenv("AWS_ACCESS_KEY"),
        "X-AWS-Secret-Key": os.getenv("AWS_SECRET_KEY"),
    }
)

client.is_ready()

## Create a collection
[Weaviate Docs - collection creation and configuration](https://weaviate.io/developers/weaviate/configuration/schema-configuration)

In [None]:
from weaviate.classes.config import Configure

if client.collections.exists("Jeopardy"):
    client.collections.delete("Jeopardy")

# Create a collection here - with OpenAI as a vectorizer
client.collections.create(
    name="Jeopardy",
    vectorizer_config=Configure.Vectorizer.text2vec_openai()
)

In [10]:
from weaviate.classes.config import Configure

if client.collections.exists("Jeopardy"):
    client.collections.delete("Jeopardy")

# Create a collection here
client.collections.create(
    name="Jeopardy",

    # Option 1 - Use the Cohere embedding model directly
    vectorizer_config=Configure.Vectorizer.text2vec_cohere(),

    # Option 2 - Use Cohere embedding model through AWS Bedrock
    # vectorizer_config=Configure.Vectorizer.text2vec_aws(
    #     model="cohere.embed-english-v3",
    #     region="us-east-1"
    # ),

    # Option 3 - Use Titan Embed model 
    # vectorizer_config=Configure.Vectorizer.text2vec_aws(
    #     model="amazon.titan-embed-text-v1",
    #     # region="eu-central-1",
    #     region="us-east-1",
    # ),
)

<weaviate.collections.collection.Collection at 0x10e53ca90>

## Import data

### Sample Data

In [33]:
import json

# Load the sample file; the context manager closes the file handle afterwards.
with open("./jeopardy_tiny.json") as f:
    data_10 = json.load(f)

print(json.dumps(data_10, indent=2))

[
  {
    "Category": "SCIENCE",
    "Question": "This organ removes excess glucose from the blood & stores it as glycogen",
    "Answer": "Liver"
  },
  {
    "Category": "ANIMALS",
    "Question": "It's the only living mammal in the order Proboseidea",
    "Answer": "Elephant"
  },
  {
    "Category": "ANIMALS",
    "Question": "The gavial looks very much like a crocodile except for this bodily feature",
    "Answer": "the nose or snout"
  },
  {
    "Category": "ANIMALS",
    "Question": "Weighing around a ton, the eland is the largest species of this animal in Africa",
    "Answer": "Antelope"
  },
  {
    "Category": "ANIMALS",
    "Question": "Heaviest of all poisonous snakes is this North American rattlesnake",
    "Answer": "the diamondback rattler"
  },
  {
    "Category": "SCIENCE",
    "Question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification",
    "Answer": "species"
  },
  {
    "Category": "SCIENCE",
   



### Insert Many
[Weaviate Docs - insert many](https://weaviate.io/developers/weaviate/manage-data/import)

In [36]:
# Insert data
jeopardy = client.collections.get("Jeopardy")
jeopardy.data.insert_many(data_10)

BatchObjectReturn(all_responses=[UUID('759970c3-1d71-4116-bc17-cfed11a4b0df'), UUID('88c4baef-fe6a-4351-9432-7d4700b86792'), UUID('4773ca71-9fc3-4d00-8ad1-98d0b1c91524'), UUID('87b308ce-968b-43f3-bbae-b0211ff35c89'), UUID('cb05cd2f-e8af-4aac-8b8a-34bc90e43728'), UUID('0b2ee289-25dd-4a3e-97ac-52b37e0452ad'), UUID('e2f9f12d-4c11-491e-97fe-70c94282569d'), UUID('b237d617-bcb3-4b1b-aa06-cadf2fed3c70'), UUID('cc054de2-26e8-46a9-a873-6a7bbbb02347'), UUID('ace4da94-dbf8-405e-b64f-c37b3cff621d')], elapsed_seconds=0.7886521816253662, errors={}, uuids={0: UUID('759970c3-1d71-4116-bc17-cfed11a4b0df'), 1: UUID('88c4baef-fe6a-4351-9432-7d4700b86792'), 2: UUID('4773ca71-9fc3-4d00-8ad1-98d0b1c91524'), 3: UUID('87b308ce-968b-43f3-bbae-b0211ff35c89'), 4: UUID('cb05cd2f-e8af-4aac-8b8a-34bc90e43728'), 5: UUID('0b2ee289-25dd-4a3e-97ac-52b37e0452ad'), 6: UUID('e2f9f12d-4c11-491e-97fe-70c94282569d'), 7: UUID('b237d617-bcb3-4b1b-aa06-cadf2fed3c70'), 8: UUID('cc054de2-26e8-46a9-a873-6a7bbbb02347'), 9: UUID('ac
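
The returned object is long to read by eye. A small summary helper can make failures easier to spot. The sketch below assumes the result exposes `uuids` and `errors` dictionaries keyed by the object's position in the input list, as the output above suggests; `summarize_insert` is a hypothetical helper, and a `SimpleNamespace` stands in for the real result so the example runs without a Weaviate instance.

```python
from types import SimpleNamespace


def summarize_insert(result, total):
    """Print and return (inserted, failed) counts for an insert_many result."""
    inserted = len(result.uuids)
    failed = len(result.errors)
    for index, error in result.errors.items():
        print(f"object {index} failed: {error}")
    print(f"{inserted}/{total} objects inserted, {failed} failed")
    return inserted, failed


# Stand-in for a real result: two successes and one made-up failure.
demo = SimpleNamespace(
    uuids={0: "uuid-a", 1: "uuid-b"},
    errors={2: "example error"},
)
summarize_insert(demo, total=3)
```

With a live connection, the call would look like `summarize_insert(jeopardy.data.insert_many(data_10), total=len(data_10))`.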

### Data preview

In [37]:
# Show data preview
jeopardy = client.collections.get("Jeopardy")
response = jeopardy.query.fetch_objects(limit=4)

for item in response.objects:
    print(item.uuid, item.properties)

0b2ee289-25dd-4a3e-97ac-52b37e0452ad {'answer': 'species', 'question': "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification", 'category': 'SCIENCE'}
4773ca71-9fc3-4d00-8ad1-98d0b1c91524 {'answer': 'the nose or snout', 'question': 'The gavial looks very much like a crocodile except for this bodily feature', 'category': 'ANIMALS'}
759970c3-1d71-4116-bc17-cfed11a4b0df {'answer': 'Liver', 'question': 'This organ removes excess glucose from the blood & stores it as glycogen', 'category': 'SCIENCE'}
87b308ce-968b-43f3-bbae-b0211ff35c89 {'answer': 'Antelope', 'question': 'Weighing around a ton, the eland is the largest species of this animal in Africa', 'category': 'ANIMALS'}


In [38]:
# Show data preview - with vectors
jeopardy = client.collections.get("Jeopardy")
response = jeopardy.query.fetch_objects(
    limit=4,
    include_vector=True
)

for item in response.objects:
    print(item.properties)
    print(item.vector, '\n')

{'answer': 'species', 'question': "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification", 'category': 'SCIENCE'}
{'default': [-0.004252561368048191, 0.006941039115190506, -0.006847059819847345, -0.01008263137191534, -7.703991286689416e-05, 0.014392251148819923, -0.003017405280843377, -0.016580626368522644, -0.0220582727342844, -0.024837374687194824, -0.009619448333978653, 0.010183323174715042, -0.008914603851735592, -0.003384931478649378, 0.014956126920878887, 0.03445011004805565, 0.02945578284561634, 0.012599932961165905, 0.026757236570119858, -0.009491904638707638, -0.021910591050982475, 0.02365592122077942, -0.0016891092527657747, -0.02270270325243473, 0.010250451974570751, 0.009384499862790108, 0.002941886428743601, -0.012613358907401562, -0.018205124884843826, 0.021897166967391968, -0.005544776096940041, 0.028354883193969727, -0.004853357095271349, -0.02744194306433201, -0.016392666846513748, -0.01960138790309429, 0.0032
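
Printing whole vectors floods the output. As the result above shows, `item.vector` is a dictionary keyed by vector name (`'default'` here), so a short preview is often enough. This is a minimal sketch; `preview_vector` is a hypothetical helper and the numbers in the example are made up.

```python
def preview_vector(vector, n=5):
    """Summarize a vector dict such as {'default': [...]} in one line per name."""
    lines = []
    for name, values in vector.items():
        head = ", ".join(f"{v:.4f}" for v in values[:n])
        lines.append(f"{name}: {len(values)} dims [{head}, ...]")
    return "\n".join(lines)


# Made-up values, only to show the output shape.
print(preview_vector({"default": [0.1, -0.2, 0.3, 0.4, 0.5, 0.6]}, n=3))
# default: 6 dims [0.1000, -0.2000, 0.3000, ...]
```

In the loop above, `print(preview_vector(item.vector))` could replace `print(item.vector, '\n')`.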

### Super quick query example

In [None]:
response = jeopardy.query.near_text(
    # "Zwierzęta afrykańskie", #African animals in Polish
    # "アフリカの動物", #African animals in Japanese
    query="Afrikan animals",
    limit=2
)

for item in response.objects:
    print(item.properties)
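
The commented-out Polish and Japanese queries hint at a useful experiment: running the same search in several phrasings and comparing the results side by side. The sketch below wraps the `near_text` call from the cell above; `compare_queries` is a hypothetical helper, and whether the phrasings return similar objects depends on the embedding model configured for the collection.

```python
def compare_queries(collection, queries, limit=2):
    """Run near_text once per query and return {query: [properties, ...]}."""
    results = {}
    for text in queries:
        response = collection.query.near_text(query=text, limit=limit)
        results[text] = [item.properties for item in response.objects]
    return results


# Example usage, assuming `jeopardy` is the collection from the cells above:
# results = compare_queries(
#     jeopardy,
#     ["African animals", "Zwierzęta afrykańskie", "アフリカの動物"],
# )
# for text, hits in results.items():
#     print(text, "->", [hit["answer"] for hit in hits])
```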

## Create a collection with a Generative module

In [32]:
# new collection with 1k objects and OpenAI vectorizer and generative model

from weaviate.classes.config import Configure, Property, DataType

if client.collections.exists("Questions"):
    client.collections.delete("Questions")

# Create a collection here - with OpenAI as a vectorizer and generative model
client.collections.create(
    name="Questions",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    generative_config=Configure.Generative.openai(model="gpt-4"),

    properties=[  # Define properties (Optional)
        Property(name="question", data_type=DataType.TEXT),
        Property(name="answer", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="round", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="points", data_type=DataType.NUMBER),
        Property(name="airDate", data_type=DataType.DATE),
    ],
)

<weaviate.collections.collection.Collection at 0x10ecbd190>
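
The `airDate` property is declared as `DataType.DATE`, and Weaviate's date type uses RFC 3339 timestamps. If the source file stores plain `YYYY-MM-DD` strings (an assumption; the file's date format is not shown in this notebook), a conversion along these lines may be needed before import. `to_rfc3339` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_rfc3339(date_string: str) -> str:
    """Convert 'YYYY-MM-DD' to an RFC 3339 timestamp at midnight UTC."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d")
    return parsed.replace(tzinfo=timezone.utc).isoformat()


print(to_rfc3339("2004-12-31"))  # 2004-12-31T00:00:00+00:00
```

Midnight UTC is an arbitrary choice here; any fixed time and offset would satisfy the format.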

In [None]:
# from weaviate.classes.config import Property, Configure, DataType

# client.collections.create(
#     name="Jeopardy",

#     vectorizer_config=[
#         Configure.Vectorizer.text2vec_openai(
#             name="question-vector",
#             source_properties=["question"]
#         ),

#         Configure.Vectorizer.text2vec_openai(
#             name="long-vector",
#             source_properties=["question", "answer", "category"]
#         ),
#     ],

#     properties=[  # Define properties (Optional)
#         Property(name="question", data_type=DataType.TEXT),
#         Property(name="answer", data_type=DataType.TEXT),
#         Property(name="category", data_type=DataType.TEXT, skip_vectorization=True),
#         Property(name="round", data_type=DataType.TEXT, skip_vectorization=True),
#         Property(name="points", data_type=DataType.NUMBER),
#         Property(name="airDate", data_type=DataType.DATE),
#     ],

# )

### Import data - 1k objects

In [None]:
import json

# Load the 1k-object file; the context manager closes the file handle afterwards.
with open("./jeopardy_1k.json") as f:
    data_1k = json.load(f)

print(json.dumps(data_1k, indent=2))

In [None]:
# Insert data
questions = client.collections.get("Questions")

with questions.batch.dynamic() as batch:
    for item in data_1k:
        batch.add_object(item)

if len(questions.batch.failed_objects) > 0:
    print("Import complete with errors")
    for err in questions.batch.failed_objects:
        print(err)
else:
    print("Import complete with no errors")

# questions.data.insert_many(data_1k)
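
Once the import finishes, the generative module configured on `Questions` can be tried with a single-prompt query. The sketch below assumes the v4 client's `generate.near_text` call with a `single_prompt` template, in which `{question}` and `{answer}` are filled from each object's properties; `explain_answers` and the prompt wording are hypothetical.

```python
def explain_answers(collection, query, limit=2):
    """Search with near_text and ask the generative model about each hit."""
    response = collection.generate.near_text(
        query=query,
        limit=limit,
        # Plain string, not an f-string: the braces are placeholders that
        # Weaviate replaces with each object's property values.
        single_prompt="Explain in one sentence why the answer to '{question}' is '{answer}'.",
    )
    return [(item.properties, item.generated) for item in response.objects]


# Example usage, assuming `questions` is the collection populated above:
# for properties, generated in explain_answers(questions, "African animals"):
#     print(properties["question"])
#     print(generated, "\n")
```

Each call sends one request per returned object to the generative model, so a small `limit` keeps cost and latency down while experimenting.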