## Early user testing for the Weaviate 'collections' client

Welcome!

We have been working on a new (and hopefully improved) API for our Python client. We are excited for you to try it out and give us feedback.

### Installation

This version of the client lives on a separate branch of the GitHub repository.

To keep it from affecting your current workflow, we recommend creating a **new environment** (venv or Conda/Mamba). If you install it into an existing environment instead, make sure to uninstall it afterwards and reinstall the official release.

In your desired environment, install it with:

```shell
pip install -U "git+https://github.com/weaviate/weaviate-python-client.git@pydantic_experiment#egg=weaviate-client[GRPC]"
```

Use this docker-compose file:

```yaml
---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:preview-error-without-module-c10476a
    restart: on-failure:0
    ports:
     - "8080:8080"
     - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      QUERY_MAXIMUM_RESULTS: 10000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
...
```

Then spin up a container with `docker compose up -d`.
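
Before moving on, it can help to confirm that the container is actually listening. The sketch below uses only the Python standard library and checks the two ports published in the compose file above (8080 for REST, 50051 for gRPC). The helper name `port_is_open` is made up for this example and is not part of the client.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    # 8080 (REST) and 50051 (gRPC) are the ports published in the compose file.
    for port in (8080, 50051):
        status = "open" if port_is_open("localhost", port) else "closed"
        print(f"localhost:{port} is {status}")
```

An open port only shows that something is listening; `client.is_ready()` in the next section remains the real readiness check.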

### Key ideas

We are calling this the *'collections'* client because most data interactions happen at the collection (i.e. Weaviate *'Class'*) level. Start by instantiating the client like this:

In [28]:
from weaviate import Config
import weaviate
import os

client = weaviate.Client(
    "http://localhost:8080",
    additional_config=Config(grpc_port_experimental=50051),  # Must match the gRPC port published in the compose file
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"],  # Replace with yours, or ask me for a key ;)
    },
)

client.is_ready()

True
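
Note that `os.environ["OPENAI_APIKEY"]` raises a bare `KeyError` when the variable is missing. A small wrapper can fail with a clearer message instead. This is only a sketch: `openai_headers` is a hypothetical helper name, and the only things taken from the cell above are the header name and the environment variable name.

```python
import os
from typing import Dict, Mapping


def openai_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the `additional_headers` dict, failing early with a readable message."""
    api_key = env.get("OPENAI_APIKEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OPENAI_APIKEY is not set; export it before creating the client."
        )
    return {"X-OpenAI-Api-Key": api_key}
```

The result can then be passed as `additional_headers=openai_headers()` when creating the client.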

### Help is here!

We've created objects to help with lots of things here.

You'll notice below that class definitions are done through the `CollectionConfig` class, configurations in `Text2VecOpenAIConfig`, and so on.

You can import them individually, like so:

```python
from weaviate.weaviate_classes import CollectionConfig, Vectorizer, VectorDistance
```

I (JP) personally import the set of classes like this:

```python
import weaviate.weaviate_classes as wvc
```

To prep, delete any existing classes with the same name like so:

In [29]:
for collection_name in ["TestArticle", "TestAuthor"]:
    client.collection.delete(collection_name)
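
If you would rather not rely on how `delete` behaves for a collection that does not exist, one option is to check first. The sketch below uses only `client.collection.exists` and `client.collection.delete`, both of which appear in this notebook; `delete_if_present` itself is a hypothetical helper, not part of the client.

```python
from typing import Iterable, List


def delete_if_present(client, names: Iterable[str]) -> List[str]:
    """Delete each named collection that exists; return the names actually deleted."""
    deleted = []
    for name in names:
        # `exists` and `delete` are the same calls used elsewhere in this notebook.
        if client.collection.exists(name):
            client.collection.delete(name)
            deleted.append(name)
    return deleted
```

Calling `delete_if_present(client, ["TestArticle", "TestAuthor"])` would then also tell you which collections were actually removed.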

### Class creation

In [30]:
import weaviate.weaviate_classes as wvc

articles = client.collection.create(
    name="TestArticle",
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
        ),
        # Get the user to create these properties
        wvc.Property(
            name="body",
            data_type=wvc.DataType.TEXT,
        ),
        wvc.Property(
            name="url",
            data_type=wvc.DataType.TEXT,
            tokenization=wvc.Tokenization.FIELD,
            vectorizer_config=wvc.PropertyVectorizerConfig()
        ),
        # =====
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    # vectorizer_config=wvc.VectorizerFactory.text2vec_cohere(
    #     model="embed-multilingual-v2.0"
    # ),
    replication_config=wvc.ConfigFactory.replication(factor=1),
    # Get the user to create this config
    inverted_index_config=wvc.ConfigFactory.inverted_index(
        index_property_length=True
    )
    # =====
)

authors = client.collection.create(
    name="TestAuthor",
    properties=[
        wvc.Property(
            name="name",
            data_type=wvc.DataType.TEXT,
        ),
        # Get the user to create this
        wvc.Property(
            name="birth_year",
            data_type=wvc.DataType.INT,
        ),
        # Get the user to create this
        wvc.ReferenceProperty(name="wroteArticle", target_collection="TestArticle")
    ],
    # Get the user to add this vectorizer
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    # vectorizer_config=wvc.VectorizerFactory.text2vec_contextionary(),
)

print(client.collection.exists("TestAuthor"))
print(client.collection.exists("TestArticle"))

True
True


In your IDE, autocomplete (e.g. IntelliSense) should now work on the `articles` and `authors` objects. The two key namespaces are:

-  `data`: CRUD operations
-  `query`: Search operations (previously done through GraphQL)

You can also get a collection object for an existing collection like this:

In [31]:
articles = client.collection.get("TestArticle")
authors = client.collection.get("TestAuthor")

### CRUD operations

#### Add objects (single)

Adding objects is done with the `insert` method.

The pattern can be very similar to what you've done before. (You can specify a UUID as you have done before!)

In [46]:
# user_test.py
my_first_obj = {
    "title": "Something something dark side",
    "body": "A long long time ago, in a galaxy far, far away...",
    "url": "http://www.starwars.com"
}

article_uuid = articles.data.insert(my_first_obj)
print(article_uuid)

c132fd43-4b5f-44bb-a97c-6fa23a2572a3
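
If you want to specify your own UUIDs, a common pattern is to derive them deterministically from the object's content, so that re-running an import does not create duplicates. The sketch below uses only the standard library; `deterministic_uuid` is a hypothetical helper and is not part of the client. How the resulting value is passed in is not shown here; the batch section has a `uuid=` placeholder on `wvc.DataObject`.

```python
import json
import uuid


def deterministic_uuid(
    properties: dict, namespace: uuid.UUID = uuid.NAMESPACE_URL
) -> uuid.UUID:
    """Derive a stable UUID from an object's properties (same input, same UUID)."""
    # Sort keys so that dict ordering does not change the result.
    canonical = json.dumps(properties, sort_keys=True, default=str)
    return uuid.uuid5(namespace, canonical)


obj = {"title": "Something something dark side", "url": "http://www.starwars.com"}
print(deterministic_uuid(obj))
```

The choice of `uuid.NAMESPACE_URL` is arbitrary for this example; any fixed namespace works as long as you use it consistently.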


In [47]:
# user_test.py
author_uuid = authors.data.insert(
    {
        "name": "G Lucas",
        "birth_year": 1944,
        "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
    }
)

The returned object is a UUID type:

In [33]:
print(type(article_uuid))
print(article_uuid)

<class 'uuid.UUID'>
c132fd43-4b5f-44bb-a97c-6fa23a2572a3


Notice that you didn't have to specify a class name, because you're already working with a collection object:

In [34]:
author_uuid = authors.data.insert(
    {
        "name": "G Lucas",
        "birth_year": 1944,
        "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
    }
)

#### Add objects (batch)

In [48]:
# user_test.py
articles_to_add = [
    wvc.DataObject(
        properties={
            "title": f"The best restaurants of {1980+i}:",
            "body": "1. McDonald's, 2. ...",
        },
    )
    for i in range(5)
]

response = articles.data.insert_many(articles_to_add)
print(response)

_BatchReturn(all_responses=[UUID('dafefb4a-b737-4bdd-aa87-7c6cf13fdd17'), UUID('045c11f8-f4c7-4b9c-b8f1-7755f975c593'), UUID('f6441360-a121-4e4c-92ac-20b137d1ce4e'), UUID('01cf04d4-d066-4871-8a17-204356e5070e'), UUID('0f15e550-cca9-4494-8bca-0f7f0a195e9f')], uuids={0: UUID('dafefb4a-b737-4bdd-aa87-7c6cf13fdd17'), 1: UUID('045c11f8-f4c7-4b9c-b8f1-7755f975c593'), 2: UUID('f6441360-a121-4e4c-92ac-20b137d1ce4e'), 3: UUID('01cf04d4-d066-4871-8a17-204356e5070e'), 4: UUID('0f15e550-cca9-4494-8bca-0f7f0a195e9f')}, errors={}, has_errors=False)


In [49]:
# user_test.py
authors_to_add = [
    wvc.DataObject(
        properties={
            "name": f"Jim {i+1}",
            "birth_year": 1970 + i,
            "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
        },
        # vector=CUSTOM_VECTOR_HERE,
        # uuid=CUSTOM_UUID_HERE
    )
    for i in range(5)
]

authors.data.insert_many(authors_to_add)

_BatchReturn(all_responses=[UUID('8c033488-c6d7-49c2-be1f-995af5bf08ff'), UUID('2ca8fa5c-5bec-4ad3-a591-b2d694d07987'), UUID('e771130e-a461-4124-9727-66c965f7221e'), UUID('6055c796-d187-4b27-baf1-aeee5cf8a4d6'), UUID('c6b776b0-026b-4ea0-a437-e8514475e89e')], uuids={0: UUID('8c033488-c6d7-49c2-be1f-995af5bf08ff'), 1: UUID('2ca8fa5c-5bec-4ad3-a591-b2d694d07987'), 2: UUID('e771130e-a461-4124-9727-66c965f7221e'), 3: UUID('6055c796-d187-4b27-baf1-aeee5cf8a4d6'), 4: UUID('c6b776b0-026b-4ea0-a437-e8514475e89e')}, errors={}, has_errors=False)

If some objects fail validation, the others are still inserted and the failures are reported by index. In the next cell, the third object has a non-string `url`:

In [42]:
articles_to_add = [
    wvc.DataObject(
        properties={
            "title": f"The best restaurants of {1980+i}:",
            "body": "1. McDonald's, 2. ...",
            "url": str(i) if i != 2 else i
        },
    )
    for i in range(5)
]

response = articles.data.insert_many(articles_to_add)
print(response)

_BatchReturn(all_responses=[UUID('ee4fac05-8f6a-4730-8a8e-9afe77442e17'), UUID('5cff2531-06cb-4dcc-99e4-5e2a92555d2c'), Error(message="invalid text property 'url' on class 'TestArticle': not a string, but float64", code=None, original_uuid=None), UUID('0b578371-4b0e-4ac5-8c56-937d01967ade'), UUID('651bfa60-e587-48c3-963d-ef26a6bbe8d8')], uuids={0: UUID('ee4fac05-8f6a-4730-8a8e-9afe77442e17'), 1: UUID('5cff2531-06cb-4dcc-99e4-5e2a92555d2c'), 3: UUID('0b578371-4b0e-4ac5-8c56-937d01967ade'), 4: UUID('651bfa60-e587-48c3-963d-ef26a6bbe8d8')}, errors={2: Error(message="invalid text property 'url' on class 'TestArticle': not a string, but float64", code=None, original_uuid=None)}, has_errors=True)


In [44]:
print(response.uuids)

{0: UUID('ee4fac05-8f6a-4730-8a8e-9afe77442e17'), 1: UUID('5cff2531-06cb-4dcc-99e4-5e2a92555d2c'), 3: UUID('0b578371-4b0e-4ac5-8c56-937d01967ade'), 4: UUID('651bfa60-e587-48c3-963d-ef26a6bbe8d8')}


In [43]:
print(response.errors)

{2: Error(message="invalid text property 'url' on class 'TestArticle': not a string, but float64", code=None, original_uuid=None)}
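
Because `uuids` and `errors` are both keyed by the position of the object in the input list, failed objects can be matched back to their inputs for inspection or a retry. The sketch below relies only on the `uuids` and `errors` attributes and the `message` field visible in the output above; `split_batch` is a hypothetical helper, not part of the client.

```python
from typing import Any, Dict, List, Sequence, Tuple


def split_batch(
    objects: Sequence[Any], response: Any
) -> Tuple[Dict[int, Any], List[Tuple[Any, str]]]:
    """Return (inserted UUIDs by index, [(failed input object, error message), ...])."""
    failed = [
        (objects[index], error.message)
        for index, error in sorted(response.errors.items())
    ]
    return dict(response.uuids), failed
```

With the response above, `split_batch(articles_to_add, response)` would pair the object at index 2 with its "not a string" message.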


In [7]:
authors_to_add = [
    wvc.DataObject(
        properties={
            "name": f"Jim {i+1}",
            "birth_year": 1970 + i,
            "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
        },
    )
    for i in range(10)
]

authors.data.insert_many(authors_to_add)

_BatchReturn(all_responses=[UUID('cdb9c90d-6742-4955-931f-7354b8b79dae'), UUID('1b17f12f-4f57-42d6-9c86-120c3f1e9514'), UUID('163e3d9f-879c-4925-89b2-8f66a57ba440'), UUID('b14ffad5-afdd-4dba-8bf8-2c03a00eff51'), UUID('1a705870-eba4-4a08-90ba-7e00b5bb9d8e'), UUID('c738ff11-da61-4906-b99d-3cf31a476b73'), UUID('88330512-93b4-4e51-b312-db1202d929f9'), UUID('60d8a325-0c89-4612-ae59-d12e231c44eb'), UUID('fe9993ae-9eb0-4b80-989d-bdd371718395'), UUID('a990a2c7-6a42-4ebd-80bd-150c5989ea25')], uuids={0: UUID('cdb9c90d-6742-4955-931f-7354b8b79dae'), 1: UUID('1b17f12f-4f57-42d6-9c86-120c3f1e9514'), 2: UUID('163e3d9f-879c-4925-89b2-8f66a57ba440'), 3: UUID('b14ffad5-afdd-4dba-8bf8-2c03a00eff51'), 4: UUID('1a705870-eba4-4a08-90ba-7e00b5bb9d8e'), 5: UUID('c738ff11-da61-4906-b99d-3cf31a476b73'), 6: UUID('88330512-93b4-4e51-b312-db1202d929f9'), 7: UUID('60d8a325-0c89-4612-ae59-d12e231c44eb'), 8: UUID('fe9993ae-9eb0-4b80-989d-bdd371718395'), 9: UUID('a990a2c7-6a42-4ebd-80bd-150c5989ea25')}, errors={}, ha

### Queries

In [45]:
authors_to_add = [
    wvc.DataObject(
        properties={
            "name": f"Jim {i+1}",
            "birth_year": 1970 + i,
            "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
        },
        # vector=[0.05] * 100,
        # uuid=CUSTOM_UUID_HERE
    )
    for i in range(10)
]

authors.data.insert_many(authors_to_add)

In [51]:
response = articles.query.get(limit=2)

print(response)

_QueryReturn(objects=[_Object(properties={'title': 'The best restaurants of 1983:', 'body': "1. McDonald's, 2. ..."}, metadata=_MetadataReturn(uuid=UUID('01cf04d4-d066-4871-8a17-204356e5070e'), vector=None, creation_time_unix=1694795353140, last_update_time_unix=1694795353140, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None)), _Object(properties={'title': 'The best restaurants of 1981:', 'body': "1. McDonald's, 2. ..."}, metadata=_MetadataReturn(uuid=UUID('045c11f8-f4c7-4b9c-b8f1-7755f975c593'), vector=None, creation_time_unix=1694795353140, last_update_time_unix=1694795353140, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None))])


To limit what comes back, specify the properties and metadata to return:

In [53]:
response = articles.query.get(
    limit=2,
    return_properties=["title"],
    return_metadata=wvc.MetadataQuery(uuid=True)
)

print(response)

_QueryReturn(objects=[_Object(properties={'title': 'The best restaurants of 1983:'}, metadata=_MetadataReturn(uuid=UUID('01cf04d4-d066-4871-8a17-204356e5070e'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False, generative=None)), _Object(properties={'title': 'The best restaurants of 1981:'}, metadata=_MetadataReturn(uuid=UUID('045c11f8-f4c7-4b9c-b8f1-7755f975c593'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False, generative=None))])


In [10]:
response = authors.query.get(
    limit=2,
    return_properties=["name", "birth_year", wvc.LinkTo(link_on="wroteArticle", return_properties="title")],
    return_metadata=wvc.MetadataQuery(uuid=True)
)

print(response)

_QueryReturn(objects=[_Object(properties={'birth_year': 1977.0, 'name': 'Jim 8', 'wroteArticle': <weaviate.collection.classes.internal.ReferenceFactory object at 0x1056e4a30>}, metadata=_MetadataReturn(uuid=UUID('0ede32b6-b5a7-4105-91e9-62814776f465'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False, generative=None)), _Object(properties={'birth_year': 1972.0, 'name': 'Jim 3', 'wroteArticle': <weaviate.collection.classes.internal.ReferenceFactory object at 0x105690b20>}, metadata=_MetadataReturn(uuid=UUID('163e3d9f-879c-4925-89b2-8f66a57ba440'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False, generative=None))])


In [75]:
response = authors.query.get()

for o in response.objects:
    print(o.properties["birth_year"])

1944.0
1971.0
1973.0
1970.0
1944.0
1974.0
1972.0


In [76]:
response = authors.query.get(filters=wvc.Filter(path=["birth_year"]).greater_than_equal(1971))

for o in response.objects:
    print(o.properties["birth_year"])

1974.0
1971.0
1973.0
1972.0
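
As a quick sanity check, the same condition can be applied client-side to the unfiltered values printed earlier. This sketch uses only the birth years from that output. Note that they come back as floats even though the property was declared as `INT`, so the sketch casts them back. The server returns results in a different order, so the comparison is done on sorted values.

```python
# Birth years printed by the unfiltered query above (returned as floats).
all_years = [1944.0, 1971.0, 1973.0, 1970.0, 1944.0, 1974.0, 1972.0]

# Client-side equivalent of `greater_than_equal(1971)`.
expected = sorted(int(year) for year in all_years if year >= 1971)
print(expected)  # [1971, 1972, 1973, 1974]
```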


In [65]:
response.objects

[_Object(properties={'name': 'G Lucas', 'birth_year': 1944.0}, metadata=_MetadataReturn(uuid=UUID('00a8b3c3-ecb4-4eeb-ae75-a7b8557b3668'), vector=None, creation_time_unix=1694795345107, last_update_time_unix=1694795345107, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None)),
 _Object(properties={'birth_year': 1971.0, 'name': 'Jim 2'}, metadata=_MetadataReturn(uuid=UUID('2ca8fa5c-5bec-4ad3-a591-b2d694d07987'), vector=None, creation_time_unix=1694795364869, last_update_time_unix=1694795364869, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None)),
 _Object(properties={'birth_year': 1973.0, 'name': 'Jim 4'}, metadata=_MetadataReturn(uuid=UUID('6055c796-d187-4b27-baf1-aeee5cf8a4d6'), vector=None, creation_time_unix=1694795364870, last_update_time_unix=1694795364870, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None)),
 _Object(properties={'birth_year': 

Or like this:

All queries come in two variants, `flat` and `options`:
- `flat`: parameters are passed individually, as plain keyword arguments.
- `options`: parameters are passed as typed objects (such as `wvc.GetOptions`), both for the query and for what is returned.

What do you prefer?

Also, notice that you get typed objects back!

The returned objects have `properties` and `metadata`. Explore them and see what you find.
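
One way to explore the results is to flatten them into plain dicts, which are easier to print or load into a DataFrame. The sketch below uses only the `objects`, `properties` and `metadata.uuid` attributes visible in the outputs above; `to_rows` is a hypothetical helper, not part of the client.

```python
from typing import Any, Dict, List


def to_rows(response: Any) -> List[Dict[str, Any]]:
    """Flatten a query response into one plain dict per returned object."""
    rows = []
    for obj in response.objects:
        row = dict(obj.properties)  # copy, so the response itself is left untouched
        object_uuid = obj.metadata.uuid
        # Keep None when the UUID was not returned; otherwise store it as a string.
        row["uuid"] = str(object_uuid) if object_uuid is not None else None
        rows.append(row)
    return rows
```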

**Task**: Can you construct a `nearText` query (with `options` and with `flat`)
- for "the dark side"
- with a certainty of 0.75
- and get the distance

Hint:
If you don't see `NearTextOptions` in `weaviate.weaviate_classes`, you can import it from `weaviate.collection.classes.grpc`. It will be made available there later.
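
When comparing the certainty you pass in with the distance you get back, it helps to know how the two relate. The sketch below assumes the collection uses cosine distance, where certainty and distance are related by `certainty = 1 - distance / 2`; for other distance metrics this conversion does not apply. The function names are made up for this example.

```python
def certainty_to_distance(certainty: float) -> float:
    """Cosine distance (0 to 2) for a certainty (0 to 1), assuming certainty = 1 - distance / 2."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    return 2.0 * (1.0 - certainty)


def distance_to_certainty(distance: float) -> float:
    """Certainty (0 to 1) for a cosine distance (0 to 2), under the same assumption."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must be between 0 and 2")
    return 1.0 - distance / 2.0


print(certainty_to_distance(0.75))  # 0.5
```

Under that assumption, a certainty threshold of 0.75 should only return objects with a distance of 0.5 or less.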

In [11]:
response = articles.query.near_text(
    query="The dark side",
    certainty=0.75,
    return_properties=["title", "body"],
    # `distance=True` is assumed to mirror the `distance` field in the returned metadata
    return_metadata=wvc.MetadataQuery(uuid=True, distance=True),
)

print(response)

**Task**: Can you construct a `bm25` query (with `options` and with `flat`)
- with a query for `galaxy`

In [None]:
response.objects[0]

In [None]:
print(response)

In [None]:
response.objects