## Early user testing for the Weaviate 'collections' client

Welcome!

We have been working on a new (and hopefully improved) API for our Python client. We are excited for you to try it out and share your feedback with us.

### Installation

This version of the client lives on a separate branch of the GitHub repository.

If you don't want this to affect your current workflow, we recommend creating a **new environment** (venv, Conda, or Mamba). If you install it into an existing environment instead, make sure to uninstall it afterwards and reinstall the official release.

In your desired environment, install it with:

```shell
pip install -U "git+https://github.com/weaviate/weaviate-python-client.git@pydantic_experiment#egg=weaviate-client[GRPC]"
```

Then start Weaviate with this docker-compose file:

```yaml
---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:preview-error-without-module-c10476a
    restart: on-failure:0
    ports:
     - "8080:8080"
     - "50051:50051"
    environment:
      OPENAI_APIKEY: $OPENAI_APIKEY
      COHERE_APIKEY: $COHERE_APIKEY
      QUERY_DEFAULTS_LIMIT: 25
      QUERY_MAXIMUM_RESULTS: 10000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
...
```
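
Once the container is up, it can help to confirm that Weaviate is reachable before running the cells below. This is a minimal sketch using only the standard library, assuming the server exposes the `/v1/.well-known/ready` readiness endpoint on the port mapped above. The helper names `ready_url` and `is_weaviate_ready` are made up for this example.

```python
import urllib.request


def ready_url(base_url: str) -> str:
    """Build the readiness-probe URL for a Weaviate base URL."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def is_weaviate_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are both OSError subclasses).
        return False


if __name__ == "__main__":
    print("Weaviate ready:", is_weaviate_ready())
```

If this prints `False`, check that the container is running and that port 8080 is not already in use.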

### Key ideas

We are calling this the *'collections'* client because many of the data interactions happen at the collection (i.e. Weaviate *'Class'*) level. First instantiate the client like this; collections are then created from it further down:

In [1]:
from weaviate import Config
import weaviate
import os

client = weaviate.Client(
    "http://localhost:8080",
    additional_config=Config(grpc_port_experimental=50051),
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"],
        "X-Cohere-Api-Key": os.environ["COHERE_APIKEY"],
    },
)
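
Note that `os.environ[...]` raises a `KeyError` when a variable is unset, which is easy to trip over on a fresh machine. Below is a small sketch of building the same `additional_headers` dict defensively. `build_api_headers` is a hypothetical helper, not part of the client; the header and variable names are the ones used above.

```python
import os
from typing import Dict, Mapping

# Header name -> environment variable, as used in the client setup above.
HEADER_TO_ENV_VAR = {
    "X-OpenAI-Api-Key": "OPENAI_APIKEY",
    "X-Cohere-Api-Key": "COHERE_APIKEY",
}


def build_api_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect API-key headers, failing with a clear message if none are set."""
    headers = {
        header: env[var]
        for header, var in HEADER_TO_ENV_VAR.items()
        if env.get(var)  # skip unset or empty variables
    }
    if not headers:
        raise RuntimeError(
            "Set at least one of: " + ", ".join(HEADER_TO_ENV_VAR.values())
        )
    return headers


# Example with an explicit mapping instead of the real environment.
print(build_api_headers({"OPENAI_APIKEY": "sk-example"}))
```

The result could then be passed as `additional_headers=build_api_headers()` when creating the client.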

### Help is here!

We've added helper classes for many common tasks.

You'll notice below that collection definitions go through the `CollectionConfig` class, vectorizer settings through `Text2VecOpenAIConfig`, and so on.

You can import them individually, like so:

```
from weaviate.weaviate_classes import CollectionConfig, Vectorizer, VectorDistance
```

I (JP) personally import the set of classes like this:

```
import weaviate.weaviate_classes as wvc
```

In [4]:
for n in ["TestArticle", "TestAuthor"]:
    if client.schema.exists(n):
        client.schema.delete_class(n)

### Class creation

In [5]:
import weaviate.weaviate_classes as wvc

articles = client.collection.create(
    config=wvc.CollectionConfig(
        name="TestArticle",
        properties=[
            wvc.Property(
                name="title",
                data_type=wvc.DataType.TEXT,
            ),
            wvc.Property(
                name="body",
                data_type=wvc.DataType.TEXT,
            ),
            wvc.Property(
                name="url",
                data_type=wvc.DataType.TEXT,
                tokenization=wvc.Tokenization.FIELD
            ),
        ],
        vectorizer_config=wvc.Text2VecOpenAIConfig(),
        inverted_index_config=wvc.InvertedIndexConfigCreate(
            index_property_length=True
        )
    )
)

authors = client.collection.create(
    config=wvc.CollectionConfig(
        name="TestAuthor",
        properties=[
            wvc.Property(
                name="name",
                data_type=wvc.DataType.TEXT,
            ),
            wvc.Property(
                name="birth_year",
                data_type=wvc.DataType.INT,
            ),
        ],
        vectorizer_config=wvc.Text2VecOpenAIConfig(),
    )
)

In your IDE, you should now see autocomplete suggestions on the `articles` / `authors` objects. The two key namespaces are:

- `data`: CRUD operations
- `query`: Search operations (previously done through GraphQL)

### CRUD operations

#### Add objects (single)

Adding objects is done with the `insert` method.

The pattern is very similar to what you've done before, and you can still specify a UUID yourself.

In [6]:
uuid = articles.data.insert(
    {
        "title": "Something something dark side",
        "body": "A long long time ago, in a galaxy far, far away...",
        "url": "http://www.starwars.com"
    }
)

The returned object is a `UUID`:

In [5]:
print(type(uuid))
print(uuid)

<class 'uuid.UUID'>
b3c952d3-da97-4007-9760-87515087ab77
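
Because the return value is a standard-library `uuid.UUID` rather than a string, the usual conversions apply. Here is a short illustration using the ID printed above (your own IDs will differ):

```python
import uuid

# The ID printed above, parsed back from its string form.
object_id = uuid.UUID("b3c952d3-da97-4007-9760-87515087ab77")

as_text = str(object_id)         # canonical hyphenated string
round_trip = uuid.UUID(as_text)  # parsing the string gives an equal UUID

print(as_text)
print(round_trip == object_id)
print(object_id.version)
```

Converting with `str(...)` is handy when the ID needs to go into logs, JSON, or another system that expects text.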


Notice that you didn't have to specify a class name, because you're already working with a collection object.

#### Deleting objects

In [6]:
articles.data.delete(uuid)

True

In [7]:
articles.data.delete(uuid)

False

#### Add objects - with generics!

But you can also add objects using a data model - so that you get hints & validation.

In [8]:
from typing import TypedDict

class Article(TypedDict):
    title: str
    body: str
    url: str

uuid = articles.data.with_data_model(Article).insert(
    properties=Article(
        title='',
        body="",
        url=""
    ),
)
print(uuid)

04d0e5cc-65af-47a2-9e95-cfcae2f9701b
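
If "generics" is unfamiliar: the idea is that the data model you pass in becomes a type parameter, so your editor knows what `insert` expects. The toy classes below (`ToyData`, `ToyTypedData`) are invented for this sketch and store objects in memory. They only illustrate the typing pattern and are not how the client is implemented.

```python
import uuid
from typing import Dict, Generic, Type, TypeVar, TypedDict

T = TypeVar("T")


class Article(TypedDict):
    title: str
    body: str
    url: str


class ToyTypedData(Generic[T]):
    """Hypothetical stand-in: an in-memory store whose objects have type T."""

    def __init__(self, model: Type[T]) -> None:
        self.model = model
        self.objects: Dict[uuid.UUID, T] = {}

    def insert(self, properties: T) -> uuid.UUID:
        # A type checker flags calls whose argument is not a T.
        object_id = uuid.uuid4()
        self.objects[object_id] = properties
        return object_id


class ToyData:
    """Hypothetical untyped entry point, mirroring the shape of `.data`."""

    def with_data_model(self, model: Type[T]) -> ToyTypedData[T]:
        return ToyTypedData(model)


typed = ToyData().with_data_model(Article)
new_id = typed.insert(Article(title="t", body="b", url="u"))
print(typed.objects[new_id]["title"])
```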


Hint: Try supplying data with an incorrect datatype and see what happens!

In [9]:
from typing import TypedDict

class Article(TypedDict):
    title: str
    body: str
    url: str

uuid = articles.data.with_data_model(Article).insert(
    properties=Article(
        "Something something dark side",
        "A long long time ago, in a galaxy far, far away...",
        "http://www.starwars.com"
    ),
)
print(uuid)

TypeError: dict expected at most 1 argument, got 3
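
The error above comes from Python itself rather than from Weaviate: calling a `TypedDict` class is the same as calling `dict`, so fields must be passed as keyword arguments. Below is a standalone sketch of that behaviour. Note that a plain `TypedDict` does no runtime type checking, so any validation has to come from a static type checker or from the client.

```python
from typing import TypedDict


class Article(TypedDict):
    title: str
    body: str
    url: str


# Keyword arguments work and produce an ordinary dict.
article = Article(title="A title", body="Some text", url="http://example.com")
print(type(article))

# Positional arguments are forwarded to dict(), which rejects them.
try:
    Article("A title", "Some text", "http://example.com")
except TypeError as error:
    print("TypeError:", error)

# No runtime validation: a wrong value type is only caught by a static
# type checker (e.g. mypy), not when the dict is built.
not_validated = Article(title=123, body="Some text", url="http://example.com")
print(not_validated["title"])
```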

#### Add objects (batch)

In [10]:
authors_to_add = [
    wvc.DataObject(
        properties={
            "name": f"Jim {i+1}",
            "birth_year": 1970 + i
        }
    )
    for i in range(10)
]

authors.data.insert_many(authors_to_add)

_BatchReturn(all_responses=[UUID('5b241bdf-a91d-4bf0-af55-1173412442ea'), UUID('e81d6b6c-ef55-43ac-b2a7-121e3db310f2'), UUID('be9df2d7-2185-4e55-90aa-6d61ac32d20a'), UUID('c101c7f6-f3f4-4a10-aebf-812efa4c732b'), UUID('233f2c21-1041-4dc3-a78f-59498f07e71d'), UUID('3630a59e-e125-4867-8aa9-9e7324a49e5c'), UUID('1202b7b3-a540-4752-9739-8497e2f19353'), UUID('de950aa3-7b34-45df-87ff-c88be38a32f9'), UUID('f1d6fbb0-8fa3-4f9b-b990-555a8e1ccfef'), UUID('9856a6e8-765e-467d-89d9-fd240fa8b06c')], uuids={0: UUID('5b241bdf-a91d-4bf0-af55-1173412442ea'), 1: UUID('e81d6b6c-ef55-43ac-b2a7-121e3db310f2'), 2: UUID('be9df2d7-2185-4e55-90aa-6d61ac32d20a'), 3: UUID('c101c7f6-f3f4-4a10-aebf-812efa4c732b'), 4: UUID('233f2c21-1041-4dc3-a78f-59498f07e71d'), 5: UUID('3630a59e-e125-4867-8aa9-9e7324a49e5c'), 6: UUID('1202b7b3-a540-4752-9739-8497e2f19353'), 7: UUID('de950aa3-7b34-45df-87ff-c88be38a32f9'), 8: UUID('f1d6fbb0-8fa3-4f9b-b990-555a8e1ccfef'), 9: UUID('9856a6e8-765e-467d-89d9-fd240fa8b06c')}, errors={}, ha

### Queries

Get objects like this:

In [11]:
response = articles.query.get_flat(
    limit=2,
    # return_properties=["title", "body"]
)

print(response)

[_Object(properties={'title': '', 'body': '', 'url': ''}, metadata=_MetadataReturn(uuid=UUID('04d0e5cc-65af-47a2-9e95-cfcae2f9701b'), vector=[-0.0006792086642235518, 0.00585294421762228, -0.002304322086274624, -0.026847125962376595, 0.006945088040083647, 0.02081497572362423, -0.02084202691912651, -0.007790400646626949, -0.015486125834286213, -0.018623925745487213, 0.008642475120723248, 0.02220805175602436, -0.02393925189971924, -0.0006677969358861446, 7.74517611716874e-05, 0.006346606649458408, 0.013396513648331165, -0.005538487806916237, 0.019462475553154945, -0.019949376583099365, -0.003012693952769041, 0.005978050176054239, -0.015986550599336624, 0.002213028259575367, -0.009670375846326351, -0.004720225464552641, 0.03362315148115158, -0.0376536026597023, 0.00772277545183897, 0.01186818815767765, 0.010495400987565517, -0.009940875694155693, -0.031161602586507797, -0.018880901858210564, 0.01136776339262724, 0.012192788533866405, -0.01503980066627264, -0.04355050250887871, 0.0171902757

Or like this:

In [23]:
response = articles.query.get_options(
    options=wvc.GetOptions(limit=1),
    # returns=["title", "body", "$uuid"]
    returns=wvc.ReturnValues(
        properties=["title", "body", "$uuid"],
        metadata=wvc.MetadataQuery(
            uuid=True,
        )
    ),
)

print(response)

[_Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...'}, metadata=_MetadataReturn(uuid=UUID('4e6a5fb0-95d7-4873-b23f-eb73dbcb0048'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False))]


All queries come in a `_flat` and an `_options` variant (for example `get_flat` and `get_options`).
- `flat` - Parameters are passed individually, as separate keyword arguments.
- `options` - Parameters are passed as typed objects, as you see above with `wvc.GetOptions`, and likewise for the return values with `wvc.ReturnValues`.
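
To make the comparison concrete, here is a toy illustration of the two calling styles outside of Weaviate. `ToyGetOptions`, `toy_get_flat`, and `toy_get_options` are invented for this sketch and only mimic the shape of the API; they are not the client's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

ROWS = [{"title": "a"}, {"title": "b"}, {"title": "c"}]


@dataclass
class ToyGetOptions:
    """Hypothetical typed container for query parameters."""
    limit: Optional[int] = None
    offset: int = 0


def toy_get_options(options: ToyGetOptions) -> List[dict]:
    """'options' style: one typed object carries all parameters."""
    end = None if options.limit is None else options.offset + options.limit
    return ROWS[options.offset:end]


def toy_get_flat(limit: Optional[int] = None, offset: int = 0) -> List[dict]:
    """'flat' style: each parameter is its own keyword argument."""
    return toy_get_options(ToyGetOptions(limit=limit, offset=offset))


# Both calls describe the same query.
print(toy_get_flat(limit=2))
print(toy_get_options(ToyGetOptions(limit=2)))
```

The flat style is shorter for simple calls, while an options object can be built once and reused across queries.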

What do you prefer?

Also, notice that you get typed objects back!

The returned objects have `properties` and `metadata`. Explore them and see what you find.

**Task**: Construct a `nearText` query (once with `options` and once with `flat`):
- for "the dark side"
- with a certainty of 0.75
- and get the distance

Hint:
If you don't see `NearTextOptions` in `weaviate.weaviate_classes`, you can import it from `weaviate.collection.classes.grpc` for now. It will be available from `weaviate.weaviate_classes` later.

In [28]:
from weaviate.collection.classes.grpc import NearTextOptions

response = articles.query.near_text_options(
    query="the dark side",
    options=NearTextOptions(certainty=0.75),
    returns=wvc.ReturnValues(
        properties=["title", "body"],
        metadata=wvc.MetadataQuery(
            uuid=True
        )
    ),
)

print(response)

[_Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...'}, metadata=_MetadataReturn(uuid=UUID('4e6a5fb0-95d7-4873-b23f-eb73dbcb0048'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False))]


In [32]:
response = articles.query.near_text_flat(
    query="The dark side",
    certainty=0.75,
    return_properties=["title", "body"],
    return_metadata=wvc.MetadataQuery(uuid=True)
)

print(response)

[_Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...'}, metadata=_MetadataReturn(uuid=UUID('4e6a5fb0-95d7-4873-b23f-eb73dbcb0048'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False))]
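
The task mixes two related measures. Assuming the collection uses cosine distance (Weaviate's default metric), certainty and distance are related by `certainty = 1 - distance / 2`, so a certainty threshold of 0.75 corresponds to a distance threshold of 0.5. Here is a small sketch with hypothetical helper names:

```python
def certainty_to_distance(certainty: float) -> float:
    """Convert a certainty in [0, 1] to a cosine distance in [0, 2]."""
    return 2.0 * (1.0 - certainty)


def distance_to_certainty(distance: float) -> float:
    """Convert a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1.0 - distance / 2.0


print(certainty_to_distance(0.75))  # 0.5
print(distance_to_certainty(0.5))   # 0.75
```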


**Task**: Construct a `bm25` query (once with `options` and once with `flat`):
- with a query for `galaxy`

In [34]:
response = articles.query.bm25_flat(
    query="galaxy",
    return_properties=["title", "body"],
    return_metadata=wvc.MetadataQuery(uuid=True)
)

print(response)

[_Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...'}, metadata=_MetadataReturn(uuid=UUID('4e6a5fb0-95d7-4873-b23f-eb73dbcb0048'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False))]


In [35]:
articles.query.bm25_options(
    query="galaxy",
    returns=wvc.ReturnValues(
        properties=["title", "body"],
        metadata=wvc.MetadataQuery(uuid=True)
    )
)

[_Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...'}, metadata=_MetadataReturn(uuid=UUID('4e6a5fb0-95d7-4873-b23f-eb73dbcb0048'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False))]

**Task**: Construct a `hybrid` query (once with `options` and once with `flat`):
- with a query for `galaxy`

In [37]:
response = articles.query.hybrid_options(
    query="galaxy",
    options=wvc.HybridOptions(
        alpha=0.25
    ),
    returns=wvc.ReturnValues(
        properties=["title", "body"],
        metadata=wvc.MetadataQuery(
            uuid=True
        )
    ),
)

print(response)

[_Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...'}, metadata=_MetadataReturn(uuid=UUID('4e6a5fb0-95d7-4873-b23f-eb73dbcb0048'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False))]


In [None]:
response = articles.query.hybrid_flat(
    query="galaxy",
    return_properties=["title", "body"],
    return_metadata=wvc.MetadataQuery(uuid=True)
)

print(response)