# Early user testing for the Weaviate 'collections' client

Welcome!

We have been working on a new (and hopefully improved) API for our Python client. We are excited for you to try it out and share your feedback.

## Feedback

If you have feedback about any part of this, please let us know. No answers or impressions are wrong.

Also, you will see sections marked **Please provide feedback from this section**. These are areas in which we are particularly interested.

## Installation

We recommend you create a **new environment** (whether venv, or Conda/Mamba) for this, in a new project directory.

#### How to create a virtual environment with `venv`

Go to your project directory and run:

```shell
python -m venv .venv
```

(Depending on your setup, `python` might need to be `python3`)

Then activate it with (on macOS/Linux; on Windows, run `.venv\Scripts\activate` instead):

```shell
source .venv/bin/activate
```
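If you are unsure whether the environment is active, one quick check from Python is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch: `in_virtual_env` is only an illustrative helper name, and it detects `venv`/`virtualenv` environments, not Conda/Mamba ones.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the interpreter is running inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print(in_virtual_env())
```

If this prints `False` right after activation, the shell is probably still using a different `python` executable.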

### Client installation

Once you are in your desired environment:

```shell
pip install -U "git+https://github.com/weaviate/weaviate-python-client.git@pydantic_experiment#egg=weaviate-client[GRPC]"
```

#### Start Weaviate

You will also need a pre-release version of Weaviate.

Save the below to `docker-compose.yml` in the project directory. Then, go to the directory and run `docker compose up -d`.

```yaml
---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:preview-automatically-return-all-props-metadata-for-refs-07fce6f
    restart: on-failure:0
    ports:
     - "8080:8080"
     - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      QUERY_MAXIMUM_RESULTS: 10000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-contextionary,text2vec-openai,text2vec-cohere,text2vec-huggingface,text2vec-palm,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
      CONTEXTIONARY_URL: contextionary:9999
      AUTOSCHEMA_ENABLED: 'false'
  contextionary:
    environment:
      OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
      EXTENSIONS_STORAGE_MODE: weaviate
      EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080
      NEIGHBOR_OCCURRENCE_IGNORE_PERCENTILE: 5
      ENABLE_COMPOUND_SPLITTING: 'false'
    image: semitechnologies/contextionary:en0.16.0-v1.2.1
    ports:
    - 9999:9999
...
```

You can check that the containers are up and available by running `docker ps` from the shell. This should show two Docker containers running - like:

```shell
CONTAINER ID   IMAGE                                                                                        COMMAND                  CREATED          STATUS         PORTS                                              NAMES
4be7efdff229   semitechnologies/contextionary:en0.16.0-v1.2.1                                               "/contextionary-serv…"   10 seconds ago   Up 9 seconds   0.0.0.0:9999->9999/tcp                             try_new_wv_api_202306-contextionary-1
e90a63fe1e15   semitechnologies/weaviate:preview-automatically-return-all-props-metadata-for-refs-07fce6f   "/bin/weaviate --hos…"   10 seconds ago   Up 9 seconds   0.0.0.0:8080->8080/tcp, 0.0.0.0:50055->50051/tcp   try_new_wv_api_202306-weaviate-1
```

The `STATUS` should include `Up` and closely match the `CREATED` time.

#### Troubleshooting

If your gRPC port is not open, try remapping it to another port (it can be anything). For example, you can change it from 50051 to 50055 by editing:

```yaml
    ports:
     - "8080:8080"
     - "50051:50051"
```

to:

```yaml
    ports:
     - "8080:8080"
     - "50055:50051"
```

Note that this will require you to edit the gRPC port in the client for `grpc_port_experimental` below.
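To see whether the ports are reachable before involving the client, you could try a plain TCP connection from Python. The sketch below uses only the standard library. `port_is_open` is an illustrative helper name, and the port numbers assume the default mapping in the `docker-compose.yml` above (use `50055` if you remapped the gRPC port).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed defaults from the compose file: REST on 8080, gRPC on 50051.
    for name, port in [("REST", 8080), ("gRPC", 50051)]:
        print(f"{name} port {port} open: {port_is_open('localhost', port)}")
```

Note that this only confirms that something is listening on the port, not that it is a working Weaviate endpoint.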

## Try out the client

We're good to go! Fire up your preferred way to edit / run Python code (Jupyter, VSCode, PyCharm, vim, whatever) and follow along.

### Key ideas

We are calling this the `collections` client, because many of the data interactions will be at the collection level (a collection is currently called a `Class` in Weaviate).

From here on, we will refer to Weaviate classes as "collections".

This client also includes custom Python classes to provide assistance for building collection definitions, objects, performing searches, and so on.

You can import them individually, like so:

```python
from weaviate.weaviate_classes import Property, ConfigFactory, VectorizerFactory, DataObject
```

Or you can import the whole set of classes like this:

```python
import weaviate.weaviate_classes as wvc
```

This will let you use them as required like `wvc.Property(...)` and so on.

### Instantiation

Run the below to connect to Weaviate. Note that you need to specify a gRPC port (`grpc_port_experimental`). The final line should print `True`.

```python
# user_test.py
import weaviate
from weaviate import Config
import weaviate.weaviate_classes as wvc
import os

client = weaviate.Client(
    "http://localhost:8080",
    additional_config=Config(grpc_port_experimental=50051),
    # ⬇️ Optional, if you want to try it with an inference API:
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"],  # Reads your key from the environment
    },
)

print(client.is_ready())
```

```
True
```
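Note that `os.environ["OPENAI_APIKEY"]` raises a `KeyError` when the variable is not set. Since the header is optional, one way around this is to build the headers only when a key is present. This is a minimal sketch; `build_headers` is a hypothetical helper, not part of the client.

```python
import os


def build_headers(env_var: str = "OPENAI_APIKEY") -> dict:
    """Return inference API headers only if the key is set in the environment."""
    api_key = os.environ.get(env_var)  # None when the variable is missing
    return {"X-OpenAI-Api-Key": api_key} if api_key else {}


headers = build_headers()
print(sorted(headers))  # Prints header names only, so the key itself is not shown
```

You could then pass `additional_headers=headers or None` when creating the client, assuming the experimental client accepts `None` here as the existing client does.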


Run the below to delete any existing collections with these names, in case you are re-running the same code:

```python
# user_test.py
for collection_name in ["TestArticle", "TestAuthor"]:
    client.collection.delete(collection_name)
```


### Collection creation

#### Existing API

The existing API uses a raw dictionary / JSON object to create classes, like so:

```python
# Old API example
collection_definition = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "vectorIndexConfig": {
        "distance": "cosine"
    },
    "moduleConfig": {
        "generative-openai": {}
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"]
        },
        {
            "name": "chunk_no",
            "dataType": ["int"]
        },
        {
            "name": "url",
            "dataType": ["text"],
            "tokenization": "field",
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True
                }
            }
        }
    ]
}

client.schema.create_class(collection_definition)
```

With the new API, collection creation looks like this:

```python
# New API example
articles = client.collection.create(
    name="TestArticle",
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
        ),
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    replication_config=wvc.ConfigFactory.replication(factor=1),
)
```


It uses Weaviate-specific classes to define properties (`wvc.Property`), specify vectorizers (`wvc.VectorizerFactory`), set more complex configuration options (`wvc.ConfigFactory`), and so on.

The below is a partially complete collection definition. Please see if you can edit the below based on the comments:

```python
# Edit the following to create collection definitions
articles = client.collection.create(
    name="TestArticle",
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
        ),
        # Try creating a new property 'body', with the text datatype
        # Try creating a new property 'url', with the text datatype and field tokenization
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    replication_config=wvc.ConfigFactory.replication(factor=1),
    # Try adding an inverted index config with property length
)

authors = client.collection.create(
    name="TestAuthor",
    properties=[
        wvc.Property(
            name="name",
            data_type=wvc.DataType.TEXT,
        ),
        # Try creating a new property 'birth_year', with the int datatype
        # Try creating a new cross-reference 'wroteArticle', linking to `TestArticle` collection
    ],
    # Add a Contextionary vectorizer
)
```

Here is one that we created as an example (note that it uses the OpenAI vectorizer for `TestAuthor`). Did you end up with something similar?

```python
# user_test.py
for collection_name in ["TestArticle", "TestAuthor"]:
    client.collection.delete(collection_name)

articles = client.collection.create(
    name="TestArticle",
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
        ),
        wvc.Property(
            name="body",
            data_type=wvc.DataType.TEXT,
        ),
        wvc.Property(
            name="url",
            data_type=wvc.DataType.TEXT,
            tokenization=wvc.Tokenization.FIELD,
            vectorizer_config=wvc.PropertyVectorizerConfig()
        ),
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    replication_config=wvc.ConfigFactory.replication(factor=1),
    inverted_index_config=wvc.ConfigFactory.inverted_index(
        index_property_length=True
    )
)

authors = client.collection.create(
    name="TestAuthor",
    properties=[
        wvc.Property(
            name="name",
            data_type=wvc.DataType.TEXT,
        ),
        wvc.Property(
            name="birth_year",
            data_type=wvc.DataType.INT,
        ),
        wvc.ReferenceProperty(name="wroteArticle", target_collection="TestArticle")
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
)

print(client.collection.exists("TestAuthor"))
print(client.collection.exists("TestArticle"))
```

```
True
True
```


You can also get a collection object for a collection that already exists in Weaviate, like this:

```python
# user_test.py
articles = client.collection.get("TestArticle")
authors = client.collection.get("TestAuthor")
```

#### Collection creation feedback

Please score this on a scale of 1-5, where 1 is bad, 5 is good.

- Was it an improvement (5), similar (3), or worse (1) than the existing API?
    - Score?
    - Any reasons?
- How intuitive was the syntax to use? (Very intuitive: 5, unintuitive: 1)
    - Score?
    - Any reasons?
- Are you happy with this section in general? (Very happy: 5, not happy: 1)
    - Score?
    - Any reasons?
- Any other notes?


### Collection methods

You should now see code autocomplete suggestions for the `articles` / `authors` objects. Two key submodules are:

- `data`: CRUD operations
- `query`: Search operations (previously via GraphQL, now via gRPC)

### CRUD operations

You can add objects with the `insert` method.

Run this to add an object to the `articles` collection:

```python
# user_test.py
my_first_obj = {
    "title": "Something something dark side",
    "body": "A long long time ago, in a galaxy far, far away...",
    "url": "http://www.starwars.com"
}

article_uuid = articles.data.insert(my_first_obj)
print(article_uuid)
```

```
3434c361-bcaf-4e20-abde-ef99682b7794
```


And run this to add an object to the `authors` collection:

```python
# user_test.py
author_uuid = authors.data.insert(
    {
        "name": "G Lucas",
        "birth_year": 1944,
        "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
    }
)

print(author_uuid)
```

```
f3645284-c647-4549-a11d-5ac148c215c3
```



#### Add objects (batch)

You can also add multiple objects at once. The new syntax allows you to pass a list of `wvc.DataObject` objects.

```python
# user_test.py
articles_to_add = [
    wvc.DataObject(
        properties={
            "title": f"The best restaurants of {1980+i}:",
            "body": "1. McDonald's, 2. ...",
            "url": "ss"
        },
    )
    for i in range(5)
]

response = articles.data.insert_many(articles_to_add)
```

The `response` object contains the UUIDs of the created objects, and more.

```python
print(response)
```

```
_BatchReturn(all_responses=[UUID('5d668e0d-cf22-4606-94d6-2a06d0a585d7'), UUID('32a096f3-c0d9-40e5-a9b8-1a35ebbae9dd'), UUID('27b6a284-1042-40a6-ac67-63fd42d86dcb'), UUID('f13ee5bb-2a59-499f-846c-f0bd7d6019a1'), UUID('c3442b58-5ab9-4c5d-925a-c472e631741a')], uuids={0: UUID('5d668e0d-cf22-4606-94d6-2a06d0a585d7'), 1: UUID('32a096f3-c0d9-40e5-a9b8-1a35ebbae9dd'), 2: UUID('27b6a284-1042-40a6-ac67-63fd42d86dcb'), 3: UUID('f13ee5bb-2a59-499f-846c-f0bd7d6019a1'), 4: UUID('c3442b58-5ab9-4c5d-925a-c472e631741a')}, errors={}, has_errors=False)
```


This is how we add some objects to `authors`, each with a cross-reference to an object in `articles`:

```python
# user_test.py
authors_to_add = [
    wvc.DataObject(
        properties={
            "name": f"Jim {i+1}",
            "birth_year": 1970 + i,
            "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
        },
        # vector=CUSTOM_VECTOR_HERE,  # To add custom vectors
        # uuid=CUSTOM_UUID_HERE  # To specify custom UUIDs
    )
    for i in range(5)
]

response = authors.data.insert_many(authors_to_add)
print(response)
```

```
_BatchReturn(all_responses=[UUID('7e4a2713-cd7c-4e36-b696-216e791163d2'), UUID('8e8c21c3-523c-4aae-bca0-00f28736b71f'), UUID('a1c0c5ac-452b-4bba-a10c-64c66d9fafbc'), UUID('29874f36-85ba-42d7-809d-16da05e6eade'), UUID('c4d45299-c8b2-4e33-9f32-23e55c060ab0')], uuids={0: UUID('7e4a2713-cd7c-4e36-b696-216e791163d2'), 1: UUID('8e8c21c3-523c-4aae-bca0-00f28736b71f'), 2: UUID('a1c0c5ac-452b-4bba-a10c-64c66d9fafbc'), 3: UUID('29874f36-85ba-42d7-809d-16da05e6eade'), 4: UUID('c4d45299-c8b2-4e33-9f32-23e55c060ab0')}, errors={}, has_errors=False)
```
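If you want to experiment with the custom `uuid` option mentioned in the comments above, one common approach is to derive a deterministic UUID from the object's content, so that re-running an import produces the same IDs. The sketch below uses only Python's standard `uuid` module. `NAMESPACE` and `deterministic_uuid` are hypothetical names chosen for illustration, and whether the client expects a `UUID` object or its string form is an assumption to verify.

```python
import uuid

# Hypothetical namespace for this tutorial; any fixed UUID would work.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def deterministic_uuid(collection: str, key: str) -> uuid.UUID:
    """Derive a stable version-5 UUID from a collection name and a unique key."""
    return uuid.uuid5(NAMESPACE, f"{collection}:{key}")


# The same inputs always give the same UUID.
print(deterministic_uuid("TestAuthor", "Jim 1"))
print(deterministic_uuid("TestAuthor", "Jim 1") == deterministic_uuid("TestAuthor", "Jim 1"))
```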


#### Errors

The client will now automatically capture errors. Try this example, where the `url` property is erroneously provided a numerical value for one of the inputs:

```python
articles_to_add = [
    wvc.DataObject(
        properties={
            "title": f"The best restaurants of {1980+i}:",
            "body": "1. McDonald's, 2. ...",
            "url": str(i) if i != 2 else i
        },
    )
    for i in range(5)
]

response = articles.data.insert_many(articles_to_add)
```

Inspecting the `errors` attribute allows you to see the index of the object, and the error message!

```python
print(response.errors)
```

```
{2: Error(message="invalid text property 'url' on class 'TestArticle': not a string, but float64", code=None, original_uuid=None)}
```
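Because `errors` is keyed by the index of the input object, you can pair each error with the object that caused it. The sketch below shows the idea without needing a running Weaviate instance. `FakeError` and `FakeBatchReturn` are stand-ins that mirror only the fields visible in the printed output above, and `failed_objects` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID


@dataclass
class FakeError:
    """Stand-in for the client's Error object (fields taken from the printed output)."""
    message: str
    code: Optional[int] = None
    original_uuid: Optional[UUID] = None


@dataclass
class FakeBatchReturn:
    """Stand-in for the batch response; only `errors` and `has_errors` are modelled."""
    errors: dict = field(default_factory=dict)
    has_errors: bool = False


def failed_objects(objects, response):
    """Return (index, input object, error message) for every failed insert."""
    return [(i, objects[i], err.message) for i, err in sorted(response.errors.items())]


# Same shape as the example above: the object at index 2 has a non-string url.
objects = [{"url": str(i) if i != 2 else i} for i in range(5)]
response = FakeBatchReturn(
    errors={
        2: FakeError(
            "invalid text property 'url' on class 'TestArticle': not a string, but float64"
        )
    },
    has_errors=True,
)

if response.has_errors:
    for index, obj, message in failed_objects(objects, response):
        print(index, obj, message)
```

With the real client, the same `failed_objects` logic should apply to the `response` returned by `insert_many`, as long as `errors` keeps the index-to-error shape shown above.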


#### CRUD operations feedback

Please score this on a scale of 1-5, where 1 is bad, 5 is good.

- Was it an improvement (5), similar (3), or worse (1) than the existing API?
    - Score?
    - Any reasons?
- How intuitive was the syntax to use? (Very intuitive: 5, unintuitive: 1)
    - Score?
    - Any reasons?
- Are you happy with this section in general? (Very happy: 5, not happy: 1)
    - Score?
    - Any reasons?
- Any other notes?

### Queries

Now you have data on which you can try out queries!

You can now get objects like this:

```python
response = articles.query.get(limit=2)

print(response)
```

```
_QueryReturn(objects=[_Object(properties={'title': 'The best restaurants of 1982:', 'body': "1. McDonald's, 2. ...", 'url': 'ss'}, metadata=_MetadataReturn(uuid=UUID('27b6a284-1042-40a6-ac67-63fd42d86dcb'), vector=None, creation_time_unix=1694807802129, last_update_time_unix=1694807802129, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None)), _Object(properties={'title': 'The best restaurants of 1981:', 'body': "1. McDonald's, 2. ...", 'url': 'ss'}, metadata=_MetadataReturn(uuid=UUID('32a096f3-c0d9-40e5-a9b8-1a35ebbae9dd'), vector=None, creation_time_unix=1694807802128, last_update_time_unix=1694807802128, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None))])
```
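The `creation_time_unix` and `last_update_time_unix` values in the metadata appear to be Unix timestamps in milliseconds. To read them as dates, you could convert them with the standard library. This is a minimal sketch, and `from_unix_ms` is just an illustrative helper name.

```python
from datetime import datetime, timezone


def from_unix_ms(ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Value copied from the output above.
created = from_unix_ms(1694807802129)
print(created.isoformat())
```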


Notice that you don't have to specify the collection name (it comes from the collection object) or the properties to retrieve.

You can also specify these if you wish.

```python
response = articles.query.get(
    limit=2,
    return_properties=["title"],
    return_metadata=wvc.MetadataQuery(uuid=True)  # MetadataQuery object is used to specify the metadata to be returned
)

print(response)
```

```
_QueryReturn(objects=[_Object(properties={'title': 'The best restaurants of 1982:'}, metadata=_MetadataReturn(uuid=UUID('27b6a284-1042-40a6-ac67-63fd42d86dcb'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False, generative=None)), _Object(properties={'title': 'The best restaurants of 1981:'}, metadata=_MetadataReturn(uuid=UUID('32a096f3-c0d9-40e5-a9b8-1a35ebbae9dd'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False, generative=None))])
```


#### Filters

You can add filters like so, with a `Filter` object:

```python
response = authors.query.get(
    filters=wvc.Filter(path=["birth_year"]).greater_than_equal(1971)  # Filter object is used to specify the filter
)

for o in response.objects:
    print(o.properties["birth_year"])
```

```
1974.0
1971.0
1972.0
1973.0
```


**Suggestion**: Try constructing different filters!

#### Near text search

```python
response = articles.query.near_text(
    query="The dark side",
    distance=0.2,
)

print(response)
```

```
_QueryReturn(objects=[_Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...', 'url': 'http://www.starwars.com'}, metadata=_MetadataReturn(uuid=UUID('3434c361-bcaf-4e20-abde-ef99682b7794'), vector=None, creation_time_unix=1694807800262, last_update_time_unix=1694807800262, distance=0.17336618900299072, certainty=0.9133169054985046, score=0.0, explain_score='', is_consistent=False, generative=None)), _Object(properties={'title': 'The best restaurants of 1980:', 'body': "1. McDonald's, 2. ...", 'url': '0'}, metadata=_MetadataReturn(uuid=UUID('757fe958-11a7-4dfd-b742-b9893eb71f1b'), vector=None, creation_time_unix=1694807803529, last_update_time_unix=1694807803529, distance=0.2807278037071228, certainty=0.8596360683441162, score=0.0, explain_score='', is_consistent=False, generative=None)), _Object(properties={'title': 'The best restaurants of 1984:', 'body': "1. McDonald's, 2. ...", 'url': 'ss'}, metadata=_MetadataReturn(
```

```python
for r in response.objects:
    print(r.metadata.distance)
```

```
0.17336618900299072
0.2807278037071228
0.28196239471435547
0.28233039379119873
0.28464192152023315
0.28512680530548096
0.28548455238342285
0.2856038808822632
0.2864052653312683
0.2873896360397339
```
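The `distance` and `certainty` values in the near text output are consistent with `certainty = 1 - distance / 2`, which is the relationship that holds for cosine distance. The short check below uses values copied from that output. `certainty_from_cosine_distance` is an illustrative helper, not part of the client, and the small tolerance allows for the values having been computed in lower precision.

```python
def certainty_from_cosine_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a certainty in [0, 1]."""
    return 1 - distance / 2


# (distance, certainty) pairs copied from the near_text output.
pairs = [
    (0.17336618900299072, 0.9133169054985046),
    (0.2807278037071228, 0.8596360683441162),
]

for distance, reported in pairs:
    computed = certainty_from_cosine_distance(distance)
    print(abs(computed - reported) < 1e-6)
```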


**Suggestion**: Try different queries, like bm25 or hybrid.

#### Queries feedback

Please score this on a scale of 1-5, where 1 is bad, 5 is good.

- Was it an improvement (5), similar (3), or worse (1) than the existing API?
    - Score?
    - Any reasons?
- How intuitive was the syntax to use? (Very intuitive: 5, unintuitive: 1)
    - Score?
    - Any reasons?
- Are you happy with this section in general? (Very happy: 5, not happy: 1)
    - Score?
    - Any reasons?
- Any other notes?

### Retrieval augmented generation (RAG)

To be confirmed

#### RAG feedback

Please score this on a scale of 1-5, where 1 is bad, 5 is good.

- Was it an improvement (5), similar (3), or worse (1) than the existing API?
    - Score?
    - Any reasons?
- How intuitive was the syntax to use? (Very intuitive: 5, unintuitive: 1)
    - Score?
    - Any reasons?
- Are you happy with this section in general? (Very happy: 5, not happy: 1)
    - Score?
    - Any reasons?
- Any other notes?