In [1]:
# user_test.py
import weaviate
from weaviate import Config
import weaviate.weaviate_classes as wvc
import os

client = weaviate.Client(
    "http://localhost:8080",
    additional_config=Config(grpc_port_experimental=50051),
    # ⬇️ Optional, if you want to try it with an inference API:
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"],  # Replace with your key
    },
)

print(client.is_ready())

True


# Early user testing for the Weaviate 'collections' client

Welcome!

We have been working on a new (and hopefully improved) API for our Python client. We are excited for you to try it out and provide feedback for us.

## Feedback

If you have feedback about any part of this, please let us know. No answers or impressions are wrong.

Also, you will see sections marked **Please provide feedback from this section**. These are areas in which we are particularly interested.

## Installation

We recommend you create a **new environment** (whether venv, or Conda/Mamba) for this, in a new project directory.

#### How to create a virtual environment with `venv`

Go to your project directory and run:

```shell
python -m venv .venv
```

(Depending on your setup, `python` might need to be `python3`.)

Then activate it with:

```shell
source .venv/bin/activate
```

### Client installation

Once you are in your desired environment:

```shell
pip install -U "git+https://github.com/weaviate/weaviate-python-client.git@pydantic_experiment#egg=weaviate-client[GRPC]"
```

#### Start Weaviate

You will also need a pre-release version of Weaviate. You can write your own Docker Compose file using the Weaviate `image` from `https://github.com/weaviate/weaviate-python-client/blob/pydantic_experiment/ci/docker-compose.yml`, or use the example provided below.

Save it as `docker-compose.yml` in the project directory. Then, go to the directory and run `docker compose up -d`.

```yaml
---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:preview-fix-issues-with-references-and-toclass-autodetect-23ab716
    restart: on-failure:0
    ports:
     - "8080:8080"
     - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      QUERY_MAXIMUM_RESULTS: 10000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-contextionary,text2vec-openai,text2vec-cohere,text2vec-huggingface,text2vec-palm,generative-openai,generative-cohere'
      CLUSTER_HOSTNAME: 'node1'
      CONTEXTIONARY_URL: contextionary:9999
      AUTOSCHEMA_ENABLED: 'false'
  contextionary:
    environment:
      OCCURRENCE_WEIGHT_LINEAR_FACTOR: 0.75
      EXTENSIONS_STORAGE_MODE: weaviate
      EXTENSIONS_STORAGE_ORIGIN: http://weaviate:8080
      NEIGHBOR_OCCURRENCE_IGNORE_PERCENTILE: 5
      ENABLE_COMPOUND_SPLITTING: 'false'
    image: semitechnologies/contextionary:en0.16.0-v1.2.1
    ports:
    - 9999:9999
...
```

You can check that the containers are up and available by running `docker ps` from the shell. This should show two running Docker containers, like so (the image tag and port mapping in your output may differ from this sample):

```shell
CONTAINER ID   IMAGE                                                                                        COMMAND                  CREATED          STATUS         PORTS                                              NAMES
4be7efdff229   semitechnologies/contextionary:en0.16.0-v1.2.1                                               "/contextionary-serv…"   10 seconds ago   Up 9 seconds   0.0.0.0:9999->9999/tcp                             try_new_wv_api_202306-contextionary-1
e90a63fe1e15   semitechnologies/weaviate:preview-automatically-return-all-props-metadata-for-refs-07fce6f   "/bin/weaviate --hos…"   10 seconds ago   Up 9 seconds   0.0.0.0:8080->8080/tcp, 0.0.0.0:50055->50051/tcp   try_new_wv_api_202306-weaviate-1
```

The `STATUS` should include `Up` and closely match the `CREATED` time.

#### Troubleshooting

If you get a gRPC-related error, your gRPC port may not be open. Please try remapping it to another port (any free port will do). For example, you can change it from 50051 to 50055 by editing:

```yaml
    ports:
     - "8080:8080"
     - "50051:50051"
```

to:

```yaml
    ports:
     - "8080:8080"
     - "50055:50051"
```

Note that this will require you to edit the gRPC port in the client for `grpc_port_experimental` below.
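
Before remapping, it can help to check whether anything is listening on the ports at all. The following is a minimal sketch using only the Python standard library; the helper name `is_port_open` is our own and is not part of the Weaviate client. A successful connection only shows that something accepts TCP connections on that port, not that it is Weaviate.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# 8080 is the REST port and 50051 the gRPC port from the compose file above.
for port in (8080, 50051):
    print(port, "open" if is_port_open("localhost", port) else "closed")
```

If you remapped the gRPC port (for example to 50055), check that port instead.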

## Try out the client

We're good to go! Fire up your preferred way to edit / run Python code (Jupyter, VSCode, PyCharm, vim, whatever) and follow along.

### Key ideas

We are calling this the `collections` client, because many of the data interactions will be at the collection level (a collection is currently called a `Class` in Weaviate).

From here on, we will call Weaviate `classes` "collections".

This client also includes custom Python classes to provide assistance for building collection definitions, objects, performing searches, and so on.

You can import them individually, like so:

```python
from weaviate.weaviate_classes import Property, ConfigFactory, VectorizerFactory, DataObject
```

Alternatively, you can import the whole set of classes like this:

```python
import weaviate.weaviate_classes as wvc
```

This will let you use them as required like `wvc.Property(...)` and so on.

### Instantiation

Run the connection cell (the `weaviate.Client(...)` code shown at the top of this notebook) to connect to Weaviate. Note that you need to specify a gRPC port. `client.is_ready()` should return `True`.

True


(If you get an error, it may be related to the gRPC port. See the Troubleshooting note above.)

Run the below to delete any existing collections, in case you are re-running the same code:

In [2]:
# user_test.py
for collection_name in ["TestArticle", "TestAuthor"]:
    client.collection.delete(collection_name)


### Collection creation

#### Existing API

The existing API uses a raw dictionary / JSON object to create classes, like so:

```python
# Old API example
collection_definition = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "vectorIndexConfig": {
        "distance": "cosine"
    },
    "moduleConfig": {
        "generative-openai": {}
    },
    "properties": [
        {
            "name": "title",
            "dataType": ["text"]
        },
        {
            "name": "chunk_no",
            "dataType": ["int"]
        },
        {
            "name": "url",
            "dataType": ["text"],
            "tokenization": "field",
            "moduleConfig": {
                "text2vec-openai": {
                    "skip": True
                }
            }
        }
    ]
}

client.schema.create_class(collection_definition)
```

Now, collection creation looks like this:

```python
# New API example
articles = client.collection.create(
    name="TestArticle",
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
        ),
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    replication_config=wvc.ConfigFactory.replication(factor=1),
)
```


It uses Weaviate-specific classes to define properties (`wvc.Property`), specify vectorizers (`wvc.VectorizerFactory`), set more complex configuration options (`wvc.ConfigFactory`), and so on.

Below is a partially complete collection definition. See if you can complete it based on the comments:
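
One practical difference between a raw dictionary and typed classes is when mistakes surface. The sketch below illustrates the idea with standard-library stand-ins; `PropertySketch` and `DataTypeSketch` are hypothetical names made up for this illustration and are not the client's real `wvc.Property` or `wvc.DataType`.

```python
from dataclasses import dataclass
from enum import Enum


class DataTypeSketch(str, Enum):
    """Hypothetical stand-in for `wvc.DataType`; only two members are modelled."""
    TEXT = "text"
    INT = "int"


@dataclass
class PropertySketch:
    """Hypothetical stand-in for `wvc.Property`, not the client's real class."""
    name: str
    data_type: DataTypeSketch


# A raw dictionary accepts a misspelled key without complaint.
raw_property = {"name": "title", "dataTyp": ["text"]}

# A typed class rejects an unknown argument as soon as the object is built.
try:
    PropertySketch(name="title", data_typ=DataTypeSketch.TEXT)
    typo_caught = False
except TypeError as err:
    typo_caught = True
    print(f"Caught early: {err}")

ok_property = PropertySketch(name="title", data_type=DataTypeSketch.TEXT)
print(ok_property)
```

Typed classes also give editors enough information to offer autocompletion, which a plain dictionary cannot.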

In [3]:
# Edit the following to create collection definitions
articles = client.collection.create(
    name="TestArticle",
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
        ),
        # Try creating a new property 'body', with the text datatype
        # Try creating a new property 'url', with the text datatype and field tokenization
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    replication_config=wvc.ConfigFactory.replication(factor=1),
    # Try adding an inverted index config with property length
)

authors = client.collection.create(
    name="TestAuthor",
    properties=[
        wvc.Property(
            name="name",
            data_type=wvc.DataType.TEXT,
        ),
        # Try creating a new property 'birth_year', with the int datatype
        # Try creating a new cross-reference 'wroteArticle', linking to `TestArticle` collection
    ],
    # Add a Contextionary vectorizer
)

Here is one that we created as an example. Did you end up with the same?

In [4]:
# user_test.py
for collection_name in ["TestArticle", "TestAuthor"]:
    client.collection.delete(collection_name)

articles = client.collection.create(
    name="TestArticle",
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
        ),
        wvc.Property(
            name="body",
            data_type=wvc.DataType.TEXT,
        ),
        wvc.Property(
            name="url",
            data_type=wvc.DataType.TEXT,
            tokenization=wvc.Tokenization.FIELD,
        ),
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    replication_config=wvc.ConfigFactory.replication(factor=1),
    inverted_index_config=wvc.ConfigFactory.inverted_index(
        index_property_length=True
    )
)

authors = client.collection.create(
    name="TestAuthor",
    properties=[
        wvc.Property(
            name="name",
            data_type=wvc.DataType.TEXT,
        ),
        wvc.Property(
            name="birth_year",
            data_type=wvc.DataType.INT,
        ),
        wvc.ReferenceProperty(name="wroteArticle", target_collection="TestArticle")
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
)

print(client.collection.exists("TestAuthor"))
print(client.collection.exists("TestArticle"))

True
True


You can also get a collection object for a collection that already exists in Weaviate, like this:

In [5]:
# user_test.py
articles = client.collection.get("TestArticle")
authors = client.collection.get("TestAuthor")

#### Collection creation feedback

Please score this on a scale of 1-5, where 1 is bad, 5 is good.

- Was it an improvement (5), similar (3), or worse (1) than the existing API?
    - Score?
    - Any reasons?
- How intuitive was the syntax to use? (Very intuitive: 5, unintuitive: 1)
    - Score?
    - Any reasons?
- Are you happy with this section in general? (Very happy: 5, not happy: 1)
    - Score?
    - Any reasons?
- Any other notes?


### Collection methods

You should now see code autocomplete suggestions for the `articles` / `authors` objects. Two key submodules are:

-  `data`: CRUD operations
-  `query`: Search operations (previously GraphQL, now gRPC)

### CRUD operations

You can add objects with the `insert` method.

Run this to add an object to the `articles` collection:

In [6]:
# user_test.py
my_first_obj = {
    "title": "Something something dark side",
    "body": "A long long time ago, in a galaxy far, far away...",
    "url": "http://www.starwars.com"
}

article_uuid = articles.data.insert(my_first_obj)
print(article_uuid)

2fba92f8-11d0-475d-9edb-c237c76b22e0


And run this to add an object to the `authors` collection:

In [7]:
# user_test.py
author_uuid = authors.data.insert(
    {
        "name": "G Lucas",
        "birth_year": 1944,
        "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
    }
)

print(author_uuid)

b16fc5b3-ad0f-4511-8af6-11325269a678



#### Add objects (batch)

You can also add multiple objects at once. The new syntax allows you to pass a list of `wvc.DataObject` objects.

In [8]:
# user_test.py
articles_to_add = [
    wvc.DataObject(
        properties={
            "title": f"The best restaurants of {1980+i}:",
            "body": "1. McDonald's, 2. ...",
            "url": "ss"
        },
    )
    for i in range(5)
]

response = articles.data.insert_many(articles_to_add)

The `response` object contains the UUIDs of the created objects, and more.

In [9]:
print(response)

_BatchReturn(all_responses=[UUID('67cb4b6b-9a58-4f57-9e1c-d4ef5dba9c88'), UUID('9fd33f35-19d5-47ce-b5f3-787e7813858e'), UUID('fb698c8d-7a46-4c40-bb58-78043d39e1e4'), UUID('fc2bfeb1-89e8-4e21-9a2b-73f0aa453f82'), UUID('2463ca4f-ed9f-4946-84e3-1dd56a1e90e3')], uuids={0: UUID('67cb4b6b-9a58-4f57-9e1c-d4ef5dba9c88'), 1: UUID('9fd33f35-19d5-47ce-b5f3-787e7813858e'), 2: UUID('fb698c8d-7a46-4c40-bb58-78043d39e1e4'), 3: UUID('fc2bfeb1-89e8-4e21-9a2b-73f0aa453f82'), 4: UUID('2463ca4f-ed9f-4946-84e3-1dd56a1e90e3')}, errors={}, has_errors=False)
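
Judging from the printed output, the keys of `uuids` appear to be positions in the input list. Under that assumption, you can map each input object to the UUID it received. The sketch below uses a hypothetical stand-in class (`BatchReturnSketch`) that mirrors only the printed fields, with random UUIDs in place of the server's, so it runs without a Weaviate instance.

```python
from dataclasses import dataclass, field
from typing import Dict
from uuid import UUID, uuid4


@dataclass
class BatchReturnSketch:
    """Hypothetical stand-in that mirrors only some fields printed above."""
    uuids: Dict[int, UUID] = field(default_factory=dict)
    errors: Dict[int, str] = field(default_factory=dict)

    @property
    def has_errors(self) -> bool:
        return bool(self.errors)


titles = [f"The best restaurants of {1980 + i}:" for i in range(5)]

# Pretend every object was stored; random UUIDs stand in for the server's.
response = BatchReturnSketch(uuids={i: uuid4() for i in range(len(titles))})

# Use each key as an index into the input list to pair titles with UUIDs.
uuid_by_title = {titles[i]: uuid for i, uuid in response.uuids.items()}

for title, uuid in uuid_by_title.items():
    print(f"{uuid}  {title}")
```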


This is how we add some objects to `authors`, each with a cross-reference to an object in `articles`:

In [10]:
# user_test.py
authors_to_add = [
    wvc.DataObject(
        properties={
            "name": f"Jim {i+1}",
            "birth_year": 1970 + i,
            "wroteArticle": wvc.ReferenceFactory.to(uuids=[article_uuid])
        },
        # vector=CUSTOM_VECTOR_HERE,  # To add custom vectors
        # uuid=CUSTOM_UUID_HERE  # To specify custom UUIDs
    )
    for i in range(5)
]

response = authors.data.insert_many(authors_to_add)
print(response)

_BatchReturn(all_responses=[UUID('6626a324-037a-4715-b57f-c5995b13990d'), UUID('02ed5311-86ec-4995-9efb-5af3810c33e6'), UUID('b3e1a61b-ba49-40f7-a10c-77e0ea249b8f'), UUID('8e5e96d6-01e8-4799-aa63-77b518e833b1'), UUID('541857a9-d851-41f7-8eb2-9f4b4a9a03bd')], uuids={0: UUID('6626a324-037a-4715-b57f-c5995b13990d'), 1: UUID('02ed5311-86ec-4995-9efb-5af3810c33e6'), 2: UUID('b3e1a61b-ba49-40f7-a10c-77e0ea249b8f'), 3: UUID('8e5e96d6-01e8-4799-aa63-77b518e833b1'), 4: UUID('541857a9-d851-41f7-8eb2-9f4b4a9a03bd')}, errors={}, has_errors=False)


#### Errors

The client will now automatically capture errors. Try this example, where the `url` property is erroneously given a numerical value in one of the objects:

In [11]:
articles_to_add = [
    wvc.DataObject(
        properties={
            "title": f"The best restaurants of {1980+i}:",
            "body": "1. McDonald's, 2. ...",
            "url": str(i) if i != 2 else i
        },
    )
    for i in range(5)
]

response = articles.data.insert_many(articles_to_add)

Inspecting the `errors` attribute allows you to see the index of the object, and the error message!

In [12]:
print(response.errors)

{2: Error(message="invalid text property 'url' on class 'TestArticle': not a string, but float64", code=None, original_uuid=None)}
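
Because the error keys are positions in the input list, you can use them to pick out the failed objects, repair them, and retry. The following is a minimal sketch under that assumption; `ErrorSketch` is a hypothetical stand-in that exposes only the `message` field shown above, and the `errors` dictionary is written by hand rather than returned by the client.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ErrorSketch:
    """Hypothetical stand-in exposing only the `message` field printed above."""
    message: str


# Same shape as the inputs above: the object at index 2 has a numeric `url`.
inputs: List[dict] = [
    {"title": f"The best restaurants of {1980 + i}:", "url": str(i) if i != 2 else i}
    for i in range(5)
]

# Hand-written stand-in for `response.errors`, keyed by input position.
errors: Dict[int, ErrorSketch] = {
    2: ErrorSketch(
        message="invalid text property 'url' on class 'TestArticle': not a string, but float64"
    )
}

failed = [inputs[i] for i in sorted(errors)]
succeeded = [obj for i, obj in enumerate(inputs) if i not in errors]

# Repair the failed objects so they could be passed to `insert_many` again.
repaired = [{**obj, "url": str(obj["url"])} for obj in failed]

print(len(succeeded), "succeeded;", len(failed), "failed")
print(repaired)
```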


#### CRUD operations feedback

Please score this on a scale of 1-5, where 1 is bad, 5 is good.

- Was it an improvement (5), similar (3), or worse (1) than the existing API?
    - Score?
    - Any reasons?
- How intuitive was the syntax to use? (Very intuitive: 5, unintuitive: 1)
    - Score?
    - Any reasons?
- Are you happy with this section in general? (Very happy: 5, not happy: 1)
    - Score?
    - Any reasons?
- Any other notes?

### Queries

Now you have data on which you can try out queries!

You can now fetch objects like this:

In [13]:
response = articles.query.fetch_objects(limit=2)

print(response)

_QueryReturn(objects=[_Object(properties={'title': 'The best restaurants of 1984:', 'body': "1. McDonald's, 2. ...", 'url': 'ss'}, metadata=_MetadataReturn(uuid=UUID('2463ca4f-ed9f-4946-84e3-1dd56a1e90e3'), vector=None, creation_time_unix=1695153836800, last_update_time_unix=1695153836800, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None)), _Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...', 'url': 'http://www.starwars.com'}, metadata=_MetadataReturn(uuid=UUID('2fba92f8-11d0-475d-9edb-c237c76b22e0'), vector=None, creation_time_unix=1695153834643, last_update_time_unix=1695153834643, distance=None, certainty=None, score=0.0, explain_score='', is_consistent=False, generative=None))])


Notice that you don't have to specify the collection name (the collection object already knows it) or the properties to retrieve.

You can also specify these if you wish.

In [14]:
response = articles.query.fetch_objects(
    limit=2,
    return_properties=["title"],
    return_metadata=wvc.MetadataQuery(uuid=True)  # MetadataQuery specifies which metadata to return
)

print(response)

_QueryReturn(objects=[_Object(properties={'title': 'The best restaurants of 1984:'}, metadata=_MetadataReturn(uuid=UUID('2463ca4f-ed9f-4946-84e3-1dd56a1e90e3'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False, generative=None)), _Object(properties={'title': 'Something something dark side'}, metadata=_MetadataReturn(uuid=UUID('2fba92f8-11d0-475d-9edb-c237c76b22e0'), vector=None, creation_time_unix=None, last_update_time_unix=None, distance=None, certainty=None, score=None, explain_score=None, is_consistent=False, generative=None))])


#### Near text search

In [15]:
response = articles.query.near_text(
    query="The dark side",
    limit=2,
)

print(response)

_QueryReturn(objects=[_Object(properties={'title': 'Something something dark side', 'body': 'A long long time ago, in a galaxy far, far away...', 'url': 'http://www.starwars.com'}, metadata=_MetadataReturn(uuid=UUID('2fba92f8-11d0-475d-9edb-c237c76b22e0'), vector=None, creation_time_unix=1695153834643, last_update_time_unix=1695153834643, distance=0.1809372901916504, certainty=0.9095313549041748, score=0.0, explain_score='', is_consistent=False, generative=None)), _Object(properties={'title': 'The best restaurants of 1980:', 'body': "1. McDonald's, 2. ...", 'url': '0'}, metadata=_MetadataReturn(uuid=UUID('6be33725-4b43-4eea-bbf6-2e92598d3c5d'), vector=None, creation_time_unix=1695153837688, last_update_time_unix=1695153837688, distance=0.27976858615875244, certainty=0.8601157069206238, score=0.0, explain_score='', is_consistent=False, generative=None))])


**Suggestion**: Try different queries, like bm25 or hybrid.

#### Filters

You can add filters like so, with a `Filter` object:

In [16]:
response = authors.query.fetch_objects(
    filters=wvc.Filter(path=["birth_year"]).greater_or_equal(1971)  # Filter object is used to specify the filter
)

for o in response.objects:
    print(o.properties["birth_year"])

1974.0
1971.0
1972.0
1973.0
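
To see why four values come back, here is a plain-Python equivalent of the filter applied to the birth years inserted earlier in this walkthrough. It is only an illustration of the filter's semantics and does not call Weaviate. Note that the query output shows floats in no particular order, so cast and sort if you need ordered integers.

```python
# Birth years of the objects inserted into `TestAuthor` earlier:
# "G Lucas" (1944) plus "Jim 1" to "Jim 5" (1970 to 1974).
birth_years = [1944] + [1970 + i for i in range(5)]

# Plain-Python equivalent of `greater_or_equal(1971)`.
matching = [year for year in birth_years if year >= 1971]
print(matching)  # [1971, 1972, 1973, 1974]

# The query output above shows floats (e.g. 1974.0); cast if you need ints.
returned = [1974.0, 1971.0, 1972.0, 1973.0]
as_ints = sorted(int(value) for value in returned)
print(as_ints == matching)  # True
```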


**Suggestion**: Try constructing different filters!

#### Queries feedback

Please score this on a scale of 1-5, where 1 is bad, 5 is good.

- Was it an improvement (5), similar (3), or worse (1) than the existing API?
    - Score?
    - Any reasons?
- How intuitive was the syntax to use? (Very intuitive: 5, unintuitive: 1)
    - Score?
    - Any reasons?
- Are you happy with this section in general? (Very happy: 5, not happy: 1)
    - Score?
    - Any reasons?
- Any other notes?

### Retrieval augmented generation (RAG)

Let's load some data and have some fun with RAG:

In [17]:
import weaviate_datasets as wd

dataset = wd.JeopardyQuestions1k()  # <-- Comes with pre-vectorized data
dataset.upload_dataset(client, 300)

            Please instead use the `client.batch.configure()` method to configure your batch and `client.batch` to enter the context manager.
            See https://weaviate.io/developers/weaviate/client-libraries/python for details.
1000it [00:03, 318.71it/s]


True

RAG functionality is now available from within each search, through an additional argument:

In [26]:
from weaviate.collection.classes.grpc import Generate  # <-- This class will be available from `wvc` shortly

questions = client.collection.get("JeopardyQuestion")

For example, a grouped task looks like this:

In [24]:
response = questions.query.near_text(
    query="Moon landing",
    limit=3,
    generate=Generate(grouped_task="Write a haiku from these facts")
)

print(response)

_GenerativeReturn(objects=[_Object(properties={'points': 200.0, 'air_date': '2001-09-21T00:00:00Z', 'answer': 'Venus', 'question': "In March 1966 the USSR's Venera 3 became the first space probe to physically touch another planet, this one", 'round': 'Double Jeopardy!'}, metadata=_MetadataReturn(uuid=UUID('33e88b70-19f7-5cb6-9f61-6c6925905afd'), vector=None, creation_time_unix=1695153841446, last_update_time_unix=1695153841939, distance=0.156119704246521, certainty=0.9219401478767395, score=0.0, explain_score='', is_consistent=False, generative=None)), _Object(properties={'points': 800.0, 'answer': 'Gemini', 'air_date': '2009-10-15T00:00:00Z', 'round': 'Jeopardy!', 'question': '<a href="http://www.j-archive.com/media/2009-10-15_J_09.jpg" target="_blank">Buzz Aldrin & Jim Lovell</a> do look like <a href="http://www.j-archive.com/media/2009-10-15_J_09a.jpg" target="_blank">twins</a> as they prepare for a mission in this 1960s program'}, metadata=_MetadataReturn(uuid=UUID('ac922ee4-f036-5

The generated text is now available like this:

In [25]:
print(response.generated)

Venera touched Venus,
First probe to reach another,
Planet's surface kissed.

Gemini twins prepare,
Buzz and Jim, ready for flight,
1960s mission.

Sojourner Truth's name,
Mars rover explores unknown,
Her legacy lives.


And for single prompts:

In [27]:
response = questions.query.near_text(
    query="European history",
    limit=2,
    generate=Generate(single_prompt="Re-write this in Japanese: {question}")
)

print(response)

_GenerativeReturn(objects=[_Object(properties={'points': 1000.0, 'answer': 'Poland', 'air_date': '1990-04-17T00:00:00Z', 'question': 'A 1795 partition ended its existence as a separate state in E. Europe; in 1918 it was back as a republic', 'round': 'Double Jeopardy!'}, metadata=_MetadataReturn(uuid=UUID('bb545f85-48c9-5bdf-8c58-0cb3d6f5767f'), vector=None, creation_time_unix=1695153842367, last_update_time_unix=1695153842603, distance=0.17593902349472046, certainty=0.9120304584503174, score=0.0, explain_score='', is_consistent=False, generative='1795年の分割により、東ヨーロッパの別々の国家としての存在は終わりを迎えました。1918年には共和国として復活しました。')), _Object(properties={'points': 200.0, 'answer': 'Slavs', 'air_date': '1985-11-19T00:00:00Z', 'question': 'Word “slavery” comes from these eastern Europeans who were often enslaved by conquerors', 'round': 'Jeopardy!'}, metadata=_MetadataReturn(uuid=UUID('4d50646c-5089-5142-a471-f49ba11a8ce7'), vector=None, creation_time_unix=1695153842367, last_update_time_unix=1695153842604, dis

Try navigating through each result here:

In [19]:
print(response.objects[0].metadata.generative)

'1795年の分割により、東ヨーロッパの別々の国家としての存在は終わりを迎えました。1918年には共和国として復活しました。'
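
Conceptually, a single prompt is a template: the `{question}` placeholder is filled with each object's `question` property, so every object gets its own prompt and its own generated text. A grouped task, by contrast, produces one generated text for the whole result set. The sketch below mimics only the templating idea with plain Python string formatting. The actual substitution happens inside Weaviate, and the question texts here are shortened from the output above.

```python
# Two objects with only the property used by the prompt (texts shortened).
objects = [
    {"question": "A 1795 partition ended its existence as a separate state in E. Europe"},
    {"question": 'Word "slavery" comes from these eastern Europeans'},
]

single_prompt = "Re-write this in Japanese: {question}"

# Single prompt: one filled-in prompt (and so one generation) per object.
per_object_prompts = [single_prompt.format_map(obj) for obj in objects]

for prompt in per_object_prompts:
    print(prompt)
```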

#### RAG feedback

Please score this on a scale of 1-5, where 1 is bad, 5 is good.

- Was it an improvement (5), similar (3), or worse (1) than the existing API?
    - Score?
    - Any reasons?
- How intuitive was the syntax to use? (Very intuitive: 5, unintuitive: 1)
    - Score?
    - Any reasons?
- Are you happy with this section in general? (Very happy: 5, not happy: 1)
    - Score?
    - Any reasons?
- Any other notes?