## Early user testing for the Weaviate 'collections' client

Welcome!

We have been working on a new (and hopefully improved) API for our Python client. We are excited for you to try it out and share your feedback.

### Installation

This version of the client lives on a separate branch in the GitHub repository.

If you don't want this to affect your current workflow, we recommend creating a **new environment** (venv, Conda, or Mamba). Otherwise, make sure to uninstall this version afterwards and reinstall the official release.

In your desired environment, install it with:

```shell
pip install -U "git+https://github.com/weaviate/weaviate-python-client.git@pydantic_experiment#egg=weaviate-client[GRPC]"
```
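
To double-check which build ended up in the active environment, something like the following can help. It is a small sketch using only the standard library; `installed_version` is a helper name made up for this example, and `weaviate-client` is the distribution name taken from the `pip` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("weaviate-client")
    if version is None:
        print("weaviate-client is not installed in this environment")
    else:
        print(f"weaviate-client {version} is installed")
```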

Use this docker-compose file:

```yaml
---
version: '3.4'
services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: semitechnologies/weaviate:preview-error-without-module-c10476a
    restart: on-failure:0
    ports:
     - "8080:8080"
     - "50051:50051"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      QUERY_MAXIMUM_RESULTS: 10000
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-openai'
      ENABLE_MODULES: 'text2vec-openai,text2vec-cohere,generative-openai'
      CLUSTER_HOSTNAME: 'node1'
...
```

Then spin up a container with `docker compose up -d`.
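
Before connecting, it can be worth confirming that the container is actually answering. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) with the standard library. `is_ready` is a hypothetical helper for this walkthrough, and the default URL assumes the port mapping from the compose file above.

```python
import urllib.error
import urllib.request


def is_ready(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: treat as "not ready".
        return False


if __name__ == "__main__":
    print("Weaviate ready:", is_ready())
```

If this prints `False` right after `docker compose up -d`, waiting a few seconds and retrying is usually enough.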

### Key ideas

We are calling this the *'collections'* client, because many of the data interactions will be at the collections (i.e. Weaviate *'Class'*) level. So, instantiate the client and then instantiate a collection like this:

In [1]:
from weaviate import Config
import weaviate
import os

client = weaviate.Client(
    "http://localhost:8080",
    additional_config=Config(grpc_port_experimental=50051),
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"],
        "X-Cohere-Api-Key": os.environ["COHERE_APIKEY"],
    },
)
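
Note that `os.environ["OPENAI_APIKEY"]` raises a bare `KeyError` if the variable is not set. If you would rather get a readable message, a small wrapper along these lines may help. This is only a sketch: `build_api_headers` is a made-up helper, not part of the client, and it assumes the same two environment variable names used above.

```python
import os
from typing import Dict, Mapping


def build_api_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Map the environment variables used above to their request headers."""
    wanted = {
        "X-OpenAI-Api-Key": "OPENAI_APIKEY",
        "X-Cohere-Api-Key": "COHERE_APIKEY",
    }
    # Collect every missing or empty variable so the error lists them all at once.
    missing = [var for var in wanted.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return {header: env[var] for header, var in wanted.items()}
```

The returned dictionary can then be passed as `additional_headers` when creating the client.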

### Help is here!

We've added helper classes for most of the configuration and query tasks you'll see here.

You'll notice below that class definitions are done through the `CollectionConfig` class, configurations in `Text2VecOpenAIConfig`, and so on.

You can import them individually, like so:

```python
from weaviate.weaviate_classes import CollectionConfig, Vectorizer, VectorDistance
```

I (JP) personally import the set of classes like this:

```python
import weaviate.weaviate_classes as wvc
```

To prep, delete any existing classes with the same name like so:

In [2]:
for n in ["TestArticle", "TestAuthor"]:
    if client.schema.exists(n):
        client.schema.delete_class(n)

### Class creation

In [3]:
import weaviate.weaviate_classes as wvc

articles = client.collection.create(
    name="TestArticle",
    properties=[
        wvc.Property(
            name="title",
            data_type=wvc.DataType.TEXT,
        ),
        # Get the user to create these properties
        wvc.Property(
            name="body",
            data_type=wvc.DataType.TEXT,
        ),
        wvc.Property(
            name="url",
            data_type=wvc.DataType.TEXT,
            tokenization=wvc.Tokenization.FIELD,
            vectorizer_config=wvc.PropertyVectorizerConfig()
        ),
        # =====
    ],
    vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    # vectorizer_config=wvc.VectorizerFactory.text2vec_cohere(
    #     model="embed-multilingual-v2.0"
    # ),
    replication_config=wvc.ConfigFactory.replication(factor=1),
    # Get the user to create this config
    inverted_index_config=wvc.ConfigFactory.inverted_index(
        index_property_length=True
    )
    # =====
)

authors = client.collection.create(
    name="TestAuthor",
    properties=[
        wvc.Property(
            name="name",
            data_type=wvc.DataType.TEXT,
        ),
        # Get the user to create this
        wvc.Property(
            name="birth_year",
            data_type=wvc.DataType.INT,
        ),
        # Get the user to create this
        wvc.ReferenceProperty(name="wroteArticle", target_collection="TestArticle")
    ],
    # Get the user to add this vectorizer
    # vectorizer_config=wvc.VectorizerFactory.text2vec_openai(),
    vectorizer_config=wvc.VectorizerFactory.text2vec_cohere(
        model="embed-multilingual-v2.0"
    ),
)

In your IDE, you should now see autocomplete suggestions on the `articles` and `authors` objects. The two key namespaces are:

- `data`: CRUD operations
- `query`: Search operations (previously done through GraphQL)

### CRUD operations

#### Add objects (single)

Adding objects is done with the `insert` method.

The pattern can be very similar to what you've done before. (You can specify a UUID as you have done before!)

In [4]:
uuid = articles.data.insert(
    {
        "title": "Something something dark side",
        "body": "A long long time ago, in a galaxy far, far away...",
        "url": "http://www.starwars.com"
    }
)

The returned value is a `uuid.UUID` object:

In [5]:
print(type(uuid))
print(uuid)

<class 'uuid.UUID'>
7b99bdc8-e19b-4a0a-8bf0-23dd335b5fb5


Notice that you didn't have to specify a class name, because you're already working with a collection object.
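
If you want to supply your own UUID, one common approach is to derive it from the object's content, so re-importing the same data yields the same ID. The sketch below does this with the standard library only. `deterministic_uuid` is a hypothetical helper, and the namespace choice is arbitrary. How the UUID is passed to `insert` is not shown in this notebook, so check the method signature in your IDE.

```python
import json
import uuid
from typing import Any, Mapping


def deterministic_uuid(properties: Mapping[str, Any]) -> uuid.UUID:
    """Derive a stable version-5 UUID from an object's properties."""
    # Sorting keys makes the result independent of dictionary ordering.
    canonical = json.dumps(properties, sort_keys=True)
    return uuid.uuid5(uuid.NAMESPACE_DNS, canonical)


article = {
    "title": "Something something dark side",
    "url": "http://www.starwars.com",
}
print(deterministic_uuid(article))
```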

#### Add objects (batch)

In [6]:
authors_to_add = [
    wvc.DataObject(
        properties={
            "name": f"Jim {i+1}",
            "birth_year": 1970 + i,
            "wroteArticle": wvc.ReferenceFactory.to(uuids=[uuid])
        },
        # vector=[0.05] * 100
    )
    for i in range(10)
]

authors.data.insert_many(authors_to_add)

AssertionError: 

### Queries

Get objects like this:

In [None]:
response = articles.query.get(
    limit=2,
    return_properties=["title", "body"],
    return_metadata=wvc.MetadataQuery(uuid=True)
)

print(response)

In [None]:
response = authors.query.get(
    limit=2,
    return_properties=["name", "birth_year", wvc.LinkTo(link_on="wroteArticle", return_properties="title")],
    return_metadata=wvc.MetadataQuery(uuid=True)
)

print(response)

Or like this:

All queries come in `flat` and `options` variants.
- `flat`: The parameters are "flat", that is, passed individually as separate arguments.
- `options`: The parameters are passed as typed objects, such as `wvc.GetOptions`, both for the query itself and for what it returns.
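
To make the difference concrete, here is a toy illustration of the two calling styles in plain Python. None of these names belong to the client: `ToyGetOptions`, `toy_get_flat`, and `toy_get_options` are invented stand-ins, and the real signatures may well differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToyGetOptions:
    """Hypothetical typed container for query parameters."""
    limit: Optional[int] = None
    offset: Optional[int] = None


def toy_get_flat(limit: Optional[int] = None, offset: Optional[int] = None) -> Dict[str, Any]:
    """'flat' style: each parameter is its own keyword argument."""
    return {"limit": limit, "offset": offset}


def toy_get_options(options: ToyGetOptions) -> Dict[str, Any]:
    """'options' style: the parameters travel together in one typed object."""
    return {"limit": options.limit, "offset": options.offset}


# Both styles describe the same request.
assert toy_get_flat(limit=2) == toy_get_options(ToyGetOptions(limit=2))
```

The trade-off is roughly brevity (flat) versus grouping and reuse of parameter sets (options).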

What do you prefer?

Also, notice that you get typed objects back!

The returned objects have `properties` and `metadata`. Explore them and see what you find.

**Task**: Can you construct a `nearText` query (with `options` and with `flat`)
- for "the dark side"
- with a certainty of 0.75
- and get the distance

Hint:
If you don't see `NearTextOptions` in `weaviate.weaviate_classes`, you can import it from `weaviate.collection.classes.grpc` for now. It will be available from `weaviate.weaviate_classes` later.

In [None]:
import weaviate.collection.classes.filters as wv_filters

response = articles.query.near_text(
    query="The dark side",
    certainty=0.75,
    return_properties=["title", "body"],
    return_metadata=wvc.MetadataQuery(uuid=True),
)

print(response)
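
When comparing the certainty threshold with the distances you get back, it helps to remember how the two relate. For the cosine metric, Weaviate defines certainty as `1 - distance / 2`. The sketch below assumes the collection uses cosine distance (certainty is not defined for other metrics), and the function names are made up for this example.

```python
def certainty_to_cosine_distance(certainty: float) -> float:
    """Convert a certainty in [0, 1] to a cosine distance in [0, 2]."""
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    return 2.0 * (1.0 - certainty)


def cosine_distance_to_certainty(distance: float) -> float:
    """Convert a cosine distance in [0, 2] to a certainty in [0, 1]."""
    if not 0.0 <= distance <= 2.0:
        raise ValueError("cosine distance must be between 0 and 2")
    return 1.0 - distance / 2.0


print(certainty_to_cosine_distance(0.75))  # prints 0.5
```

So a certainty of 0.75 corresponds to keeping results with a cosine distance of at most 0.5.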

**Task**: Can you construct a `bm25` query (with `options` and with `flat`)
- with a query for `galaxy`

In [None]:
response = articles.query.bm25(
    query="galaxy",
    return_properties=["title", "body"],
    return_metadata=wvc.MetadataQuery(uuid=True)
)

print(response)
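
If you are curious what a `bm25` query ranks on, here is a toy version of the textbook BM25 formula in plain Python. It is not Weaviate's implementation: it tokenizes on whitespace only, and `k1=1.2` and `b=0.75` are just commonly used values assumed for this sketch.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each document against the query with a plain BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            tf = counts[term]
            # Length normalisation: longer-than-average documents are penalised.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


docs = [
    "A long long time ago, in a galaxy far, far away...",
    "Something something dark side",
]
print(bm25_scores("galaxy", docs))
```

Only the first document contains the token `galaxy`, so it is the only one with a non-zero score.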

In [None]:
response = authors.query.generative(
    return_properties=["name", "birth_year"],
    return_metadata=wvc.MetadataQuery(uuid=True),
    prompt_combined_results="Turn this into a haiku"
)

In [None]:
response.objects[0]

In [None]:
print(response)

In [None]:
response.objects