[proposal] Vectorizer select and exclude properties – in one place #4855

Open
1 task done
sebawita opened this issue May 6, 2024 · 0 comments

sebawita commented May 6, 2024

Describe your feature request

Hi team,

With the latest update to the named vectors, Weaviate added source_properties under the vectorizer configuration, which is GREAT 😄

Can we move the skip_vectorization from the collection property schema to the vectorizer config (just like this was done with source_properties)? We could call it exclude_properties.

This way, we could deprecate skip_vectorization from the property schema.

Scenarios

Here are a few scenarios (I will provide the examples in Python):

Default – use all relevant properties (existing syntax)

When neither source_properties nor exclude_properties is provided, Weaviate should use all relevant properties for vectorization.

client.collections.create(
    "Article",
    vectorizer_config=Configure.NamedVectors.text2vec_x(
        name="content",
    ),
    ...
)

Selective (existing syntax)

Use properties provided in source_properties for vectorization

client.collections.create(
    "Article",
    vectorizer_config=Configure.NamedVectors.text2vec_x(
        name="content",
        source_properties=["title", "description"]
    ),
    ...
)

Exclusive (the new syntax)

Use all relevant properties except for the ones provided in exclude_properties

Note: exclude_properties should only take an array of strings, as we don't need to specify extra per-property options like vectorize_property_name=True.

client.collections.create(
    "Article",
    vectorizer_config=Configure.NamedVectors.text2vec_x(
        name="content",
        exclude_properties=["author_name", "url"]
    ),
    ...
)
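To make the intended semantics concrete, here is a minimal sketch of how the exclusion could resolve to the final list of vectorized properties. The helper name `apply_exclude` and the example property list are hypothetical and only illustrate the idea; this is not how Weaviate implements it.

```python
from typing import List, Optional, Sequence


def apply_exclude(
    relevant_properties: Sequence[str],
    exclude_properties: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: drop the excluded names and keep the schema order."""
    excluded = set(exclude_properties or [])
    return [name for name in relevant_properties if name not in excluded]


# Assumed set of "relevant" properties for the Article collection.
schema_properties = ["title", "description", "author_name", "url"]

print(apply_exclude(schema_properties, ["author_name", "url"]))
# ['title', 'description']
```

With no exclude_properties the helper returns every relevant property, which matches the default scenario.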

Mixed Error

We shouldn't mix source_properties with exclude_properties.
The following should throw an error.

client.collections.create(
    "Article",
    vectorizer_config=Configure.NamedVectors.text2vec_x(
        name="content",
        source_properties=["title", "description"],  # throw an error
        exclude_properties=["author_name", "url"]    # throw an error
    ),
    ...
)
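As a rough illustration of the check, the mutual-exclusion rule could be enforced with something like the sketch below. The function name `check_property_selectors` and the error message are assumptions made for this example; the actual client or server would decide where the validation lives and which error type to raise.

```python
from typing import Optional, Sequence


def check_property_selectors(
    source_properties: Optional[Sequence[str]] = None,
    exclude_properties: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical validation: reject configs that set both selectors."""
    if source_properties is not None and exclude_properties is not None:
        raise ValueError(
            "source_properties and exclude_properties are mutually exclusive; "
            "provide only one of them"
        )


# Either selector on its own (or neither) passes silently.
check_property_selectors(source_properties=["title", "description"])
check_property_selectors(exclude_properties=["author_name", "url"])

# Providing both raises.
try:
    check_property_selectors(
        source_properties=["title", "description"],
        exclude_properties=["author_name", "url"],
    )
except ValueError as err:
    print(f"rejected: {err}")
```

Checking against `None` rather than truthiness means an explicitly empty list still counts as "provided"; whether that is the desired behavior is an open design question.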

Multiple Vectors

Each vectorizer can use a different set of property selectors

client.collections.create(
    "Article",
    vectorizer_config=[
        Configure.NamedVectors.text2vec_x(
            name="title",
            source_properties=["title"]
        ),
        Configure.NamedVectors.text2vec_x(
            name="content",
            exclude_properties=["author_name", "url"]
        ),
    ],
    ...
)


@sebawita sebawita changed the title [proposal] Move vector property selection under the vectorizer – both include and exclude [proposal] Vectorizer select and exclude properties – in one place May 6, 2024