feat: Valkey Online Write Batch Vector Search Support #351

Merged
Manisha4 merged 7 commits into feature/vector-store from add/valkey-search-support-writes
Apr 6, 2026
@Manisha4
Collaborator

What this PR does / why we need it:

Adds vector embedding storage and indexing support to the Valkey online store, enabling vector similarity search capabilities in Feast (Phase 1 - Write Support).

Changes

Implementation (eg_valkey.py):

  • Add vector field detection via field.vector_index property
  • Store vector fields with original field names + raw numpy bytes (required for Valkey Search FT.SEARCH)
  • Store non-vector fields with mmh3 hash + protobuf (existing behavior preserved)
  • Automatic vector index creation (FT.CREATE) on first write when vector fields are present
  • Support for FLAT and HNSW index algorithms with configurable parameters
  • Support for Float32 and Float64 vector types
  • Vector dimension validation against field.vector_length
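
As a rough illustration of the storage split listed above, the sketch below packs a vector as raw little-endian float32 bytes under its original field name and validates its length first. It uses the standard library `struct` module instead of numpy so that it runs without dependencies, and the helper name `encode_vector_field` is hypothetical, not taken from eg_valkey.py.

```python
import struct
from typing import Dict, Sequence


def encode_vector_field(
    name: str, values: Sequence[float], expected_dim: int
) -> Dict[str, bytes]:
    """Return a HASH field mapping of {original field name: raw float32 bytes}."""
    if len(values) != expected_dim:
        raise ValueError(
            f"vector field {name!r} has {len(values)} dims, expected {expected_dim}"
        )
    # "<" = little-endian, "f" = 4-byte float32. The byte layout matches
    # numpy.asarray(values, dtype="<f4").tobytes().
    return {name: struct.pack(f"<{len(values)}f", *values)}


fields = encode_vector_field("embedding", [0.5, 1.0, -2.0], expected_dim=3)
```

Keeping the original field name matters because the search index schema refers to the vector by name; a hashed field key would not be visible to the index.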

Configuration (EGValkeyOnlineStoreConfig):

  • vector_index_algorithm: FLAT or HNSW (default: HNSW)
  • vector_index_hnsw_m: Max outgoing edges per node (default: 16)
  • vector_index_hnsw_ef_construction: Index build quality (default: 200)
  • vector_index_hnsw_ef_runtime: Search quality (default: 10)
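
To show how these settings might feed into index creation, here is a minimal sketch that turns them into the SCHEMA arguments of an FT.CREATE call. The `VectorIndexSettings` class and `vector_schema_args` function are hypothetical stand-ins, and the distance metric parameter is an assumption since the description above does not list one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VectorIndexSettings:
    """Hypothetical stand-in mirroring the defaults listed above."""

    vector_index_algorithm: str = "HNSW"
    vector_index_hnsw_m: int = 16
    vector_index_hnsw_ef_construction: int = 200
    vector_index_hnsw_ef_runtime: int = 10


def vector_schema_args(
    field_name: str, dim: int, metric: str, cfg: VectorIndexSettings
) -> List[str]:
    """Build the SCHEMA portion of an FT.CREATE call for one vector field."""
    attrs = ["TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", metric]
    if cfg.vector_index_algorithm == "HNSW":
        attrs += [
            "M", str(cfg.vector_index_hnsw_m),
            "EF_CONSTRUCTION", str(cfg.vector_index_hnsw_ef_construction),
            "EF_RUNTIME", str(cfg.vector_index_hnsw_ef_runtime),
        ]
    elif cfg.vector_index_algorithm != "FLAT":
        raise ValueError(f"unsupported algorithm: {cfg.vector_index_algorithm}")
    # The number after the algorithm name counts the attribute tokens that follow.
    return [field_name, "VECTOR", cfg.vector_index_algorithm, str(len(attrs)), *attrs]


args = vector_schema_args("embedding", 128, "COSINE", VectorIndexSettings())
```

With FLAT selected, the HNSW-specific settings are simply left out of the argument list.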

Design Decisions

  • Vector support for regular FeatureViews only (not SortedFeatureView, consistent with Milvus/ES)
  • Single vector per FeatureView (matches current Feast API constraint)
  • Backward compatible: non-vector feature views work unchanged

feast_dtype: Feast data type (e.g., Array(Float32))

Returns:
Valkey vector type string: "FLOAT32" or "FLOAT64"
Collaborator

Is float64 a supported vector type in valkey?

Collaborator Author

yes it is supported

Collaborator

This document says only float32 is supported. Am I missing something?
https://valkey.io/topics/search-data-formats/

Collaborator Author

You're right, I looked at data formats not relating to Search by mistake. Thanks!!
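
Given the outcome of this thread, one way the type mapping could be tightened is to accept only Float32 and fail loudly otherwise. This is a sketch under assumptions: the function name is hypothetical, and it takes the dtype as a plain string such as "Array(Float32)" instead of a Feast type object so that it runs standalone.

```python
def to_valkey_vector_type(feast_dtype: str) -> str:
    """Map a Feast array dtype name to a Valkey Search vector type string."""
    supported = {"Array(Float32)": "FLOAT32"}
    try:
        return supported[feast_dtype]
    except KeyError:
        # Reject unsupported types up front instead of failing at index creation.
        raise ValueError(
            f"{feast_dtype} is not supported for Valkey vector fields; "
            "use Array(Float32)"
        ) from None


vector_type = to_valkey_vector_type("Array(Float32)")
```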

index_name = _get_vector_index_name(config.project, table.name)

# Check if index exists
try:
Collaborator

If two processes call online_write_batch concurrently for the same feature view, both could see the index as non-existent and both attempt FT.CREATE, with one failing.

Collaborator Author

Will catch the exception like the other DBs
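
A minimal sketch of the "catch the exception" approach follows. The `ResponseError` class is a stand-in for the client library's server-error exception, `ensure_index` is a hypothetical helper, and matching on the text "already exists" is an assumption about the server's error message that should be checked against a real Valkey Search instance.

```python
class ResponseError(Exception):
    """Stand-in for the client library's server-error exception."""


def ensure_index(client, index_name: str, create_args: list) -> bool:
    """Create the index, treating a lost creation race as success.

    Returns True if this caller created the index, False if it already existed.
    """
    try:
        client.execute_command("FT.CREATE", index_name, *create_args)
        return True
    except ResponseError as exc:
        # Another writer created the index between our check and our create.
        if "already exists" in str(exc).lower():
            return False
        raise
```

Attempting the create and tolerating the duplicate error avoids the check-then-create window entirely, so no separate existence check is needed for correctness.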


if feature_name in vector_field_names:
    # Vector field: deserialize from raw bytes
    field = next(f for f in feature_view.features if f.name == feature_name)
Collaborator

This is O(n) field lookup per vector feature per entity. Can we try a dict lookup?
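
The suggested dict lookup could look roughly like this. The `Field` dataclass is a minimal stand-in for a Feast field and `index_vector_fields` is a hypothetical helper; the idea is only to build the name-to-field map once, outside the per-entity loop.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    """Minimal stand-in for a Feast field."""

    name: str
    vector_index: bool = False


def index_vector_fields(features: List[Field]) -> Dict[str, Field]:
    """Build a name -> field map once, before looping over entities."""
    return {f.name: f for f in features if f.vector_index}


features = [Field("price"), Field("embedding", vector_index=True)]
vector_fields = index_vector_fields(features)
field = vector_fields.get("embedding")  # constant-time lookup, no scan of features
```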

schema_fields.append(VectorField(field_name, algorithm, attributes))

# Define index on HASH keys with specific prefix
key_prefix = _redis_key_prefix(table.join_keys)
Collaborator

_redis_key_prefix(table.join_keys) produces a prefix based only on the join key names (e.g., ["item_id"]), without including the project. But _redis_key() builds the actual HASH keys as <serialized_entity_key>, with the project appended as a suffix.

This means the IndexDefinition prefix will match HASH keys from all projects that share the same join key names. If two projects (e.g., prod and staging) both have a feature view with join_keys=["item_id"], the vector index created for prod will also index staging's keys, and vice versa.
I'm not sure what the solution would look like. Maybe use a filter expression when building the search query to scope the results to the correct project?

Collaborator Author

Yes, I added a TODO comment for now but will address this while implementing the Search part.
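
One possible shape for the filter-expression idea is a TAG filter on a project field combined with the KNN clause; a later commit message in this PR's history mentions a project tag filter with backslash escaping. The sketch below is an assumption-heavy illustration: the TAG field name `project`, the `$vec` parameter name, and the exact set of characters that need escaping are all hypothetical and should be verified against Valkey Search.

```python
import re


def escape_tag_value(value: str) -> str:
    """Backslash-escape every character that is not an ASCII letter, digit, or underscore."""
    return re.sub(r"([^A-Za-z0-9_])", r"\\\1", value)


def project_scoped_knn(project: str, vector_field: str, k: int) -> str:
    """Build a KNN query string restricted to one project's documents."""
    tag = escape_tag_value(project)
    # "project" is a hypothetical TAG field stored alongside each HASH.
    return f"(@project:{{{tag}}})=>[KNN {k} @{vector_field} $vec]"


query = project_scoped_knn("prod-eu.v2", "embedding", 10)
```

This only helps if each HASH also stores the project as an indexed TAG field, which would be part of the write path.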

Collaborator

@vanitabhagwat left a comment

I think we need to clean up the vector indexes somewhere. Maybe in delete_table and call it in teardown like we do for hashes.
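
A rough sketch of such a cleanup step is shown here, assuming the client exposes a generic `execute_command` method and that a missing index surfaces as a server error. `ResponseError` is a stand-in for the client library's exception and `drop_vector_indexes` is a hypothetical helper, not code from this PR.

```python
from typing import Iterable, List


class ResponseError(Exception):
    """Stand-in for the client library's server-error exception."""


def drop_vector_indexes(client, index_names: Iterable[str]) -> List[str]:
    """Drop each index, skipping ones that are already gone. Returns the names dropped."""
    dropped = []
    for name in index_names:
        try:
            client.execute_command("FT.DROPINDEX", name)
            dropped.append(name)
        except ResponseError:
            # Assumed: a missing index raises a server error; teardown should continue.
            continue
    return dropped
```

A stricter version would inspect the error text and re-raise anything that is not a missing-index error.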

if len(_tables) == 1:
    pipe.delete(_k)
else:
    pipe.hdel(_k, *valkey_hash_keys)
Collaborator

Since we are using the original field names for vector fields, pipe.hdel(_k, *valkey_hash_keys) won't delete the vector fields. For vector fields, the original field names need to be added to valkey_hash_keys.

Collaborator Author

Thanks, made the fix!
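
The fix described in this thread amounts to building the HDEL field list from two kinds of keys. The sketch below illustrates that under assumptions: `hashed_field_key` is a stand-in that uses `hashlib` so the example runs without mmh3, and `hdel_fields` is a hypothetical helper, not the code that was merged.

```python
import hashlib
from typing import Iterable, List, Set


def hashed_field_key(view_name: str, feature_name: str) -> bytes:
    """Stand-in for the store's mmh3-based field key (hashlib keeps this dependency-free)."""
    return hashlib.sha1(f"{view_name}:{feature_name}".encode()).digest()[:4]


def hdel_fields(
    view_name: str, feature_names: Iterable[str], vector_field_names: Set[str]
) -> List[bytes]:
    """Collect every HASH field to delete for one feature view."""
    fields = []
    for name in feature_names:
        if name in vector_field_names:
            # Vector fields are stored under their original names.
            fields.append(name.encode())
        else:
            # Non-vector fields are stored under a hashed key.
            fields.append(hashed_field_key(view_name, name))
    return fields


fields = hdel_fields("items", ["price", "embedding"], {"embedding"})
```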

def _get_vector_index_name(project: str, feature_view_name: str) -> str:
    """Generate Valkey Search index name for vector fields."""
    return f"{project}_{feature_view_name}_vidx"

Collaborator

Can we add the feature name into the index name? Looking ahead at multi-vector we are going to run into issues with just project + fv name.
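
A sketch of what this suggestion could look like is below. A later commit message in this PR's history mentions a third `feature_name` argument being passed to `_get_vector_index_name`, but the exact name format used here is an assumption, and the function is renamed `vector_index_name` to mark it as illustrative.

```python
def vector_index_name(project: str, feature_view_name: str, feature_name: str) -> str:
    """Generate a per-field index name so multiple vector fields cannot collide."""
    # Hypothetical format: the merged code may order or separate the parts differently.
    return f"{project}_{feature_view_name}_{feature_name}_vidx"


name = vector_index_name("prod", "items", "embedding")
```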

Collaborator

@piket left a comment

LGTM

@Manisha4 merged commit bd19925 into feature/vector-store on Apr 6, 2026
28 of 29 checks passed
@Manisha4 deleted the add/valkey-search-support-writes branch on April 6, 2026 23:34
vanitabhagwat pushed a commit that referenced this pull request Apr 9, 2026
* Adding support for Valkey Search, adding changes to the online_write_batch functionality

* Addressing PR comments

* addressing linting error

* fix tests

* addressing PR comments

* addressing PR comments

* fixing linting

---------

Co-authored-by: Manisha4 <Manisha4@github.com>
Manisha4 added a commit that referenced this pull request Apr 9, 2026
* feat: Valkey Online Write Batch Vector Search Support (#351)

* Adding support for Valkey Search, adding changes to the online_write_batch functionality

* Addressing PR comments

* addressing linting error

* fix tests

* addressing PR comments

* addressing PR comments

* fixing linting

---------

Co-authored-by: Manisha4 <Manisha4@github.com>

* feat: Support Vector Search in Valkey (#354)

* Adding support for Valkey Search, adding changes to the online_write_batch functionality

* Addressing PR comments

* addressing linting error

* Adding changes to support search in valkey

* fix tests

* adding unit tests

* reformatting files and adding checks and more tests

* reformatting files and adding checks and more tests

* reformatting files and adding checks and more tests

* Fix linter errors: type annotations and code formatting

- Add explicit type annotation for schema_fields to support both TagField and VectorField
- Encode project string to bytes for consistency with other hash values
- Decode doc_key bytes to string for hmget compatibility
- Fix code formatting: break long lines and remove extra blank lines
- Remove tests for multiple vector fields (Feast enforces one vector per feature view)
- Fix config type: use 'eg-valkey' (hyphen) not 'eg_valkey' (underscore)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* addressing PR comments

* addressing PR comments

* fixing linting

* Fix missing feature_name argument in retrieve_online_documents_v2

Add the third argument (vector_field.name) to _get_vector_index_name
call to match the updated function signature.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* addressing comments, PR changes for some fixes and merge conflicts

* fixing tests

* fixing tests

* fixing linting

* fixing linting

---------

Co-authored-by: Manisha4 <Manisha4@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Valkey vector search - remove unsupported SORTBY (#356)

* fix: Valkey vector search - remove unsupported SORTBY and fix tag filter syntax

Valkey Search KNN queries return results pre-sorted by distance, so
explicit SORTBY is not supported and causes a ResponseError. This removes
the .sort_by() call from the query builder.

Additionally, fixes the project tag filter to use unquoted syntax with
backslash escaping for special characters (e.g. hyphens, dots) instead
of the quoted syntax which was returning empty results.

Updates unit tests to reflect both changes: replaces three metric-specific
sort order tests with a single test asserting no SORTBY is set, and
updates escaping assertions to match the new backslash-escape approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply ruff format to eg_valkey.py and test_valkey.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Manisha4 <Manisha4@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Manisha4 <Manisha4@github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
3 participants