feat: added voyager in backend #1846

Draft
wants to merge 4 commits into
base: main
263 changes: 263 additions & 0 deletions docarray/index/backends/voyager.py
@@ -0,0 +1,263 @@
from dataclasses import dataclass, field
from typing import Any, Dict, Generic, List, Sequence, Tuple, Type, TypeVar, cast


import numpy as np

# Assumed fix: `BaseDoc`, `DocList` and `BaseDocIndex` come from docarray, not from the
# `voyager` package. `VoyagerBaseDoc` is not defined in this file, so the schema type
# is bound to `BaseDoc` instead.
from docarray import BaseDoc, DocList
from docarray.index.abstract import BaseDocIndex
from docarray.utils.find import _FindResult, _FindResultBatched

TSchema = TypeVar('TSchema', bound=BaseDoc)
T = TypeVar('T', bound='VoyagerIndex')



@dataclass
class DBConfig(BaseDocIndex.DBConfig):
    default_column_config: Dict[Type, Dict[str, Any]] = field(default_factory=dict)


@dataclass
class RuntimeConfig(BaseDocIndex.RuntimeConfig):
    pass


class VoyagerIndex(BaseDocIndex, Generic[TSchema]):
    def __init__(self, db_config=None, **kwargs):
        super().__init__(db_config=db_config, **kwargs)

        if not self._db_config or not self._db_config.existing_table:
            self._create_docs_table()

        self._setup_backend()

    def _create_docs_table(self):
        columns: List[Tuple[str, str]] = []
        for col, info in self._column_infos.items():
            if (
                col == 'id'
                or '__' in col
                or not info.db_type
                or info.db_type == np.ndarray
            ):
                continue
            columns.append((col, info.db_type))

        columns_str = ', '.join(f'{name} {type}' for name, type in columns)
        if columns_str:
            columns_str = ', ' + columns_str

        query = f'CREATE TABLE IF NOT EXISTS docs (doc_id INTEGER PRIMARY KEY, data BLOB{columns_str})'
Member

Does this mean that the data column will not be filterable at all?

Author

The query creates a table named docs with a data column of type BLOB (Binary Large Object). BLOB is meant for opaque binary data, so the serialized document stored in that column cannot be filtered on field by field.

If the intention is to make the data column filterable, we might want to consider a different data type, depending on the nature of the data being stored. For example, if the data column contains text-based information, changing the data type to TEXT could be more appropriate.

Here's an example modification to the query:

if columns_str:
    columns_str = ', ' + columns_str

query = f'CREATE TABLE IF NOT EXISTS docs (doc_id INTEGER PRIMARY KEY, data TEXT{columns_str})'

Member

This is what is done in HNSWDocumentIndex, I believe.

Author

Yes, something similar.

Member

It would indeed be good to have it.
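
To make the point of this thread concrete, here is a minimal sketch of a docs table that keeps the serialized document in the BLOB column and filters on the extra typed columns appended through columns_str. The field names (title, price), the sample rows, and the in-memory database are assumptions made for illustration; in the PR the column list would come from self._column_infos.

```python
import sqlite3

# Hypothetical schema fields and their SQLite types (assumption, not from the PR).
columns = [("title", "TEXT"), ("price", "REAL")]

columns_str = ", ".join(f"{name} {type_}" for name, type_ in columns)
if columns_str:
    columns_str = ", " + columns_str

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE IF NOT EXISTS docs "
    f"(doc_id INTEGER PRIMARY KEY, data BLOB{columns_str})"
)

# `data` holds the serialized document; the extra columns hold plain values.
rows = [
    (1, b"serialized-doc-1", "cheap book", 5.0),
    (2, b"serialized-doc-2", "pricey book", 50.0),
    (3, b"serialized-doc-3", "cheap pen", 1.5),
]
cur.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", rows)

# Filtering runs against the typed columns, never against the BLOB itself.
cur.execute("SELECT doc_id FROM docs WHERE price < ? ORDER BY doc_id", (10.0,))
cheap_ids = [row[0] for row in cur.fetchall()]
print(cheap_ids)  # [1, 3]
conn.close()
```

With this layout the data type of the data column does not matter for filtering; what matters is that every filterable field gets its own column.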

        self._sqlite_cursor.execute(query)

    def _index(self, column_to_data: Dict[str, Any]):
        # Implement the indexing logic here
        # Example: Assume a simple case where you have a database table and you want to insert a new row
        self._insert_row_into_database(column_to_data)

    def _filter_by_parent_id(self, parent_id: str):
        # Implement the filter logic here
        # Example: Assume a simple case where you want to query rows in the database based on parent_id
        return self._query_rows_from_database_by_parent_id(parent_id)

    @property
    def index_name(self):
        return self._db_config.work_dir

    def _insert_row_into_database(self, column_to_data: Dict[str, Any]):
        # Placeholder logic: Insert a new row into the database
        # Adapt this according to your actual database backend
        print("Inserting row into the database:", column_to_data)

    def _query_rows_from_database_by_parent_id(self, parent_id: str):
        # Placeholder logic: Query rows from the database based on parent_id
        # Adapt this according to your actual database backend
        print("Querying rows from the database by parent_id:", parent_id)
        return []

    def add_documents(self, documents: DocList):
        vectors = [self.get_vector(doc) for doc in documents]
        self.add_items(vectors)
        self._num_docs += len(documents)

    def build(self):
        self.build_index()

    def build_query(self, query: Dict):
        return VoyagerQueryBuilder(self, query)

    def execute_query(self, query: List[Tuple[str, Dict]], *args, **kwargs) -> Any:
        if args or kwargs:
            raise ValueError(
                f'args and kwargs not supported for `execute_query` on {type(self)}'
            )

        if isinstance(query, list):
            return self._execute_voyager_native_query(query)

        return self._execute_voyager_query_builder(query)

    def _find_batched(
        self,
        queries: np.ndarray,
        limit: int,
        search_field: str = '',
    ) -> '_FindResultBatched':
        ids, distances = self._query_voyager(
            queries, k=limit, search_field=search_field
        )
        documents = [self.get_item(id_) for id_ in ids]
        distances_np = np.array(distances)

        return _FindResultBatched(documents, distances_np.tolist())

    def _find(
        self, query: np.ndarray, limit: int, search_field: str = ''
    ) -> '_FindResult':
        query_batched = np.expand_dims(query, axis=0)
        docs, scores = self._find_batched(
            queries=query_batched, limit=limit, search_field=search_field
        )
        return _FindResult(
            documents=docs[0], scores=NdArray._docarray_from_native(scores[0])
        )

    def _query_voyager(
        self,
        queries: np.ndarray,
        k: int,
        search_field: str = '',
    ) -> Tuple[List[str], List[float]]:
        result = self.query(queries, k=k, search_field=search_field)

        # Extracting ids and distances from the result
        ids = [doc['id'] for doc in result]
        distances = [doc['distance'] for doc in result]

        return ids, distances
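
The _find method above handles a single query by wrapping it into a batch of one with np.expand_dims, calling the batched path, and unwrapping the first result. The sketch below shows that pattern in isolation. The brute-force Euclidean search and the function names find and find_batched are stand-ins chosen for illustration; they are not the voyager backend and not part of the PR.

```python
import numpy as np


def find_batched(index_vectors: np.ndarray, queries: np.ndarray, limit: int):
    """Brute-force stand-in for the backend: returns (ids, distances) per query."""
    # Pairwise Euclidean distances, shape (num_queries, num_vectors).
    diffs = queries[:, None, :] - index_vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Indices of the `limit` closest vectors for every query row.
    ids = np.argsort(dists, axis=1)[:, :limit]
    return ids, np.take_along_axis(dists, ids, axis=1)


def find(index_vectors: np.ndarray, query: np.ndarray, limit: int):
    """Single-query search expressed through the batched path."""
    query_batched = np.expand_dims(query, axis=0)  # (dim,) -> (1, dim)
    ids, dists = find_batched(index_vectors, query_batched, limit)
    return ids[0], dists[0]  # unwrap the batch of one


vectors = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
ids, dists = find(vectors, np.array([0.9, 0.0]), limit=2)
print(ids.tolist())  # [1, 0]
```

Keeping the single-query method as a thin wrapper means the ranking logic lives in one place, which is why the unwrapping step has to index the first element of both the ids and the distances.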



class VoyagerQueryBuilder(BaseDocIndex.QueryBuilder):
    def __init__(self, document_index, query):
        super().__init__(document_index)
        self.query = query

    def _find_batched(
        self,
        queries: np.ndarray,
        limit: int,
        search_field: str = '',
    ) -> _FindResultBatched:
        ids, distances = self._query_voyager(
            queries, k=limit, search_field=search_field
        )

        documents = [self.get_item(id_) for id_ in ids]

        # Explicitly specify the type of distances to List[float]
        distances_list = distances.tolist()  # Assuming distances is a numpy array

        return _FindResultBatched(documents, distances_list)

    def _find(
        self, query: np.ndarray, limit: int, search_field: str = ''
    ) -> _FindResult:
        query_batched = np.expand_dims(query, axis=0)
        batched_result = self._find_batched(
            queries=query_batched, limit=limit, search_field=search_field
        )

        # Assuming scores are available in batched_result
        scores = batched_result.scores

        # `_FindResult` is the module-level import, not an attribute of `self`
        return _FindResult(documents=batched_result.documents, scores=scores)

    def _filter(
        self,
        filter_query: Any,
        limit: int,
    ) -> DocList:
        result = self.execute_query(filter_query)

        ids = [doc['id'] for doc in result]
        embeddings = [doc['embedding'] for doc in result]

        docs = DocList.__class_getitem__(cast(Type[BaseDoc], self.out_schema))()
        for id_, embedding in zip(ids, embeddings):
            doc = self._doc_from_bytes(embedding)  # You need to implement this method
            doc.id = id_
            docs.append(doc)

        return docs

    def _filter_batched(
        self,
        filter_queries: Any,
        limit: int,
    ) -> List[DocList]:
        # You can implement batched filtering logic here
        # For example, execute each filter query separately and combine the results
        raise NotImplementedError(
            f'{type(self)} does not support filter-only batched queries.'
            f' To perform post-filtering on a query, use'
            f' `build_query()` and `execute_query()`.'
        )

    def _text_search(
        self,
        query: str,
        limit: int,
        search_field: str = '',
    ) -> _FindResult:
        result = self.execute_query({'text_search': query, 'limit': limit})

        ids = [doc['id'] for doc in result]
        embeddings = [doc['embedding'] for doc in result]

        docs = DocList.__class_getitem__(cast(Type[BaseDoc], self.out_schema))()
        for id_, embedding in zip(ids, embeddings):
            doc = self._doc_from_bytes(embedding)  # You need to implement this method
            doc.id = id_
            docs.append(doc)

        return _FindResult(
            documents=docs,
            scores=[1.0] * len(docs),  # You may adjust the scores as needed
        )

    def _text_search_batched(
        self,
        queries: Sequence[str],
        limit: int,
        search_field: str = '',
    ) -> _FindResultBatched:
        # You can implement batched text search logic here
        # For example, execute each text search query separately and combine the results
        raise NotImplementedError(
            f'{type(self)} does not support text search batched queries.'
        )
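
The placeholder comments in _filter_batched and _text_search_batched suggest executing each query separately and combining the results. A minimal sketch of that approach follows. The Doc alias, the predicate-style queries, and the helper names filter_docs and filter_batched are hypothetical and only illustrate the loop; the PR does not define what a filter query looks like.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]  # hypothetical stand-in for a document


def filter_docs(docs: List[Doc], predicate: Callable[[Doc], bool], limit: int) -> List[Doc]:
    """Single filter query: keep at most `limit` documents matching `predicate`."""
    return [doc for doc in docs if predicate(doc)][:limit]


def filter_batched(
    docs: List[Doc], predicates: List[Callable[[Doc], bool]], limit: int
) -> List[List[Doc]]:
    """Batched filter: run each query on its own and return one result list per query."""
    return [filter_docs(docs, predicate, limit) for predicate in predicates]


docs = [{"id": "a", "price": 5}, {"id": "b", "price": 50}, {"id": "c", "price": 1}]
results = filter_batched(
    docs,
    [lambda d: d["price"] < 10, lambda d: d["price"] >= 10],
    limit=1,
)
print([[d["id"] for d in batch] for batch in results])  # [['a'], ['b']]
```

The result keeps one list per input query, in input order, which matches the List[DocList] return annotation on _filter_batched.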


class NdArray:
darshi1337 marked this conversation as resolved.
    @staticmethod
    def _docarray_from_native(data):
        """
        Convert a NumPy array to a document array.

        :param data: NumPy array
        :return: Document array
        """
        # Placeholder logic: Implement the actual conversion logic based on your requirements
        # For example, you can create a list of dictionaries where each dictionary represents a document
        # and contains key-value pairs corresponding to the document's fields and values.

        doc_array = []
        for row in data:
            # Assuming row is a NumPy array representing a document
            # Modify this based on the structure of your data
            doc = {
                'field1': row[0],
                'field2': row[1],
                # Add more fields as needed
            }
            doc_array.append(doc)

        return doc_array
