Commit ee1c26f
Do not save sparse vectors for OSNeuralSparseDocV3GTE
Why these changes are being introduced:
Our initial pass with the embedding class OSNeuralSparseDocV3GTE was to save both
the sparse vector and the decoded token:weights. Each sparse vector was the length of the
model vocabulary, about 30k, with mostly zeros. While technically this could be used for
analysis beyond just the decoded token:weights given to OpenSearch, the data transfer
and storage overhead outweighs any known use case at the moment.
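To make the overhead concrete, here is a minimal sketch, not the project's code, comparing a full vocabulary-length sparse vector with its decoded token:weights form. The vocabulary size of 30,000, the `id_to_token` mapping, and the `decode_sparse_vector` helper are all assumptions made for illustration.

```python
import json

VOCAB_SIZE = 30_000  # assumed round figure; the commit message says "about 30k"


def decode_sparse_vector(sparse_vector, id_to_token):
    """Keep only the non-zero entries, keyed by their token string."""
    return {id_to_token[i]: w for i, w in enumerate(sparse_vector) if w != 0.0}


# Hypothetical vocabulary: token ids mapped to placeholder token strings.
id_to_token = {i: f"tok{i}" for i in range(VOCAB_SIZE)}

# Hypothetical sparse vector with only three non-zero weights.
sparse_vector = [0.0] * VOCAB_SIZE
sparse_vector[10] = 1.5
sparse_vector[200] = 0.25
sparse_vector[29_999] = 0.75

token_weights = decode_sparse_vector(sparse_vector, id_to_token)

# Compare the serialized sizes of the two representations.
full_bytes = len(json.dumps(sparse_vector))
decoded_bytes = len(json.dumps(token_weights))
print(f"full vector: {full_bytes} bytes, token:weights: {decoded_bytes} bytes")
```

In this toy case the decoded mapping carries the same non-zero information in a small fraction of the serialized size, which is the trade-off the commit message describes.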
How this addresses that need:
The OSNeuralSparseDocV3GTE embedding model is updated to not include the sparse vector
for the Embedding.embedding_vector property on output.
This can easily be turned back on later; an inline code comment shows how to
re-enable it.
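One way such a toggle could look is sketched below. This is an assumption for illustration only: the commit itself describes an inline code comment rather than a flag, and the simplified `Embedding` dataclass, the `embedding_token_weights` field, the `build_embedding` helper, and the `include_sparse_vector` parameter are hypothetical names, not the project's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Embedding:
    """Simplified stand-in for the project's Embedding class (fields assumed)."""

    embedding_vector: Optional[list[float]]
    embedding_token_weights: dict[str, float]


def build_embedding(sparse_vector, token_weights, include_sparse_vector=False):
    """Build an Embedding, omitting the full sparse vector unless requested."""
    return Embedding(
        # Hypothetical toggle: pass include_sparse_vector=True to store the
        # vocabulary-length vector again.
        embedding_vector=list(sparse_vector) if include_sparse_vector else None,
        embedding_token_weights=token_weights,
    )


default_embedding = build_embedding([0.0, 1.5], {"tok1": 1.5})
full_embedding = build_embedding([0.0, 1.5], {"tok1": 1.5}, include_sparse_vector=True)
```

With the default, only the token:weights mapping is kept and `embedding_vector` is `None`; opting in restores the full vector without changing the rest of the output.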
Side effects of this change:
* No sparse vectors are stored for now, so storage is decreased.
Relevant ticket(s):
* None

1 parent ed45be0 commit ee1c26f
File tree
3 files changed, +10 -7 lines changed
- embeddings
- models
- tests