I initially tried "text-embedding-ada-002", which produces 1536-dimensional embeddings, and it failed for a different reason: see #354
I then switched the embedding model to "text-embedding-3-small" with an embedding size of 512 but kept using the same collection name. Because the collection had been created for 1536-dimensional embeddings, ingestion failed.
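One way to avoid reusing a collection across dimension changes is to derive the collection name from the embedding settings. The following is only a minimal sketch: `collection_name_for` is a hypothetical helper and not part of r2r or qdrant, and the dictionary keys simply mirror the "embedding" and "vector_database" sections of the config used in this issue.

```python
def collection_name_for(base: str, model: str, dimension: int) -> str:
    """Build a collection name that changes whenever the model or dimension changes.

    Hypothetical helper for illustration; not an r2r or qdrant API.
    """
    # Replace anything that is not a letter or digit so the name stays simple.
    safe_model = "".join(c if c.isalnum() else "_" for c in model)
    return f"{base}_{safe_model}_{dimension}"


# Values taken from the config in this issue.
config = {
    "embedding": {
        "search_model": "text-embedding-3-small",
        "search_dimension": 512,
    },
    "vector_database": {"provider": "qdrant", "collection_name": "blahblahblah"},
}

name = collection_name_for(
    config["vector_database"]["collection_name"],
    config["embedding"]["search_model"],
    config["embedding"]["search_dimension"],
)
print(name)  # blahblahblah_text_embedding_3_small_512
```

With a name like this, switching from 1536 to 512 dimensions would target a fresh collection instead of the one created earlier.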
Describe the bug
Data ingestion into "qdrant" fails when using the "text-embedding-3-small" embedding with a dimension size of 512, which I took from the sample in the docs: https://r2r-docs.sciphi.ai/deep-dive/config
Raw response content:
b'{"status":{"error":"Wrong input: Vector dimension error: expected dim: 1536, got 512"},"time":0.00937252}' - 2024-05-27 20:37:35,296
r2r.pipes.vector_storage_pipe - ERROR - Failed to store vector entries in the database: Unexpected Response: 400 (Bad Request)
Raw response content:
b'{"status":{"error":"Wrong input: Vector dimension error: expected dim: 1536, got 512"},"time":0.008870695}' - 2024-05-27 20:37:35,678
r2r.pipes.vector_storage_pipe - ERROR - Failed to store vector entries in the database: Unexpected Response: 400 (Bad Request)
The same errors keep getting printed to the console repeatedly.
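To make the mismatch easier to spot in a long log, the raw response body can be parsed for the two sizes. This is a rough sketch using only the Python standard library; `parse_dimension_error` is a hypothetical helper, and the regular expression assumes the message keeps the "expected dim: N, got M" wording seen in the log above.

```python
import json
import re

# Raw body copied from the log output above.
RAW = b'{"status":{"error":"Wrong input: Vector dimension error: expected dim: 1536, got 512"},"time":0.00937252}'

# Assumes the wording of the error message shown in the log.
_DIM_ERROR = re.compile(r"expected dim: (\d+), got (\d+)")


def parse_dimension_error(raw: bytes):
    """Return (expected, got) if the body is a vector dimension error, else None."""
    try:
        body = json.loads(raw)
    except ValueError:
        # Not valid JSON (or not decodable), so nothing to extract.
        return None
    status = body.get("status") if isinstance(body, dict) else None
    message = status.get("error", "") if isinstance(status, dict) else ""
    match = _DIM_ERROR.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_dimension_error(RAW))  # (1536, 512)
```

Here the first number is the size the existing collection expects and the second is the size of the vectors being uploaded.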
To Reproduce
Use the following config:
{
  "app": {
    "max_logs": 100,
    "max_file_size_in_mb": 50
  },
  "completions": {
    "provider": "openai"
  },
  "embedding": {
    "provider": "openai",
    "search_model": "text-embedding-3-small",
    "search_dimension": 512,
    "batch_size": 128,
    "text_splitter": {
      "type": "recursive_character",
      "chunk_size": 512,
      "chunk_overlap": 20
    },
    "rerank_model": "None"
  },
  "eval": {
    "provider": "local",
    "llm": {
      "model": "gpt-4o",
      "provider": "openai"
    },
    "sampling_fraction": 1.0
  },
  "ingestion": {
    "selected_parsers": {
      "csv": "default",
      "docx": "default",
      "html": "default",
      "json": "default",
      "md": "default",
      "pdf": "default",
      "pptx": "default",
      "txt": "default",
      "xlsx": "default",
      "gif": "default",
      "png": "default",
      "jpg": "default",
      "jpeg": "default",
      "svg": "default"
    }
  },
  "logging": {
    "provider": "local",
    "log_table": "logs",
    "log_info_table": "log_info"
  },
  "prompt": {
    "provider": "local"
  },
  "vector_database": {
    "provider": "qdrant",
    "collection_name": "blahblahblah"
  }
}
I even tried removing the whole "embedding" dictionary, but I got the same errors because the values I was using above are the defaults.
Expected behavior
Data files are vectorized and uploaded to qdrant.
Additional context
I installed the r2r package, programmatically provided a list of files, and called r2r.aingest_files to hit the issue.