
Failure on data ingestion into "qdrant" using "text-embedding-ada-002" embedding #354

Open
avibathula opened this issue May 28, 2024 · 1 comment


avibathula commented May 28, 2024

Describe the bug
Data ingestion into Qdrant fails when using the text-embedding-ada-002 embedding model, with the following error:

BadRequestError: Error code: 400 - {'error': {'message': 'This model does not support specifying dimensions.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

The issue appears to be in the OpenAIEmbeddingProvider.get_embedding method in r2r/embeddings/openai/openai_base.py, which always passes the dimensions parameter. According to the OpenAI API reference (https://platform.openai.com/docs/api-reference/embeddings/create), however:

dimensions (integer, optional)
The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

So for the text-embedding-ada-002 model, the code should not send the dimensions parameter at all.
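A minimal sketch of the conditional logic, assuming the provider builds a keyword-argument dict before calling the OpenAI client. The helper names `supports_dimensions` and `build_embedding_kwargs` are hypothetical and do not come from the r2r codebase; the prefix check simply encodes the documented rule that only text-embedding-3 and later models accept `dimensions`.

```python
from typing import Any, Optional


def supports_dimensions(model: str) -> bool:
    """Return True if the model accepts the `dimensions` parameter.

    Hypothetical helper: per the OpenAI API reference, only
    text-embedding-3 and later models support `dimensions`, so this
    only checks for the text-embedding-3 prefix. Any later model
    families would need to be added here.
    """
    return model.startswith("text-embedding-3")


def build_embedding_kwargs(
    texts: list[str], model: str, dimensions: Optional[int] = None
) -> dict[str, Any]:
    """Build keyword arguments for an embeddings request (hypothetical helper)."""
    kwargs: dict[str, Any] = {"input": texts, "model": model}
    # Only forward `dimensions` when the model accepts it; sending it for
    # text-embedding-ada-002 triggers the 400 error reported in this issue.
    if dimensions is not None and supports_dimensions(model):
        kwargs["dimensions"] = dimensions
    return kwargs
```

With the OpenAI Python client (v1.x), the result could then be passed as `client.embeddings.create(**kwargs)`; whether r2r's provider is structured this way is an assumption.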

To Reproduce
Steps to reproduce the behavior:
Use the following config:

{
"app": {
"max_logs": 100,
"max_file_size_in_mb": 50
},
"completions": {
"provider": "openai"
},
"embedding": {
"provider": "openai",
"search_model": "text-embedding-ada-002",
"search_dimension": 1536,
"batch_size": 128,
"text_splitter": {
"type": "recursive_character",
"chunk_size": 512,
"chunk_overlap": 20
},
"rerank_model": "None"
},
"eval": {
"provider": "local",
"llm": {
"model": "gpt-4o",
"provider": "openai"
},
"sampling_fraction": 1.0
},
"ingestion": {
"selected_parsers": {
"csv": "default",
"docx": "default",
"html": "default",
"json": "default",
"md": "default",
"pdf": "default",
"pptx": "default",
"txt": "default",
"xlsx": "default",
"gif": "default",
"png": "default",
"jpg": "default",
"jpeg": "default",
"svg": "default"
}
},
"logging": {
"provider": "local",
"log_table": "logs",
"log_info_table": "log_info"
},
"prompt": {
"provider": "local"
},
"vector_database": {
"provider": "qdrant",
"collection_name": "blahblahblah"
}
}

Then ingest any data files.
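Because text-embedding-ada-002 always returns 1536-dimensional vectors, the search_dimension of 1536 in this config already matches the model's native size; the failure comes from the parameter being sent, not from a mismatched value. A small pre-flight check along these lines could flag the combination before ingestion. The function name and the `FIXED_SIZE_MODELS` table are assumptions for illustration, not part of r2r.

```python
import json

# Assumption: native output sizes for models that reject `dimensions`.
# text-embedding-ada-002 always returns 1536-dimensional vectors.
FIXED_SIZE_MODELS = {"text-embedding-ada-002": 1536}


def check_embedding_config(config: dict) -> list[str]:
    """Return problems found in an r2r-style config dict (hypothetical check)."""
    problems: list[str] = []
    embedding = config.get("embedding", {})
    model = embedding.get("search_model")
    dimension = embedding.get("search_dimension")
    if model in FIXED_SIZE_MODELS:
        native = FIXED_SIZE_MODELS[model]
        # Flag a search_dimension that cannot match the model's fixed output size.
        if dimension is not None and dimension != native:
            problems.append(
                f"{model} always returns {native} dimensions, "
                f"but search_dimension is {dimension}"
            )
        # Regardless of the value, the parameter itself must not be sent.
        problems.append(
            f"{model} does not accept the `dimensions` parameter; "
            "it must not be forwarded to the API"
        )
    return problems


if __name__ == "__main__":
    # Only the relevant part of the config from this issue.
    sample = json.loads(
        '{"embedding": {"provider": "openai", '
        '"search_model": "text-embedding-ada-002", "search_dimension": 1536}}'
    )
    for message in check_embedding_config(sample):
        print("warning:", message)
```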

Expected behavior
The data files are embedded and uploaded to Qdrant.

Additional context
I installed the r2r package, programmatically provided a list of files, and called r2r.aingest_files, which triggered the error.

@emrgnt-cmplxty
Contributor

Thank you. I will investigate this issue and report back with our findings.
