Destination Weaviate: update vector: connection to: OpenAI API failed with status: 500 error #31991
Comments
Thanks for opening this issue @valebedu. Am I understanding correctly that the OpenAI vectorizer within Weaviate is erroring, which in turn breaks the sync? If so, this looks like something Weaviate should ideally handle internally. Is it possible that specific text in your records is causing the error on the OpenAI side? That said, we can also add a retry mechanism on the Airbyte side to reload objects that failed in the current batch. I'm going to work on this.
Hi @flash1293, yes, that's right. In my configuration Airbyte doesn't do the vectorization; it simply syncs data from source to destination. It's Weaviate's job to vectorize content with OpenAI. A specific text can't be the cause, because in that run the job failed after 175000 records, while a previous run got up to 300000 records without any issue. It's just a random server-unavailable error. Retry logic on the current batch of 128 elements would be really nice!
Weaviate already retries on timeouts and similar errors, but not on a 500 error. I added this on the Airbyte side for all objects in a batch that didn't succeed (it will retry twice): #32038. If the error on the OpenAI side is transient, this should catch the problem.
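To illustrate the idea of re-sending only the objects in a batch that didn't succeed, here is a minimal sketch. It is not the code from the linked PR: `load_with_retries` and `send_batch` are hypothetical names, and `send_batch` is assumed to return the list of objects the destination rejected.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def load_with_retries(
    batch: List[Record],
    send_batch: Callable[[List[Record]], List[Record]],
    max_retries: int = 2,
) -> List[Record]:
    """Send `batch`, then re-send only the objects that failed.

    `send_batch` is a hypothetical callable that returns the objects the
    destination rejected. The return value is whatever still fails after
    `max_retries` additional attempts (empty list on full success).
    """
    failed = send_batch(batch)
    attempt = 0
    while failed and attempt < max_retries:
        attempt += 1
        # Only the failed objects are sent again, not the whole batch.
        failed = send_batch(failed)
    return failed
```

With a transient error, the second attempt carries just the rejected objects, so the objects that were already accepted are not vectorized again.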
Awesome! I'll ask the Weaviate team why this case isn't handled on the Weaviate side.
@valebedu I just realized I might have misunderstood the Weaviate code and simply configured the client incorrectly for automatic retries. Let me check again.
No problem. Also note that this is not a critical issue, as it works most of the time; I just need luck for the sync to finish without a 5XX error. Additionally, Weaviate released 1.22 yesterday with async vector indexing. That should allow a non-blocking and, I hope, faster sync. I'll try to sync with this version.
That sounds great, thanks. It's not enabled by default, but the Weaviate client already has built-in retries on all errors. I switched the PR linked above to enable that. Together with async vector indexing, this should help a lot.
@valebedu - Thanks very much for logging this. Even if it's not a critical bug, feedback helps us improve these connectors and is very much appreciated. 👍
@valebedu Were you able to run your sync successfully?
Hi @flash1293, unfortunately no 😞. I'm facing a new error, but I think this one is on the Weaviate client side. I asked on the Weaviate #support Slack channel to get more info on the error. FYI the error is:
I also looked at the diff between Weaviate client v3.23.2 and v3.25.2 (weaviate/weaviate-python-client@v3.23.2...v3.25.2), but I didn't find anything relevant; the logic is the same.
@valebedu Are you using async vector indexing and version |
Yes, I am. I'll try again without async indexing.
Hi @flash1293, FYI I downgraded Weaviate to 1.21.9 and the job finally ran without any issue. Thanks a lot for your help :)
That's great to hear @valebedu, keep me updated in case any other problems or feature gaps come up.
Connector Name
destination-weaviate
Connector Version
0.2.5
What step the error happened?
During the sync
Relevant information
Context
I'm syncing a table of 308k articles from Postgres to Weaviate, and the sync failed because OpenAI returned a 500 error for a single element.
update vector: connection to: OpenAI API failed with status: 500 error
The current batch of 128 elements failed and should be retried, or perhaps it would be better to stop and retry it on the next job attempt.
But all the previous lines that were correctly added to Weaviate and vectorized should not be re-added and revectorized; that incurs time and cost that can be avoided.
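The behaviour asked for here can be sketched as a simple offset checkpoint. This is only an illustration under stated assumptions, not how Airbyte tracks sync state: `sync_from_checkpoint` and `send_batch` are hypothetical names, and a failed batch is assumed to surface as a `RuntimeError`.

```python
from typing import Callable, List, Sequence


def sync_from_checkpoint(
    records: Sequence[dict],
    send_batch: Callable[[List[dict]], None],
    committed: int = 0,
    size: int = 128,
) -> int:
    """Send records in batches, starting at offset `committed`.

    Returns the offset of the first record that was NOT written, so the
    next attempt can resume there instead of re-sending (and
    re-vectorizing) earlier records. `send_batch` is a hypothetical
    callable that raises RuntimeError when the batch fails.
    """
    for start in range(committed, len(records), size):
        chunk = list(records[start:start + size])
        try:
            send_batch(chunk)
        except RuntimeError:
            # Stop here; everything before `start` is already stored.
            return start
    return len(records)
```

A second attempt would pass the returned offset back in as `committed`, so only the failed batch and the ones after it are sent.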
Obtained Result
Lines correctly added are revectorized
Expected Result
Lines correctly added are marked as added and are not revectorized
Step To Reproduce
Unfortunately, to reproduce it you need to sync a large amount of data (really small) to Weaviate in order to generate a lot of vectors, and be unlucky enough to get a 500 error
Or you can force the error with a proxy
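One way to force the error without waiting for bad luck is a local endpoint that injects failures. The sketch below is a simplification: rather than a full forwarding proxy, it is a stub server that answers every Nth POST with HTTP 500. `make_flaky_server` and `fail_every` are hypothetical names, and whether the Weaviate vectorizer can be pointed at such an endpoint is an assumption not confirmed in this thread.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_flaky_server(fail_every: int, port: int = 0) -> ThreadingHTTPServer:
    """Build a server that answers every `fail_every`-th POST with HTTP 500."""
    counter = {"n": 0}
    lock = threading.Lock()

    class FlakyHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Drain the request body before replying.
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)
            with lock:
                counter["n"] += 1
                fail = counter["n"] % fail_every == 0
            body = b'{"error": "injected failure"}' if fail else b'{"ok": true}'
            self.send_response(500 if fail else 200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    # Port 0 lets the OS pick a free port; read it from server_address.
    return ThreadingHTTPServer(("127.0.0.1", port), FlakyHandler)
```

Calling `serve_forever()` on the returned server (typically in a background thread) starts it; a real proxy would forward the non-failing requests upstream instead of returning a canned body.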
Config
JOB_MAIN_CONTAINER_MEMORY_LIMIT=4096Mi
GOMEMLIMIT=2048MiB
Chunk size: 7372
Text fields to embed: publisher, publication_date, language_code, title, body (to build a custom field with secured token length)
Text splitter: By separator (with Keep separator enabled)
Embedding: No external embedding (to let Weaviate do the vectorization job)
Batch size: 128 (default)
Relevant log output