One concern about using DuckDB and parquet is maintaining correctness when potentially many requests per second are coming in to add new embeddings to the production data space.
The other concern is multiple users in the org querying or pulling data from the service at the same time.
Will this work? Will there be collisions?
Yes, there would be collisions. Follow-up question: would there be concurrent requests to the same dataset?
I also want to dig into the notion of "correctness". Depending on how you design the API, update the in-memory model, and flush to disk, it should be possible to avoid conflicts or corruption, but you'd be subject to data loss if the process crashed after receiving a message but before persisting it.
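To make that concrete, here's a minimal sketch of the "update the in-memory model, flush periodically" pattern: all writes are funneled through a single consumer thread via a queue, so concurrent request handlers never race on the store. Everything here (the `EmbeddingWriter` class, `flush_every`, the dict-backed store) is a hypothetical illustration, not an API from this project.

```python
import queue
import threading

class EmbeddingWriter:
    """Serialize all writes through one thread: concurrent requests
    enqueue updates; a single consumer applies them to the in-memory
    model and flushes to disk every `flush_every` updates.

    Hypothetical sketch; the flush here is a stand-in for an actual
    write to parquet/DuckDB.
    """

    def __init__(self, flush_every=100):
        self._queue = queue.Queue()
        self._store = {}  # in-memory model: id -> embedding
        self._flush_every = flush_every
        self._pending = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add(self, key, embedding):
        # Safe to call from many request handlers concurrently:
        # only the writer thread ever mutates the store.
        self._queue.put((key, embedding))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                self._flush()
                return
            key, embedding = item
            self._store[key] = embedding
            self._pending += 1
            if self._pending >= self._flush_every:
                self._flush()

    def _flush(self):
        # The data-loss window: anything received but not yet flushed
        # is lost if the process crashes before this point.
        self._pending = 0  # stand-in for persisting to disk

    def close(self):
        self._queue.put(None)
        self._thread.join()
```

The trade-off is exactly the one described above: batching flushes keeps write throughput high, but widens the window between "message accepted" and "message durable".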
I'll read up on the capabilities of DuckDB to support incremental writes, that makes a difference here too.
Agreed, this is under-defined. Let's discuss tomorrow! I'm also not at all wedded to DuckDB. In some ways it might just not be the right tool for the job? We'll see.