One concern about using DuckDB and parquet is maintaining correctness when potentially many requests per second are coming in to add new embeddings to the production data space.
The other concern is multiple users in the org querying or pulling data from the service at the same time.
Will this work? Will there be collisions?
Yes, there would be collisions. Follow-up question: would there be concurrent requests to the same dataset?
I also want to dig into the notion of "correctness". Depending on how you design the API, update the in-memory model, and flush to disk, it should be possible to avoid conflicts or corruption, but you'd be subject to data loss if the process crashed after receiving a message but before persisting it.
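To make that concrete, here's a minimal sketch of the "update the in-memory model, flush periodically" pattern: all writes are funneled through a single consumer thread via a queue, so concurrent request handlers never race on the store. Everything here (the `EmbeddingWriter` class, `flush_every`, the dict-backed store) is a hypothetical illustration, not an API from this project.

```python
import queue
import threading

class EmbeddingWriter:
    """Serialize all writes through one thread: concurrent requests
    enqueue updates; a single consumer applies them to the in-memory
    model and flushes to disk every `flush_every` updates.

    Hypothetical sketch; the flush here is a stand-in for an actual
    write to parquet/DuckDB.
    """

    def __init__(self, flush_every=100):
        self._queue = queue.Queue()
        self._store = {}  # in-memory model: id -> embedding
        self._flush_every = flush_every
        self._pending = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add(self, key, embedding):
        # Safe to call from many request handlers concurrently:
        # only the writer thread ever mutates the store.
        self._queue.put((key, embedding))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                self._flush()
                return
            key, embedding = item
            self._store[key] = embedding
            self._pending += 1
            if self._pending >= self._flush_every:
                self._flush()

    def _flush(self):
        # The data-loss window: anything received but not yet flushed
        # is lost if the process crashes before this point.
        self._pending = 0  # stand-in for persisting to disk

    def close(self):
        self._queue.put(None)
        self._thread.join()
```

The trade-off is exactly the one described above: batching flushes keeps write throughput high, but widens the window between "message accepted" and "message durable".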
I'll read up on the capabilities of DuckDB to support incremental writes, that makes a difference here too.
Agreed, this is under-defined. Let's discuss tomorrow! I'm also not at all wedded to DuckDB. In some ways it might just not be the right tool for the job? We'll see.