Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unmaterialized S3 sources with SQS notifications will lose data if multiple views depend on them #7423

Closed
quodlibetor opened this issue Jul 14, 2021 · 2 comments
Assignees
Labels
C-bug Category: something is broken T-correctness Theme: relates to consistency and correctness of results.

Comments

@quodlibetor
Copy link
Contributor

Since there is no way to specify different SQS queues for different
source instances, there is no way to ensure that all objects are
discovered as new source instances come up.

Mentioned in #7414, there are a few things we can do:

  1. Probably fastest, not great: refuse to create unmaterialized S3 sources that have SQS notifications enabled
  2. Middle of the road: create a way to share S3 objects discovered via SQS notifications to all instances of a given source. While there are multiple possible implementations, this should be doable by adding a map of OID -> MPSC queue that is shared by all source instances, and sending new object discovered messages on each of those queues in addition to the existing queue that the SQS discovery task sends on.
  3. Most complex: fully implement s3 sources: Support specifying a catalog of objects to download/ingest #6605

IMO the second option should work pretty well and be easy to implement.

@quodlibetor quodlibetor added the C-bug Category: something is broken label Jul 14, 2021
@elindsey elindsey added this to Icebox in Storage (Old) via automation Jul 14, 2021
@elindsey elindsey added this to the 1.0 milestone Jul 14, 2021
@quodlibetor quodlibetor moved this from Icebox to In Progress in Storage (Old) Jul 15, 2021
@quodlibetor
Copy link
Contributor Author

Update after speaking with @frankmcsherry: option 2 is doable by stashing an Rc<..> on our Worker or RenderState struct and ensuring that all instances of the same source render to the same worker.

This is actually another instance where things would compose more naturally if SQS notifications were their own potentially-hidden dataflow: we wouldn't need to do any worker-pinning in that case.

@philip-stoev philip-stoev added the T-correctness Theme: relates to consistency and correctness of results. label Jul 20, 2021
@quodlibetor quodlibetor self-assigned this Jul 21, 2021
@quodlibetor quodlibetor moved this from In Progress to To Do in Storage (Old) Jul 28, 2021
@umanwizard umanwizard added the P1 High priority; in the case of a bug, should be fixed within the current release cycle label Jul 29, 2021
@nmeagan11 nmeagan11 removed this from the 1.0 milestone Mar 22, 2022
@nmeagan11 nmeagan11 removed the P1 High priority; in the case of a bug, should be fixed within the current release cycle label Mar 22, 2022
@nmeagan11 nmeagan11 moved this from To Do to Icebox in Storage (Old) Mar 31, 2022
@nmeagan11 nmeagan11 removed this from Icebox in Storage (Old) Aug 11, 2022
@benesch
Copy link
Member

benesch commented Jan 28, 2024

Closing as stale, as we no longer support S3 sources.

@benesch benesch closed this as not planned Won't fix, can't repro, duplicate, stale Jan 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: something is broken T-correctness Theme: relates to consistency and correctness of results.
Projects
None yet
Development

No branches or pull requests

6 participants