Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syncs should eventually skip unavailable content and progress #1693

Merged
merged 27 commits into from
Aug 12, 2021

Conversation

SidSethi
Copy link
Contributor

@SidSethi SidSethi commented Jul 28, 2021

Description

What is the purpose of this PR? What is the current behavior? New behavior? Relevant links (e.g. Trello) and/or information pertaining to PR?

Full details on Notion

Old behavior: if a secondary sync fails to retrieve any content (for any reason), it will immediately reject sync.
New behavior: a sync should eventually progress past unretrievable content and succeed.

Implementation:

  • new skipped Boolean column in Files table
  • sync ops will record every time a sync fails due to failure to retrieve/process files; once this failure count for a given user has crossed a specified threshold, the sync will mark those files as skipped in DB and succeed sync
  • new SkippedCIDRetryService to retry all content in DB marked as skipped to correct for transient unavailability. Will only process content created after specified date to prevent repeat processing of old, permanently unavailable content
  • expose skipped info in /users/clock_status/<wallet> route for later consumption

Other details:

  • configurable option saveFileForMultihashToFSIPFSFallback for sync to not check IPFS after gateway. defaulted to true for back-compat
  • change file retrieval to not error so sync always processes all files instead of immediately short-circuiting on first failure

Tests

List the manual tests and repro instructions to verify that this PR works as anticipated. Include log analysis if possible.
❗ If this change impacts clients, make sure that you have tested the clients ❗

  1. new unit tests for /users/clock_status<wallet>
  2. Maddog only tests happy path - never covers content unavailability, manual testing approach below

Manual:

  • bring up local setup with at least 5 CN (not sure why this is failing with 4)
  • on node startup, confirm db migration works
  • on node startup, confirm retryQueue starts and is processing correctly
  • run maddog base test on 1 user, confirm base cases work (content verification checks should confirm sFFM (saveFileForMultihashToFS()) still works i.e. no regression
  • force sync skipping
    • force CID unavail
      • randomly select a previously created CID for 1 user, add code in sFFM to reject save
      • override saveFileForMultihashToFSIPFSFallback to false
    • force 1 user Reconfig -> add code in snapback to mark 1 of user's secondaries as unhealthy
      • confirm sync to newly selected replica fails N times
      • confirm sync to newly selected replica eventually succeeds and marks CID as skipped
  • new clockstatus route
    • test with new param
    • test without new param
    • check user that exists
    • check user that doesn't exist
    • user that has no skips
    • user that has skips
    • check with param on primary and secondary
  • confirm retryQueue works
    • confirm queue retries skipped CID and fails
    • confirm createdAt date calc works: let it run and fetch skipped CID -> modify createdAt threshold to be after that CID -> confirm CID is not fetched -> reset threshold
    • remove code on primary to never stream CID -> ensure retryQueue eventually fetches CID + records it, also updates DB record to skipped = false
    • confirm this is reflected in users/clock_status route

❗ Reminder 💡❗:
If this PR touches a critical flow (such as Indexing, Uploads, Gateway or the Filesystem), make sure to add the requires-special-attention label. Add relevant labels as necessary.

How will this change be monitored?

For features that are critical or could fail silently please describe the monitoring/alerting being added.

TODO specify logs here

@SidSethi SidSethi changed the title Syncs should eventually skip unavailable content and progress [WIP] Syncs should eventually skip unavailable content and progress Aug 2, 2021
@SidSethi SidSethi added content-node Content Node (previously known as Creator Node) feature New features labels Aug 2, 2021
Copy link
Member

@raymondjacobson raymondjacobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Super straight forward to me

creator-node/src/routes/users.js Outdated Show resolved Hide resolved
creator-node/src/services/sync/processSync.js Outdated Show resolved Hide resolved
creator-node/src/services/sync/processSync.js Outdated Show resolved Hide resolved
creator-node/src/services/sync/processSync.js Outdated Show resolved Hide resolved
creator-node/src/services/sync/skippedCIDsRetryService.js Outdated Show resolved Hide resolved
creator-node/src/services/sync/skippedCIDsRetryService.js Outdated Show resolved Hide resolved
creator-node/src/services/sync/skippedCIDsRetryService.js Outdated Show resolved Hide resolved
@SidSethi SidSethi marked this pull request as ready for review August 3, 2021 21:06
@SidSethi SidSethi changed the title [WIP] Syncs should eventually skip unavailable content and progress Syncs should eventually skip unavailable content and progress Aug 3, 2021
Copy link
Contributor

@dmanjunath dmanjunath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One kind of structural comment, otherwise looks good

creator-node/src/services/sync/skippedCIDsRetryService.js Outdated Show resolved Hide resolved
creator-node/src/config.js Show resolved Hide resolved
creator-node/src/services/sync/processSync.js Outdated Show resolved Hide resolved
Copy link
Contributor

@dmanjunath dmanjunath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Some small NIT's but overall this looks exactly right.

creator-node/src/config.js Outdated Show resolved Hide resolved
creator-node/src/fileManager.js Outdated Show resolved Hide resolved
creator-node/src/fileManager.js Show resolved Hide resolved
creator-node/src/services/sync/processSync.js Outdated Show resolved Hide resolved
creator-node/test/nodesync.test.js Outdated Show resolved Hide resolved
@SidSethi SidSethi added the requires-special-attention This change is risky and/or critical and requires special attention in review label Aug 11, 2021
@SidSethi SidSethi merged commit 5f79a6c into master Aug 12, 2021
@SidSethi SidSethi deleted the ss-skip-unavail-content branch August 12, 2021 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
content-node Content Node (previously known as Creator Node) feature New features requires-special-attention This change is risky and/or critical and requires special attention in review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants