Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CON-510] Migrate legacy storage paths in background #4363

Merged
merged 17 commits into from
Nov 20, 2022

Conversation

theoilie
Copy link
Contributor

@theoilie theoilie commented Nov 15, 2022

Description

  • Adds a task that runs in the background to search for rows in the Files table with legacy storagePaths, copy (or move) the files to an updated path, and update the row to reflect the new path. Gated by env var migrateFilesWithLegacyStoragePath
  • Makes background tasks (this new legacy storagePath migration + disk deletion) run when cluster is disabled (they previously only ran when cluster was enabled)
  • Changes image dir validation to be storagePath-agnostic and make sure it's safe to change type=image rows and their corresponding type=dir row non atomically

Database impact

  • I paginated using fast cursor (>) pagination instead of LIMIT+OFFSET to avoid sequential scans
  • Finding all legacy files without batching (for manual inspection):
    • CN5 (1.7M rows in 280s): SELECT * FROM "Files" WHERE "storagePath" NOT LIKE '/file_storage/files/%' AND "storagePath" LIKE '/file_storage/%';
    • CN2 (9.3M rows in 158s): SELECT COUNT(*) FROM "Files" WHERE "storagePath" NOT LIKE '/file_storage/files/%' AND "storagePath" LIKE '/file_storage/%';
  • Finding all legacy files on CN2 with batching: these queries take milliseconds. Example: SELECT "storagePath" FROM "Files" WHERE "multihash" > 'Qma' AND "type" != 'dir' AND "storagePath" NOT LIKE '/file_storage/files/%' AND "storagePath" LIKE '/file_storage/%' LIMIT 100;

File storage impact

  • Files at legacy paths are deleted after they're successfully copied to a new path and their db row is updated. If we left the files at legacy paths then disk space usage would increase substantially. An upper bound estimate of per-node disk space increase if we didn't delete legacy files would be:
    • avg track+image file size * # of tracks and files on CN2
    • # of tracks and files on CN2 is ~9M according to this query: SELECT COUNT(*) FROM "Files" WHERE "type" != 'dir' AND "storagePath" NOT LIKE '/file_storage/files/%' AND "storagePath" LIKE '/file_storage/%';

Testing Legacy Path Migration

  1. Set migrateFilesWithLegacyStoragePath to false in config.js
  2. A seed clear; A seed create-user; A seed upload-track; A seed upload-image
  3. In db: UPDATE "Files" SET "storagePath" = CONCAT('/file_storage', SUBSTRING("storagePath" FROM 20)) WHERE true;
  4. docker exec -it cn3_creator-node_1 /bin/sh and move each file manually: cd /file_storage/files; ls then for each dir: mv <dir>/<CID> /file_storage
  5. Set migrateFilesWithLegacyStoragePath to true in config.js
  6. After a minute, storagePath in the db should be updated to what it was before, and in the container /file_storage should only have /files. Also each file in storagePath under the db should exist on disk

Other tests

  • Added tests to ensure extracting CIDs from legacy paths works for directories and non-directories

Monitoring - How will this change be monitored? Are there sufficient logs / alerts?

  • We should eventually see every node's value stop increasing for the filesMigratedFromLegacyPath metric
  • Search for occurrences of Error migrating legacy storagePaths, _copyLegacyFiles() could not remove, or Error updating legacy path db rows
  • When the following queries return 0 results, then there are no remaining legacy storage paths on the node:
    • SELECT * FROM "Files" WHERE "storagePath" NOT LIKE '/file_storage/files/%' AND "storagePath" LIKE '/file_storage/%' ORDER BY "multihash" ASC LIMIT 100;

@theoilie theoilie added the content-node Content Node (previously known as Creator Node) label Nov 15, 2022
@theoilie theoilie marked this pull request as draft November 15, 2022 23:52
Copy link
Contributor

@SidSethi SidSethi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally really nicely done, main thing we should talk about is the dir / image / nondir stuff, i need to handle that in my pr as well

creator-node/src/index.ts Show resolved Hide resolved
creator-node/src/dbManager.js Outdated Show resolved Hide resolved
creator-node/src/diskManager.ts Outdated Show resolved Hide resolved
creator-node/src/diskManager.ts Outdated Show resolved Hide resolved
creator-node/src/diskManager.ts Outdated Show resolved Hide resolved
creator-node/src/dbManager.js Show resolved Hide resolved
creator-node/src/fileManager.js Outdated Show resolved Hide resolved
creator-node/src/diskManager.ts Show resolved Hide resolved
@theoilie theoilie changed the title [CON-510] Skeleton without copy and db update [CON-510] Migrate legacy storage paths in background Nov 16, 2022
@theoilie theoilie marked this pull request as ready for review November 19, 2022 03:04
Copy link
Contributor

@SidSethi SidSethi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really really clean and thorough code. PR desc and code both very easy to follow. great work

p much good to go imo, 1 question about terminal condition of the while loop
and we should fast-follow the prom metric

creator-node/src/diskManager.ts Outdated Show resolved Hide resolved
creator-node/src/diskManager.ts Show resolved Hide resolved
creator-node/src/diskManager.ts Show resolved Hide resolved
creator-node/src/dbManager.js Show resolved Hide resolved
creator-node/src/utils/fsUtils.ts Show resolved Hide resolved
@theoilie theoilie merged commit e25edea into main Nov 20, 2022
@theoilie theoilie deleted the theo-migrate-storagePath branch November 20, 2022 06:48
@AudiusProject AudiusProject deleted a comment from linear bot Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
content-node Content Node (previously known as Creator Node) size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants