Async rehydrate with index! #542

raymondjacobson · 2020-06-23T16:50:45Z

Adds a migration that adds two columns to the Files table:
-- fileName: the actual queryable source filename, stripped of any prefixes
-- dirMultihash: if the file is in an IPFS directory, the CID/multihash of the parent directory
Updates the models accordingly and updates the image upload flow to populate those two columns
Adds fallback fetching from the filesystem in the /ipfs/:dirCID/:filename endpoint (which we previously didn't have)
Adds a fromFS GET query param to /ipfs/:CID and /ipfs/:dirCID/:filename that asynchronously dispatches IPFS rehydration and immediately returns the file from the filesystem
Adds an index on the dirMultihash column

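A rough sketch of the fromFS flow for the /ipfs/:CID case only. The handler factory and its three dependencies (rehydrate, sendFromFs, sendFromIpfs) are hypothetical placeholders rather than creator-node functions, and the flag is assumed to arrive as ?fromFS=true.

```javascript
// Hypothetical sketch of the fromFS behavior; not the PR's actual route.
// rehydrate, sendFromFs and sendFromIpfs are placeholder dependencies.
function makeIpfsHandler ({ rehydrate, sendFromFs, sendFromIpfs }) {
  return async function handleGetCID (req, res) {
    const { CID } = req.params
    // Assumption: the query string looks like ?fromFS=true
    const fromFS = req.query.fromFS === 'true'

    if (fromFS) {
      // Dispatch rehydration without awaiting it so the response is not
      // blocked on IPFS; the catch logs failures instead of leaving them unhandled
      rehydrate(CID).catch((err) => console.error(`Rehydration failed for ${CID}:`, err))
      return sendFromFs(CID, res)
    }
    // Default path is unchanged when the flag is absent
    return sendFromIpfs(CID, res)
  }
}
```

Attaching .catch to the un-awaited promise matters: without it, a failed rehydration would surface as an unhandled rejection, which recent Node.js versions treat as fatal by default.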
SidSethi

Overall looks great, thanks for putting this together. Had a few questions.
But also, unless I'm misreading, I don't think this works? Specifically, shouldn't the storagePath.split('/').length === 7 check always be false?

Also, I just want to make sure this is thoroughly tested, not only locally against a clean state but also against existing data, to confirm that (1) the migration does not corrupt non-multires-image data and (2) the new CID routes do not introduce regressions in any code path.

raymondjacobson · 2020-07-01T00:39:45Z

I rewrote the split('/')[7] logic that extracts the "leaf" of the path as a regex with a capture group, so it's more reliable. That should hopefully address some of the concerns. Tested this against staging data. Migration rollback is pretty bulletproof, so I'm not too worried.
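A sketch of the regex-with-capture approach for getting the "leaf" of a path, alongside a note on why a fixed split('/') index is fragile. getLeaf, getDirAndLeaf, and the /file_storage/... paths are hypothetical; the real creator-node storagePath layout may differ.

```javascript
// Hypothetical helpers; assumes '/'-separated paths.

// Captures the last path segment, tolerating one trailing slash
const LEAF_REGEX = /\/([^/]+)\/?$/

function getLeaf (storagePath) {
  const match = LEAF_REGEX.exec(storagePath)
  return match ? match[1] : null
}

// Captures the last two segments: parent directory and file name
const DIR_AND_LEAF_REGEX = /\/([^/]+)\/([^/]+)$/

function getDirAndLeaf (storagePath) {
  const match = DIR_AND_LEAF_REGEX.exec(storagePath)
  return match ? { dir: match[1], leaf: match[2] } : null
}

// A fixed index depends on how deep the storage root is, and split('/')
// on an absolute path yields a leading empty string:
// '/file_storage/QmDir/image.jpg'.split('/')
//   gives ['', 'file_storage', 'QmDir', 'image.jpg'] (length 4)
```

Anchoring on the end of the string ($) makes the result independent of how many directories precede the leaf.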

dmanjunath

A few small questions, but this looks great!

dmanjunath · 2020-07-01T01:44:56Z

creator-node/sequelize/migrations/20200623002237-add-parent-file-data.js

+        { transaction }
+      )
+
+      const files = (await queryInterface.sequelize.query(`SELECT * FROM "Files";`, { transaction }))[0]


This loads all the files into memory, right? Do you think that's okay?

Actually, I'm pretty sure this isn't going to work. It will most likely exceed the Node heap size, not to mention that a raw SELECT * will take forever because these tables have millions of records. We need to do this in a paginated way: first get the count, then iterate over chunks of 50k or 100k files. This query seems to work well:

SELECT * FROM "Files" ORDER BY "multihash" ASC LIMIT 100000 OFFSET 200000;
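A sketch of the paginated approach suggested above, assuming a Sequelize instance connected to Postgres. forEachFileBatch, processBatch, and the batch size are hypothetical names; the query mirrors the one above but binds LIMIT and OFFSET through replacements.

```javascript
// Hypothetical default page size, in the 50k-100k range suggested above
const BATCH_SIZE = 100000

// Counts the rows, then walks "Files" page by page so only one page is in memory
async function forEachFileBatch (sequelize, processBatch, { transaction, batchSize = BATCH_SIZE } = {}) {
  const [countRows] = await sequelize.query('SELECT COUNT(*) AS count FROM "Files";', { transaction })
  // Postgres returns COUNT(*) as a bigint, which the pg driver hands back as a string
  const total = parseInt(countRows[0].count, 10)

  for (let offset = 0; offset < total; offset += batchSize) {
    // Raw queries resolve to [results, metadata]; keep only the rows
    const [files] = await sequelize.query(
      'SELECT * FROM "Files" ORDER BY "multihash" ASC LIMIT :limit OFFSET :offset;',
      { replacements: { limit: batchSize, offset }, transaction }
    )
    await processBatch(files, offset)
  }
  return total
}
```

OFFSET paging visits each row exactly once only if the ORDER BY gives a stable total order and the table is not modified during the scan. If "multihash" values are not unique, adding a unique column (such as the primary key) as a tiebreaker would be safer. Large offsets also get slower, since Postgres still has to step past the skipped rows.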

Updated per our convo offline. Left the bulk update in as a comment.

raymondjacobson · 2020-07-02T16:46:35Z

Pulled the route changes out of this PR; they'll go in separately. Merging this now.

raymondjacobson requested review from dmanjunath, SidSethi, hareeshnagaraj and vicky-g June 23, 2020 16:51

SidSethi reviewed Jun 23, 2020

creator-node/src/routes/files.js (outdated, resolved)

SidSethi reviewed Jun 24, 2020

vicky-g reviewed Jun 25, 2020

creator-node/src/routes/files.js (outdated, resolved)

vicky-g reviewed Jun 25, 2020

creator-node/src/routes/files.js (outdated, resolved)

raymondjacobson removed request for dmanjunath and hareeshnagaraj June 29, 2020 16:53

raymondjacobson changed the title ~~Async rehydrate~~ [DO NOT MERGE/REVIEW] Async rehydrate Jun 29, 2020

raymondjacobson force-pushed the rj-async-rehydrate branch from 2778225 to 1c45065 July 1, 2020 00:33

raymondjacobson added 10 commits June 30, 2020 17:33

Add fromFS option to /ipfs path

b24aa00

Add migration to support direct dir cid queries and make rehydration async

6c63de6

Add comment

1f3c0bc

Fix lint

ad8e28e

Remove commented out code

a52f0b3

Fix comma

34c6a33

Remove \n\n\n

8fc0699

Clean up

b95b3c1

Fix index

002bcdf

Rebase & install

26cd771

raymondjacobson force-pushed the rj-async-rehydrate branch from 1c45065 to 26cd771 July 1, 2020 00:34

Lint

a6ec07a

raymondjacobson requested a review from dmanjunath July 1, 2020 00:39

raymondjacobson changed the title ~~[DO NOT MERGE/REVIEW] Async rehydrate~~ Async rehydrate with index! Jul 1, 2020

dmanjunath suggested changes Jul 1, 2020

raymondjacobson added 2 commits July 1, 2020 15:46

Clean up and paginate query

22d983d

Fix migration

aea9118

Comment out update

d8f2f52

dmanjunath approved these changes Jul 2, 2020

Reset route changes

b9ff8cb

Lint fix

91c6205

raymondjacobson merged commit 6f9c0e9 into master Jul 2, 2020

raymondjacobson deleted the rj-async-rehydrate branch July 2, 2020 17:08

raymondjacobson added a commit that referenced this pull request Jul 2, 2020

Revert "Async rehydrate with index! (#542)"

b877ab0

This reverts commit 6f9c0e9.
