-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async rehydrate with index! #542
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks great. thanks for putting this together. had a few questions.
but also - unless i'm misreading, i dont' think this works? specifically the storagePath.split('/').length === 7
check should always be false?
also just want to make sure this is thoroughly tested not just with clean state locally but against existing data, to make sure that (1) migration does not corrupt non multires-image data and (2) new CID routes do not introduce regressions in any of the code paths
creator-node/sequelize/migrations/20200623002237-add-parent-file-data.js
Show resolved
Hide resolved
creator-node/sequelize/migrations/20200623002237-add-parent-file-data.js
Outdated
Show resolved
Hide resolved
creator-node/sequelize/migrations/20200623002237-add-parent-file-data.js
Outdated
Show resolved
Hide resolved
creator-node/sequelize/migrations/20200623002237-add-parent-file-data.js
Outdated
Show resolved
Hide resolved
2778225
to
1c45065
Compare
1c45065
to
26cd771
Compare
I rewrote the split('/')[7] thing that gets the "leaf" of the path as a regex with capture so it's more reliable. I think that should hopefully mitigate some concerns. Tested this against staging data. Migration rollback is pretty bulletproof though so I'm not exceedingly worried about anything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few small questions but this looks great!
{ transaction } | ||
) | ||
|
||
const files = (await queryInterface.sequelize.query(`SELECT * FROM "Files";`, { transaction }))[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this loads all the files into memory right? do you think that's okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually pretty sure this isn't going to work. this most likely exceeds node heap size, not to mention running a raw select * will take forever cause these tables have millions of records. we need to do this in a paginated way. like first get count, iterate over chunks of like 50k or 100k files. this query seems to work well
SELECT * FROM "Files" ORDER BY "multihash" ASC LIMIT 100000 OFFSET 200000;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated per our convo offline. Left the bulk update in as a comment.
creator-node/sequelize/migrations/20200623002237-add-parent-file-data.js
Outdated
Show resolved
Hide resolved
Pulled out the route changes in this PR, going in separately. Merging this now. |
This reverts commit 6f9c0e9.
Adds a migration that adds two columns to the files table
-- fileName: the actual queryable source filename stripped of any prefixes
-- dirMultihash: if the file is in an IPFS directory, the CID/multihash for the parent dir
Updates the models accordingly ^ and updates the image upload flow to populate those two columns
Allows fallback fetching from FS in /ipfs/:dirCID/:filename endpoint (we previously didn't have this)
Adds a GET query param to /ipfs/:CID and /ipfs/:dirCID/:filename
fromFS
that async dispatches IPFS rehydration and immediately returns query results from filesystemAdds an index to the dirMultihash column