Replace index.json with index.sql as the main index #452
…of a list of links
From a high level this looks good, but one thing I was hoping is that we could get rid of Link entirely with this PR and use Snapshot exclusively.
Link was only needed as an in-memory representation of a Snapshot before it was added to the DB, because the DB was not the single source of truth.
Now that the DB is the single source of truth, we can do everything with Snapshot, and if we need an in-memory representation we can just do Snapshot(url='http://...', ...) without calling .save() yet.
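To illustrate the unsaved-object idea from this comment, here is a minimal sketch using only the Python standard library. The Snapshot class, its fields, and the snapshot table below are hypothetical stand-ins and not ArchiveBox's actual model; the point is only that an object can exist in memory and reach the database when save() is finally called.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Hypothetical stand-in for the real model (assumption, not ArchiveBox code)."""
    url: str
    title: str = ""
    id: Optional[int] = None  # stays None until save() is called

    def save(self, conn: sqlite3.Connection) -> None:
        # Persist the in-memory object; only now does it receive a row id.
        cur = conn.execute(
            "INSERT INTO snapshot (url, title) VALUES (?, ?)",
            (self.url, self.title),
        )
        conn.commit()
        self.id = cur.lastrowid


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot (id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT)"
)

# In-memory only: nothing has been written to the database yet.
snap = Snapshot(url="http://example.com")
rows_before_save = count_rows(conn)

# Persisting is an explicit, separate step.
snap.save(conn)
rows_after_save = count_rows(conn)
```

With this shape there is no need for a second in-memory type: the same class covers both the pending and the persisted state, and the id field tells them apart.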
Thanks for the review! I will make the required changes, and start the removal of
…make it more explicit. Change it so it affects the csv output too.
…ws other iterables and not just lists
Co-authored-by: Nick Sweeting <git@sweeting.me>
@pirate I think that most of the requested changes were done. There are a couple of them pending, but we can address them later. What else do you want in place to be able to close this PR?
Summary
After this PR is ready, the JSON index will no longer be considered the main source of truth. Instead, index.sqlite3 will take over that role.
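One way to picture this change of roles: once the SQLite database holds the records, any other index format becomes an export derived from it. The sketch below assumes a hypothetical snapshot table and a hypothetical JSON layout; neither is taken from ArchiveBox's real schema.

```python
import json
import sqlite3


def export_index_json(conn: sqlite3.Connection) -> str:
    """Build a JSON document from the database rows (hypothetical layout)."""
    rows = conn.execute(
        "SELECT url, title FROM snapshot ORDER BY id"
    ).fetchall()
    return json.dumps(
        {"links": [{"url": url, "title": title} for url, title in rows]},
        indent=2,
    )


# Stand-in for the SQLite index; a real archive would open a file on disk.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE snapshot (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
db.execute(
    "INSERT INTO snapshot (url, title) VALUES (?, ?)",
    ("http://example.com", "Example"),
)
db.commit()

# The JSON text is regenerated from the database, never edited directly.
index_json = export_index_json(db)
```

Because the JSON is always rebuilt from the database, the two cannot drift apart in a way that matters: the database wins.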
The index.json will still be around, but it will only be written at the end of each process that runs. If the archive is old (no index.sqlite3 is present), running archivebox init --force will be necessary to update it to the latest version.
Changes these areas
Roadmap Goals
This is one of the main goals of the 0.5 release.