backfill: rev is not loaded from DB #629

Open
mrd0ll4r opened this issue Mar 29, 2024 · 1 comment


@mrd0ll4r
Contributor

I'm currently backfilling all repositories. I know it's possible to apply diffs from the firehose data, but for the moment I've chosen not to do that, since I feel like there are a bunch more edge cases to consider, and for some reason my subscription to the firehose is somewhat flaky.
So for now I'm downloading the repos using sync.getRepo. To speed things up, I'd like to avoid re-downloading repos for which I already have the latest revision locally. To do this, when listing repos using sync.listRepos, I compare the returned rev with the one stored in the (existing) gorm jobs to decide whether to re-enqueue them.
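The skip-or-re-enqueue decision described above can be sketched as follows. This is a minimal illustration, not indigo's code: `repoEntry`, `reposToEnqueue`, and the `localRevs` map (DID to last-fetched rev) are hypothetical names standing in for whatever the backfill store actually persists. Note that an empty rev on either side forces a re-fetch, which is why the bugs described below defeat the optimization.

```go
package main

import "fmt"

// repoEntry holds the two fields of a listRepos item used here.
// Hypothetical type for illustration only.
type repoEntry struct {
	DID string
	Rev string
}

// reposToEnqueue returns the DIDs that should be (re-)downloaded.
// localRevs is a hypothetical map of DID -> rev we last fetched.
func reposToEnqueue(listed []repoEntry, localRevs map[string]string) []string {
	var out []string
	for _, r := range listed {
		have, ok := localRevs[r.DID]
		// Re-fetch if the repo is unknown locally, if either rev is
		// missing (nothing to compare), or if the revs differ.
		if !ok || have == "" || r.Rev == "" || have != r.Rev {
			out = append(out, r.DID)
		}
	}
	return out
}

func main() {
	listed := []repoEntry{
		{DID: "did:plc:aaa", Rev: "3kabc"},
		{DID: "did:plc:bbb", Rev: "3kdef"},
		{DID: "did:plc:ccc", Rev: ""},
	}
	local := map[string]string{"did:plc:aaa": "3kabc", "did:plc:bbb": "3kxyz"}
	fmt.Println(reposToEnqueue(listed, local))
}
```

Only equality is checked, so a repo is re-fetched whenever its rev changed, without relying on any ordering of rev strings.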

When loading a job from the database, some fields are filled, but the rev field is not: https://github.com/bluesky-social/indigo/blob/main/backfill/gormstore.go#L214-L225
This makes it impossible to check whether we already downloaded the repo at that version without looking at the downloaded data, which is annoying.
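The shape of the fix is simply to carry the stored rev over when a row is turned back into a job. The sketch below uses hypothetical types (`storedJob`, `job`, `jobFromRow`) that are not indigo's actual gorm model or job struct; it only illustrates that the rev column must be copied alongside the fields that already are.

```go
package main

import "fmt"

// storedJob is a hypothetical stand-in for a persisted backfill-job row.
// It is NOT indigo's real gorm model.
type storedJob struct {
	Repo  string
	State string
	Rev   string
}

// job is a hypothetical in-memory job, also not indigo's real type.
type job struct {
	repo  string
	state string
	rev   string
}

// jobFromRow rebuilds an in-memory job from a row. If rev is left out,
// every loaded job looks as though no revision was ever fetched.
func jobFromRow(row storedJob) *job {
	return &job{
		repo:  row.Repo,
		state: row.State,
		rev:   row.Rev, // the field the issue reports as not being loaded
	}
}

func main() {
	j := jobFromRow(storedJob{Repo: "did:plc:aaa", State: "complete", Rev: "3kabc"})
	fmt.Println(j.repo, j.state, j.rev)
}
```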

To make matters worse, sync.listRepos seems to return empty strings for the rev field. The head field is populated; would it be a better approach to use that to check whether I already have the latest version of a repo downloaded?
See, e.g.:

$ curl -L -X GET 'https://bsky.network/xrpc/com.atproto.sync.listRepos?limit=10&cursor=' -H 'Accept: application/json' | jq '.repos[].rev'
""
""
""
""
""
""
""
""
""
""
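One way to work around the empty revs, assuming the head CID changes whenever the repo gets a new commit, is to key each repo's version on rev when present and fall back to head otherwise. The sketch below decodes a listRepos page (only the `did`, `head`, and `rev` fields of each repo, plus `cursor`) with `encoding/json`; `versionKey` and `versionsFromPage` are hypothetical helpers, and the CIDs in `main` are placeholder strings, not real data. The `rev:`/`head:` prefixes keep a stored rev from ever being compared against a head CID.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listReposOutput covers the subset of the com.atproto.sync.listRepos
// response used here.
type listReposOutput struct {
	Cursor string `json:"cursor"`
	Repos  []struct {
		DID  string `json:"did"`
		Head string `json:"head"`
		Rev  string `json:"rev"`
	} `json:"repos"`
}

// versionKey prefers rev and falls back to the head CID when rev is empty.
// A CID supports equality checks only, so this detects "changed since the
// last fetch", not "newer than".
func versionKey(rev, head string) string {
	if rev != "" {
		return "rev:" + rev
	}
	return "head:" + head
}

// versionsFromPage maps each DID on one listRepos page to its version key.
func versionsFromPage(body []byte) (map[string]string, error) {
	var out listReposOutput
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	m := make(map[string]string, len(out.Repos))
	for _, r := range out.Repos {
		m[r.DID] = versionKey(r.Rev, r.Head)
	}
	return m, nil
}

func main() {
	// Placeholder page shaped like the output above (empty rev).
	body := []byte(`{"cursor":"c1","repos":[
		{"did":"did:plc:aaa","head":"bafyplaceholder1","rev":""},
		{"did":"did:plc:bbb","head":"bafyplaceholder2","rev":"3kabc"}]}`)
	m, err := versionsFromPage(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(m)
}
```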

Also (sorry, this has devolved into three issues in one), I just noticed that the generated curl sample on the docs website is wrong. It shows:

curl -L -X GET 'https://bsky.social/xrpc' \
-H 'Accept: application/json'

i.e., without the XRPC method (such as com.atproto.sync.listRepos) in the URL.

Sorry for the triple issue!

@bnewbold
Collaborator

bnewbold commented Apr 8, 2024

These all sound like legit issues and developer papercuts. Thanks for reporting!

If you have some small quick-fix PRs with no performance concerns, we might be able to get those in, but we are juggling a bunch of priorities and work streams and it might take a while.

Could you open a separate issue in the docs site repo about the curl examples being incomplete? Thanks!
