-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add StatsContributorsStream #71
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are mostly suggestions for improvement and discussion.
state_partitioning_keys = ["repo", "org"] | ||
# Note - these queries are expensive and the API might return an HTTP 202 if the response | ||
# has not been cached recently. https://docs.github.com/en/rest/reference/metrics#a-word-about-caching | ||
tolerated_http_errors = [202] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@edgarrmondragon the docs say that the request should be retried "a bit later" when we get a 202. Is there some sort of mechanism in the sdk to do this (I'd imagine something like enqueueing retries after N seconds)? I guess it could be handle the same way as a 429 - Too many requests
but that would less than ideal.
tap_github/repository_streams.py
Outdated
weekly_data = contributor_activity["weeks"] | ||
for week in weekly_data: | ||
# no need to save weeks with no contributions. | ||
is_week_empty = sum(week[key] for key in ["a", "c", "d"]) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we distinguish a missing response (say if we received a 202 that has not been retried yet, something like undefined
) from this situation where it's a zero
, in the downstream processing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's relevant, we get data for the past year each time, so ultimately we will get the data we want. The only issue is if we ALWAYS get a 202...
tap_github/repository_streams.py
Outdated
th.Property("org", th.StringType), | ||
th.Property("user_id", th.IntegerType), | ||
# Activity keys | ||
th.Property("w", th.IntegerType), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to rename these fields so they're more or less self-documenting:
week_start
additions
deletions
commits
What do you think?
tap_github/repository_streams.py
Outdated
th.Property("c", th.IntegerType), | ||
# Contributor keys | ||
th.Property("login", th.StringType), | ||
th.Property("id", th.IntegerType), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is id
always the same as user_id
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed id
to user_id
in the spirit of clarity as above.
tap_github/repository_streams.py
Outdated
# Parent keys | ||
th.Property("repo", th.StringType), | ||
th.Property("org", th.StringType), | ||
th.Property("user_id", th.IntegerType), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually receive user_id
from the parent, if the parent is a repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Co-authored-by: Laurent Savaete <laurent@where.tf>
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
Add the repo
/stats/contributors
endpoint to fetch weekly contribution data