Add StatsContributorsStream #71

Merged
merged 10 commits into from
Jan 26, 2022

ericboucher
Contributor

Add the repo /stats/contributors endpoint to fetch weekly contribution data

Contributor

@laurentS left a comment

My comments are mostly suggestions for improvement and discussion.

state_partitioning_keys = ["repo", "org"]
# Note - these queries are expensive and the API might return an HTTP 202 if the response
# has not been cached recently. https://docs.github.com/en/rest/reference/metrics#a-word-about-caching
tolerated_http_errors = [202]
Contributor

@edgarrmondragon the docs say that the request should be retried "a bit later" when we get a 202. Is there some sort of mechanism in the SDK to do this (I'd imagine something like enqueueing retries after N seconds)? I guess it could be handled the same way as a 429 (Too Many Requests), but that would be less than ideal.
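
To make the "retry a bit later" idea concrete, here is a minimal sketch of polling until the response is no longer a 202, with a growing delay between attempts. It is not an SDK mechanism: `fetch_with_202_retry`, its parameters, and the `fetch` callable are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_with_202_retry(
    fetch: Callable[[], Tuple[int, Optional[list]]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[list]:
    """Call `fetch` until it stops returning HTTP 202 or attempts run out.

    `fetch` is a hypothetical callable returning (status_code, payload).
    Returns the payload of the first non-202 response, or None if every
    attempt came back as 202.
    """
    for attempt in range(max_attempts):
        status, payload = fetch()
        if status != 202:
            return payload
        # Stats are still being computed: wait longer before each retry.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

Returning `None` after the last attempt leaves the caller free to skip the repo for this run instead of failing the whole sync, which matches the intent of tolerating 202 in the first place.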

weekly_data = contributor_activity["weeks"]
for week in weekly_data:
    # no need to save weeks with no contributions.
    is_week_empty = sum(week[key] for key in ["a", "c", "d"]) == 0
Contributor

In downstream processing, how do we distinguish a missing response (say, a 202 that has not been retried yet, something like undefined) from this situation, where the value really is zero?

Contributor Author

I don't think that's relevant: we get data for the past year each time, so ultimately we will get the data we want. The only issue is if we ALWAYS get a 202...
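
For readers following this thread, a small sketch of how "not ready yet" and "no activity" could be kept apart if that ever became necessary. It assumes the response shape of the `/stats/contributors` endpoint (a list of objects with `author` and `weeks`); the function name `weekly_records` and the convention of returning `None` for a 202 are hypothetical, not what this PR implements.

```python
from typing import Iterable, List, Optional


def weekly_records(
    status_code: int, contributors: Optional[Iterable[dict]]
) -> Optional[List[dict]]:
    """Return None when stats are not ready (202), else the non-empty weeks."""
    if status_code == 202 or contributors is None:
        # Missing data: GitHub has not finished computing the stats.
        return None
    records = []
    for contributor_activity in contributors:
        login = contributor_activity["author"]["login"]
        for week in contributor_activity["weeks"]:
            # Skip weeks with no additions, commits, or deletions.
            if sum(week[key] for key in ["a", "c", "d"]) == 0:
                continue
            records.append({"login": login, **week})
    return records
```

With this convention, `None` means "try again on a later run" while an empty list means the repo genuinely had no contributions.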

th.Property("org", th.StringType),
th.Property("user_id", th.IntegerType),
# Activity keys
th.Property("w", th.IntegerType),
Contributor

I'd be tempted to rename these fields so they're more or less self-documenting:

  • week_start
  • additions
  • deletions
  • commits

What do you think?
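
A minimal sketch of what that renaming could look like, assuming it is applied to each weekly entry before the record is emitted. `FIELD_NAMES` and `rename_week_fields` are hypothetical names for illustration, not code from this PR.

```python
# Maps GitHub's single-letter weekly keys to the proposed descriptive names.
FIELD_NAMES = {
    "w": "week_start",
    "a": "additions",
    "d": "deletions",
    "c": "commits",
}


def rename_week_fields(week: dict) -> dict:
    """Return a copy of `week` with the short keys replaced; other keys are kept."""
    return {FIELD_NAMES.get(key, key): value for key, value in week.items()}
```

The schema properties would need the same names, so that emitted records and the declared schema stay in sync.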

th.Property("c", th.IntegerType),
# Contributor keys
th.Property("login", th.StringType),
th.Property("id", th.IntegerType),
Contributor

Is id always the same as user_id here?

Contributor

I renamed id to user_id in the spirit of clarity as above.

# Parent keys
th.Property("repo", th.StringType),
th.Property("org", th.StringType),
th.Property("user_id", th.IntegerType),
Contributor

Do we actually receive user_id from the parent, if the parent is a repo?

Contributor

we don't :)
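
To illustrate the point, a sketch of where each key would come from when the parent is a repo: `org` and `repo` from the parent context, and `user_id` from the `author` object in the API response. The helper `build_record` and the shape of `context` are assumptions made for this example.

```python
def build_record(context: dict, contributor_activity: dict, week: dict) -> dict:
    """Combine parent context, author details, and one week of activity."""
    author = contributor_activity["author"]
    return {
        # Parent keys, supplied by the repository stream's context.
        "org": context["org"],
        "repo": context["repo"],
        # Contributor keys, read from the response itself, not the parent.
        "user_id": author["id"],
        "login": author["login"],
        # Activity keys for this week (w, a, d, c).
        **week,
    }
```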

Contributor

@laurentS left a comment

LGTM!


sonarcloud bot commented Jan 26, 2022

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)
  • Coverage: no coverage information
  • Duplication: 0.0%

@ericboucher ericboucher merged commit a210ffa into main Jan 26, 2022
@ericboucher ericboucher deleted the add-stats-contributors branch January 26, 2022 19:23