-
-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect Co-authored-by trailers in the contributors graph #30925
base: main
Are you sure you want to change the base?
Respect Co-authored-by trailers in the contributors graph #30925
Conversation
Now that I am looking at into it further, I think Pulse may need to account for co-authors too. I think this function is the one that needs updating |
} | ||
|
||
email = strings.TrimRight(parts[1], ">") | ||
if _, err := mail.ParseAddress(email); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since users can add anything between the <> brackets I believed that we could try to throw away lines that contain invalid email addresses.
Though while testing by creating blank repos and merging the commit history of existing projects into these repos I observed that some emails associated to bots will be rejected (e.g. the pre-commit bot on GitHub). I guess this happens because these emails don't follow RFC 5322. Not sure if I should do this validation check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on how we parse other trailer lines I suppose. I would make it consistent with those.
Hmm I'm not sure how common it is, but should I account for duplicate |
@@ -125,8 +127,7 @@ func getExtendedCommitStats(repo *git.Repository, revision string /*, limit int | |||
_ = stdoutWriter.Close() | |||
}() | |||
|
|||
gitCmd := git.NewCommand(repo.Ctx, "log", "--shortstat", "--no-merges", "--pretty=format:---%n%aN%n%aE%n%as", "--reverse") | |||
// AddOptionFormat("--max-count=%d", limit) | |||
gitCmd := git.NewCommand(repo.Ctx, "log", "--shortstat", "--no-merges", "--pretty=format:---%n%aN%n%aE%n%as%n%(trailers:key=Co-authored-by,valueonly=true)", "--reverse") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked compat on this trailers
option and it does appear to be present in git 2.27 at least which is the oldest version where git's site has docs for, so it's likely fine to use:
Recent change made the co-author parsing logic more robust to inputs where the co-author's name may include a '<'. I manually tested all the code paths for |
I guess duplicates (by email) should count as one. |
I'm reminded of a recent vulnerability GitHub had in their commit message parser surrounding these meta lines. Hope we don't replicate it, maybe you can double-check and add a test. |
BTW is it feasible to move all commit message parsing into a shared function? As I see it, parsing these Co-Authored-By lines is not specific to Commit graph and could be useful elsewhere too. |
Recent changes exports the trailer parsing logic and adds adds parameterized tests for it. We now also handle duplicated Co-authored-by entries (i.e. entries that have the same email). |
Nice read! It appears it is caused by their parser's use of regex, where it iterates a commit's metadata lines looking for its first match with a line that starts with I have included some tests for my parser and I'm open to include more possible inputs. |
Resolves #29183.
Added an automated test (and refactored the test code around it) and also manually tested the Contributors, Commit Frequency, and Recent Commits pages in a few small repos and the output appears to be fine.