Scoring Algorithm #9

Arrowbox · 2020-04-22T03:57:44Z

The scoring algorithm is probably the next big feature for this. Right now it goes like:

Score = Lines

That's nice, but there are a few other factors I'd like to consider and use for ordering contributors.

Factors

Lines: This is definitely the first order approximation. If someone wrote 95 out of 100 lines, they are probably familiar with how it works.
Commits: Ideally, contributing many commits correlates with some understanding over time.
Most recent commit date: Writing 100 lines a year ago is not as useful as writing 10 lines yesterday.
Ownership of specific lines: If a specific set of lines has been requested for analysis (for example, by a pull-request bot that suggests reviewers based on the lines that have changed), then the contributors for those lines are of particular interest.

Roughly I'd like to implement a score closer to:

S = W(Lines) * (Lines / Total Lines) + W(Commits) * (Commits / Total Commits) + W(Date) * ((1/2)^((Today - Date) / Halflife))

The last term looks complicated but is just a half-life based exponential decay: it equals 1 for a commit made today and halves every Halflife. All of the terms take the form of weight * normalized metric.
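
To make the formula concrete, here is a minimal Python sketch of it. The weights (0.5, 0.3, 0.2), the 180-day half-life, and the names Contributor, score, and rank are all placeholder assumptions for illustration, not values or APIs decided in this issue.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contributor:
    # Hypothetical record type; field names are illustrative only.
    name: str
    lines: int
    commits: int
    last_commit: date


def score(c, total_lines, total_commits, today,
          w_lines=0.5, w_commits=0.3, w_date=0.2, half_life_days=180.0):
    """Weighted sum of normalized metrics, following the formula above.

    The default weights and half-life are assumed values, not tuned ones.
    """
    # Normalized shares; guard against empty totals.
    line_share = c.lines / total_lines if total_lines else 0.0
    commit_share = c.commits / total_commits if total_commits else 0.0
    # Half-life decay: 1.0 for a commit today, 0.5 after one half-life.
    age_days = (today - c.last_commit).days
    recency = 0.5 ** (age_days / half_life_days)
    return w_lines * line_share + w_commits * commit_share + w_date * recency


def rank(contributors, today, **weights):
    """Order contributors by descending score."""
    total_lines = sum(c.lines for c in contributors)
    total_commits = sum(c.commits for c in contributors)
    return sorted(
        contributors,
        key=lambda c: score(c, total_lines, total_commits, today, **weights),
        reverse=True,
    )


# Example data mirroring the "100 lines a year ago vs 10 lines yesterday" case.
today = date(2020, 4, 22)
alice = Contributor("alice", lines=95, commits=10,
                    last_commit=today - timedelta(days=365))
bob = Contributor("bob", lines=5, commits=2,
                  last_commit=today - timedelta(days=1))
by_default = [c.name for c in rank([alice, bob], today)]
by_recency = [c.name for c in rank([alice, bob], today,
                                   w_lines=0.1, w_commits=0.1, w_date=0.8)]
```

With the assumed default weights the long-time author still ranks first; shifting most of the weight onto the date term puts the recent committer first. That is the kind of trade-off the test cases would need to settle.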

I'm going to break out a few test cases to see how the weighting might look.
Google Spreadsheet
