Has Linus Torvalds made literally 175 million commits? #564

Open
hafniz opened this issue Oct 9, 2020 · 11 comments · May be fixed by #1691
Labels
bug Something isn't working. ⭐ top bug Top bug. stats-card Feature, Enhancement, Fixes related to stats the stats card.

Comments

@hafniz

hafniz commented Oct 9, 2020

Describe the bug
@torvalds joined GitHub on September 3, 2011, and makes 2000+ contributions every year (as seen on his profile), which adds up to about 22k. However, your badge shows a total of 175308.6k commits, which is far from reality. The number itself is absurd: someone committing once every minute would need 333 years to reach it!

Expected behavior
"Total commits" shows a number that is comparable to the number of contributions reported by GitHub.

Screenshots / Live demo link (paste the github-readme-stats link as markdown image)
image

Additional context
None
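
The 333-year estimate in the bug description can be checked with a few lines of arithmetic. This is only a sketch: the function name `yearsAtOneCommitPerMinute` is made up for illustration, and leap years are ignored.

```javascript
// Minutes in a non-leap year: 60 * 24 * 365 = 525,600.
const MINUTES_PER_YEAR = 60 * 24 * 365;

// Hypothetical helper: how many years of committing once per minute,
// around the clock, would it take to reach `totalCommits`?
function yearsAtOneCommitPerMinute(totalCommits) {
  return totalCommits / MINUTES_PER_YEAR;
}

// "175308.6k" as shown on the badge, written out in full.
const badgeTotal = 175308600;

console.log(yearsAtOneCommitPerMinute(badgeTotal).toFixed(1)); // "333.5"
```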

@anuraghazra
Owner

haha nice observation, lol.
Thanks for reporting this. I think the commits are just 17k and the kFormatter maybe failed to format it properly.

plot twist: Linus is actually 333 years old

@anuraghazra anuraghazra added the bug Something isn't working. label Oct 9, 2020
@anuraghazra
Owner

Okay, so it turns out GitHub returns this number from their server: 164882902, which should be correct since it's coming from their API.

Parsing it with a proper kFormatter results in "165M":

// Format a number with an SI suffix (k, M, G, ...), keeping up to `digits` decimals.
function nFormatter(num, digits) {
  const si = [
    { value: 1, symbol: "" },
    { value: 1e3, symbol: "k" },
    { value: 1e6, symbol: "M" },
    { value: 1e9, symbol: "G" },
    { value: 1e12, symbol: "T" },
    { value: 1e15, symbol: "P" },
    { value: 1e18, symbol: "E" }
  ];
  // Strips trailing zeros after the decimal point, e.g. "1.50" -> "1.5", "2.0" -> "2".
  const rx = /\.0+$|(\.[0-9]*[1-9])0+$/;
  // Walk down from the largest unit to the first one that fits.
  let i;
  for (i = si.length - 1; i > 0; i--) {
    if (num >= si[i].value) {
      break;
    }
  }
  return (num / si[i].value).toFixed(digits).replace(rx, "$1") + si[i].symbol;
}

nFormatter(164882902) // "165M"
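
For contrast, a formatter that only knows the "k" suffix would produce exactly the kind of output seen on the badge. The sketch below is an assumption about the sort of logic involved, not code quoted from the project, and `kOnlyFormatter` is a hypothetical name.

```javascript
// Hypothetical k-only formatter: anything above 999 is divided by 1000
// and suffixed with "k", no matter how large the number is.
function kOnlyFormatter(num) {
  if (Math.abs(num) <= 999) {
    return String(num);
  }
  // Keep one decimal and drop a trailing ".0" (e.g. "2.0k" -> "2k").
  return (num / 1000).toFixed(1).replace(/\.0$/, "") + "k";
}

console.log(kOnlyFormatter(175308600)); // "175308.6k", the value on the badge
console.log(kOnlyFormatter(164882902)); // "164882.9k"
```

So the formatter is not mangling the value; it only lacks larger suffixes, and the huge number is what it was handed.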

@anuraghazra
Owner

Damn, then Linus is actually 333 years old

@hafniz
Author

hafniz commented Oct 9, 2020

Hmm, that's weird... unless they have some bot that auto-commits to some secret repos that don't show up in the published contribution count. Even so, those "secret" commits would have to make up 99.99% of the total 🤔

@iamabs2001

@rickstaa rickstaa added the stats-card Feature, Enhancement, Fixes related to stats the stats card. label Mar 9, 2022
@Rongronggg9
Contributor

> Okay so turns out, github returns this number from their server: 164882902 which should be correct since it's coming from their API.

> Hmm that's weird... unless they have some bot that auto-commits to some secret repos that don't show up in the published contribution count. Even so those "secret" commits have made up 99.99% of the total 🤔

The API counts commits roughly like this: it counts every commit by an author across all non-fork repositories, even when several of those commits share the same hash. So the answer is clear: there are tons of non-fork copies of the Linux kernel source tree on GitHub, and each of Linus's commits is counted once for every one of those copies...
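
The difference between the two ways of counting can be sketched as follows. The repository names and hashes below are made-up sample data, and `countCommits` is a hypothetical helper, not part of any GitHub API.

```javascript
// Made-up sample data: the same two commits appear in three
// independent (non-fork) copies of a source tree.
const records = [
  { repo: "torvalds/linux", sha: "a1" },
  { repo: "torvalds/linux", sha: "b2" },
  { repo: "example/linux-copy-1", sha: "a1" },
  { repo: "example/linux-copy-1", sha: "b2" },
  { repo: "example/linux-copy-2", sha: "a1" },
  { repo: "example/linux-copy-2", sha: "b2" },
];

// Hypothetical helper comparing the two counting strategies.
function countCommits(commitRecords) {
  // Per-repository counting: every (repo, commit) pair counts once.
  const perRepoTotal = commitRecords.length;
  // Deduplicated counting: each distinct hash counts once overall.
  const uniqueShas = new Set(commitRecords.map((r) => r.sha)).size;
  return { perRepoTotal, uniqueShas };
}

console.log(countCommits(records)); // { perRepoTotal: 6, uniqueShas: 2 }
```

With thousands of copies of a large tree, the per-repository total grows far beyond the number of commits the author actually made.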

@Rongronggg9
Contributor

Pick a few Linux kernel contributors at random and you will find that their commits have been counted many times over as well.

https://api.github.com/search/commits?q=author:tiwai

https://api.github.com/search/commits?q=author:jhedberg

https://api.github.com/search/commits?q=author:Icenowy
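
The same check can be scripted. The sketch below builds the URLs listed above and reads the `total_count` field of the search response. The helper names are hypothetical, a runtime with a global `fetch` (for example Node 18+ or a browser) is assumed, and unauthenticated requests are rate limited.

```javascript
// Build the commit-search URL used in this thread for a given login.
function commitSearchUrl(login) {
  return `https://api.github.com/search/commits?q=author:${encodeURIComponent(login)}`;
}

// Fetch the search result and return its `total_count` field.
async function fetchSearchCommitCount(login) {
  const res = await fetch(commitSearchUrl(login), {
    headers: { Accept: "application/vnd.github+json" },
  });
  if (!res.ok) {
    throw new Error(`GitHub API responded with status ${res.status}`);
  }
  const body = await res.json();
  return body.total_count;
}

// Example (needs network access):
// fetchSearchCommitCount("tiwai").then(console.log);
```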

@hafniz
Author

hafniz commented Mar 30, 2022

> The API counts commits roughly like this: it counts every commit by an author across all non-fork repositories, even when several of those commits share the same hash. So the answer is clear: there are tons of non-fork copies of the Linux kernel source tree on GitHub, and each of Linus's commits is counted once for every one of those copies...

That tells a lot. Btw, how did you know the way it counts?

@Rongronggg9
Contributor

> > The API counts commits roughly like this: it counts every commit by an author across all non-fork repositories, even when several of those commits share the same hash. So the answer is clear: there are tons of non-fork copies of the Linux kernel source tree on GitHub, and each of Linus's commits is counted once for every one of those copies...
>
> That tells a lot. Btw, how did you know the way it counts?

Take a look at https://api.github.com/search/commits?q=author:torvalds and you will find the answer :)

@Rongronggg9
Contributor

> > The API counts commits roughly like this: it counts every commit by an author across all non-fork repositories, even when several of those commits share the same hash. So the answer is clear: there are tons of non-fork copies of the Linux kernel source tree on GitHub, and each of Linus's commits is counted once for every one of those copies...
>
> That tells a lot. Btw, how did you know the way it counts?

Or you can check https://github.com/search?q=author:torvalds directly and reach the same conclusion.

@Rongronggg9 Rongronggg9 linked a pull request Apr 2, 2022 that will close this issue
@Rongronggg9
Contributor

Rongronggg9 commented Apr 2, 2022

#1691

| current | my fork |
| --- | --- |
| torvalds_curr | torvalds_my |
| tiwai_curr | tiwai_my |
| jhedberg_curr | jhedberg_my |
| Icenowy_curr | Icenowy_my |

context: #564 (comment)
The search API is entirely unreliable, and we should never use it. Though such a traversal could potentially hit rate limits, we should still stop using the search API. Why? As you have already seen above, almost no Linux kernel contributor can use this project as it stands. That is absolutely unacceptable, isn't it? If we need to avoid being rate limited, we should consider using a database to cache users' commit counts from past years.
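
The caching idea could look roughly like the sketch below. It is an illustration under assumptions, not the code from the pull request: `totalCommitsWithCache` and `countForYear` are hypothetical names, a `Map` stands in for the database, and the lookup is kept synchronous for brevity even though a real API call would be asynchronous. It also assumes that counts for finished years rarely change.

```javascript
// Sum a user's commits year by year, reusing cached counts for past years.
// `countForYear(login, year)` is a hypothetical stand-in for an API lookup.
function totalCommitsWithCache(login, joinYear, currentYear, cache, countForYear) {
  let total = 0;
  for (let year = joinYear; year <= currentYear; year++) {
    const key = `${login}:${year}`;
    let count = cache.get(key);
    // Finished years are served from the cache when available;
    // the current year is always refreshed because it is still growing.
    if (count === undefined || year === currentYear) {
      count = countForYear(login, year);
      cache.set(key, count);
    }
    total += count;
  }
  return total;
}

// Usage with a fake lookup that reports 2000 commits for every year.
const cache = new Map();
let lookups = 0;
const fakeLookup = () => {
  lookups += 1;
  return 2000;
};

console.log(totalCommitsWithCache("someone", 2011, 2022, cache, fakeLookup)); // 24000
console.log(lookups); // 12 (one lookup per year on the first run)
console.log(totalCommitsWithCache("someone", 2011, 2022, cache, fakeLookup)); // 24000
console.log(lookups); // 13 (only the current year was fetched again)
```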

Even @anuraghazra and I are affected:

| current | my fork |
| --- | --- |
| Rongronggg9_curr | Rongronggg9_my |
| anuraghazra_curr | anuraghazra_my |

If you want to check if you are affected, my deployment is https://github-readme-stats-rongronggg9.vercel.app/api

5 participants