npm total statistic is wrong #1278

Open
OlegKi opened this issue Nov 19, 2017 · 7 comments
Labels
bug (Bugs in badges and the frontend), service-badge (Accepted and actionable changes, features, and bugs)

Comments

@OlegKi

OlegKi commented Nov 19, 2017

I'll explain the problem with an example. https://shields.io/ uses the "express" package as an example and shows the value 195M (see https://img.shields.io/npm/dt/express.svg) as the total number of express downloads.

npm-provider.js uses the npm package download counts API to get the download statistics via a request like

GET https://api.npmjs.org/downloads/point/{period}[/{package}]

One can easily verify that https://api.npmjs.org/downloads/point/2015-01-01:2017-12-31/express returns (as of today)

{"downloads":194785100,"start":"2016-05-20","end":"2017-11-20","package":"express"}

and 194785100 corresponds to the 195M displayed in the badge https://img.shields.io/npm/dt/express.svg. The problem is the assumption that "2016-05-20" is really the starting point of all downloads. In fact, the requested start was 2015-01-01, and the API silently moved it forward. One can make another request, https://api.npmjs.org/downloads/point/2014-01-01:2016-05-19/express, which returns

{"downloads":57328706,"start":"2015-01-10","end":"2016-05-19","package":"express"}

(where "2016-05-19" is the day before "2016-05-20", the "start" of the previous response). One more request, https://api.npmjs.org/downloads/point/2014-01-01:2015-01-10/express, returns

{"downloads":0,"start":"2015-01-10","end":"2015-01-10","package":"express"}

i.e. 0 downloads. As a result, the total downloads in the interval from 2014-01-01 until today would be 194785100 + 57328706 = 252113806, so 252M rather than 195M would be the correct number of downloads for express.
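The arithmetic above can be checked with a short sketch that works only on the two JSON responses quoted in this comment. The helper name `is_truncated` is hypothetical (it is not part of Shields or the npm API); it simply compares the requested start date with the `start` field the API echoed back.

```python
import json
from datetime import date


def is_truncated(requested_start: date, response: dict) -> bool:
    """Return True when the API answered with a later start date than requested."""
    return date.fromisoformat(response["start"]) > requested_start


# The two responses quoted above, copied verbatim.
recent = json.loads(
    '{"downloads":194785100,"start":"2016-05-20","end":"2017-11-20","package":"express"}'
)
older = json.loads(
    '{"downloads":57328706,"start":"2015-01-10","end":"2016-05-19","package":"express"}'
)

# The first request asked for 2015-01-01 but the data starts at 2016-05-20.
print(is_truncated(date(2015, 1, 1), recent))

# Summing both windows gives the figure the badge would be expected to show.
total = recent["downloads"] + older["downloads"]
print(total)
```

A check like this would at least make it possible to notice when a single request did not cover the whole requested period.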

For reference, one can open the page https://npm-stat.com/charts.html?package=express&from=2014-01-01&to=2017-11-19 and see exactly the same value.

@platan
Member

platan commented Nov 19, 2017

The Limits section of the npm package download counts API documentation states:

All other queries are limited to at most 18 months of data. The earliest date for which data will be returned is January 10, 2015.

This case is similar to #672

I can see two options:

  • load all data - currently we would have to send 2 requests, but over time (every 18 months) the number of requests will increase
  • do not change the npm total downloads badge behaviour, but add a clarification at https://shields.io/ of what "total downloads" really means; this will only inform users who add badges to their sites.
    We could try to add some explanation to badges generated at https://shields.io/ by adding titles. Markdown supports titles, e.g.
    [![npm](https://img.shields.io/npm/dt/express.svg "last 18 months")]() -> npm

@OlegKi
Author

OlegKi commented Nov 19, 2017

Thanks for the quick response!

I think that loading all the data in 2 (or, later, more) requests would be the most interesting option for users. There is already a "last year" button, https://img.shields.io/npm/dy/express.svg, which corresponds to the request https://api.npmjs.org/downloads/point/2016-11-20:2017-11-20/express. I don't think that "downloads in the last 18 months" would be more interesting from the users' point of view than the already existing "last year" button.

The Limits section says

All other queries are limited to at most 18 months of data. The earliest date for which data will be returned is January 10, 2015.

Then one could divide the time interval from today back to "2015-01-10" into 18-month intervals and sum all the results. It's not really more complex than implementing a single request. Currently 2 requests are enough (like https://api.npmjs.org/downloads/point/2015-01-10:2016-06-30/express and https://api.npmjs.org/downloads/point/2016-07-01:2017-11-20/express), but later 3 or more requests would be required. One could optionally add the tooltip (as you suggested) saying that the total count takes into consideration only the interval starting with January 10, 2015.
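The splitting described above could look roughly like the following sketch. It is not the Shields implementation: `windows`, `total_downloads`, `fetch_point` and the constant `MAX_DAYS` are hypothetical names, and the 540-day window is an assumption chosen to stay under the documented 18-month limit. Only the URL shape and the January 10, 2015 earliest date come from this thread.

```python
import json
import urllib.request
from datetime import date, timedelta
from typing import Callable, Iterator, Tuple

EARLIEST = date(2015, 1, 10)  # earliest date with data, per the Limits text quoted above
MAX_DAYS = 540                # assumption: safely shorter than 18 calendar months


def windows(start: date, end: date, max_days: int = MAX_DAYS) -> Iterator[Tuple[date, date]]:
    """Yield consecutive, non-overlapping, inclusive (start, end) windows covering start..end."""
    cursor = max(start, EARLIEST)
    while cursor <= end:
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


def fetch_point(package: str, start: date, end: date) -> int:
    """Fetch one window from the point endpoint (performs a real HTTP request)."""
    url = (
        "https://api.npmjs.org/downloads/point/"
        f"{start.isoformat()}:{end.isoformat()}/{package}"
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["downloads"]


def total_downloads(
    package: str,
    start: date,
    end: date,
    fetch: Callable[[str, date, date], int] = fetch_point,
) -> int:
    """Sum the per-window counts; `fetch` is injectable so the logic can be tested offline."""
    return sum(fetch(package, s, e) for s, e in windows(start, end))
```

Calling `total_downloads("express", date(2014, 1, 1), date(2017, 11, 20))` would issue two requests with these settings; the number of requests grows by one for every additional `MAX_DAYS` of history.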

@paulmelnikow
Member

Good points all around.

Some other options we could consider:

  1. Cache it
  2. Ask npm for a static dump of the annual historical data
  3. Run a job once to fetch all the historical data and put it on s3 or something
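As a rough illustration of the caching option, the sketch below stores counts for date ranges that lie entirely in the past and only re-fetches a range that still includes today. It assumes that counts for past ranges no longer change, which this thread does not confirm. `HistoricalCache` and its `fetch` callback are hypothetical names, and an in-memory dict stands in for whatever store (S3, a file, a database) would really be used.

```python
from datetime import date
from typing import Callable, Dict, Tuple

Key = Tuple[str, date, date]


class HistoricalCache:
    """Cache download counts for ranges that ended before today."""

    def __init__(self, fetch: Callable[[str, date, date], int]) -> None:
        self._fetch = fetch
        self._store: Dict[Key, int] = {}

    def get(self, package: str, start: date, end: date, today: date) -> int:
        # A range that reaches today is still accumulating downloads: never cache it.
        if end >= today:
            return self._fetch(package, start, end)
        key = (package, start, end)
        if key not in self._store:
            # Assumption: a fully past range is immutable, so one fetch is enough.
            self._store[key] = self._fetch(package, start, end)
        return self._store[key]
```

With a cache like this, a "total" badge would cost one upstream request per badge request once the historical ranges are warm, instead of one request per 18-month window.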

paulmelnikow added the bug and service-badge labels Nov 19, 2017
OlegKi added a commit to free-jqgrid/jqGrid that referenced this issue Nov 19, 2017
The "total npm downloads" badge shows a wrong value. I posted the bug report as [the issue #1278](badges/shields#1278). At least until the problem is fixed, I'll use the "npm downloads per month" badge.
@astorije

Hey there!

I'm just seeing this for thelounge, where the npm badge incorrectly says 76k total and npm-stat.com correctly says 102k total.

I think making a couple requests (I'm aware it will be one more request per 18 months) or caching historical data is preferable to inaccurate reports.
Since this project is a well-trusted project, an alternative could be to politely ask npm to whitelist an exception for your specific domain. Who knows, maybe they'd be open to that.

@paulmelnikow
Member

Making multiple requests for every request is not a great option. We do need the data to be cached, or prefetched, in some way.

@brandon-d-mckay

> Making multiple requests for every request is not a great option. We do need the data to be cached, or prefetched, in some way.

Having an incorrect count is probably a worse option...

@paulmelnikow
Member

Just wanted to put out there that Shields is a community project, and that means you can help make this happen! 👐

  • Ask npm for a static dump of the annual historical data
  • Run a job once to fetch all the historical data and put it on s3 or something

These approaches, for example, could easily be taken on independently of Shields. We could definitely point our implementations at them. Our maintainer team is happy to provide guidance on the API!

luiscarbonell added a commit to liquidcarrot/carrot that referenced this issue Jun 10, 2019
NPM and Shields.io are having bugs with their download stats and it's throwing our stats out of whack:

* badges/shields#1278

5 participants