Badge request: Total PyPI downloads #4319

Closed
JakobDev opened this issue Nov 11, 2019 · 14 comments · Fixed by #9564
Labels
needs-upstream-help (Not actionable without help from a service provider), service-badge (Accepted and actionable changes, features, and bugs)

Comments

@JakobDev
Contributor

It would be nice if I could get the total number of downloads from PyPI since the project was first uploaded, not only for a specific period.

@JakobDev added the service-badge (Accepted and actionable changes, features, and bugs) label Nov 11, 2019
@calebcartwright
Member

calebcartwright commented Nov 21, 2019

We'd love to be able to support all downloads from PyPI like we do for many of our other download-related badges. However, the challenge is that the upstream API we use to get download stats from PyPI does not support the all-time download count in the API response.

In order to support this, there'd need to be a viable endpoint where Shields could feasibly retrieve the data. PyPI Stats does provide an API that returns a massive response with the download count for each day since the package was first published, but I suspect that would be problematic for Shields due to having to aggregate the data. There's also the raw data in GCP, though I'm not sure how feasible that would be for Shields either.
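
To give a sense of the aggregation involved, here is a minimal sketch that sums per-day rows into all-time totals. It assumes the per-day response is shaped like PyPI Stats' `overall` endpoint, i.e. a `data` list of objects with `category`, `date`, and `downloads` keys; the function name `total_downloads_by_category` and the sample rows are hypothetical.

```python
import json
from collections import defaultdict


def total_downloads_by_category(payload: str) -> dict[str, int]:
    """Sum daily download rows into one all-time total per category.

    Assumes a JSON body of the form
    {"data": [{"category": ..., "date": ..., "downloads": ...}, ...]}.
    """
    totals: dict[str, int] = defaultdict(int)
    for row in json.loads(payload)["data"]:
        totals[row["category"]] += row["downloads"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample: two days, reported with and without mirrors.
    sample = json.dumps({
        "data": [
            {"category": "with_mirrors", "date": "2019-11-10", "downloads": 120},
            {"category": "without_mirrors", "date": "2019-11-10", "downloads": 100},
            {"category": "with_mirrors", "date": "2019-11-11", "downloads": 90},
            {"category": "without_mirrors", "date": "2019-11-11", "downloads": 75},
        ]
    })
    print(total_downloads_by_category(sample))
    # {'with_mirrors': 210, 'without_mirrors': 175}
```

The summation itself is a single linear pass; the concern is rather that the response grows with every day since first publication and would have to be downloaded and parsed on each badge request.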

If anyone is interested in seeing this implemented, a great way to help would be to try to find such an endpoint!

@paulmelnikow changed the title from "Get Downloads from PyPI without Period" to "Badge request: Total PyPI downloads" Apr 5, 2020
@paulmelnikow added the needs-upstream-help (Not actionable without help from a service provider) label Apr 5, 2020
@Akul2010

How about this website?
https://www.pepy.tech/

@chris48s
Member

chris48s commented Aug 17, 2023

For reference, here's the pepy JSON endpoint: https://api.pepy.tech/api/v2/projects/django

The problem with getting our day/week/monthly stats from pypistats and using pepy for the total is that they count slightly different things. Pypistats presents summary statistics only for downloads excluding mirrors, while Pepy only provides stats that include mirrors, so the two aren't a like-for-like comparison. For some packages, they can be quite different.
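
A minimal sketch of reading an all-time total from a pepy-style project response and quantifying how far a with-mirrors count sits from a without-mirrors count for the same period. The `total_downloads` field name is an assumption about pepy's v2 response shape, and `pepy_total`, `mirror_gap_percent`, and the sample numbers are hypothetical.

```python
import json


def pepy_total(payload: str) -> int:
    """Read the all-time total from a pepy-style project response.

    Assumption: the body contains a top-level "total_downloads" integer.
    """
    return int(json.loads(payload)["total_downloads"])


def mirror_gap_percent(with_mirrors: int, without_mirrors: int) -> float:
    """How much larger the with-mirrors count is, as a percentage."""
    if without_mirrors == 0:
        raise ValueError("without_mirrors must be non-zero")
    return 100.0 * (with_mirrors - without_mirrors) / without_mirrors


if __name__ == "__main__":
    # Hypothetical response body and without-mirrors figure.
    body = json.dumps({"id": "example-package", "total_downloads": 1150})
    total = pepy_total(body)
    print(total, f"{mirror_gap_percent(total, 1000):.1f}%")  # 1150 15.0%
```

A non-zero gap here reflects the different counting rules rather than an error in either service.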

In retrospect, I think we should have made the existing pypistats badges /pypistats/(dm|dw|dd) rather than /pypi/(dm|dw|dd). That would have kind of made it easier to just add pepy as another service. There isn't really a good reason why one source is any more valid than the other. They're both third parties making slightly different assumptions over the same source data. For historical reasons we kind of blessed pypistats as the "official" one, but that ship has now sailed.

Another consideration: the PyPI downloads badges get a lot of traffic. We know we are the single largest source of traffic to pypistats. In the last hour we sent over 8,000 requests their way, but that is by no means peak. Pepy is a volunteer-run service, and they've indicated in psincraian/pepy#477 that although they are happy for people to use their API, they may not be able or willing to handle a large amount of traffic. Given that, I also wouldn't want to switch completely from pypistats to pepy for this data. We know pypistats can handle the traffic we throw their way reasonably reliably.

So I think there are a few different ways we could go with this...

  1. Mix and match 1: Add a "PyPI Total Downloads" badge using pepy. Change nothing else. Accept that the day/week/monthly badges are counting a slightly different thing from the total downloads badge. Maybe it doesn't matter and I should just stop being a pedant about mirrors and move on.
  2. Mix and match 2: Add a "PyPI Total Downloads" badge using pepy. Switch to using a different API endpoint on pypistats to include mirrors on the day/week/monthly badges so they are more comparable. I haven't tested this, but I think if we switched from using https://pypistats.org/api/packages/django/recent to using https://pypistats.org/api/packages/django/overall we could assemble with_mirrors totals. This would be conceptually more consistent, but it's a bigger API response to download and parse each time, and we'd have to sum the numbers up ourselves. Not the end of the world, but as I say, we serve a lot of these badges.
  3. Add pepy as a separate service.
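
A rough sketch of what option 2 would involve: keep only the `with_mirrors` rows from the daily `overall` data and sum trailing windows. The row shape, the `recent_with_mirrors` name, and the 1/7/30 calendar-day window definitions are assumptions and have not been checked against the exact semantics of the pypistats `recent` endpoint.

```python
import json
from datetime import date, timedelta


def recent_with_mirrors(payload: str) -> dict[str, int]:
    """Derive last-day/week/month totals (with mirrors) from daily rows.

    Assumptions: rows look like {"category": ..., "date": "YYYY-MM-DD",
    "downloads": ...}, and each window covers the most recent N calendar
    days ending on the newest date present in the data.
    """
    daily: dict[date, int] = {}
    for row in json.loads(payload)["data"]:
        if row["category"] == "with_mirrors":
            day = date.fromisoformat(row["date"])
            daily[day] = daily.get(day, 0) + row["downloads"]
    if not daily:
        return {"last_day": 0, "last_week": 0, "last_month": 0}
    newest = max(daily)

    def window(days: int) -> int:
        # Sum rows whose date falls within `days` calendar days of `newest`.
        return sum(n for d, n in daily.items() if (newest - d).days < days)

    return {"last_day": window(1), "last_week": window(7), "last_month": window(30)}


if __name__ == "__main__":
    # Hypothetical data: 40 consecutive days, 10 downloads with mirrors, 8 without.
    start = date(2023, 8, 1)
    rows = [
        {"category": cat, "date": (start + timedelta(days=i)).isoformat(),
         "downloads": 10 if cat == "with_mirrors" else 8}
        for i in range(40)
        for cat in ("with_mirrors", "without_mirrors")
    ]
    print(recent_with_mirrors(json.dumps({"data": rows})))
    # {'last_day': 10, 'last_week': 70, 'last_month': 300}
```

Every badge request would still pay for fetching and filtering the full daily series, which is the performance cost weighed against option 1.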

I think all in all, I'm in favour of 1. It is the simplest and most performant option, even if there is a bit of an apples and oranges comparison going on there. Anyone else got strong opinions on this?

Given the potential amount of traffic involved, I think I'd still want to open an issue on https://github.com/psincraian/pepy before someone works on a PR to add this. As I say:

  • they are a volunteer-run service
  • this specific badge has the potential to become quite popular
  • although I don't think they explicitly rate limit, psincraian/pepy#477 ("[Document] pepy api endpoint's") indicates that a large amount of traffic could be a problem

@Borda

Borda commented Aug 18, 2023

Quoting @Akul2010: "How about this website? pepy.tech"

We have been using it, but it quite often returns 404 :(

@Akul2010

@chris48s Not that I'm the biggest expert on JavaScript, but can't you just take the number from the part of the project page that says "Total downloads"?

(Showing what I'm talking about for one of my own packages.)
[Screenshot 2023-08-18 155530]

@calebcartwright
Member

Quoting @Akul2010: "@chris48s Not that I'm the biggest expert on JavaScript, but can't you just take the number from the part of the project page that says 'Total downloads'?"

We don't screen-scrape websites to get data, for myriad reasons. We need a well-formed API from which to get the data.

@chris48s
Member

Just to be clear: Getting the data is not one of the issues here. Pepy exposes a json API https://api.pepy.tech/api/v2/projects/rlvoice-1

@calebcartwright
Member

Quoting @chris48s: "Just to be clear: Getting the data is not one of the issues here. Pepy exposes a json API https://api.pepy.tech/api/v2/projects/rlvoice-1"

Unclear if this was intended as a response to my prior comment, @chris48s, but in case it was: my comment was specifically in response to my understanding of @Akul2010's comment in #4319 (comment), which suggested an alternative way (screen scraping) of getting the download data instead of the specific APIs we've discussed on this issue (pepy and pypistats), due to some of the limitations and tradeoffs that exist with those.

@chris48s
Member

I opened an issue over on pepy about API usage psincraian/pepy#573

@WenjieDu

Hey, I just noticed your discussion here. I want to raise another point for discussion: the current PePy API doesn't provide any abbreviated form of the total downloads. For example, https://api.pepy.tech/api/v2/projects/pypots reports the total downloads of the pypots package as 26159 rather than 26k. Although we can use data from pepy to build dynamic JSON badges with Shields, for packages with a large number of downloads the badges can get very wide. Could we ask PePy to provide an abbreviated download count?

@calebcartwright
Member

@WenjieDu - if you'd like a feature implemented in a service like PePy then that's best directed to that service/platform, as no such change/decision could be made by the Shields team.

However, that's not something Shields actually needs or would really want. We have a strong preference for the APIs to return the raw, unabbreviated data points so that we can apply our own standard abbreviation/rounding logic for consistency across our other badges.
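
For illustration only, here is a simplified sketch of metric-style abbreviation applied to a raw count (26159 becomes 26k), the kind of formatting a badge service can apply uniformly when an API returns unabbreviated numbers. This is not Shields' actual formatter; the name `abbreviate` and its rules (one decimal below 10 of a unit, dropped when it would be .0; promotion to the next prefix when rounding reaches 1000) are assumptions. Python's `round()` rounds halves to even, so tie cases may differ from other implementations.

```python
def abbreviate(n: int) -> str:
    """Abbreviate a non-negative count with a metric suffix (k, M, G, ...).

    Assumed rules: one decimal place when the scaled value is below 10
    (omitted if it would be .0), otherwise a whole number.
    """
    if n < 0:
        raise ValueError("counts are expected to be non-negative")
    prefixes = ["k", "M", "G", "T", "P", "E"]
    # Try the largest unit first so the result uses the biggest fitting prefix.
    for power in range(len(prefixes), 0, -1):
        limit = 1000 ** power
        if n >= limit:
            scaled = n / limit
            if scaled < 10:
                one_decimal = f"{scaled:.1f}"
                if not one_decimal.endswith("0"):
                    return one_decimal + prefixes[power - 1]
            rounded = round(scaled)
            if rounded >= 1000 and power < len(prefixes):
                # e.g. 999_999 would read "1000k"; promote to the next prefix.
                return f"1{prefixes[power]}"
            return f"{rounded}{prefixes[power - 1]}"
    return str(n)


if __name__ == "__main__":
    print(abbreviate(26159))  # 26k
```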

@WenjieDu

@calebcartwright So do you provide "your own standard abbreviation/rounding logic" as a parameter or an argument to let users round raw numbers in your service like https://shields.io/badges/dynamic-json-badge? I didn't find it in your docs. Maybe you can help me with it? Thanks.

@calebcartwright
Member

Quoting @WenjieDu: "@calebcartwright So do you provide 'your own standard abbreviation/rounding logic' as a parameter or an argument to let users round raw numbers in your service like https://shields.io/badges/dynamic-json-badge? I didn't find it in your docs. Maybe you can help me with it? Thanks."

This is quickly getting off topic of the PyPI badge request of this issue, but the answer is "No".

Use the Custom Endpoint Badge if you want that level of control over the message value, especially in cases where the dynamic badge query doesn't provide the transformation functions/utilities you need (e.g. #6071).
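
As a sketch of that approach: the Endpoint badge reads a small JSON document that you host, with `schemaVersion`, `label`, and `message` (plus optional fields such as `color`), so you decide exactly how the number is rendered. The helper name `endpoint_payload` and the thousands-separator formatting are hypothetical choices; hosting the JSON at a public URL is left out.

```python
import json


def endpoint_payload(total: int, label: str = "downloads", color: str = "blue") -> str:
    """Build a JSON body for the Shields Endpoint badge.

    The message formatting (thousands separators here) is the caller's own
    choice; the badge renders whatever string goes into "message".
    """
    return json.dumps({
        "schemaVersion": 1,
        "label": label,
        "message": f"{total:,}",
        "color": color,
    })


if __name__ == "__main__":
    print(endpoint_payload(26159))
    # {"schemaVersion": 1, "label": "downloads", "message": "26,159", "color": "blue"}
```

The URL where this JSON is served is then passed, URL-encoded, to `https://img.shields.io/endpoint?url=...`.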

@chris48s
Member

A point that comes out of psincraian/pepy#573 (comment):

A total downloads based on pepy's number (including mirrors) isn't really "PyPI total downloads" - it is "python package total downloads" (most of those downloads being from PyPI but a small number being from not-PyPI).

I think that's leading me towards saying "python package downloads from pepy" shouldn't be /pypi/dt/:packageName.
