This repository has been archived by the owner on Apr 13, 2021. It is now read-only.

Have NetKAN bot collect download counts #67

Merged: 1 commit merged into KSP-CKAN:master from HebaruSan:feature/download-counts on Aug 24, 2018


HebaruSan
Member

@HebaruSan HebaruSan commented Aug 23, 2018

Motivation

KSP-CKAN/CKAN#2415 suggests using publicly available data from host APIs to display aggregate download counts in CKAN. If these counts were shown in a GUI column, users could sort by "popularity" to find major mods that they haven't tried before.

Changes

This pull request is a first step towards making that idea a reality on the infrastructure side.

Now, after the NetKAN bot finishes inflating all modules, it will generate a ~37 KB file at CKAN-meta/download_counts.json that looks like this (but without the whitespace):

{
    "10kmOmniEVA4RT": 5407,
    "2KASJ0523Planetarysystem": 1214,
    "64k-overhaul": 445,
    "64xscale128xdistance": 806,
    "ABCORS": 1832,
    "AblativeAirbrake": 24636,
    "ABLaunchers": 9013,
    "AbovetheSkySoyuzTMAandR7": 2285,
    "Achievements": 13541,
...
    "YellowGreenJool": 1433,
    "YongeTechTreesPlugin-Revived": 5256,
    "Zenit3SLBReduzx": 1525,
    "ZeroFighter1": 427,
    "ZeroMiniAVC": 994,
    "ZeroPointInlineFairings": 9828,
    "ZPEPropulsionSystem": 947,
    "ZPIF-4-Stock": 1708,
    "ZZZRadioTelescope": 9033
}
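
As a rough illustration of the "without the whitespace" output, here is a minimal Python sketch. It is not the bot's code (the bot is not written in Python); `serialize_counts` is a hypothetical helper, and the case-insensitive ordering is only inferred from the sample above.

```python
import json

def serialize_counts(counts):
    """Return compact JSON with identifiers sorted case-insensitively.

    The ordering is an assumption inferred from the sample listing,
    not something the pull request states.
    """
    ordered = dict(sorted(counts.items(), key=lambda kv: kv[0].lower()))
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    return json.dumps(ordered, separators=(",", ":"))

# Three entries taken from the sample listing
sample = {"ABLaunchers": 9013, "ABCORS": 1832, "AblativeAirbrake": 24636}
compact = serialize_counts(sample)
print(compact)
```

Dropping the whitespace matters here mostly because every client downloads the file as part of the CKAN-meta archive.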

This file is then added, committed, and pushed to the CKAN-meta repo. Since clients download the master.tar.gz of CKAN-meta when they update the registry, this will let a future pull request parse this JSON file into a new Dictionary<string, int> so the counts can be shown in the GUI.
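
The client-side half of this idea could look roughly like the sketch below. The real client would be C# and use a Dictionary<string, int>; a Python dict stands in for it here, and `load_download_counts` and `most_popular` are hypothetical names, not CKAN APIs.

```python
import json

def load_download_counts(text):
    """Parse the file contents into a mapping of identifier -> count."""
    raw = json.loads(text)
    # Keep only integer counts so a malformed entry cannot break sorting
    return {
        name: count
        for name, count in raw.items()
        if isinstance(count, int) and not isinstance(count, bool)
    }

def most_popular(counts, limit=3):
    """Return the `limit` most-downloaded (identifier, count) pairs."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:limit]

# Entries taken from the sample listing in this pull request
text = '{"ABCORS":1832,"AblativeAirbrake":24636,"ABLaunchers":9013,"Achievements":13541}'
counts = load_download_counts(text)
top = most_popular(counts, limit=2)
print(top)
```

Sorting a GUI column by "popularity" then amounts to ordering modules by the looked-up count, with missing identifiers treated as having no data.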

Alternatives considered

In theory we could have the client collect this data itself, but it takes several minutes, and users probably don't want to wait for that. It also effectively requires a GitHub token to work.

We could store a download_count property in .ckan files, but I don't think we want to have the bot constantly updating these files as the counts change. I also think we don't want the counts to be version-specific, as they would be if they were in .ckan files.

Caveats

The download counting isn't perfect, but then what is?

  • Only mods on Curse, GitHub, and SpaceDock will have download counts, since these are the ones with API documentation that I know about. All others are omitted from the file, but could be added later if they have an API that provides download counts.
  • If a mod is on multiple supported hosts, only the one from its $kref is checked. So if your $kref points to SpaceDock, then your download count will only be your SpaceDock download count, with any downloads from Curse or GitHub not included.
  • Mods that share a GitHub repo will share the same count. asset_match isn't supported, but could be added in the future. Mods that share downloads will probably be impossible to separate.
  • Mods on GitHub that use x_netkan_github.use_source_archive will not be counted; we only add up values from the assets list of each release.
  • Mods on GitHub that have switched forks will not include the download counts from older forks. This will make some "classic" mods seem dramatically less popular (e.g., KER).
  • Four mods with meta netkans have syntax errors that jq can't handle. I submitted pull requests for 3 of them, but I think the 4th is valid syntax that jq doesn't support (// comments). These will also be excluded:
    1. Remove extra comma from netkan mmoench/KRnD#7
    2. Remove extra comma from netkan jrossignol/KerbalSports#5
    3. Remove extra comma from netkan jrossignol/Strategia#75
    4. https://raw.githubusercontent.com/InsaneDruid/Proton-M/master/ProtonMBreezeM.netkan
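
Two of the caveats above (only the $kref host is checked, and only GitHub release assets are summed) can be sketched as follows. This is an illustration under assumptions, not the merged Perl code: the `#/ckan/<host>/...` shape of a $kref and the `assets` / `download_count` field names of a GitHub releases response are assumed here, and both function names are hypothetical.

```python
SUPPORTED_HOSTS = {"curse", "github", "spacedock"}

def host_from_kref(kref):
    """Return the supported host named in a $kref, or None.

    Assumes a $kref looks like '#/ckan/<host>/<id...>'.
    """
    parts = kref.split("/")
    if len(parts) < 3 or parts[0] != "#" or parts[1] != "ckan":
        return None
    host = parts[2]
    return host if host in SUPPORTED_HOSTS else None

def github_asset_downloads(releases):
    """Sum download counts over the assets of every release.

    Releases without assets (for example, source-archive-only
    releases) contribute nothing, matching the caveat above.
    """
    total = 0
    for release in releases:
        for asset in release.get("assets", []):
            total += asset.get("download_count", 0)
    return total

# Made-up release data for illustration only
releases = [
    {"assets": [{"download_count": 10}, {"download_count": 5}]},
    {"assets": []},
    {"assets": [{"download_count": 7}]},
]
print(host_from_kref("#/ckan/spacedock/124"), github_asset_downloads(releases))
```

Because the host is chosen from the $kref alone, a mod mirrored on several hosts only ever reports one host's number.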

@techman83
Member

I'll take a close look next coffee break, but we already have all the JSON parsing + http tools available as part of the Perl stack. Is there a good reason to do this in bash?

@HebaruSan
Member Author

No, no reason, I just started it as a standalone script to see how far I could get with it, and I like bash for rapid prototyping. I'm more familiar with wget and jq than their Perl equivalents these days.

I'll take that as a recommendation to rewrite in Perl...

@techman83
Member

Yeah, it's a full package install. You can use cpanm + local::lib to do it; instructions are in the readme. I've gotta package it up with Docker, that'd make deployments much easier! Then you could also grab the Docker image for testing instead.

@techman83
Member

I'd probably do something along the lines of the Status module if I were to reimplement it, but I'm guessing you were going for a quick win 🙂 (which is totally fair)

@HebaruSan
Member Author

Ahh, the readme, of course!

@techman83
Member

Hmm.

touch Changes

in the repo. Dist::Zilla (dzil) is super heavyweight, but it makes release management a breeze. The install will take care of installing the files into bin, and from memory I'm pretty sure local::lib sorts out the paths.

@HebaruSan
Member Author

Perl port completed! I'm going to go back and delete some of my comments about installing and errors and tests and so forth...

@coveralls

coveralls commented Aug 24, 2018

Coverage Status

Coverage decreased (-1.3%) to 90.402% when pulling 50e0b11 on HebaruSan:feature/download-counts into 75e4c35 on KSP-CKAN:master.

@HebaruSan HebaruSan force-pushed the feature/download-counts branch 2 times, most recently from c43e091 to dafb2d7 on August 24, 2018 03:20
Member

@techman83 techman83 left a comment

This is a really neat addition, I can't see anything to prevent merging. Awesome work!


method _build__http {
    return HTTP::Tiny->new(timeout => 15);
}
Member

We do have our own wrapped HTTP::Tiny, but considering the utility I'm not overly stressed.

@techman83 techman83 merged commit ab2a748 into KSP-CKAN:master Aug 24, 2018
@HebaruSan HebaruSan deleted the feature/download-counts branch August 24, 2018 05:17
@yalov

yalov commented Aug 25, 2018

Counting all downloads makes this less a measure of "popularity" and more a measure of how often a mod was updated and how many bug-fixing releases there were.

Maybe count popularity as the maximum of the downloads of the last 2 versions?

That would neutralize the above problem, as well as the drop right after a fresh release, and would get KER back in the running,
but then some popular, rarely updated mods would completely dominate (RSS, Ven's):
the latest Ven's Revamp, 1.9.6 for 1.2.2, has 111k downloads on GitHub.
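
To make the trade-off concrete, here is a small Python sketch comparing the all-time total with the proposed "maximum of the last 2 versions" metric. The function names are hypothetical, and the per-release numbers are invented for illustration, apart from the 111k figure quoted above.

```python
def total_downloads(per_release):
    """All-time total, as the merged pull request computes it."""
    return sum(per_release)

def recent_peak(per_release, window=2):
    """Highest download count among the last `window` releases.

    `per_release` is ordered oldest first.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = per_release[-window:]
    return max(recent) if recent else 0

# Invented numbers: a frequently updated mod whose newest release just came out
frequent = [900, 1100, 1000, 950, 1050, 40]
# A rarely updated mod with one long-lived release (111k, as quoted above)
rare = [111000]

print(total_downloads(frequent), recent_peak(frequent))
print(total_downloads(rare), recent_peak(rare))
```

Taking the maximum over two releases keeps a just-published release from dragging the number down, but a single long-lived release still outweighs everything else, which is the domination problem described above.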

@HebaruSan
Member Author

HebaruSan commented Aug 25, 2018

The plurality of mods are hosted on SpaceDock, which only provides one overall count value, sample:

https://spacedock.info/api/mod/124
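
A minimal sketch of reading that single value, without making a network request. The `downloads` field name is an assumption about the response shape, the payload below is made up rather than copied from the URL above, and `spacedock_downloads` is a hypothetical helper.

```python
import json

def spacedock_downloads(response_text):
    """Return the overall download count from a mod response, or None.

    Assumes the response is a JSON object with an integer
    'downloads' field covering all versions of the mod.
    """
    data = json.loads(response_text)
    count = data.get("downloads")
    return count if isinstance(count, int) else None

# Made-up payload for illustration; not real SpaceDock data
sample = '{"id": 124, "name": "Example Mod", "downloads": 4321}'
print(spacedock_downloads(sample))
```

With only one overall number per mod, a per-version metric cannot be computed for SpaceDock-hosted mods.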

@HebaruSan
Member Author

HebaruSan commented Aug 26, 2018

@techman83, did this get deployed to the server? The file hasn't shown up yet, but the bot otherwise appears to be functioning normally, so if the new code is running then I have some debugging to do...

@techman83
Member

Hasn't been deployed yet, I wasn't at home at all over the weekend. I'll take a look this arvo if I get a moment.
