Purge downloads that failed to index from Netkan cache #2526

HebaruSan · 2018-09-27T23:35:14Z

Background

CKAN and Netkan each have a download cache, which is a folder on disk containing files with names like this:

C43B5474-BasicDeltaV-3.0.zip
A77B71AE-netkan-CraftManager.zip

The 8 hexadecimal digits are a portion of the hash of the origin URL. When we attempt to access any URL, we first calculate that hash and look for a matching file in the cache, and if found, we use that file instead of downloading it again.
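The lookup described above can be sketched roughly as follows. This is an illustration only, not CKAN's actual code: the function names `url_hash_prefix` and `find_cached` are hypothetical, and the choice of SHA-1 as the hash is an assumption, since the description only says the prefix is "a portion of the hash of the origin URL".

```python
import hashlib
from pathlib import Path
from typing import Optional


def url_hash_prefix(url: str) -> str:
    """Return 8 uppercase hex digits derived from the URL.

    Assumption: SHA-1 is used here purely for illustration.
    """
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:8].upper()


def find_cached(cache_dir: Path, url: str) -> Optional[Path]:
    """Return a cached file whose name starts with the URL's hash prefix, or None."""
    prefix = url_hash_prefix(url)
    # Cached files are named "<PREFIX>-<description>", e.g. "C43B5474-BasicDeltaV-3.0.zip"
    matches = sorted(cache_dir.glob(f"{prefix}-*"))
    return matches[0] if matches else None
```

A caller would check `find_cached` first and download only when it returns `None`. Because the lookup is keyed on the URL alone, a file replaced at the same URL is never noticed.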

Problem

Currently the Netkan bot can get "stuck" if there is a problem with a download. As an example, this just happened with @linuxgurugamer's NIMBY mod; the 1.1.1.1 version of that mod contained one file called NIMBY.version and another called just .version, which caused this error in Netkan and prevented that release from being indexed:

Too many .version files located: GameData/NIMBY/.version, GameData/NIMBY/NIMBY.version

However, when the author corrected the download, the problem persisted. This was because the Netkan bot was not re-downloading the fixed file, but instead retrieving the broken file from its cache and re-processing it.

This is a recurring problem that periodically requires us to ask @techman83 to delete specific files from the bot server. A more automated solution would be better.

#2337 is a related but different approach to this overall issue.

Cause

The cache object assumes that a successful Store action should last forever, and the only clean-up action that Netkan takes on failure to index is to print an error message. So once a file is downloaded, we'll never re-acquire that URL, even if there's a fatal problem with the file.

Changes

Now Netkan's CachingHttpService keeps track of all the URLs you request from it during the current run
Now Netkan's outermost exception handler uses that list to purge the files it downloaded on failure

This will ensure that if a module fails to index, its download will be re-acquired on subsequent passes until it finally succeeds. In the NIMBY example, the broken download would have been purged from the cache after the failure, so the fixed file would have been acquired and indexed on a later pass.
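The two changes can be sketched together as below. This is a minimal model under stated assumptions, not the real `CachingHttpService`: the class `TrackingCache`, the function `run`, the injected `fetch` and `index` callables, the SHA-1 file naming, and the `-download` file suffix are all hypothetical and exist only for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Set


class TrackingCache:
    """Hypothetical stand-in for a caching HTTP service that remembers requested URLs."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # injected downloader, so the sketch needs no network
        self.requested: Set[str] = set()

    def _path(self, url: str) -> Path:
        # Assumption: SHA-1 prefix naming, for illustration only
        prefix = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8].upper()
        return self.cache_dir / f"{prefix}-download"

    def download(self, url: str) -> Path:
        """Record the URL, then return the cached file, fetching it if absent."""
        self.requested.add(url)
        path = self._path(url)
        if not path.exists():
            path.write_bytes(self.fetch(url))
        return path

    def purge_requested(self) -> None:
        """Delete every file requested during this run."""
        for url in self.requested:
            self._path(url).unlink(missing_ok=True)  # requires Python 3.8+
        self.requested.clear()


def run(cache: TrackingCache, url: str, index: Callable[[Path], None]) -> bool:
    """Outermost handler: purge this run's downloads if indexing throws."""
    try:
        index(cache.download(url))
        return True
    except Exception as exc:
        print(f"Indexing failed: {exc}")
        cache.purge_requested()
        return False
```

In this model a failed run leaves nothing behind for the URL, so the next pass fetches it again, while a successful run keeps the file cached as before.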

Known limitations

Note that this does not fully solve the "stuck in bot's cache" problem. Specifically, if a problem occurs that does not prevent a module from being indexed, such as incorrect game version info, such a download will still persist and not be re-downloaded (unless it's from GitHub as per #2337). This pull request only helps in cases where some part of the Netkan process throws an exception.

Purge downloads that failed to index from Netkan cache

254cf8b

HebaruSan added the Enhancement, Pull request, and Netkan (Issues affecting the netkan data) labels Sep 27, 2018

politas merged commit 254cf8b into KSP-CKAN:master Oct 7, 2018

politas added a commit that referenced this pull request Oct 7, 2018

Merge #2526 Purge downloads that failed to index from Netkan cache

114a227

HebaruSan deleted the feature/netkan-purge-on-fail branch October 7, 2018 01:07

DasSkelett mentioned this pull request Jun 28, 2021

Might have to clear cache on RP-1 KSP-CKAN/NetKAN#8603

Closed
