
Purge downloads that failed to index from Netkan cache #2526

Merged: 1 commit from feature/netkan-purge-on-fail into KSP-CKAN:master on Oct 7, 2018


HebaruSan (Member)

Background

CKAN and Netkan each have a download cache, which is a folder on disk containing files with names like this:

  • C43B5474-BasicDeltaV-3.0.zip
  • A77B71AE-netkan-CraftManager.zip

The 8 hexadecimal digits are a portion of the hash of the origin URL. When we attempt to access any URL, we first calculate that hash and look for a matching file in the cache, and if found, we use that file instead of downloading it again.
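For illustration, the naming and lookup scheme described above can be sketched in Python. This is not CKAN's code (CKAN is written in C#). The use of SHA-1, the uppercase hex digits, the example URL, and the helper names `url_hash_prefix`, `cached_path` and `store` are all assumptions made for the example.

```python
import hashlib
import os
import tempfile
from typing import Optional


def url_hash_prefix(url: str) -> str:
    # First 8 hex digits of a hash of the origin URL (SHA-1 assumed here).
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:8].upper()


def cached_path(cache_dir: str, url: str) -> Optional[str]:
    # Look for any cached file whose name starts with this URL's prefix.
    prefix = url_hash_prefix(url) + "-"
    for name in os.listdir(cache_dir):
        if name.startswith(prefix):
            return os.path.join(cache_dir, name)
    return None


def store(cache_dir: str, url: str, filename: str, data: bytes) -> str:
    # Save a download as "<HASH>-<filename>", the same shape as
    # "C43B5474-BasicDeltaV-3.0.zip".
    path = os.path.join(cache_dir, url_hash_prefix(url) + "-" + filename)
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    url = "https://example.com/BasicDeltaV-3.0.zip"  # hypothetical URL
    with tempfile.TemporaryDirectory() as cache_dir:
        print(cached_path(cache_dir, url))  # None: not cached, so download it
        store(cache_dir, url, "BasicDeltaV-3.0.zip", b"...")
        print(cached_path(cache_dir, url))  # found: no second download
```

Note that nothing in this scheme ever expires an entry: once `store` has run for a URL, every later lookup for that URL is a cache hit.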

Problem

Currently the Netkan bot can get "stuck" if there is a problem with a download. As an example, this just happened with @linuxgurugamer's NIMBY mod; the 1.1.1.1 version of that mod contained one file called NIMBY.version and another called just .version, which caused this error in Netkan and prevented that release from being indexed:

Too many .version files located: GameData/NIMBY/.version, GameData/NIMBY/NIMBY.version
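Assuming Netkan recognizes version files by their `.version` extension, a file named just `.version` matches the same pattern as `NIMBY.version`, which is why two files were found. A hypothetical check along those lines is sketched below; the function names and the extra `NIMBY.dll` entry are illustrative, and Netkan's real C# check may differ in details such as case handling or which folders it searches.

```python
from typing import List, Optional


def find_version_files(paths: List[str]) -> List[str]:
    # A file named just ".version" also ends in ".version",
    # so it is matched alongside NIMBY.version.
    return [p for p in paths if p.endswith(".version")]


def single_version_file(paths: List[str]) -> Optional[str]:
    """Return the download's only .version file, or raise if there are several."""
    found = sorted(find_version_files(paths))
    if len(found) > 1:
        raise ValueError("Too many .version files located: " + ", ".join(found))
    return found[0] if found else None


if __name__ == "__main__":
    nimby_1_1_1_1 = [
        "GameData/NIMBY/.version",
        "GameData/NIMBY/NIMBY.version",
        "GameData/NIMBY/NIMBY.dll",
    ]
    try:
        single_version_file(nimby_1_1_1_1)
    except ValueError as exc:
        print(exc)
```

Running it prints a message in the same form as the error quoted above.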

However, when the author corrected the download, the problem persisted. This was because the Netkan bot was not re-downloading the fixed file, but instead retrieving the broken file from its cache and re-processing it.

This is a recurring problem that periodically requires us to ask @techman83 to delete specific files from the bot server. A more automated solution would be better.

#2337 is a related but different approach to this overall issue.

Cause

The cache object assumes that a successful Store action should last forever, and the only clean-up action Netkan takes when a module fails to index is to print an error message. So once a file is downloaded, that URL is never fetched again, even if there's a fatal problem with the file.

Changes

  • Now Netkan's CachingHttpService keeps track of all the URLs you request from it during the current run
  • Now Netkan's outermost exception handler uses that list to purge the cached files for those URLs when indexing fails

This ensures that if a module fails to index, its download is re-acquired on each subsequent pass until indexing succeeds. In the NIMBY example, the broken download would have been purged from the cache after the failed pass, so the fixed file would have been downloaded and indexed on the next one.
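The change can be sketched in Python as follows. `CachingHttpService` is the name of the real Netkan class, but everything else here is hypothetical: the methods `download` and `purge_requested`, the `run_pass` helper standing in for Netkan's outermost exception handler, SHA-1 as the URL hash, and the injected `fetch` function that replaces the real HTTP download.

```python
import hashlib
import os
import tempfile
from typing import Callable, Set


class CachingHttpService:
    """Hypothetical sketch: a URL-keyed download cache that remembers which
    URLs were requested during the current run so they can be purged."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch = fetch  # stand-in for the real HTTP download
        self.requested_urls: Set[str] = set()

    def _path(self, url: str) -> str:
        # 8-hex-digit URL hash prefix (SHA-1 is an assumption) + file name
        prefix = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8].upper()
        return os.path.join(self.cache_dir, prefix + "-" + os.path.basename(url))

    def download(self, url: str) -> bytes:
        self.requested_urls.add(url)  # track every URL touched this run
        path = self._path(url)
        if os.path.exists(path):  # cache hit: reuse the stored file
            with open(path, "rb") as f:
                return f.read()
        data = self.fetch(url)  # cache miss: download and store
        with open(path, "wb") as f:
            f.write(data)
        return data

    def purge_requested(self) -> None:
        # Delete the cached copy of every URL requested during this run,
        # whether it was freshly downloaded or served from the cache.
        for url in self.requested_urls:
            path = self._path(url)
            if os.path.exists(path):
                os.remove(path)
        self.requested_urls.clear()


def run_pass(http: CachingHttpService, url: str,
             index: Callable[[bytes], str]) -> str:
    """One bot pass; the except clause plays the outermost exception handler."""
    try:
        return index(http.download(url))
    except Exception as exc:
        http.purge_requested()  # new behavior: forget this run's downloads
        return "error: " + str(exc)


if __name__ == "__main__":
    url = "https://example.com/NIMBY-1.1.1.1.zip"  # hypothetical URL
    upstream = {url: b"broken"}

    def index(data: bytes) -> str:
        if data == b"broken":
            raise ValueError("Too many .version files located")
        return "indexed " + data.decode()

    with tempfile.TemporaryDirectory() as cache_dir:
        print(run_pass(CachingHttpService(cache_dir, upstream.__getitem__), url, index))
        upstream[url] = b"fixed"  # the author re-uploads a corrected file
        print(run_pass(CachingHttpService(cache_dir, upstream.__getitem__), url, index))
```

The demo mirrors the NIMBY case: the first pass fails and purges the cached file, so once the author fixes the upload, the second pass downloads the corrected file instead of reusing the broken one. Without the `purge_requested()` call, the second pass would hit the cache and fail again. A fresh service is created for each pass because the tracking described above only covers URLs requested during the current run.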

Known limitations

Note that this does not fully solve the "stuck in bot's cache" problem. Specifically, if a problem occurs that does not prevent a module from being indexed, such as incorrect game version info, that download will still persist in the cache and will not be re-downloaded (unless it's from GitHub, as per #2337). This pull request only helps in cases where some part of the Netkan process throws an exception.

@HebaruSan added the Enhancement, Pull request, and Netkan labels on Sep 27, 2018
@politas merged commit 254cf8b into KSP-CKAN:master on Oct 7, 2018
@HebaruSan deleted the feature/netkan-purge-on-fail branch on October 7, 2018, 01:07