Purge downloads that failed to index from Netkan cache #2526
CKAN and Netkan each have a download cache, which is a folder on disk containing files with names like this:
The 8 hexadecimal digits are a portion of the hash of the origin URL. When we attempt to access any URL, we first calculate that hash and look for a matching file in the cache, and if found, we use that file instead of downloading it again.
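As a rough illustration of the lookup described above, here is a minimal Python sketch (not the actual CKAN code). The choice of SHA-1, the uppercase formatting, and the names `url_hash`, `find_cached`, and `store` are all assumptions made for the example; the only detail taken from the description is that the file name starts with 8 hexadecimal digits derived from a hash of the origin URL.

```python
import hashlib
from pathlib import Path
from typing import Optional


def url_hash(url: str) -> str:
    """First 8 hex digits of a hash of the URL (SHA-1 is an assumption here)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:8].upper()


def find_cached(cache_dir: Path, url: str) -> Optional[Path]:
    """Return a cached file whose name starts with the URL's hash, if one exists."""
    matches = sorted(cache_dir.glob(url_hash(url) + "*"))
    return matches[0] if matches else None


def store(cache_dir: Path, url: str, data: bytes, description: str) -> Path:
    """Save downloaded bytes under '<hash>-<description>' (hypothetical naming)."""
    path = cache_dir / f"{url_hash(url)}-{description}"
    path.write_bytes(data)
    return path
```

A caller would try `find_cached` first and only download (then `store`) when it returns `None`, which is exactly why a bad file can keep being reused once it is in the cache.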
Currently the Netkan bot can get "stuck" if there is a problem with a download. As an example, this just happened with @linuxgurugamer's NIMBY mod; the 18.104.22.168 version of that mod contained one file called
However, when the author corrected the download, the problem persisted. This was because the Netkan bot was not re-downloading the fixed file, but instead retrieving the broken file from its cache and re-processing it.
This is a recurring problem that periodically requires us to ask @techman83 to delete specific files from the bot server. A more automated solution would be better.
#2337 is a related but different approach to this overall issue.
The cache object assumes that a successful
This will ensure that if a module fails to index, its download will be re-acquired on subsequent passes until it finally succeeds. In the NIMBY example, it would have prevented the download from being cached, so the fixed file would have been acquired and indexed.
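The general idea of purging on failure can be sketched in a few lines of Python. This is an illustration of the behavior described above under stated assumptions, not the code in this pull request: `index_with_purge` and the `index` callback are hypothetical names, and the real Netkan pipeline is structured differently.

```python
from pathlib import Path
from typing import Callable


def index_with_purge(cached_file: Path, index: Callable[[Path], dict]) -> dict:
    """Run an indexing step; if it raises, delete the cached download and re-raise.

    `index` is a hypothetical stand-in for whatever processes the download.
    """
    try:
        return index(cached_file)
    except Exception:
        # Purge the file so the next pass downloads a fresh copy
        # instead of re-processing the same broken one.
        if cached_file.exists():
            cached_file.unlink()
        raise
```

Note that the file is only removed when the indexing step throws. A download that indexes successfully but carries wrong metadata stays in the cache.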
Note that this does not fully solve the "stuck in bot's cache" problem. Specifically, if a problem occurs that does not prevent a module from being indexed, such as incorrect game version info, such a download will still persist and not be re-downloaded (unless it's from GitHub as per #2337). This pull request only helps in cases where some part of the Netkan process throws an exception.