Indexing appears to have come to a full stop #13
Comments
Hoping that KSP-CKAN/CKAN-meta#564 and KSP-CKAN/CKAN-meta#565 won't interfere with anything later; I was just fixing some mods that were requested. |
That's weird. The bot literally just runs netkan.exe on all the netkans, outputs the results to the CKAN-meta repo, then pushes changes. Unfortunately it doesn't capture the errors just yet, though there are code improvements in the works that will make this process more robust (with better logging when things don't work for some reason). |
Oh I see what's happening!
Process 23329 timed out! at ./bin/netkan-indexer line 14.
Sending TERM to 23329 at ./bin/netkan-indexer line 14.
I don't know why or what, but it appears something is causing the inflation to take a long time. The process runs in alphabetical order, so what you are likely seeing is that anything past a certain netkan isn't being run before the timeout hits. That timeout was added after the NetKarmgeddon of Saturday morning, so this isn't really a bug in the indexer; it's the safeguard designed to prevent a re-occurrence of NetKarmgeddon doing its job. After this run I'll ponder how best to diagnose what is going on. I think I'll also add some timestamped warnings to let us know when something takes longer than X to inflate. I wonder if Time::Limit can be lexically scoped, so we can limit how long an individual inflation takes, though I'm hesitant because mods with large downloads might then never get inflated. |
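To illustrate the per-inflation limit floated above, here is a minimal sketch in Python (not the bot's Perl, and not how netkan-indexer currently works). The function name `inflate_with_timeout` and its parameters are hypothetical; it assumes each inflation can be run as a separate command.

```python
import subprocess
import sys

def inflate_with_timeout(cmd, limit_seconds):
    """Run one inflation command; return True if it finished within the limit.

    `cmd` is a hypothetical argument list (for example, whatever invokes
    the inflator for a single .netkan file).
    """
    try:
        subprocess.run(cmd, timeout=limit_seconds, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so only this
        # one item is abandoned and the rest of the run can continue.
        print(f"timed out after {limit_seconds}s: {cmd}", file=sys.stderr)
        return False
```

The limit would need to be generous enough that mods with large downloads still get a chance to finish, which is the concern raised above.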
This sounds like Time::Out, which I've not used myself, but definitely sounds like a great idea as our most likely situation for timeouts is one mod that's causing problems. (Time::Limit may not scale the way we want if we start processing tens of thousands of mods, for example.) In a super ideal world we'd have worker processes that handle mods in parallel. :) |
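As a rough picture of the "worker processes" idea, here is a small Python sketch (again an illustration, not the bot's Perl code). `inflate_all` and `inflate_one` are hypothetical names; `inflate_one` stands in for whatever inflates a single netkan.

```python
from concurrent.futures import ThreadPoolExecutor

def inflate_all(names, inflate_one, workers=4):
    """Apply inflate_one to every name using a small worker pool.

    Returns a dict mapping each name to its result, or to the exception
    it raised, so one failing mod does not stop the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(inflate_one, name) for name in names}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = exc
    return results
```

A thread pool like this mainly helps when the time is spent waiting on downloads; it would not help much if the work is CPU-bound.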
See #12 :D I've hacked in some debugging and disabled the cron job for now. I'll leave it running and see what the hold-up is.
[Tue Jun 2 02:36:55 2015] bin/netkan-indexer:29172 (DEBUG) Downloading metadata for AnimatedDecouplers-x86...
[Tue Jun 2 02:36:59 2015] bin/netkan-indexer:29172 (DEBUG) NetKAN/AnimatedDecouplers-x86.netkan took 4 seconds to inflate |
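The kind of timing shown in that debug output can be sketched as follows. This is a Python illustration, not the actual debugging code added to netkan-indexer; the helper name `timed`, the `warn_after` threshold, and the `clock` parameter are all assumptions made for the example.

```python
import time

def timed(label, func, warn_after=5.0, clock=time.monotonic):
    """Call func(), measure how long it took, and warn if it was slow.

    Returns (result, elapsed_seconds). `clock` is injectable so the
    helper can be exercised without actually waiting.
    """
    start = clock()
    result = func()
    elapsed = clock() - start
    if elapsed > warn_after:
        print(f"WARN {label} took {elapsed:.0f} seconds to inflate")
    return result, elapsed
```

Wrapping each inflation in something like this would show which netkans are slow without changing what the run does.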
So we were averaging 1 second per metadata inflation; since Friday it looks like we're often at > 5 seconds. With the number of netkans and the allowed time (3000 seconds), we're simply taking too long inflating metadata for the indexer to finish all the updates in time. I've no idea why this is suddenly an issue, but I think #2 would go a long way to making this better. |
Output of the debug run that was chopped at the 50-minute mark; it only got up to SETI. |
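To make the arithmetic concrete, here is a tiny Python sketch using only the figures quoted above (a 3000-second budget, and roughly 1 versus 5 seconds per inflation). The function name is hypothetical, and the actual number of netkans is not stated in this thread.

```python
def netkans_within_budget(budget_seconds, seconds_per_netkan):
    """How many sequential inflations fit into the time budget."""
    return int(budget_seconds // seconds_per_netkan)

print(netkans_within_budget(3000, 1))  # 3000
print(netkans_within_budget(3000, 5))  # 600
```

So a fivefold slowdown per inflation cuts the number of netkans a single run can reach to a fifth.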
Found the problem: we've used up all our initial CPU burst credits, so our instance is being throttled to baseline performance levels. I think #2 will alleviate that significantly. For now we can set it to run every 2 hours, and that will at least allow things to start working again. |
We can also redeploy onto an m3 or c4 instance, which would avoid the CPU throttling and give us more grunt in general. In theory that should just be: shut down the instance, change the instance type, restart the instance. (In practice it may be different.) I suspect giving |
Yeah, throwing grunt at it would also solve the problem. I'm working through the improvements to the bot, though, so I should hopefully have something for review that implements what we currently have with a bit more sanity. Taking a list would be great! I'd suggest there would be some work in making the exception handling easy to capture, though. |
Changing the scheduling to run every 3 hours has contained the issue. #2 will hopefully allow us to index much more frequently than that (I'm hoping every ~15 minutes). |
As of yesterday (some 30 hours prior to this post) it seems the indexing robot is no longer inflating .netkans to create new metadata, which means some mods are not being updated in CKAN-meta. An example is InterstellarFuelSwitch, which is now a version behind in CKAN.
I'm not sure about the distinctions between the bots, but it seems that what we on IRC see as "netkan-bot" is still inflating newly added .netkan files and pushing to CKAN-meta, as can be seen e.g. here and originating from here. The entity known on IRC as "NetKAN inflator Robot", though, seems not to have done much (if anything) since KSP-CKAN/CKAN-meta@6809cc3.