Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Scheduler failing CKAN-meta checkout #279

Closed
HebaruSan opened this issue Jan 29, 2023 · 6 comments · Fixed by #280
Closed

[Bug] Scheduler failing CKAN-meta checkout #279

HebaruSan opened this issue Jan 29, 2023 · 6 comments · Fixed by #280
Assignees
Labels
Bug Something isn't working Environment It's about the dev/prod environment Mirrorer Uploads mods to archive.org Scheduler Adds netkans to the queue to be inflated

Comments

@HebaruSan
Copy link
Member

HebaruSan commented Jan 29, 2023

Problem

The Scheduler has been printing this to Discord every 30 minutes starting about 7 hour ago (which means the bot is no longer running):

Uncaught exception:
Traceback (most recent call last):
  File ".local/bin/netkan", line 8, in <module>
    sys.exit(netkan())
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/netkan/cli/services.py", line 60, in scheduler
    common.netkan_repo, common.ckanmeta_repo, common.queue, common.token,
  File "/home/netkan/.local/lib/python3.7/site-packages/netkan/cli/common.py", line 68, in __getattribute__
    attr = super().__getattribute__(name)
  File "/home/netkan/.local/lib/python3.7/site-packages/netkan/cli/common.py", line 101, in ckanmeta_repo
    init_repo(self._ckanmeta_remote, '/tmp/CKAN-meta', self.deep_clone))
  File "/home/netkan/.local/lib/python3.7/site-packages/netkan/utils.py", line 14, in init_repo
    Repo.clone_from(metadata, clone_path, depth=1))
  File "/home/netkan/.local/lib/python3.7/site-packages/git/repo/base.py", line 1275, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/git/repo/base.py", line 1194, in _clone
    finalize_process(proc, stderr=stderr)
  File "/home/netkan/.local/lib/python3.7/site-packages/git/util.py", line 419, in finalize_process
    proc.wait(**kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/git/cmd.py", line 559, in wait
    raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v --depth=1 git@github.com:KSP-CKAN/CKAN-meta.git /tmp/CKAN-meta
  stderr: 'Cloning into '/tmp/CKAN-meta'...
Updating files:  35% (9553/26646)
Updating files:  36% (9593/26646)
Updating files:  37% (9860/26646)
Updating files:  38% (10126/26646)
Updating files:  39% (10392/26646)
Updating files:  40% (10659/26646)
Updating files:  41% (10925/26646)
Updating files:  42% (11192/26646)
Updating files:  43% (11458/26646)
Updating files:  43% (11527/26646)
Updating files:  44% (11725/26646)
Updating files:  45% (11991/26646)
Updating files:  45% (11994/26646)
Updating files:  45% (12157/26646)
Updating files:  46% (12258/26646)
Updating files:  46% (12440/26646)
Updating files:  47% (12524/26646)
Updating files:  47% (12542/26646)
Updating files:  48% (12791/26646)
Updating files:  49% (13057/26646)
Updating files:  49% (13077/26646)
Updating files:  50% (13323/26646)
Updating files:  51% (13590/26646)
Updating files:  51% (13659/26646)
Updating files:  51% (13768/26646)
Updating files:  52% (13856/26646)
Updating files:  52% (14122/26646)
Updating files:  53% (14123/26646)
Updating files:  54% (14389/26646)
Updating files:  55% (14656/26646)
Updating files:  55% (14696/26646)
Updating files:  56% (14922/26646)
Updating files:  56% (14929/26646)
Updating files:  56% (15026/26646)
Updating files:  57% (15189/26646)
Updating files:  57% (15230/26646)
Updating files:  58% (15455/26646)
Updating files:  59% (15722/26646)
Updating files:  60% (15988/26646)
Updating files:  60% (16206/26646)
Updating files:  61% (16255/26646)
Updating files:  61% (16328/26646)
Updating files:  62% (16521/26646)
Updating files:  63% (16787/26646)
Updating files:  63% (16886/26646)
Updating files:  64% (17054/26646)
Updating files:  64% (17192/26646)
Updating files:  64% (17218/26646)
Updating files:  65% (17320/26646)
Updating files:  65% (17341/26646)
Updating files:  65% (17352/26646)
Updating files:  66% (17587/26646)
Updating files:  66% (17612/26646)
Updating files:  66% (17618/26646)
Updating files:  67% (17853/26646)
Updating files:  67% (17944/26646)
Updating files:  67% (17991/26646)
Updating files:  67% (18009/26646)
Updating files:  68% (18120/26646)
Updating files:  68% (18173/26646)
Updating files:  69% (18386/26646)
Updating files:  69% (18456/26646)
Updating files:  69% (18537/26646)
Updating files:  70% (18653/26646)
Updating files:  70% (18697/26646)
Updating files:  70% (18779/26646)
Updating files:  70% (18871/26646)
Updating files:  71% (18919/26646)
Updating files:  71% (19106/26646)
Updating files:  72% (19186/26646)
Updating files:  72% (19191/26646)
Updating files:  72% (19218/26646)
Updating files:  72% (19264/26646)
Updating files:  72% (19279/26646)
Updating files:  72% (19301/26646)
Updating files:  72% (19340/26646)
Updating files:  72% (19370/26646)
Updating files:  72% (19410/26646)
Updating files:  72% (19443/26646)
Updating files:  73% (19452/26646)
Updating files:  73% (19483/26646)
Updating files:  73% (19502/26646)
Updating files:  73% (19508/26646)
Updating files:  73% (19534/26646)
Updating files:  73% (19569/26646)
Updating files:  73% (19591/26646)
Updating files:  73% (19614/26646)
Updating files:  73% (19632/26646)
Updating files:  73% (19646/26646)
Updating files:  73% (19672/26646)
Updating files:  74% (19719/26646)
Updating files:  74% (19816/26646)
Updating files:  74% (19942/26646)
Updating files:  75% (19985/26646)
Updating files:  75% (20021/26646)
Updating files:  75% (20046/26646)
Updating files:  75% (20058/26646)
Updating files:  75% (20114/26646)
Updating files:  75% (20129/26646)
Updating files:  75% (20165/26646)
Updating files:  75% (20248/26646)
Updating files:  76% (20251/26646)
Updating files:  76% (20271/26646)
Updating files:  76% (20308/26646)
Updating files:  76% (20325/26646)
Updating files:  76% (20353/26646)
Updating files:  76% (20386/26646)
Updating files:  76% (20481/26646)
Updating files:  77% (20518/26646)
Updating files:  77% (20530/26646)
Updating files:  77% (20553/26646)
Updating files:  77% (20576/26646)
Updating files:  77% (20633/26646)
Updating files:  77% (20676/26646)
Updating files:  77% (20766/26646)
Updating files:  78% (20784/26646)
Updating files:  78% (20811/26646)
Updating files:  78% (20820/26646)
Updating files:  78% (20834/26646)
Updating files:  78% (20856/26646)
Updating files:  78% (20871/26646)
Updating files:  78% (20904/26646)
Updating files:  78% (20921/26646)
Updating files:  78% (20950/26646)
Updating files:  78% (21013/26646)
Updating files:  78% (21029/26646)
Updating files:  79% (21051/26646)
Updating files:  79% (21061/26646)
Updating files:  79% (21070/26646)
Updating files:  79% (21085/26646)
Updating files:  79% (21094/26646)
Updating files:  79% (21109/26646)
Updating files:  79% (21128/26646)
Updating files:  79% (21147/26646)
Updating files:  79% (21189/26646)
Updating files:  79% (21205/26646)
Updating files:  79% (21238/26646)
Updating files:  79% (21254/26646)
Updating files:  79% (21278/26646)
Updating files:  80% (21317/26646)
Updating files:  80% (21326/26646)
Updating files:  80% (21350/26646)
Updating files:  80% (21397/26646)
Updating files:  80% (21405/26646)
Updating files:  80% (21441/26646)
Updating files:  80% (21451/26646)
Updating files:  80% (21472/26646)
Updating files:  80% (21532/26646)
Updating files:  81% (21584/26646)
Updating files:  81% (21588/26646)
Updating files:  81% (21661/26646)
Updating files:  81% (21691/26646)
Updating files:  81% (21723/26646)
Updating files:  81% (21748/26646)
Updating files:  81% (21766/26646)
Updating files:  81% (21788/26646)
Updating files:  81% (21820/26646)
Updating files:  81% (21838/26646)
Updating files:  82% (21850/26646)
Updating files:  82% (21863/26646)
Updating files:  82% (21877/26646)
Updating files:  82% (21890/26646)
Updating files:  82% (21924/26646)
Updating files:  82% (21945/26646)
Updating files:  82% (22047/26646)
Updating files:  82% (22071/26646)
Updating files:  82% (22104/26646)
Updating files:  83% (22117/26646)
Updating files:  83% (22171/26646)
Updating files:  83% (22198/26646)
Updating files:  83% (22242/26646)
Updating files:  83% (22268/26646)
Updating files:  83% (22287/26646)
Updating files:  83% (22327/26646)
Updating files:  83% (22361/26646)
Updating files:  84% (22383/26646)
Updating files:  84% (22409/26646)
Updating files:  84% (22447/26646)
Updating files:  84% (22498/26646)
Updating files:  84% (22542/26646)
Updating files:  84% (22560/26646)
Updating files:  84% (22586/26646)
Updating files:  84% (22623/26646)
Updating files:  84% (22648/26646)
Updating files:  85% (22650/26646)
Updating files:  85% (22682/26646)
Updating files:  85% (22692/26646)
Updating files:  85% (22719/26646)
Updating files:  85% (22729/26646)
Updating files:  85% (22738/26646)
Updating files:  85% (22785/26646)
Updating files:  85% (22803/26646)
Updating files:  85% (22815/26646)
Updating files:  85% (22827/26646)
Updating files:  85% (22845/26646)
Updating files:  85% (22853/26646)
Updating files:  85% (22871/26646)
Updating files:  85% (22881/26646)
Updating files:  85% (22915/26646)
Updating files:  86% (22916/26646)
Updating files:  86% (22961/26646)
Updating files:  86% (22974/26646)
Updating files:  86% (23008/26646)
Updating files:  86% (23021/26646)
Updating files:  86% (23044/26646)
Updating files:  86% (23080/26646)
Updating files:  86% (23125/26646)
Updating files:  86% (23133/26646)
Updating files:  86% (23161/26646)
Updating files:  86% (23173/26646)
Updating files:  87% (23183/26646)
Updating files:  87% (23184/26646)
Updating files:  87% (23229/26646)
Updating files:  87% (23244/26646)
Updating files:  87% (23258/26646)
Updating files:  87% (23278/26646)
Updating files:  87% (23299/26646)
Updating files:  87% (23320/26646)
Updating files:  87% (23365/26646)
Updating files:  87% (23389/26646)
Updating files:  87% (23403/26646)
Updating files:  87% (23446/26646)
Updating files:  88% (23449/26646)
Updating files:  88% (23472/26646)
Updating files:  88% (23500/26646)
Updating files:  88% (23517/26646)
Updating files:  88% (23529/26646)
Updating files:  88% (23557/26646)
Updating files:  88% (23572/26646)
Updating files:  88% (23603/26646)
Updating files:  88% (23627/26646)
Updating files:  88% (23645/26646)
Updating files:  88% (23654/26646)
Updating files:  88% (23680/26646)
Updating files:  88% (23692/26646)
Updating files:  88% (23708/26646)
Updating files:  89% (23715/26646)
Updating files:  89% (23760/26646)
Updating files:  89% (23773/26646)
Updating files:  89% (23792/26646)
Updating files:  89% (23840/26646)
Updating files:  89% (23896/26646)
Updating files:  89% (23944/26646)
Updating files:  90% (23982/26646)
Updating files:  90% (23988/26646)
Updating files:  90% (24041/26646)
Updating files:  90% (24060/26646)
Updating files:  90% (24078/26646)
Updating files:  90% (24099/26646)
Updating files:  90% (24134/26646)
Updating files:  90% (24173/26646)
Updating files:  90% (24209/26646)
Updating files:  90% (24229/26646)
Updating files:  91% (24248/26646)
Updating files:  91% (24256/26646)
Updating files:  91% (24309/26646)
Updating files:  91% (24329/26646)
Updating files:  91% (24338/26646)
Updating files:  91% (24365/26646)
Updating files:  91% (24390/26646)
Updating files:  91% (24425/26646)
Updating files:  91% (24444/26646)
Updating files:  91% (24463/26646)
Updating files:  91% (24477/26646)
Updating files:  91% (24487/26646)
Updating files:  92% (24515/26646)
Updating files:  92% (24535/26646)
Updating files:  92% (24581/26646)
Updating files:  92% (24606/26646)
Updating files:  92% (24630/26646)
Updating files:  92% (24662/26646)
Updating files:  92% (24710/26646)
Updating files:  92% (24759/26646)
Updating files:  93% (24781/26646)
Updating files:  93% (24784/26646)
Updating files:  93% (24842/26646)
Updating files:  93% (24858/26646)
Updating files:  93% (24890/26646)
Updating files:  93% (24900/26646)
Updating files:  93% (24923/26646)
Updating files:  93% (24947/26646)
Updating files:  93% (24997/26646)
Updating files:  93% (25016/26646)
Updating files:  93% (25032/26646)
Updating files:  94% (25048/26646)
Updating files:  94% (25092/26646)
Updating files:  94% (25141/26646)
Updating files:  94% (25250/26646)
Updating files:  94% (25289/26646)
Updating files:  95% (25314/26646)
Updating files:  95% (25326/26646)
Updating files:  95% (25360/26646)
Updating files:  95% (25401/26646)
Updating files:  95% (25406/26646)
Updating files:  95% (25443/26646)
Updating files:  95% (25493/26646)
Updating files:  95% (25504/26646)
Updating files:  95% (25519/26646)
Updating files:  95% (25538/26646)
Updating files:  95% (25552/26646)
Updating files:  95% (25565/26646)
Updating files:  96% (25581/26646)
Updating files:  96% (25585/26646)
Updating files:  96% (25602/26646)
Updating files:  96% (25622/26646)
Updating files:  96% (25685/26646)
Updating files:  96% (25713/26646)
Updating files:  96% (25759/26646)
Updating files:  96% (25785/26646)
Updating files:  96% (25816/26646)
Updating files:  97% (25847/26646)
Updating files:  97% (25849/26646)
Updating files:  97% (25868/26646)
Updating files:  97% (25901/26646)
Updating files:  97% (25955/26646)
Updating files:  97% (25989/26646)
Updating files:  97% (26022/26646)
Updating files:  98% (26114/26646)
Updating files:  98% (26115/26646)
Updating files:  98% (26173/26646)
Updating files:  98% (26208/26646)
Updating files:  98% (26224/26646)
Updating files:  98% (26254/26646)
Updating files:  98% (26273/26646)
Updating files:  98% (26290/26646)
Updating files:  98% (26320/2664
Updating files:  98% (26336/26646)
Updating files:  98% (26351/26646)
Updating files:  98% (26374/26646)
Updating files:  99% (26380/26646)
Updating files:  99% (26449/26646)
Updating files:  99% (26475/26646)
Updating files:  99% (26552/26646)
Updating files:  99% (26590/26646)
Updating files: 100% (26646/26646)
Updating files: 100% (26646/26646), done.
fatal: unable to write new index file
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

'

Maybe it's running out of disk space?

@HebaruSan HebaruSan added Bug Something isn't working Scheduler Adds netkans to the queue to be inflated Environment It's about the dev/prod environment labels Jan 29, 2023
@HebaruSan
Copy link
Member Author

HebaruSan commented Jan 29, 2023

Is this where the disk size is set?

VolumeSize='100',

du on my local disk indicates that CKAN-meta is larger than 100 MB. CKAN-meta:

$ du -sh
217M    .

NetKAN:

$ du -sh
126M    .

@HebaruSan
Copy link
Member Author

Hmm, looks like that would be 100 GB, not MB:

https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_EbsBlockDevice.html

VolumeSize (request), volumeSize (response)
The size of the volume, in GiBs. You must specify either a snapshot ID or a volume size. If you specify a snapshot, the default is the snapshot size. You can specify a volume size that is equal to or larger than the snapshot size.

@HebaruSan
Copy link
Member Author

It has also been skipping some runs.

Run skipped, below volume credit target percentage (Current Avg: 0)

@HebaruSan
Copy link
Member Author

The Mirrorer just came up with a new one:

Uncaught exception:
Traceback (most recent call last):
  File ".local/bin/netkan", line 8, in <module>
    sys.exit(netkan())
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/netkan/.local/lib/python3.7/site-packages/netkan/cli/services.py", line 76, in mirrorer
    ).process_queue(common.queue, common.timeout)
  File "/home/netkan/.local/lib/python3.7/site-packages/netkan/mirrorer.py", line 321, in process_queue
    if self.try_mirror(CkanMirror(self.ia_collection, path)):
  File "/home/netkan/.local/lib/python3.7/site-packages/netkan/mirrorer.py", line 353, in try_mirror
    download_stream_to_file(source_url, tmp)
  File "/home/netkan/.local/lib/python3.7/site-packages/netkan/common.py", line 54, in download_stream_to_file
    dest_file.write(chunk)
  File "/usr/local/lib/python3.7/tempfile.py", line 481, in func_wrapper
    return func(*args, **kwargs)
OSError: [Errno 12] Cannot allocate memory

@HebaruSan HebaruSan added the Mirrorer Uploads mods to archive.org label Jan 29, 2023
@HebaruSan
Copy link
Member Author

HebaruSan commented Jan 29, 2023

Tried merging all the pending CKAN-meta PRs to see if that reduces the needed disk space.

After a local git pull and git gc:

$ du -sh NetKAN CKAN-meta
56M     NetKAN
209M    CKAN-meta

So maybe we're using some kind of default disk volume 256 MB in size?

... It's still failing, same error.

@HebaruSan
Copy link
Member Author

HebaruSan commented Jan 29, 2023

Run skipped, below volume credit target percentage (Current Avg: 0)

Hmm, these seem to have happened on passes when we didn't get the error in the OP. Do we clone the repos before we check the credits? It would make more sense to check the credits first, since that check can stop the whole run.

... confirmed, the common.*_repo calls clone the repos, before can_schedule is called:

def scheduler(common: SharedArgs, max_queued: int, group: str, min_cpu: int, min_io: int) -> None:
sched = NetkanScheduler(
common.netkan_repo, common.ckanmeta_repo, common.queue, common.token,
nonhooks_group=(group in ('all', 'nonhooks')),
webhooks_group=(group in ('all', 'webhooks')),
)
if sched.can_schedule(max_queued, common.dev, min_cpu, min_io):
sched.schedule_all_netkans()
logging.info("NetKANs submitted to %s", common.queue)

Cleaning that up might allow faster recovery of credits, since we'd be doing ~260 MB less network activity per aborted run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Environment It's about the dev/prod environment Mirrorer Uploads mods to archive.org Scheduler Adds netkans to the queue to be inflated
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants