Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix isolate options #913

Closed
wants to merge 2 commits into from
Closed

Fix isolate options #913

wants to merge 2 commits into from

Conversation

gollux
Copy link
Contributor

@gollux gollux commented Mar 24, 2018

I reviewed how current CMS calls isolate and there are several problems with it:

(1) When using cgroups (which is currently the CMS default), --cg-timing is not specified, so the time limit does not work properly.

(2) When not using cgroups, memory limit is still passed as --cg-mem, which does not work.

(3) We do not set --quota, so unless root set up FS quotas for all UIDs used by isolate, the disk space consumed by the solution is not limited at all.

This pull request solves (1) and (2). I would like to know you opinion how to specify the quota: should I add a setting in the config file for that? Or a more general config setting for adding arbitrary isolate options (--extra-time could be handy, too)?


This change is Reviewable

@codecov
Copy link

codecov bot commented Mar 24, 2018

Codecov Report

Merging #913 into master will increase coverage by <.01%.
The diff coverage is 75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #913      +/-   ##
==========================================
+ Coverage   61.08%   61.08%   +<.01%     
==========================================
  Files         203      203              
  Lines       16938    16940       +2     
==========================================
+ Hits        10346    10348       +2     
  Misses       6592     6592
Flag Coverage Δ
#functionaltests 47.14% <75%> (+0.05%) ⬆️
#unittests 37.43% <0%> (-0.02%) ⬇️
Impacted Files Coverage Δ
cms/grading/Sandbox.py 67.01% <75%> (+0.29%) ⬆️
cms/io/PsycoGevent.py 80.48% <0%> (-4.88%) ⬇️
cms/db/base.py 84.25% <0%> (-3.71%) ⬇️
cms/service/Worker.py 80% <0%> (-3.16%) ⬇️
cms/grading/Job.py 90.38% <0%> (-0.97%) ⬇️
cms/service/EvaluationService.py 68.97% <0%> (-0.77%) ⬇️
cms/service/ProxyService.py 70.2% <0%> (-0.51%) ⬇️
cms/server/admin/handlers/base.py 68.3% <0%> (+0.32%) ⬆️
cms/service/workerpool.py 61.57% <0%> (+0.49%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0591fe7...fff8912. Read the comment docs.

@stefano-maggiolo
Copy link
Member

Thanks for looking at the options! As you said, use_cgroup is always enabled so the second commit doesn't change the behaviour (still appreciated!). I'm curious about the first, since we have test that should pick up if timing is not checked appropriately. Is it only wrong when using multiple processes/threads? In which direction?

@gollux
Copy link
Contributor Author

gollux commented Mar 24, 2018 via email

@stefano-maggiolo
Copy link
Member

I cannot reproduce, can you help me understand (trying to figure out if this justify backporting and releasing a new version for 1.3).

With python & threading:

import sys
import threading
import time

def w():
    t = time.time()
    i = 0
    while time.time() - t < 1:
        i += 1

n = int(sys.stdin.readline().strip())
t = threading.Thread(target=w)
t.start()
t.join()
sys.stdout.write("correct %d\n" % n)
$ echo 1 | time python /tmp/batchstdio.py 
correct 1
1.04user 0.01system 0:01.06elapsed 99%CPU (0avgtext+0avgdata 9884maxresident)k

CMS: Execution timed out | (0.588 s) (0.633 s) (3 MiB)

With C & fork:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

int main() {
    int n;

    pid_t ret = fork();

    if (ret == 0) {
      int i = 0;
      struct timespec start, end;
      clock_gettime(CLOCK_MONOTONIC_RAW, &start);
      while(1) {
        i++;
        clock_gettime(CLOCK_MONOTONIC_RAW, &end);
        long long delta_us = (end.tv_sec - start.tv_sec) * 1000000 + (end.tv_nsec - start.tv_nsec) / 1000;
        if (delta_us > 1000000) {
          break;
        }
      }
      if (i < 10000) {
        printf("slow\n");
      }
    } else {
      if (ret != -1) {
        waitpid(ret, NULL, 0);
        scanf("%d", &n);
        printf("correct %d\n", n);
      }
    }

    return 0;
}
$ echo 1 | time /tmp/fork 
correct 1
0.11user 0.88system 0:01.00elapsed 99%CPU (0avgtext+0avgdata 2672maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

CMS: Execution timed out | 0 | (1.004 s) (1.055 s) (0 MiB)

(note these are not hitting wall time limit, which is 2).

@gollux
Copy link
Contributor Author

gollux commented Mar 25, 2018 via email

@stefano-maggiolo
Copy link
Member

So it's more a matter of detecting the TO early, rather than detecting it at all?

@gollux
Copy link
Contributor Author

gollux commented Mar 25, 2018 via email

@stefano-maggiolo
Copy link
Member

Thank you, merged in both and we'll release a 1.3.1 in O(days), possibly with the quota fix too.

Speaking of quota. We use -f to limit the size of each file written (default to 1GB), and we have strict control on what files are writable (at most 1 for Batch, 0 for Communication and TwoSteps). Isn't that enough?

@gollux
Copy link
Contributor Author

gollux commented Mar 26, 2018 via email

@stefano-maggiolo
Copy link
Member

Good idea on avoiding incompatible combinations of options.

For quota: I think we want a subdirectory of isolate's dir as fully writable by the program, in case (we write other stuff in the main dir, such as run.log and commands.log). I agree with the idea, the quota limit seems fairer and tiny bit safer than the current approach (but again, I'm far from being an expert).

In case I think we can have a reasonable default stored in an undocumented option similar to cgroup, so that it's easy to change in case of peculiar needs. I'm going to introduce some more of those "reasonable default" options soon to limit trusted executions, like the checker. (We should probably document those options in a "you shouldn't change those, but in case you need here they are" section.)

As you said, it seems less important for 1.3 though.

@stefano-maggiolo
Copy link
Member

I'm going to close this and create an issue to track quota.

gollux added a commit to ioi/isolate that referenced this pull request Mar 26, 2018
Using cgroups without --cg-timing makes little sense and it is too
easy to forget to ask for --cg-timing explicitly.

Whoever wishes to use the single-process time limit mechanism with CG,
he can make his wish explicit by --no-cg-timing.

For more background, see cms-dev/cms#913.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants