Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: master
Fetching contributors…

Cannot retrieve contributors at this time

312 lines (229 sloc) 13.802 kb
========================================
==== Atomic All-Nighters ====
==== Ben Blum - 15-799A - Fall 2011 ====
========================================
!!! PLEASE NOTE!!!
Before you get started on reading this, please make a note of the time. Section
5.2 will ask you for how long it took you to get up to speed on using the tool,
most of which should be reading this document.
==== 1. Introduction ====
Read the note above if you haven't already.
The "atomic sleep" problem happens in kernels when a piece of code that executes
with interrupts/preemption disabled ("in atomic context") calls something that
causes the scheduler to be invoked. For example, in Linux...
spin_lock(foo); mutex_lock(bar);
...spinlocks are "disable preemption on this CPU and spin-wait in case another
CPU has the lock", and mutexes are the higher-level descheduling primitive.
(When I say preemption, I mean preemption or interrupts, depending on the
kernel.)
This can cause all sorts of problems:
- If the preemption-disabling mechanism works by a global (percpu) counter, this
can cause the preemption state to be corrupted, especially if they get
automatically restored by going back to userspace, and on the next syscall the
kernel will think it is still off.
- Even without a counter, this can cause preemption disabling to be "leaked" -
i.e., switching to another thread, interrupts can stay off for arbitrarily
long.
- With SMP spinlocks, this can cause deadlock: thread 1 takes a spinlock, does
something which causes a reschedule to thread 2 on the same core, which tries
to take the same spinlock. That cpu will be stuck spinning forever.
The point of Atomic All-Nighters is to statically check kernel code to ensure
that it never sleeps in atomic sections.
==== 2.1. Contexts ====
We refer to the implicit state of preemption- or interrupt-disabling at a given
point in the code as the "Context".
data Context = Nested Int | Infinity
Many kernels need sometimes to do multiple successive things that each would
disable preemption, and so instead of explicitly toggling the preemption state,
maintain a counter that tracks the "preemption nesting depth" (undoing any one
of those things "tries" to reenable preemption, but actually just decrements the
counter, and only really turns on preemption when the counter hits zero). It is
easiest to think of this counter as representing "How many spin-locks are held".
"Nested <N>" means N spinlocks are held (or analogous), so it will take N calls
to spin_unlock (or analogous) before preemption is re-enabled. "Nested 0" means
preemption is already enabled. "Infinity" is a special-case used for checking
functions that promise that they will never sleep; i.e., they are okay to be
called with arbitrarily many spinlocks held.
==== 2.2. Rules ====
Each function (and function pointer) is to be annotated with a "Rule".
newtype Rule = Rule Context
The rule expresses the "most restrictive context" that the function is allowed
to be called in. To think of Atomic All-Nighters as a type system / typechecker,
the rule of each function is its type. We define a total ordering on
contexts/rules to express "restrictiveness", which serves as a subtyping
relation.
Infinity <= x = True
Nested x <= Nested y = (y <= x)
(Intuitively: Preemption being enabled is the least restrictive context.)
It is legal to call a function with rule "Rule C1" from code with context C2 iff
C1 is a subtype of C2.
It is legal to assign a function or function pointer with rule "Rule C1" to a
function pointer with rule "Rule C2" iff C1 is a subtype of C2 (special rules
apply if the function takes other function pointers as arguments, or if the
function pointer is inside a reference cell - see contravariant.c, covariant.c,
banana-bowl.c, etc).
==== 2.3. Effects ====
Each function (and function pointer) is also to be annotated with an "Effect".
data Effect = IncDec Int | Enable | Disable
"IncDec <X>" means to modify the preempt nesting depth by X (no effect on
Infinity). "Enable" means to set the preempt counter to zero no matter its
value (even Infinity). "Disable" means to set the preempt counter to Infinity
to matter its value.
To call a function with an effect changes the current context, as follows (with
"Just <result>" meaning the call is legal, and "Nothing" meaning illegal):
effect (Enable) _ = Just (Nested 0)
effect (Disable) _ = Just (Infinity)
effect (IncDec x) (Infinity) = Just (Infinity)
effect (IncDec x) (Nested n) =
if x+n < 0 then Nothing else Just (Nested (n+x))
It is legal to assign a function or function pointer to a function pointer iff
their effects are identical.
==== 3. Annotations ====
An annotation is a pair of a rule and an effect, and is given using the
__attribute__ keyword. The file aan.h defines several commonly-used aliases as
cpp macros.
MAY_SLEEP
The function might invoke the scheduler, or it might not. It has no
effect on the context ("Nested 0, IncDec 0"). Example: mutex_lock
WONT_SLEEP
The function is guaranteed not to, and is safe to call from atomic
context. It has no effect. ("Infinity, IncDec 0") Example: set_esp0
EXIT_AND_SLEEP
The function starts with preemption disabled once, and must enable it
once (and is allowed to sleep after that). ("Nested 1, IncDec -1")
Example: page_fault_handler
ENTER_ATOMIC
The function disables preemption, without regard for nesting.
("Infinity, Disable") Example: disable_interrupts
EXIT_ATOMIC
The function enables preemption, without regard for nesting.
("Infinity, Enable") Example: enable_interrupts
ENTER_ATOMIC_NESTED
The function effects another level of preemption disabling. ("Infinity,
IncDec 1") Example: spin_lock
EXIT_ATOMIC_NESTED
The function reverts a level of preemption disabling. ("Infinity, IncDec
-1") Example: spin_unlock
Because C's syntax allows for __attribute__ declarations in many places, there
is some flexibility with how you annotate functions and function pointers. The
recommended way, however, is that for functions, the annotation goes before the
return type (but after storage class specifiers), and for function pointers,
it goes after the argument list:
static MAY_SLEEP struct tcb *lookup_by_tid(int tid);
typedef void (*swexn_handler_t)(void *arg, ureg_t *ureg) MAY_SLEEP;
==== 4. Using the tool ====
To build, just invoke "make"; you will need GHC, and Language.C version
0.3.something (I use 0.3.2.1). On the GHC machines, you can run
"./get_language_c.sh" and hopefully then a make will work.
For each file you wish to check, run: ./bin/aan [include paths] path/to/file.c
"list_includes.sh" should be used, from the directory with 410kern in it, to
generate the include paths.
Annotated versions of 410kern/, with aan.h inside, and spec/, are provided. You
have to annotate the contents of kern/.
No static analysis is perfect, so even with correct annotations there may be
either spurious warnings or false positives. Errors are generally correct.
You will have to manually inspect the code for many cases, to determine either
that a message is spurious, or that there is a real bug, or that your
annotations were wrong.
Meaning of messages:
- Informational/tracing messages. These are off by default.
- C typechecking warnings. If the code would confuse GCC, it will confuse AAN.
- AAN warnings. These indicate "dubious" patterns to do with annotations, and
mean there might be a bug that we didn't notice.
- "merging flow" means the branches of an if/switch ended up with different
contexts. AAN chooses the "most restrictive" context at the end.
- "backward jump": AAN will not do multi-pass, so something that it said was
legal on its first pass might become illegal upon this backwards edge in
the control flow graph.
- "missing annotation"/"invalid annotation": should be pretty clear.
- "variadic args contain arrows": passing function pointers as variadic
arguments necessarily loses their type information, like casting to void*.
- Errors. This means either AAN thinks it found an atomic-sleep bug or that your
annotation was incorrect.
- "unmergeable annotations": in a ternary statement that returns function
pointers, the annotations for each are incompatible.
- "illegal subtyped function pointer assignment": attempt to assign e.g. a
MAY_SLEEP function to a WONT_SLEEP pointer.
- "illegal invariant function pointer assignment": in reference cells
(mutability), even subtyping is illegal.
- "illegal function call": attempt to call e.g. a MAY_SLEEP function in
atomic context.
- "exit context != advertised effect": The professed effect of the function
being checked did not match the changed context that came out at the end.
- "not all exit contexts match each other": should be pretty clear.
- "illegal context effect": Attempt to decrement the preempt counter when
it's already zero.
There are certain idioms and tricks you have to watch out for.
- Casting data to and from void*, or even int, is pretty common. If the thing
being casted has function pointers anywhere in it (it is one, it is a struct
with one, it is an array of them...), the tool will emit a warning, because
this might let the code bypass the soundness of the annotation system. More
frequently, this is innocent behaviour, so warnings for void* casts are off by
default. You can turn them on with --warn-void-casts, and manually inspect
each one, for completeness.
- Some functions are "magic" in terms of their annotation; i.e., they are the
functions that implement the rules/effects themselves. It's quite likely the
tool will error on these; if I had more time, I would have a "MAGIC"
annotation which would make the tool ignore them, but I don't, so I don't.
- Some functions have conditional behaviour. This may require annotating them
in a way that will cause errors on their implementation but will accurately
reflect their use case (for example, if mutex_unlock checks if the scheduler
is locked, and only reschedules if it isn't, mark it WONT_SLEEP rather than
MAY_SLEEP). Or, it may require refactoring some of the code itself, to make
function boundaries you can actually annotate ("outb", when used to ack the
current interrupt, should be EXIT_ATOMIC_NESTED, but otherwise not - consider
wrapping that particular use case in a "magic" "acknowledge_interrupt"
function).
- Interrupt handlers should in most cases be marked either WONT_SLEEP or
EXIT_ATOMIC_NESTED - while rescheduling is sometimes appropriate in them,
doing mutex_lock in them is wrong, and that's what you're trying to catch.
Even though the context switcher itself is the essence of rescheduling, in
some kernels it may be wrong to mark it MAY_SLEEP, e.g. if it has to be called
with interrupts already off, if it gets called from the timer handler, etc.
It's the functions which indirectly call it (mutex, cond, sem, yield) which
you care about.
==== 5.1 World Map ====
Download http://maximegalon.andrew.cmu.edu/worldmap.zip - its password is the
same password as for ##410ta.
The following student kernels are provided:
- group-04 (F11). Easy mode. Bugs all over the place.
- group-19 (F11). Try syscalls.c (you should be able to get it to print no
messages at all), and keyboard_driver.c (you should find a bug or two).
- group-15 (F10). One of the inspiring examples for this project. Try
scheduler.c.
- group-14 (F11). Hard mode! You should verify that there are no bugs, but may
need to restructure some of the interfaces to properly annotate them.
I advise attempting to annotate each student kernel in two phases: start by
annotating all functions in all header files, and then pick a .c file and
annotate the functions in that and check just that file until you are content
with it.
==== 5.2 Feedback ====
Please note these things, and anything else of interest, while using AAN:
- How long did it take for you to get up to speed on the tool? (How long did it
it take you to read this guide, to understand the annotation language and
rules, etc.)
- How long did it take you to annotate all the *header files* of the student
kernel?
- Once the headers were annotated, how long did it take for you to be "content"
with AAN's output on each successive .c file that you tested? (i.e., how much
time did you spend manually inspecting, and patching up or ignoring the false
positives?)
- How long did it take you to be "content" with AAN's output on every .c file in
the kernel?
- What "strange coding idioms" did you encounter, that you either needed a
special annotation for, or had to refactor across function boundaries to make
sense of?
==== 6. Bugs ====
The implementation of AAN is not perfect. Its known deficiencies are:
- memcpy() to copy/paste struct contents around may break the system. This can
be caught at least by the void* cast warning.
- We currently assume the body of each loop will run exactly once. A loop body
that goes, "mutex_lock(); spin_lock();" and possibly goes around again, is
clearly wrong, but will not be caught. The fix for this is trivial, but not
implemented.
- It would be nice to have a "MAGIC" annotation that indicates a function is
right for sure.
- It does not handle "fall-through" cases in switch statements. (It assumes each
case can only be entered from the top of the switch. "case 1: spin_lock();
case 2: mutex_lock(); spin_lock(); break;" will foil AAN.)
# vim: tw=80 ts=4 expandtab
Jump to Line
Something went wrong with that request. Please try again.