Store actor IDs in an unordered map protected by an RW lock instead of using a non-portable `__thread` variable. Though this is less efficient, the performance overhead is still negligible compared to the overall cost of enabling logging in the first place. Close #258; a more generic `thread_specific_ptr` implementation turned out to have too much implementation overhead to be worth the effort.
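
As a rough illustration of the approach described above, the following sketch keeps one actor ID per thread in an `std::unordered_map` guarded by a readers-writer lock. The class name `actor_id_registry`, its member functions, and the use of C++17 `std::shared_mutex` are assumptions made for this example; the actual logger code may look quite different.

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <unordered_map>

// Hypothetical registry mapping threads to the ID of the actor they
// currently run. All names are illustrative, not the project's API.
class actor_id_registry {
public:
  using actor_id = std::uint32_t;

  // Associates the calling thread with `aid` and returns the previous
  // ID (0 if the thread had none). Takes the lock exclusively.
  actor_id set(actor_id aid) {
    std::unique_lock<std::shared_mutex> guard{mtx_};
    auto& slot = ids_[std::this_thread::get_id()];
    auto prev = slot;
    slot = aid;
    return prev;
  }

  // Returns the ID stored for the calling thread (0 if none).
  // Takes the lock in shared mode, so readers do not block each other.
  actor_id get() const {
    std::shared_lock<std::shared_mutex> guard{mtx_};
    auto i = ids_.find(std::this_thread::get_id());
    return i != ids_.end() ? i->second : 0;
  }

private:
  mutable std::shared_mutex mtx_;
  std::unordered_map<std::thread::id, actor_id> ids_;
};
```

Every lookup pays for a lock acquisition and a hash lookup, which a `__thread` variable would avoid, but the map works on any platform with a conforming standard library.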
Use `atomic<int>` instead of `int` for `local_actor::m_flags`. The atomic uses only relaxed memory ordering, since all flags that are allowed to be read by others never change after an actor has launched. This should produce the same compiler output as before (at least on x86 or any platform with atomic load/store for word-sized memory regions) but suppresses false positives from analyzer tools such as ThreadSanitizer. Close #255.
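
A minimal sketch of this pattern, assuming a flag field that is written only by the owning thread before launch. The class `local_actor_sketch`, the flag constants, and the accessor names are hypothetical and do not reflect the real `local_actor` interface.

```cpp
#include <atomic>

// Hypothetical stand-in for an actor with a bit field of flags.
class local_actor_sketch {
public:
  // Illustrative flag values, not the project's actual constants.
  static constexpr int is_detached_flag = 0x01;
  static constexpr int is_blocking_flag = 0x02;

  // Relaxed load: no ordering guarantees are needed because flags
  // visible to other threads do not change after launch.
  bool get_flag(int flag) const {
    return (flags_.load(std::memory_order_relaxed) & flag) != 0;
  }

  // Separate relaxed load and store, i.e. not an atomic
  // read-modify-write. This is only safe under the assumption that a
  // single thread modifies the flags.
  void set_flag(bool enable, int flag) {
    auto x = flags_.load(std::memory_order_relaxed);
    flags_.store(enable ? (x | flag) : (x & ~flag),
                 std::memory_order_relaxed);
  }

private:
  std::atomic<int> flags_{0};
};
```

With relaxed ordering, a load or store of an `int`-sized atomic typically compiles to a plain move on x86, so the change mainly documents intent and keeps race detectors quiet.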
This has the advantage that it makes each sample independent and one can now easily "slice" profiler log files. For example, if the first half of the execution time is quite different from the second (due to caching or different user input), then one can now use the first half of the log file and the second half independently, which was not possible before because log entries were cumulative.
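
To illustrate why independent samples make slicing straightforward, here is a small sketch that sums a metric over an arbitrary sub-range of samples. The `sample` struct, its `cpu_time` field, and the helper `total_cpu_time` are hypothetical; the real profiler log format is not shown in this message.

```cpp
#include <chrono>
#include <numeric>
#include <vector>

// Hypothetical log entry: CPU time spent during one sampling interval
// only (not accumulated since program start).
struct sample {
  std::chrono::microseconds cpu_time;
};

using sample_iterator = std::vector<sample>::const_iterator;

// Sums the CPU time of all samples in [first, last). Because every
// entry is independent, any slice can be evaluated without knowing
// the entries that precede it.
std::chrono::microseconds total_cpu_time(sample_iterator first,
                                         sample_iterator last) {
  return std::accumulate(
    first, last, std::chrono::microseconds{0},
    [](std::chrono::microseconds acc, const sample& s) {
      return acc + s.cpu_time;
    });
}
```

With cumulative entries, the same analysis of the second half would first require subtracting the last value of the first half from every entry.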