Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: CoreTiming Cleanup (Add UnitTests) #4168

Merged
merged 9 commits into from Sep 3, 2016

Conversation

EmptyChaos
Copy link
Contributor

@EmptyChaos EmptyChaos commented Sep 1, 2016

I've been poking around in the core and noticed that CoreTiming could use some tidying.

Changes:

  • Convert camel case to snake case, add s_ prefix to statics
  • Replace LinkedListItem with a slightly modified std::priority_queue
  • Made it more sensitive to overwriting existing callbacks
  • Change the event key from int to EventType* (type safety)
  • Added some UnitTests for CoreTiming.

I haven't measured any significant performance differences.


This change is Reviewable

@EmptyChaos
Copy link
Contributor Author

@phire Is CoreTiming supposed to work like this? There's no comments explaining the behavior.

void Test()
{
   CoreTiming::Init()
   auto key = CoreTiming::RegisterEvent("test", TestCallback);
   CoreTiming::ScheduleEvent(1000, key, 0);
   PowerPC::ppcState.downcount = 0;
   CoreTiming::Advance();
}

void TestCallback(u64 userdata, s64 late)
{
   _assert_(late == 19000);  // <-- This is always true
}

This seems wrong but I don't know if there's a reason for it or not.

@phire
Copy link
Member

phire commented Sep 1, 2016

Not anymore. However you don't get the correct result until
globalTimerIsSane is set to false, and CoreTiming::Init() sets it to true.

I think Init is making the assumption that Advance will be called before
any jited code run.

On Sep 1, 2016 5:21 PM, "EmptyChaos" notifications@github.com wrote:

@phire https://github.com/phire Is CoreTiming supposed to work like
this? There's no comments explaining the behavior.

void Test()
{
CoreTiming::Init()
auto key = CoreTiming::RegisterEvent("test", TestCallback);
CoreTiming::ScheduleEvent(1000, key, 0);
PowerPC::ppcState.downcount = 0;
CoreTiming::Advance();
}
void TestCallback(u64 userdata, s64 late)
{
assert(late == 19000); // <-- This is always true
}

This seems wrong but I don't know if there's a reason for it or not.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#4168 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAIc9ABlmm4w8w_3VHljjc2TiIFdLoamks5qlmDLgaJpZM4JyUzs
.

@lioncash
Copy link
Member

lioncash commented Sep 1, 2016

Source/Core/Core/CoreTiming.cpp, line 57 [r1] (raw file):

// NOTE: 'c' is the underlying container (vector), 'comp' is the comparator
class EventQueue : public std::priority_queue<Event, std::vector<Event>, std::greater<Event>>
{

Please don't inherit from the standard library types (I shouldn't even need to explain why this is generally a bad idea, there's numerous explanations already written on the net).

If necessary, compose a type that builds off a priority_queue through composition or utilize some other means of doing the same thing.


Comments from Reviewable

@EmptyChaos
Copy link
Contributor Author

Review status: 0 of 26 files reviewed at latest revision, 1 unresolved discussion.


Source/Core/Core/CoreTiming.cpp, line 57 [r1] (raw file):

Previously, lioncash (Mat M.) wrote…

Please don't inherit from the standard library types (I shouldn't even need to explain why this is generally a bad idea, there's numerous explanations already written on the net).

If necessary, compose a type that builds off a priority_queue through composition or utilize some other means of doing the same thing.

There's no risk of slicing, it's not a library type in a header, it's not copied or moved. None of the base class functions are overridden, the extended interface is transparent with no new member variables. `std::priority_queue` is also not a container, it's an adaptor.

You can't compose std::priority_queue, the container and comparator are protected member variables, there's no way to access them without using inheritance. I'd have to write a custom priority queue from scratch.

Would you prefer me to write a custom container class (it would mostly duplicate std::priority_queue anyway)?


Comments from Reviewable

@lioncash
Copy link
Member

lioncash commented Sep 1, 2016

Source/Core/Core/CoreTiming.cpp, line 57 [r1] (raw file):

Previously, EmptyChaos wrote…

There's no risk of slicing, it's not a library type in a header, it's not copied or moved. None of the base class functions are overridden, the extended interface is transparent with no new member variables. std::priority_queue is also not a container, it's an adaptor.

You can't compose std::priority_queue, the container and comparator are protected member variables, there's no way to access them without using inheritance. I'd have to write a custom priority queue from scratch.

Would you prefer me to write a custom container class (it would mostly duplicate std::priority_queue anyway)?

As far as I'm aware, the point of a queue is so you literally don't have random/free-form access to the underlying container. Otherwise there'd be no point to the queue interface in the first place. e.g. `EraseByType` isn't something a queue should have, because the only reason to ever use a queue is when the behavior is dependent on the ordering of the items, not the type of them.

So, utilizing a queue as a base for non-queue behavior doesn't really make sense to begin with, even if the functionality is similar.


Comments from Reviewable

@lioncash
Copy link
Member

lioncash commented Sep 1, 2016

Source/Core/Core/CoreTiming.cpp, line 70 [r1] (raw file):

  {
    bool found = false;
    c.erase(std::remove_if(c.begin(), c.end(),

Note that you can utilize the returned iterator from remove_if to determine whether or not a match was found instead of capturing by reference (simply check against the end iterator).


Comments from Reviewable

@EmptyChaos
Copy link
Contributor Author

I've set globalTimerIsSane to false in Init(). Dolphin still seems to boot fine and it makes the Unit Test a little simpler and more intuitive.


Review status: 0 of 26 files reviewed at latest revision, 2 unresolved discussions.


Source/Core/Core/CoreTiming.cpp, line 57 [r1] (raw file):

Previously, lioncash (Mat M.) wrote…

As far as I'm aware, the point of a queue is so you literally don't have random/free-form access to the underlying container. Otherwise there'd be no point to the queue interface in the first place. e.g. EraseByType isn't something a queue should have, because the only reason to ever use a queue is when the behavior is dependent on the ordering of the items, not the type of them.

So, utilizing a queue as a base for non-queue behavior doesn't really make sense to begin with, even if the functionality is similar.

Sort of. I'd argue that removing items from a queue out of order (cancellation) is a pretty standard behavior for a queue to support. Semantically it is an extended queue, a queue that is still a queue but can do more things that the base can't.

I've unrolled the class. It now just uses std::push_heap/pop_heap on a vector directly.


Source/Core/Core/CoreTiming.cpp, line 70 [r1] (raw file):

Previously, lioncash (Mat M.) wrote…

Note that you can utilize the returned iterator from remove_if to determine whether or not a match was found instead of capturing by reference (simply check against the end iterator).

Done.

Comments from Reviewable

@phire
Copy link
Member

phire commented Sep 1, 2016

I'm not sure that's the proper way to fix the bug.

With the original code, this works:

Init()
Advance()
Schedule_event()
downcounts to zero
Advance()

This works too:

Init()
Schedule_event()
Advance()
downcounts to zero
Advance()

Only this sequence fails

Init()
Schedule_event()
downcounts to zero
Advance()

I would be in favor of simply not doing the latter and adding documentation that it shouldn't be done.

@EmptyChaos
Copy link
Contributor Author

EmptyChaos commented Sep 1, 2016

@phire Okay, but what documentation should I put in? There doesn't seem to be an obvious reason that CoreTiming needs to delay all events by 20000 cycles at startup.

Advance() doesn't really affect the state at all if the downcount is unchanged, it just recomputes the exact same values as Init() did and clears the timer flag. (Obviously it computes the correct downcount, but the running downcount from ScheduleEvent gives the same result)

@EmptyChaos EmptyChaos force-pushed the coretiming-cleanup branch 2 times, most recently from f8983b2 to 8ca5ad6 Compare September 1, 2016 08:07
@degasus
Copy link
Member

degasus commented Sep 1, 2016

Reviewed 23 of 26 files at r1, 1 of 1 files at r2, 1 of 2 files at r3.
Review status: 25 of 26 files reviewed at latest revision, 6 unresolved discussions.


Source/Core/Core/CoreTiming.cpp, line 32 [r3] (raw file):

{
  TimedCallback callback;
  const std::string* name;

How often do we use this reference to name here? I don't see much usage in storing it, and it would simplify the map string->callback.


Source/Core/Core/CoreTiming.cpp, line 56 [r3] (raw file):

// STATE_TO_SAVE
static std::vector<Event> s_event_queue;  // Priority Heap (std::make_heap/push_heap/pop_heap)

I think we need at least a good comment here why you don't use a std::priority_queue.

Also, I'm doubtful that your performance check is valid, as there are games which triggers a lot of events. eg Silent Hill should be checked: https://forums.dolphin-emu.org/Thread-is-silent-hill-fixed-yet?pid=418166#pid418166


Source/Core/Core/CoreTiming.cpp, line 363 [r3] (raw file):

{
  auto clone = s_event_queue;
  std::sort(clone.begin(), clone.end());

Isn't the event queue already sorted?


Source/Core/Core/CoreTiming.cpp, line 408 [r3] (raw file):

  {
    g_slicelength = (int)(first->time - g_globalTimer);
    if (g_slicelength > maxslicelength)

With dropping this line, you'll increase the max delay of asynchronous events. Was this done on purpose?


Source/Core/Core/HW/EXI_DeviceMemoryCard.cpp, line 88 [r3] (raw file):

  }
  // Because the memory cards can be changed while emulation is running, the callbacks may
  // have already been registered by previous instances.

Maybe a small comment which behavior is expected here with queued events: assert, called now, or dropped.


Source/Core/Core/PowerPC/PowerPC.cpp, line 39 [r3] (raw file):

PPCDebugInterface debug_interface;

static CoreTiming::EventType* s_invalidate_cache_thread_safe = nullptr;

By the way, "= nullptr" is redundant here. Global static pointers are initialized with nullptr.


Comments from Reviewable

@@ -31,6 +31,7 @@ extern u64 g_fake_TB_start_ticks;
extern int g_slice_length;
extern float g_last_OC_factor_inverted;

// NOTE: ScheduleEvent will not start updating the PowerPC downcount until after the first Advance.

This comment was marked as off-topic.

@phire
Copy link
Member

phire commented Sep 1, 2016

Though after looking thought the code, I'm not actually seeing a good reason to keep the behavior I originally gave it.

Perhaps Init should just set globalTimerIsSane to false. It has no way of actually ensuring Advance is actually called before cycle zero. ScheduleEvent will end up doing a bit of wasted work during initialization if Advance is guaranteed to be called... but it doesn't change the actual outcome.

@EmptyChaos EmptyChaos force-pushed the coretiming-cleanup branch 2 times, most recently from 486ae63 to 4184984 Compare September 1, 2016 11:27
@degasus
Copy link
Member

degasus commented Sep 1, 2016

Or just make it explicit. Init() for the internal structurs, then a few RegisterEvents, then maybe a Start() call on the initial PPC run.

@degasus
Copy link
Member

degasus commented Sep 1, 2016

Reviewed 18 of 19 files at r4.
Review status: 25 of 26 files reviewed at latest revision, 5 unresolved discussions.


Comments from Reviewable

@EmptyChaos
Copy link
Contributor Author

Review status: 24 of 26 files reviewed at latest revision, 7 unresolved discussions.


Source/Core/Core/CoreTiming.cpp, line 32 [r3] (raw file):

Previously, degasus (Markus Wick) wrote…

How often do we use this reference to name here? I don't see much usage in storing it, and it would simplify the map string->callback.

It's used in `DoState` and some of the log messages. The pointer is needed to go backwards from the value to the key.

The reason I didn't just use std::map<std::string, TimedCallback> is because I couldn't figure out how to forward declare an opaque pointer to a std::pair. Do you have an opinion about that?


Source/Core/Core/CoreTiming.cpp, line 56 [r3] (raw file):

Previously, degasus (Markus Wick) wrote…

I think we need at least a good comment here why you don't use a std::priority_queue.

Also, I'm doubtful that your performance check is valid, as there are games which triggers a lot of events. eg Silent Hill should be checked: https://forums.dolphin-emu.org/Thread-is-silent-hill-fixed-yet?pid=418166#pid418166

Unfortunately, I don't have Silent Hill to test. Are there any other games that are similarly event heavy?

The overall algorithms are not significantly different, the biggest change is that the event queue was changed to a vector instead of a list. Vectors have better cache efficiency which I measured as a non-significant performance increase.


Source/Core/Core/CoreTiming.cpp, line 363 [r3] (raw file):

Previously, degasus (Markus Wick) wrote…

Isn't the event queue already sorted?

The queue is a min-heap, it's not sorted.

Source/Core/Core/CoreTiming.cpp, line 408 [r3] (raw file):

Previously, degasus (Markus Wick) wrote…

With dropping this line, you'll increase the max delay of asynchronous events. Was this done on purpose?

The code is refactored, not removed. I replaced the `if` with `std::min`.

Source/Core/Core/CoreTiming.h, line 34 [r3] (raw file):

Previously, phire (Scott Mansell) wrote…

The actual limitation is: For things to work as expected, Advance() must be called before the first cycle is executed.

Done.

Source/Core/Core/HW/EXI_DeviceMemoryCard.cpp, line 88 [r3] (raw file):

Previously, degasus (Markus Wick) wrote…

Maybe a small comment which behavior is expected here with queued events: assert, called now, or dropped.

They're deleted (`RemoveEvent(et_cmd_done)`) in the destructor. I've added a note to the comment.

Source/Core/Core/PowerPC/PowerPC.cpp, line 39 [r3] (raw file):

Previously, degasus (Markus Wick) wrote…

By the way, "= nullptr" is redundant here. Global static pointers are initialized with nullptr.

I've removed the assignments.

Comments from Reviewable

@EmptyChaos
Copy link
Contributor Author

@phire The part I didn't understand from just looking at the code was the implicit slice boundary concept. It makes sense when you explained it as the boundary between slice -1 and slice 0 so I'm fine the current behavior now.

One point though, the interpreter seems to be divergent from the JIT. It calls Advance() at the bottom of its dispatcher loop instead of the start (Interpreter::Run, Interpreter::SingleStep), i.e. it runs cycle 1 then advances instead of advancing then running cycle 1 like the JIT. Shouldn't the Advance() calls be at the start like Jit64 in that case?

@phire
Copy link
Member

phire commented Sep 2, 2016

Yeah, it needs some comments to make this concept explictly clear. Shortly before 5.0, all the cores had wildly different behaviour about when advance was called.

I thought I standardized in in #3800, but I might have left an inconstancy. Fixing it up (and adding comments as to why Advance() is being called first) would be a good idea.

@degasus
Copy link
Member

degasus commented Sep 2, 2016

Reviewed 1 of 19 files at r4, 3 of 3 files at r5.
Review status: all files reviewed at latest revision, 9 unresolved discussions.


Source/Core/Core/CoreTiming.cpp, line 32 [r3] (raw file):

Previously, EmptyChaos wrote…

It's used in DoState and some of the log messages. The pointer is needed to go backwards from the value to the key.

The reason I didn't just use std::map<std::string, TimedCallback> is because I couldn't figure out how to forward declare an opaque pointer to a std::pair. Do you have an opinion about that?

Hm, you could use iterators instead of EventType*, but as long as this pointer here is used, I'm fine with both ways.

Source/Core/Core/CoreTiming.cpp, line 56 [r3] (raw file):

Previously, EmptyChaos wrote…

Unfortunately, I don't have Silent Hill to test. Are there any other games that are similarly event heavy?

The overall algorithms are not significantly different, the biggest change is that the event queue was changed to a vector instead of a list. Vectors have better cache efficiency which I measured as a non-significant performance increase.

Indeed, vector and list are similiar for performance. I thought about the performance of the queue.

Source/Core/Core/CoreTiming.cpp, line 408 [r3] (raw file):

Previously, EmptyChaos wrote…

The code is refactored, not removed. I replaced the if with std::min.

my bad, sorry

Source/UnitTests/Core/CoreTimingTest.cpp, line 139 [r5] (raw file):

  CoreTiming::ScheduleEvent(200, cb_b, CB_IDS[1]);

  s_lateness = 10;

May you move s_lateness as optional (thrid) parameter to AdvanceAndCheck?


Source/UnitTests/Core/CoreTimingTest.cpp, line 147 [r5] (raw file):

TEST(CoreTiming, ChainScheduling)
{
  static int s_reschedules = 0;

Why static? Is this test called multiple times? Same for RescheduleCallback


Source/UnitTests/Core/CoreTimingTest.cpp, line 199 [r5] (raw file):

TEST(CoreTiming, ScheduleIntoPast)
{
  static CoreTiming::EventType* s_cb_next = nullptr;

Why static? [&] is allowed to access this variable without "static".
https://godbolt.org/g/QEyuCK
So you also don't have to reinterpret_cast(cb_rs).


Source/UnitTests/Core/CoreTimingTest.cpp, line 227 [r5] (raw file):

  CoreTiming::ScheduleEvent(-1000, cb_b, CB_IDS[1], CoreTiming::FromThread::NON_CPU);
  Core::DeclareAsCPUThread();
  s_lateness = MAX_SLICE_LENGTH + 1000;

What do you think about this behavior? I know it matches our code, but I want to change it to s_lateness = 1000. No clue which one is better, as we have no idea how much time is already passed. But I feel like configuring less events in the past is a good idea.


Comments from Reviewable

@phire
Copy link
Member

phire commented Sep 2, 2016

Reviewed 7 of 26 files at r1, 15 of 19 files at r4, 3 of 3 files at r5.
Review status: all files reviewed at latest revision, 7 unresolved discussions.


Source/Core/Core/HW/EXI_DeviceMemoryCard.cpp, line 91 [r5] (raw file):

  // destructor.
  et_cmd_done = CoreTiming::RegisterEvent(event_names[index].done, CmdDoneCallback,
                                          CoreTiming::Registration::ExpectExisting);

I'm not really happy about the existing code here.

It looks like we could end up with a save-state loading containing one of these events before this event has been registered.

Generally events should all be registers during Init.


Source/UnitTests/Core/CoreTimingTest.cpp, line 120 [r5] (raw file):

  s_lateness = 0;
  s_expected_callback = 0;
  PowerPC::ppcState.downcount = 0;

Might as well assert that downcount equals 1000 here.


Source/UnitTests/Core/CoreTimingTest.cpp, line 147 [r5] (raw file):

Previously, degasus (Markus Wick) wrote…

Why static? Is this test called multiple times? Same for RescheduleCallback

Yeah, it will be best to make these all non static.

Source/UnitTests/Core/CoreTimingTest.cpp, line 150 [r5] (raw file):

  s_lateness = 0;

  static auto RescheduleCallback = [](u64 userdata, s64 lateness) {

should be capturing a reference to s_reschedules

auto RescheduleCallback = [&s_reschedules](u64 userdata, s64 lateness) {

Source/UnitTests/Core/CoreTimingTest.cpp, line 227 [r5] (raw file):

Previously, degasus (Markus Wick) wrote…

What do you think about this behavior? I know it matches our code, but I want to change it to s_lateness = 1000. No clue which one is better, as we have no idea how much time is already passed. But I feel like configuring less events in the past is a good idea.

I feel like the difference is arbitrary. We can't guarantee anywhere near exact times or consistent delays.

I don't think it's worth putting any effort into trying to minimise the cross-threaded lateness. I'd rather we put that time into minimising the amount of things scheduling from the wrong thread.


Comments from Reviewable

@EmptyChaos
Copy link
Contributor Author

Review status: 23 of 26 files reviewed at latest revision, 7 unresolved discussions.


Source/Core/Core/CoreTiming.cpp, line 32 [r3] (raw file):

Previously, degasus (Markus Wick) wrote…

Hm, you could use iterators instead of EventType*, but as long as this pointer here is used, I'm fine with both ways.

Unfortunately, `unordered_map` only guarantees stability of references to elements, not iterators. Rehashing invalidates all the iterators (they become dangling and can't be used). Only `std::map` guarantees iterators are stable as well as pointers.

Exporting iterators would also leak an implementation detail, I'm trying to hide the nature/structure of the internal data structures from the CoreTiming consumers since that makes the interface less brittle (don't need to change things that use CoreTiming when changing the internals).


Source/Core/Core/CoreTiming.cpp, line 56 [r3] (raw file):

Previously, degasus (Markus Wick) wrote…

Indeed, vector and list are similiar for performance. I thought about the performance of the queue.

When I originally profiled it before the review comment changes, I got roughly 100 additional CPU samples in `CoreTiming::Advance` and 300 less samples in `CoreTiming::ScheduleEvent` compared to master.

Analytically, AddEventToQueue is O(n) with low-locality, std::push_heap is log n with high-locality, e.g. 32 events -> 32 comparisons vs 32 events -> 5 comparisons. pop_heap is another 5 comparisons, but 10 comparisons total is still less than 32. After throwing some statistics code into CoreTiming, the average number of events in the queue seems to be around 8 which is up to 8 comparisons for the list vs up to 6 for the heap.


Source/UnitTests/Core/CoreTimingTest.cpp, line 120 [r5] (raw file):

Previously, phire (Scott Mansell) wrote…

Might as well assert that downcount equals 1000 here.

Done.

Source/UnitTests/Core/CoreTimingTest.cpp, line 139 [r5] (raw file):

Previously, degasus (Markus Wick) wrote…

May you move s_lateness as optional (thrid) parameter to AdvanceAndCheck?

Done.

Source/UnitTests/Core/CoreTimingTest.cpp, line 147 [r5] (raw file):

Previously, degasus (Markus Wick) wrote…

Why static? Is this test called multiple times? Same for RescheduleCallback

`static` is needed to make it accessible via the global scope from `RescheduleCallback`.

RescheduleCallback doesn't need to be static but removing the static keyword doesn't actually change anything from a practical level, all I'm doing is defining a function in the global scope without actually putting the name of the function into the global scope. i.e. it's semantically equivalent to

namespace ChainSchedulingTest
{
static int s_reschedules = 0;
static void RescheduleCallback(u64 userdata, s64 lateness) { ... }
}

Source/UnitTests/Core/CoreTimingTest.cpp, line 199 [r5] (raw file):

Previously, degasus (Markus Wick) wrote…

Why static? [&] is allowed to access this variable without "static".
https://godbolt.org/g/QEyuCK
So you also don't have to reinterpret_cast(cb_rs).

CoreTiming only accepts function pointers, not `std::function`. Lambdas are only "true functions" (reducible to a function pointer) when they don't capture. A capturing lambda is a functor (i.e. class with `operator()`) instead of a function.
int main()
{
    int i = 5;
    void (*callback)() = [&]() { printf("%d\n", i); };  // <-- won't compile "lambda not convertible to function pointer"
    callback();
}

Source/UnitTests/Core/CoreTimingTest.cpp, line 227 [r5] (raw file):

Previously, degasus (Markus Wick) wrote…

What do you think about this behavior? I know it matches our code, but I want to change it to s_lateness = 1000. No clue which one is better, as we have no idea how much time is already passed. But I feel like configuring less events in the past is a good idea.

I've added a code comment explaining what I'm trying to do here.

Basically what's being tested is a ScheduleEvent from another thread where the MoveEvents gets performed late. i.e. there are no events in the queue, a thread samples g_global_timer and gets paused by the operating system scheduler, the CPU Thread then advances CoreTiming by 1000 cycles and then the second thread wakes up and adds the event. The CPU Thread will then run up to 20000 cycles (current downcount) before it Advances again, MoveEvents and then finally runs the callback (very) late.

Essentially, g_global_timer is unreliable outside the CPU Thread so it's possible for dumb things to happen which need to not break the event scheduler.


Comments from Reviewable

@degasus
Copy link
Member

degasus commented Sep 2, 2016

:lgtm:


Reviewed 3 of 3 files at r7.
Review status: all files reviewed at latest revision, 6 unresolved discussions.


Comments from Reviewable

@phire
Copy link
Member

phire commented Sep 2, 2016

Review status: all files reviewed at latest revision, 5 unresolved discussions.


Source/UnitTests/Core/CoreTimingTest.cpp, line 147 [r5] (raw file):

Previously, EmptyChaos wrote…

static is needed to make it accessible via the global scope from RescheduleCallback.

RescheduleCallback doesn't need to be static but removing the static keyword doesn't actually change anything from a practical level, all I'm doing is defining a function in the global scope without actually putting the name of the function into the global scope. i.e. it's semantically equivalent to

namespace ChainSchedulingTest
{
static int s_reschedules = 0;
static void RescheduleCallback(u64 userdata, s64 lateness) { ... }
}
But `RescheduleCallback` doesn't need to be accessing `s_reschedules` from the global scope. It should be capturing an reference to `s_reschedules` and using that.

By using static, you are ensuring the second call to the unit test will return the wrong result. That's not really an issue in itself as we only need to call tests once, but it still feels very wrong.


Source/UnitTests/Core/CoreTimingTest.cpp, line 227 [r5] (raw file):

Previously, EmptyChaos wrote…

I've added a code comment explaining what I'm trying to do here.

Basically what's being tested is a ScheduleEvent from another thread where the MoveEvents gets performed late. i.e. there are no events in the queue, a thread samples g_global_timer and gets paused by the operating system scheduler, the CPU Thread then advances CoreTiming by 1000 cycles and then the second thread wakes up and adds the event. The CPU Thread will then run up to 20000 cycles (current downcount) before it Advances again, MoveEvents and then finally runs the callback (very) late.

Essentially, g_global_timer is unreliable outside the CPU Thread so it's possible for dumb things to happen which need to not break the event scheduler.

BTW. `g_global_timer` should possibly be thread local to the CPU thread.

But way out of scope for this PR.


Source/UnitTests/Core/CoreTimingTest.cpp, line 241 [r7] (raw file):

}

TEST(CoreTiming, Overclocking)

Could you add a stage to this test which schedules some events, AdvanceAndChecks through some of them then changes the m_OCFactor before AdvancingAndChecking the rest.


Comments from Reviewable

Replace adhoc linked list with a priority heap. Performance
characteristics are mostly the same, but is more cache friendly.

[Priority Queues have O(log n) push/pop compared to the linked
list's O(n) push/O(1) pop but the queue is not big enough for
that to matter, so linear is faster over linked. Very slight gains
when framelimit is unlimited (Wind Waker), 1900% -> 1950%]
@EmptyChaos
Copy link
Contributor Author

Review status: 22 of 26 files reviewed at latest revision, 6 unresolved discussions.


Source/Core/Core/HW/EXI_DeviceMemoryCard.cpp, line 91 [r5] (raw file):

Previously, phire (Scott Mansell) wrote…

I'm not really happy about the existing code here.

It looks like we could end up with a save-state loading containing one of these events before this event has been registered.

Generally events should all be registers during Init.

I've looked through the code and I think you're right. If a memory card is attached before loading the save state then the callback is incidentally present and it works, if there isn't a memory card then `CEXIChannel::DoState` will create the memory card class after CoreTiming has already been `DoState()`ed which causes the events to be lost.

Source/UnitTests/Core/CoreTimingTest.cpp, line 147 [r5] (raw file):

Previously, phire (Scott Mansell) wrote…

But RescheduleCallback doesn't need to be accessing s_reschedules from the global scope. It should be capturing an reference to s_reschedules and using that.

By using static, you are ensuring the second call to the unit test will return the wrong result. That's not really an issue in itself as we only need to call tests once, but it still feels very wrong.

`s_reschedules` is initialized to 3 as part of the test body further down. I thought gtest only ran one test at a time, otherwise the global variables (`s_expected_callback`, etc) would be broken as well.

Capturing lambdas are functors (a class object with the operator() member function), not functions.

int main()
{
    int i = 5;
    void (*callback)() = [&]() { printf("%d\n", i); };  // <-- won't compile, lambda is not a function
    callback();
}

To get a true function it must not capture which means s_reschedules needs to reachable without a this or a relative stack offset (i.e. global).


Source/UnitTests/Core/CoreTimingTest.cpp, line 150 [r5] (raw file):

Previously, phire (Scott Mansell) wrote…

should be capturing a reference to s_reschedules

auto RescheduleCallback = [&s_reschedules](u64 userdata, s64 lateness) {
Capturing lambdas are functors. I've replaced the lambda with a normal function.

Source/UnitTests/Core/CoreTimingTest.cpp, line 227 [r5] (raw file):

Previously, phire (Scott Mansell) wrote…

BTW. g_global_timer should possibly be thread local to the CPU thread.

But way out of scope for this PR.

There seems to be an architectural problem with the `ScheduleEvent` API. Events scheduled from outside the CPU Thread really should use either absolute times or none at all instead of "cycles_into_future" relative times. Outside the CPU Thread, the question is "cycles_into_future" of _what_ exactly? Relative to when? If the thread started a task at x cycles and intended to finish at x+2000 but ended up being paused by the OS scheduler so we're now at x+10000, when is the event supposed to be scheduled? -8000, +0, +2000? There's no sane behavior in the absence of a centralized clock source that blocks all the threads when one of the thread group falls too far behind the others.

IMHO, ScheduleEvent should be split back into ScheduleEvent (CPU Thread) and ScheduleEventExternal (Non-CPU) where ScheduleEventExternaldoes not have a time parameter at all, just the callback key and userdata.


Source/UnitTests/Core/CoreTimingTest.cpp, line 241 [r7] (raw file):

Previously, phire (Scott Mansell) wrote…

Could you add a stage to this test which schedules some events, AdvanceAndChecks through some of them then changes the m_OCFactor before AdvancingAndChecking the rest.

Done.

Comments from Reviewable

@phire
Copy link
Member

phire commented Sep 3, 2016

Reviewed 1 of 3 files at r7, 4 of 4 files at r8.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


Source/Core/Core/HW/EXI_DeviceMemoryCard.cpp, line 91 [r5] (raw file):

Previously, EmptyChaos wrote…

I've looked through the code and I think you're right. If a memory card is attached before loading the save state then the callback is incidentally present and it works, if there isn't a memory card then CEXIChannel::DoState will create the memory card class after CoreTiming has already been DoState()ed which causes the events to be lost.

Another savestate bug bites the dust.

Source/UnitTests/Core/CoreTimingTest.cpp, line 147 [r5] (raw file):

Previously, EmptyChaos wrote…

s_reschedules is initialized to 3 as part of the test body further down. I thought gtest only ran one test at a time, otherwise the global variables (s_expected_callback, etc) would be broken as well.

Capturing lambdas are functors (a class object with the operator() member function), not functions.

int main()
{
    int i = 5;
    void (*callback)() = [&]() { printf("%d\n", i); };  // <-- won't compile, lambda is not a function
    callback();
}

To get a true function it must not capture which means s_reschedules needs to reachable without a this or a relative stack offset (i.e. global).

Ah, you are right. The static variable can stay.

Source/UnitTests/Core/CoreTimingTest.cpp, line 227 [r5] (raw file):

Previously, EmptyChaos wrote…

There seems to be an architectural problem with the ScheduleEvent API. Events scheduled from outside the CPU Thread really should use either absolute times or none at all instead of "cycles_into_future" relative times. Outside the CPU Thread, the question is "cycles_into_future" of what exactly? Relative to when? If the thread started a task at x cycles and intended to finish at x+2000 but ended up being paused by the OS scheduler so we're now at x+10000, when is the event supposed to be scheduled? -8000, +0, +2000? There's no sane behavior in the absence of a centralized clock source that blocks all the threads when one of the thread group falls too far behind the others.

IMHO, ScheduleEvent should be split back into ScheduleEvent (CPU Thread) and ScheduleEventExternal (Non-CPU) where ScheduleEventExternaldoes not have a time parameter at all, just the callback key and userdata.

Yeah... what externally scheduled events are doing, is moving some event to the CPU thread. Any sort of delay doesn't make any sence (unless you are scheduling two events retaliative to each other, which results in a race condition).

Absolute Times are a bit useless too. Only the CPU thread should know what the Absolute Time currently are, so the other threads will have an outdated view. Unless an external thread is scheduling something several ms in the future, any absolute times are practically guaranteed to come in late.

Your refactoring idea sounds good. Don't even bother calling it ScheduleEventExternal. Call it something like RunOnCPUThread and make it take an std::function so we can use lambdas (@delroth was suggesting lambda for this function the other day). Since we won't need to save these cross-thread events out to savestates anymore, there is no issue with accepting lambdas.

If something has a good reason for an event happening in the future, it can call ScheduleEvent from within the lambda.


Comments from Reviewable

@phire
Copy link
Member

phire commented Sep 3, 2016

Once you have deleted that enum, :lgtm:


Review status: all files reviewed at latest revision, 3 unresolved discussions.


Source/Core/Core/CoreTiming.h, line 48 [r8] (raw file):

void DoState(PointerWrap& p);

enum class Registration

Now that the one place that used this enum was used has been proven buggy and been eliminated, the enum (and the optional argument on RegisterEvent) can be removed.

I don't think there is any use cases for these alternative modes which won't break savestates.


Comments from Reviewable

Replace 'int' keys with something that carries type information.
Performance is neutral.
ForceExceptionCheck messes up the downcount and slice length if the
callback is scheduled into the past (g_slice_length becomes negative)
Events don't update the downcount until after the first Advance(),
thus Advance() must be called once before scheduling works normally.
CoreTiming gets restored before ExpansionInterface so CoreTiming
events need to already be registered before the save state loading
begins. This means that the callbacks must be registered
unconditionally instead of on-demand.
PixelEngine is initialized by InitializeShared()
@phire
Copy link
Member

phire commented Sep 3, 2016

Review status: 24 of 27 files reviewed at latest revision, 5 unresolved discussions.


Source/Core/VideoBackends/Software/SWmain.cpp, line 145 [r10] (raw file):

  SWOGLWindow::Init(window_handle);

  PixelEngine::Init();

Perhaps put this change in another PR.


Comments from Reviewable

@EmptyChaos
Copy link
Contributor Author

Review status: 24 of 27 files reviewed at latest revision, 5 unresolved discussions.


Source/Core/Core/CoreTiming.h, line 48 [r8] (raw file):

Previously, phire (Scott Mansell) wrote…

Now that the one place that used this enum was used has been proven buggy and been eliminated, the enum (and the optional argument on RegisterEvent) can be removed.

I don't think there is any use cases for these alternative modes which won't break savestates.

Done.

Source/Core/VideoBackends/Software/SWmain.cpp, line 145 [r10] (raw file):

Previously, phire (Scott Mansell) wrote…

Perhaps put this change in another PR.

`PixelEngine::Init` contains a `CoreTiming::RegisterEvent`. Registering an event twice trips the assertion I added and causes FIFOCI to crash.

Source/UnitTests/Core/CoreTimingTest.cpp, line 227 [r5] (raw file):

Previously, phire (Scott Mansell) wrote…

Yeah... what externally scheduled events are doing, is moving some event to the CPU thread. Any sort of delay doesn't make any sence (unless you are scheduling two events retaliative to each other, which results in a race condition).

Absolute Times are a bit useless too. Only the CPU thread should know what the Absolute Time currently are, so the other threads will have an outdated view. Unless an external thread is scheduling something several ms in the future, any absolute times are practically guaranteed to come in late.

Your refactoring idea sounds good. Don't even bother calling it ScheduleEventExternal. Call it something like RunOnCPUThread and make it take an std::function so we can use lambdas (@delroth was suggesting lambda for this function the other day). Since we won't need to save these cross-thread events out to savestates anymore, there is no issue with accepting lambdas.

If something has a good reason for an event happening in the future, it can call ScheduleEvent from within the lambda.

The gotcha with lambdas is that it creates phantom state. It'll need something like `CoreTiming::ForceSync()` so that save state system can flush the queue before creating a save state.

Comments from Reviewable

@phire
Copy link
Member

phire commented Sep 3, 2016

Reviewed 1 of 2 files at r9, 1 of 1 files at r10.
Review status: 26 of 27 files reviewed at latest revision, 3 unresolved discussions.


Source/Core/VideoBackends/Software/SWmain.cpp, line 145 [r10] (raw file):

Previously, EmptyChaos wrote…

PixelEngine::Init contains a CoreTiming::RegisterEvent. Registering an event twice trips the assertion I added and causes FIFOCI to crash.

Ah ok. LGTM then.

Source/UnitTests/Core/CoreTimingTest.cpp, line 227 [r5] (raw file):

Previously, EmptyChaos wrote…

The gotcha with lambdas is that it creates phantom state. It'll need something like CoreTiming::ForceSync() so that save state system can flush the queue before creating a save state.

We already have to sync everything before we can savestate. One more thing to sync is not a huge issue.

Comments from Reviewable

@degasus
Copy link
Member

degasus commented Sep 3, 2016

Reviewed 2 of 4 files at r8, 2 of 2 files at r9, 1 of 1 files at r10.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Source/Core/VideoBackends/Software/SWmain.cpp, line 145 [r10] (raw file):

Previously, phire (Scott Mansell) wrote…

Ah ok. LGTM then.

oh, nice catch

Source/UnitTests/Core/CoreTimingTest.cpp, line 147 [r5] (raw file):

Previously, phire (Scott Mansell) wrote…

Ah, you are right. The static variable can stay.

Sounds so. A second PR which replaces void(*) with std::function would be nice through ;)

Comments from Reviewable

@phire
Copy link
Member

phire commented Sep 3, 2016

Review status: all files reviewed at latest revision, 2 unresolved discussions.


Source/UnitTests/Core/CoreTimingTest.cpp, line 147 [r5] (raw file):

Previously, degasus (Markus Wick) wrote…

Sounds so. A second PR which replaces void(*) with std::function would be nice through ;)

You have to be very careful that only static data (known at either compile time or init) is captured within std::function, so it's already recreated by the time a savestate is loaded.

IMO, the best way to do this, is to simply not accept any more event types to after the first cycle of execution.


Comments from Reviewable

@phire phire merged commit f5fa5a7 into dolphin-emu:master Sep 3, 2016
@EmptyChaos EmptyChaos deleted the coretiming-cleanup branch September 5, 2016 02:33
@JMC47
Copy link
Contributor

JMC47 commented Sep 5, 2016

@EmptyChaos I'm getting reports that the system menu can't load channels since this was merged.

@JosJuice
Copy link
Member

JosJuice commented Sep 5, 2016

I heard something on the forums about Metroid Prime Trilogy not being able to launch its games (though there was no bisect). Could ES_Launch be broken?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
6 participants