Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interprocessing fixes #3009

Merged
merged 4 commits into from
Oct 19, 2021
Merged

Interprocessing fixes #3009

merged 4 commits into from
Oct 19, 2021

Conversation

lampysprites
Copy link
Contributor

@lampysprites lampysprites commented Oct 18, 2021

Hello again!

Out of three commits here, first two implement the solutions for connection backoff and update timer from the initial IPC PR (thanks!)

The last commit avoids a freeze that happened randomly (and regularly) when the tab, or the app is closed. As far as I can tell, it is a deadlock in ClosedDocs::addClosedDoc's unique lock and SpriteEvents::~SpriteEvents call to doc()->remove_observer. Also noticed that for it to happen, the doc has to be released from ClosedDocs, doesnt matter if it's timed or immediately, otherwise the destructor is not called. Honestly I don't see what's wrong there.

To reproduce:

  1. disable Settings>Files>Keep Closed Sprite In Memory (or set to 10s, harder to catch but same thing happens)
  2. open a file and run the following script, it creates spriteEvents
app.activeSprite.events:off(app.activeSprite.events:on("change", function() end))`
  1. Then close the file. Repeat open/script/close several times depending on today's luck.
  2. If the day is exceptionally lucky, runthe script several times and open multiple sprites.
  3. Not unsubscribing events makes it less likely (?!)

Regarding the solution, I only removed the destructor involved. It seems that observer and event classes are destroyed together, and the global pointer clears in onDestroy, so no subscription remains anyway? Still I have a suspicion that the destructor was there for a reason.

With these changes, I don't run into any critical issues while doing half-duplex communication.

(For a while, the networking thread and this issue were the main suspect, perhaps it'll strike later:
machinezone/IXWebSocket#316)


I agree that my contributions are licensed under the Individual Contributor License Agreement V3.0 ("CLA") as stated in https://github.com/aseprite/sourcecode/blob/main/cla.md

I have signed the CLA following the steps given in https://github.com/aseprite/sourcecode/blob/main/sign-cla.md#sign-the-cla

@dacap dacap self-assigned this Oct 18, 2021
@dacap
Copy link
Member

dacap commented Oct 18, 2021

Hi @lampysprites! Thanks for your PR, I saw your extra changes in the other PR's comment and it's nice to have them here now. Didn't have enough time to advance with this. Was thinking about the extra complexity for the API but I think it might be useful to have these extra parameters. I'll try to merge in some way these changes tomorrow (maybe changing milliseconds by seconds as a number for max/minreconnectwait parameters) and using a std::unique_ptr for the global ui::Timer for WebSockets (about this maybe in a future we can add a method to wakeup the laf events queue to avoid the timer, but that's a little more complex).

@dacap dacap merged commit 6075285 into aseprite:main Oct 19, 2021
@dacap
Copy link
Member

dacap commented Oct 19, 2021

Thanks @lampysprites! I've merged these changes with other fixes: 37a1b4b

Mainly WebSocketMessageType constants now are in upper case (to match the style of all other constants, e.g. WebSocketMessageType.TEXT), and I've added a close_ws() to remove the WebSocket from g_connections and stop the g_timer in case that the WebSocket is GC'd.

@dacap
Copy link
Member

dacap commented Oct 19, 2021

I still have a question about minreconnectwait/maxreconnectwait, what each parameter does? I saw this documentation but I'm still not sure what does each parameter (I'm asking this so we can write the documentation about it, and to know if both are needed).

@lampysprites
Copy link
Contributor Author

lampysprites commented Oct 19, 2021

Thanks!

Min and max just clamp the exponential time from both sides. The socket waits this time until the next attempt:

// IXExponentialBackoff.cpp
uint32_t calculateRetryWaitMilliseconds(uint32_t retryCount,
                                        uint32_t maxWaitBetweenReconnectionRetries,
                                        uint32_t minWaitBetweenReconnectionRetries)
{
    uint32_t waitTime = (retryCount < 26) ? (std::pow(2, retryCount) * 100) : 0;

    if (waitTime < minWaitBetweenReconnectionRetries)
    {
        waitTime = minWaitBetweenReconnectionRetries;
    }

    if (waitTime > maxWaitBetweenReconnectionRetries || waitTime == 0)
    {
        waitTime = maxWaitBetweenReconnectionRetries;
    }

    return waitTime;
}

@lampysprites lampysprites deleted the interprocessing branch October 19, 2021 14:54
@dacap
Copy link
Member

dacap commented Oct 19, 2021

Oh thanks for the explanation @lampysprites! 👍

About the deadlock fix, I had to reverted because it was giving a use-after-free error (we have to remove the DocUndo observer in the ~SpriteEvents dtor):

=================================================================
==26851==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000030880 at pc 0x00000058e135 bp 0x7ffe42ec1b60 sp 0x7ffe42ec1b58
READ of size 8 at 0x606000030880 thread T0
    #0 0x58e134 in void obs::observers<app::DocUndoObserver>::notify_observers<app::DocUndo*, undo::UndoState*>(void (app::DocUndoObserver::*)(app::DocUndo*, undo::UndoState*), app::DocUndo*, undo::UndoState*) /home/david/Dropbox/ASEPRITE/aseprite/src/observable/obs/observers.h:37
    #1 0x58d4df in void obs::observable<app::DocUndoObserver>::notify_observers<app::DocUndo*, undo::UndoState*>(void (app::DocUndoObserver::*)(app::DocUndo*, undo::UndoState*), app::DocUndo*, undo::UndoState*) /home/david/Dropbox/ASEPRITE/aseprite/src/observable/obs/observable.h:33
    #2 0x58cca6 in app::DocUndo::onDeleteUndoState(undo::UndoState*) /home/david/Dropbox/ASEPRITE/aseprite/src/app/doc_undo.cpp:259
    #3 0x176129e in undo::UndoHistory::deleteState(undo::UndoState*) /home/david/Dropbox/ASEPRITE/aseprite/src/undo/undo_history.cpp:210
    #4 0x17602eb in undo::UndoHistory::clearRedo() /home/david/Dropbox/ASEPRITE/aseprite/src/undo/undo_history.cpp:70
    #5 0x175fe75 in undo::UndoHistory::~UndoHistory() /home/david/Dropbox/ASEPRITE/aseprite/src/undo/undo_history.cpp:30
    #6 0x58f5c0 in app::DocUndo::~DocUndo() (/home/david/builds/aseprite-gcc/bin/aseprite+0x58f5c0)
    #7 0x58f5f7 in app::DocUndo::~DocUndo() (/home/david/builds/aseprite-gcc/bin/aseprite+0x58f5f7)
    #8 0x54c378 in std::default_delete<app::DocUndo>::operator()(app::DocUndo*) const /usr/include/c++/11/bits/unique_ptr.h:85
    #9 0x54a7a2 in std::unique_ptr<app::DocUndo, std::default_delete<app::DocUndo> >::~unique_ptr() /usr/include/c++/11/bits/unique_ptr.h:361
    #10 0x544629 in app::Doc::~Doc() /home/david/Dropbox/ASEPRITE/aseprite/src/app/doc.cpp:70
    #11 0x544699 in app::Doc::~Doc() /home/david/Dropbox/ASEPRITE/aseprite/src/app/doc.cpp:82
    #12 0x41ff61 in app::App::run() /home/david/Dropbox/ASEPRITE/aseprite/src/app/app.cpp:491
    #13 0x41d38e in app_main(int, char**) /home/david/Dropbox/ASEPRITE/aseprite/src/main/main.cpp:115
    #14 0x109a2a9 in main /home/david/Dropbox/ASEPRITE/aseprite/laf/os/common/main.cpp:67
    #15 0x7fd724dbcb74 in __libc_start_main (/lib64/libc.so.6+0x27b74)
    #16 0x41cf5d in _start (/home/david/builds/aseprite-gcc/bin/aseprite+0x41cf5d)

0x606000030880 is located 32 bytes inside of 56-byte region [0x606000030860,0x606000030898)
freed by thread T0 here:
    #0 0x7fd725a1e307 in operator delete(void*, unsigned long) (/lib64/libasan.so.6+0xb1307)
    #1 0xb4b5f2 in ~SpriteEvents /home/david/Dropbox/ASEPRITE/aseprite/src/app/script/events_class.cpp:161
    #2 0xb4b66a in operator() /usr/include/c++/11/bits/unique_ptr.h:85
    #3 0xb4cb7e in ~unique_ptr /usr/include/c++/11/bits/unique_ptr.h:361
    #4 0xb4c6c3 in ~pair /usr/include/c++/11/bits/stl_pair.h:211
    #5 0xb4c6e3 in destroy<std::pair<unsigned int const, std::unique_ptr<app::script::(anonymous namespace)::SpriteEvents> > > /usr/include/c++/11/ext/new_allocator.h:162
    #6 0xb4c1ba in destroy<std::pair<unsigned int const, std::unique_ptr<app::script::(anonymous namespace)::SpriteEvents> > > /usr/include/c++/11/bits/alloc_traits.h:531
    #7 0xb4b6a4 in _M_destroy_node /usr/include/c++/11/bits/stl_tree.h:623
    #8 0xb49f48 in _M_drop_node /usr/include/c++/11/bits/stl_tree.h:631
    #9 0xb4a2a8 in _M_erase_aux /usr/include/c++/11/bits/stl_tree.h:2485
    #10 0xb497ba in erase /usr/include/c++/11/bits/stl_tree.h:1209
    #11 0xb48ae4 in erase /usr/include/c++/11/bits/stl_map.h:1038
    #12 0xb47dda in onDestroy /home/david/Dropbox/ASEPRITE/aseprite/src/app/script/events_class.cpp:192
    #13 0x54c921 in void obs::observers<app::DocObserver>::notify_observers<app::Doc*>(void (app::DocObserver::*)(app::Doc*), app::Doc*) /home/david/Dropbox/ASEPRITE/aseprite/src/observable/obs/observers.h:37
    #14 0x54aba8 in void obs::observable<app::DocObserver>::notify_observers<app::Doc*>(void (app::DocObserver::*)(app::Doc*), app::Doc*) /home/david/Dropbox/ASEPRITE/aseprite/src/observable/obs/observable.h:33
    #15 0x5445b1 in app::Doc::~Doc() /home/david/Dropbox/ASEPRITE/aseprite/src/app/doc.cpp:75
    #16 0x544699 in app::Doc::~Doc() /home/david/Dropbox/ASEPRITE/aseprite/src/app/doc.cpp:82
    #17 0x41ff61 in app::App::run() /home/david/Dropbox/ASEPRITE/aseprite/src/app/app.cpp:491
    #18 0x41d38e in app_main(int, char**) /home/david/Dropbox/ASEPRITE/aseprite/src/main/main.cpp:115
    #19 0x109a2a9 in main /home/david/Dropbox/ASEPRITE/aseprite/laf/os/common/main.cpp:67
    #20 0x7fd724dbcb74 in __libc_start_main (/lib64/libc.so.6+0x27b74)

@lampysprites
Copy link
Contributor Author

Okay makes sense but it's troubling if the program keeps freezing. Do you have any ideas on how to avoid the deadlock then? Is it okay to only remove the last line that causes it?

@dacap
Copy link
Member

dacap commented Oct 19, 2021

I'm going to investigate, I was able to reproduce the deadlock with the steps you gave. I think it's related to the onDestroy listener of SpriteEvents, which can be called from the ClosedDocs background thread. It's quite complex but we'll see if I can find a fix.

dacap added a commit that referenced this pull request Oct 19, 2021
…ocument

Instead of using an onDestroy(Document), which can be called from the
ClosedDocs background thread, we can disconnect all Doc observers on a
new onCloseDocument() event called from Doc::close() (from the UI
thread).

This deadlock issue was commented here:
#3009 (comment)
@dacap
Copy link
Member

dacap commented Oct 19, 2021

I think the deadlock issue should be fixed now: 4dcdcfe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants