This repository has been archived by the owner on Feb 7, 2024. It is now read-only.

Crashed in production, supervisord process was still running #325

Closed
ven0ms99 opened this issue Mar 2, 2020 · 38 comments
Labels
bug (Something isn't working) · good first issue (Good for newcomers) · help wanted (Extra attention is needed)

Comments

@ven0ms99

ven0ms99 commented Mar 2, 2020

So basically the server stopped working, even though the process in supervisord was still showing as running (and had been for 60 days). After I restarted the process with supervisord it worked again. What could have happened, and how can I avoid this in the future?

@tomcoonen

Do you have any logs available? Without them it will be a bit hard to diagnose.

@rennokki
Collaborator

Bump

@rennokki rennokki added the question (Further information is requested) label Aug 13, 2020
@Tofandel
Contributor

Tofandel commented Aug 19, 2020

@Hillcow It happened to me too. Your issue is likely that the server ran out of memory. If that happens, all requests will fail because PHP cannot allocate more memory, but the server will still be considered running (the process doesn't exit if the failed allocation happens inside a try/catch), so supervisor will not restart it.

Solution 1: increase the allowed memory for the ws server. This is a missing option in this lib that could be exposed via ini_set, but it can also be done when launching the command with php -d memory_limit=1G (doesn't work on all setups).
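
A minimal sketch of the ini_set variant, assuming you can hook into the websockets:serve command (for example at the top of its handle() method) before the loop starts; the value is only an example:

// Hedged sketch: raise the limit from PHP itself instead of passing -d on
// the command line. Must run before the event loop starts.
ini_set('memory_limit', '1G');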

Solution 2: make sure to catch the allocation-failure error and exit the process when that happens.
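
One possible sketch of Solution 2, assuming the out-of-memory condition surfaces as a fatal error (the handler placement is illustrative; register it before the server loop starts):

register_shutdown_function(function () {
    $error = error_get_last();

    // E_ERROR covers fatal errors such as "Allowed memory size exhausted".
    if ($error !== null && $error['type'] === E_ERROR) {
        // Exit non-zero so supervisord sees a crash and restarts the worker.
        exit(1);
    }
});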

@rennokki
Collaborator

@Tofandel I assume Solution 2 can be adapted directly in this package by using a signal listener: https://github.com/reactphp/event-loop#addsignal

The downside is that it needs the pcntl extension, so Windows is not covered by this kind of change.
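
A minimal sketch of that, assuming $loop is the server's React\EventLoop\LoopInterface instance:

// Requires ext-pcntl; addSignal() registers an OS signal listener on the loop.
$loop->addSignal(SIGTERM, function (int $signal) use ($loop) {
    // Stop the loop cleanly so the process exits and supervisord restarts it.
    $loop->stop();
});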

@Tofandel
Contributor

Tofandel commented Aug 19, 2020

The weird thing is that the loop kept running when out of memory. That's a fatal error, so it shouldn't be catchable in a try/catch, but somehow the process still managed to continue, and in the logger's on-error handler you can see the caught "Unable to allocate xxx bytes" messages multiple times, once after each request.

@yurik94

yurik94 commented Aug 19, 2020

We worked around it by restarting the process every week. Very ugly, but it's the only working solution for us, as memory stayed below the max limit.
We also installed the pcntl extension because, by default, the worst event loop was in use.

@rennokki
Collaborator

Ugly-ugly, but we gotta find what causes this huge increase in memory. Prolly some memory leaks caused by closed connections? 🤔

@Tofandel
Contributor

Tofandel commented Aug 19, 2020

@rennokki I don't really see how this would solve the issue, as that listens for external signals (e.g. SIGTERM) and not internal unrecoverable errors.

Yes, there likely is a memory leak in this package.
I would suggest looking into the statistics part. Last week I completely disabled the statistics by modifying some code, because I didn't need them and didn't need the table either, and there is no memory increase over time anymore.
I also added a gc_collect_cycles() every time a connection closes (see the sketch below), so it might also be a cyclic-reference issue: if the garbage collector is not called often enough and you get a high volume of requests before a collection runs (e.g. 200 requests in 5 minutes), the memory just blows up.

There are a lot of circular references, e.g. ->app = app() in all the logger singletons, so I wouldn't be surprised.
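
For reference, a blunt variant of the gc_collect_cycles() workaround mentioned above, run on a timer instead of on every connection close (assumes access to the server's loop via $this->loop, as in the serve command; the interval is arbitrary):

// Force a collection pass periodically so cyclic references from closed
// connections are freed instead of piling up between collections.
$this->loop->addPeriodicTimer(60, function () {
    gc_collect_cycles();
});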

@rennokki
Collaborator

I understand that PHP cannot throw exceptions when the memory limit is hit, because the memory limit is... well, the memory limit. So maybe we can check with a timer whether total RAM has reached a certain percentage in the last x minutes and restart the server.

Again, if this turns out to be due to statistics, I understand. There was a PR merged for 1.6.1 that fixes a decorator for websockets by cloning self. Perhaps the same is needed on the other decorators.

@Tofandel
Contributor

Tofandel commented Aug 19, 2020

@rennokki Yeah, that was me :p Actually, I think the singleton should be dropped for the loggers, at least for the message logger, and a trait or abstract class should be added instead. It's a big BC break, but the websocket logger is causing other issues by decorating the class.

One of them is interface implementation: Ratchet checks your class for certain interfaces and provides features accordingly (e.g. the supported protocol via the WsServerInterface interface), and by decorating you lose the interface and therefore the feature. I think logging should be done within the class, by extending onMessage and onError, rather than by wrapping the class inside another class.

For memory, you can indeed add a loop ticker every 30 seconds and check whether you are above 95% of ini_get('memory_limit'); if so, stop the server.
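
A sketch of that watchdog, assuming $loop is the server's React\EventLoop\LoopInterface and that memory_limit is given in bytes or with a K/M/G suffix (-1, i.e. unlimited, is skipped):

$limit = ini_get('memory_limit');

if ($limit !== false && $limit !== '-1') {
    // Convert shorthand like "512M" or "1G" into bytes.
    $units = ['K' => 1024, 'M' => 1024 ** 2, 'G' => 1024 ** 3];
    $suffix = strtoupper(substr($limit, -1));
    $bytes = isset($units[$suffix]) ? (int) $limit * $units[$suffix] : (int) $limit;

    $loop->addPeriodicTimer(30, function () use ($loop, $bytes) {
        // Stop the loop near the limit so supervisord restarts the worker
        // instead of leaving it hung in an out-of-memory state.
        if (memory_get_usage(true) >= 0.95 * $bytes) {
            $loop->stop();
        }
    });
}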

@rennokki
Collaborator

I can drop the decorators for 2.x. It is still in beta and needs to be thoroughly tested on this point.

@rennokki rennokki added the bug, good first issue, and help wanted labels and removed the question label Aug 19, 2020
@rennokki
Collaborator

rennokki commented Aug 19, 2020

I basically managed to replicate it:

  • set the stats interval to 1
  • add this one to the start command:
$this->loop->addPeriodicTimer(1, function () {
    $memoryInMb = round(memory_get_usage() / 1048576, 2);

    $this->line("Using: {$memoryInMb} MB");
});
  • run php artisan websockets:serve
  • initiate a connection (via dashboard or via test with Echo)
  • refresh the page a few times
  • sit back and see how the memory increases on its own for no reason
Click to see the logs ``` New connection opened for app key key. Connection id 91478157.695732491 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"91478157.695732491\",\"activity_timeout\":30}"} Saving statistics... Using: 17.68 MB 1: connection id 91478157.695732491 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 91478157.695732491 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Connection id 91478157.695732491 closed. New connection opened for app key key. Connection id 806028557.614130089 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"806028557.614130089\",\"activity_timeout\":30}"} 1: connection id 806028557.614130089 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 806028557.614130089 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 17.82 MB Connection id 806028557.614130089 closed. New connection opened for app key key. Connection id 180972669.616525848 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"180972669.616525848\",\"activity_timeout\":30}"} 1: connection id 180972669.616525848 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 180972669.616525848 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 17.91 MB Connection id 180972669.616525848 closed. New connection opened for app key key. Connection id 133946535.732795730 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"133946535.732795730\",\"activity_timeout\":30}"} 1: connection id 133946535.732795730 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 133946535.732795730 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Connection id 133946535.732795730 closed. Saving statistics... Using: 17.97 MB New connection opened for app key key. Connection id 803104692.328573343 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"803104692.328573343\",\"activity_timeout\":30}"} 1: connection id 803104692.328573343 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 803104692.328573343 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18.09 MB Saving statistics... Using: 18.18 MB Connection id 803104692.328573343 closed. Saving statistics... Using: 18.24 MB New connection opened for app key key. Connection id 169524850.746570909 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"169524850.746570909\",\"activity_timeout\":30}"} 1: connection id 169524850.746570909 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 169524850.746570909 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18.36 MB Saving statistics... Using: 18.44 MB Connection id 169524850.746570909 closed. Saving statistics... Using: 18.5 MB New connection opened for app key key. 
Connection id 137718526.920963224 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"137718526.920963224\",\"activity_timeout\":30}"} 1: connection id 137718526.920963224 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 137718526.920963224 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18.62 MB Saving statistics... Using: 18.71 MB Saving statistics... Using: 18.8 MB Saving statistics... Using: 18.9 MB Saving statistics... Using: 19 MB Saving statistics... Using: 19.09 MB Saving statistics... Using: 19.17 MB Saving statistics... Using: 19.26 MB Saving statistics... Using: 19.35 MB Saving statistics... Using: 19.44 MB Saving statistics... Using: 19.53 MB Saving statistics... Using: 19.62 MB Saving statistics... Using: 19.71 MB Saving statistics... Using: 19.8 MB Saving statistics... Using: 19.89 MB Saving statistics... Using: 19.97 MB Saving statistics... Using: 20.06 MB Saving statistics... Using: 20.16 MB Saving statistics... Using: 20.25 MB Saving statistics... Using: 20.33 MB Saving statistics... Using: 20.43 MB Saving statistics... Using: 20.53 MB Saving statistics... Using: 20.62 MB Saving statistics... Using: 20.71 MB Saving statistics... Using: 20.8 MB Saving statistics... Using: 20.88 MB Saving statistics... Using: 20.97 MB Saving statistics... Using: 21.06 MB Saving statistics... Using: 21.15 MB Saving statistics... Using: 21.27 MB Saving statistics... Using: 21.36 MB 1: connection id 137718526.920963224 received message: {"event":"pusher:ping","data":{}}. Connection id 137718526.920963224 sending message {"event":"pusher:pong"} Saving statistics... Using: 21.45 MB Saving statistics... Using: 21.54 MB Saving statistics... Using: 21.63 MB Saving statistics... Using: 21.71 MB Saving statistics... Using: 21.8 MB Saving statistics... Using: 21.89 MB Saving statistics... Using: 21.98 MB Saving statistics... Using: 22.07 MB Saving statistics... Using: 22.16 MB Saving statistics... Using: 22.25 MB Saving statistics... Using: 22.34 MB Saving statistics... Using: 22.42 MB Saving statistics... Using: 22.51 MB Saving statistics... Using: 22.6 MB ```

@rennokki
Collaborator

Managed to mitigate the issue by working on the save() function of the statistics manager and adding $this->statistics = []; after the foreach loop, but it still looks like it keeps something in memory, which then drops back to normal after a while.
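
Roughly, the change looks like this (the method body is only a sketch of the package's statistics logger; the reset at the end is the actual addition):

public function save()
{
    foreach ($this->statistics as $appId => $statistic) {
        // ... persist the entry exactly as before ...
    }

    // Reset the in-memory buffer so processed entries don't accumulate.
    $this->statistics = [];
}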

Click to see the log ``` Starting the WebSocket server on port 6001... Saving statistics... Using: 16.63 MB Saving statistics... Using: 16.63 MB Saving statistics... Using: 16.63 MB New connection opened for app key key. Connection id 205785305.210932098 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"205785305.210932098\",\"activity_timeout\":30}"} Saving statistics... Using: 17.68 MB 1: connection id 205785305.210932098 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 205785305.210932098 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 17.81 MB Saving statistics... Using: 17.81 MB Saving statistics... Using: 17.81 MB Saving statistics... Using: 17.81 MB Connection id 205785305.210932098 closed. New connection opened for app key key. Connection id 726284151.278333402 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"726284151.278333402\",\"activity_timeout\":30}"} 1: connection id 726284151.278333402 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 726284151.278333402 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 17.91 MB Saving statistics... Using: 17.91 MB Saving statistics... Using: 17.91 MB Saving statistics... Using: 17.91 MB Connection id 726284151.278333402 closed. New connection opened for app key key. Connection id 939177083.989187003 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"939177083.989187003\",\"activity_timeout\":30}"} 1: connection id 939177083.989187003 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 939177083.989187003 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18 MB Saving statistics... Using: 18 MB Saving statistics... Using: 18 MB Connection id 939177083.989187003 closed. Saving statistics... Using: 18.06 MB New connection opened for app key key. Connection id 28250624.310040980 sending message {"event":"pusher:connection_established","data":"{\"socket_id\":\"28250624.310040980\",\"activity_timeout\":30}"} 1: connection id 28250624.310040980 received message: {"event":"pusher:subscribe","data":{"auth":"","channel":"public"}}. Connection id 28250624.310040980 sending message {"event":"pusher_internal:subscription_succeeded","channel":"public"} Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... 
Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB Saving statistics... Using: 18.18 MB 1: connection id 28250624.310040980 received message: {"event":"pusher:ping","data":{}}. Connection id 28250624.310040980 sending message {"event":"pusher:pong"} Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.27 MB Saving statistics... Using: 18.24 MB Saving statistics... Using: 18.15 MB Saving statistics... Using: 18.15 MB Saving statistics... Using: 18.15 MB Saving statistics... Using: 18.15 MB Saving statistics... Using: 18.07 MB Saving statistics... Using: 18.07 MB Saving statistics... Using: 18.07 MB Connection id 28250624.310040980 closed. Saving statistics... Using: 18.13 MB Saving statistics... Using: 18.04 MB Saving statistics... Using: 18.04 MB Saving statistics... Using: 18.04 MB Saving statistics... Using: 17.95 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB Saving statistics... Using: 17.86 MB ```

@Tofandel
Contributor

@rennokki Make sure to call gc_collect_cycles() before displaying the memory, or you might see inaccurate results because of circular references; then check whether memory is still increasing.

@rennokki
Collaborator

rennokki commented Aug 19, 2020

@Tofandel ed96e24 seems to fix the auto-incrementing of memory, but even after the gc_collect_cycles(), it seems that the memory still increases incrementally on each re-connection using Echo.

1 sec tick, both statistics and memory count. Refreshed the page on each increase:

16.63 -> 17.73 -> 17.82 -> 17.91 -> 18.00 -> 18.09 -> 18.15 -> (closed the connection; no active connections at this point) -> 18.12 -> 18.04 -> 17.95 -> 17.77 -> 17.68

The base start is 16.63, and 17.68 is the last value; it's been about 3 minutes and it still stays at 17.68. So it seems like you are right: there is some cycle that keeps getting built up every time a new connection comes in. Maybe I can do this at scale and try to initiate multiple connections.

@rennokki
Collaborator

Switching 2.x to \BeyondCode\LaravelWebSockets\Statistics\Logger\NullStatisticsLogger::class stops any leakage, so it's not the singleton itself; it's the code within it that breaks things. 🤔

@rennokki
Collaborator

rennokki commented Aug 19, 2020

This thing here messes up the memory:

$this->browser
    ->post(
        action([WebSocketStatisticsEntriesController::class, 'store']),
        ['Content-Type' => 'application/json'],
        stream_for(json_encode($postData))
    );

@rennokki
Collaborator

@Tofandel Seems like the controller call caused the issue. I removed it in 99b5541 and the issue seems gone, although memory still creeps up by 0.01 MB after some time: roughly every 5th or 6th connection adds 0.01 MB out of nowhere. 🤔

@Tofandel
Contributor

Tofandel commented Aug 19, 2020

@rennokki That seems like a normal increase. Some Laravel helpers keep a local cache to look up singletons/controllers, convert strings, etc., which is likely what you are seeing.

@rennokki
Collaborator

rennokki commented Aug 20, 2020

Won't you believe it: broadcast(new StatisticsUpdated(...)) is causing the issue. :/ Now I'm really confused. Disabling it brings memory consumption back to normal. More specifically, this: https://github.com/beyondcode/laravel-websockets/pull/475/files#diff-ed8111d433470eea09c12aede397e6e2R108-R110

Disabling the decorators has nothing to do with the memory leak, I can confirm that.

@rennokki
Collaborator

I replaced the event broadcasting that caused the issue with an automatic polling function in the dashboard. It keeps polling the /statistics endpoint for refreshes, and it can be disabled or refreshed on demand.


@Tofandel
Contributor

Isn't the memory leak caused by the broadcast function? If so, the memory leak is within Laravel and the issue is more serious.

@rennokki
Collaborator

👀

@rennokki
Collaborator

Just posted laravel/framework#33952

@Tofandel
Contributor

Tofandel commented Aug 20, 2020

That said, this package hooks into the broadcast function to send websocket messages, correct? In that case it might be wise to do some Xdebug profiling to see where that function goes and what data it attaches, as the leak could be either in Laravel or just in the broadcasting part of this lib.

Edit: I'm looking into it

@Tofandel
Contributor

Tofandel commented Aug 20, 2020

@rennokki I figured it out: the event you removed doesn't implement the ShouldBroadcastNow interface, so it is added to a null queue and somehow kept in memory, yet still broadcast right away for some reason. Something weird is going on; in any case, changing ShouldBroadcast to ShouldBroadcastNow should fix it.
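
A sketch of that change on the event class (only the implemented interface changes; the rest of the event stays as it is):

use Illuminate\Contracts\Broadcasting\ShouldBroadcastNow;

// Previously: implements ShouldBroadcast
class StatisticsUpdated implements ShouldBroadcastNow
{
    // ... unchanged: the event is now broadcast immediately instead of being
    // pushed onto the (null) queue and retained in memory.
}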

@rennokki
Collaborator

@Tofandel That means that although the driver is sync by default and the event does get triggered, the method still causes a memory leak.

@rennokki
Collaborator

rennokki commented Aug 20, 2020

@Tofandel The memory leak still persists, even with ShouldBroadcastNow. Am I missing something?

@Tofandel
Contributor

@rennokki Debugging a little more. By the way, there's an unused variable here, not that it's related:

    public function broadcastOn()
    {
        $channelName = Str::after(DashboardLogger::LOG_CHANNEL_PREFIX.'statistics', 'private-');
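        // note: $channelName is assigned above but never used below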

        return new PrivateChannel(
            Str::after(DashboardLogger::LOG_CHANNEL_PREFIX.'statistics', 'private-')
        );
    }

@rennokki
Collaborator

@Tofandel The problem is a bit tricky. Until we get confirmation that laravel/framework#33952 is caused by Laravel via the broadcast() method, I consider the changes from #475 that remove the broadcaster and add the auto-polling to be completely valid.

It is a bit odd to say the leak is from this package when the problems start with calling the broadcast() method. 🤔

@rennokki
Collaborator

Gonna merge it for 2.x and see what happens afterward.

@rennokki
Collaborator

@Tofandel So it seems like Laravel classified the issue as a bug 🤔 laravel/framework#33952

@francislavoie
Contributor

@rennokki FYI, you should not be broadcasting inside of ReactPHP. That ends up using the synchronous Redis client (Predis/phpredis) instead of the ReactPHP Redis client, which is async, non-blocking, and safe to use in the ReactPHP server loop.

@rennokki
Collaborator

rennokki commented Sep 4, 2020

@mpociot pointed this out as well; it is going to be removed before 2.x gets released.

@rennokki
Collaborator

rennokki commented Dec 1, 2020

The issue was caused by the Ignition package; see the quick fix and follow-up coming from the Ignition team: #379 (comment)

@rennokki rennokki closed this as completed Dec 1, 2020
@francislavoie
Contributor

@rennokki I don't actually see a fix anywhere? Following all the links, all I see are MRs that were closed/rejected for various reasons, none merged.

@rennokki
Collaborator

rennokki commented Dec 1, 2020

@francislavoie This specific comment: laravel/framework#33952 (comment)

@ven0ms99
Author

Hey there, for me the problem unfortunately still persists. Laravel WebSockets randomly stops working, and supervisor does not restart the process since it does not actually crash. I've looked through the supervisor logs, Nginx logs, and Laravel logs, but there is no sign of any failure.
