Closed connections and timeouts on TcpTransport (was: Incomplete write: Only 0 of 358 written) #78
Comments
One more comment. Log from GrayLog about this error: 2017-01-12T11:45:15.267+03:00 ERROR [NettyTransport] Error in Input |
I retried fwrite 3 times and it didn't help. Do you have any other suggestions about this bug? Everything works if I write this message manually (using gelf-php, but sending the message a single time). It also works against Graylog on the reserve server (a copy of the main server). |
Some questions:
|
|
I also get this message on my development computer, so in case it helps: I’m playing with Graylog before (perhaps) putting it into production, and I get this message when the Graylog server is not running and the UDP port is closed. To be sure, I monitored the traffic with Wireshark: I see the UDP packet sent, then an ICMP response "Destination unreachable (Port unreachable)"; there are 4 (unsuccessful) tries. @lashnag perhaps you could monitor the traffic to obtain some useful information such as ICMP responses; perhaps there are intermittent network errors. |
@Seb35 I set up tcpdump, and there are no errors in the log file. |
What do you get with the improved method? |
@bzikarsky sorry, what do you mean? |
I pushed a change to master in d98632d -- Can you run your tests again and see what happens? |
@bzikarsky I get an error from this:
RuntimeException in /data/home/projects/payprocessing/classes/vendor/graylog2/gelf-php/src/Gelf/Transport/StreamSocketClient.php:218 |
And what does the error say? I assume you get an exception-message |
@bzikarsky I can send you the stack trace: 2017-01-19 17:44:11 [176.65.120.6][-][-][error][RuntimeException] |
The exception does not have a message at all? Or don't you log the message? Maybe dump it explicitly? |
We have the same issue here... Is it related to the config of graylog you think? |
@bzikarsky in commit https://github.com/bzikarsky/gelf-php/commit/d98632dfbf47a0084ae935653f34bf5d40bfa04f you wrote this code:
So the exception is created here, and you don't pass any message when you throw it
So, what message do you want to get? From the log on Graylog? |
@Nikoms I don't know. But the exception doesn't say whether there is a problem with Graylog or whether the socket was closed when I tried to write. I also asked about this problem in the Graylog Google group. Here is the link https://groups.google.com/forum/#!topic/graylog2/QPaMCi3fQHA and I got this answer: |
@lashnag It's strange that it's only on one server (reserve vs main) |
@Nikoms I think it could depend on whether the load is high or not. Both servers have the same architecture, software and hardware. |
We also have a "random" issue during our "long cron jobs". Is it possible that the connection is closed like mysql for example? |
@lashnag: I'm sorry, that one just sneaked by. I re-added the error message to the exception. @Nikoms: I think it's possible the TCP connection will drop if you don't log anything over the connection for a certain amount of time. I consider adding |
@bzikarsky I checked our config, and our default socket timeout is set to 60s. On Monday, I'll check whether I still have this problem after changing it with ini_set. I will also test with a timeout of 1 second and an added sleep... I'll keep you posted |
Mhh, changing "default socket timeout" to 1 sec does not change anything. Even with a sleep of 5, the connection is not closed. So that may not be it |
@Nikoms I can say more! I ran a big script on the reserve server which writes to the DB. There were 100 000 rows, and every write to the DB = a write to Graylog. It worked hard for 15 minutes, and there were no errors... |
@Nikoms default-socket-timeout is only slightly related to SO_KEEPALIVE. SO_KEEPALIVE enables keepalive packets on the TCP connection, so it does not get shut down (by either host or client). The timeout only relates to PHP. |
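For anyone who wants to experiment with SO_KEEPALIVE from PHP, here is a minimal sketch. It assumes the `sockets` extension is loaded (it provides `socket_import_stream` and `socket_set_option`). The helper name `enableKeepAlive` is hypothetical and not part of gelf-php, and the local listener exists only to make the example self-contained.

```php
<?php
/**
 * Try to enable SO_KEEPALIVE on an already connected TCP stream.
 * Hypothetical helper, not part of gelf-php.
 * Returns false if ext-sockets is missing or the option cannot be set.
 */
function enableKeepAlive($stream): bool
{
    if (!function_exists('socket_import_stream')) {
        return false; // ext-sockets is not available
    }
    $socket = socket_import_stream($stream);
    if (!$socket) {
        return false; // the stream is not backed by a socket
    }
    return socket_set_option($socket, SOL_SOCKET, SO_KEEPALIVE, 1);
}

// Demo against a throwaway local listener (port 0 = let the OS pick a free port).
$keepAliveEnabled = false;
$server = @stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
if ($server !== false) {
    $address = stream_socket_get_name($server, false); // e.g. "127.0.0.1:54321"
    $client = @stream_socket_client('tcp://' . $address, $errno, $errstr, 5);
    if ($client !== false) {
        $keepAliveEnabled = enableKeepAlive($client);
        fclose($client);
    }
    fclose($server);
}
var_dump($keepAliveEnabled);
```

How often keepalive probes are actually sent is decided by the operating system's TCP settings, not by PHP's default_socket_timeout.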
ok thx |
@Nikoms It worked? |
@lashnag no ... I think I will contact our devops to see if they have an idea. Right now, I just made a dirty "try/catch" and I log the record in a json file... |
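A minimal sketch of that kind of "try/catch and log to a JSON file" workaround, for illustration only. The function name `publishWithFallback`, the callable-based interface, and the file layout are assumptions made for this example; they are not how gelf-php or the poster's code is actually structured.

```php
<?php
/**
 * Call $publish with the record; if it throws, append the record as one
 * JSON line to $fallbackFile instead. Hypothetical helper, not gelf-php API.
 * Returns true if the record was published, false if it went to the file.
 */
function publishWithFallback(callable $publish, array $record, string $fallbackFile): bool
{
    try {
        $publish($record);
        return true;
    } catch (RuntimeException $e) {
        $line = json_encode(['error' => $e->getMessage(), 'record' => $record]) . PHP_EOL;
        file_put_contents($fallbackFile, $line, FILE_APPEND | LOCK_EX);
        return false;
    }
}

// Demo: a publisher that always fails, standing in for a broken transport.
$fallbackFile = tempnam(sys_get_temp_dir(), 'gelf');
$failing = function (array $record): void {
    throw new RuntimeException('Incomplete write: Only 0 of 358 written');
};
$published = publishWithFallback($failing, ['short_message' => 'test'], $fallbackFile);
$fallbackContents = file_get_contents($fallbackFile);
```

This keeps the application running, but it only hides the failure; the records in the file still have to be shipped to Graylog some other way.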
Maybe this helps: I also got the error message using tag 1.5.3. It turned out that it happens (reproducibly) when the socket resource is used more than once. Placing Not really sure what it would mean performance-wise to create the socket with every write though. @bzikarsky I tried your modified master as well, and it didn't seem to solve the issue |
@Nikoms I'll be waiting |
I'm getting the same error if graylog host isn't reachable. For me the modified master from @bzikarsky doesn't work. The fix @carstenwindler mentioned works for me. |
@carstenwindler Could you send a pull request, and @bzikarsky could you merge it and push to master? Then I will try it. |
Just so we are on the same page: The 2 commits in master don't try to fix the problem. They should only give a better error message. @carstenwindler Forcing the socket to close down (and therefore reopening it) for every write seems like a bad idea. This will probably work when the connection timed out (or was closed another way), but doesn't address the problem. As mentioned earlier, one thing we can try is adding SO_KEEPALIVE to the TCP connection, to see what happens. This should help in situations with long-running jobs which log very infrequently. If the random disconnects still occur, the affected people should check their network setup. I can't think of a reason a TCP connection should get dropped if both client and server are happily keeping up their end of the connection. Maybe (but I'm not quite sure if I like it) we can handle a failed write with a reconnect-retry. But ... this is messy. Since I'm quite busy currently I'll happily discuss/accept PRs on this matter. Thanks for your patience. |
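To make the reconnect-retry idea concrete, here is a rough sketch under stated assumptions. The class `ReconnectingWriter` and its two callables are hypothetical and not part of gelf-php: `$connect` is assumed to open a connection, and `$write` is assumed to throw a RuntimeException on a failed or incomplete write.

```php
<?php
/**
 * Keeps one connection open and, if a write fails, drops it and retries
 * on a fresh connection. Hypothetical sketch, not gelf-php code.
 */
final class ReconnectingWriter
{
    private $connect;
    private $write;
    private $connection = null;

    public function __construct(callable $connect, callable $write)
    {
        $this->connect = $connect;
        $this->write = $write;
    }

    public function write(string $payload, int $maxAttempts = 2): int
    {
        $lastError = null;
        for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
            try {
                if ($this->connection === null) {
                    $this->connection = ($this->connect)();
                }
                return ($this->write)($this->connection, $payload);
            } catch (RuntimeException $e) {
                $lastError = $e;
                $this->connection = null; // force a reconnect on the next attempt
            }
        }
        throw new RuntimeException("Write failed after $maxAttempts attempts", 0, $lastError);
    }
}

// Demo with fake connections: connection #1 is "stale", connection #2 works.
$connects = 0;
$connect = function () use (&$connects): int {
    return ++$connects;
};
$write = function (int $connection, string $payload): int {
    if ($connection === 1) {
        throw new RuntimeException('Incomplete write: Only 0 of ' . strlen($payload) . ' written');
    }
    return strlen($payload);
};
$writer = new ReconnectingWriter($connect, $write);
$written = $writer->write('hello');
```

Capping the attempts matters: if the server is really down, every attempt fails, and the caller still gets an exception that carries the last underlying error.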
@bzikarsky yes I agree. I just wanted to share what I found out during debugging. |
@carstenwindler I have the problem with the new commit:
So I think the problem is in the network or in Graylog. What do you think? |
@carstenwindler @bzikarsky https://moodle.org/plugins/view.php?id=1756 - here is the same problem. Do you think I need to write to official Graylog support? |
I also see that the UDP transport uses the fwrite function too, but there is no such exception there. I've tried. But I'm not sure that fwrite could return 0 bytes when writing over UDP. |
Regarding timeout: Yes, it looks like a network/configuration problem. Somehow the server with gelf-php can't open a TCP connection to your graylog2-server.
Regarding moodle: The moodle plugin uses this library. If gelf-php has a bug, moodle is also affected.
Regarding UDP: UDP works differently than TCP. There is no connection in UDP, just packets. You don't get notified when those are dropped or never reach their destination. (On the other hand it's faster, and in many use cases it's reliable enough for logging.) |
@bzikarsky Do you know how the fwrite function works at a low level? How is it implemented in C++? I logged the value returned from fwrite when using UDP, and it is neither 0 nor NULL. So what does that mean? Does it mean that fwrite returns some default value to comply with the phpdoc? |
@lashnag Sorry, I can't follow you. fwrite's behaviour is documented at http://php.net/fwrite. Quote: "Return Values: fwrite() returns the number of bytes written, or FALSE on error.". This does not change with the type of stream. Since I also turn off blocking for UDP this should always correspond to the number of bytes in the string passed to fwrite. (And just as a heads up: PHP is written in C not C++) @usmonster Interesting. The fix itself should not trigger the error. It just makes it visible that some log-messages did not get sent/written properly before. Can you confirm this? |
@bzikarsky, you're absolutely right. I just had some time to properly debug, and I see that the source of the error is not gelf--sorry for the noise! 😶 |
@bzikarsky So, do you think that if I have the same error on UDP I should get exception messages? I mean, will there be the same problem of fwrite returning 0 as when I use a TCP connection? |
@bzikarsky I tested another workaround, this time by closing and re-building the socket only if write is incomplete (just a quick hack, no final solution) works fine for our project (we use UDP), probably it's worth considering? |
@lashnag There cannot be the same error on UDP. UDP is stateless, there is no connection. Therefore you will not get any feedback if your messages cannot reach their destination. @carstenwindler I'll look into this. |
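A small experiment that may help show what fwrite's return value means on UDP. It assumes nothing is listening on UDP port 9 of 127.0.0.1 (adjust the port if that is not true on your machine). The first write is handed to the OS and reports the full byte count even though nobody receives it. What the second write returns depends on the OS and PHP version, because an ICMP "port unreachable" reply to the first packet can surface as an error on a later write to the same socket.

```php
<?php
// Assumption: no service listens on UDP port 9 of 127.0.0.1.
$payload = '{"version":"1.0","host":"example","short_message":"test"}';
$first = null;
$second = null;

// "Connecting" a UDP socket only records the peer address; no packet is sent yet.
$socket = @stream_socket_client('udp://127.0.0.1:9', $errno, $errstr, 1);
if ($socket !== false) {
    // The packet is accepted by the local network stack, so the byte count comes back.
    $first = fwrite($socket, $payload);

    // Give a possible ICMP "port unreachable" reply time to arrive.
    usleep(100000);

    // May fail or succeed depending on platform; no particular result is promised here.
    $second = @fwrite($socket, $payload);
    fclose($socket);
}

var_dump($first, $second);
```

So a non-zero return value only says the datagram left the application, not that Graylog received it.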
@bzikarsky I understand this, but then I don't understand what the return value of the fwrite function means if the sent packets can't reach the destination |
Hey, I will jump in, as we get this error too since some time. We use UDP transport, release 1.5.3, so definitely happens there too. In our case, it's a long running cron job that at some point can't reach graylog (usually happens for a few minutes when the indices rotate). I think nobody saw this before because it was not throwing an exception before 90ed711, so I think the best way is that everybody catches this exception in their code. LE: given that Monolog & graylog should never throw exceptions in production, and that we already have a file transport that should never fail, we now use this elegant trick:
|
I checked the Graylog GUI for TCP inputs today: It allows setting TCP Keepalive. Is this enabled when you come across those connection timeouts? |
I have the same problem in UDP when logging to graylog2 and the service is down, so easy enough to reproduce I guess. |
Just to help: I updated to 1.5.4 and found a problem probably related to this issue, at lines 221-223:
if ($failed || $byteCount === false) {
    throw new \RuntimeException($errorMessage);
} |
Error 111 (or This is in almost all cases related to network- or serverside problems. |
I have the same issue, so I use the UDP transport just to avoid these errors :/ Sometimes it works, but sometimes I still get this error message. |
I'll recommend https://stackoverflow.com/questions/41231270/econnrefused-errors-on-udp-sendto (especially the answer) for reading why ECONNREFUSED can also occur on UDP. If you want to silence those errors, please use the |
I'll add docs in #90 for this and similar issues and will close this now |
Do you mean that I need to use UDP and mute all problems? I can't be sure that a sent packet will reach the log server? Do you have other solutions? |
If you use TCP, you get a guarantee of delivery. But this requires a working TCP connection. If it does not work, you'll definitely get an exception. If you use UDP, you get no guarantees. You may get an exception if the host cannot be resolved or there is no service listening on the destination port. If you do not want exceptions, wrap the transport in the I really can't do anything else. You can make sure that a) your destination host and port are configured correctly and reachable |
Hello! I have a bug when I write logs to Graylog on a high-load project. Here is the error:
Incomplete write: Only 0 of 358 written in /data/home/projects/payprocessing/classes/vendor/graylog2/gelf-php/src/Gelf/Transport/StreamSocketClient.php:212
Here is the message sent to the socket:
{
"version": "1.0",
"host": "pay-1.reserve.lan",
"short_message": "Redirect to https://www.platron.ru/payment_params.",
"full_message": "Redirect to https://www.platron.ru/payment_params.php?customer=5d44643437990b1774efb742ed1fb9a031005685\r\n(Process number: 84073)",
"level": 6,
"timestamp": 1484144247.0146,
"facility": "paypocessing",
"file": "Platron::payment"
}
I tried to send it manually, and it worked! So the problem is in the fwrite function. There is a note in the PHP documentation about fwrite:
http://php.net/manual/en/function.fwrite.php
Some people say that when writing to a socket not all of the bytes requested to be written may be written. You may have to call fwrite again to write bytes that were not written the first time. (At least this is how the write() system call in UNIX works.)
I think you need to retry fwrite several times before throwing an error, or maybe use some other function.
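A minimal sketch of what such a retry loop could look like, for discussion only. The helper `writeAll` and its `$maxStalls` parameter are hypothetical and not the library's implementation. It keeps calling fwrite with the unwritten remainder and gives up after several consecutive zero-byte writes.

```php
<?php
/**
 * Write all of $data to $stream, retrying after partial or zero-byte writes.
 * Hypothetical helper, not part of gelf-php.
 */
function writeAll($stream, string $data, int $maxStalls = 3): int
{
    $total = strlen($data);
    $written = 0;
    $stalls = 0; // consecutive writes that made no progress

    while ($written < $total) {
        $n = @fwrite($stream, substr($data, $written));
        if ($n === false) {
            throw new RuntimeException("Write failed after $written of $total bytes");
        }
        if ($n === 0) {
            if (++$stalls > $maxStalls) {
                throw new RuntimeException("Incomplete write: Only $written of $total written");
            }
            usleep(10000); // brief pause before trying again
            continue;
        }
        $stalls = 0;
        $written += $n;
    }

    return $written;
}

// Demo on an in-memory stream, using the message size from the error above.
$stream = fopen('php://memory', 'w+');
$bytes = writeAll($stream, str_repeat('x', 358));
```

A loop like this only helps with partial writes on a healthy connection; if the peer has closed the connection, every retry will fail the same way.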