New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.0 : TIC Read Status 1:57 error spam #2140

Closed
BrandonNott opened this Issue Apr 12, 2018 · 15 comments

Comments

Projects
None yet
5 participants
@BrandonNott

BrandonNott commented Apr 12, 2018

When CouchbaseLite 2.0 cannot connect to sync gateway -
(i.e. incorrect network routing, sluggish VPN, sync gateway instance is down)

The console is spammed with this error which repeats indefinitely and never recovers.
The memory usage of the app continues to rise until the app crashes.

2018-04-12 09:16:24.374532-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.374706-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.374902-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.375077-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.490190-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.490464-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.490745-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.490953-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.491161-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.491366-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.491560-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.491789-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.492087-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.492318-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.492532-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.492775-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.493000-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.493236-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.493433-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.493648-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.493953-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57
2018-04-12 09:16:24.494686-0400 MyApp[4179:55526] TIC Read Status [2:0x6080001660c0]: 1:57

  • Version: CouchbaseLiteSwift 2.0 DB 23
  • Client OS: iOS 11.3
  • Server: Sync Gateway 2.0 beta 2
@pasin

This comment has been minimized.

Contributor

pasin commented Apr 12, 2018

TIC Read Status came from CFNetwork, and [1:57] means ENOTCONN. I will try to reproduce this today. I would be great if you can share logs with Replicator and Network Verbose enabled as well.

@snej

This comment has been minimized.

Member

snej commented Apr 12, 2018

In other words, the logging is from the OS, not from our code, so we can't make it go away. It's weird that it's happening so often though — we may be trying to retry connections too rapidly.

@BrandonNott

This comment has been minimized.

BrandonNott commented Apr 12, 2018

I am able to reproduce this issue in a local environment using docker-compose with the right conditions. I think what needs to happen is docker is listening on port 4984, but the sync gateway service has stopped.

@pasin

This comment has been minimized.

Contributor

pasin commented Apr 12, 2018

Can you add Database.setLogLevel(.debug, domain: .network)?

@BrandonNott

This comment has been minimized.

BrandonNott commented Apr 12, 2018

Here's the same behavior, different app, with verbose logging enabled

Database.setLogLevel(.verbose, domain: .all)
2018-04-12 14:23:40.056002-0400 MyApp[17078:249756] TIC Read Status [2:0x6000001645c0]: 1:57
2018-04-12 14:23:40.056113-0400 MyApp[17078:249865] CouchbaseLite WS Verbose: Received 0 bytes of HTTP response
2018-04-12 14:23:40.056156-0400 MyApp[17078:249754] TIC Read Status [1:0x604000166000]: 1:57
2018-04-12 14:23:40.056226-0400 MyApp[17078:249757] CouchbaseLite WS Verbose: Received 0 bytes of HTTP response
2018-04-12 14:23:40.056277-0400 MyApp[17078:249756] TIC Read Status [3:0x600000164740]: 1:57
2018-04-12 14:23:40.056355-0400 MyApp[17078:249865] CouchbaseLite WS Verbose: Received 0 bytes of HTTP response
2018-04-12 14:23:40.056394-0400 MyApp[17078:249756] TIC Read Status [2:0x6000001645c0]: 1:57
2018-04-12 14:23:40.056457-0400 MyApp[17078:249757] CouchbaseLite WS Verbose: Received 0 bytes of HTTP response
2018-04-12 14:23:40.056513-0400 MyApp[17078:249754] TIC Read Status [1:0x604000166000]: 1:57
2018-04-12 14:23:40.056662-0400 MyApp[17078:249865] CouchbaseLite WS Verbose: Received 0 bytes of HTTP response
@pasin

This comment has been minimized.

Contributor

pasin commented Apr 12, 2018

This is weird behavior. The CouchbaseLite WS Verbose: Received 0 bytes of HTTP response is logged from the completion handler when reading the HTTP response during establishing WebSocket connection time. I would expect more logging messages such as "CBLWebSocket connecting to..." and "CBLWebSocket Sent HTTP request..." in the same loop.

@snej

This comment has been minimized.

Member

snej commented Apr 12, 2018

It looks like the read completion handler (in -readHTTPResponse) is being called with an ENOTCONN error. But it calls -checkError:, which calls [self ignoreError: error], which returns true for ENOTCONN ... so the completion handler just appends zero bytes to the response and continues.

Apparently CFNetwork is confused by the task not getting closed, so it just loops and calls the completion handler again, which forms an infinite loop.

The comment in -ignoreError: says "We sometimes get bogus(?) ENOTCONN errors after closing the socket". But they shouldn't be ignored in this context because the socket is open. So something needs to be refactored so -ignoreError: only gets called in situations where the socket has been closed. Maybe take the call out of -checkError: and refactor it into the call sites where it makes sense?

@pasin

This comment has been minimized.

Contributor

pasin commented Apr 16, 2018

After reviewing the code, I believe we could just remove the ignoreError: check, then it will work. The -didCloseWithError: has already called -ignoreError: to check the error.

Note that I still couldn't reproduce this issue myself. Basically I couldn't manage to get ENOTCONN error. I have also tried to use docker but all I got is ECONNREFUSED.

@BrandonNott by any chance, would it be possible to reproduce the issue with .network log enabled? I would like to make sure that I don't miss anything as it's a bit weird to me that _requestedClose flag was set to true.

@pasin pasin closed this in fa0f920 Apr 16, 2018

@BrandonNott

This comment has been minimized.

BrandonNott commented Apr 16, 2018

This commit did not seem to help: fa0f920

@pasin I have emailed you additional logs as well as a docker-compose setup for recreating the issue locally.

@pasin

This comment has been minimized.

Contributor

pasin commented Apr 16, 2018

@BrandonNott I received some additional log but not the docker-compose setup. Can you send me the docker-compose setup to reproduce the issue?

@pasin pasin reopened this Apr 16, 2018

@pasin

This comment has been minimized.

Contributor

pasin commented Apr 16, 2018

Again thanks @BrandonNott for the docker-compose and the steps to reproduce the issue. The assumption that CFNetwork returns ENOTCONN as indicated by the TIC log messages is wrong. It turns out the completionHandler of the readDataOfMinLength:maxLength:timeout:completionHandler: ^(NSData* data, BOOL atEOF, NSError* error) is called in loop without error.

@pasin pasin closed this in d1230c6 Apr 16, 2018

@pasin

This comment has been minimized.

Contributor

pasin commented Apr 16, 2018

@BrandonNott I have re-fixed the issue. Could you please verify the fix from the master branch?

@BrandonNott

This comment has been minimized.

BrandonNott commented Apr 16, 2018

I can confirm that this fixed the issue! Thank you @pasin 💯

@pasin

This comment has been minimized.

Contributor

pasin commented Apr 17, 2018

Thanks for the confirmation!

pasin added a commit that referenced this issue Apr 17, 2018

Remove ignore error check from CBLWebSocket when checking an error
* The ignore error check has already been done in -didCloseWithError: which is the place where ignoring error is matter. We could remove the ignore error check from -checkError: method.
* This will fix #2140  that readHTTPResponse is called in loop when getting ENOTCONN error in some circumstances.

pasin added a commit that referenced this issue Apr 17, 2018

Re-fix #2140 that the completionHanlder when reading data from stream…
… was called in loop

Detect if the data has zero length and atEOF is true, then close the socket with kWebSocketCloseProtocolError error.
@snej

This comment has been minimized.

Member

snej commented Apr 17, 2018

@pasin That sounds like a bug in CFNetwork. If you give some more details I can file a bug report with Apple.

@djpongh djpongh added this to the 2.1.0 milestone Apr 23, 2018

pasin added a commit that referenced this issue May 4, 2018

Fixed #2148 by converting zero data w/ EOF into ENOTCONN error
* Refixed #2140 by converting zero data with EOF stream response into ENOTCONN error so that the retry logic could be kicked in.
* Made a speculative fix for #2150 by reverting fa0f920 commit.

pasin added a commit that referenced this issue May 19, 2018

Fixed #2148 by converting zero data w/ EOF into ENOTCONN error
* Refixed #2140 by converting zero data with EOF stream response into ENOTCONN error so that the retry logic could be kicked in.
* Made a speculative fix for #2150 by reverting fa0f920 commit.

pasin added a commit that referenced this issue May 22, 2018

Fix #2140 #2148 by converting zero data w/ EOF into ENOTCONN error
Fixed the issue by converting zero data with EOF stream response into ENOTCONN error so that the retry logic could be kicked in.

@pasin pasin added the bug label Jul 31, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment