Stack overflow (EXC_BAD_ACCESS) in cURL.swift #14

Open
mfreed7 opened this issue Nov 12, 2019 · 11 comments · May be fixed by #15

Comments

@mfreed7

mfreed7 commented Nov 12, 2019

Hello, I'm getting a stack overflow that causes an EXC_BAD_ACCESS crash in cURL.swift. See the stack trace below. It seems like the recursive call back to innerComplete at this line should be added to an async queue or something, rather than being directly recursive?

This is on a Mac running Catalina 10.15.1 (19B88). The stack trace below comes from Perfect-CURL version 3.1.0, but I get this on any 3.x or 4.x version that I've tried. This happens typically when waiting for a CURL response from a slow server that takes a little while (~5-10 seconds) to respond.

Any help you can give me would be really appreciated.

...
#10 0x0000000100f945d8 in closure #2 in CURLResponse.innerComplete(_:) at Perfect-CURL/Sources/PerfectCURL/CURLResponse.swift:206
#11 0x0000000100f9b3a5 in CURL.ioWait(_:) at Perfect-CURL/Sources/PerfectCURL/cURL.swift:222
#12 0x0000000100f942d7 in CURLResponse.innerComplete(_:) at Perfect-CURL/Sources/PerfectCURL/CURLResponse.swift:205
#13 0x0000000100f945d8 in closure #2 in CURLResponse.innerComplete(_:) at Perfect-CURL/Sources/PerfectCURL/CURLResponse.swift:206
#14 0x0000000100f9b3a5 in CURL.ioWait(_:) at Perfect-CURL/Sources/PerfectCURL/cURL.swift:222
...
#1298 0x0000000100f9b3a5 in CURL.ioWait(_:) at Perfect-CURL/Sources/PerfectCURL/cURL.swift:222
#1299 0x0000000100f942d7 in CURLResponse.innerComplete(_:) at Perfect-CURL/Sources/PerfectCURL/CURLResponse.swift:205
#1300 0x0000000100f945d8 in closure #2 in CURLResponse.innerComplete(_:) at Perfect-CURL/Sources/PerfectCURL/CURLResponse.swift:206
#1301 0x0000000100f9bde7 in closure #1 in CURL.ioWait(_:) at Perfect-CURL/Sources/PerfectCURL/cURL.swift:241

mfreed7 added a commit to mfreed7/Perfect-CURL that referenced this issue Nov 19, 2019
I believe this fixes Issue PerfectlySoft#14, but please take a look.
@mfreed7 mfreed7 linked a pull request Nov 19, 2019 that will close this issue
@RockfordWei
Contributor

Can you please put together a script that reproduces this issue?

@mfreed7
Author

mfreed7 commented Dec 2, 2019

It is intermittent, and (based on the code and my experience with it) depends on the length of time the response takes to arrive. I typically see this condition when using a slow/flaky network connection. The code below is roughly where I see it - nothing too special.

You can see the recursive code path, in the stack trace, right? That would seem to depend on new bytes coming in fairly rapidly, to avoid exhausting the stack while waiting for new data. If you see the pull request I made, I just queue the recursive call to avoid exhausting the stack. Seems to work on my end.

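import PerfectCURL

private func broken() {
    while true {
        // Issue the same request repeatedly; the crash appears intermittently on a slow connection.
        let obj = CURLRequest("http://api.bart.gov/api/etd.aspx?cmd=etd&orig=ALL&key=MW9S-E7SL-26DU-VV8V&json=y")
        obj.options.append(.connectTimeout(10))
        obj.options.append(.timeout(10))
        // The completion handler receives the response confirmation, which is ignored here.
        obj.perform { _ in }
    }
}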
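
The queuing idea described above can be sketched in isolation. This is a toy model, not Perfect-CURL's code: `Poller`, `step`, and `polls` are hypothetical names standing in for the innerComplete/ioWait cycle seen in the stack trace. With no queue, every poll adds a stack frame. With a serial queue, each poll returns before the next one starts, so the nesting depth stays constant at the cost of one queue hop per poll.

```swift
import Dispatch

// Hypothetical stand-in for a "poll until the response is complete" loop.
final class Poller {
    private(set) var depth = 0          // current nesting depth of step()
    private(set) var maxDepth = 0       // deepest nesting observed so far
    private var remaining: Int          // polls left before the "response" completes
    private let queue: DispatchQueue?   // nil: recurse directly; non-nil: re-queue each poll

    init(polls: Int, queue: DispatchQueue?) {
        self.remaining = polls
        self.queue = queue
    }

    func run(done: @escaping () -> Void) {
        step(done)
    }

    private func step(_ done: @escaping () -> Void) {
        depth += 1
        maxDepth = max(maxDepth, depth)
        defer { depth -= 1 }

        guard remaining > 0 else {
            done()
            return
        }
        remaining -= 1

        if let queue = queue {
            // Queued re-entry: this frame unwinds before the next poll runs.
            queue.async { self.step(done) }
        } else {
            // Direct recursion: one more stack frame per poll.
            step(done)
        }
    }
}

// Direct recursion: the depth grows with the number of polls.
let direct = Poller(polls: 500, queue: nil)
direct.run { }
print("direct recursion, max depth:", direct.maxDepth)

// Queued re-entry on a serial queue: the depth does not grow.
let serial = DispatchQueue(label: "poller.example")
let queued = Poller(polls: 500, queue: serial)
let finished = DispatchSemaphore(value: 0)
serial.async { queued.run { finished.signal() } }
finished.wait()
print("queued re-entry, max depth:", queued.maxDepth)
```

In the direct case the depth is bounded only by how many polls happen before the transfer finishes, which fits the observation that the crash depends on how long the response takes to arrive.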

@RockfordWei
Contributor

I didn't find anything wrong with the following snippet:

import PerfectCURL
import Foundation

func runAsyncAndJoin(completion: @escaping (Bool) -> ()) {
	let obj = CURLRequest("http://api.bart.gov/api/etd.aspx?cmd=etd&orig=ALL&key=MW9S-E7SL-26DU-VV8V&json=y")
	obj.options.append(.connectTimeout(10))
	obj.options.append(.timeout(10))
	obj.perform { resp in
		do {
			let result = try resp()
			print(result.bodyString)
			completion(true)
		} catch (let err) {
			print ("error", err)
			completion(false)
		}
	}
}

let start = time(nil)
var total = 0
var ok = 0
var fault = 0
let lock = DispatchSemaphore(value: 1)

for _ in 0..<100 {
	runAsyncAndJoin { success in
		lock.wait()
		if success {
			ok += 1
		} else {
			fault += 1
		}
		lock.signal()
		total += 1
	}
}
while(total < 99) {
	sleep(1)
}
let end = time(nil)
print("all done. total: ", end - start, "seconds")
print("ok:", ok, "fault:", fault)

Package.swift:

// swift-tools-version:5.1
// The swift-tools-version declares the minimum version of Swift required to build this package.

import PackageDescription

let package = Package(
    name: "curltest",
    dependencies: [
        // Dependencies declare other packages that this package depends on.
        .package(url: "https://github.com/PerfectlySoft/Perfect-CURL.git", from: "4.0.1"),
    ],
    targets: [
        // Targets are the basic building blocks of a package. A target can define a module or a test suite.
        // Targets can depend on other targets in this package, and on products in packages which this package depends on.
        .target(
            name: "curltest",
            dependencies: ["PerfectCURL"]),
    ]
)

Can you confirm it? It is actually pretty fast: 4 seconds for 100 requests, and nothing went wrong.

all done. total:  4 seconds
ok: 99 fault: 0

@mfreed7
Author

mfreed7 commented Dec 2, 2019

Your snippet looks fine, but from your description, your testing conditions are not correct. As I mentioned, this happens on a slow/flaky network. In that condition, one of your 100 requests will take 10+ seconds to complete. Your 100 requests should take several minutes.

@RockfordWei
Contributor

No, still negative. This time I set up a local server like this:

import PerfectHTTP
import PerfectHTTPServer
import Foundation
func handler(request: HTTPRequest, response: HTTPResponse) {
  // Respond with a simple message.
  response.setHeader(.contentType, value: "text/json")
  response.appendBody(string: "{\"error\":0}")
  sleep(8) // sleep for 8 seconds to simulate the delay
  response.completed()
}

var routes = Routes()
routes.add(method: .get, uri: "/", handler: handler)
try HTTPServer.launch(name: "localhost",
            port: 8888,
            routes: routes)

Every line of the client snippet above stays the same, except that it now uses let obj = CURLRequest("http://localhost:8888")

It produced this output:

error Error(code: 28, description: "Timeout was reached", response: PerfectCURL.CURLResponse)
error Error(code: 28, description: "Timeout was reached", response: PerfectCURL.CURLResponse)
all done. total:  13 seconds
ok: 64 fault: 36
Program ended with exit code: 0

That's the expected outcome, and it still did not crash, so I cannot reproduce your issue. Sorry about that.

@mfreed7
Author

mfreed7 commented Dec 2, 2019

So, a couple of comments:

  • Why did your example time out? You added an 8-second delay, but the request timeout is 10 seconds.
  • Your example simply waits 8 seconds before responding at all. Perhaps with a truly slow/flaky network, where the initial response arrives sooner and additional bytes trickle in later, you would hit the stack trace from the comment above.
  • In general, how do you account for the stack trace shown above? It clearly shows a fast recursive path, no?

@mfreed7
Author

mfreed7 commented Dec 2, 2019

(Perhaps also try turning off the 10 second timeout from my example. That should make it easier to recreate the stack overflow. I only added it to my example because that’s how I’m using it.)

@RockfordWei
Contributor

No, still negative. This time I used a TCP server that waits before accepting each connection:

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
    struct sockaddr_in host;
    memset(&host, 0, sizeof(host));
    host.sin_family = AF_INET;
    host.sin_port = htons(8888);
    bind(listener, (struct sockaddr *)&host, sizeof(host));
    listen(listener, 2);
    do {
        sleep(8);
        int client = accept(listener, 0, 0);
        if (client > 0) {
            shutdown(client, SHUT_RDWR);
            close(client);
        }
    } while(1);
    return 0;
}

Then the client snippet reported:

error Error(code: 56, description: "Failure when receiving data from the peer", response: PerfectCURL.CURLResponse)
error Error(code: 56, description: "Failure when receiving data from the peer", response: PerfectCURL.CURLResponse)
error Error(code: 56, description: "Failure when receiving data from the peer", response: PerfectCURL.CURLResponse)

and then

error Error(code: 28, description: "Timeout was reached", response: PerfectCURL.CURLResponse)
all done. total:  10 seconds
ok: 0 fault: 99

This is still the expected outcome, with no crash. I can't reproduce your issue.

@mfreed7
Author

mfreed7 commented Dec 9, 2019

Ok. Well I can confirm that, as I mentioned, on a slow network (tethered to a cellphone on 3G) the stack trace in my initial comment will occur, intermittently. And the code in my pull request reliably solves the stack overflow. Is there any issue with that code? Can you see, from code inspection, the possibility for the stack overflow I mentioned in my initial comment? I can. I suppose I can just use my pull request as the source for my project, if you're not interested in accepting the pull request. Let me know.

@RockfordWei
Contributor

Hi, we are definitely interested in your PR. However, many people use this library, so we need to understand what happened and what the consequences of an extra thread dispatch might be (CPU usage, memory violations, etc.) in order to maintain the quality of the whole product. Please be patient and allow us to run more tests on it; we are taking this seriously. Thank you.

@mfreed7
Author

mfreed7 commented Dec 17, 2019

Sorry, perhaps I misunderstood the tone of your messages - it sounded like you were eager to give up on reproducing the problem. I just hit this issue again, same stack trace, same URLs, same slow 3G mobile network. This happens when the latency is about 1000ms - each packet takes ~1s to traverse the network. Is there any way you can simulate that on your end? Simply adding a long single delay before responding will not reproduce the problem. I'm guessing it is after completing the initial handshake and the data is being transferred. Your code above doesn't look like it actually sends any HTTP responses, it just accepts the initial connection.

Perhaps you could use something like Charles or the Network Link Conditioner for macOS to simulate 1s latency?
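
Another way to approximate the "bytes trickling in" condition locally is a server that does send a real HTTP response, but only a few bytes at a time. The following is a minimal sketch under stated assumptions, not a confirmed reproduction of the crash: `serveTrickle`, `chunks`, the port, and the canned JSON body are all hypothetical choices for illustration, and the server handles one connection at a time.

```swift
import Foundation

#if os(Linux)
let streamType = Int32(SOCK_STREAM.rawValue)  // Glibc imports SOCK_STREAM as an enum
#else
let streamType = SOCK_STREAM
#endif

/// Splits a payload into fixed-size pieces so each can be sent on its own.
func chunks(of bytes: [UInt8], size: Int) -> [[UInt8]] {
    return stride(from: 0, to: bytes.count, by: size).map {
        Array(bytes[$0..<min($0 + size, bytes.count)])
    }
}

/// Hypothetical test server: answers every connection with a small canned HTTP
/// response, sent `chunkSize` bytes at a time with `delay` seconds between sends.
func serveTrickle(port: UInt16, chunkSize: Int, delay: TimeInterval) {
    signal(SIGPIPE, SIG_IGN)  // a client that gives up early must not kill the server

    let body = "{\"error\":0}"
    let head = "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n"
        + "Content-Length: \(body.utf8.count)\r\nConnection: close\r\n\r\n"
    let response = Array((head + body).utf8)

    let listener = socket(AF_INET, streamType, 0)
    var reuse: Int32 = 1
    setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &reuse, socklen_t(MemoryLayout<Int32>.size))

    var addr = sockaddr_in()  // zero-initialised, so the address is 0.0.0.0 (any interface)
    addr.sin_family = sa_family_t(AF_INET)
    addr.sin_port = port.bigEndian
    let bound = withUnsafePointer(to: &addr) {
        $0.withMemoryRebound(to: sockaddr.self, capacity: 1) {
            bind(listener, $0, socklen_t(MemoryLayout<sockaddr_in>.size))
        }
    }
    guard listener >= 0, bound == 0, listen(listener, 16) == 0 else {
        perror("socket/bind/listen")
        return
    }

    while true {
        let client = accept(listener, nil, nil)
        guard client >= 0 else { continue }
        var request = [UInt8](repeating: 0, count: 4096)
        _ = recv(client, &request, request.count, 0)  // read (and ignore) the request
        for piece in chunks(of: response, size: chunkSize) {
            if send(client, piece, piece.count, 0) < 0 { break }
            Thread.sleep(forTimeInterval: delay)  // the artificial per-chunk latency
        }
        close(client)
    }
}
```

Calling `serveTrickle(port: 8888, chunkSize: 1, delay: 1.0)` and pointing the client at http://localhost:8888 would deliver roughly one byte per second. The client timeout would need to be raised or removed for the transfer to finish.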
