
too big CPU load & event loop stall on massive socket close #894

Open · allright opened this issue Mar 10, 2019 · 63 comments

allright commented Mar 10, 2019

Expected behavior

No stalls on socket close.

Actual behavior

Stalls for 10-30 seconds, sometimes up to a disconnect by timeout.

Steps to reproduce

video: https://yadi.sk/i/ZmAu8La5zLWfSg (download the file to view it in the best quality)
sources: https://github.com/allright/swift-nio-load-testing/tree/master/swift-nio-echo-server
commit: allright/swift-nio-load-testing@a461c72

VPS: 1 CPU, 512 MB RAM, Ubuntu 16.04

  1. Download and compile swift-nio-echo-server (based on NIOEchoServer from swift-nio).
  2. Compile in release mode & run.
  3. Connect to the server manually with a telnet client.
  4. Run
    tcpkali -c 20000 --connect-rate=3000 --duration=10000s --latency-connect -r 1 -m 1 echo-server.url:8888
  5. Wait until the server has > 15000 connections.
  6. While waiting, type in telnet and see the echo response arrive immediately.
  7. Stop tcpkali with Ctrl+C.
  8. Type in telnet & DO NOT RECEIVE ANY RESPONSE!
  9. Wait some 10...30 seconds, until all connections are closed by timeout.
  10. Type in telnet & get the echo response immediately again (sometimes telnet may have been disconnected by the 30-second timeout in the code).
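
For context, the handler under test is essentially NIO's stock echo handler. A minimal sketch is below (NIO 2 spellings; the actual test server is based on the NIO 1.x NIOEchoServer example and may differ in detail):

import NIO

final class EchoHandler: ChannelInboundHandler {
    typealias InboundIn = ByteBuffer
    typealias OutboundOut = ByteBuffer

    func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        // Echo the received bytes straight back; no per-message allocation.
        context.write(data, promise: nil)
    }

    func channelReadComplete(context: ChannelHandlerContext) {
        // Flush once per read burst rather than once per message.
        context.flush()
    }
}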

root@us-san-gate0:~/swift-nio-load-testing/swift-nio-echo-server# cat Package.resolved
{
  "object": {
    "pins": [
      {
        "package": "swift-nio",
        "repositoryURL": "https://github.com/apple/swift-nio.git",
        "state": {
          "branch": "nio-1.13",
          "revision": "29a9f2aca71c8afb07e291336f1789337ce235dd",
          "version": null
        }
      },
      {
        "package": "swift-nio-zlib-support",
        "repositoryURL": "https://github.com/apple/swift-nio-zlib-support.git",
        "state": {
          "branch": null,
          "revision": "37760e9a52030bb9011972c5213c3350fa9d41fd",
          "version": "1.0.0"
        }
      }
    ]
  },
  "version": 1
}

Swift version 4.2.3 (swift-4.2.3-RELEASE)
Target: x86_64-unknown-linux-gnu
Linux us-san-gate0 4.14.91.mptcp #12 SMP Wed Jan 2 17:51:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

P.S.
The same echo server implemented with C++ ASIO does not have this problem. I can supply the C++ sources & a video if needed.


allright commented Mar 11, 2019

I have just profiled what's happening at the "stall moment".

You can open perf-kernel.svg in any browser to view the performance graph:
Perf.zip

Too many objects being released at the same moment blocks the event loop.
Can we fix it?
Workarounds:

  1. Is it possible to schedule 50% of event loop time for handling all events except object releases, and 50% for other tasks?
    Maybe we need something like a managed garbage collector (or a "smooth object-release manager", perhaps something like DisposeBag in RxSwift?).

  2. Schedule the channel release at a random time after the client closes the socket?

  3. Closing 25000 connections in one thread causes a 30-second hang, but if I create 4 EventLoops, telnet hangs only for 7.5 seconds.
    So no more than ~6000 connections per event loop is practical.

Tools used for perf monitoring:
http://www.brendangregg.com/perf.html
http://www.brendangregg.com/perf.html#TimedProfiling

[screenshot: 2019-03-11_10-12-10]


weissi commented Mar 11, 2019

ouch, thanks @allright, we'll look into that


allright commented Mar 12, 2019

One more possible design is to provide a FAST custom allocator/deallocator for promises (like in C++'s std), one that keeps preallocated memory and does not actually call malloc/free every time an object is deallocated, or calls them once for a big group of objects.
So my idea is to group allocations/deallocations: 1 alloc for 1000 promises, or 1 alloc/dealloc per second.
We could then attach this custom allocator/deallocator to each EventLoop.

Another possible design is an object reuse pool. It could preallocate all the needed objects at app start and deallocate them only at app stop, or manage this automatically. A real server application is usually tuned in place for the maximum possible connections/speed, so we do not need real retain/dealloc during the app's lifetime (only at start/stop).

@weissi What do you think?


weissi commented Mar 12, 2019

@allright Swift unfortunately doesn't let you choose the allocator; it will always use malloc. Also, from your profile it seems to be the reference counting rather than the allocations, right?


allright commented Mar 12, 2019

@weissi Yes, I think it is reference counting.
So we could still use special factories (object reuse pools) attached to each EventLoop for lightweight creation/freeing of objects. In these pools an object does not really need to be deallocated, just reinitialised before its next allocation.


weissi commented Mar 12, 2019

> @weissi Yes, I think it is reference counting.
> So we could still use special factories (object reuse pools) attached to each EventLoop for lightweight creation/freeing of objects. In these pools an object does not really need to be deallocated, just reinitialised before its next allocation.

but the reference counting operations are inserted automatically by the Swift compiler. They happen whenever something is used. Let's say you write this:

func someFunction(_ foo: MyClass) { ... }

let object = MyClass()
someFunction(object)
object.doSomething()

then the Swift compiler might emit code like this:

let object = MyClass() // allocates it with reference count 1
object.retain() // ref count + 1, to pass it to someFunction
someFunction(object)
object.retain() // ref count + 1, for the .doSomething call
object.doSomething()
object.release() // ref count - 1, because someFunction has returned
object.release() // ref count - 1, because we're done with .doSomething
object.release() // ref count - 1, because we no longer need `object`

certain reference counting operations can be optimised away, but generally Swift is very noisy with ref counting operations and we can't remove them with object pools.


allright commented Mar 12, 2019

Yes, not all of them. But, for example, ChannelHandlers could be allocated/deallocated through a factory:

let chFactory = currentEventLoop().getFactory() // or createFactoryWithCapacity(1000)
let channelHandler = chFactory.createChannelHandler() // real allocation here (or taken from the preallocated pool)

// use channelHandler
chFactory.release(channelHandler) // tell chFactory that this channelHandler may be reused

// no release or retain here!
let reusedChannelHandler = chFactory.createChannelHandler() // reinitialised channel handler

So, use this approach for every object like Promise, etc.


weissi commented Mar 12, 2019

@allright sure, you could even implement this today for ChannelHandlers. The problem is that the number of reference count operations will stay the same.


allright commented Mar 12, 2019

Yes, the number of operations is the same, but the moment at which they happen is not.
We could perform these operations at app exit, or when there is no load on the event loop. We could prioritise operations and give event loop time to the most important tasks (like sending/receiving messages on already-established connections).

Really, how is one supposed to use the swift-nio framework on big production servers with millions of connections? One event loop per 6000 handlers?
We have to find the best solution.
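
A rough sketch of the multi-loop sizing implied by those numbers (illustrative assumption only; System.coreCount is NIO's helper for the machine's core count, and the 6000-connections-per-loop figure comes from the measurements above):

import NIO

// Spread connections over more event loops so that a mass close only
// stalls the connections sharing a loop with the closing ones.
let expectedConnections = 100_000   // illustrative capacity target
let connectionsPerLoop = 6_000      // per the measurements in this thread
let threads = max(System.coreCount, expectedConnections / connectionsPerLoop)
let group = MultiThreadedEventLoopGroup(numberOfThreads: threads)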


weissi commented Mar 12, 2019

> Yes, the number of operations is the same, but the moment at which they happen is not.

That's not totally accurate. If you take a handler out of a pipeline, reference counts will change whether that handler will be re-used or not. Sure, if handlers are re-used, then you don't need to deallocate, which would cause even more reference count decreases.

> We could perform these operations at app exit, or when there is no load on the event loop. We could prioritise operations and give event loop time to the most important tasks (like sending/receiving messages on already-established connections).
>
> Really, how is one supposed to use the swift-nio framework on big production servers with millions of connections? One event loop per 6000 handlers?
> We have to find the best solution.

Totally agreed. I'm just saying that caching your handlers (which you can do today, you don't need anything from NIO) won't remove all reference count traffic when tearing down the pipeline.

allright commented:

I see. Let's try to fix what we can and test!)
Even preventing massive deallocations will improve performance.

allright commented:

Also, judging by the performance graph, I think the reference count changes themselves do not take a lot of time.
There is __swift_retain_HeapObject, which I read as allocation, not merely a reference count increment.

So let's optimise alloc/dealloc speed with reuse pools or some other way.


weissi commented Mar 12, 2019

> I see. Let's try to fix what we can and test!)

What you could do is store a per-thread pool of handlers (using NIO's ThreadSpecificVariable) on every event loop. Then you can

import NIO

// ThreadSpecificVariable requires a class value, so the CircularBuffer is
// wrapped in a small pool object.
final class MyHandlerPool {
    var buffer = CircularBuffer<MyHandler>(initialCapacity: 32)
}

let threadLocalMyHandlers = ThreadSpecificVariable<MyHandlerPool>()

extension EventLoop {
    func makeMyHandler() -> MyHandler {
        guard let pool = threadLocalMyHandlers.currentValue else {
            threadLocalMyHandlers.currentValue = MyHandlerPool()
            return MyHandler()
        }
        return pool.buffer.count > 0 ? pool.buffer.removeFirst() : MyHandler()
    }
}

and in MyHandler:

func handlerRemoved(context: ChannelHandlerContext) {
    self.resetMyState()
    threadLocalMyHandlers.currentValue?.buffer.append(self)
}

(code not tested or compiled, just as an idea)

> Even preventing massive deallocations will improve performance.

agreed

@allright
Copy link
Author

good idea) will test later)


weissi commented Mar 12, 2019

> Also, judging by the performance graph, I think the reference count changes themselves do not take a lot of time.
> There is __swift_retain_HeapObject, which I read as allocation, not merely a reference count increment.

__swift_retain_...HeapObject is also just an increment of the reference count. Allocation is swift_alloc, swift_allocObject and swift_slowAlloc.

The reason HeapObject is in the symbol name of __swift_retain...HeapObject is that it's written in C++, and in C++ the parameter types are name-mangled into the symbol name.
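
(As an aside, such C++-mangled names, e.g. anything starting with _ZN, can be fed through the standard c++filt tool to get readable names in perf output.)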

allright commented:

> > Also, judging by the performance graph, I think the reference count changes themselves do not take a lot of time.
> > There is __swift_retain_HeapObject, which I read as allocation, not merely a reference count increment.
>
> __swift_retain_...HeapObject is also just an increment of the reference count. Allocation is swift_alloc, swift_allocObject and swift_slowAlloc.

hm ...


allright commented Mar 13, 2019

> CircularBuffer<MyHandler>

I have just tested this, but it is not enough (a lot of Promises cause retain/release, and those promises would have to be reused too). But I figured out that the stalls happen while handlerRemoved is being invoked en masse. So I think the best solution would be to automatically spread the invokeHandlerRemoved() calls out over time.
There should be no more than, say, 100 invokeHandlerRemoved() calls per second, depending on CPU performance. Maybe add a special deferred queue for calling invokeHandlerRemoved()? It would be a smart garbage collector per EventLoop.
@weissi is it possible to apply this workaround?


Lukasa commented Mar 13, 2019

handlerRemoved is invoked in a separate event loop tick by way of a deferred event loop execute call. Netty provides a hook to limit the number of outstanding tasks that execute in any event loop tick. While I don't know if we want the exact same API, we may want to investigate whether we should provide tools to prevent scheduled tasks from starving I/O operations.
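
A hedged sketch of such a per-tick limit is below. Every name is invented for illustration (NIO provides no such API): a per-loop queue runs at most a fixed number of deferred teardown tasks per event-loop tick and re-schedules the remainder, so I/O gets a chance to run between batches.

import NIO

// Hypothetical throttle for deferred work. Must only be used from its own
// event loop; no synchronisation is provided.
final class ThrottledTaskQueue {
    private let eventLoop: EventLoop
    private let maxTasksPerTick: Int
    private var pending: [() -> Void] = []
    private var drainScheduled = false

    init(eventLoop: EventLoop, maxTasksPerTick: Int = 100) {
        self.eventLoop = eventLoop
        self.maxTasksPerTick = maxTasksPerTick
    }

    func enqueue(_ task: @escaping () -> Void) {
        self.pending.append(task)
        self.scheduleDrainIfNeeded()
    }

    private func scheduleDrainIfNeeded() {
        guard !self.drainScheduled, !self.pending.isEmpty else { return }
        self.drainScheduled = true
        // `execute` runs the closure in a later tick, so the loop can handle
        // I/O events between batches.
        self.eventLoop.execute {
            self.drainScheduled = false
            let n = min(self.maxTasksPerTick, self.pending.count)
            let batch = Array(self.pending.prefix(n))
            self.pending.removeFirst(n)
            for task in batch { task() }
            self.scheduleDrainIfNeeded()
        }
    }
}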

allright commented:

"limit the number of outstanding tasks that execute in any event loop tick"
Yes, EventLoop mechanics means that every operation is very small. And only prioritisation can help in this case. I think it is good Idea.
Two not dependent ways for optimise:

  1. Reuse objects (all promises and channel handlers myst be reused to prevent massive alloc/dealloc)
  2. Prioritisation (one of possible implementations is limiting not hi priority tasks per one tick).


allright commented Mar 14, 2019

In the real world we have limited resources on a server.
A simple example: 1 CPU core + 1 GB RAM (which may cover up to 100000 TCP connections, or 20000 with SSL).
So a real server will be tuned for, and limited to, a maximum number of connections due to the RAM & CPU limits.
And.....

A server does not need dynamic memory allocation/deallocation during processing.
A swift-nio pipeline:

EchoHandler() -> BackPressureHandler() -> IdleStateHandler() -> ... some other low-level handlers like TCP etc.
We can preallocate and reuse 100000 pipelines with everything they need (not only the handlers, but all the Promises too):

EchoHandler: 100000
BackPressureHandler: 100000
IdleStateHandler: 100000
Promise: 10 * 100000 = 1000000

That completely solves our problem: no massive allocations/deallocations during processing.
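
A minimal sketch of the reuse pool being proposed (all types are hypothetical, not NIO API; reinit() stands for whatever per-connection reset a pooled object needs):

// Hypothetical reuse pool; not part of NIO.
protocol Reusable: AnyObject {
    init()
    func reinit() // reset per-connection state before the next use
}

final class ReusePool<T: Reusable> {
    private var free: [T]

    init(preallocating count: Int) {
        // All allocations happen once, at server start.
        self.free = (0..<count).map { _ in T() }
    }

    func acquire() -> T {
        if let obj = self.free.popLast() {
            obj.reinit()
            return obj
        }
        return T() // pool exhausted: fall back to a real allocation
    }

    func release(_ obj: T) {
        self.free.append(obj) // no dealloc; the object goes back to the pool
    }
}

Note that, as weissi points out above, this removes malloc/free traffic but not the ARC retain/release traffic.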

Possible steps to implement:

  1. Move the ownership of all Promises into a common base ChannelHandler class.
  2. Make a factory interface for creating & reinitialising ChannelHandlers in a ReusePool.
    It may be easier to reuse the whole ChannelPipeline object (I have not taken a deep dive into the source code yet).

P.S.
I am also seeing slow accepting of incoming TCP connections in comparison with C++ boost::asio, so I think the reason is slow memory allocation.


AnyCPU commented Apr 23, 2019

I have hit an issue using Vapor, which is based on SwiftNIO (vapor/vapor#1963).
I guess it belongs to this issue.
Does any workaround exist?


weissi commented Apr 23, 2019

@AnyCPU your issue isn't related to this.


AnyCPU commented Apr 23, 2019

@weissi is it related to SwiftNIO?


weissi commented Apr 23, 2019

> is it related to SwiftNIO?

I don't think so but we'd need more information to be 100% sure. Let's discuss this on the Vapor issue tracker.


allright commented Apr 23, 2019

I think it is related to the Swift-NIO architecture: too many Future/Promise allocs/deallocs per connection.
The only way out is to preallocate, and to defer deallocation of resources during processing.
I think we also have a problem with accepting a massive number of new incoming TCP connections, in comparison with ASIO (a C++ library).


weissi commented Apr 23, 2019

> I think it is related to the Swift-NIO architecture: too many Future/Promise allocs/deallocs per connection.
> The only way out is to preallocate, and to defer deallocation of resources during processing.

Your graph above shows that most of the overhead is in retain and release. That would not go away if we pre-allocated.

allright commented:

> @AnyCPU your issue isn't related to this.

I recommend a workaround: create more threads (approximately no more than 5000 connections per thread).


allright commented Apr 23, 2019

> > I think it is related to the Swift-NIO architecture: too many Future/Promise allocs/deallocs per connection.
> > The only way out is to preallocate, and to defer deallocation of resources during processing.
>
> Your graph above shows that most of the overhead is in retain and release. That would not go away if we pre-allocated.

  1. malloc, retainCount++: 0 -> 1 (a lot of time)
  2. retainCount++: 1 -> 2
  3. retainCount++: 2 -> 3
    ....
  4. retainCount--: 3 -> 2
  5. retainCount--: 2 -> 1
  6. retainCount--, free(): 1 -> 0 (a lot of time)

I don't think that the atomic increments/decrements of the retain count take a lot of time.
But the very first malloc and the last free do take a lot of time.

Could you test this hypothesis?


weissi commented Apr 23, 2019

> I don't think that the atomic increments/decrements of the retain count take a lot of time.
> But the very first malloc and the last free do take a lot of time.

It's an atomic increment and decrement. Just check your own profile: the _ZN14__swift_retain_... is just an atomic ++.

allright commented:

[screenshot: 2019-04-23_22-46-55]

Look at this comment:
https://github.com/apple/swift/blob/48d8ebd1b051fba09d09e3322afc9c48fabe0921/benchmark/single-source/ObjectAllocation.swift#L15

It says the problem is in alloc/dealloc:
// 53% _swift_release_dealloc
// 30% _swift_alloc_object
// 10% retain/release


weissi commented Apr 23, 2019

> @weissi , Do you think that this function is too slow?
> https://github.com/apple/swift/blob/48d8ebd1b051fba09d09e3322afc9c48fabe0921/stdlib/public/SwiftShims/RefCount.h#L736

Well, 'slow' is relative, but in your profile up top that function took about 30% of the time.


weissi commented Apr 23, 2019

> Look at this comment:
> https://github.com/apple/swift/blob/48d8ebd1b051fba09d09e3322afc9c48fabe0921/benchmark/single-source/ObjectAllocation.swift#L15

Yes, but that very much depends on the state of the CPU caches, what's on which cacheline, etc.

allright commented:

> // 53% _swift_release_dealloc
> // 30% _swift_alloc_object
> // 10% retain/release

Really, it means that malloc/free takes 83% of the time.
So maybe http://jemalloc.net is a good option.
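
(On Linux, jemalloc can usually be tried without recompiling by launching the server with the LD_PRELOAD environment variable pointing at libjemalloc.so; the exact library path varies by distribution, so treat this as a general technique rather than a tested recipe.)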

allright commented:

> but the reference counting operations are inserted automatically by the Swift compiler. […] certain reference counting operations can be optimised away, but generally Swift is very noisy with ref counting operations and we can't remove them with object pools.

Also, that code is not a problem if the retain/release in the middle takes only 10% of the time.
So we can optimise malloc/free at the architecture level of Swift NIO, using preallocated pools of objects that are never freed. Usually a server does not need to free memory, but it must be benchmarked to understand how many connections it can handle.
So if we need a server for 100000 connections, we allocate everything once at start and never deallocate!

allright commented:

But we must avoid alloc/free for Futures/Promises in the pipeline.
So:

  1. No ALLOC/FREE per packet/event.
  2. Alloc/free ONLY PER CONNECTION and ONLY at server START.


AnyCPU commented Apr 23, 2019

> > @AnyCPU your issue isn't related to this.
>
> I recommend a workaround: create more threads (approximately no more than 5000 connections per thread).

I have run the wrk tool using two profiles:

  1. wrk -t 1 -d 15s -c 1000 http://localhost:8080
  2. wrk -t 10 -d 15s -c 1000 http://localhost:8080

The issue occurs with the first option on the very first run.
The issue does not occur with the second option; I tried running it many times.

I hope this will somehow help.


allright commented Apr 23, 2019

> I have run the wrk tool using two profiles: […] The issue occurs with the first option on the very first run. The issue does not occur with the second option.

I mean that you can increase the number of threads in the EventLoopGroup:

let numberOfThreads = 32
let group = MultiThreadedEventLoopGroup(numberOfThreads: numberOfThreads)

Also, do not run wrk on the same machine: it consumes CPU and influences your swift-nio server's behaviour, so the tests are invalid.


weissi commented Apr 23, 2019

@AnyCPU & @allright Please, let's separate the issues here. @AnyCPU your issue is unrelated to this here. @AnyCPU let's discuss your issue on the Vapor bug tracker.


weissi commented Apr 23, 2019

> But we must avoid alloc/free for Futures/Promises in the pipeline.
> So:
>
>   1. No ALLOC/FREE per packet/event.

The only allocations that happen per packet/event are the buffers for the bytes. There are no futures/promises allocated. If you set a custom ByteBufferAllocator on the Channel, you can also remove those ByteBuffer allocations. So per packet/event, no allocations are necessary.
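
A sketch of where that option plugs in (NIO 2 spellings, while this issue pins NIO 1.13, where some names differ; the stock ByteBufferAllocator stands in for a custom one, and EchoHandler is the handler sketched earlier):

import NIO

let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
let bootstrap = ServerBootstrap(group: group)
    // Give every accepted child channel this allocator instead of the default.
    .childChannelOption(ChannelOptions.allocator, value: ByteBufferAllocator())
    .childChannelInitializer { channel in
        channel.pipeline.addHandler(EchoHandler())
    }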


allright commented Apr 23, 2019

> The only allocations that happen per packet/event are the buffers for the bytes. There are no futures/promises allocated.

Yes, no problem there.
The problem is in the massive deallocations during massive channel closing. So we have to prevent those on channel/socket close.


weissi commented Apr 23, 2019

> The problem is in the massive deallocations (and retaining!!!, if you look at my graph) during massive channel closing. So we have to prevent it on channel/socket close.

We can't really do anything about the retaining, unfortunately; that's due to ARC. And with jemalloc I wouldn't expect the deallocations to lead to massive issues in a real-world application. Closing a (Socket)Channel is always expensive because we need to talk to the kernel to close the file descriptor, deregister from kqueue/epoll, etc.


allright commented Apr 24, 2019

Yes, closing a socket is expensive. But the graph shows a retain/release problem, not socket close.
I'm not sure that this is not a massive issue.
A high-load HTTP server opens and closes connections very often.
For example, if you have a 1 Gbit network (or even 10 Gbit!), you can open 1 000 000 connections! How do you handle them? People will use C++, not Swift. But why? Swift is fast enough; even in C++ ASIO we can have alloc/dealloc problems.
Swift == C++ with std::shared_ptr everywhere. So let's think about how to use it correctly.
So in my opinion it is not a Swift problem but a swift-nio architecture problem. Really, I compared with C++ ASIO, and for ASIO there is no problem opening/closing 3000...5000 new connections per second and more on ONE THREAD, without affecting other connections.

So for Swift-NIO the current sustainable open/close rate is < 1000 connections per second per thread. The real limit per thread is about 5000 (because a massive open/close can suspend the thread for about 5..10 seconds to close 5000 connections, affecting the other connection handlers processed by that thread).

Yes, these benchmarks are good, but I think it can be better, even in Swift.

And the next steps may be:

  1. Investigate what happens on socket close (is it possible to insert a debug trace into the retain/release increments/decrements???)
  2. Minimise alloc/dealloc operations during connection handler open/close (we can influence that: not each retain/release, but the malloc/dealloc).
  3. Move all allocations to the INIT phase of the server. (It may be a special class:
    SwiftNIOHandlerAllocationPolicy.)

I hope to dig into this issue over the next several months. Right now I have no time for it :(


weissi commented Apr 24, 2019

> But the graph shows a retain/release problem

Again, we can't do much about retain being expensive. And retain alone is about 30%. Release will be as expensive as retain, but it's harder to judge because a release might trigger a deallocation. Looking at retain is easier because it only ever retains the object.

> Swift == C++ with std::shared_ptr everywhere.

Except that ARC at the moment inserts a lot more retains/releases than you'd typically see in C++.

If you find time, what would be really interesting to see is:

  • is it any different with a single-threaded EventLoopGroup? At the moment Swift doesn't have a memory model, so we need to use locks to implement the event loop. We hope to move to an atomic dequeue there.
  • does -assume-single-threaded make a difference? (obviously only use it with MultiThreadedEventLoopGroup(numberOfThreads: 1))
  • please test with Swift from the master branch; the first version of Semantic ARC has landed there and it makes a big difference
  • does jemalloc make a difference?

allright commented:

Thanks @weissi.
I'll keep that in mind.


allright commented Jul 19, 2019

I found a very interesting architecture for async networking, fibers:
https://github.com/tris-code/web

I have tested this server against the swift-nio HTTPServer using 1 core.
I got 11000 RPS on my MacBook Pro 2017 (tris-code) vs 4000 RPS (swift-nio).

Fibers -> simplify the code
-> no memory alloc/dealloc overhead for futures


weissi commented Jul 19, 2019

@allright I know, fibers (commonly known as green threads) are great. However, they can't be implemented in Swift: you'd need setjmp/longjmp or some hand-rolled assembly (see tris-code's implementation in coro.c). Using these things is, as the Swift team has repeatedly said, undefined behaviour.

If/when Swift gets async/await, we will be able to write code similar to fibers that is more optimised too. So everything should look nicer and many things will be faster.

allright commented:

Is it not safe?
https://github.com/tris-code/fiber


Lukasa commented Jul 19, 2019

It is not.

allright commented:

Can you point to a concrete unsafe place in this code?
Is something wrong here?
https://github.com/tris-code/fiber/blob/master/Sources/Fiber/FiberScheduler.swift


Lukasa commented Jul 19, 2019

Incidentally, regarding the performance numbers, I should note that SwiftNIO's default HTTP pipeline configuration is "safe but slow". This is because we want to accommodate users who want to run without an nginx or similar reverse proxy in front of their server. You can remove some channel handlers from the default configuration to potentially see a substantial performance boost on load tests.


Lukasa commented Jul 19, 2019

Basically all of this.

allright commented:

Ok, will see.


Lukasa commented Jul 19, 2019

Specifically, that assembly code is replicating many of the features of setjmp and longjmp, which the Swift team have repeatedly noted do not behave correctly in Swift programs. This is particularly true when you are modifying the call stacks of Swift programs, as it can lead to reference counting errors that result in memory unsafety.

allright commented:

My aim is to have a replacement for boost::asio (C++) in Swift, comparable in performance,
and with the simplest possible code.
So maybe we have to wait for async/await to support coroutines/fibers, or optimise the massive memory alloc/dealloc for futures/promises.

VineFiner commented:

Is there any new progress?


Lukasa commented Sep 28, 2021

Currently there has not been meaningful progress here. We have consistently pushed down the memory overhead and pushed up performance, but it remains the case that deallocations happen on the event loop.

VineFiner commented:

> My aim is to have a replacement for boost::asio (C++) in Swift, comparable in performance, and with the simplest possible code. So maybe we have to wait for async/await to support coroutines/fibers, or optimise the massive memory alloc/dealloc for futures/promises.

Swift 5.5 has released async/await.
