0.7.x Windows support - obsolete, check new one, leaving open for discussion #104

Closed
wants to merge 0 commits into from

Conversation

janbiedermann

@janbiedermann janbiedermann commented Jun 20, 2021

Windows support -> heading for world domination!

To use:

  • Install Windows 10 in a VM
  • Install Ruby+Devkit from rubyinstaller.org, make sure things are in the path
  • clone repo
  • open cmd, type: ridk enable
  • go to repo dir
  • type make test
  • see tests passing
  • celebrate

Remarks:
There is no ssl support.
There is no fork() on Windows, so only one worker will work.
There are no unix sockets on Windows, sadly. Named pipes would be the best solution, but semantics would change and make things complicated.
Instead I opted for TCP ports for local communication: it keeps things simple, keeps the socket semantics, and makes it possible to use WSAPoll() for everything.
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsapoll
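
As a rough illustration of the approach described in these remarks (not the code from this PR), a unix socket pair can be stood in for by a TCP connection over 127.0.0.1, and the same readiness check can go through poll() or WSAPoll(). The names loopback_pair, wait_readable, sock_t and the sock_* macros are made up for this sketch; on Windows it assumes WSAStartup() was already called.

```c
#include <string.h>

#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
typedef SOCKET sock_t;
#define SOCK_BAD INVALID_SOCKET
#define sock_close closesocket
#define sock_poll WSAPoll
#else
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#define SOCK_BAD (-1)
#define sock_close close
#define sock_poll poll
#endif

/* Hypothetical helper: emulate socketpair() with a TCP connection over
 * the loopback interface. Returns 0 on success, -1 on failure. */
static int loopback_pair(sock_t pair[2]) {
  struct sockaddr_in addr;
  socklen_t len = sizeof(addr);
  sock_t listener = socket(AF_INET, SOCK_STREAM, 0);
  pair[0] = pair[1] = SOCK_BAD;
  if (listener == SOCK_BAD) return -1;
  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0; /* let the OS pick a free port */
  if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
      getsockname(listener, (struct sockaddr *)&addr, &len) != 0 ||
      listen(listener, 1) != 0)
    goto fail;
  pair[0] = socket(AF_INET, SOCK_STREAM, 0);
  if (pair[0] == SOCK_BAD) goto fail;
  if (connect(pair[0], (struct sockaddr *)&addr, sizeof(addr)) != 0) goto fail;
  pair[1] = accept(listener, NULL, NULL);
  if (pair[1] == SOCK_BAD) goto fail;
  sock_close(listener); /* the listener is only needed for the handshake */
  return 0;
fail:
  if (pair[0] != SOCK_BAD) sock_close(pair[0]);
  sock_close(listener);
  pair[0] = pair[1] = SOCK_BAD;
  return -1;
}

/* Returns 1 if the socket has data to read within timeout_ms, else 0.
 * The same call shape works for poll() and WSAPoll(). */
static int wait_readable(sock_t s, int timeout_ms) {
  struct pollfd p;
  p.fd = s;
  p.events = POLLIN;
  p.revents = 0;
  return (sock_poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```

One trade-off of this kind of emulation: unlike a unix socket pair, the listening port is briefly visible to other local processes, so a real implementation would probably want to verify the peer.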

The point of all of this is to make iodine, and thus isomorfeus development, accessible to a broader audience. Patches for iodine will follow soon.

Although tests work, an actual app does not respond ... need to fix.

@boazsegev
Owner

Hi @janbiedermann ,

I love where you're going with this. However, this week I'm super busy and won't have time to look it over (between my birthday obligations and a few projects I am fresh out of seconds, not to mention minutes or hours).

I'll review it soon :)

Cheers and THANKS!!!
Bo.

@janbiedermann
Author

janbiedermann commented Jun 21, 2021

Hi @boazsegev ,
"I love where you're going" - you mean world domination? Sure, love it. ;)
No need to hurry, it probably still needs a week to ripen.
With dbf1151 the example app finally works.

So with -w1 -t1 in the example "Hello world!" app, Apache ab (ab.exe -c10 -n 100000 -k http://localhost:3000/) gives me:
Requests per second: 53475.98 [#/sec] (mean)
Not bad at all, small birthday present - Happy Birthday!

With multiple threads I get a lockup, and with too many clients the app exits. Still some work to do.

@janbiedermann
Author

Fixed all issues I came across so far (not yet pushed), but there is one issue that needs a bit more attention.

On Windows there seems to be no determinable socket limit, and socket fds (HANDLEs in Winsock terminology) can be all sorts of numbers; with multiple threads they are often way beyond the fixed allocation of fio_data.

For example:
fio_data->info[1024]
allows for fd numbers up to 1023

Windows may, for example, return an fd/HANDLE of 454654 from an accept() call, especially when using multiple threads. This will lead to a reference to
fio_data->info[454654]
and thus subsequently a crash

So for Windows an indirection would be required that maps actual fds (HANDLEs) to fio_data->info[index].

I am thinking about how best to achieve that with as few changes as possible, but have not arrived at a final solution yet.
I was considering using a fio hash map, maybe.
Any idea/inspiration is very welcome. @boazsegev, it would be great if you had an idea.

@janbiedermann
Author

To solve the above problem I just added a fio_data->info[i]->soket_handle. It works nicely: forward lookup is fast, reverse lookup scans fio_data->info[]. Fast enough. Single-threaded it still shows excellent performance, and after fixing another crash bug (not #105) it works very reliably. Socket handling seems to be correct.
However, there is still a performance issue when running multithreaded. It may be because of all the locking required, not sure yet. Don't pull this; I'll recreate this PR and submit a new one.
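
A minimal sketch of the indirection described above, assuming a fixed table whose slots each remember the OS handle they belong to. This is not the PR's code: slot_s, slots, CAPA, NO_HANDLE and the slot_* functions are hypothetical names for illustration. Forward lookup (index to handle) is a plain array access, reverse lookup (handle to index) is a linear scan.

```c
#include <stddef.h>
#include <stdint.h>

#define CAPA 1024                 /* fixed table size, as in the thread */
#define NO_HANDLE ((uintptr_t)-1) /* marks a free slot (hypothetical) */

typedef struct {
  uintptr_t handle; /* OS socket handle owning this slot */
} slot_s;

static slot_s slots[CAPA];

/* Mark every slot as free. */
static void slots_init(void) {
  for (int i = 0; i < CAPA; ++i) slots[i].handle = NO_HANDLE;
}

/* Claim the first free slot for a handle; returns its index or -1 if full. */
static int slot_attach(uintptr_t handle) {
  for (int i = 0; i < CAPA; ++i) {
    if (slots[i].handle == NO_HANDLE) {
      slots[i].handle = handle;
      return i;
    }
  }
  return -1;
}

/* Forward lookup, O(1): table index to OS handle. */
static uintptr_t slot_handle(int index) {
  return (index >= 0 && index < CAPA) ? slots[index].handle : NO_HANDLE;
}

/* Reverse lookup, O(CAPA): OS handle to table index, or -1 if unknown. */
static int slot_find(uintptr_t handle) {
  if (handle == NO_HANDLE) return -1;
  for (int i = 0; i < CAPA; ++i) {
    if (slots[i].handle == handle) return i;
  }
  return -1;
}

/* Release a slot so it can be reused by a later connection. */
static void slot_detach(int index) {
  if (index >= 0 && index < CAPA) slots[index].handle = NO_HANDLE;
}
```

With this shape, a handle value such as 454654 never indexes the array directly; it is only ever compared against stored values.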

@janbiedermann
Author

janbiedermann commented Jun 25, 2021

Multi threaded with 1500 connections with a capa of 1024, 5 minute run:

  Reqs/sec      2081.54    1415.22   35504.73
  Latency      744.42ms   512.71ms     11.10s
  HTTP codes:
    1xx - 0, 2xx - 575632, 3xx - 0, 4xx - 0, 5xx - 0
    others - 29841
  Errors:
    dial tcp [::1]:3000: connectex: No connection could be made because the target machine refused it. - 29764
    the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection - 77

Single threaded with 1500 connections and capa of 1024, 5 minute run:

  Reqs/sec     42136.97    3294.27   56649.43
  Latency       35.61ms   265.46ms     14.87s
  HTTP codes:
    1xx - 0, 2xx - 12609920, 3xx - 0, 4xx - 0, 5xx - 0
    others - 31205
  Errors:
    dial tcp [::1]:3000: connectex: No connection could be made because the target machine refused it. - 26851
    the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection - 4354
  Throughput:     8.65MB/s

@janbiedermann
Author

Bombarding http://localhost:3000/ for 10s using 500 connection(s)
[=================================================================================================================] 10s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec     82086.76   16529.77  235717.14
  Latency        5.99ms     3.62ms   557.00ms
  HTTP codes:
    1xx - 0, 2xx - 831271, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    17.13MB/s

Great performance and multithreaded! Finally! And very stable, stable enough for development.
Now need to check websockets ...

@boazsegev
Owner

Hi @janbiedermann ,

First I want to thank you for all your amazing work. I love how minimal your code changes seem (though I think it would be even better if we could somehow abstract away some of the code changes to a centralized location using macros / inline functions).

I am still at the beginning of learning what needs to be done for Windows integration and how I could test the WinSock API code (I don't have the OS and don't really want to shell out money for an OS I don't believe in)... in the meantime I found myself asking a few core questions...:

  1. I am assuming the "Linux Sub-System" wouldn't help us without forcing the user to compile/run the code within that system...? There's no way to enable that subsystem from within the code? - I ask because I assume the code will run "as is" on that subsystem, I mean, I read it runs Redis.

  2. Is there an upper limit for the Windows fd value (i.e., 64K)? If so, is it reasonable to set capa to that value? - I ask because I doubt if scanning the connection data Array (the reverse lookup) will still work decently when using a 100K connection capacity...

To clarify point 2: With WebSockets and (maybe in the future) HTTP/2, facil.io supports tens of thousands of concurrent connections (some of whom may be dormant for long periods of time while waiting for push events)... on Linux facil.io would be limited mostly by machine resources, not by any internal implementation detail. I would love Windows users to have the same experience.

  3. Do you have a recommendation how I might test the Windows code? Maybe we could add a Windows OS CI test? Is there a Windows solution for developers of some kind?

Again, thanks for all the amazing work!

Cheers!
Bo.

@janbiedermann
Author

Hi @boazsegev ,
my pleasure, I am learning too. Meanwhile, the #ifdefs got a bit ugly in several places. I agree, a beautiful solution would be nice to have.

  1. WSL2 is very well integrated into Windows. It's easy to access files from Windows or Linux, one way or another. Even Visual Studio Code can open files or directories from within WSL2, and it's very nice to work with. Networking between the two still needs a few things to be considered. But I don't intend to instruct users to install and learn about WSL, how to upgrade it to WSL2, what to consider to get networking services running between the systems, how to adjust firewalls, etc. I would prefer an easy-to-use, Windows-only way. facil.io runs without any changes in WSL, as WSL is just a standard Linux.
  2. Memory is the limit for sockets on Windows too. I agree, scanning the array is not meant as a high-performance solution, but as a way to get things running without changing too much of facil.io. I chose 1024 sockets as capa, learning from xitami, which has the same internal limit and very high performance. But truly, without limits and with random socket fds, the per-socket task queue of facil.io needs a completely new approach.
    2.5. I also noticed that the thread handling of facil.io isn't perfectly suited for poll/WSAPoll, which limits performance and response time. But I am very pleased with the current performance, and it's very well suited for using Windows at least as a development platform. However, with some more effort, rethinking the task queue and thread handling and using Windows' completion API, better performance can be achieved for a lot more sockets. We could maybe have a look at https://oatpp.io/ and see how it handles things on Windows. But for the moment, getting iodine running on Windows with good enough performance is my first priority.
  3. You can get free Windows VMs from Microsoft, which work for 90 days: https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/ . I use the HEB CI (hands, eyes, brains) ;-), otherwise I have no idea as of yet. I haven't been a Windows user or developer until recently.

I'll try to get websockets running asap. The latest code is here: https://github.com/janbiedermann/facil.io/tree/0.7.x_w
When everything works, I'll recreate my PRs to get things in order.
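
For the reverse-lookup concern raised in point 2 (scanning the array at a 100K capacity), one alternative that was floated earlier in the thread is a hash map from handle to table index. The sketch below is not facil.io's own hash map; it swaps in a small open-addressing table with linear probing and tombstones, and all names (map, MAP_SIZE, map_get, map_set, map_del) are hypothetical. It assumes the two largest uintptr_t values never occur as real handles.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 2048             /* power of two, larger than the capacity */
#define KEY_EMPTY ((uintptr_t)-1) /* bucket was never used */
#define KEY_TOMB ((uintptr_t)-2)  /* bucket was deleted */

typedef struct {
  uintptr_t key; /* OS socket handle */
  int index;     /* slot in the fixed connection table */
} bucket_s;

static bucket_s map[MAP_SIZE];

static void map_init(void) {
  for (size_t i = 0; i < MAP_SIZE; ++i) {
    map[i].key = KEY_EMPTY;
    map[i].index = -1;
  }
}

/* Multiplicative hashing, to spread handle values that cluster together. */
static size_t map_hash(uintptr_t key) {
  uint64_t h = (uint64_t)key * 11400714819323198485ull;
  return (size_t)(h >> 32) & (MAP_SIZE - 1);
}

/* Returns the stored index for a handle, or -1 if it is not mapped. */
static int map_get(uintptr_t key) {
  if (key >= KEY_TOMB) return -1; /* reserved values are never stored */
  size_t pos = map_hash(key);
  for (size_t n = 0; n < MAP_SIZE; ++n, pos = (pos + 1) & (MAP_SIZE - 1)) {
    if (map[pos].key == KEY_EMPTY) return -1;
    if (map[pos].key == key) return map[pos].index;
  }
  return -1;
}

/* Inserts or updates a mapping; returns 0 on success, -1 if full/invalid. */
static int map_set(uintptr_t key, int index) {
  if (key >= KEY_TOMB) return -1;
  size_t pos = map_hash(key);
  size_t spot = MAP_SIZE; /* first reusable bucket seen, if any */
  for (size_t n = 0; n < MAP_SIZE; ++n, pos = (pos + 1) & (MAP_SIZE - 1)) {
    if (map[pos].key == key) {
      map[pos].index = index;
      return 0;
    }
    if (map[pos].key == KEY_TOMB || map[pos].key == KEY_EMPTY) {
      if (spot == MAP_SIZE) spot = pos;
      if (map[pos].key == KEY_EMPTY) break; /* key cannot be further on */
    }
  }
  if (spot == MAP_SIZE) return -1;
  map[spot].key = key;
  map[spot].index = index;
  return 0;
}

/* Removes a mapping, leaving a tombstone so later probes keep working. */
static void map_del(uintptr_t key) {
  if (key >= KEY_TOMB) return;
  size_t pos = map_hash(key);
  for (size_t n = 0; n < MAP_SIZE; ++n, pos = (pos + 1) & (MAP_SIZE - 1)) {
    if (map[pos].key == KEY_EMPTY) return;
    if (map[pos].key == key) {
      map[pos].key = KEY_TOMB;
      return;
    }
  }
}
```

In a multithreaded server the map would need its own lock (or per-thread ownership), which is part of why the simple scan may have been the pragmatic first step.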

@janbiedermann
Author

Latest instructions for Windows:
Install windows
Install Ruby30 with DevKit from rubyinstaller.org
Install git for Windows
clone repo, checkout 0.7.x_w
copy repo to new dir
cd to new dir
then ridk enable
then bash
then ./scripts/new/clean
then exit bash
then make DEBUG=1
then tmp\fioapp.exe
example app should work
curl http://localhost:3000/
for testing i used: https://github.com/codesenberg/bombardier/releases/tag/v1.2.5

@janbiedermann
Author

janbiedermann commented Jun 28, 2021

Websockets work, however ... after opening some, and then many, with multiple threads, the app busy-loops doing nothing, just heating the room. No idea why. The debugger shows nothing of help.
I improved socket accuracy, meaning: avoid passing invalid sockets to WSAPoll() as far as possible.
Added a simple websocket benchmark script, to be used with https://k6.io/

@janbiedermann
Author

janbiedermann commented Jun 29, 2021

OK, poll() on Linux also has some issues: it drops websocket connections and later on stops accepting connections altogether. So there definitely is a bug in the common poll()/WSAPoll() code. But on Linux it doesn't start busy-looping. Anyway, it sometimes shows up like this, on Linux and on Windows:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==7935==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x56219f509567 bp 0x7f0d401fda60 sp 0x7f0d401fd940 T4)
==7935==The signal is caused by a READ memory access.
==7935==Hint: address points to the zero page.
    #0 0x56219f509566 in fio_flush lib/facil/fio.c:3661
    #1 0x56219f5038d3 in deferred_on_ready lib/facil/fio.c:2668
    #2 0x56219f4ffad3 in fio_defer_perform_single_task_for_queue lib/facil/fio.c:1256
    #3 0x56219f4ffc12 in fio_defer_perform lib/facil/fio.c:1294
    #4 0x56219f4ffd62 in fio_defer_cycle lib/facil/fio.c:1334
    #5 0x7f0d44171608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x9608)
    #6 0x7f0d43f49292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV lib/facil/fio.c:3661 in fio_flush
Thread T4 created by T3 here:
    #0 0x7f0d441c5805 in pthread_create (/lib/x86_64-linux-gnu/libasan.so.5+0x3a805)
    #1 0x56219f4fe209 in fio_thread_new lib/facil/fio.c:814
    #2 0x56219f4fff2b in fio_defer_thread_pool_new lib/facil/fio.c:1366
    #3 0x56219f50dc34 in fio_worker_startup lib/facil/fio.c:4478
    #4 0x56219f50e407 in fio_sentinel_worker_thread lib/facil/fio.c:4568
    #5 0x7f0d44171608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x9608)

Thread T3 created by T0 here:
    #0 0x7f0d441c5805 in pthread_create (/lib/x86_64-linux-gnu/libasan.so.5+0x3a805)
    #1 0x56219f4fe209 in fio_thread_new lib/facil/fio.c:814
    #2 0x56219f50e4fb in fio_sentinel_task lib/facil/fio.c:4582
    #3 0x56219f50e9cf in fio_start lib/facil/fio.c:4640
    #4 0x56219f5f899b in main src/main.c:254
    #5 0x7f0d43e4e0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

==7935==ABORTING

On Linux it restarts the worker and everything is fine until next time. On Windows, Windows kills the thread and keeps spawning new ones and killing them again, while also spin-waiting on a lock for something, which causes the CPU usage while doing nothing.

@janbiedermann
Author

There is no reason access to ->flush should fail; it's never changed to anything other than the default flush function. So there must be some stack/heap/whatever overrun or something. But at this time I am getting nowhere with this. How to find the culprit?

@boazsegev
Owner

Actually this might be related to a bug I wanted to track down for a while that's probably related to race conditions in the protocol assignment logic and might be unrelated to the poll implementation.

However, I think the poll and bug tracking should take a bit of a back seat, as the issue might be resolved using a different concurrency IO design and a new poll wrapper (already written, see later on).

I want to dedicate the main thread to the system and then all user code will run on "user" threads. This will ease much of the lock contention and allow buffered data to be sent even when all user threads are busy and without stressing the locks on the IO buffers (new client requests will wait for user threads, while buffered responses will be sent with lower latency).

I'm very much behind on the new design for version 0.8.x and I don't want to pause the Windows support avenue, which is why I didn't point out before that there's additional future work in the pipeline. I thought I'll just port your code to the new design once it's ready.

You can have a look at the re-write for all the FIOBJ API and core type logic here (the API is much clearer, as the naming is more consistent). There's also a new poll engine in there... but there's no IO design just yet, this is just a library full of building blocks.

The future 0.8 design will eventually go into this repo which is currently full of junk that I need to replace (the beginning of the new design is partly on my system and mostly in my head).

One of the issues the new design raises is the question of the fd data array (which is great for POSIX OSs, but not portable). Replacing it with dynamic pointers might ruin memory locality (which is hugely important for some performance details) but could reduce some race conditions for multi-threaded memory access, so I have to write it all to test performance before I know if dynamic memory is a pass or fail.

I wish I knew what I'm doing, but right now I'm spread thin on a big number of projects and nothing is moving fast enough.

@boazsegev
Owner

Hi @janbiedermann ,

I'm having some issues and I wanted your opinion.

For the last week or so I've been intensively attempting to work out a manageable approach to Windows development.

My main concerns are maintenance and testing.

If we're looking at getting iodine on Windows, then we need to get facil.io core functionality to support WinSock2 and windows IO polling (WSAPoll), as well as use the Windows file reading (and writing) API for the FIOBJ data types.

Threads and fork-ing on iodine use the Ruby implementation, so we can delay working on that... though I think a threading API that allows a developer to choose between co-routines (i.e., green threads) and native OS threads will help those using facil.io without a Ruby layer.

I think it would be better to code these few layers in a native Windows API rather than use MSYS2 / Cygwin / Mingw.

The facil.io custom memory allocator requires the ability to allocate memory on aligned addresses (it uses masks and pointer addresses to figure out the original allocation block and access the block's metadata)... I'm not sure I can rewrite it for Windows.
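
To make the requirement concrete, here is a small sketch of the general mask technique (not facil.io's allocator): if every block starts on an address that is a multiple of the block size, masking the low bits of any pointer inside the block yields the block's start, where metadata can live. BLOCK_SIZE, block_header_s and the block_* functions are illustrative names. The sketch uses C11 aligned_alloc; on Windows the CRT's _aligned_malloc/_aligned_free pair is used instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>
#endif

#define BLOCK_SIZE ((size_t)1 << 15) /* 32 KiB; an illustrative value */

typedef struct {
  size_t live; /* number of slices handed out from this block */
} block_header_s;

/* Allocate one block whose address is a multiple of BLOCK_SIZE. */
static block_header_s *block_new(void) {
#ifdef _WIN32
  block_header_s *b = _aligned_malloc(BLOCK_SIZE, BLOCK_SIZE);
#else
  block_header_s *b = aligned_alloc(BLOCK_SIZE, BLOCK_SIZE);
#endif
  if (b) b->live = 0;
  return b;
}

static void block_free(block_header_s *b) {
#ifdef _WIN32
  _aligned_free(b);
#else
  free(b);
#endif
}

/* Hand out memory at a byte offset inside the block (no bounds checks). */
static void *block_slice(block_header_s *b, size_t offset) {
  b->live++;
  return (char *)b + offset;
}

/* Recover the owning block from any pointer into it by masking low bits. */
static block_header_s *block_of(void *ptr) {
  return (block_header_s *)((uintptr_t)ptr & ~((uintptr_t)BLOCK_SIZE - 1));
}
```

The point of the trick is that free() style calls need no lookup table: the pointer itself encodes where its block's metadata is.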

However...

Keeping an Intel computer around just so I can run a virtual Windows dev machine and provide support for Windows sounds a little too heavy duty for me. I cannot test contributions without running the code and I don't want to run a Windows machine (not even a virtual one) on my network. That's one high-risk OS where everyone (including Microsoft) is trying to mine personal data from both the machine and the local network.

So... I don't know what to do.

Maybe Windows support will (forever) remain "unofficial" and "untested", where I trust contributors without testing their code... or maybe Windows support will just fade away and Windows related bugs / issues / features will never get addressed.

I intend to try my hand at incorporating what you've already done so that we are as close to Windows compatible as we can be, but... this week that I spent learning the Windows API (and their types, they have to name all their types in capitals?)... it really made me feel that keeping up with Windows support over time might be impossible :(

What do you think? Will we be able to maintain support for Windows over time? Will there be enough contributors? Who will test Windows code? ...?

Sorry if I'm ranting, it's just that this Windows experience reminded me why I left that OS both as a user and as a developer.

Thanks for your input.

Bo.

@janbiedermann
Author

janbiedermann commented Jul 5, 2021

Hi @boazsegev ,

I'll do my best:

If we're looking at getting iodine on Windows, then we need to get facil.io core functionality to support WinSock2 and windows IO polling (WSAPoll), as well as use the Windows file reading (and writing) API for the FIOBJ data types.

That's basically done in my code. There are some Windows support functions, well, POSIX-ish wrappers for the native Win API.
Others have done that work already; libuv would be available, but it's not necessary.

(Threads) I think it would be better to code these few layers in a native Windows API rather than use MSYS2 / Cygwin / Mingw.

Sure, but I am not sure there would be any benefit, even performance-wise. Sticking to the pthreads of msys/mingw keeps things simple and portable. Cygwin isn't a target for me, too slow.

I think it would be better to code these few layers in a native Windows ...

I agree. But honestly, the Windows tooling is beyond my understanding. All I need is a compiler, header files and libs. What you get is a web installer with lots of fancy things that I don't understand and gigs upon gigs of software. No idea what this is all about. And it seems to change over and over again with each new Visual Studio release or Windows release, or whenever some new marketing guy is hired at Microsoft. I am unable to keep up with this ever-changing vastness. However, I found out that there are "build tools" available, and after installing >1G, the compiler, headers and libs are there, somewhere on the system ...
https://visualstudio.microsoft.com/de/downloads/#build-tools-for-visual-studio-2019

The facil.io custom memory allocator ....

It seems Windows doesn't provide aligned memory allocation, so that must be done "manually". I omitted that part and just use malloc/free, good enough for now.
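
For reference, the "manual" route mentioned here is commonly done by over-allocating with plain malloc and stashing the original pointer just below the aligned address. The following is a generic sketch under that assumption, not code from this branch; manual_aligned_alloc and manual_aligned_free are hypothetical names, and the alignment is assumed to be a power of two no smaller than sizeof(void *).

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns size bytes aligned to alignment, or NULL on failure.
 * alignment must be a power of two and at least sizeof(void *). */
static void *manual_aligned_alloc(size_t alignment, size_t size) {
  /* Room for worst-case padding plus the stashed original pointer. */
  void *raw = malloc(size + alignment - 1 + sizeof(void *));
  if (!raw) return NULL;
  /* Leave space for the stash, then round up to the next boundary. */
  uintptr_t start = (uintptr_t)raw + sizeof(void *);
  uintptr_t aligned = (start + alignment - 1) & ~((uintptr_t)alignment - 1);
  /* Remember what malloc returned, right below the aligned block. */
  ((void **)aligned)[-1] = raw;
  return (void *)aligned;
}

/* Frees memory obtained from manual_aligned_alloc (NULL is ignored). */
static void manual_aligned_free(void *ptr) {
  if (ptr) free(((void **)ptr)[-1]);
}
```

The cost is up to alignment - 1 + sizeof(void *) wasted bytes per allocation, which matters little for a few large blocks but adds up for many small ones.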

Keeping an intel computer around just so I can run a virtual Windows ...

What kind of computers do you have?
Sure, I understand your concern. Having facil.io as a nice toolkit for Visual Studio development, native with MSVC, would be great. But no, I am not interested in that; all I want is iodine on Windows ;). And it wouldn't need bells and whistles; it should just work, fast enough and stable enough, and all is good for me. Also, it wouldn't need millions of ws connections; if it can handle around 300000, that would be fantastic.

What do you think? Will we be able to maintain support for Windows over time?

No, I think we don't need to. Once Isomorfeus is running on Windows, the world will gradually switch to Linux or *BSD ;-D
because they will all see that it's a much better experience on Linux or *BSD ;-D

Will there be enough contributors? Who will test Windows code? ...?

Maybe. I could imagine that people might be interested in something fast and simple on Windows, not sure. But maybe not.
I was thinking about creating a facil.io DevKit for Windows, complete with some simple open-source C IDE, compiler (tcc), headers (from mingw) and libs (also from mingw, where necessary). And then implement real-time development, with a file watcher/proxy daemon that compiles superfast with tcc on file change and reloads the browser view via a proxy-injected script ...
But I probably won't have time for that.

Sorry if I'm ranting, it's just that this Windows experience reminded me why I left that OS both as a user and as a developer.

Clearly, it's the same for me, but I would like isomorfeus to be more accessible to a broader audience of developers/users, because of reasons. As long as Windows is dominating the desktop, I think I can maintain and support iodine/isomorfeus on Windows in such a way that isomorfeus development works nicely. So that is what I am testing for.
For super mega native Windows performance, a lot more effort would be required.

And now for something completely different:

@janbiedermann
Author

janbiedermann commented Jul 5, 2021

I think I found a bug; with the fix, websocket things seem to work better on Linux.
In fio_write2_fn, I think the end should look like this:

locked_error:
  fio_unlock(&uuid_data(uuid).sock_lock);
  fio_packet_free(packet);
  /** fallthrough and free buffer */
error:
  if (options.after.dealloc) {
    options.after.dealloc((void *)options.data.buffer);
  }
  errno = EBADF;
  return -1;
}

Currently it doesn't fall through and just returns, so the buffer passed in the locked_error case never gets freed.

Furthermore, testing websockets on Linux with poll(), there seem to be "waves" of activity and sleep: connections/data are coming in, but facil.io does nothing for 5s, then is very busy for 1s, and then does nothing again for 5s (times not accurate, just to explain). When I stop the debugger in such a situation, all threads are, for example, waiting for the lock fio_postoffice.pubsub.lock, or all threads are waiting for poll(). Still investigating.
Also on Linux, I am now experiencing another memory error, where malloc fails with SIGSEGV. Still investigating.
Windows is still busy-looping early with websockets, and both Linux and Windows still drop/lose ws connections. Still investigating.
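
The pattern in question, stacked goto labels where the more specific error path falls through into the generic one so that shared cleanup runs exactly once, can be shown in isolation. This is a self-contained toy, not facil.io code: write_args_s, demo_write and counting_free are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdlib.h>

typedef struct {
  void *buffer;            /* data the caller hands over */
  void (*dealloc)(void *); /* how to release it, may be NULL */
} write_args_s;

static int dealloc_calls; /* counts releases, for demonstration only */

static void counting_free(void *p) {
  ++dealloc_calls;
  free(p);
}

/* valid and lock_ok simulate the two failure points of a write call. */
static int demo_write(write_args_s args, int valid, int lock_ok) {
  if (!valid) goto error;
  if (!lock_ok) goto locked_error;
  /* Success: a real implementation would queue the buffer and release it
   * after sending; this toy releases it right away. */
  if (args.dealloc) args.dealloc(args.buffer);
  return 0;

locked_error:
  /* Undo whatever only the later stage acquired (locks, packets), then
   * fall through so the shared cleanup below also runs. */
error:
  if (args.dealloc) args.dealloc(args.buffer);
  errno = EBADF;
  return -1;
}
```

If locked_error returned on its own instead of falling through, the buffer would leak on exactly that path, which matches the behaviour described above.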

@janbiedermann
Author

Why do you use your custom locking and not pthread_rwlock?

@janbiedermann
Author

I think i found the problem with the locking.
fio_filter_dup_lock_internal

  • locks the collection
  • doing stuff, getting channel
  • locks the channel
  • unlocks the collection

channel gets unlocked in fio_subscribe

fio_unsubscribe

  • locks the channel
  • doing stuff
  • locks the collection
  • ...

Let's look at 2 threads that happen to run interleaved (the leading number is the thread):
1 fio_filter_dup_lock_internal
2 fio_unsubscribe
1 lock collection
1 doing stuff getting channel
2 locking channel
2 doing stuff
1 locking channel --> waiting
2 locking collection --> waiting
--> deadlock, busy spinning
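
One conventional remedy for this kind of lock-order inversion is to make every code path take the two locks in the same order. The sketch below illustrates that idea only; it is not necessarily the fix that was pushed, and it swaps in pthread mutexes where facil.io uses its own locks. collection_lock, channel_lock, demo_subscribe and demo_unsubscribe are hypothetical names.

```c
#include <pthread.h>

static pthread_mutex_t collection_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t channel_lock = PTHREAD_MUTEX_INITIALIZER;
static int subscribers;      /* protected by channel_lock */
static int channels_in_use;  /* protected by collection_lock */

/* Order: collection first, then channel. */
static void demo_subscribe(void) {
  pthread_mutex_lock(&collection_lock);
  if (channels_in_use == 0) channels_in_use = 1; /* find or create channel */
  pthread_mutex_lock(&channel_lock);
  pthread_mutex_unlock(&collection_lock);
  ++subscribers;
  pthread_mutex_unlock(&channel_lock);
}

/* Same order here: collection first, then channel. Taking the channel
 * lock first and the collection lock second would recreate the deadlock
 * from the interleaving above. */
static void demo_unsubscribe(void) {
  pthread_mutex_lock(&collection_lock);
  pthread_mutex_lock(&channel_lock);
  if (subscribers > 0) --subscribers;
  if (subscribers == 0) channels_in_use = 0; /* drop the empty channel */
  pthread_mutex_unlock(&channel_lock);
  pthread_mutex_unlock(&collection_lock);
}
```

The trade-off is that unsubscribing now holds the collection lock for longer, which may add contention; an alternative would be to release the channel lock before taking the collection lock and re-check the state afterwards.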

@janbiedermann
Author

Alright, fixed all that and pushed the latest code to 0.7.x_w, but I got a new problem:
Thread 35 received signal SIGTRAP, Trace/breakpoint trap.
...
Thread 35 (Thread 16080.0x299c):
...
#6 0x00007ff8e8a19c9c in msvcrt!free () from C:\Windows\System32\msvcrt.dll
#7 0x00007ff759deeb72 in fio_ls_remove (node=0x191ac962670) at lib/facil/fio.h:3329

Only when using websockets.
No idea where that comes from now. Somewhere malloc/free gets confused within websockets.
Normal HTTP works like a charm.

@janbiedermann janbiedermann changed the title 0.7.x Windows support 0.7.x Windows support - obsolete, check new one, leaving open for discussion Jul 6, 2021
2 participants