Too many "Cannot assign requested address" error messages from the latest Citrine and Azure runs #4092
Comments
Actually the above problem seems to be bigger. I've checked the logs from the last tree
There is a lot of stuff with the following
If the server application is not responding we should see only
In this particular case the JVM has died while running The servlet3.raw log from the same run (20180925) is nice illustration of the problem:
The Another manifestation of the same problem are the logs of various frameworks that are complaining with the same |
@nbrady-techempower @michaelhixson This seems to be a serious problem. |
This seems to suggest that sometimes the, presumably non-blocking, docker shutdown of a container outlasts the next docker start, so trying to bind with the same hostname (or port, or address) fails? @nbrady-techempower @michaelhixson Is it possible to reproduce locally? If so, can we add some checking similar to how we wait until Something like:
|
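The kind of check suggested in the comment above could be sketched roughly as follows. This is a minimal illustration, not the toolset's actual code: the function names `port_is_free` and `wait_for_port_release` are hypothetical, and it assumes the test container shares the host's network stack so that a bind attempt on the host sees the same port. On Linux, binding without `SO_REUSEADDR` typically also fails while old connections on that port linger in TIME_WAIT, so the check is deliberately conservative.

```python
import socket
import time


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # No SO_REUSEADDR on purpose: a lingering listener or old
            # connections on this port should count as "still busy".
            sock.bind((host, port))
        except OSError:
            return False
        return True


def wait_for_port_release(port, host="0.0.0.0", timeout=60.0, interval=0.5):
    """Poll until the port can be bound again or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if port_is_free(port, host):
            return True
        time.sleep(interval)
    # One last attempt so a short timeout still gives a definite answer.
    return port_is_free(port, host)
```

A runner could call `wait_for_port_release(8080)` after stopping one container and before starting the next, and log a warning if it returns `False`.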
Would it be possible to run an experiment where we map each docker container to a different port over the course of the test? E.g. the first one maps 8080:8081, then the next 8080:8082, etc. The load tester would need to know where each test is moving to, but it would eliminate any possibility that the last "old" container could impact the next test. The results may be more stable. |
@nathantippy If the old container is up and running and using resources then the current test is still impacted. Changing the ports would make that harder to see. At least with the ports the same, we see the collision and know we have a problem with those tests. |
I think we caught the rest of these issues in #4585. This should be resolved. |
@nbrady-techempower My quick check shows that Scroll down (quite a lot or search for
|
This may be a problem with the Gemini tests. My concern before was that one framework's failures were causing problems for another framework. I don't think that's the case here. I'll dig in a little more tomorrow. |
@nbrady-techempower I've checked some of the
Sample result:
And some quick and dirty reverse check:
The corresponding raw logs look suspicious:
Before the Maybe it is a good idea to add an export of |
@msmith-techempower Mike, do you have any time to look at this, perhaps next week? |
I have some time today; I'll see what I can do. |
Reopening this issue since it was never resolved, as far as I know. I looked into this a while ago and never figured it out. I had a hypothesis that I never tested that went like this: Since we use Docker's |
My hypothesis is that it is something about the kernel TCP stack settings. I've touched that in my previous comment. Something similar to this. That's why I've suggested the
These are 6 frameworks from the TOP10 in the At the end are |
I've just remembered about this case with Before: After: The difference between the two is this change: tuning of the Tomcat HTTP connector. The new configuration ( |
@zloster Sorry, I totally forgot about this. I'm super busy today, but I'll try and make sure this is a priority before the next round. Here is the sysctl -a
|
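For readers comparing `sysctl -a` dumps between machines, the settings most often discussed for this error are probably `net.ipv4.ip_local_port_range`, `net.ipv4.tcp_tw_reuse` and `net.ipv4.tcp_fin_timeout`. The sketch below is only an illustration of how such a dump could be inspected; the helper names are hypothetical and the sample values in the usage note are the common Linux defaults, not values taken from the dump in this thread.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output ("key = value" per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that are not key/value pairs
            settings[key.strip()] = value.strip()
    return settings


def ephemeral_port_count(settings):
    """Number of local ports available for outgoing connections."""
    low, high = (int(p) for p in settings["net.ipv4.ip_local_port_range"].split())
    return high - low + 1


# Keys worth comparing between a "good" and a "bad" machine (assumed list).
KEYS_OF_INTEREST = (
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_tw_reuse",
    "net.ipv4.tcp_fin_timeout",
)


def summarize(settings):
    """Return only the keys of interest that are present in the dump."""
    return {key: settings[key] for key in KEYS_OF_INTEREST if key in settings}
```

With the usual default range of `32768 60999`, `ephemeral_port_count` gives 28232 ports per destination address and port, which a load generator opening many short-lived connections can use up quickly.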
We looked into the "Cannot assign requested address" issue and found that the issue was caused by insufficient ephemeral ports, or too many connections, on the VMs when running certain frameworks.

Since most frameworks do not have this issue and are behaving within spec as expected, the fix/mitigation we decided to implement was to add a 60-second wait time between each permutation to minimize the lingering impact of occupied ports from the previous permutation to the next. 60 seconds seems to be the amount of time for lingering

We think the maintainers of test implementations having this issue should treat it as a potential bug in that implementation and/or framework.

Looking at the results from a recent Citrine run (2020/04/01), only the following frameworks were having the "Cannot assign requested address" issue:

By explicitly specifying the

We tinkered around and found that setting

So in the end, we've decided to add a 60-second wait between each permutation to make sure the |
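As an illustration of the ephemeral-port explanation above, the number of sockets still in TIME_WAIT can be read from `/proc/net/tcp` on Linux, where the `st` column holds the state code `06` for TIME_WAIT. The sketch below is an assumption-laden alternative to a fixed pause, not what the maintainers implemented: the function names and the `limit` threshold are hypothetical, and `/proc/net/tcp` covers IPv4 sockets in the current network namespace only.

```python
import time

TIME_WAIT = "06"  # state code for TIME_WAIT in the "st" column of /proc/net/tcp


def count_time_wait(proc_net_tcp_text):
    """Count sockets in TIME_WAIT in the text of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


def read_time_wait_count(path="/proc/net/tcp"):
    """Read the live TIME_WAIT count (Linux only, IPv4 only)."""
    with open(path) as handle:
        return count_time_wait(handle.read())


def wait_for_time_wait_drain(limit, timeout=120.0, interval=1.0,
                             read_count=read_time_wait_count):
    """Block until at most `limit` sockets are in TIME_WAIT, or time out.

    Returns True if the count dropped to the limit, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_count() <= limit:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Logging `read_time_wait_count()` before each permutation would also show directly which tests leave a large backlog of lingering sockets behind.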
In the last run, a new framework with that problem is Phalcon-micro: |
Closing in favor of new issue. |
@jsongte
Then there should be a VERY good elaboration of why, before the migration to Docker containers |
@zloster I take full responsibility for not pulling other people into this issue sooner. When we finally got around to it, we took Citrine down for a week, solely for the purpose of working on this. We had internal discussions while we were trying several different approaches that were all related to the guidance you provided. If we gave the impression that your efforts weren't valuable, it was unintentional. The time and effort you put into these more difficult problems is truly appreciated by all of us! As far as the |
I've noticed that hexagon-undertow-mongodb and hexagon-undertow-postgresql are having problems. The following message is visible in the raw.txt:
Here are some example logs from citrine:
The .dockerfiles are looking OK, i.e. no obvious differences.