Unable to bind ports: Docker-for-Windows & Hyper-V excluding but not using important port ranges #3171

Open
veqryn opened this issue Jan 3, 2019 · 17 comments

@veqryn veqryn commented Jan 3, 2019

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID: BB0297BB-C287-4F0B-A007-72B5F2D7BD72/20190102235413

Expected behavior

I should be able to bind the specific ports that I have always used.
I should be able to specify which ports Docker/Hyper-V exclude or use. Failing that, I expect Docker/Hyper-V to actually use the ports they exclude, so that those ports show up in netstat -ano as in use or listening.

Actual behavior

If I start a service that binds to port 50051 (it is a gRPC service, and 50051 is the port conventionally used by gRPC), it fails with:
listen tcp :50051: bind: An attempt was made to access a socket in a way forbidden by its access permissions.

Information

  • Is it reproducible? Yes
  • Is the problem new? Yes. My previous installation of docker for windows, from a year ago when I was on Windows 1709, didn't have this problem.
  • Did the problem appear with an update? Yes, you could say that. I wiped my hard drive and started over with Windows 1809 and the latest version of Docker for Windows.
  • Windows Version: Windows 10 Pro 1809 (Version 10.0.17763 Build 17763)
  • Docker for Windows Version: 2.0.0.0-win81
  • Docker version: 18.09.0

Steps to reproduce the behavior

MinGW 12:11:50 ~$ docker run -p 50051:50051 hello-world
C:\Program Files\Docker\Docker\Resources\bin\docker.exe: Error response from daemon: driver failed programming external connectivity on endpoint infallible_lehmann (906354afc8855cc38fc8ac3e9e5b0642c9470f48f99c48e188ed3c8cfe236c9e): Error starting userland proxy: Bind for 0.0.0.0:50051: unexpected error Permission denied.

My own investigation:

I was extremely confused by this problem, because I was able to bind other ports, such as 8080 or 60000, yet it did not appear that 50051 was in use by anything on my system.

Running netstat -ano shows nothing using 50051.

Running Get-NetTCPConnection in powershell with admin privileges shows nothing using 50051.

Even if I disconnect from the internet and disable both windows firewall and my antivirus, and run everything as admin, I still get the errors.
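
As an illustration of how this failure differs from an ordinary "port already in use" conflict, here is a minimal Python sketch that tries to bind a TCP port and classifies the outcome. The helper name `bind_status` is hypothetical. It assumes that the Windows "forbidden by its access permissions" error surfaces in Python as `PermissionError`; note that the same exception can also appear for other reasons (for example, privileged ports on Linux).

```python
import socket


def bind_status(port, host="0.0.0.0"):
    """Try to bind a TCP socket to (host, port) and classify the result.

    Returns "free" if the bind succeeds, "forbidden" if the OS rejects it
    with a permission-style error (assumed to be what an excluded port
    range produces on Windows), or "unavailable" for any other OSError,
    such as the port already being in use.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except PermissionError:
            return "forbidden"
        except OSError:
            return "unavailable"
        return "free"
```

Calling `bind_status(50051)` on an affected machine would be expected to report "forbidden" even though netstat shows no listener on that port.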

After hours of google searching, I found a command that showed what happened to 50051:

PS C:\WINDOWS\system32> netsh interface ipv4 show excludedportrange protocol=tcp

Protocol tcp Port Exclusion Ranges

Start Port    End Port
----------    --------
     49692       49791
     49792       49891
     49892       49991
     49992       50091
     50092       50191
     50214       50313
     50498       50597

* - Administered port exclusions.

It seems that 50051 is excluded (whatever that means?!), even though it isn't in use by anything.
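
To make the check concrete, here is a small Python sketch that parses the text printed by the netsh command above and reports which exclusion range, if any, contains a given port. The function names are hypothetical, and the parser assumes the two-column layout shown in the listing; it does not run netsh itself.

```python
import re


def parse_excluded_ranges(netsh_output):
    """Extract (start, end) pairs from `show excludedportrange` output text."""
    ranges = []
    for line in netsh_output.splitlines():
        # Data rows are two integers, optionally followed by a "*" marker.
        match = re.match(r"^\s*(\d+)\s+(\d+)\s*\*?\s*$", line)
        if match:
            ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges


def find_excluding_range(port, ranges):
    """Return the first (start, end) range that contains port, or None."""
    for start, end in ranges:
        if start <= port <= end:
            return (start, end)
    return None


# Output copied from the listing above.
SAMPLE = """\
Protocol tcp Port Exclusion Ranges

Start Port    End Port
----------    --------
     49692       49791
     49792       49891
     49892       49991
     49992       50091
     50092       50191
     50214       50313
     50498       50597

* - Administered port exclusions.
"""

ranges = parse_excluded_ranges(SAMPLE)
print(find_excluding_range(50051, ranges))  # (49992, 50091)
print(find_excluding_range(60000, ranges))  # None
```

This matches what I observed: 50051 falls inside the 49992-50091 exclusion, while 60000 is outside every listed range.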

After lots of trial and error, I discovered that Docker for Windows and Hyper-V are responsible for all of those excluded port ranges above.

It also seems like all those port ranges change or increase by 1 every time I reboot, so I suppose 450 reboots from now my problem will go away, maybe...

I have never had this problem, despite using docker for years now.

I run lots of containers and setups that other people at my company work on and rely on, so it is not feasible for me to be changing the ports around on them to work around this issue. (Other people use the kube templates and docker-compose, and some of them connect with other docker-compose networks, etc, and expect things on certain ports.)

When I try to delete that excluded port range, I get this, despite running the command as administrator:

PS C:\WINDOWS\system32> netsh interface ipv4 delete excludedportrange protocol=tcp startport=49992 numberofports=100
Access is denied.

@rramsden rramsden commented Jan 23, 2019

The solution in googlevr/gvr-unity-sdk#1002 works for me, but it is not ideal.

@veqryn veqryn commented Jan 23, 2019

That workaround does not work for me, unfortunately, despite having admin rights.

@enashed enashed commented Jan 31, 2019

@veqryn the workaround worked for me. The steps are:

  1. Disable Hyper-V (which will require a couple of restarts)
    dism.exe /Online /Disable-Feature:Microsoft-Hyper-V

  2. When you finish all the required restarts, reserve the port you want so that Hyper-V doesn't reserve it again
    netsh int ipv4 add excludedportrange protocol=tcp startport=50051 numberofports=1

  3. Re-enable Hyper-V (which will require a couple of restarts)
    dism.exe /Online /Enable-Feature:Microsoft-Hyper-V /All

When your system is back, you will be able to bind to that port successfully.
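
If several ports need protecting, the same sequence can be generated for a list of ports. The following Python sketch only builds the command strings from the steps above and does not execute anything; the helper name `workaround_commands` is hypothetical, and the restarts after the disable and enable steps still have to be done by hand.

```python
def workaround_commands(ports):
    """Build the disable / reserve / re-enable command sequence for TCP ports.

    Nothing is executed here. The commands are meant to be run from an
    elevated prompt, with reboots after the first and last command.
    """
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError(f"not a valid TCP port: {port}")

    commands = ["dism.exe /Online /Disable-Feature:Microsoft-Hyper-V"]
    for port in ports:
        # One single-port reservation per port, as in step 2.
        commands.append(
            "netsh int ipv4 add excludedportrange "
            f"protocol=tcp startport={port} numberofports=1"
        )
    commands.append("dism.exe /Online /Enable-Feature:Microsoft-Hyper-V /All")
    return commands


for command in workaround_commands([50051]):
    print(command)
```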

pm47 added a commit to ACINQ/eclair that referenced this issue Apr 4, 2019
There was an obscure Docker error when trying to start an Electrum
server in tests. [1]

It appears that there is a conflict between Docker and Hyper-V on some
range of ports.

A workaround is to just change the port we were using.

[1] docker/for-win#3171

@docker-desktop-robot docker-desktop-robot commented May 1, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@veqryn veqryn commented May 1, 2019

/remove-lifecycle stale

pm47 added a commit to ACINQ/eclair that referenced this issue May 21, 2019

@dozer75 dozer75 commented Jun 12, 2019

What's the status for this?

Today I had 100s of reservations, which caused Skype for Business to stop working because it couldn't find any available ports. Uninstalling Docker/Hyper-V/Containers removed these reservations, and Skype for Business had ports to work with again.

This is a critical error that should be focused on, since it reserves so many ports that aren't in use. I can't uninstall Docker/Hyper-V/Containers, or apply similar workarounds, each time Skype for Business has problems during meetings.

@Sureiya Sureiya commented Jun 26, 2019

This makes hyper-v unusable for me and most of my company.

@sbley sbley commented Jul 16, 2019

@enashed Does your solution (disabling and re-enabling HyperV) have any side effects? Will my virtual switches and virtual machines still be there after applying your solution?

@sbley sbley commented Aug 1, 2019

Answering myself: This actually does have side effects. It deletes your virtual switches. You should keep that in mind when applying this solution.

@ciprian-chichirita ciprian-chichirita commented Sep 6, 2019

this issue is still present
/remove-lifecycle stale

@danidemi danidemi commented Sep 17, 2019

Hi guys, I've the same problem. This is really blocking me.

@tarikguney tarikguney commented Oct 15, 2019

@enashed's answer worked for me perfectly! Thanks!

@vamseekm vamseekm commented Oct 31, 2019

IntelliJ IDEA Community Edition doesn't start because it tries to bind the first available port in the range 6942-6991.

source: https://intellij-support.jetbrains.com/hc/en-us/community/posts/360004973960-Critical-Internal-Error-on-Startup-of-IntelliJ-IDEA-Cannot-Lock-System-Folders-

The following command shows that this port range is reserved/blocked (the exclusion 6897-6996 covers it). Frankly, I don't know whether that's because of Hyper-V/Docker for Windows or some other app.

netsh interface ipv4 show excludedportrange protocol=tcp

Protocol tcp Port Exclusion Ranges

Start Port    End Port
----------    --------
      1583        1682
      1683        1782
      2480        2579
      4492        4591
      5357        5357
      5614        5713
      5834        5933
      5940        6039
      6045        6144
      6276        6375
      6491        6590
      6897        6996
      7003        7102
     28385       28385
     50000       50059     *

* - Administered port exclusions.
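
To show why the whole IntelliJ range is affected rather than a single port, here is a short Python sketch that finds the exclusions overlapping a required port range. The function name is hypothetical; the ranges are copied from the netsh output above.

```python
def overlapping_exclusions(first_port, last_port, excluded):
    """Return the excluded (start, end) ranges that intersect [first_port, last_port]."""
    return [
        (start, end)
        for start, end in excluded
        if start <= last_port and end >= first_port
    ]


# Ranges copied from the netsh output above.
EXCLUDED = [
    (1583, 1682), (1683, 1782), (2480, 2579), (4492, 4591), (5357, 5357),
    (5614, 5713), (5834, 5933), (5940, 6039), (6045, 6144), (6276, 6375),
    (6491, 6590), (6897, 6996), (7003, 7102), (28385, 28385), (50000, 50059),
]

hits = overlapping_exclusions(6942, 6991, EXCLUDED)
print(hits)  # [(6897, 6996)]

# True when a single exclusion swallows the entire required range.
fully_covered = any(start <= 6942 and end >= 6991 for start, end in hits)
print(fully_covered)  # True
```

Because one exclusion covers every port from 6942 to 6991, there is no port left in that range for IntelliJ to fall back to.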

@eXtreme eXtreme commented Nov 13, 2019

The "hns" service is very... greedy.

  • I have a boot setup where I can boot Windows without Hyper-V enabled, so that I can use VirtualBox
  • If I boot "without Hyper-V", the hns service still starts, even though autostart for that service is shown as disabled
  • That service reserves large port ranges, preventing me from starting the winnfsd service (used with a VirtualBox/Vagrant VM). I have to stop hns, hope that this time it does not reserve that port, and boot the VM
  • Even if I stop the service, it starts again after a few seconds, this time reserving a different range of ports
  • Sometimes it reserves ports that are normally used for local web development, like 3000/5000 for Node.js
  • Sometimes it reserves a port that IntelliJ IDEA needs in order to start

@cpietrzykowski cpietrzykowski commented Nov 16, 2019

This is /obviously/ not Docker's problem (as best I can tell), and it's probably not even Hyper-V's. I'm commenting here because this thread seems to be a frustrating and common end of the journey for googlers. What follows is at least "one" of the resolutions/explanations.

On one of my machines the dynamic port range had not been updated to the "new" start port, and I guess a since-resolved bug in Windows has now "exposed" this as a serious problem (e.g. I couldn't even bind to port 3000 for Node dev; access denied is, I think, a valid response, but it's not the typical "port in use" root cause).

Current dynamic port config (run for both ipv4 and ipv6):
> netsh int ipv4 show dynamicport tcp
> netsh int ipv6 show dynamicport tcp

Unless you know you've changed these settings yourself, a "Start Port" of 1025 instead of 49152 means the config is not "current". I don't know whether there was some kind of migration bug when this new value was patched in, or what. The dynamic start port for udp was set correctly, for example.

To set it to the current config:
> netsh int ipv4 set dynamicport tcp start=49152 num=16384
> netsh int ipv6 set dynamicport tcp start=49152 num=16384

(Likely a reboot of your host is required.)

ref: https://support.microsoft.com/en-ca/help/929851/the-default-dynamic-port-range-for-tcp-ip-has-changed-in-windows-vista
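
For anyone who wants to script this check, here is a minimal Python sketch that parses the text printed by the show command and tests whether the start port is the current default. It assumes English output with "Start Port" and "Number of Ports" labels; the sample text and the legacy port count in it are illustrative, not captured from a real machine, and the function names are hypothetical.

```python
import re

CURRENT_START = 49152  # start port named in the comment above


def parse_dynamic_port_range(netsh_output):
    """Return (start_port, number_of_ports) parsed from the command output."""
    start = re.search(r"Start Port\s*:\s*(\d+)", netsh_output)
    count = re.search(r"Number of Ports\s*:\s*(\d+)", netsh_output)
    if not (start and count):
        raise ValueError("could not find the dynamic port range in the output")
    return int(start.group(1)), int(count.group(1))


def uses_current_start(netsh_output):
    """True when the dynamic range starts at the current default port."""
    start, _count = parse_dynamic_port_range(netsh_output)
    return start == CURRENT_START


# Illustrative sample in the assumed output layout (values are made up).
LEGACY_SAMPLE = """
Protocol tcp Dynamic Port Range
---------------------------------
Start Port      : 1025
Number of Ports : 3976
"""

print(parse_dynamic_port_range(LEGACY_SAMPLE))  # (1025, 3976)
print(uses_current_start(LEGACY_SAMPLE))        # False
```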

It's bizarre that I only ran into this issue less than 4 hours ago, after doing Docker/Node/Go dev for the last few months straight, using Docker Edge, etc. Still, this appears to have resolved my port exclusion issues (I have no large ranges of reserved ports below 50000 now; previously I had 1000 port range exclusions all over the place).

@eXtreme eXtreme commented Nov 16, 2019

@cpietrzykowski this is it, wow, thanks for finding that. I can't remember how many hours I've spent debugging this... I've just tried that, rebooted, winnfsd starts, nodejs starts...

Where did that "invalid" range start (1024) come from? Something must have set it back to 1024, and the only thing I can find in common across the 4 systems where I've encountered this problem is the moment of enabling Hyper-V for "Docker for win" purposes.

@cpietrzykowski cpietrzykowski commented Nov 16, 2019

> @cpietrzykowski this is it, wow, thanks for finding that. I can't remember how many hours I've spent debugging this... I've just tried that, rebooted, winnfsd starts, nodejs starts...
>
> Where did that "invalid" range start (1024) come from? Something must have set it back to 1024, and the only thing I can find in common across the 4 systems where I've encountered this problem is the moment of enabling Hyper-V for "Docker for win" purposes.

It was not "invalid"; it was the default of the previous system version (the link above has all the authoritative information I have on it). Hyper-V is just doing what it's supposed to do: ensuring it has network ports available for its own management. The problem (there's another Docker issue that explains this part) is that Microsoft had a "bug" in port binding that has since been fixed, which is why applications adding port exclusions have turned into a headache for a few of us caught in the middle.

I don't think any of the above is misinformation, that's as much as I know of this issue.
