Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix netProbe write check #843

Merged
merged 1 commit into from
May 31, 2019
Merged

Fix netProbe write check #843

merged 1 commit into from
May 31, 2019

Conversation

matbech
Copy link
Contributor

@matbech matbech commented May 30, 2019

Write at least 1 byte. This ensures that sockets are ready to use for writing.
Windows specific: during the system startup, sockets can be created but the underlying buffers may not be setup yet. If this is the case Write fails with WSAENOBUFS: "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
This fixes: #841

Write at least 1 byte. This ensures that sockets are ready to use for writing.
Windows specific: during the system startup, sockets can be created but the underlying buffers may not be setup yet. If this is the case Write fails with WSAENOBUFS: "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
This fixes: #841
@jedisct1
Copy link
Member

This is a Windows-specific issue, so we should do this only on Windows (and only if the netprobe timeout is not 0).

This contradicts the documentation and configuration file saying that nothing is ever sent to this address. Such a change will trigger alarms from local firewalls, little snitch and friends, so this should be documented.

@matbech
Copy link
Contributor Author

matbech commented May 30, 2019

Good points.

According to the MS documentation, a winsock send call with len = 0 bytes also sends a datagram:
"Calling send with a len parameter of zero is permissible and will be treated by implementations as successful. In such cases, send will return zero as a valid value. For message-oriented sockets, a zero-length transport datagram is sent."
Reference: https://docs.microsoft.com/en-us/windows/desktop/api/winsock2/nf-winsock2-send

I don't know what the Go implementation does (maybe it does nothing if len = 0) because I cannot explain why it would succeed with len 0, but not with len 1.

Maybe the whole network code (config.loadSources) should be executed in the service thread and not during the service startup on Windows.

@jedisct1
Copy link
Member

It's a little bit weird that the behavior differs with an empty message and with a one-byte message.

I think an empty datagram is a valid datagram, so Go shouldn't handle this differently. Or maybe this is an interesting bug.

Raw syscalls are available in Go so maybe we should try these directly?

@matbech
Copy link
Contributor Author

matbech commented May 30, 2019

The problem is (at least on Windows) that the network code should not be in the service init function. The reason is that Windows will stop the service if the init does not complete within 45000 milliseconds (apparently the new default on Windows 10). So even with a reliable netProbe implementation, the current design cannot cope with all situations. For example, it is likely that it may take longer than 45 seconds to obtain network connectivity, to resolve the host and to download the sources (public-resolvers.md) from github etc. And in this case, the dnscrypt-proxy service fails to start.

What I propose is something like that:

  • Init: Load and parse the config. Do not update the resolvers yet. Continue, even if there are no resolvers. If the config integrity must be maintained, provide a default resolvers cache file in the setup.
  • Start the proxy. creating the socket and binding to the IP shouldn't fail like the write call in netProbe.
    If the proxy has no resolvers, and receives a request, return an error.
  • Update resolvers if necessary. If the update fails, retry again after x seconds.
  • Provide an option -force-update-sources for users that want to update the sources before the proxy is started.

This has the following benefits:

  • the service and therefore the proxy always starts, once network connectivity is available, the proxy will function properly.
  • netProbe is no longer necessary

I probably missed half of the things involved, but I think that should fix the current issues.

@jedisct1
Copy link
Member

Using resolvers lists and pivoting to other ones sounds complicated and dangerous.

And netprobe is probably a good thing to keep. On Linux, we need to inform systemd that the service started. Trying to update the resolvers list in a loop will either be slow or hammer servers if something goes wrong.

The only thing we need for Windows services is return as soon as possible. So that the service does nothing but spawns a goroutine that does everything.

And that goroutine can be the same on all operating systems, for all commands (including commands such as -list) and whether the app is called directly or run as a service.

I quickly tried to do that but eventually gave up due to implementation details of the services module.

@jedisct1 jedisct1 merged commit cf261da into DNSCrypt:master May 31, 2019
@matbech matbech deleted the patch-1 branch May 31, 2019 13:17
@DNSCrypt DNSCrypt locked and limited conversation to collaborators Jun 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows: dnscrypt-proxy service fails to start often
2 participants