Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support socks5h proxy using unix sockets #8668

Closed
wants to merge 3 commits into from
Closed

Support socks5h proxy using unix sockets #8668

wants to merge 3 commits into from

Conversation

balki
Copy link

@balki balki commented Apr 1, 2022

Usage:

curl -x "socks5h://unix/run/tor/socks" "https://example.com"

Ref: https://curl.se/mail/archive-2021-03/0012.html

This is a proof of concept to make sure it is working. Please take over from here or guide next steps - tests, documentation etc.,

Copy link
Member

@bagder bagder left a comment

For tests, I think test case 1435 can serve as good inspiration as it already uses --unix-socket

lib/url.c Outdated
@@ -2519,6 +2519,20 @@ static CURLcode parse_proxy(struct Curl_easy *data,
}

/* now, clone the proxy host name */
#ifdef USE_UNIX_SOCKETS
if(proxytype == CURLPROXY_SOCKS5_HOSTNAME
Copy link
Member

@bagder bagder Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to limit this to SOCKS5_HOSTNAME and not allow SOCKS4a too?

Copy link
Author

@balki balki Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it may not work as dns resolution is done locally in SOCKS4a using the same function. But after quick test, looks like it should work. I will update for all socks proxies.

Copy link
Author

@balki balki Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to support all socks proxies and not depend on the scheme prefix in proxy string. Please confirm if implementation/syntax is ok. I will then add a test.

lib/url.c Outdated
@@ -2519,6 +2519,20 @@ static CURLcode parse_proxy(struct Curl_easy *data,
}

/* now, clone the proxy host name */
#ifdef USE_UNIX_SOCKETS
if(proxytype == CURLPROXY_SOCKS5_HOSTNAME
&& strncmp("unix/", proxy + 10, 5) == 0) { /* len("socks5h://") = 10 */
Copy link
Member

@bagder bagder Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The host name might start with that scheme, the type could also be set with a separate option and then the proxy name starts with the host name. This cannot assume the first 10 bytes are known.

I also find it a little unfortunate to reserve the unix host name like this, as there might actually be a proxy in use that is named like that. Maybe we should instead make the --unix-socket option accept a socks... prefix?

Copy link
Author

@balki balki Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The host name might start with that scheme, the type could also be set with a separate option and then the proxy name starts with the host name. This cannot assume the first 10 bytes are known.

I will update to handle that case.

there might actually be a proxy in use that is named like that.

In that case, unix/path cannot be a valid URI for a socks proxy as it can only have host:port syntax. Since I am checking for unix/, I think it should not affect anyone having a server named unix with a socks proxy running in it.

Maybe we should instead make the --unix-socket option accept a socks... prefix

I wish to have the ability to set the proxy via environment variable like ALL_PROXY=socks5h://unix/run/tor/socks, so that programs that internally use curl does not have to change the command line.

@dfandrich
Copy link
Collaborator

@dfandrich dfandrich commented Apr 2, 2022

@bagder
Copy link
Member

@bagder bagder commented Apr 2, 2022

i.e. an extra slash, making the host part of the URL empty. This is similar to how a file: URL would refer to the same file.

Unfortunately, the URL parser accepts one, two or three slashes where there should be two so three slashes will be accepted as if they were two:

$ curl -x socks5h:///moo/hello localhost
curl: (7) Failed to connect to moo port 1080 after 0 ms: Connection refused

Another option is to use a short but otherwise illegal host name. For example just two dots (..):

$ curl -x socks5h://../run/tor/socks localhost
curl: (6) Could not resolve host: ..

@dfandrich
Copy link
Collaborator

@dfandrich dfandrich commented Apr 2, 2022

@bagder
Copy link
Member

@bagder bagder commented Apr 3, 2022

That sounds wrong—it goes against RFC 1738 and RFC 8089, anyway. Is this a WHATWG thing?

Yes. 😢 The lax nature of WHATWG's URL spec makes such URLs occasionally get used in the wild, so this is our adjustment to accept such URLs better.The WHATWG spec actually allows any amount of slashes (even zero), and both backslashes and slashes (mixed) but curl only allows regular forward slashes and only one, two or three of them.

@balki
Copy link
Author

@balki balki commented Apr 4, 2022

For tests, I think test case 1435 can serve as good inspiration as it already uses --unix-socket

I looked at the test setup. I think the following changes are required along with writing a testcase.

  1. Update this server to support taking unixsocket as argument. https://github.com/curl/curl/blob/master/tests/server/socksd.c
  2. Pass the option from here: https://github.com/curl/curl/blob/master/tests/runtests.pl

Q: Where should the temporary socket be created and how/when should it be cleaned up?

With last update, If the hostname is unix and path is not just '/', the path is considered to to be a unix socket.

Please correct if I am missing anything.

@balki
Copy link
Author

@balki balki commented Apr 19, 2022

@bagder Please take a look again.

  • Changed syntax to socks5h://localhost/path/to/sock
  • Added tests

I don't understand the torture test failure but it does not fail consistently.

@bagder
Copy link
Member

@bagder bagder commented May 8, 2022

The torture test fail of 1467 means that there is a memory/resource leak somewhere in the exit path.

Copy link
Member

@bagder bagder left a comment

There is no added documentation?

@balki
Copy link
Author

@balki balki commented May 10, 2022

The torture test fail of 1467 means that there is a memory/resource leak somewhere in the exit path.

Reviewed again. Could not find the leak.. Can you please share instructions to reproduce failure locally? I am on a ubuntu desktop. make tests-full pass fine.

Added documentation for the command line flags.

@balki balki requested a review from bagder May 10, 2022
lib/url.c Outdated Show resolved Hide resolved
lib/url.c Outdated Show resolved Hide resolved
lib/url.c Outdated Show resolved Hide resolved
lib/url.c Outdated Show resolved Hide resolved
@bagder
Copy link
Member

@bagder bagder commented May 10, 2022

The clue is torture tests. It's a special way to run the tests, which verifies the exit paths heavily.

Reviewed again. Could not find the leak.. Can you please share instructions to reproduce failure locally? I am on a ubuntu desktop. make tests-full pass fine.

$ ./configure --enable-debug (+ c-ares or non-threaded resolver)
$ make
$ cd tests
$ make
$ ./runtests -n -t 1467

This then makes all allocations in the test fail, one by one and it verifies that it never leaks memory in any case.

The 1467 failure you got looked like this:

test 1467...[HTTP GET via SOCKS5 proxy via unix sockets]
 144 functions found, but only fail 25 (17.36%)
** MEMORY FAILURE
Leak detected: memory still allocated: 10 bytes
At 7f7f53e04bc8, there's 10 bytes.
 allocated by escape.c:144
LIMIT urlapi.c:1415 strdup reached memlimit
 Failed on function number 93 in test.
 invoke with "-t93" to repeat this single case.

It shows that escape.c:144 allocated a line that was never freed. It was when fallible function number 93 was made to fail. If you wanted reproduce that exact fail number, you can do this:

./runtests -t94 1467

I usually first try with -t to run all loops torture to see that it reproduces, it might not be the exact same number in a local build. Also, use -n to switch off valgrind since running with valgrind makes every loop take several times longer, which can make this test really slow. Using valgrind in the last step when you re-run a single number is very useful because valgrind can help you pinpoint the entire leak call stack much better.

@bagder
Copy link
Member

@bagder bagder commented May 10, 2022

if you still have problems rerunning the test, I can have a go at it soon.

@balki
Copy link
Author

@balki balki commented May 10, 2022

if you still have problems rerunning the test, I can have a go at it soon.

Thank you. Was able to reproduce and fix it. Waiting for CI run.

@balki balki requested a review from bagder May 11, 2022
@balki
Copy link
Author

@balki balki commented May 11, 2022

Fixed torture test and added documentation. Please review. curl/check failure does not show any logs or errors except for message "Build Failed". Please help understand the error.

Copy link
Member

@bagder bagder left a comment

This looks very good and I'm ready to merge. Just one thing left that I can think of:

You added docs for the command line tool but not for the libcurl option that actually passes the string to libcurl. CURLOPT_PROXY. It needs to be similarly extended so that users can learn how to specify a socks proxy over unix domain socket with it.

Balakrishnan Balasubramanian added 3 commits May 17, 2022
Usage:
    curl -x "socks5h://localhost/run/tor/socks" "https://example.com"

Updated runtests.pl to run a socksd server listening on unix socket
Added tests test1467 test1468
Added documentation for proxy command line option and socks proxy
options
@balki
Copy link
Author

@balki balki commented May 17, 2022

This looks very good and I'm ready to merge. Just one thing left that I can think of:

You added docs for the command line tool but not for the libcurl option that actually passes the string to libcurl. CURLOPT_PROXY. It needs to be similarly extended so that users can learn how to specify a socks proxy over unix domain socket with it.

Thank you. Updated CURLOPT_PROXY documentation.

@balki balki requested a review from bagder May 17, 2022
bagder
bagder approved these changes May 19, 2022
@bagder bagder closed this in dfa84a0 May 19, 2022
@bagder
Copy link
Member

@bagder bagder commented May 19, 2022

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants