[Feature Request] Support punycode if `idn` command is available #1321

Open
teward opened this issue Sep 17, 2019 · 19 comments

Comments

@teward
Contributor

commented Sep 17, 2019

This is a feature request and not a bug report.

Currently, testssl does NOT support punycode, or the conversion of UTF8 strings to punycode.

However, there is NO guarantee that any given system will have IDN support - i.e. no installed idn or libidn to call upon.

In Ubuntu and Debian variants, we can install idn from the repositories; however, there is no guarantee that it will be available under the same name elsewhere.

Depending on how we wish to include IDN support, we can either go the route of:

(1) Checking if the idn command exists and, if it does, running the given domain through idn before processing it throughout the program. However, if idn is not available or installed, this will fail hard and will continue to have issues a la #1319 and #1320.

(2) Adding external dependencies beyond just the idn program, a la a Python call to do the encode/decode, which has the prerequisite of needing Python.

(3) Trying to write an IDN encoder/decoder ourselves in Bash (which sounds like the hellish approach).

Approach 1 may be the best option here; however, do we want to add that dependency?

If we do want to go the route of incorporating idn command support, I would like to attempt to provide some mechanism for this in the code (so please give me some time to dig into a solution; my Bash is slightly rusty and I need to refamiliarize myself with bits of the code...)

@teward teward changed the title Support punycode if `idn` or similar library is available [Feature Request] Support punycode if `idn` or similar library is available Sep 17, 2019
@teward teward changed the title [Feature Request] Support punycode if `idn` or similar library is available [Feature Request] Support punycode if `idn` command is available Sep 17, 2019
drwetter added a commit that referenced this issue Sep 18, 2019
Add IDN/punycode support for non-ASCII URIs (#1319, #1320, #1321)
@teward

Contributor Author

commented Sep 18, 2019

Fixed with #1322 merge

@teward teward closed this Sep 18, 2019
@drwetter

Owner

commented Sep 18, 2019

There's a problem:

Debian Buster:

prompt% dig +timeout=2 +tries=2 +short -t a xn--v4h.com
dig: 'xn--v4h.com.' is not a legal IDNA2008 name (string contains a disallowed character), use +noidnout
prompt% dig +noidnout  +timeout=2 +tries=2 +short -t a xn--v4h.com
54.36.56.87

Debian stretch

prompt% dig  +timeout=2 +tries=2 +short -t a xn--v4h.com
54.36.56.87 from server 192.168.122.1 in 0 ms.
prompt% dig +noidnout  +timeout=2 +tries=2 +short -t a xn--v4h.com
Invalid option: +noidnout

Opensuse 15.X

prompt% dig +noidnout  +timeout=2 +tries=2 +short -t a xn--v4h.com
;; IDN support not enabled [on stderr]
54.36.56.87
prompt% dig +timeout=2 +tries=2 +short -t a xn--v4h.com 2>/dev/null

On Debian Buster, host also hiccups. drill seems to be straightforward so far.

You probably know better the status for Ubuntu...

@teward

Contributor Author

commented Sep 18, 2019

@teward teward reopened this Sep 18, 2019
@teward

Contributor Author

commented Sep 18, 2019

So, there seems to be some discrepancy between idn and idn2. If we use idn2, which comes from a different package (libidn2), it triggers the proper error handler. However, we then have to parse error output.

Using a Debian Buster environment (Thank you for LXD containers!) I have the following:

root@debian-buster:~# idn ☮.com; echo $?
xn--v4h.com
0
root@debian-buster:~# idn2 ☮.com; echo $?
idn2: toAscii: string contains a disallowed character
1

Digging deeper, BOTH of the test cases from #1319 and #1320 fail to meet the restrictions/criteria:

root@debian-buster:~# idn2 ♨️.com
idn2: toAscii: string contains a disallowed character

Now, this in mind, we can do a disallowed character test:

# Only names containing non-ASCII characters need conversion
if [[ "$1" == *[![:ascii:]]* ]]; then
    if [[ -z "$(type -p idn2)" ]]; then
        fatal "URI contains non-ASCII characters, and IDN not available."
    else
        # Capture stderr too, so the error text can be inspected
        URI="$(idn2 "$1" 2>&1)"
        if [[ "$URI" == *"disallowed character"* ]]; then
            fatal "URI contains a disallowed non-ASCII character."
        fi
    fi
else
    URI="$1"
fi

... which adheres to the IDNA2008 standard. This will break emoji domains like we've been testing.

BUT, if we use a Russian example, of президент.рф, this will work properly.

So we need to be explicit with our definition of 'valid domain'. While technically Emoji Domains can be supported, they aren't part of the IDNA2008 standard and therefore standard DNS and such dictate them to be 'bad'.

How do you want to proceed with this, Dirk? Remove IDN support, or force IDNA2008 standard which dig was whining about?

@teward

Contributor Author

commented Sep 18, 2019

@drwetter

I confirmed this in the Python idna library:

>>> idna.encode('☮.com')
Traceback:
idna.core.InvalidCodepoint: Codepoint U+262E at position 1 of '☮' not allowed

>>> idna.encode('♨️.com')
Traceback:
idna.core.InvalidCodepoint: Codepoint U+2668 at position 1 of '♨️' not allowed

So for IDN support/conversion, we are going to have to decide whether we want to support emojis or stick strictly to the IDNA2008 standard (which doesn't allow them).

@drwetter

Owner

commented Sep 18, 2019

thanks, look into that after sleeping ;-)

@teward

Contributor Author

commented Sep 18, 2019

Only concerns here:

Alpine doesn't have an idn2 library. Neither does FreeBSD. Debian and Ubuntu do, and I haven't tested every other distro yet. Which means anywhere that doesn't have idn2 will fail to work with this.

I... might be able to write something that can do this validation, using a Python based program as an addon here, but that'd be a headache to make work properly if people don't install the deps. (I don't know Perl well enough or I'd propose writing this there)

@teward

Contributor Author

commented Sep 18, 2019

Further Concerns and Digging:

If we want to be using host or dig we need to comply with IDNA2008 because +noidnout is NOT guaranteed to exist.

If we want to support Emoji Domains, then we'll have to bypass IDNA2008 checks and use alternative DNS lookup mechanisms than host or dig, which for obvious reasons is going to be a bit tricky to really make work.

@bjmgeek


commented Sep 18, 2019

It seems that major browsers (I tested Chrome and Firefox) work with emoji domains, probably by trying IDNA2008 and if that fails, trying IDNA2003. So for instance, in both browsers, ☮.com and ♨.com both work (in the latter case, there's actually a valid certificate, see https://www.ssllabs.com/ssltest/analyze.html?d=♨.com for instance).

@bjmgeek


commented Sep 18, 2019

According to ICANN, the use of emoji in domain names is strongly discouraged, but according to wikipedia, there are over 20,000 emoji domains in .ws alone, so it seems testing emoji domains may be a valid use case.

@teward

Contributor Author

commented Sep 18, 2019

> It seems that major browsers (I tested Chrome and Firefox) work with emoji domains, probably by trying IDNA2008 and if that fails, trying IDNA2003. So for instance, in both browsers, ☮.com and ♨.com both work (in the latter case, there's actually a valid certificate, see https://www.ssllabs.com/ssltest/analyze.html?d=♨.com for instance).

I think the core issue here is not whether IDNA2008 or IDNA2003 is in use, but what the specific tools, dig or host, use depending on the environment. If dig behaved the same everywhere, we'd have a +noidnout option that wouldn't fail in Buster, Stretch, or OpenSUSE. Same with host.

See #1321 (comment) for that initial report.

I've done some digging, but am still trying to find a suitable solution that'd work for the DNS lookups that would ignore IDNA2008 in a 'works everywhere' format.

@teward

Contributor Author

commented Sep 18, 2019

> According to ICANN, the use of emoji in domain names is strongly discouraged, but according to wikipedia, there are over 20,000 emoji domains in .ws alone, so it seems testing emoji domains may be a valid use case.

Never said it wasn't, but I think you've missed the most important part of my first message you were looking at:

> If we want to support Emoji Domains, then we'll have to bypass IDNA2008 checks and use alternative DNS lookup mechanisms than host or dig, which for obvious reasons is going to be a bit tricky to really make work.

Right now dig, host, etc. are inconsistently supporting or not supporting things we'd need to actually do the type of lookups necessary to support Emoji Domains. Which is the core issue that @drwetter was referring to.

@bjmgeek


commented Sep 18, 2019

You're correct. This is testssl.sh after all. I don't think it's reasonable to pull in a bunch of external dependencies.

@teward

Contributor Author

commented Sep 19, 2019

So, I got bored and came up with a Perl-driven DNS lookup system that returns A and AAAA records. It requires Net::DNS, which is not exactly standard (not standard in Debian or Ubuntu, not standard in Alpine; I don't have an OpenSUSE handy but I'll bet it's not standard there). To use this Perl script as our DNS resolver for A and AAAA records, we'd need to add dependencies on Perl and Perl's Net::DNS.

The script I wrote can be found here: https://gist.github.com/teward/9b3a5f2d57b11be75715b93e7c15c4e6

This said, if we are not opposed to requiring Perl and Perl's Net::DNS to be available, we could theoretically replace all dig and host references with this Perl if we are on a Linux env and are using the IDN library to handle Emojis. This will bypass dig and host and any inbuilt IDN headaches for filtering and instead drop data directly as we are seeking it. (This said, my Perl is VERY rusty, so it's a VERY rough script).

It currently outputs a newline-delimited list of A and AAAA records, but you have to be careful to redirect stderr to /dev/null or somewhere else, because if there's a failure to resolve somewhere in there it'll spit stuff out to stderr. If we send stderr to /dev/null, though, and no A or AAAA records exist, the command returns empty output. Not the best approach, but one of the few IDN-spec-ignoring mechanisms to do a DNS query on the emoji domain(s).

I would suggest, though, that barring any other solution that is cross-distro compliant for our needs, relying on the Perl resolver here be a Last Resort Option, because it pulls in more external dependencies.
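
A minimal sketch of how a caller might consume output of that shape (one address per line on stdout, noise on stderr, nothing at all on failure). The function name `resolve_addresses` is hypothetical and the address patterns are only rough shape checks, not validation; the wrapper works with any resolver command passed as arguments.

```bash
#!/usr/bin/env bash

# Hypothetical wrapper: run the resolver command given as arguments,
# discard its stderr, and print only lines that look like IP addresses.
resolve_addresses() {
    local line
    local ipv4_re='^[0-9]+(\.[0-9]+){3}$'
    local ipv6_re='^[0-9A-Fa-f:.]*:[0-9A-Fa-f:.]*$'   # rough shape check only

    "$@" 2>/dev/null | while IFS= read -r line; do
        if [[ "$line" =~ $ipv4_re || "$line" =~ $ipv6_re ]]; then
            printf '%s\n' "$line"
        fi
    done
}
```

A caller could then run something like `addrs="$(resolve_addresses perl resolver.pl "$host")"` (where `resolver.pl` is a placeholder file name) and treat an empty `$addrs` as "no A or AAAA records found".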

@drwetter

Owner

commented Sep 19, 2019

@teward, sorry, I am opposed to using perl directly. testssl.sh deliberately uses a small set of external binaries, as it is supposed to work under every OS (BSDs, OSX, Windows' Unix environments) ~without installing anything.

That being said, we need to come up with a solution which works as well as it can under the given circumstances. That'll be a bit of a headache. But we need to keep in mind that IDN support is a (great) option, not mandatory. So if at a certain point a URI is supplied which cannot be tested, we signal that to the user and maybe give a recommendation (as is the case with the missing idn binary), if possible. If that's not possible, it's ok too.

So what I suggest is a) find out which DNS client binary has the best IDN support and put it first in get_*_record(), b) test the other binaries' capabilities, so that they are supplied with the correct options, and c) if it then turns out that a lookup fails, well, the error handling is already there. Probably only the message needs to be amended.

a+b require some testing on several OSes. As usual, there will be no check for an OS version, because FreeBSD/Mac OSX can install GNU tools, just as others can install "unexpected" binaries. What is required is a test with the binary itself (search for HAS_ variables).
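
To illustrate testing the binary itself rather than the OS version, here is a minimal sketch for the `+noidnout` case. It is not taken from testssl.sh: the names `HAS_DIG_NOIDNOUT`, `probe_dig_noidnout` and `dig_a_record` are hypothetical, and the only behavior assumed is the one shown in the outputs earlier in this thread, namely that a dig without the option answers with "Invalid option".

```bash
#!/usr/bin/env bash

HAS_DIG_NOIDNOUT=false   # hypothetical flag, modelled on the HAS_ variables

# Decide from dig's own reaction whether +noidnout is usable.
probe_dig_noidnout() {
    local out
    HAS_DIG_NOIDNOUT=false
    command -v dig >/dev/null 2>&1 || return 1
    # The lookup itself may fail or time out; only dig's reaction to the
    # option matters, so its exit status is ignored on purpose.
    out="$(dig +noidnout +timeout=1 +tries=1 +short -t a localhost 2>&1)" || true
    if [[ "$out" != *"Invalid option"* ]]; then
        HAS_DIG_NOIDNOUT=true
    fi
    "$HAS_DIG_NOIDNOUT"
}

# Pass the option only where the probe found it to be accepted.
dig_a_record() {
    local opts=(+timeout=2 +tries=2 +short -t a)
    "$HAS_DIG_NOIDNOUT" && opts=(+noidnout "${opts[@]}")
    dig "${opts[@]}" "$1" 2>/dev/null
}
```

Calling `probe_dig_noidnout` once at startup and then using `dig_a_record "$NODE"` would keep the per-lookup code free of version checks; the same pattern could be repeated for host or drill if they turn out to need different options.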

@drwetter

Owner

commented Sep 19, 2019

PS: I just added idn support to the docker container, which seems to work with all the examples mentioned here (some refuse to connect). I just added libidn to the package list. The container is using drill.

@drwetter

Owner

commented Sep 19, 2019

Remark: dig under Opensuse Tumbleweed works perfectly for 💩.ws, so as a result testssl.sh without your PR gets as far as to the socket connect. There's only a server on port 80 though.

No shit ;-)

drwetter added a commit that referenced this issue Sep 19, 2019
This PR adds a few quotes to some arguments which when previous code
was executed properly weren't needed.

Also it improves the IDN code from @teward, so that when idn2 is
available, a conversion will be tried, and when idn is available
and/or idn2 failed, a conversion will be tried.

Finally it'll be tried to continue without conversion, hoping that
the DNS client binaries can cope with the IDN URI.

This is not good enough yet and needs to be complemented, see discussion
@ #1321.
@drwetter

Owner

commented Sep 19, 2019

Please see commit and its comment in separate branch. The point with DNS client binary is still open (and code is not as I want it to be). Now though I need to focus on topics for which I can buy food and the like...

@drwetter

Owner

commented Sep 20, 2019

See 61238f1 and ae9cb99. Also https://☮.com will be correctly parsed as non-IDN URIs.

As you might have noticed, I don't care much whether a URI meets standards. :-) The workflow is now that a conversion with idn2 will be tried, then idn, and if that fails the user gets notified and we just supply the IDN nodename to the resolver. In some cases even that works (emoji domains). If it doesn't, there's an error handler for the resolver which complains and exits the program.
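
The workflow described here can be sketched roughly as follows. This is an illustration, not the code from the referenced commits: the function name `to_ascii_host` and the warning text are hypothetical, and the sketch assumes that `idn2` and `idn` print the converted name on stdout and exit non-zero on failure, as in the Debian Buster session earlier in this thread.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print a punycode form of the host name in $1 if a
# converter succeeds, otherwise print the name unchanged and return 1.
to_ascii_host() {
    local host="$1" converted tool

    # Pure-ASCII names need no conversion.
    if [[ "$host" != *[![:ascii:]]* ]]; then
        printf '%s\n' "$host"
        return 0
    fi

    # Try idn2 first, then fall back to idn.
    for tool in idn2 idn; do
        if command -v "$tool" >/dev/null 2>&1; then
            if converted="$("$tool" "$host" 2>/dev/null)" && [[ -n "$converted" ]]; then
                printf '%s\n' "$converted"
                return 0
            fi
        fi
    done

    # Neither tool helped: notify the user and hand the raw name to the resolver.
    echo "WARNING: could not convert '$host' to punycode, trying it as is" >&2
    printf '%s\n' "$host"
    return 1
}
```

The non-zero return in the last branch lets the caller distinguish "converted" from "passed through unchanged" without parsing any error text, while the resolver's own error handling still decides whether the lookup ultimately fails.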

What is missing now are tests with more IDN domains under several OS with several DNS clients. That would help. dig + libidn and modern distributions worked so far.

If you want to help, the following details about the failure would be useful:

  • URI
  • your OS
  • DNS client used (bash -x testssl.sh $URI and looking at /tmp/testssl-*.log or SETX=true bash -x testssl.sh $URI can help)
@drwetter drwetter added this to the 3.0 milestone Sep 20, 2019