New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IDNA support and integrate with DNS lookup #2543

Merged
merged 13 commits into from Apr 10, 2018

Conversation

Projects
None yet
@MakeNowJust
Contributor

MakeNowJust commented May 2, 2016

Punycode is the encoding for IDNA, which is defined in RFC 3492. And IDNA means Internationalized Domain Names in Application defined in RFC 5980, which is method to use non-ascii characters as host name of URI. For example, when you open http://日本語ドメイン.com/ in browser, browser converts this URL to http://xn--eckwd4c7c5976acvb2w6i.com/ internally. On this, 日本語ドメイン.com is IDN and xn--eckwd4c7c5976acvb2w6i.com is IDNA. I added punycode support.

RFC 3986 says:

When a non-ASCII registered name represents an internationalized domain name
intended for resolution via the DNS, the name must be transformed to the IDNA
encoding [RFC3490] prior to name lookup.

so I added integration of DNS lookup. Of course, it is better that getaddrinfo supports to convert IDN to IDNA internally, and in fact getaddrinfo supports such a thing since glibc 2.3.4 or Windows 8. However, this is extension and it is not portable, in other words it is not defined in POSIX (Especially, it is disabled by default on glibc). Finally, I think this is the best way to support IDNA.

Thanks.

@kostya

This comment has been minimized.

Contributor

kostya commented May 2, 2016

funny, yesterday doing the same https://github.com/kostya/simple_idn, even code is similar

@MakeNowJust

This comment has been minimized.

Contributor

MakeNowJust commented May 2, 2016

@kostya I guess that you use punycode.js implementation as reference, and so, I refered to it.

@MakeNowJust MakeNowJust force-pushed the MakeNowJust:feature/punycode branch from dccb071 to 87efb3e May 2, 2016

@MakeNowJust

This comment has been minimized.

Contributor

MakeNowJust commented May 2, 2016

Formatted!

end
end
raise Error.new "invalid input" unless init

This comment has been minimized.

@jhass

jhass May 30, 2016

Member

Do we really need a custom error class here? ArgumentError seems fine to me.

next if m == prev
prev = m
raise Error.new("overflow") if m.ord - n > (Int32::MAX - delta) / h

This comment has been minimized.

@jhass

jhass May 30, 2016

Member

"overflow" is a bit vague as an error message, could you be a bit more verbose here? Also ditto about custom error class.

@@ -0,0 +1,168 @@
module Punycode

This comment has been minimized.

@jhass

jhass May 30, 2016

Member

Not everybody knows what it is, small doc note would be nice to hint the right RFC at least :)

This comment has been minimized.

@ysbaddaden

ysbaddaden May 30, 2016

Member

Along with proper credits for punycode.js and its author if it was used as a reference / ported over? And is the license compatible? yes it's MIT.

init = false
end
digit = 'a' <= c && c <= 'z' ? c.ord - 0x61 : 'A' <= c && c <= 'z' ? c.ord - 0x41 : '0' <= c && c <= '9' ? c.ord - 0x30 + 26 : -1

This comment has been minimized.

@jhass

jhass May 30, 2016

Member

A bit long and nested, might benefit from a case/when.

This comment has been minimized.

@MaloJaffre

MaloJaffre May 30, 2016

Contributor

You can also use for more explicit code:

digit = case c
        when .alpha?
          c.downcase - 'a'
        when .digit?
          c - '0' + 26
        else
          raise Error.new("invalid input") # or ArgumentError
        end
@jhass

This comment has been minimized.

Member

jhass commented May 30, 2016

I wonder how hard versions of this would be that directly read/write to IO, if possible at all.

@asterite

This comment has been minimized.

Contributor

asterite commented May 30, 2016

We could accept this without support for appending to an IO and later improve it.

After applying the suggestions made by @jhass we could merge this.

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented May 30, 2016

What about moving it under the URI namespace as URI::Punycode and uri/punycode.cr?

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jun 28, 2017

@MakeNowJust Could you address the open comments?

@sdogruyol

This comment has been minimized.

Member

sdogruyol commented Sep 10, 2017

This is also a good PR, @MakeNowJust how about updating?

@Sija

This comment has been minimized.

Contributor

Sija commented Sep 22, 2017

@MakeNowJust

This comment has been minimized.

Contributor

MakeNowJust commented Sep 22, 2017

@Sija Sorry I'm busy

@MakeNowJust MakeNowJust force-pushed the MakeNowJust:feature/punycode branch from 1d77610 to 86ac0d7 Mar 4, 2018

@MakeNowJust

This comment has been minimized.

Contributor

MakeNowJust commented Mar 4, 2018

Updated! Thank you @epergo!

@RX14

RX14 approved these changes Mar 4, 2018

end
def self.encode(string : String)
encode string.chars

This comment has been minimized.

@straight-shoota

straight-shoota Mar 4, 2018

Contributor

It would be better to use a Char::Reader instead of allocating an array of chars. I don't think there is a real usecase for supplying a char array, is it?

This comment has been minimized.

@MakeNowJust

MakeNowJust Mar 4, 2018

Contributor

Sure.

end
def self.encode(string : String, io)
encode string.chars, io

This comment has been minimized.

@straight-shoota

straight-shoota Mar 4, 2018

Contributor

ditto

Remove char array version `encode` method
And it uses `each_char` instaed of `chars`, it is memory efficient.
all = true
others = [] of Char
chars.each do |c|

This comment has been minimized.

@straight-shoota

straight-shoota Mar 4, 2018

Contributor

This line could just be string.each_char do |c| - this inlines the block instead of using an iterator instance.

String.build { |io| encode string, io }
end
def self.encode(string, io)

This comment has been minimized.

@straight-shoota

straight-shoota Mar 5, 2018

Contributor

Just a minor nit bit: methods receiving an optional IO should have it prepended as first argument. This makes it easier if maybe additional arguments are added later to keep both signatures similar. Even if that's unlikely I'd recommend to stick with this strategy.

This comment has been minimized.

@MakeNowJust

MakeNowJust Mar 6, 2018

Contributor

URI.encode is also accept String as first argument and IO as second argument. I think no problem.

end
def self.encode(string, io)
h = 0

This comment has been minimized.

@straight-shoota

straight-shoota Mar 5, 2018

Contributor

It would be great if these variables could be more descriptive. What does h mean?

This comment has been minimized.

@MakeNowJust

MakeNowJust Mar 6, 2018

Contributor

I can't describe this correctly (because the date I wrote this is 2 years ago.) Sorry 🙇 . But h and other variable names are traditional, they are used in RFC 3492's reference implementation.

return if others.empty?
others.sort!
io << DELIMITER unless all

This comment has been minimized.

@straight-shoota

straight-shoota Mar 5, 2018

Contributor

I don't think you need all at all. It can be replaced by h == 0 (note the next comment, then this needs to be after that line and h == 1).

This comment has been minimized.

@MakeNowJust

MakeNowJust Mar 6, 2018

Contributor

Yes, you are right.

firsttime = true
prev = nil
h += 1

This comment has been minimized.

@straight-shoota

straight-shoota Mar 5, 2018

Contributor

You could probably set the value to h = string.size - others.size + 1 instead of incrementing it in the loop (just a minor improvement).

end
def self.decode(string)
if delim = string.rindex(DELIMITER)

This comment has been minimized.

@straight-shoota

straight-shoota Mar 5, 2018

Contributor

You could just replace this entire conditional with

rest, _, output = string.rpartition(DELIMITER)
output = output.chars
# and later loop over `rest.each_char`:
rest.each_char do |c|

This comment has been minimized.

@MakeNowJust

MakeNowJust Mar 6, 2018

Contributor

Good, but correct code is:

output, _, rest = ...

MakeNowJust added some commits Mar 6, 2018

@MakeNowJust

This comment has been minimized.

Contributor

MakeNowJust commented Mar 10, 2018

ping

@Sija

This comment has been minimized.

Contributor

Sija commented Apr 9, 2018

Catching dust while waiting for merge... does anyone care?

@RX14

RX14 approved these changes Apr 9, 2018

@bcardiff bcardiff merged commit e4f637c into crystal-lang:master Apr 10, 2018

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@bcardiff bcardiff added this to the Next milestone Apr 10, 2018

@bcardiff

This comment has been minimized.

Member

bcardiff commented Apr 10, 2018

Thanks @MakeNowJust

@MakeNowJust MakeNowJust deleted the MakeNowJust:feature/punycode branch Apr 10, 2018

@paulkass paulkass referenced this pull request Jul 1, 2018

Merged

Implemented #2770 #6306

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment