Internationalization & Localization #905

Closed
moxiegirl opened this issue Jan 15, 2019 · 3 comments

@moxiegirl
Contributor

@larrysalibra commented on Wed Apr 08 2015

I hope that Blockstack can be used globally and not be limited to the English-speaking world. I'm opening this ticket to start a discussion about how the spec and the movement can accommodate that.

Namespace:

Many (most?) Internet users don't use the Latin alphabet. Should we force someone whose name is 李兆京 (me!) to pick a romanized version of his name? That seems culturally insensitive. DNS was ASCII-only for various historical reasons. We don't have the same excuse.

Multiple languages:

Many people live their lives in a multilingual environment. New immigrants in the US often have family and friends in their country of origin with whom they interact in one language, and new friends in the US with whom they interact in English. In places like Hong Kong (where I live), most people have two legal names: a Chinese name and an English name. In Europe, many people have friends across the continent with whom they communicate in English, and family and friends in their country of residence who may only read and write the local language.

We could either:

  1. Tell these people to create a separate openname for each language.
  2. Allow people to have language-specific entries for free-form fields (Name, Location, Bio, etc.).

For names: Twitter takes the first approach. Facebook takes the latter. I would prefer the latter as well. https://www.facebook.com/help/217868321565724

Branding & communications

A name & message that translates well across linguistic and cultural borders. (Will also bring this up in #41)

Love to hear what everyone else thinks!


@shea256 commented on Wed Apr 15 2015

Namespace

Interesting. Currently we use the existing DNS character set (Latin alphanumerics plus dashes) to save space in the blockchain. So instead of embedding each domain character in a byte and working in ASCII/base 256, we're compressing the information into base 40, which lets us pack 1.5x the characters (so domains can be something like 24 letters max instead of just 16).
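
To make the 1.5x figure concrete, here is a minimal sketch of base-40 packing in Python. The exact 40-symbol alphabet is an assumption: the comment above only mentions alphanumerics plus dashes, so the last three symbols here are placeholders to reach 40, and `b40_pack` / `b40_unpack` are hypothetical helper names, not Blockstack's actual functions.

```python
import string

# Assumed 40-symbol alphabet (digits, lowercase letters, dash, plus three
# placeholder symbols). The real alphabet may differ.
B40_ALPHABET = string.digits + string.ascii_lowercase + "-_.+"
assert len(B40_ALPHABET) == 40


def b40_pack(name: str) -> bytes:
    """Interpret the name as a base-40 number and store it as big-endian bytes."""
    value = 0
    for ch in name:
        value = value * 40 + B40_ALPHABET.index(ch)
    n_bytes = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(n_bytes, "big")


def b40_unpack(packed: bytes, length: int) -> str:
    """Recover a name of known length from its packed form."""
    value = int.from_bytes(packed, "big")
    chars = []
    for _ in range(length):
        value, digit = divmod(value, 40)
        chars.append(B40_ALPHABET[digit])
    return "".join(reversed(chars))


# The largest 24-character name still fits in 16 bytes, because 40**24 < 2**128.
longest = B40_ALPHABET[-1] * 24
print(len(longest), len(b40_pack(longest)))
```

Each base-40 character costs about 5.3 bits instead of 8, which is where the 24-versus-16 ratio comes from. The length has to be passed to `b40_unpack` because leading zero-valued symbols are lost when the name is treated as a number.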

Do you think it would be valuable to use bytes instead? This would allow for all ASCII characters to be encoded, and with a special encoding it could be easily extended to Unicode.

Multiple Languages

Hm, I like the latter as well. One way to do this would be to include an object within the top-level object called something like "translations" or "language" that provides the translated names in other languages along with their mappings.
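
As a rough illustration of that idea, a profile with a "translations" object might look like the sketch below. The field names, the language tag, and the `localized` helper are all assumptions made for the example, not part of any existing schema.

```python
# Hypothetical profile layout: top-level fields hold the default values and
# "translations" maps a language tag to per-language overrides.
profile = {
    "name": "Larry",
    "location": "Hong Kong",
    "translations": {
        "zh-Hant": {"name": "李兆京", "location": "香港"},
    },
}


def localized(profile: dict, field: str, lang: str) -> str:
    """Return the field in the requested language, falling back to the default."""
    overrides = profile.get("translations", {}).get(lang, {})
    return overrides.get(field, profile[field])


print(localized(profile, "name", "zh-Hant"))
print(localized(profile, "name", "fr"))
```

Falling back to the top-level value keeps profiles without translations working unchanged.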

Branding & Communications

Yes, I agree that we need to be cognizant of making sure our branding and communications translate well across languages. Looks like we're mostly set on passname, and that should translate OK, right? It doesn't have to be an exact translation. Also, as you mentioned in the other thread, we can use various carefully chosen phrases to describe it.


@larrysalibra commented on Sat Apr 18 2015

> Do you think it would be valuable to use bytes instead? This would allow for all ASCII characters to be encoded, and with a special encoding it could be easily extended to Unicode.

It could make sense if our goal is to support all Unicode characters. Supporting all Unicode characters leads to other complexities, like homograph attacks and how (if?) we want to try to prevent them (say, by excluding confusables from the namespace).
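
For readers unfamiliar with homograph attacks, here is a small Python sketch. The names are invented for the example, and `scripts_used` is a crude heuristic based on Unicode character names, not a real confusables check.

```python
import unicodedata

latin = "larry"
# Same appearance in most fonts, but the second character is
# U+0430 CYRILLIC SMALL LETTER A rather than U+0061 LATIN SMALL LETTER A.
mixed = "l\u0430rry"

print(latin == mixed)  # False: different code points, so different names


def scripts_used(name: str) -> set:
    """Crude heuristic: take the first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in name}


print(scripts_used(latin))
print(scripts_used(mixed))
```

Flagging mixed-script names catches this particular case, but it does not catch a name written entirely in look-alike characters from one script.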


@shea256 commented on Sat Apr 18 2015

Hm, yeah because of homograph attacks, I don't think this can work, even by excluding confusables. It's just too hard of a task and the attack space is so large there'll always be something you miss.

Also, even if we figured it out, I just realized that we want this name system to be backwards compatible with ICANN DNS, so we need to support the same character set or a subset of those characters.

Moreover, it'd be ideal just to use the same rules and encodings as the existing system. There are a ton of advantages to this. Thus I'd make the case for going with byte encoding in blockstore and limiting the valid character set to alphanumerics and dashes (just like ICANN DNS).

In ICANN DNS, a peace sign is encoded as xn--7bi, so the domain would look like this: xn--7bi.com (which redirects to angel.co). The same would go for our TLDs, so it's really simple to add native browser support for them.
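
The encoding mentioned here can be reproduced with Python's built-in `punycode` codec. This is only a sketch: the helper names are hypothetical, and it skips the normalization and validation steps that a full IDNA implementation performs.

```python
def to_ace_label(label: str) -> str:
    """Encode one Unicode label as an ASCII 'xn--' label."""
    if label.isascii():
        return label  # already ASCII, nothing to encode
    return "xn--" + label.encode("punycode").decode("ascii")


def from_ace_label(label: str) -> str:
    """Decode an 'xn--' label back to Unicode; other labels pass through."""
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


peace = "\u270c"  # U+270C VICTORY HAND, the hand-gesture peace sign
print(to_ace_label(peace))                # xn--7bi
print(from_ace_label("xn--7bi") == peace)  # True
```

Client software that wants to display the Unicode form would run something like `from_ace_label` on each label of the stored name.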


@larrysalibra commented on Sat Apr 18 2015

IDN is still poorly implemented and therefore not that widely adopted. Case in point, the iOS Mail app bug below: 7bi.com definitely doesn't forward to angel.co.

Just to make sure I understand, are you suggesting at some point Passcard will use IDN encoding like ICANN DNS? If so, will we see the same rather "lazy" (slow, sloppy) implementation in the ecosystem?

Or are you saying we should stick with ^[a-z0-9_]{1,60}$?
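
For reference, a minimal Python check against that pattern could look like this (`is_valid_name` is a hypothetical helper, not existing client code):

```python
import re

# The pattern quoted above: 1 to 60 lowercase letters, digits, or underscores.
NAME_PATTERN = re.compile(r"^[a-z0-9_]{1,60}$")


def is_valid_name(name: str) -> bool:
    """Return True if the whole name matches the quoted pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["larry", "LaRrY", "李兆京", "xn--7bi"]:
    print(candidate, is_valid_name(candidate))
```

Note that this pattern as written also rejects `xn--7bi`, because `-` is not in the character class, so Punycode-style labels would need the dash to be allowed.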

> It's just too hard of a task and the attack space is so large there'll always be something you miss.

Yes. Too complicated.

[Screenshot attachment: img_0169]


@larrysalibra commented on Fri Oct 06 2017

Some of the reasons we don't use Unicode in name registrations:

  1. The main reason is that we want to be backwards compatible with existing DNS infrastructure, which uses Punycode to store Unicode characters in ASCII names.
  2. When we designed this, bitcoin's OP_RETURN size was half as big, and Unicode characters take up a lot more space, which means names would have been limited in length.
  3. Uniqueness in internationalized names is a very hard problem: should 中國 be considered the same name as 中国 or a different one? To users reading them they are the same, but to computers they are different names.
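
Points 2 and 3 can be seen directly in a few lines of Python. This is only an illustration using the two names from the list above; it is not how the name system itself compares names.

```python
import unicodedata

traditional = "中國"
simplified = "中国"

# Point 3: standard Unicode normalization does not unify the two spellings.
same_after_nfkc = (
    unicodedata.normalize("NFKC", traditional)
    == unicodedata.normalize("NFKC", simplified)
)
print(same_after_nfkc)  # False

# Point 2: each of these characters takes 3 bytes in UTF-8.
print(len(traditional), len(traditional.encode("utf-8")))

# Point 1: their Punycode forms are plain ASCII, and they also differ.
print(traditional.encode("punycode"), simplified.encode("punycode"))
```

Treating the two as the same name would need a separate variant-mapping table on top of normalization, which is part of why this is a hard problem.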

@larrysalibra commented on Fri Oct 06 2017

On the topic of uniqueness being a challenge in the full Unicode space: this is a problem in the Latin alphabet as well, which is why DNS names (and our names) are not case sensitive. LaRrY.iD is the same as larry.id.
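
A short Python sketch of the same point. The `canonical` helper is hypothetical, and the German example is added only to show how case handling gets harder outside ASCII.

```python
def canonical(name: str) -> str:
    """Fold an ASCII name to the single form used for comparison."""
    return name.lower()


print(canonical("LaRrY.iD") == canonical("larry.id"))  # True

# Outside ASCII, lowercasing is no longer enough: these two spellings of the
# same German word only compare equal after full case folding.
print("straße".lower() == "STRASSE".lower())        # False
print("straße".casefold() == "STRASSE".casefold())  # True
```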

@jcnelson
Member

Without executing a hard fork, the BNS name alphabet allows Punycode-encoded UTF-8 names. However, our client-side software (i.e. the explorer, the authenticator, the CLI, etc.) will need to be updated to convert Punycode between the BNS alphabet and UTF-8.

When Stacks v2 goes live, BNS will be re-implemented as a smart contract. We'll have a chance then to remove the 40-character alphabet restriction and simply treat all BNS names as a sequence of bytes (which could be interpreted by client software as UTF-8 strings).

@stale

stale bot commented Apr 17, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 17, 2020
@stale

stale bot commented Apr 25, 2020

This issue has been automatically closed. Please reopen if needed.

@stale stale bot closed this as completed Apr 25, 2020