Base32 for shorter urls #43

Closed
xyb opened this issue May 5, 2013 · 6 comments

@xyb

xyb commented May 5, 2013

Since we now have a base64-like encoding for shorter URLs (#42), I implemented Crockford's Base32 encoding, which is more human-readable in case you need to write the URL down.
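
For reference, a minimal sketch of Crockford-style Base32 applied to an integer. It assumes the value to encode is handled as one big number; the names `crockford_encode` and `crockford_decode` are illustrative and are not taken from the proposed patch:

```python
# Crockford's Base32 alphabet: digits plus uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
DECODE = {c: i for i, c in enumerate(CROCKFORD)}
# Decoding is forgiving: O reads as 0, I and L read as 1.
DECODE.update({"O": 0, "I": 1, "L": 1})


def crockford_encode(n):
    """Encode a non-negative integer (hypothetical helper)."""
    if n == 0:
        return CROCKFORD[0]
    out = []
    while n:
        n, r = divmod(n, 32)
        out.append(CROCKFORD[r])
    return "".join(reversed(out))


def crockford_decode(s):
    """Decode a string, ignoring case and hyphens (hypothetical helper)."""
    n = 0
    for c in s.upper().replace("-", ""):
        n = n * 32 + DECODE[c]
    return n
```

Because decoding folds case and maps look-alike letters onto digits, a hand-copied string such as `1o` decodes to the same value as `10`.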

@sametmax
Contributor

I'm looking for someone with the skills to review this, as I'm not good enough with probability / crypto to assess its impact. I haven't forgotten about it. Sorry for the delay.

@xyb
Author

xyb commented Jun 3, 2013

That's OK, I'm not familiar with JS either.

@Natim
Contributor

Natim commented Jun 20, 2014

It seems that SJCL (JS) doesn't have a base32 codec.
I found one available here: https://github.com/agnoster/base32-js/blob/master/lib/base32.js

@sametmax
Contributor

Actually, we can make the URL even shorter if we turn the SJCL result into something using 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ as a base.

Now, in Python, I know I can convert between base 10 and any custom base using something like:

   from __future__ import division

   import string

   # Symbol table: uppercase letters without "O" (a 25-symbol alphabet)
   SYM2VAL = {l: i for i, l in enumerate(string.ascii_uppercase.replace("O", ""))}
   VAL2SYM = dict(enumerate(string.ascii_uppercase.replace("O", "")))

   def to_base_10(s, table):
       """ Convert a string in a custom base to an integer (use SYM2VAL) """
       i = 0
       base = len(table)
       for c in s:
           i *= base
           i += table[c]
       return i

   def from_base_10(i, table):
       """ Convert a non-negative integer to a custom base (use VAL2SYM) """
       array = []
       base = len(table)
       if i == 0:
           # Zero is the first symbol of the table ("A" here)
           return table[0]
       while i:
           i, value = divmod(i, base)
           array.append(table[value])
       return ''.join(reversed(array))

So we don't need to install an additional lib. We can port this to JS, get the SJCL result, convert it to base 10, then to our custom base.

We could use 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ if shortness is the priority, http://www.crockford.com/wrmg/base32.html if we want a standard readable encoding, or some adaptation of our own.
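
A rough sketch of that pipeline in Python, assuming the SJCL result is available as a base64 string; `encode_key` and `decode_key` are hypothetical names, and the 62-character alphabet is the one quoted above:

```python
import base64

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_key(b64_key, alphabet=BASE62):
    """Re-encode a base64 string in a custom alphabet (hypothetical helper)."""
    raw = base64.b64decode(b64_key)
    n = int.from_bytes(raw, "big")  # treat the bytes as one big integer
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(alphabet[r])
    return "".join(reversed(out))


def decode_key(text, length, alphabet=BASE62):
    """Inverse of encode_key; `length` is the key size in bytes."""
    values = {c: i for i, c in enumerate(alphabet)}
    n = 0
    for c in text:
        n = n * len(alphabet) + values[c]
    # Re-pad to the known size, since leading zero bytes vanish in the integer.
    return base64.b64encode(n.to_bytes(length, "big")).decode("ascii")
```

One design point: going through a single integer drops leading zero bytes, so the decoder has to know the key length in advance (here it is passed explicitly).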

I like 346789abcdefghijklmnopqrtuvwxyABCDEFGHIJKMNPQRTUVWXY because it's still base 52 (URLs will be very short), but it strips characters that can be confused, such as:

1, l, L
0, o
2, z, Z
5, S, s

It still requires people to tell lowercase from uppercase, but I think that's OK.
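
A quick way to check what this alphabet actually leaves out, comparing it with the full 62-character set quoted earlier (the variable names are only for illustration):

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
PROPOSED = "346789abcdefghijklmnopqrtuvwxyABCDEFGHIJKMNPQRTUVWXY"

# Characters of the full base-62 set that the proposed alphabet leaves out.
removed = "".join(c for c in BASE62 if c not in PROPOSED)

print(len(PROPOSED), len(set(PROPOSED)))  # size and number of distinct symbols
print(removed)
print("l" in PROPOSED, "o" in PROPOSED)
```

The alphabet has 52 distinct symbols and drops `0125szLOSZ`. Lowercase `l` and `o` are still present, so they would also have to go if the confusable list above is to be applied in full.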

@Natim
Contributor

Natim commented Jun 20, 2014

Yes, it seems to me that base64 is actually shorter:

>>> base64.b64encode("Toto la ricolala")
'VG90byBsYSByaWNvbGFsYQ=='
>>> base64.b32encode("Toto la ricolala")
'KRXXI3ZANRQSA4TJMNXWYYLMME======'
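
The same comparison can be estimated for any input size. This is a back-of-the-envelope sketch: the helper names are made up, the base64/base32 figures assume standard padded output, and the base 52 / base 62 figures assume the whole input is encoded as a single integer:

```python
import math


def encoded_length(n_bytes, base):
    """Digits needed to write any n_bytes-long value as one integer in `base`."""
    return math.ceil(n_bytes * 8 / math.log2(base))


def base64_length(n_bytes):
    """Length of padded standard base64 output."""
    return 4 * math.ceil(n_bytes / 3)


def base32_length(n_bytes):
    """Length of padded standard base32 output."""
    return 8 * math.ceil(n_bytes / 5)


for n in (16, 32):
    print(n, base64_length(n), base32_length(n),
          encoded_length(n, 52), encoded_length(n, 62))
```

For the 16-byte string above this gives 24 characters for base64 and 32 for base32, matching the session, against 23 and 22 for the base 52 and base 62 integer forms. Base32 is the longest of the four, and the gain from the custom alphabets over base64 is small.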

@sametmax
Contributor

Yep. Closing.
