Non-Crockford's Base32 letters converted differently in Java or Python implementations #58

eberbis · 2017-10-25T17:50:29Z

Hi Andrew,

first of all, thanks for the amazing library, we've been using a lot!

I have a doubt regarding how we fix the conversion of ULIDs which are not following Crockford's Base32 standard.

We are using Lua to generate some guids (https://github.com/Tieske/ulid.lua) and for some reason, we get from time to time letters outside the Crockford's Base32.
While trying to fix this on our side (we're not sure how this is happening to be honest), we realised that Java and Python implementations silently corrects this issue in different ways:

Java

ULID.Value ulidValueFromString = ULID.parseULID("01BX73KC0TNH409RTFD1JXKmO0")
--> "01BX73KC0TNH409RTFD1JXKM00"

mO is silently converted into M0

Python

In [1]: import ulid

In [2]: u = ulid.from_str('01BX73KC0TNH409RTFD1JXKmO0')

In [3]: u
Out[3]: <ULID('01BX73KC0TNH409RTFD1JXKQZ0')>

In [4]: u.str
Out[4]: '01BX73KC0TNH409RTFD1JXKQZ0'

mO is silently converted into QZ

Shouldn't the python library behave as the Java one as per the Crockford's Base32 spec, converting L and I to 1 and O to 0 and only upper casing lower case letters instead of changing them?

Thanks a lot in advance!

Eddie

The text was updated successfully, but these errors were encountered:

ahawker · 2017-10-25T18:27:46Z

Eddie,

Thanks for reporting the issue. I've got a couple ideas but this will need some deeper investigation. No promises but I should have some free time to dive-in within the next couple of days.

Best,
Andrew

ahawker · 2017-10-26T03:19:24Z

Eddie,

Thanks again for reporting this and I apologize for the issue!

I merged in a fix and pushed out version 0.0.5 that should address it.

Best,
Andrew

eberbis · 2017-10-26T09:12:02Z

Thanks Andrew, that was super fast!

eberbis · 2017-10-26T10:32:49Z

Andrew,

sorry to be a pain.

I was testing those cases and they now work perfectly (thanks again for being so fast!).
While doing so, I think, I may have found a related issue. I saw that when decoding u/U values, in our version we silently convert those to a new value. This is an example:

In [6]: u = ulid.from_str('01BX73KC0TNH409RTFD1UXKM00')

In [7]: u
Out[7]: <ULID('01BX73KC0TNH409RTFD3ZXKM00')>

In [8]: u.str
Out[8]: '01BX73KC0TNH409RTFD3ZXKM00'

1U is silently converted to 3Z.

I checked Crockford's Base32 and values u/U are regarded as (and I quote) an Accidental obscenity...

I'm guessing we should error out if we encounter those? I checked the Java implementation we're currently using and we do get an IllegalArgumentException for those cases (this hilarious test reflects it as well):

java.lang.IllegalArgumentException: Illegal character 'U'!

Is this behaviour something we should also be compliant with in the Python version?

Thanks again!

Eddie

ahawker · 2017-10-26T16:11:01Z

Eddie,

Yes, this observation is correct. I was working on changes for this last night as well but didn't have time to finish so I just got out a quick release to the initial issue.

The current implementation will validate that the input string is at least in ASCII but yes, there are certain characters that don't overlap between that and Base32, so those will sneak in.

The code to solve this is relatively straight forward but I want to run it though some performance benchmarks to see the impact before committing to an implementation design.

I've created #60 to track this.

Best,
Andrew

eberbis · 2017-10-26T17:30:09Z

Thanks Andrew!

ahawker · 2017-10-28T23:59:27Z

Eddie,

I merged in a fix tonight and closed #60. Version 0.0.6 on PyPI should contain all of these changes.

Let me know if it works or if you run into any regressions.

Thanks again for reporting this!

Best,
Andrew

eberbis · 2017-10-29T10:20:23Z

Thanks Andrew,

just made a few examples on my local and now it raises a ValueError if there's a non-base32 character found.

Thanks again for the swift response to this!

Eddie

ahawker added the investigation label Oct 25, 2017

ahawker self-assigned this Oct 25, 2017

ahawker mentioned this issue Oct 26, 2017

Properly base32 decode 'iIoOlL' characters. #59

Merged

ahawker closed this as completed in #59 Oct 26, 2017

ghost removed the investigation label Oct 26, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-Crockford's Base32 letters converted differently in Java or Python implementations #58

Non-Crockford's Base32 letters converted differently in Java or Python implementations #58

eberbis commented Oct 25, 2017 •

edited

Loading

ahawker commented Oct 25, 2017

ahawker commented Oct 26, 2017 •

edited

Loading

eberbis commented Oct 26, 2017

eberbis commented Oct 26, 2017

ahawker commented Oct 26, 2017

eberbis commented Oct 26, 2017

ahawker commented Oct 28, 2017

eberbis commented Oct 29, 2017

Non-Crockford's Base32 letters converted differently in Java or Python implementations #58

Non-Crockford's Base32 letters converted differently in Java or Python implementations #58

Comments

eberbis commented Oct 25, 2017 • edited Loading

Java

Python

ahawker commented Oct 25, 2017

ahawker commented Oct 26, 2017 • edited Loading

eberbis commented Oct 26, 2017

eberbis commented Oct 26, 2017

ahawker commented Oct 26, 2017

eberbis commented Oct 26, 2017

ahawker commented Oct 28, 2017

eberbis commented Oct 29, 2017

eberbis commented Oct 25, 2017 •

edited

Loading

ahawker commented Oct 26, 2017 •

edited

Loading