Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-Crockford's Base32 letters converted differently in Java or Python implementations #58

Closed
eberbis opened this issue Oct 25, 2017 · 8 comments
Assignees

Comments

@eberbis
Copy link

eberbis commented Oct 25, 2017

Hi Andrew,

first of all, thanks for the amazing library, we've been using a lot!

I have a doubt regarding how we fix the conversion of ULIDs which are not following Crockford's Base32 standard.

We are using Lua to generate some guids (https://github.com/Tieske/ulid.lua) and for some reason, we get from time to time letters outside the Crockford's Base32.
While trying to fix this on our side (we're not sure how this is happening to be honest), we realised that Java and Python implementations silently corrects this issue in different ways:

Java

ULID.Value ulidValueFromString = ULID.parseULID("01BX73KC0TNH409RTFD1JXKmO0")
--> "01BX73KC0TNH409RTFD1JXKM00"

mO is silently converted into M0

Python

In [1]: import ulid

In [2]: u = ulid.from_str('01BX73KC0TNH409RTFD1JXKmO0')

In [3]: u
Out[3]: <ULID('01BX73KC0TNH409RTFD1JXKQZ0')>

In [4]: u.str
Out[4]: '01BX73KC0TNH409RTFD1JXKQZ0'

mO is silently converted into QZ

Shouldn't the python library behave as the Java one as per the Crockford's Base32 spec, converting L and I to 1 and O to 0 and only upper casing lower case letters instead of changing them?

Thanks a lot in advance!

Eddie

@ahawker ahawker self-assigned this Oct 25, 2017
@ahawker
Copy link
Owner

ahawker commented Oct 25, 2017

Eddie,

Thanks for reporting the issue. I've got a couple ideas but this will need some deeper investigation. No promises but I should have some free time to dive-in within the next couple of days.

Best,
Andrew

@ahawker
Copy link
Owner

ahawker commented Oct 26, 2017

Eddie,

Thanks again for reporting this and I apologize for the issue!

I merged in a fix and pushed out version 0.0.5 that should address it.

Best,
Andrew

@eberbis
Copy link
Author

eberbis commented Oct 26, 2017

Thanks Andrew, that was super fast!

@eberbis
Copy link
Author

eberbis commented Oct 26, 2017

Andrew,

sorry to be a pain.

I was testing those cases and they now work perfectly (thanks again for being so fast!).
While doing so, I think, I may have found a related issue. I saw that when decoding u/U values, in our version we silently convert those to a new value. This is an example:

In [6]: u = ulid.from_str('01BX73KC0TNH409RTFD1UXKM00')

In [7]: u
Out[7]: <ULID('01BX73KC0TNH409RTFD3ZXKM00')>

In [8]: u.str
Out[8]: '01BX73KC0TNH409RTFD3ZXKM00'

1U is silently converted to 3Z.

I checked Crockford's Base32 and values u/U are regarded as (and I quote) an Accidental obscenity...

I'm guessing we should error out if we encounter those? I checked the Java implementation we're currently using and we do get an IllegalArgumentException for those cases (this hilarious test reflects it as well):

java.lang.IllegalArgumentException: Illegal character 'U'!

Is this behaviour something we should also be compliant with in the Python version?

Thanks again!

Eddie

@ahawker
Copy link
Owner

ahawker commented Oct 26, 2017

Eddie,

Yes, this observation is correct. I was working on changes for this last night as well but didn't have time to finish so I just got out a quick release to the initial issue.

The current implementation will validate that the input string is at least in ASCII but yes, there are certain characters that don't overlap between that and Base32, so those will sneak in.

The code to solve this is relatively straight forward but I want to run it though some performance benchmarks to see the impact before committing to an implementation design.

I've created #60 to track this.

Best,
Andrew

@eberbis
Copy link
Author

eberbis commented Oct 26, 2017

Thanks Andrew!

@ahawker
Copy link
Owner

ahawker commented Oct 28, 2017

Eddie,

I merged in a fix tonight and closed #60. Version 0.0.6 on PyPI should contain all of these changes.

Let me know if it works or if you run into any regressions.

Thanks again for reporting this!

Best,
Andrew

@eberbis
Copy link
Author

eberbis commented Oct 29, 2017

Thanks Andrew,

just made a few examples on my local and now it raises a ValueError if there's a non-base32 character found.

Thanks again for the swift response to this!

Eddie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants