Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smaller tables #68

Merged
merged 13 commits into from
Jul 12, 2016
Merged

Smaller tables #68

merged 13 commits into from
Jul 12, 2016

Conversation

benibela
Copy link
Contributor

See #67

@ivarne
Copy link

ivarne commented Jun 1, 2016

Seems like travis is failing because of an expired cert

# use JuliaLang caching (https://github.com/staticfloat/cache.julialang.org)
# so that Travis builds do not depend on anyone's flaky servers but our own
    - URLCACHE=https://cache.e.ip.saba.us/

cc: @staticfloat

@ivarne
Copy link

ivarne commented Jun 1, 2016

The travis failiure will probably be fixed by #69

@tkelman
Copy link
Contributor

tkelman commented Jun 1, 2016

It might need a rebase and force push though. Changes to .travis.yml might not propagate automatically by just restarting the build.

@stevengj
Copy link
Member

stevengj commented Jun 1, 2016

Needs a version bump. Since this is a technically a backwards incompatible change, we'll need to change the major version?

Maybe we should take the opportunity to export other accessor functions, like utf8proc_casefold(codepoint), utf8proc_bidi(codepoint), ...?

We could unexport the utf8proc_property_struct entirely, replacing it by utf8proc_property(codepoint) functions. Theoretically, this would make it less efficient to extract multiple properties of a codepoint at once, but I'm skeptical that this is common in practice.

@stevengj
Copy link
Member

is this ready to merge?

@benibela
Copy link
Contributor Author

Unless you want to increase the version number first for people using the property struct directly.

And Unicode 9 was just released. Might be worth checking if it still fits in the tables. Some are almost full.

@Keno Keno mentioned this pull request Jun 25, 2016
@stevengj
Copy link
Member

Needs a rebase and update now that Unicode 9 support has been merged.

* title-case character, if any; otherwise (if there is no title-case
* variant, or if `c` is not a valid codepoint) return `c`.
*/
UTF8PROC_DLLEXPORT utf8proc_int32_t utf8proc_totitle(utf8proc_int32_t c);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be equivalent to toupper if there is no titlecase?

@stevengj
Copy link
Member

stevengj commented Jul 2, 2016

Awesome, thanks for updating this. Did you have any problems with the table size?

@stevengj
Copy link
Member

stevengj commented Jul 2, 2016

What's the overall reduction in (compiled) table size from this patch?

@benibela
Copy link
Contributor Author

benibela commented Jul 2, 2016

200 k (42%)

7975 elements in the sequence array now (from 7834). At 8192 it gets troublesome

@stevengj stevengj merged commit eeebf70 into JuliaStrings:master Jul 12, 2016
lencode = 7
end
idx = pushary(array)
raise "Array index out of bound" if idx > 0x1FFF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception triggered in Unicode 14. @benibela, what do you suggest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevengj one bit needs to be taken from lencode (7 -> 3) and given to idx (0x1FFF -> 0x3FFF, 13 -> 14)

and the same in seqindex_write_char_decomposed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make a PR?

@benibela benibela mentioned this pull request Dec 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants