Update to Unicode 13.0.0 #31

Merged: harendra-kumar merged 2 commits into master from unicode-update-13 on May 11, 2020

@adithyaov (Member)

Closes #28

@Bodigrim (Collaborator) commented May 4, 2020

It seems that some builds have cached an old Unicode 12.1 unicode-data/ucdxml/ucd.all.flat.pdb and thus fail. In addition, the installed ICU must be the latest, version 66 or later. Locally everything works fine for me.

@harendra-kumar (Member)

We are currently using an XML file, ucdxml/ucd.all.flat.zip, for parsing the Unicode data. Every time we add a new one it adds to the size of the repository, and I am not sure whether git can create a delta efficiently for this file (the .git directory is 36 MB on my machine). Parsing the XML also takes a lot of time, though it is only a one-time job. The file contains the entire Unicode database, including a lot of data that we do not need.

I am wondering if we should switch to the text files at https://www.unicode.org/Public/13.0.0/ucd/ . That would significantly reduce the size and improve the parsing speed, because we need only a few of these files, and git should handle incremental changes to them much more efficiently. It requires some effort and testing, though. We can consider this when creating a new package for the data.
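As a rough illustration of what parsing the plain-text files could involve, here is a minimal Haskell sketch that reads the first four semicolon-separated fields (code point, name, general category, canonical combining class) of a UnicodeData.txt record. The Record type and the splitOn and parseRecord helpers are hypothetical and are not part of ucd2haskell; the sketch also does not expand the "<..., First>"/"<..., Last>" range entries that a real parser would have to handle.

```haskell
-- Hypothetical sketch: parse one line of UnicodeData.txt from the UCD text files.
-- Only the first four fields are kept; range entries such as
-- "<CJK Ideograph, First>" are NOT expanded here.
import Numeric (readHex)

-- Hypothetical record holding a few UnicodeData.txt fields.
data Record = Record
  { codePoint       :: Int     -- field 0: code point, hexadecimal
  , charName        :: String  -- field 1: character name
  , generalCategory :: String  -- field 2: e.g. "Lu", "Mn"
  , combiningClass  :: Int     -- field 3: canonical combining class, decimal
  } deriving (Show, Eq)

-- Split on a separator, keeping empty fields (UnicodeData.txt has many).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Returns Nothing if the line has fewer than four fields
-- or if the code point or combining class fails to parse.
parseRecord :: String -> Maybe Record
parseRecord line = case splitOn ';' line of
  (cp : name : gc : ccc : _) ->
    case (readHex cp, reads ccc) of
      ([(n, "")], [(c, "")]) -> Just (Record n name gc c)
      _                      -> Nothing
  _ -> Nothing

main :: IO ()
main = mapM_ (print . parseRecord)
  [ "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
  , "0301;COMBINING ACUTE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING ACUTE;;;;"
  ]
```

Because each file is line-oriented, a generator along these lines only needs to read the handful of files it actually uses, which is the size and speed argument made above.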

@harendra-kumar (Member) left a comment

Can you move the version bump changes to a separate PR? That should be the last commit before we decide to release.

@adithyaov force-pushed the unicode-update-13 branch 3 times, most recently from dee1118 to ed239d9, May 5, 2020 01:21

@adithyaov (Member, Author) commented May 5, 2020

> It seems that some builds have cached an old one, Unicode 12.1 unicode-data/ucdxml/ucd.all.flat.pdb and thus fail. Plus the installed icu must be the latest, version 66+. Locally all works fine for me.

@Bodigrim Since the PR CI uses GitHub's merge commit, I'm guessing the problem was due to git's merge strategy.

@adithyaov (Member, Author) commented May 5, 2020

Please note that updating to Unicode 13 itself did not change anything. The changes come from running ucd2haskell before the update.

@harendra-kumar (Member)

Why would running ucd2haskell before the update change anything at all? Any changes should come after the update. Maybe you had a newer .pdb lying around in your workspace when you ran it, perhaps left over from your earlier experiments.

When I run the following in my workspace, it does not change anything at all:

$ cabal run ucd2haskell -- ucdxml/ucd.all.flat.xml ../Data/Unicode/Properties/

Can you try cloning the repo in a fresh workspace and run it?

@adithyaov (Member, Author) commented May 6, 2020

You're right, I found that strange too. It looks like I messed up.

@Bodigrim (Collaborator)

Is there anything unresolved left?

@adithyaov (Member, Author)

It should be mergeable now.

@harendra-kumar merged commit d6c8974 into master on May 11, 2020
@harendra-kumar deleted the unicode-update-13 branch on April 18, 2021 08:10
Issues closed by this pull request: Update to Unicode version 13
3 participants