Upstream the Unicode database table generator and update the tables to 15 #8617

Merged
4 commits merged on Dec 19, 2022

Conversation

rikkimax
Contributor

Right, let's see how much this breaks.

@dlang-bot
Contributor

dlang-bot commented Oct 31, 2022

Thanks for your pull request and interest in making D better, @rikkimax! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment set up, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8617"

@burner
Member

burner commented Nov 1, 2022

Considering the size of this PR alone, I would welcome a somewhat long description of what it tries to do.
I think I have an idea (update the Unicode data in Phobos), but I might just be overconfident.

@rikkimax
Contributor Author

rikkimax commented Nov 1, 2022

> Considering the size of this PR alone, I would welcome a somewhat long description of what it tries to do. I think I have an idea (update the Unicode data in Phobos), but I might just be overconfident.

The goal is to get the Unicode table generator into Phobos where it belongs.

From what I can tell, it has never been updated and could be ~8 releases out of date.

Unfortunately, people have been messing with the tables since then, and not all of those changes were committed back to the generator, so it's not quite as simple as just adding it.

The quality of the generator doesn't matter, nor do the tables themselves. As long as it passes the test suite, we are good (Dmitry already attempted to update the tables to Unicode 14, and this approach was the outcome of that).

string[string] aliases;
}

PropertyTable general;
Member

No need to make them thread-local; it bloats the TLS, and it's effectively global state.
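
To illustrate the review comment: in D, a module-level variable is thread-local unless it is declared `__gshared`, `shared`, or `immutable`. The sketch below shows the difference under simplified assumptions. `PropertyTable` is a stripped-down stand-in for the generator's struct, and `tlsTable`, `globalTable`, and `visibleInNewThread` are hypothetical names used only for illustration.

```d
import core.thread : Thread;

// Stripped-down stand-in for the generator's struct (assumption).
struct PropertyTable
{
    string[string] aliases;
}

PropertyTable tlsTable;               // thread-local: every thread gets its own copy
__gshared PropertyTable globalTable;  // a single instance shared by the whole process

// Reports whether a freshly started thread can see the "Lu" entry
// in the thread-local table (index 0) and in the __gshared table (index 1).
bool[2] visibleInNewThread()
{
    bool[2] seen;
    auto worker = new Thread({
        seen[0] = ("Lu" in tlsTable.aliases) !is null;
        seen[1] = ("Lu" in globalTable.aliases) !is null;
    });
    worker.start();
    worker.join();
    return seen;
}

void main()
{
    tlsTable.aliases["Lu"] = "Uppercase_Letter";
    globalTable.aliases["Lu"] = "Uppercase_Letter";

    auto seen = visibleInNewThread();
    assert(!seen[0]); // the new thread starts with an empty thread-local table
    assert(seen[1]);  // the __gshared table is the same object in every thread
}
```

For a tool that only ever runs on one thread, `__gshared` would keep the tables out of thread-local storage without changing behaviour. It also bypasses the type system's sharing checks, so it is only appropriate when no concurrent access is expected.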

std/internal/unicode_table_generator.d (8 outdated review threads)

@rikkimax
Contributor Author

rikkimax commented Nov 1, 2022

@ljmf00 I didn't develop the generator; that was Dmitry Olshansky, around 2013. Its quality does not matter as long as it works. If we do not get the generator into Phobos, std.uni is effectively dead.

Right now I want to change as little as possible. It needs to work; anything beyond that is out of scope for this PR.

@ljmf00
Member

ljmf00 commented Nov 1, 2022

> @ljmf00 I didn't develop the generator; that was Dmitry Olshansky, around 2013. Its quality does not matter as long as it works. If we do not get the generator into Phobos, std.uni is effectively dead.
>
> Right now I want to change as little as possible. It needs to work; anything beyond that is out of scope for this PR.

I didn't request changes tho, just suggestions :) Some scope and DIP1000 changes may help in discovering memory corruption.

@rikkimax
Contributor Author

rikkimax commented Nov 1, 2022

> > @ljmf00 I didn't develop the generator; that was Dmitry Olshansky, around 2013. Its quality does not matter as long as it works. If we do not get the generator into Phobos, std.uni is effectively dead.
> > Right now I want to change as little as possible. It needs to work; anything beyond that is out of scope for this PR.
>
> I didn't request changes tho, just suggestions :) Some scope and DIP1000 changes may help in discovering memory corruption.

All good. I just don't want people spending their effort on code review for this PR; that can come later, once the harder Unicode parts are resolved.

source\std\uni\package.d(1534,57): Error: cannot take address of local `idxArray` in `@safe` function `__unittest_L1531_C7`

I'm getting these errors. I'd love it if someone could verify whether std.uni is even being unittested right now.

@rikkimax
Contributor Author

rikkimax commented Nov 2, 2022

Okay, looks like the test suite has got the brilliant idea that it needs to run the generator.

std.exception.ErrnoException@std/stdio.d(547): Cannot open file `ucd-15/Blocks.txt' in mode `rb' (No such file or directory)

But otherwise it's looking like it's passing :D
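
For context on the file named in that error: `Blocks.txt` in the Unicode Character Database lists one block per line in the form `0000..007F; Basic Latin`, with `#` starting a comment line. A minimal sketch of reading such a line follows. It is not the generator's actual code, and `Block` and `parseBlockLine` are hypothetical names.

```d
import std.algorithm.searching : findSplit;
import std.conv : to;
import std.string : strip;
import std.typecons : Nullable;

// Hypothetical record for one entry of Blocks.txt.
struct Block
{
    uint first;  // first code point of the block
    uint last;   // last code point of the block (inclusive)
    string name;
}

// Parses a line such as "0000..007F; Basic Latin".
// Returns null for blank lines, comment lines, and lines that lack the
// ";" or ".." separators. Malformed hex digits raise a ConvException.
Nullable!Block parseBlockLine(string line)
{
    Nullable!Block none;

    line = line.strip;
    if (line.length == 0 || line[0] == '#')
        return none;

    auto fields = line.findSplit(";");
    if (!fields)
        return none;

    auto bounds = fields[0].strip.findSplit("..");
    if (!bounds)
        return none;

    return Nullable!Block(Block(
        bounds[0].to!uint(16),
        bounds[2].to!uint(16),
        fields[2].strip));
}

void main()
{
    auto block = parseBlockLine("0000..007F; Basic Latin");
    assert(!block.isNull);
    assert(block.get.first == 0x0000 && block.get.last == 0x007F);
    assert(block.get.name == "Basic Latin");

    assert(parseBlockLine("# a comment line").isNull);
}
```

A tool like this depends on the data files being present in its working directory, which is why running it from inside the library's unittest build fails with the error above.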

@ibuclaw
Member

ibuclaw commented Nov 2, 2022

> Okay, looks like the test suite has got the brilliant idea that it needs to run the generator.
>
> std.exception.ErrnoException@std/stdio.d(547): Cannot open file `ucd-15/Blocks.txt' in mode `rb' (No such file or directory)
>
> But otherwise it's looking like it's passing :D

The generator should sit outside of the std source tree, such as in a tools directory.

posix.mak (2 outdated review threads)
Member

@ibuclaw ibuclaw left a comment

Can't see anywhere else that generated code would fall foul of the crude style checks.

tools/unicode_table_generator.d (1 outdated review thread)

@rikkimax
Contributor Author

rikkimax commented Nov 2, 2022

Buildkite is failing due to a bug in mir-algorithm, so it's safe to assume, as of right now, that this is green!

posix.mak (2 outdated review threads)

@ibuclaw
Member

ibuclaw commented Nov 2, 2022

> Buildkite is failing due to a bug in mir-algorithm, so it's safe to assume, as of right now, that this is green!

It's only failing with this PR though, so perhaps you've indirectly affected something?

https://github.com/libmir/mir-algorithm/blob/aa1e914663999eb94d89e84a30c645ea94def179/source/mir/format.d#L909-L918

@ibuclaw
Member

ibuclaw commented Nov 2, 2022

> > Buildkite is failing due to a bug in mir-algorithm, so it's safe to assume, as of right now, that this is green!
>
> It's only failing with this PR though, so perhaps you've indirectly affected something?
>
> https://github.com/libmir/mir-algorithm/blob/aa1e914663999eb94d89e84a30c645ea94def179/source/mir/format.d#L909-L918

Answer, yes: see Mir's implementation of print

https://github.com/libmir/mir-algorithm/blob/e0acce4313a26385d2e461f58ec5a1a7a6a3acc1/source/mir/format.d#L882-L906

So I'd consider you to be the owner of fixing Mir if you're going to break downstream. :-)

@rikkimax
Contributor Author

rikkimax commented Nov 2, 2022

> > > Buildkite is failing due to a bug in mir-algorithm, so it's safe to assume, as of right now, that this is green!
> >
> > It's only failing with this PR though, so perhaps you've indirectly affected something?
> > https://github.com/libmir/mir-algorithm/blob/aa1e914663999eb94d89e84a30c645ea94def179/source/mir/format.d#L909-L918
>
> Answer, yes: see Mir's implementation of print
>
> https://github.com/libmir/mir-algorithm/blob/e0acce4313a26385d2e461f58ec5a1a7a6a3acc1/source/mir/format.d#L882-L906
>
> So I'd consider you to be the owner of fixing Mir if you're going to break downstream. :-)

Yeah, it's calling into isGraphical, and the character in question wasn't allocated before Unicode 7, so it couldn't be graphical in the previous tables.

Should be an easy fix: effectively, remove one character from what the unittest checks.
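
To make the breakage concrete: `std.uni.isGraphical` answers from the generated tables, so a code point that is unassigned in one Unicode version and assigned in a later one can change its result when the tables are regenerated. The sketch below asserts only on code points whose classification is stable. U+0378 is used as an example of a code point that is unassigned as of Unicode 15 (an illustrative choice, not the character from the Mir test), and `describe` is a hypothetical helper.

```d
import std.uni : isGraphical;

// Hypothetical helper: labels a code point according to the current tables.
string describe(dchar c)
{
    return isGraphical(c) ? "graphical" : "not graphical";
}

void main()
{
    import std.stdio : writefln;

    assert(describe('a') == "graphical");      // letter (category L)
    assert(describe(' ') == "graphical");      // space separator (category Zs)
    assert(describe('\n') == "not graphical"); // control character (category Cc)

    // Unassigned as of Unicode 15 (assumption stated above). The answer may
    // change if a later Unicode version assigns this code point, so it is
    // printed rather than asserted on.
    writefln("U+0378 is %s", describe(dchar(0x0378)));
}
```

A downstream test that hard-codes the result for a code point outside the stable set is tied to one Unicode version, which appears to be what happened with the Mir unittest here.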

@rikkimax
Contributor Author

rikkimax commented Nov 3, 2022

Looks like I'll have to remove myself as a code owner, since I don't have write permissions.

@rikkimax rikkimax force-pushed the unicode_tables branch 2 times, most recently from 14f643e to 4145185 Compare November 3, 2022 08:29
@RazvanN7
Collaborator

cc @atilaneves. Any thoughts on this?

@atilaneves
Contributor

I agree with @ibuclaw in that the generator should probably be in tools but it seems like it is already?

@rikkimax
Contributor Author

rikkimax commented Nov 15, 2022

> I agree with @ibuclaw in that the generator should probably be in tools but it seems like it is already?

It wasn't originally, but I applied that change since it made sense.

The question is can we merge? I wanna get Turkic support into caseless matching ;)

@rikkimax
Contributor Author

std/traits.d(8779): Error: static assert: 1LU == 2LU is false

what the?

I see another PR has this exact issue for the same failed tests. So not my fault, phew.

@RazvanN7
Collaborator

Rebasing to latest master will fix this.

@rikkimax
Contributor Author

Oh dangit, I shouldn't have let SourceTree merge branches like that. Oh well, I'll let CI run and confirm the fix for the prior error.

@RazvanN7 RazvanN7 added the "Merge:72h no objection -> merge" label (the PR will be merged if there are no objections raised) Dec 15, 2022
@RazvanN7
Collaborator

@atilaneves @ibuclaw any other objections or is this good to go?

@RazvanN7 RazvanN7 merged commit 67d4521 into dlang:master Dec 19, 2022
7 participants