fix #46 (make sure symbol-like codepoints have nonzero width) #47

stevengj · 2015-06-24T18:08:51Z

This uses width 1 for symbols that are missing from Unifont but have categories indicating that they have nonzero width. See #46.
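
The fallback rule described above can be sketched roughly as follows. This is only an illustration in Ruby, not the code from this PR: `SYMBOL_CATEGORIES`, `char_width`, and the particular categories listed are hypothetical, and `unifont_width` stands in for whatever width Unifont reports (`nil` when the glyph is missing). Returning 0 for every other missing codepoint is likewise an assumption made for the sketch.

```ruby
# Hypothetical set of general categories treated as "symbol-like";
# the actual list used by the PR is not shown in this thread.
SYMBOL_CATEGORIES = %w[Sm Sc Sk So].freeze

# category:      two-letter Unicode general category, e.g. "Sm"
# unifont_width: Integer width taken from Unifont, or nil if the glyph is missing
def char_width(category, unifont_width)
  # Trust Unifont whenever it has an entry for the codepoint.
  return unifont_width unless unifont_width.nil?

  # Otherwise fall back to width 1 for symbol-like categories (assumed rule).
  SYMBOL_CATEGORIES.include?(category) ? 1 : 0
end
```

For example, `char_width("Sm", nil)` falls back to 1, while `char_width("So", 2)` keeps the Unifont width of 2.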

It also corrects for a few apparent bugs in Unifont's widths (https://savannah.gnu.org/bugs/index.php?45395).


ScottPJones · 2015-06-24T18:12:53Z

👍 LGTM

stevengj · 2015-06-24T21:29:54Z

@jiahao, is this the same failure you were seeing in #45?

jiahao · 2015-06-25T22:13:04Z

It's very similar. The MD5 hashes agree for both the cached and original versions of UnicodeData.txt: 3a83069e69e2a9101dc4749593cd3268. I cannot reproduce it locally.
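
For anyone who wants to repeat the hash comparison, a minimal Ruby sketch is shown here. The helper names `md5_of` and `same_md5?` are made up for illustration, and the file paths in the usage note are placeholders, not the paths used by the build.

```ruby
require "digest/md5"

# Returns the MD5 hex digest of the file at `path`.
def md5_of(path)
  Digest::MD5.file(path).hexdigest
end

# True when both files have identical MD5 digests.
def same_md5?(path_a, path_b)
  md5_of(path_a) == md5_of(path_b)
end
```

Calling `same_md5?("cached/UnicodeData.txt", "UnicodeData.txt")` with the two copies would confirm whether the inputs really are byte-identical.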

jiahao · 2015-06-25T22:15:02Z

My ruby version is ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]

jiahao · 2015-06-25T22:31:45Z

Looks like we are seeing ruby version-specific behavior. I get a different utf8proc_data.c output on ruby 2.0.0p481 (2014-05-08 revision 45883) [universal.x86_64-darwin14] with the same UnicodeData.txt. Our Travis build uses ruby-1.9.3-p551.

stevengj · 2015-06-25T23:07:47Z

I'm using Ruby 2.0.0p481.

Perhaps the culprit is the last few lines of data_generator.rb:

$stdout << "const utf8proc_int32_t utf8proc_combinations[] = {\n  "
i = 0
comb1st_indicies.keys.each_index do |a|
  comb2nd_indicies.keys.each_index do |b|
    i += 1
    if i == 8
      i = 0
      $stdout << "\n  "
    end
    $stdout << ( comb_array[a][b] or -1 ) << ", "
  end
end
$stdout << "};\n\n"

It looks like the output order could depend on the order of the keys in a hash table. Probably we should just sort them.
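
The suggestion to sort the keys could look roughly like the sketch below. It is not the change that was committed: `combinations_table` is a hypothetical helper, and it assumes (unlike data_generator.rb) that the combinations are stored in a hash keyed by `[first, second]` pairs. The point is only that iterating over `keys.sort` makes the output independent of the order in which keys were inserted.

```ruby
# Builds the C array text in an order that depends only on the key values,
# not on the insertion order of the hashes.
# first_index, second_index: hashes whose keys are codepoints (assumed layout)
# combos: hash mapping [first, second] pairs to a combined codepoint (assumed layout)
def combinations_table(first_index, second_index, combos)
  cells = []
  first_index.keys.sort.each do |a|
    second_index.keys.sort.each do |b|
      # Use -1 where no combination exists, as in the original snippet.
      cells << (combos[[a, b]] || -1)
    end
  end

  # Eight entries per line, each line indented by two spaces.
  rows = cells.each_slice(8).map { |row| "  " + row.join(", ") + "," }
  "const utf8proc_int32_t utf8proc_combinations[] = {\n" + rows.join("\n") + "\n};\n"
end
```

With this approach, two hashes that hold the same keys but were filled in a different order produce byte-identical output.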

ScottPJones · 2015-06-25T23:09:19Z

Maybe we should just rewrite that in a better language? 😀 I happen to know a very nice one!

stevengj · 2015-06-26T15:01:51Z

Nope, that wasn't it.

jiahao · 2015-06-26T15:26:29Z

I don't think the current Unicode data file is sorted

stevengj · 2015-06-26T16:44:45Z

I regenerated the utf8proc_data.c file and it didn't change for me...

stevengj · 2015-06-26T18:29:13Z

Okay, whatever @jiahao did seems to have worked.

fix #46 (make sure symbol-like codepoints have nonzero width even if they aren't in Unifont)

6a7f92d

sort keys to try to eliminate data dependence on Ruby version

eefdaed

jiahao mentioned this pull request Jun 26, 2015

Try again to update Unicode 8 data #49

Merged

stevengj merged commit eefdaed into master Jun 26, 2015

stevengj deleted the more_widths branch June 27, 2015 14:06
