issue-2 - exception while sorting wide characters is fixed
CaballerosTeam committed Jun 12, 2018
1 parent 0c2d082 commit 9685403
Showing 7 changed files with 69 additions and 8 deletions.
MANIFEST: 1 addition, 0 deletions
@@ -6,6 +6,7 @@ t/Sort-Naturally-XS.t
t/Sort-Naturally-XS-ncmp.t
t/Sort-Naturally-XS-nsort.t
t/Sort-Naturally-XS-sorted.t
+t/Sort-Naturally-XS-wchar.t
Changes
const-c.inc
const-xs.inc
README: 1 addition, 1 deletion
@@ -22,7 +22,7 @@ Sergey Yurzin, [jurzin.s@gmail.com](mailto:jurzin.s@gmail.com)

COPYRIGHT AND LICENSE

-Copyright (C) 2017 by Sergey Yurzin
+Copyright (C) 2018 by Sergey Yurzin

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.18.2 or,
README.md: 14 additions, 1 deletion
@@ -160,6 +160,19 @@ argument:
# $result_ca contains A, a, B, b, C, c
```

Also, make sure your list does not contain "wide characters"; otherwise, a "Wide character in subroutine entry" exception
will be thrown. Be vigilant if `use utf8` is in effect or your source code contains multibyte characters: it is the
developer's responsibility to explicitly encode characters in the target encoding:

```perl
use utf8;
use Encode;
use Sort::Naturally::XS qw/sorted/;

my $fruits = [qw/яблоко банан манго киви груша/];
my $result = sorted([map {Encode::encode('utf8', $_)} @{$fruits}], locale => 'ru_RU.utf8');
```

Note: due to the complexity of cross-platform support, locale-aware sorting is guaranteed only on Unix-like operating
systems.

@@ -228,7 +241,7 @@ Sergey Yurzin, [jurzin.s@gmail.com](mailto:jurzin.s@gmail.com)

## COPYRIGHT AND LICENSE

-Copyright (C) 2017 by Sergey Yurzin
+Copyright (C) 2018 by Sergey Yurzin

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.18.2 or,
XS.xs: 4 additions, 4 deletions
@@ -13,16 +13,16 @@
static I32
S_sv_ncmp(pTHX_ SV *a, SV *b)
{
-    const char *ia = (const char *) SvPVbyte_nolen(a);
-    const char *ib = (const char *) SvPVbyte_nolen(b);
+    const char *ia = (const char *) SvPVutf8_nolen(a);
+    const char *ib = (const char *) SvPVutf8_nolen(b);
    return _ncmp(ia, ib, 0, 0);
}

static I32
S_sv_ncmp_reverse(pTHX_ SV *a, SV *b)
{
-    const char *ia = (const char *) SvPVbyte_nolen(a);
-    const char *ib = (const char *) SvPVbyte_nolen(b);
+    const char *ia = (const char *) SvPVutf8_nolen(a);
+    const char *ib = (const char *) SvPVutf8_nolen(b);
    return _ncmp(ia, ib, 1, 0);
}

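Why switching `S_sv_ncmp` and `S_sv_ncmp_reverse` to `SvPVutf8_nolen` removes the exception: `SvPVbyte_nolen` has to return one byte per character, so for a string containing a character above U+00FF (such as U+2603 from issue-2) perl cannot downgrade it and croaks with "Wide character in subroutine entry". `SvPVutf8_nolen` instead returns the string's UTF-8 encoding, upgrading the scalar if necessary. UTF-8 has the property that comparing encoded strings byte by byte, as unsigned values, gives the same order as comparing the code points, so a byte-oriented comparator still orders the text consistently. The C sketch below checks that property for a few code points taken from the new tests. It assumes `_ncmp` compares non-digit bytes as unsigned values, which cannot be confirmed here because its source is not part of this diff. `utf8_encode` and `utf8_byte_order` are hypothetical helpers written for illustration and are not part of the module.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: encode one Unicode code point (1..0x10FFFF) as
   UTF-8 into buf, which must hold at least 5 bytes. Returns the number of
   bytes written (1 to 4) and NUL-terminates buf. NUL itself is excluded
   because the result is used as a C string. */
static size_t utf8_encode(unsigned long cp, char *buf)
{
    unsigned char *p = (unsigned char *) buf;
    size_t n;

    assert(cp > 0 && cp <= 0x10FFFF);
    if (cp < 0x80) {                      /* 0xxxxxxx */
        p[0] = (unsigned char) cp;
        n = 1;
    } else if (cp < 0x800) {              /* 110xxxxx 10xxxxxx */
        p[0] = (unsigned char) (0xC0 | (cp >> 6));
        p[1] = (unsigned char) (0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {            /* 1110xxxx 10xxxxxx 10xxxxxx */
        p[0] = (unsigned char) (0xE0 | (cp >> 12));
        p[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        p[2] = (unsigned char) (0x80 | (cp & 0x3F));
        n = 3;
    } else {                              /* 11110xxx + three continuation bytes */
        p[0] = (unsigned char) (0xF0 | (cp >> 18));
        p[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        p[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        p[3] = (unsigned char) (0x80 | (cp & 0x3F));
        n = 4;
    }
    p[n] = '\0';
    return n;
}

/* Hypothetical helper: order two code points by their UTF-8 bytes, i.e. the
   way a byte-oriented comparator sees strings returned by SvPVutf8_nolen.
   strcmp compares bytes as unsigned char. Returns -1, 0 or 1. */
static int utf8_byte_order(unsigned long a, unsigned long b)
{
    char ea[5], eb[5];
    int r;

    utf8_encode(a, ea);
    utf8_encode(b, eb);
    r = strcmp(ea, eb);
    return (r > 0) - (r < 0);
}
```

For example, `utf8_byte_order('a', 0x2603)` returns -1 because 0x61 is smaller than the lead byte 0xE2 of U+2603, which matches the `abc` before U+2603 order expected by the issue-2 test.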
lib/Sort/Naturally/XS.pm: 12 additions, 1 deletion
@@ -163,6 +163,17 @@ keyword argument:
    my $result_ca = sorted($list, locale => 'en_CA.utf8');
    # $result_ca contains A, a, B, b, C, c

Also, make sure your list does not contain "wide characters"; otherwise, a "Wide character in subroutine entry" exception
will be thrown. Be vigilant if C<use utf8> is in effect or your source code contains multibyte characters: it is the
developer's responsibility to explicitly encode characters in the target encoding:

    use utf8;
    use Encode;
    use Sort::Naturally::XS qw/sorted/;

    my $fruits = [qw/яблоко банан манго киви груша/];
    my $result = sorted([map {Encode::encode('utf8', $_)} @{$fruits}], locale => 'ru_RU.utf8');

Note: due to the complexity of cross-platform support, locale-aware sorting is guaranteed only on Unix-like operating
systems.
@@ -237,7 +248,7 @@ Sergey Yurzin, L<jurzin.s@gmail.com|mailto:jurzin.s@gmail.com>
=head1 COPYRIGHT AND LICENSE

-Copyright (C) 2017 by Sergey Yurzin
+Copyright (C) 2018 by Sergey Yurzin

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.18.2 or,
t/Sort-Naturally-XS-sorted.t: 0 additions, 1 deletion
@@ -109,7 +109,6 @@ SKIP: {
my $ar_ca_local__actual = sorted($ar_en_local, locale => $locale);
ok(eq_array($ar_ca_local__expected, $ar_ca_local__actual), 'Locale CA test');
}
-
}

done_testing();
t/Sort-Naturally-XS-wchar.t: 37 additions, 0 deletions (new file)
@@ -0,0 +1,37 @@
#!/usr/bin/perl

use strict;
use warnings;
use Test::More;
use Sort::Naturally::XS qw/nsort ncmp sorted/;
use List::Util qw/first/;
use utf8;
use Config;

my $ar_mixed_utf8 = [qw/Як-100 Ка-8 Ми-20 Ка-10 Ка-26 Ка-15 Ка-25 Ми-4 Ми-6 Ми-8 Ка-31 Ми-14 Ми-24 Ка-18 Ка-22 Ми-26
Ми-30 Ми-171 Як-24 Як-60 Ка-27 Ка-29 Ка-32 Ка-126 Ми-10 Ми-1/];
my $ar_mixed_utf8__expected = [qw/Ка-8 Ка-10 Ка-15 Ка-18 Ка-22 Ка-25 Ка-26 Ка-27 Ка-29 Ка-31 Ка-32 Ка-126 Ми-1 Ми-4 Ми-6
Ми-8 Ми-10 Ми-14 Ми-20 Ми-24 Ми-26 Ми-30 Ми-171 Як-24 Як-60 Як-100/];
ok(eq_array($ar_mixed_utf8__expected, [nsort(@{$ar_mixed_utf8})]), "Wide characters in input of 'nsort'");

ok(eq_array($ar_mixed_utf8__expected, [sort {ncmp($a, $b)} @{$ar_mixed_utf8}]), "Wide characters in input of 'ncmp'");

ok(eq_array($ar_mixed_utf8__expected, sorted($ar_mixed_utf8)), "Wide characters in input of 'sorted'");

# issue-2 example
{
    no utf8;

    my @issue_2_list = ( qq(\x{2603}), q(abc) );
    my @issue_2_list__expected = ( q(abc), qq(\x{2603}) );
    my @issue_2_list__actual = nsort(@issue_2_list);

    ok(eq_array(\@issue_2_list__expected, \@issue_2_list__actual), "Wide character (not letter) in input of 'nsort'");

    @issue_2_list__actual = sort {ncmp($a, $b)} @issue_2_list;
    ok(eq_array(\@issue_2_list__expected, \@issue_2_list__actual), "Wide character (not letter) in input of 'ncmp'");

    ok(eq_array(\@issue_2_list__expected, sorted(\@issue_2_list)), "Wide character (not letter) in input of 'sorted'");
}

done_testing();
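The expected lists in `t/Sort-Naturally-XS-wchar.t` follow from two rules: runs of digits compare by numeric value, and the remaining UTF-8 bytes compare by value, so the Cyrillic prefixes group as Ка < Ми < Як and `Ка-8` precedes `Ка-10`. Below is a minimal C sketch of a comparator that follows only those two rules. It is a hypothetical, simplified illustration and not the module's `_ncmp`: it does not model the reverse flag (the third argument passed by `S_sv_ncmp_reverse`), the fourth argument, or locale-aware collation. `natural_cmp`, `natural_qsort_cmp` and `natural_sort` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ASCII digit test that does not depend on the current C locale. */
static int is_ascii_digit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* Hypothetical simplified natural comparison (NOT the module's _ncmp):
   runs of ASCII digits compare by numeric value, all other bytes compare
   as unsigned chars, so UTF-8 text orders by code point. */
static int natural_cmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *) a;
    const unsigned char *q = (const unsigned char *) b;

    while (*p && *q) {
        if (is_ascii_digit(*p) && is_ascii_digit(*q)) {
            const unsigned char *ps, *qs;
            size_t lp, lq, i;

            while (*p == '0') p++;              /* ignore leading zeros */
            while (*q == '0') q++;
            ps = p;
            while (is_ascii_digit(*p)) p++;
            qs = q;
            while (is_ascii_digit(*q)) q++;
            lp = (size_t) (p - ps);
            lq = (size_t) (q - qs);
            if (lp != lq)                       /* more significant digits: larger */
                return lp < lq ? -1 : 1;
            for (i = 0; i < lp; i++)            /* same length: first differing digit */
                if (ps[i] != qs[i])
                    return ps[i] < qs[i] ? -1 : 1;
            continue;                           /* equal numbers: keep scanning */
        }
        if (*p != *q)
            return *p < *q ? -1 : 1;
        p++;
        q++;
    }
    if (*p) return 1;    /* b is a prefix of a */
    if (*q) return -1;   /* a is a prefix of b */
    return 0;
}

/* qsort adapter: the array elements are const char pointers. */
static int natural_qsort_cmp(const void *x, const void *y)
{
    return natural_cmp(*(const char *const *) x, *(const char *const *) y);
}

static void natural_sort(const char **items, size_t n)
{
    assert(items != NULL || n == 0);
    qsort(items, n, sizeof *items, natural_qsort_cmp);
}
```

Sorting `{"Як-100", "Ка-8", "Ми-20", "Ка-10", "Ми-1", "Ка-126"}` with `natural_sort` gives Ка-8, Ка-10, Ка-126, Ми-1, Ми-20, Як-100, the same relative order as in `$ar_mixed_utf8__expected`.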
