Test: UTF-8 vs latin-1 regression #140

Merged
merged 5 commits into from
Apr 23, 2014


joedevivo
Contributor

We need to determine whether Riak 1.4's app.config and vm.args files can accept UTF-8 values or are restricted to latin-1. Multi-backend bucket names are one example.

If Riak 1.4 can accept UTF-8 values, cuttlefish needs to as well. If it can't, it would be desirable for cuttlefish to detect non-latin-1 files and print an error message, but that might be a 2.0.1 fix.

@joedevivo
Contributor Author

I tried the following values for platform_data_dir in Riak 1.4.8:

./dataŒ
./dataŸ

which parsed fine, but created the following directories:

drwxr-xr-x   3 joe  staff   102 Apr  2 13:18 dataÅ?
drwxr-xr-x   3 joe  staff   102 Apr  2 13:18 dataŸ

The dataŒ directory came out as dataÅ?, which seems bad.
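The sketch below shows one plausible explanation of the mangled name. It assumes that 1.4 reads the app.config bytes as latin-1 and that the filesystem (macOS here) stores names as UTF-8. Under those assumptions, the two UTF-8 bytes of Œ (0xC5 0x92) become two separate latin-1 characters: Å and the C1 control U+0092. Each of those is then re-encoded as UTF-8 when the directory is created, and ls renders the non-printable U+0092 as "?". The helper name is hypothetical.

```python
def latin1_roundtrip(name: str) -> bytes:
    """Hypothetical model of the 1.4 path (an assumption, not Riak code):
    UTF-8 file bytes are read as latin-1 characters, then re-encoded
    as UTF-8 when the directory name is handed to the filesystem."""
    file_bytes = name.encode("utf-8")              # assumed: config file saved as UTF-8
    as_latin1_chars = file_bytes.decode("latin-1")  # one character per byte
    return as_latin1_chars.encode("utf-8")          # bytes that end up on disk

written = latin1_roundtrip("./dataŒ")
print(written)                        # b'./data\xc3\x85\xc2\x92'
print(repr(written.decode("utf-8")))  # './dataÅ\x92'
```

Plain ASCII names pass through unchanged. Only bytes at or above 0x80 get doubled up this way.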

As for vm.args, with this node name:

## Name of the riak node
-name riakŒ@127.0.0.1

and the node won't even start:

➜  riak-1.4.8  ./bin/riak console
config is OK
Exec: /Users/joe/Downloads/riak-1.4.8/bin/../erts-5.9.1/bin/erlexec -boot /Users/joe/Downloads/riak-1.4.8/bin/../releases/1.4.8/riak              -config /Users/joe/Downloads/riak-1.4.8/bin/../etc/app.config             -pa /Users/joe/Downloads/riak-1.4.8/bin/../lib/basho-patches             -args_file /Users/joe/Downloads/riak-1.4.8/bin/../etc/vm.args -- console
Root: /Users/joe/Downloads/riak-1.4.8/bin/..
{error_logger,{{2014,4,2},{13,24,58}},"Invalid node name: ~p~n",['riak?\222@127.0.0.1']}
{error_logger,{{2014,4,2},{13,24,58}},crash_report,[[{initial_call,{net_kernel,init,['Argument__1']}},{pid,<0.20.0>},{registered_name,[]},{error_info,{exit,{error,badarg},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,320}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}},{ancestors,[net_sup,kernel_sup,<0.10.0>]},{messages,[]},{links,[<0.17.0>]},{dictionary,[{longnames,true}]},{trap_exit,true},{status,running},{heap_size,987},{stack_size,24},{reductions,518}],[]]}
{error_logger,{{2014,4,2},{13,24,58}},supervisor_report,[{supervisor,{local,net_sup}},{errorContext,start_error},{reason,{'EXIT',nodistribution}},{offender,[{pid,undefined},{name,net_kernel},{mfargs,{net_kernel,start_link,[['riak?\222@127.0.0.1',longnames]]}},{restart_type,permanent},{shutdown,2000},{child_type,worker}]}]}
{error_logger,{{2014,4,2},{13,24,58}},supervisor_report,[{supervisor,{local,kernel_sup}},{errorContext,start_error},{reason,shutdown},{offender,[{pid,undefined},{name,net_sup},{mfargs,{erl_distribution,start_link,[]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]}]}
{error_logger,{{2014,4,2},{13,24,58}},std_info,[{application,kernel},{exited,{shutdown,{kernel,start,[normal,[]]}}},{type,permanent}]}
{"Kernel pid terminated",application_controller,"{application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}}"}

Crash dump was written to: ./log/erl_crash.dump
Kernel pid terminated (application_controller) ({application_start_failure,kernel,{shutdown,{kernel,start,[normal,[]]}}})

@joedevivo
Contributor Author

## Cookie for distributed erlang.  All nodes in the same cluster
## should use the same cookie or they will not be able to communicate.
-setcookie riakŒ

(riak@127.0.0.1)1> erlang:get_cookie().
'riakÅ\222'

It looks like UTF-8 characters outside latin-1 are accepted in the Erlang cookie (it comes back as the raw UTF-8 bytes, 'riakÅ\222'), and I can join two nodes with the above cookie with no problem.
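One reading of the 'riakÅ\222' output is that the cookie atom holds the UTF-8 bytes of Œ (0xC5, 0x92) as latin-1 characters; \222 is octal for 0x92. Two nodes then agree because they read byte-identical files. The hedged Python sketch below uses a hypothetical function name. It shows that the cookie would not match if one node's vm.args were saved in CP1252 instead, where Œ is the single byte 0x8C. Nodes with different cookies would not be able to connect.

```python
def cookie_codepoints(text: str, file_encoding: str) -> list:
    """Hypothetical model: the file is saved in `file_encoding`, and the VM
    turns each byte into one latin-1 character of the cookie atom."""
    return list(text.encode(file_encoding))

utf8_cookie = cookie_codepoints("riakŒ", "utf-8")
cp1252_cookie = cookie_codepoints("riakŒ", "cp1252")
print(utf8_cookie)    # [114, 105, 97, 107, 197, 146]
print(cp1252_cookie)  # [114, 105, 97, 107, 140]
print(oct(146))       # 0o222, the \222 in 'riakÅ\222'
```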

@joedevivo
Contributor Author

Here's a list of characters that seem good for testing this:

The CP1252 characters that are not part of ANSI/ISO 8859-1, and that should therefore always be encoded as Unicode characters greater than 255, are the following:

 Windows   Unicode    Char.
  char.   HTML code   test         Description of Character
  -----     -----     ---          ------------------------
ALT-0130   &#8218;   ‚    Single Low-9 Quotation Mark
ALT-0131   &#402;    ƒ    Latin Small Letter F With Hook
ALT-0132   &#8222;   „    Double Low-9 Quotation Mark
ALT-0133   &#8230;   …    Horizontal Ellipsis
ALT-0134   &#8224;   †    Dagger
ALT-0135   &#8225;   ‡    Double Dagger
ALT-0136   &#710;    ˆ    Modifier Letter Circumflex Accent
ALT-0137   &#8240;   ‰    Per Mille Sign
ALT-0138   &#352;    Š    Latin Capital Letter S With Caron
ALT-0139   &#8249;   ‹    Single Left-Pointing Angle Quotation Mark
ALT-0140   &#338;    Œ    Latin Capital Ligature OE
ALT-0145   &#8216;   ‘    Left Single Quotation Mark
ALT-0146   &#8217;   ’    Right Single Quotation Mark
ALT-0147   &#8220;   “    Left Double Quotation Mark
ALT-0148   &#8221;   ”    Right Double Quotation Mark
ALT-0149   &#8226;   •    Bullet
ALT-0150   &#8211;   –    En Dash
ALT-0151   &#8212;   —    Em Dash
ALT-0152   &#732;    ˜    Small Tilde
ALT-0153   &#8482;   ™    Trade Mark Sign
ALT-0154   &#353;    š    Latin Small Letter S With Caron
ALT-0155   &#8250;   ›    Single Right-Pointing Angle Quotation Mark
ALT-0156   &#339;    œ    Latin Small Ligature OE
ALT-0159   &#376;    Ÿ    Latin Capital Letter Y With Diaeresis
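A list like the one above can be regenerated with Python's built-in cp1252 codec. The sketch walks bytes 0x80 to 0x9F, skips the five bytes that codec leaves unassigned, and keeps the characters whose code point is above 255, meaning they are outside ISO 8859-1. Besides the 24 characters listed, it also yields three that the list omits: € (ALT-0128, &#8364;), Ž (ALT-0142, &#381;) and ž (ALT-0158, &#382;).

```python
def cp1252_beyond_latin1():
    """Yield (byte, char, code point) for every CP1252 byte in 0x80-0x9F
    whose character lies outside ISO 8859-1 (code point > 255)."""
    for b in range(0x80, 0xA0):
        try:
            ch = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # 0x81, 0x8D, 0x8F, 0x90, 0x9D are unassigned in this codec
        if ord(ch) > 255:
            yield b, ch, ord(ch)

for b, ch, cp in cp1252_beyond_latin1():
    print(f"ALT-0{b}   &#{cp};   {ch}")
```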

@joedevivo joedevivo self-assigned this Apr 17, 2014
@joedevivo
Contributor Author

In riak.conf on 2.0, this setting gives the following generated value, data directory and startup crash:

##
## Default: ./data
##
## Acceptable values:
##   - the path to a directory
platform_data_dir = ./dataŒ
{platform_data_dir,[46,47,100,97,116,97,338]},
drwxr-xr-x   4 joe  staff   136 Apr 22 08:44 dataŒ
2014-04-22 08:44:48.167 [warning] <0.216.0>@riak_core_ring_manager:reload_ring:355 No ring file available.
2014-04-22 08:44:48.261 [error] <0.222.0> CRASH REPORT Process <0.222.0> with 0 neighbours exited with reason: bad argument in call to eleveldb:destroy([46,47,100,97,116,97,338,47,99,108,117,115,116,101,114,95,109,101,116,97,47,116,114,101,101,115], []) in hashtree:destroy/1 line 262 in gen_server:init_it/6 line 328
2014-04-22 08:44:48.261 [error] <0.202.0> Supervisor riak_core_sup had child riak_core_metadata_hashtree started with riak_core_metadata_hashtree:start_link() at undefined exit with reason bad argument in call to eleveldb:destroy([46,47,100,97,116,97,338,47,99,108,117,115,116,101,114,95,109,101,116,97,47,116,114,101,101,115], []) in hashtree:destroy/1 line 262 in context start_error
2014-04-22 08:44:48.264 [error] <0.200.0> CRASH REPORT Process <0.200.0> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,riak_core_metadata_hashtree,{badarg,[{eleveldb,destroy,[[46,47,100,97,116,97,338,47,99,108,117,115,116,101,114,95,109,101,116,97,47,116,114,101,101,115],[]],[]},{hashtree,destroy,1,[{file,"src/hashtree.erl"},{line,262}]},{hashtree_tree,create_node,2,[{file,"src/hashtree_tree.erl"},{line,457}]},{hashtree_tree,new,2,[{file,"src/hashtree_tree.erl"},{line,187}]},{riak_core_metadata_hashtree,init,1,[{file,"src/riak_core_metadata_hashtree.erl"},{line,169}]},{gen_server,...},...]}}},...} in application_master:init/4 line 133
2014-04-22 08:44:48.265 [info] <0.7.0> Application riak_core exited with reason: {{shutdown,{failed_to_start_child,riak_core_metadata_hashtree,{badarg,[{eleveldb,destroy,[[46,47,100,97,116,97,338,47,99,108,117,115,116,101,114,95,109,101,116,97,47,116,114,101,101,115],[]],[]},{hashtree,destroy,1,[{file,"src/hashtree.erl"},{line,262}]},{hashtree_tree,create_node,2,[{file,"src/hashtree_tree.erl"},{line,457}]},{hashtree_tree,new,2,[{file,"src/hashtree_tree.erl"},{line,187}]},{riak_core_metadata_hashtree,init,1,[{file,"src/riak_core_metadata_hashtree.erl"},{line,169}]},{gen_server,...},...]}}},...}

[46,47,100,97,116,97,338,47,99,108,117,115,116,101,114,95,109,101,116,97,47,116,114,101,101,115] = "./dataŒ/cluster_meta/trees"
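The failing argument contains code point 338 (Œ), which has no latin-1 representation. A hedged reading of the badarg is that eleveldb:destroy/2 only accepts characters that fit in a single byte. The Python sketch below decodes the list from the log and locates the offending character; the helper name is hypothetical.

```python
def non_latin1_chars(codepoints):
    """Return (index, char) pairs for code points latin-1 cannot encode (> 255)."""
    return [(i, chr(cp)) for i, cp in enumerate(codepoints) if cp > 255]

# Argument copied from the crash report above.
path = [46, 47, 100, 97, 116, 97, 338, 47, 99, 108, 117, 115, 116, 101,
        114, 95, 109, 101, 116, 97, 47, 116, 114, 101, 101, 115]
print("".join(map(chr, path)))  # ./dataŒ/cluster_meta/trees
print(non_latin1_chars(path))   # [(6, 'Œ')]
```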

@joedevivo
Contributor Author

distributed_cookie = riakŒ
09:01:41.482 [info] /Users/joe/Downloads/riak-ee-2.0.0beta1/bin/../etc/advanced.config detected, overlaying proplists
escript: exception error: bad argument
  in function  io_lib:format/2
     called as io_lib:format("~s ~s",['-setcookie',[114,105,97,107,338]])
  in call from cuttlefish_vmargs:stringify_line/2 (src/cuttlefish_vmargs.erl, line 17)
  in call from cuttlefish_vmargs:'-stringify/1-lc$^0/1-0-'/1 (src/cuttlefish_vmargs.erl, line 13)
  in call from cuttlefish_vmargs:'-stringify/1-lc$^0/1-0-'/1 (src/cuttlefish_vmargs.erl, line 13)
  in call from cuttlefish_escript:engage_cuttlefish/1 (src/cuttlefish_escript.erl, line 359)
  in call from cuttlefish_escript:generate/1 (src/cuttlefish_escript.erl, line 235) 

@joedevivo
Contributor Author

So, as far as the data dir goes: 1.4 creates the wrong name but still starts; 2.0 creates the right name but Riak can't start.

@joedevivo
Contributor Author

➜  riak git:(develop) ✗ ./bin/riak console
13:14:28.422 [error] /Users/joe/dev/basho/riak_ee/rel/riak/bin/../etc/riak.conf: Error converting value on line #200 to latin1
Error generating config with cuttlefish
  run `riak config generate -l debug` for more information.
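With the fix, the bad value is rejected up front with a readable error instead of crashing later. The Python sketch below illustrates that kind of check; it is not cuttlefish's actual implementation. It assumes simplified riak.conf syntax (key = value, with # starting a comment), and the function name find_non_latin1_lines is hypothetical. It reports 1-based line numbers in the same style as the error above.

```python
def find_non_latin1_lines(conf_text: str):
    """Return 1-based numbers of lines whose value cannot be encoded as latin-1.
    Assumes simplified syntax: 'key = value', and '#' starts a comment."""
    bad = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        body = line.split("#", 1)[0]   # drop comments (simplified)
        if "=" not in body:
            continue                   # blank or comment-only line
        value = body.split("=", 1)[1].strip()
        try:
            value.encode("latin-1")
        except UnicodeEncodeError:
            bad.append(lineno)
    return bad

conf = "## Default: ./data\nplatform_data_dir = ./dataŒ\nnodename = riak@127.0.0.1\n"
for n in find_non_latin1_lines(conf):
    print(f"Error converting value on line #{n} to latin1")
```

Values such as "é" (U+00E9) still pass, because they fit in latin-1. Only characters above 255 are flagged.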

@joedevivo
Contributor Author

EUnit is failing on the builder but passes locally. Will investigate.

@joedevivo
Contributor Author

All good now; I hadn't checked in the test fixtures.

@@ -48,7 +48,15 @@ key <- head:word tail:("." word)* %{
 %};
 
 %% A value is any character, with trailing whitespace stripped.
-value <- (!((ws* crlf) / comment) .)+ `unicode:characters_to_list(iolist_to_binary(Node))`;
+value <- (!((ws* crlf) / comment) .)+ %{
+case
Contributor


What's with the weird line-wrapping and indentation here?

-- whitespace fixes in conf_parse.peg
-- removed no longer applicable comment in favor of something more accurate
-- added utf8 unit test to conf_parse.peg
@seancribbs
Contributor

👍 8ef24f7

borshop added a commit that referenced this pull request Apr 23, 2014
Test: UTF-8 vs latin-1 regression

Reviewed-by: seancribbs
@joedevivo
Contributor Author

@borshop merge

@borshop borshop merged commit 8ef24f7 into develop Apr 23, 2014
@joedevivo joedevivo deleted the bugfix/jd/utf-hate branch April 23, 2014 15:24
3 participants