Internationalization (I18N, UTF 8 support)

buttermilk-crypto edited this page Jul 4, 2016 · 10 revisions

A Worked Example

Based on YouTube clicks, direct UTF-8 support in properties files is the most searched-for feature related to Bracket Properties. So here are some worked examples of the kinds of things you can do with version 2.0.

In the earlier work (1.3.6 and below) I tried to make .properties files function as UTF-8 encoded files. After reflection (well, 5 years) I think it is best to leave .properties meaning what it always has: the ASCII-encoded, Unicode-escaped format. We can better support internationalization by using explicitly UTF-8 encoded files where those are desired:

file: app_ja.properties
email=E\u30e1\u30fc\u30eb
userid=\u30e6\u30fc\u30b6\u30fcID
password=\u30d1\u30b9\u30ef\u30fc\u30c9
login=\u30ed\u30b0\u30a4\u30f3

file: app_ja.utf8
email=Eメール
userid=ユーザーID
password=パスワード
login=ログイン

It is possible to detect a file's encoding dynamically, but that was too much work; it is better to manage this yourself. The code to load these files is simple, and you keep complete control over the encoding:

File propsFile = new File("app_ja.properties");
InputAdapter in = new InputAdapter();
in.readFile(propsFile, StandardCharsets.ISO_8859_1); // ISO-8859-1 (an ASCII superset) is the legacy .properties encoding
System.out.println(in.props.toString());

This will output:

{email=E\u30e1\u30fc\u30eb, userid=\u30e6\u30fc\u30b6\u30fcID, password=\u30d1\u30b9\u30ef\u30fc\u30c9, login=\u30ed\u30b0\u30a4\u30f3}

Now you can explicitly convert this to native encoding for use:

// clone to native (UTF-8)
Properties nativeEncoded = in.props.asciiToNative();
System.err.println(nativeEncoded.toString());

Output:

{email=Eメール, userid=ユーザーID, password=パスワード, login=ログイン}
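To see what this conversion amounts to without Bracket Properties on the classpath, here is a minimal sketch using only the JDK. It relies on `java.util.Properties.load(Reader)`, which decodes Unicode escapes as it parses. The class name `EscapeDecodingSketch` and the helper `valueOf` are made up for this illustration; this is not how `asciiToNative()` is implemented, only a demonstration that the escaped and native forms carry the same text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// JDK-only illustration; this is java.util.Properties, not the Bracket Properties class.
class EscapeDecodingSketch {

    // Parse .properties text and return one value.
    // The JDK loader turns Unicode escapes into real characters while reading.
    static String valueOf(String text, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String escaped = "login=\\u30ed\\u30b0\\u30a4\\u30f3\n"; // as in app_ja.properties
        String nativeText = "login=ログイン\n";                   // as in app_ja.utf8
        // Both forms yield the same four-character value.
        System.out.println(valueOf(escaped, "login").equals(valueOf(nativeText, "login")));
    }
}
```

If the two files really are equivalent, the comparison prints `true`.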

The reverse process works just as well with a UTF-8 encoded file (do not use a BOM):

File propsFile = new File("app_ja.utf8");
InputAdapter in = new InputAdapter();
in.readFile(propsFile, StandardCharsets.UTF_8); // use native java (international) encoding
System.out.println(in.props.toString());

Output is:

{email=Eメール, userid=ユーザーID, password=パスワード, login=ログイン}
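On the advice above to avoid a BOM: Java's UTF-8 decoder does not strip a leading byte order mark, so it would show up as an invisible U+FEFF character glued to the first key. If you are unsure whether an editor added one, a check along these lines can help. This is a standalone sketch; `BomCheckSketch` and `startsWithUtf8Bom` are hypothetical names and not part of Bracket Properties.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class BomCheckSketch {

    // A UTF-8 byte order mark is the three-byte sequence EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BomCheckSketch <file>");
            return;
        }
        // Reads the whole file, which is fine for small properties files.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(startsWithUtf8Bom(data) ? "BOM present" : "no BOM");
    }
}
```

Running it against a file such as app_ja.utf8 before loading tells you whether the file needs to be re-saved without the mark.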

You can also make legacy tooling such as ResourceBundles work by converting native encoding into ASCII:

// clone to ASCII (unicode escapes); assumes nativeToAscii() is the counterpart of asciiToNative()
Properties asciiEncoded = in.props.nativeToAscii();
System.err.println(asciiEncoded.toString());

Output is:

{email=E\u30e1\u30fc\u30eb, userid=\u30e6\u30fc\u30b6\u30fcID, password=\u30d1\u30b9\u30ef\u30fc\u30c9, login=\u30ed\u30b0\u30a4\u30f3}

Or you can use the OutputFormat API to write natively encoded properties out as ASCII:

StringWriter writer = new StringWriter();
try {
    new OutputAdapter(in.props).writeTo(writer, new AsciiOutputFormat());
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(writer.toString());

Output is:

#;; charset=ISO-8859-1
#;; generated=2016-07-04T15:40:12.917+1000

email=E\u30e1\u30fc\u30eb
userid=\u30e6\u30fc\u30b6\u30fcID
password=\u30d1\u30b9\u30ef\u30fc\u30c9
login=\u30ed\u30b0\u30a4\u30f3

#;; eof
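For readers curious what the native-to-ASCII direction involves, the escaping itself is small enough to sketch by hand. The following is an illustration only, with made-up names (`NativeToAsciiSketch`, `escape`); it is not the code behind `AsciiOutputFormat`, and it ignores the other characters a real .properties writer must escape, such as backslashes and line breaks.

```java
class NativeToAsciiSketch {

    // Keep 7-bit ASCII characters as they are and write every other UTF-16
    // code unit as a backslash-u escape with four lowercase hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The "email" value from app_ja.utf8
        System.out.println(escape("Eメール"));
    }
}
```

Because it works on UTF-16 code units, a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is the form the .properties format expects.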

Regarding what file extension to use, there is nothing to stop you from using .properties with UTF-8 encoded files. It is just a naming convention and the right thing to do depends on who or what else might be consuming the files.

The ResourceFinder Class

ResourceFinder is the Bracket approach to what the ResourceBundle mechanism does. Here's an example:

// legacy ResourceBundle encoding approach using classpath-embedded resource files
ResourceFinder rf = new ResourceFinder("ibmresbundle/app", Locale.JAPANESE, ".properties");
Properties props = rf.locate();
String email = props.get("email"); // still in unicode escape format
Assert.assertEquals("E\\u30e1\\u30fc\\u30eb", email);
	
// to make them usable, encode Properties to UTF-8 with asciiToNative()
Properties nativeImpl = props.asciiToNative(); // makes a clone in native format
email = nativeImpl.get("email");
Assert.assertEquals("Eメール", email);

Using legacy files requires an encoding conversion as an explicit step. However, using UTF-8 encoded files does not:

ResourceFinder rf = new ResourceFinder("ibmresbundle/app", Locale.JAPANESE, ".utf8");
Properties props = rf.locate();
String email = props.get("email"); // UTF-8 encoded
Assert.assertEquals("Eメール", email);

rf = new ResourceFinder("ibmresbundle/app", Locale.KOREAN, ".utf8");
props = rf.locate();
email = props.get("email"); // UTF-8 encoded
Assert.assertEquals("이메일", email);

This allows you to avoid the tedious unicode escape conversion step which would normally be required.
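The file names used on this page (app_ja.properties, app_ja.utf8) follow the same base name plus locale plus extension pattern that the JDK's ResourceBundle uses. How ResourceFinder resolves names internally is not shown here, so the sketch below is only an assumption-laden illustration of that naming convention using the JDK's own `ResourceBundle.Control`; the class name `ResourceNameSketch` and the helper `resourceName` are made up.

```java
import java.util.Locale;
import java.util.ResourceBundle;

class ResourceNameSketch {

    // Build a locale-specific resource name the way the JDK does.
    // Note: the JDK method takes the suffix without a leading dot.
    static String resourceName(String baseName, Locale locale, String suffix) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        String bundleName = control.toBundleName(baseName, locale); // e.g. app + "_ja"
        return control.toResourceName(bundleName, suffix);
    }

    public static void main(String[] args) {
        System.out.println(resourceName("ibmresbundle/app", Locale.JAPANESE, "utf8"));
        System.out.println(resourceName("ibmresbundle/app", Locale.KOREAN, "utf8"));
    }
}
```

Under this convention the Japanese and Korean lookups above would correspond to ibmresbundle/app_ja.utf8 and ibmresbundle/app_ko.utf8 on the classpath.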