Cannot Train POS with Locale Other Than English #45

Open
schrieveslaach opened this Issue Jun 4, 2018 · 2 comments

schrieveslaach commented Jun 4, 2018

If I train the arktweet POS tagger under a non-English locale such as German (cf. the train method), training fails: it writes a file in which decimals use German formatting, so a number like 0.2 is written as 0,2, and the trainer component then cannot load that file because of the comma.
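The failure mode described above can be reproduced in isolation. The sketch below is not taken from the arktweet code base; it assumes the file is written with locale-sensitive formatting and read back with a locale-independent parser such as `Double.parseDouble`, which is only a guess at what the trainer does. The class and method names are hypothetical.

```java
import java.util.Locale;

public class LocaleDecimalDemo {
    // Formats a value with one decimal place using the given locale's conventions.
    // (Assumption: the trainer writes numbers with some locale-sensitive formatter.)
    static String format(Locale locale, double value) {
        return String.format(locale, "%.1f", value);
    }

    // Returns true if Double.parseDouble accepts the text, false if it rejects it.
    static boolean parses(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String german = format(Locale.GERMANY, 0.2);   // uses a decimal comma
        String english = format(Locale.US, 0.2);       // uses a decimal point
        System.out.println(german + " parses: " + parses(german));
        System.out.println(english + " parses: " + parses(english));
    }
}
```

If the trainer behaves like this sketch, any JVM whose default locale uses a decimal comma would produce a file that the loading step rejects.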

brendano (Owner) commented Dec 26, 2018

Hm. Maybe the locale has to be set as a Java flag or property? Or it would be better to use locale-independent code for formatting and parsing floating-point numbers. I have no idea how to do that in Java.
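Both options mentioned here are available in standard Java. As a workaround without code changes, the JVM's default locale can usually be overridden at startup with the system properties `-Duser.language=en -Duser.country=US`, or in code with `Locale.setDefault(Locale.US)`. For a fix in the code itself, `Double.toString` and `Double.parseDouble` never consult the default locale, and `String.format` accepts an explicit `Locale`. The following is a minimal sketch of that second approach; the class and method names are hypothetical and not part of arktweet.

```java
import java.util.Locale;

public class LocaleIndependentNumbers {
    // Double.toString always uses '.' as the decimal separator.
    static String write(double value) {
        return Double.toString(value);
    }

    // For fixed precision, pass an explicit locale instead of relying on the default.
    static String writeFixed(double value) {
        return String.format(Locale.ROOT, "%.4f", value);
    }

    // Double.parseDouble expects '.' and ignores the default locale.
    static double read(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        // Simulate a machine configured for German.
        Locale.setDefault(Locale.GERMANY);

        String plain = write(0.2);
        String fixed = writeFixed(0.2);
        System.out.println(plain + " -> " + read(plain));
        System.out.println(fixed + " -> " + read(fixed));
    }
}
```

Writing and reading with the same locale-independent pair keeps the file format stable no matter where training runs.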

schrieveslaach commented Dec 29, 2018

I agree that a locale-independent format is the way to go. For example, you could always write numbers in English (en-US) format:

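import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Locale;

public class I18NTester {
   public static void main(String[] args) {
      String pattern = "###.##";
      double number = 123.45;

      // Pin the locale explicitly so the output does not depend on the JVM default.
      DecimalFormat decimalFormat = (DecimalFormat) NumberFormat.getNumberInstance(Locale.US);
      decimalFormat.applyPattern(pattern);

      // Prints 123.45 with a decimal point, even on a German-locale machine.
      System.out.println(decimalFormat.format(number));
   }
}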