korean encoding issue #9
It does not make sense to add this in the function.
Yes, that's correct, you need to make sure x is in UTF-8 encoding, that's what the doc of udpipe_annotate indicates. So the second example is how you should do it.
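As a minimal sketch of that advice, the snippet below converts a character vector to UTF-8 with base R before it would be handed to udpipe_annotate. The sample text and the variable names are assumptions made for illustration, and the annotation call is left as a comment because it needs a loaded model (called `model` here, also an assumption).

```r
# Sample Korean text written with Unicode escapes so the script itself stays ASCII.
# This value is illustrative only; it is not taken from the thread.
x <- "\uc548\ub155\ud558\uc138\uc694"

# enc2utf8() is base R: it converts strings to UTF-8, taking any declared
# encoding into account, and leaves strings already marked as UTF-8 unchanged.
x_utf8 <- enc2utf8(x)

# Sanity check before annotating: every element must be valid UTF-8.
stopifnot(all(validUTF8(x_utf8)))

# With a model loaded into `model` (assumed), the call would then be:
# annotation <- udpipe_annotate(model, x = x_utf8)
```

The thread's own suggestion, iconv(x, to = "UTF-8"), assumes the bytes of x are in the current locale's encoding; enc2utf8 is used here only because it also respects an encoding mark that is already present.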
@jwijffels Then how about checking whether Encoding(x) != "UTF-8" and printing a warning message such as "Make sure Encoding(x) is UTF-8. If not, try x <- iconv(x, to = "UTF-8") first."?
@jwijffels Anyway, can you show me your sessionInfo()? I tried assigning text on Windows 10, Ubuntu 16.04, and Mac 10.13.2, and on every OS Encoding(x) returns "unknown".
Checking for 'unknown' encoding is not a good solution, as ASCII text always gives encoding 'unknown', so that would generate warnings for every call in all European languages, even on CRAN.
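The point about ASCII can be reproduced in a few lines of base R. The sample strings below are assumptions chosen for illustration, and the validUTF8-based check at the end is only one possible alternative, not something proposed in this thread.

```r
# Pure ASCII text: R never marks it, so Encoding() reports "unknown".
ascii_text <- "The quick brown fox"

# Non-ASCII text written with Unicode escapes, which R marks as UTF-8.
korean_text <- "\ud55c\uad6d\uc5b4"

ascii_encoding <- Encoding(ascii_text)
korean_encoding <- Encoding(korean_text)
print(c(ascii = ascii_encoding, korean = korean_encoding))

# A warning triggered by Encoding(x) != "UTF-8" would therefore fire for
# every plain ASCII input. validUTF8() checks the bytes instead of the mark,
# and ASCII is always valid UTF-8.
valid <- validUTF8(c(ascii_text, korean_text))
print(valid)
```

Checking byte validity avoids the false warnings for ASCII, but it cannot tell which encoding invalid input is actually in, so the caller still has to know the source encoding to convert it.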
Change the locale as follows:
I think we should move this type of discussion to Issues, as the pull requests will give errors on all European Windows machines.
When I tried to annotate Korean text, the encoding of the resulting text was broken.
I fixed it by adding the code below.
I checked on Windows and Ubuntu 16.04.
windows
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=Korean_Korea.949 LC_CTYPE=Korean_Korea.949 LC_MONETARY=Korean_Korea.949
[4] LC_NUMERIC=C LC_TIME=Korean_Korea.949
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] udpipe_0.3 RevoUtils_10.0.6 RevoUtilsMath_10.0.1
loaded via a namespace (and not attached):
[1] compiler_3.4.2 Matrix_1.2-11 tools_3.4.2 yaml_2.1.14
[5] Rcpp_0.12.13 grid_3.4.2 data.table_1.10.4-2 lattice_0.20-35
ubuntu
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] udpipe_0.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 lattice_0.20-35 digest_0.6.13 withr_2.1.1
[5] grid_3.4.3 R6_2.2.2 git2r_0.20.0 httr_1.3.1
[9] curl_3.0 data.table_1.10.4-3 Matrix_1.2-12 devtools_1.13.4
[13] tools_3.4.3 yaml_2.1.16 compiler_3.4.3 memoise_1.1.0
[17] knitr_1.17