I may have found a bug that was introduced in version 0.8.6 (the latest version on CRAN at the time of writing). Text containing special characters triggers an error. The following code reproduces it:
library(udpipe)
library(tm)

# Text data
textData <- data.frame(
  doc_id = 1,
  text = "tradução"
)

# Download and load model
udModel <- udpipe_download_model(language = "portuguese-gsd",
                                 model_dir = getwd())
udModel <- udpipe_load_model('portuguese-gsd-ud-2.5-191206.udpipe')

# Make a corpus
textCorp <- VCorpus(DataframeSource(textData))
text <- lapply(textCorp, content)
text <- data.frame(doc_id = 1:nrow(textData),
                   text = unlist(text))

udpipe(text, object = udModel)
The error is triggered by the letters "çã" in the text (removing them makes the error disappear). I think it comes from line 254 of udpipe/R/udpipe_parse.R (at commit fdcc4cc). Removing fixed = TRUE from that line removes the error. In case it helps, fixed = TRUE was introduced in c7557b6.
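Because the session runs under a French_France.1252 locale, it may be worth checking how the string is encoded before it reaches udpipe. The snippet below is a minimal diagnostic sketch using base R only. It does not call udpipe, and whether re-encoding the input to UTF-8 avoids the error is an untested assumption on my side, not a confirmed workaround.

```r
# The reported text, written with \u escapes so the script does not depend
# on the encoding of the source file ("\u00e7\u00e3" is "çã").
x <- "tradu\u00e7\u00e3o"

# Encoding mark of the string as R sees it.
Encoding(x)

# Whether the current session uses a UTF-8 locale.
l10n_info()[["UTF-8"]]

# Simulate what a latin1 (Windows-1252-like) session might hold:
# the same text with one byte per accented character.
x_latin1 <- iconv(x, from = "UTF-8", to = "latin1")
nchar(x_latin1, type = "bytes")

# Re-encode to UTF-8 (assumption: this would be done before calling udpipe).
x_utf8 <- enc2utf8(x_latin1)
validUTF8(x_utf8)
nchar(x_utf8, type = "chars")
nchar(x_utf8, type = "bytes")
```

If the character count and the byte count differ only for the UTF-8 version, the text contains multi-byte characters, which is the situation where byte-based and character-based matching can give different positions.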
Session info
- Session info ---------------------------------------------------------
 setting  value
 version  R version 4.1.0 (2021-05-18)
 os       Windows 10 x64
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  French_France.1252
 ctype    French_France.1252
 tz       Europe/Paris
 date     2021-10-18

- Packages -------------------------------------------------------------
 package     * version date       lib source
 cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.1)
 data.table    1.14.2  2021-09-27 [1] standard (@1.14.2)
 lattice       0.20-45 2021-09-22 [1] CRAN (R 4.1.1)
 Matrix        1.3-4   2021-06-01 [1] CRAN (R 4.1.0)
 NLP         * 0.2-1   2020-10-14 [1] standard (@0.2-1)
 Rcpp          1.0.7   2021-07-07 [1] standard (@1.0.7)
 rstudioapi    0.13    2020-11-12 [1] standard (@0.13)
 sessioninfo   1.1.1   2018-11-05 [1] standard (@1.1.1)
 slam          0.1-48  2020-12-03 [1] standard (@0.1-48)
 tm          * 0.7-8   2020-11-18 [1] standard (@0.7-8)
 udpipe      * 0.8.6   2021-06-01 [1] standard (@0.8.6)
 withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
 xml2          1.3.2   2020-04-23 [1] CRAN (R 4.1.0)

[1] C:/Users/etienne/Documents/R/R-4.1.0/library
Best,