NOTE "marked UTF-8 string in data" should be addressed #1663

dankelley · 2020-02-24T12:28:02Z

Although I had clean checks prior to CRAN submission, I am seeing at

https://cran.r-project.org/web/checks/check_results_oce.html

that some machines running the R-devel version are producing

Check: data for non-ASCII characters
Result: NOTE
     Note: found 1 marked UTF-8 string

so I want to fix that up.

dankelley · 2020-02-24T12:29:18Z

Here's how I located the problem. First, I moved all the data/*.rda files to another directory. Then, I started adding files back into data, doing

tools:::.check_package_datasets(".")

after every move. This can be done by bisection, to avoid having to run the check once per file.

I found that the culprit is data/xbt.rda.
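
The bisection described above can be sketched roughly as follows. This is only an illustration: `bisect_culprit()` and `fake_check()` are hypothetical names that are not part of oce or of R, and the sketch assumes that exactly one file is at fault. In practice the predicate would move the chosen files into data and look at what tools:::.check_package_datasets(".") reports, which is not shown here.

```r
# Find the single item in `items` for which `has_problem()` is TRUE.
# Assumes exactly one culprit, and that has_problem(subset) is TRUE
# whenever `subset` contains that culprit.
bisect_culprit <- function(items, has_problem) {
    stopifnot(length(items) > 0, has_problem(items))
    while (length(items) > 1) {
        # Test the first half; keep whichever half holds the culprit.
        half <- items[seq_len(length(items) %/% 2)]
        if (has_problem(half)) {
            items <- half
        } else {
            items <- setdiff(items, half)
        }
    }
    items
}

# Stand-in predicate for illustration only: pretend that the check
# complains whenever "xbt.rda" is among the files being tested.
files <- c("a.rda", "b.rda", "xbt.rda", "c.rda", "d.rda")
fake_check <- function(subset) "xbt.rda" %in% subset
culprit <- bisect_culprit(files, fake_check)
print(culprit)
```

With n files this needs roughly log2(n) runs of the check rather than n.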

dankelley · 2020-02-24T12:32:24Z

I suspect the problem is that the file from which data/xbt.rda is created (which is create_data/xbt/xbt.edf) has a degree sign in line 34, quoted next.

Depth (m) - Temperature (°C) - Sound Velocity (m/s)

but, actually, it could also be in the dashes that are in the file, since they can cause problems sometimes.

I will look further into the file, and then I'll fix up the code.

dankelley · 2020-02-24T12:36:20Z

An aside: I am aware that this is similar to #1211 but I am recording my steps here as well.

dankelley · 2020-02-24T13:16:11Z

I'm trying to dredge up some memory of dealing with encodings in R (bad memories!) and below is the start of the trail:

> l<-readLines("create_data/xbt/xbt.edf",encoding="UTF-8")
> bad<-which(is.na(iconv(l,"UTF-8","UTF-8")))
> l[bad+seq(-1,1)]
[1] "//"                                                      
[2] "Depth (m) - Temperature (<U+00B0>C) - Sound Velocity (m/s)"
[3] "4.7 20.91 1575.30"                                       
> 
> l<-readLines("create_data/xbt/xbt.edf",encoding="latin1")
> bad<-which(is.na(iconv(l,"UTF-8","UTF-8")))
> l[bad+seq(-1,1)]
[1] "//"                                                    
[2] "Depth (m) - Temperature (<b0>C) - Sound Velocity (m/s)"
[3] "4.7 20.91 1575.30"                                     
>

dankelley · 2020-02-24T13:19:01Z

Oh, I see. The problem is that we are storing the header, and it contains the non-ASCII character (the degree sign):

> xbt <- read.xbt("create_data/xbt/xbt.edf")
> which(is.na(iconv(xbt@metadata$header,"UTF-8","UTF-8")))
[1] 34
> xbt@metadata$header[34]
[1] "Depth (m) - Temperature (<U+00B0>C) - Sound Velocity (m/s)"
>
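
A minimal sketch of another way to spot such strings, under the assumption that the NOTE counts strings whose declared encoding is UTF-8 (that is a reading of the message text, not something confirmed in this thread). The `header` vector below is a made-up stand-in for xbt@metadata$header.

```r
# Made-up stand-in for xbt@metadata$header. The degree sign is written
# as an escape so that this snippet itself stays ASCII.
header <- enc2utf8(c("//",
        "Depth (m) - Temperature (\u00b0C) - Sound Velocity (m/s)",
        "4.7 20.91 1575.30"))

# Encoding() reports the declared encoding of each element. Pure-ASCII
# strings are never marked, so they are reported as "unknown".
marks <- Encoding(header)
marked <- which(marks == "UTF-8")
print(marks)
print(marked)
```

Only the element holding the degree sign carries the UTF-8 mark, which matches the single string mentioned in the NOTE.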

dankelley · 2020-02-24T16:37:49Z

This is now done in "develop" commit ff03b72, and the docs for the function point out how we are altering @metadata$header.
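
This thread does not show what the commit actually does to the header, so the following is only one possible approach and not a description of the oce code: swap the degree sign for an ASCII stand-in so that no string keeps a UTF-8 mark. The replacement text "deg" and the `header` vector are assumptions made for illustration.

```r
# Made-up stand-in for xbt@metadata$header (degree sign as an escape).
header <- enc2utf8(c("//",
        "Depth (m) - Temperature (\u00b0C) - Sound Velocity (m/s)",
        "4.7 20.91 1575.30"))

# Replace the degree sign with plain ASCII text ("deg" is an arbitrary
# choice). fixed = TRUE treats the pattern as a literal string.
clean <- gsub("\u00b0", "deg", header, fixed = TRUE)

# Once every string is pure ASCII, none of them carries an encoding mark.
stopifnot(all(Encoding(clean) == "unknown"))
print(clean[2])
```

A cleaned header no longer matches the source file byte for byte, which is presumably why the change is worth documenting.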

dankelley added the bug label Feb 24, 2020

dankelley self-assigned this Feb 24, 2020

dankelley mentioned this issue Feb 24, 2020

save() used in data creation is messed up #1664

Closed

dankelley added a commit that referenced this issue Feb 24, 2020

update data(xbt) for UTF-8 (issue 1663)

e14be21

See #1663 for details.

dankelley closed this as completed Feb 24, 2020
