NOTE "marked UTF-8 string in data" should be addressed #1663

Closed
dankelley opened this issue Feb 24, 2020 · 6 comments

@dankelley

Although I had clean checks prior to CRAN submission, I am seeing at

https://cran.r-project.org/web/checks/check_results_oce.html

that some machines running the R-devel version are producing

Check: data for non-ASCII characters
Result: NOTE
     Note: found 1 marked UTF-8 string 

so I want to fix that up.
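For context, the NOTE counts character strings inside the package data that carry a declared "UTF-8" encoding mark. Below is a minimal R sketch of that kind of count. `count_marked_utf8()` is a hypothetical helper, not part of oce or of the tools package, and it only walks character vectors, lists and S4 slots (it ignores names, attributes and factor levels), so it is a rough approximation of what R CMD check does.

```r
# Hypothetical helper (not in oce or tools): roughly count strings in an
# object that carry a "UTF-8" encoding mark, the thing the NOTE reports.
count_marked_utf8 <- function(x) {
    if (is.character(x)) {
        # ASCII strings are never marked, so only non-ASCII strings
        # declared as UTF-8 are counted.
        return(sum(Encoding(x) == "UTF-8"))
    }
    if (isS4(x)) {
        # Recurse into every slot, e.g. the metadata and data of an oce object.
        return(sum(vapply(methods::slotNames(x),
                          function(s) count_marked_utf8(methods::slot(x, s)),
                          numeric(1))))
    }
    if (is.list(x)) {
        return(sum(vapply(x, count_marked_utf8, numeric(1))))
    }
    0
}
```

Used as, say, `load("data/xbt.rda"); count_marked_utf8(xbt)`, a non-zero result points at the dataset responsible.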

@dankelley dankelley added the bug label Feb 24, 2020
@dankelley dankelley self-assigned this Feb 24, 2020
@dankelley

Here's how I located the problem. First, I moved all the data/*.rda files to another directory. Then, I started adding files back into data, doing

tools:::.check_package_datasets(".")

after every move. This can be done in a bisection fashion, to avoid having to do it once per file.
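The bisection idea can be sketched in R as follows, assuming there is exactly one culprit file. `find_culprit()` and the predicate `has_problem()` are hypothetical: in practice `has_problem()` would copy the given .rda files into data/ and report whether `tools:::.check_package_datasets(".")` flags them, which is not shown here.

```r
# Sketch of bisection over data files, assuming exactly one culprit.
# `has_problem` is a hypothetical predicate: TRUE if the given subset of
# files, placed in data/, triggers the non-ASCII NOTE.
find_culprit <- function(files, has_problem) {
    while (length(files) > 1) {
        half <- files[seq_len(length(files) %/% 2)]
        # Keep whichever half still reproduces the problem.
        files <- if (has_problem(half)) half else setdiff(files, half)
    }
    files
}
```

With n files this needs about log2(n) checks instead of n, e.g. 4 rather than 13 for 13 files.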

I found that the culprit is data/xbt.rda.

@dankelley

I suspect the problem is that the file from which data/xbt.rda is created (create_data/xbt/xbt.edf) has a degree sign in line 34, quoted below:

Depth (m) - Temperature (°C) - Sound Velocity (m/s)

But it could also be the dashes in the file, since dashes sometimes cause problems (for example, when they are typographic dashes rather than ASCII hyphens).

I will look further into the file, and then I'll fix up the code.
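One way to tell a degree sign from a look-alike dash is to list the code points of every non-ASCII character in the suspect line. Here is a minimal R sketch; `non_ascii_chars()` is a hypothetical helper, and it assumes the line is either valid UTF-8 or carries a correct encoding mark (e.g. read with `readLines(..., encoding = "latin1")`), so that `enc2utf8()` can convert it.

```r
# Hypothetical helper: report non-ASCII characters in one string, with
# their positions and Unicode code points (e.g. U+00B0 vs U+2013).
non_ascii_chars <- function(line) {
    codes <- utf8ToInt(enc2utf8(line))   # one code point per character
    idx <- which(codes > 127L)           # anything beyond 7-bit ASCII
    data.frame(position = idx,
               char = intToUtf8(codes[idx], multiple = TRUE),
               code = sprintf("U+%04X", codes[idx]),
               stringsAsFactors = FALSE)
}
```

For line 34 this would show a single entry, U+00B0 at position 26, if the dashes are plain ASCII hyphens.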

@dankelley

An aside: I am aware that this is similar to #1211, but I am recording my steps here as well.

@dankelley

I'm trying to dredge up some memory of dealing with encodings in R (bad memories!), and below is a scent of the trail:

> l<-readLines("create_data/xbt/xbt.edf",encoding="UTF-8")
> bad<-which(is.na(iconv(l,"UTF-8","UTF-8")))
> l[bad+seq(-1,1)]
[1] "//"                                                      
[2] "Depth (m) - Temperature (<U+00B0>C) - Sound Velocity (m/s)"
[3] "4.7 20.91 1575.30"                                       
> 
> l<-readLines("create_data/xbt/xbt.edf",encoding="latin1")
> bad<-which(is.na(iconv(l,"UTF-8","UTF-8")))
> l[bad+seq(-1,1)]
[1] "//"                                                    
[2] "Depth (m) - Temperature (<b0>C) - Sound Velocity (m/s)"
[3] "4.7 20.91 1575.30"                                     
> 
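The `is.na(iconv(l, "UTF-8", "UTF-8"))` trick above finds lines whose bytes are not valid UTF-8. Base R (3.3.0 and later) also has `validUTF8()`, which ignores any declared encoding and checks the bytes directly, so it should flag the same line whichever `encoding` was given to `readLines()`. A minimal sketch, with `invalid_utf8_lines()` as a hypothetical wrapper:

```r
# Hypothetical wrapper: indices of lines whose raw bytes are not valid
# UTF-8 (validUTF8() ignores the encoding mark set by readLines()).
invalid_utf8_lines <- function(lines) which(!validUTF8(lines))
```

Something like `l <- readLines("create_data/xbt/xbt.edf"); l[invalid_utf8_lines(l)]` would then show the offending header line, since a lone 0xB0 byte (latin1 degree sign) is not valid UTF-8.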

@dankelley

Oh, I see. The problem is that we are storing the header, and it contains the non-ASCII degree sign:

> xbt <- read.xbt("create_data/xbt/xbt.edf")
> which(is.na(iconv(xbt@metadata$header,"UTF-8","UTF-8")))
[1] 34
> xbt@metadata$header[34]
[1] "Depth (m) - Temperature (<U+00B0>C) - Sound Velocity (m/s)"
> 

dankelley added a commit that referenced this issue Feb 24, 2020
@dankelley

This is fixed now in the "develop" branch (commit ff03b72), and the documentation for the function explains how @metadata$header is altered.
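The exact change in ff03b72 is not shown here, so the following is only a hypothetical R sketch of one way to make the stored header pure ASCII. It assumes the .edf file is latin1 (as the `<b0>` byte in the transcript above suggests), spells the degree sign out as "deg", and replaces each byte of any other non-ASCII character with "?". `ascii_header()` is an illustrative name, not an oce function.

```r
# Hypothetical sanitizer (not necessarily what ff03b72 does): turn a
# latin1 header into pure ASCII so no encoding mark is stored in the .rda.
ascii_header <- function(header, from = "latin1") {
    # Interpret the bytes in the file's (assumed) encoding, as UTF-8.
    utf8 <- iconv(header, from = from, to = "UTF-8")
    # Spell out the degree sign.
    utf8 <- gsub("\u00b0", "deg", utf8, fixed = TRUE)
    # Replace each byte of any remaining non-ASCII character with "?".
    iconv(utf8, from = "UTF-8", to = "ASCII", sub = "?")
}
```

Since ASCII strings are never given an encoding mark, a header cleaned this way should not trigger the NOTE.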
