read.argo() cannot trim leading whitespace #2206

Closed
dankelley opened this issue Apr 26, 2024 · 9 comments
@dankelley
Owner

I'm seeing this in some argo work. We get a warning, so I suppose that may be all we need, but my plan is to look more carefully at those files, to see why this happens, and whether there is a reasonable workaround or better warning message.

Below are a few instances. Notice that this is one particular Argo float.

/Users/kelley/data/argo/argo_summer_project/D4900883_003.nc
Warning in read.argo(file, profile = 1) :
  cannot trim leading/trailing whitespace in metadata$scientificCalibCoefficient
/Users/kelley/data/argo/argo_summer_project/D4900883_004.nc
Warning in read.argo(file, profile = 1) :
  cannot trim leading/trailing whitespace in metadata$scientificCalibCoefficient
/Users/kelley/data/argo/argo_summer_project/D4900883_005.nc
Warning in read.argo(file, profile = 1) :
  cannot trim leading/trailing whitespace in metadata$scientificCalibCoefficient
/Users/kelley/data/argo/argo_summer_project/D4900883_006.nc
Warning in read.argo(file, profile = 1) :
  cannot trim leading/trailing whitespace in metadata$scientificCalibCoefficient
@dankelley
Owner Author

Below is what I get, for the first problematic file (attached). The problem is that read.argo() is assuming this is a single character value, but it is a matrix of character values. Notably, this only comes up in a few of the 1500 or so argo files I'm looking at today, so I don't think there is much point in trying to make the existing code work. All I'm doing is trimming whitespace at the start and end of a character value, because that helps with later steps.

, , 1

     [,1]                                                                                                                                                                                                                                                              
[1,] "                                                                                                                                                                                                                                                                "
[2,] "                                                                                                                                                                                                                                                                "
[3,] "No significant salinity drift detected; r=1.000000                                                                                                                                                                                                              "
     [,2]                                                                                                                                                                                                                                                              
[1,] "                                                                                                                                                                                                                                                                "
[2,] "                                                                                                                                                                                                                                                                "
[3,] "COEFFICIENT r FOR CONDUCTIVITY IS 1.000347, +/- 0.0003095027                                                                                                                                                                                                    "
     [,3]                                                                                                                                                                                                                                                                 
[1,] "ADDITIVE COEFFICIENT FOR PRESSURE ADJUSTMENT IS 0db                                                                                                                                                                                                             "   
[2,] "                                                                                                                                                                                                                                                                "   
[3,] "r=1.000401, \xb1 2.177525e-005                                                                                                                                                                                                                                     "

D4900883_003.nc.gz

@dankelley
Owner Author

Oh, hang on. Maybe the problem is that \xb1 character. After all that's what the warning is about: a bad character. Duh.
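A quick way to separate the two suspects (array shape versus the odd byte) is sketched below. The array values are made up for illustration and only stand in for metadata$scientificCalibCoefficient; the last string is copied from the output above. The check relies on trimws() being built on sub(), which keeps the attributes of its input, so a character array should come back with its dimensions intact.

```r
# Made-up values standing in for metadata$scientificCalibCoefficient.
m <- array(c("  a  ", "", "b   ", "   c"), dim = c(2, 2, 1))
trimmed <- trimws(m)
dim(trimmed)          # the 2 x 2 x 1 shape is kept, so shape is not the issue

# The string from the file, with the raw byte 0xb1 in it.
s <- "r=1.000401, \xb1 2.177525e-005"
validUTF8(s)          # FALSE: a lone 0xb1 byte is not valid UTF-8
charToRaw("\xb1")     # shows the single byte that causes the trouble
```

If this holds up, the matrix shape is a red herring and the encoding of that one element is what needs attention.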

@dankelley
Owner Author

The 0xb1 character means plus-or-minus. But I don't really know what that last entry is. The r cannot be a correlation coefficient (it's in excess of 1) so I suppose this means that some parameter named r is 1.000401 plus or minus 2e-5.

I'll look into whether there is a way to make trimws() handle a string with whatever encoding this is. Maybe there's a way I can get the function to work quietly.

@dankelley
Owner Author

Maybe I should just do as follows, because then I can employ the useBytes parameter. (Here, value is the character value being converted)

gsub("^[ \t\r\n]+|[ \t\r\n]+$", "", value, useBytes=TRUE)

@dankelley
Owner Author

At https://blog.r-project.org/2022/07/12/speedups-in-operations-with-regular-expressions/ I see that the advice is to avoid useBytes.

@dankelley
Owner Author

Or, I can do as follows. This avoids the use of useBytes, and I think I'd like to do that. But might I be wrecking some strings? I'll change the code to do as below, and run my approx. 1500 files through it to see if problems show up.

> a <- "r=1.000401, \xb1 2.177525e-005"
> trimws(a)
Error in sub(re, "", x, perl = TRUE) : input string 1 is invalid UTF-8

> b<-iconv(a,from="latin1", to="UTF-8")
> b
[1] "r=1.000401, ± 2.177525e-005"

> trimws(b)
[1] "r=1.000401, ± 2.177525e-005"
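On the worry about wrecking strings: one cautious variant is to convert only the elements that are not already valid UTF-8, so that text which is fine as it stands is left alone. The sketch below is only an illustration, not the code in oce; the helper name trimSafely and its from argument are hypothetical, and it assumes that any invalid element is really Latin-1.

```r
# Hypothetical helper (not part of oce): trim whitespace from character data
# that may contain Latin-1 bytes such as 0xb1.
trimSafely <- function(x, from = "latin1") {
    d <- dim(x)                     # remember matrix/array shape, if any
    bad <- !validUTF8(x)            # TRUE where the bytes are not valid UTF-8
    # Assumption: invalid elements are encoded as 'from' (Latin-1 by default).
    x[bad] <- iconv(x[bad], from = from, to = "UTF-8")
    x <- trimws(x)
    dim(x) <- d                     # restore shape (no-op for plain vectors)
    x
}

a <- array(c("  plain  ", "r=1.000401, \xb1 2.177525e-005   "), dim = c(1, 2, 1))
out <- trimSafely(a)
out
```

Elements that already pass validUTF8() never go through iconv(), so an existing "±" stored as UTF-8 would not be converted a second time.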

@dankelley
Owner Author

Seems OK on this test case. Note that it is displaying the +- properly, but that's not my concern -- my concern is whether it fails. I'd be interested to know whether this is a one-off problem with this float, or whether other Argo Canada files have this property. In any case, I'm going to run my approx. 1500 argo files now, and if they seem ok (i.e. no new problems) I'll do more local checks and then push to GH.

> library(oce)
> f<-"/Users/kelley/data/argo/argo_summer_project/D4900883_003.nc"
> d<-read.oce(f)
> d@metadata$scientificCalibCoefficient
, , 1

     [,1]                                                
[1,] ""                                                  
[2,] ""                                                  
[3,] "No significant salinity drift detected; r=1.000000"
     [,2]                                                          
[1,] ""                                                            
[2,] ""                                                            
[3,] "COEFFICIENT r FOR CONDUCTIVITY IS 1.000347, +/- 0.0003095027"
     [,3]                                                 
[1,] "ADDITIVE COEFFICIENT FOR PRESSURE ADJUSTMENT IS 0db"
[2,] ""                                                   
[3,] "r=1.000401, ± 2.177525e-005" 

@dankelley
Owner Author

All my local tests passed, and also the R-CMD action worked. I'll start the R-hub action now. That takes maybe 30 minutes to an hour so I'll come back to this later, to close it.

@dankelley
Owner Author

The R-hub run completed quickly! Maybe that's because it failed on the macOS and Windows machines. But the failure is because those machines cannot build ncdf4. By contrast, the Linux test machine went through OK.

I guess I'll think twice before wasting electricity on R-hub builds. Their within-R system got so flaky that I gave up on it. I was hoping this GitHub Actions system would be better, but ... maybe not so much.

Closing time.
