readStandardGenotypes fails on recent data.table #7

Closed
MichaelChirico opened this issue Apr 23, 2018 · 5 comments

@MichaelChirico MichaelChirico commented Apr 23, 2018

Hello, data.table is pushing for a release to CRAN (1.11.0) and we realized your package would be broken by recent updates. In particular:

install.packages('data.table', type = 'source', repos = 'http://Rdatatable.github.io/data.table')
pkg = 'PhenotypeSimulator'
install.packages(pkg, dependencies = TRUE)
library(pkg, character.only = TRUE)
example(readStandardGenotypes)
# ...

Error in strsplit(data$V2, split = "") : non-character argument

This happens because data comes from fread and the column V2 no longer exists. I glanced at the file and couldn't work out why, but hopefully a check of the NEWS will tell you. There have been a lot of changes (improvements, we hope!) to fread since the last update; if you believe the new behavior is in error, please file an issue.
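
The error message itself can be reproduced without data.table. The following is a minimal sketch, assuming a hypothetical data frame `df` (its column names are made up for illustration) that has no column called V2: `$` returns NULL for a missing column, and strsplit() rejects anything that is not a character vector.

```r
# Hypothetical stand-in for the object returned by fread;
# the column names are illustrative, not taken from the package.
df <- data.frame(V1 = c("POP1:", "POP1:"),
                 other = c("0101", "1100"),
                 stringsAsFactors = FALSE)

# `$` on a data.frame returns NULL when the column does not exist
v2_missing <- is.null(df$V2)
print(v2_missing)

# strsplit() needs a character vector, so passing NULL raises the error;
# capture the message instead of halting
msg <- tryCatch(strsplit(df$V2, split = ""),
                error = function(e) conditionMessage(e))
print(msg)
```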

@MichaelChirico MichaelChirico (Author) commented Apr 23, 2018

This may be a regression in data.table; your sample file is now being used as a test case. Updates here: Rdatatable/data.table#2786

@mattdowle mattdowle commented Apr 29, 2018

Dear Hannah,
data.table dev needed a fix to cope with the file in your package, and that is now done (Rdatatable/data.table#2808), but it's still a breaking change for your package. Sorry for the inconvenience.
When data.table v1.11.0 goes to CRAN, the breakage will be :

* checking examples ... ERROR
Running examples in ‘PhenotypeSimulator-Ex.R’ failed
The error most likely occurred in:

> ### Name: readStandardGenotypes
> ### Title: Read genotypes from file.
> ### Aliases: readStandardGenotypes
> 
> ### ** Examples
> 
> # Genome format
> filename_genome  <- system.file("extdata/genotypes/genome/",
+ "genotypes_genome.txt",
+ package = "PhenotypeSimulator") 
> data_genome <- readStandardGenotypes(N=100, filename_genome, format ="genome")
Warning in data.table::fread(filename, skip = "Samples:", data.table = FALSE,  :
  Detected 1 column names but the data has 2 columns (i.e. invalid file). Added 1 extra
  default column name for the first column which is guessed to be row names or an index.
  Use setnames() afterwards if this guess is not correct, or fix the file write command
  that created the file to create a valid file.
Error in strsplit(data$V2, split = "") : non-character argument
Calls: readStandardGenotypes -> matrix -> unlist -> strsplit
Execution halted

This is because the fread command passes skip="Samples:" to land on the line that contains that single string (Samples:), just before the 2 columns of POP1: data.

The NEWS item affecting that is :

Too few column names are now auto filled with default column names, with warning, #1625. If there is just one missing column name it is guessed to be for the first column (row names or an index), otherwise the column names are filled at the end. Similarly, too many column names now automatically sets fill=TRUE, with warning.

What happens is that "Samples:" is treated as a column name and shifted over to the right, with V1 becoming the automatic column name for the first column. This is the intended new behaviour, but it is a change from before.
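
The effect can be inspected on a tiny file. This is only a sketch: the three-line file below is a made-up miniature, not the real genotypes_genome.txt, and the exact names and warning text depend on the installed data.table version, so the code prints them rather than assuming them.

```r
library(data.table)

# Hypothetical miniature file: a one-field "Samples:" line followed by
# two-field data rows (the real file in the package is much larger).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Samples:", "POP1: 0101", "POP1: 1100"), tmp)

# Collect warnings instead of printing them, so they can be inspected
warns <- character(0)
ans <- withCallingHandlers(
  fread(tmp, skip = "Samples:", sep = " ", colClasses = "character",
        data.table = FALSE),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(names(ans))           # which column names fread chose
print("V2" %in% names(ans)) # whether code relying on data$V2 would still work
print(warns)                # any warning about missing column names
```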

Once v1.11.0 is on CRAN, please could you change that line to :

fread( filename, skip="POP1:", sep=" ", colClasses="character", header=FALSE)
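
As a rough check of the suggested call, the sketch below runs it on a made-up three-line file (not the real genotypes file) and then applies the same strsplit() step that failed. The `data.table = FALSE` argument is carried over from the call shown in the warning above; the variable names `tmp`, `geno` and `alleles` are illustrative.

```r
library(data.table)

# Hypothetical miniature of the genome-format file
tmp <- tempfile(fileext = ".txt")
writeLines(c("Samples:", "POP1: 0101", "POP1: 1100"), tmp)

# Start reading at the first POP1: line and state that there is no header,
# so both columns get default names
geno <- fread(tmp, skip = "POP1:", sep = " ", colClasses = "character",
              header = FALSE, data.table = FALSE)
print(names(geno))

# colClasses = "character" keeps the genotype string intact (leading zeros
# included), so it can be split into single characters
alleles <- strsplit(geno$V2, split = "")
print(lengths(alleles))
```

Skipping straight to the first data line avoids relying on how fread treats a header line with too few names.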

I've copied a subset of your file to our test suite and added the following tests :

# skip= is now consistent as if the file started on that line.
+# Found via rev dep checking (package PhenotypeSimulator), #2786. It is still a breaking change that PhenotypeSimulator will need to accommodate please.
+test(1909.1, names(ans<-fread(testDir("genotypes_genome.txt"), skip="Samples:", sep=" ", colClasses="character")),
+             c("V1","Samples:"),
+             warning="Detected 1 column name.*but the data has 2 columns.*Added 1 extra default column name for the first column")
+test(1909.2, ans$V1, c("POP1:","POP1:","POP1:"))
+test(1909.3, nchar(ans[["Samples:"]]), INT(3287,3287,3287))
+test(1909.4, names(ans<-fread(testDir("genotypes_genome.txt"), skip="POP1:", sep=" ", colClasses="character", header=FALSE)),
+             c("V1","V2"))
+test(1909.5, ans$V1, c("POP1:","POP1:","POP1:"))
+test(1909.6, nchar(ans$V2), INT(3287,3287,3287))

@HannahVMeyer HannahVMeyer (Owner) commented Apr 30, 2018

Dear Matt,

Thanks for the heads-up. I have changed the line in question and will update PhenotypeSimulator on CRAN as soon as data.table v1.11.0 is on CRAN. Could you let me know when that has happened?

Thanks!

@mattdowle mattdowle commented May 11, 2018

Dear Hannah,
Great, thanks! 1.11.2 is on CRAN now.
Best, Matt

@HannahVMeyer HannahVMeyer (Owner) commented May 13, 2018

PhenotypeSimulator v0.2.2 on CRAN
