Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dearseq #1307

Closed
8 tasks done
borishejblum opened this issue Nov 25, 2019 · 26 comments
Closed
8 tasks done

dearseq #1307

borishejblum opened this issue Nov 25, 2019 · 26 comments
Assignees
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution OK

Comments

@borishejblum
Copy link

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For help with submitting your package, please subscribe and post questions
to the bioc-devel mailing list.

@bioc-issue-bot
Copy link
Collaborator

Hi @borishejblum

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: dearseq
Type: Package
Title: Differential Expression Analysis for RNA-seq data
Version: 0.99.0
Date: 2019-11-21
Authors@R: c(person("Denis", "Agniel", role = c("aut"),
          email = "denis.agniel@gmail.com"),
	           person("Boris P.", "Hejblum", role = c("aut", "cre"),
	                  email = "boris.hejblum@u-bordeaux.fr"),
	           person("Marine", "Gauthier", role = c("aut"),
	                  email = "marine.gauthier@u-bordeaux.fr"))
Depends: R (>= 3.6.0)
Imports: CompQuadForm, ggplot2, GSA, KernSmooth, methods, parallel, pbapply, stats, statmod
Suggests: Biobase, BiocManager, edgeR, DESeq2, GEOquery, knitr, limma, readxl, rmarkdown, 
S4Vectors, SummarizedExperiment, testthat, covr
Description: Differential Expression Analysis RNA-seq data with variance 
   component score test accounting for data heteroscedasticity through
   precision weights. Perform both gene-wise and gene set analyses, 
   and can deal with repeated or longitudinal data. Methods are 
   detailed in: Agniel D & Hejblum BP (2017) Variance component score 
   test for time-course gene set analysis of longitudinal RNA-seq 
   data, Biostatistics, 18(4):589-604.
LazyData: true
License: GPL-2 | file LICENSE
biocViews: BiomedicalInformatics, CellBiology, DifferentialExpression, DNASeq,
        GeneExpression, Genetics, GeneSetEnrichment, ImmunoOncology, Regression, 
        RNASeq, Sequencing, SystemsBiology, TimeCourse, Transcription, 
        Transcriptomics
BugReports: https://github.com/borishejblum/dearseq/issues
Encoding: UTF-8
RoxygenNote: 7.0.0
VignetteBuilder: knitr

Add SSH keys to your GitHub account. SSH keys
will are used to control access to accepted Bioconductor
packages. See these instructions to add SSH keys to
your GitHub account.

@bioc-issue-bot
Copy link
Collaborator

A reviewer has been assigned to your package. Learn what to expect
during the review process.

IMPORTANT: Please read the instructions for setting
up a push hook on your repository, or further changes to your
repository will NOT trigger a new build.

@bioc-issue-bot bioc-issue-bot added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed 1. awaiting moderation labels Nov 25, 2019
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@mtmorgan
Copy link
Contributor

As preliminary comment, please clean the root level of your repository, during the Bioconductor submission process, to contain only files and folders required of a Bioconductor package. For instance, you could copy the current branch to 'full' and then remove files etc from master to bring the repository into conformance. When your package is accepted it will be cloned into the bioconductor git repository, and we do not want to include material not relevant to the nightly build of your package.

Please ensure that your package interoperates smoothly with standard Bioconductor representations. For instance, dear_seq() must accept a SummarizedExperiment, and this is the preferred way to demonstrate it's functionality in your vignette, e.g.,

se <- SummarizedExperiment(assay = London, colData = ...)
## further data manipulation, e.g., coordinated subsetting `se[, ...]`
dear_seq(se, ...)

@borishejblum
Copy link
Author

Thank you very much for this preliminary feedback. We have created such a 'full' branch and removed the unecessary files from the 'master' branch. In addition, we have added support for SummarizedExperiment objects, and now demonstrate the package functionnality in the vignette using this input class.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

60ad1b4 bumping version for Bioconductor check

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

3e8dba5 attempt at fixing warning on BiocCheck from doc th...

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

09d34aa 2nd attempt at fixing warning on BiocCheck from do...

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

520e7fa forgot to BUMP version

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@mtmorgan
Copy link
Contributor

mtmorgan commented Dec 9, 2019

9 December, 2019

  • Specific comments should be addressed in all locations where they apply.

DESCRIPTION, NAMESPACE

  • Provide a title that differentiates your package from others

vignettes

  • dearseqUserguide.Rmd:72 vignettes should not install packages (i.e., this code chunk should be marked eval = FALSE); the build system installs packages listed in the Suggests: field.

  • dearseqUserguide.RMd:103 very long lines in vignette code chunks makes them difficult to read. Revise as

    group[which(infos_df$Status[-group_London_LTBI]=="ActiveTB")] <-
        rep(1,length(which(infos_df$Status[-group_London_LTBI]=="ActiveTB")))
    

    or (much better!!!)

    idx <- infos_df$Status[-group_London_LTBI] == "ActiveTB"
    group[idx] <- 1
    

    please pay attention to readaiblity throughout your code, avoiding the long lines involving multiple computational transformations and repetitive code illustrated in this short code chunk. Please heavily revise your vignette and code, along the lines indicated above, to improve readaiblity, robustness, and efficiency.

  • dearseqUserguide.RMd:118: London is not included in the package

    > library(dearseq)
    > data("London")
    Warning message:
    In data("London") : data set ‘London’ not found
    
  • dearseqUserguide.Rmd: 153 rather than having to insert a comment to 'Remember the Latent TB samples are excluded' consider making a single SummarizedExperiment with all the data, and then subsetting se[, !se$Status %in% "LTBI"] or similar.

  • dearseqUserguide.Rmd:191 Consider using a standard gene set formulation like BiocSet to represent these data. I don't understand why this code chunk is marked 'eval = FALSE', the problem with unevaluated code chunks is that the chunk becomes only a 'picture of code', is inevitably not maintained when, e.g., arguments names are changed, and then presents the user with a misleading and frustrating understanding of how the code is supposed to work.

    baduel =
        do.call(BiocSet, baduel_gmt$genesets) %>%
        mutate_set(description = unlist(baduel_gmt$geneset.description))
    

R / man

  • generally seems to be ok and well-documented

  • PBT_gmt format is reported as a gmt object but it is really just a list with three, potentially redundant, elements (e.g., the names of baduel_gmt gene sets are provided twice, causing confusion and potential for error). Describe the content of these lists more carefully or use a formal object (e.g., from BiocSet) to reduce ambiguity.

  • dgsa_seq.R:335 adding !!! to warnings just leads to a kind of arms race; if it's a warning, let the user know. If it's more important than a warning, make it an error.

  • dgsa_seq.R:50 consider more efficient rowVars() etc., including from the matrixStats package

  • dgsa_seq.R:366 it is almost never necessary to explicilty invoke rm(); if memory management is really important then a better strategy is to write modular code that, more-or-less as a convenient conseuqnce, isolates memory allocations to the body of the function, and can be garbage collected automatically

  • dgsa_seq.R:417 options() belong to users and not to package authors. This code

    options(warn = -1)
    N_possible_perms <- factorial(ncol(y_lcpm))
    options(warn = 0)
    

    changes the user option warn to 0, regardless of it's initial value. A better style is

    oopt <- options(warn = -1)
    N_possible_perms <- factorial(ncol(y_lcpm))
    options(oopt)
    

    as seen with

    > options(warn = 2)
    > oopt = options(warn = -1)
    > options(oopt)
    > getOption("warn")
    [1] 2
    

    but a first question is why one has to change this option instead of using suppressWarnings() and perhaps more importantly can the data be conditioned to avoid warning entirely? For instance, in the above, I'm not understanding why a warning is expected at all?

  • dgsa_seq.R:546 preferred style is to use tryCatch() rather than try() and if (inherits...)

  • sp_weights.R:112 usually warning(paste(...)) (similarly, stop(), message()) can be replaced with simply warning(...).

  • adding doPlot conflates the computational and the visualization aspects; it would be better to have a separate, stand-alone, function for plotting.

other

  • the tests/testthat/test_dear_seq.R serves no purpose and should be removed
  • for the purposes of review, please make sure that the master branch contains only material related to the package, e.g., the GSE107991 is not part of the package.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

8b17258 update following bioconductor review

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

d85f2e6 version bump

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

57a5509 proper link to BiocSet + appveyor bioc update

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@borishejblum
Copy link
Author

11 December, 2019

First of all, thank you so much @mtmorgan for having spent time in reviewing our package and for your feedback !

Below are my specific responses to your comments.

9 December, 2019

  • Specific comments should be addressed in all locations where they apply.

DESCRIPTION, NAMESPACE

  • Provide a title that differentiates your package from others

The title has been changed to:
"Differential Expression Analysis for RNA-seq data through a robust variance component test"

vignettes

  • dearseqUserguide.Rmd:72 vignettes should not install packages (i.e., this code chunk should be marked eval = FALSE); the build system installs packages listed in the Suggests: field.

This chunk has been removed from the vignette.

  • dearseqUserguide.RMd:103 very long lines in vignette code chunks makes them difficult to read. Revise as
    group[which(infos_df$Status[-group_London_LTBI]=="ActiveTB")] <-
        rep(1,length(which(infos_df$Status[-group_London_LTBI]=="ActiveTB")))
    
    or (much better!!!)
    idx <- infos_df$Status[-group_London_LTBI] == "ActiveTB"
    group[idx] <- 1
    
    please pay attention to readability throughout your code, avoiding the long lines involving multiple computational transformations and repetitive code illustrated in this short code chunk. Please heavily revise your vignette and code, along the lines indicated above, to improve readability, robustness, and efficiency.

The code has been revised to avoid such long lines. In this particular example, having a single Summarized Experiment (as suggested below) makes subsetting much clearer.

  • dearseqUserguide.RMd:118: London is not included in the package
    > library(dearseq)
    > data("London")
    Warning message:
    In data("London") : data set ‘London’ not found
    

This chunk has been fixed (indeed London data set is directly downloaded from GEO, and not included in the package).

  • dearseqUserguide.Rmd: 153 rather than having to insert a comment to 'Remember the Latent TB samples are excluded' consider making a single SummarizedExperiment with all the data, and then subsetting se[, !se$Status %in% "LTBI"] or similar.

A single Summarized Experiment with all the data has been created, and is subsetted as advised.

  • dearseqUserguide.Rmd:191 Consider using a standard gene set formulation like BiocSet to represent these data. I don't understand why this code chunk is marked 'eval = FALSE', the problem with unevaluated code chunks is that the chunk becomes only a 'picture of code', is inevitably not maintained when, e.g., arguments names are changed, and then presents the user with a misleading and frustrating understanding of how the code is supposed to work.
    baduel =
        do.call(BiocSet, baduel_gmt$genesets) %>%
        mutate_set(description = unlist(baduel_gmt$geneset.description))
    

You are right, and this code chunk is now marked eval = TRUE.
Regarding the gene set object format, the gmt format seem to be quite well established as it is used by the Molecular Signature Database (http://software.broadinstitute.org/gsea/msigdb/index.jsp).
I have made the dgsa_seq() function compatible with BiocSet format inputs, however, such an input would add the following code lines to the vignettes:

baduel <- do.call(BiocSet, baduel_gmt$genesets) %>%
      mutate_set(description = unlist(baduel_gmt$geneset.description))
baduel_2gs <- baduel %>%
  filter_elementset(set %in% c("VR_TBG", "DE_KAvsTBG"))

I am not sure this improves the readability of the code, and I would therefore be enclined to stick with the gmt format given it is widely used as well.

R / man

  • generally seems to be ok and well-documented
  • PBT_gmt format is reported as a gmt object but it is really just a list with three, potentially redundant, elements (e.g., the names of baduel_gmt gene sets are provided twice, causing confusion and potential for error). Describe the content of these lists more carefully or use a formal object (e.g., from BiocSet) to reduce ambiguity.

The gmt format is now extensively documented inside the dearseq package (it is a format used by the Molecular Signature Database for instance).

  • dgsa_seq.R:335 adding !!! to warnings just leads to a kind of arms race; if it's a warning, let the user know. If it's more important than a warning, make it an error.

Those multiple ! have been removed as they are indeed unnecessary and misleading.

  • dgsa_seq.R:50 consider more efficient rowVars() etc., including from the matrixStats package

matrixStats::rowVars and matrixStats::colVars are indeed much more efficient and are now being used in place of apply(..., FUN = var).

  • dgsa_seq.R:366 it is almost never necessary to explicitly invoke rm(); if memory management is really important then a better strategy is to write modular code that, more-or-less as a convenient consequence, isolates memory allocations to the body of the function, and can be garbage collected automatically

All such (unnecessary) calls to rm() have been removed.

  • dgsa_seq.R:417 options() belong to users and not to package authors. This code
    options(warn = -1)
    N_possible_perms <- factorial(ncol(y_lcpm))
    options(warn = 0)
    
    changes the user option warn to 0, regardless of it's initial value. A better style is
    oopt <- options(warn = -1)
    N_possible_perms <- factorial(ncol(y_lcpm))
    options(oopt)
    
    as seen with
    > options(warn = 2)
    > oopt = options(warn = -1)
    > options(oopt)
    > getOption("warn")
    [1] 2
    
    but a first question is why one has to change this option instead of using suppressWarnings() and perhaps more importantly can the data be conditioned to avoid warning entirely? For instance, in the above, I'm not understanding why a warning is expected at all?

Indeed, these Warnings suppressions have been removed as only strictly positive integers should be supplied to factorial() in this code.

  • dgsa_seq.R:546 preferred style is to use tryCatch() rather than try() and if (inherits...)

try() + if (inherits...) have been replaced by tryCatch()

  • sp_weights.R:112 usually warning(paste(...)) (similarly, stop(), message()) can be replaced with simply warning(...).

The unnecessary paste()s have been removed

  • adding doPlot conflates the computational and the visualization aspects; it would be better to have a separate, stand-alone, function for plotting.

a standalone plotting function plot_weights() has been added, and the doPlot argument has been removed to better separate computation from visualization.

other

  • the tests/testthat/test_dear_seq.R serves no purpose and should be removed

This empty file has been removed.

  • for the purposes of review, please make sure that the master branch contains only material related to the package, e.g., the GSE107991 is not part of the package.

The master branch has been stripped of any material not directly related to the package.

Thanks again for this review,
Boris

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

71d1383 usual version bump forgetting

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@borishejblum
Copy link
Author

borishejblum commented Jan 9, 2020

Dear @mtmorgan,

I have made the solicited changes to the dearseq package (please see my comment above).

Let me know If there remain things that would need to be fixed.

Thanks a lot for you time reviewing our package,
Boris

@mtmorgan mtmorgan added 3a. accepted will be ingested into Bioconductor daily builder for distribution and removed 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place labels Jan 15, 2020
@bioc-issue-bot
Copy link
Collaborator

Your package has been accepted. It will be added to the
Bioconductor Git repository and nightly builds. Additional
information will be posed to this issue in the next several
days.

Thank you for contributing to Bioconductor!

@mtmorgan
Copy link
Contributor

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/borishejblum.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("dearseq"). The package 'landing page' will be created at

https://bioconductor.org/packages/dearseq

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution OK
Projects
None yet
Development

No branches or pull requests

3 participants