Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sparrow #2342

Closed
10 tasks done
lianos opened this issue Sep 29, 2021 · 23 comments
Closed
10 tasks done

sparrow #2342

lianos opened this issue Sep 29, 2021 · 23 comments
Assignees
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution WARNINGS

Comments

@lianos
Copy link

lianos commented Sep 29, 2021

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Copy link
Collaborator

Hi @lianos

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: sparrow
Type: Package
Title: Take command of set enrichment analyses through a unified interface
Version: 0.99.00
Authors@R: c(
    person(
        "Steve", "Lianoglou", role = c("aut", "cre"),
        email = "slianoglou@gmail.com",
        comment = c(ORCID = "0000-0002-0924-1754")),
    person("Denali Therapeutics", role = c("fnd"), comment = "2018+"),
    person("Genentech", role = c("fnd"), comment = "2014 - 2017"))
Description: Provides a unified interface to a variety of GSEA techniques from
    different bioconductor packages. Results are harmonized into a single object
    and can be interrogated uniformly for quick exploration and interpretation
    of results. Interactive exploration of GSEA results is enabled through
    a shiny app provided by a sparrow.shiny sibling package.
Depends:
    R (>= 4.0)
Imports:
    babelgene (>= 21.4),
    BiocGenerics,
    BiocParallel,
    checkmate,
    circlize,
    ComplexHeatmap (>= 2.0),
    data.table (>= 1.10.4),
    DelayedMatrixStats,
    edgeR (>= 3.18.1),
    ggplot2 (>= 2.2.0),
    graphics,
    grDevices,
    GSEABase,
    GSVA,
    irlba,
    limma,
    Matrix,
    methods,
    plotly (>= 4.9.0),
    stats,
    utils,
    viridis
Suggests:
    AnnotationDbi,
    BiasedUrn,
    Biobase (>= 2.24.0),
    BiocStyle,
    DESeq2,
    dplyr,
    dtplyr,
    fgsea,
    GO.db,
    goseq,
    hexbin,
    magrittr,
    matrixStats,
    msigdbr (>= 7.4.1),
    KernSmooth,
    knitr,
    PANTHER.db (>= 1.0.3),
    R.utils,
    reactome.db,
    reshape2,
    rmarkdown,
    SummarizedExperiment,
    statmod,
    stringr,
    testthat,
    webshot
biocViews: GeneSetEnrichment, Pathways
BiocType: Software
VignetteBuilder: knitr
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Collate:
    'AllClasses.R'
    'AllGenerics.R'
    'GeneSetDb-class.R'
    'GeneSetDb-methods.R'
    'SparrowResult-methods.R'
    'aaa.R'
    'bioc-accessors.R'
    'calculateIndividualLogFC.R'
    'convertIdentifiers.R'
    'validateInputs.R'
    'do.camera.R'
    'do.cameraPR.R'
    'do.fgsea.R'
    'do.fry.R'
    'do.geneSetTest.R'
    'do.goseq.R'
    'do.logFC.R'
    'do.ora.R'
    'do.roast.R'
    'do.romer.R'
    'do.svdGeneSetTest.R'
    'geneSetSummaryByGenes.R'
    'get-msigdb.R'
    'get-panther.R'
    'get-reactome.R'
    'getKeggGeneSetDb.R'
    'gsea-helpers.R'
    'package.R'
    'plots-corplot.R'
    'plots-interactive.R'
    'plots-mgheatmap.R'
    'plots-mgheatmap2.R'
    'renameCollections.R'
    'renameRows.R'
    'scale_rows.R'
    'scoreSingleSamples.R'
    'seas.R'
    'single-sample-scoring-methods.R'
    'species.R'
    'testing-helpers.R'
    'utilities.R'
    'volcano_plot.R'
    'zzz.R'
RoxygenNote: 7.1.1
Roxygen: list(markdown = TRUE)

@bioc-issue-bot bioc-issue-bot added the 1. awaiting moderation submitted and waiting clearance to access resources label Sep 29, 2021
@bioc-issue-bot
Copy link
Collaborator

A reviewer has been assigned to your package. Learn what to expect
during the review process.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. It is required to push a
version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to active your
Bioconductor Git Credentials Account

@bioc-issue-bot bioc-issue-bot added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed 1. awaiting moderation submitted and waiting clearance to access resources labels Oct 2, 2021
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sparrow to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 71fc95a9831d6ee22c5b323004f0962c512d8dd7

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sparrow to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: b80ccd2508299f327055b4d0d683578d280bddb4

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sparrow to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 05a0b8e865f936287b44b3430bb77d14a6094d52

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sparrow to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot bioc-issue-bot added OK and removed ERROR labels Oct 5, 2021
@nturaga nturaga assigned hpages and unassigned nturaga Oct 5, 2021
@hpages
Copy link
Contributor

hpages commented Oct 6, 2021

Hi Steve, it's been a long time!

Thanks for this new submission. You're apparently aware of the existence of the BiocSet package by @Kayla-Morrell, @mtmorgan, @kevinrue, and @llrs. From your vignette (which is very nice):

NOTE: The GeneSetDb class was initially developed to address some peculiarities of storing, querying, and manipulating gene sets using the old-school GSEABase::GeneSetCollection class. It is not meant to replace the more recently developd BiocSet class, and we plan to support and migrate to that gene set container in the future. This section provides a quick overview of some of the functionality of gene set container.

Any reason why you're not just using that? I'm worried that getting a GeneSetDb-centric sparrow into Bioconductor will make a later migration to BiocSet much harder than if the package was migrated now, before people start using it and depending on it.

Small problem with section numbers in the vignette:

Screenshot from 2021-10-05 18-14-30

Thanks,
H.

@llrs
Copy link
Contributor

llrs commented Oct 6, 2021

Many thanks for mentioning me but I'm not an author of the package just a contributor on the discussions.

The Bioconductor common Methods and Classes page mentions GSEABase classes for gene set, if this is discouraged maybe the website might need to be updated.

Last there is also my package BaseSet on CRAN with similar functionalities. It already allows to load data from GSEABase GeneSetCollection to its own class.

@lianos
Copy link
Author

lianos commented Oct 6, 2021

Hi Herve!

Thanks for the quick feedback, and yes: has been a very long time!

Any reason why you're not just using that [a BiocSet]? I'm worried that getting a GeneSetDb-centric sparrow into Bioconductor will make a later migration to BiocSet much harder than if the package was migrated now, before people start using it and depending on it.

Yes, I was worried that you would be worried about that :-)

The reason I'm not using that at the moment is that it would take quite a while to refactor and test. There has been a lot of functionality baked into the GeneSetDb class over the years, and I just haven't had the time to check and port over to BiocSet. I do plan on taking a shot at porting it over in the future.

Why am I submitting now? It has long been on my todo list, but also there are updates in another package a colleague of mine has in Biocondcutor (gCrisprTools) that has incoming updates for next release that rely on functionality in sparrow, and was hoping to support those developments there by getting sparrow published now.

I can understand why this gives you pause, though. The good news, perhaps, is that while the GeneSetDb class plays an important role within the package, its use is quite ... uhm ... not so important "outside" of the package. I suspect users would interact very little (if at all) with the GeneSetDb, as the majority of the use cases I've found working with sparrow simply return data.frames of gene sets after analyses.

For instance, if you run the main analysis function seas() by giving it a GeneSetCollection's worth of gene sets, you can very likely never even have to know what a GeneSetDb is.

Would you be OK if, for now, I simply add the ability to provide a gene set collection as a BiocSet as inputs to the main user-facing functions (seas() being the primary one, but also probably scoreSingleSamples()) and iterate on migrating the internals of the package to BiocSet in time?

It's not often that users will extract the internal GeneSetDb class, but I can also provide the ability to have it returned as a BiocSet as well.

Would that be sufficient?

Also, thank you for pointing out the header-level issue in the vignette, I'll fix each section to use an h1 header element and readjust from there.

@kevinrue
Copy link

kevinrue commented Oct 6, 2021

Thanks for mentioning me as well @hpages

I also haven't directly contributed to BiocSet, but I have very much appreciated the discussions that ensued from the parallel efforts on BiocSet, BaseSet, and my own unisets.
(I haven't pushed unisets to any other repository than GitHub, to avoid further confusion, and leave full visibility to BiocSet moving forward).

To be honest, I would point out that even since BiocSet was released, I haven't witnessed wide community adoption of the package (yet).
From a brief look at https://bioconductor.org/packages/release/bioc/html/BiocSet.html I see today 1 package that Depends on BiocSet and 1 package that Suggests it.

Obviously, Rome wasn't built in a day, and the rich-gets-richer way of software adoption means that BiocSet probably won't be appealing to many developers until more developers buy into the new infrastructure and create a critical mass of users, developers and documentation driving others to it.

While sparrow could be one of those early adopters, I wouldn't blame anyone for waiting to see how others get along with BiocSet before investing their own time into migrating their own package. Some deity somewhere probably knows what the future holds for BiocSet and whether some other infrastructure might come along and motivate yet another migration.

In many ways gene set analysis and infrastructure unfortunately very much still feels like a wild west of data structures, much in need of homoegeneisation between packages (hence the efforts invested in BiocSet, BaseSet and unisets). However, old favourites (e.g. clusterProfiler) are still implicitly supporting certain older data structures, and the amount of work involved in migrating those packages from one infrastructure to another are considerable factors, in my view, holding back the adoption of new data structures for gene sets.

I'm not the package reviewer here, but that part of @lianos 's latest message

I suspect users would interact very little (if at all) with the GeneSetDb, as the majority of the use cases I've found working with sparrow simply return data.frames of gene sets after analyses.

makes me feel that this package might not hurt any user by using the GeneSetDb internally, since it accepts a major Bioconductor data structure as input (GeneSetCollection) and produce data.frame as ouput.

Having said that, of course, using standard Bioconductor data structures internally is always encouraged for readability and contributions/maintenance by other Bioconductor developers. Packages that somehow get accepted into Bioconductor while not using standard containers are more challenging (and lonely) to maintain and more likely to fall into deprecation over time.

@lianos
Copy link
Author

lianos commented Oct 6, 2021

Thanks for your thoughtful commentary, @kevinrue.

Looking a bit more closely at BiocSet, there are areas of clear overlap with the GeneSetDb class I implemented, as well as functionality that is specific to my GeneSetDb class that sparrow relies on internally that I don't yet see has a clear analog for in BiocSet (yet :-).

Hopefully the extra functionality I rely on is useful to the design/implementation of BiocSet, and working on an internal conversion of GeneSetDb to BiocSet can help make that more clear. If the functionality is also useful to other end-users and GSEA package developers, as well, I am happy to help discuss/contribute changes to BiocSet to make that happen.

My hope is that sparrow can be approved with its internal GeneSetDb now, and we can gradually work towards the GeneSetDb -> BiocSet transition in time. 🤞

As it is, I've already added GeneSetDb <-> BiocSet conversion functions that allow users to send in BiocSet objects into sparrow functions and convert a GeneSetDb object back to a BiocSet in the event that the user wants to pull this object out of the classes in sparrow.

Once those changes work their way through my github actions to ensure all package level tests pass, I can also push back to the bioconductor git repo (or should I wait for a more formal review of the codebase as it currently is?)

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: fb5aba738efa5e1b7ef6f72257fbdd8ecb8a3f43

@lianos
Copy link
Author

lianos commented Oct 7, 2021

Just to keep everyone up to speed, I've run through the package and made more changes to promote the use of a BiocSet where GeneSetDb has been used for all the main user facing functions of this package.

This package also provides convenience functions that return GeneSetDb objects from different sources, like:

  • getMSigGeneSetDb() for a wrapper to msigdbr to get the MSigDb collections;
  • getKeggGeneSetDb() to get KEGG annotations, etc.

I've added sibling functions to all these methods to return a BiocSet of the same, so they can easily be used in the analyses provided by sparrow (ie. getMSigCollection(), getKeggCollection(), etc)

I also have changed documentation and the vignette to suggest a BiocSet first use of the package to end users, although the vignette itself is still using a GeneSetDb.

I can/will continue to make these modifications over time, but just want to make it abundantly clear that I'm putting in a good-faith effort to assuage any fears of not being willing to do a full BiocSet migration when it can be done down the road.

I can not completely replace the internals of sparrow to use BiocSet because there is (well tested) functionality and speed found in the GeneSetDb class and how the functions in sparrow interact with it.

Again, I just want to re-iterate that as the package now stands, users can use any/all functionality it provides by providing a BiocSet's worth of gene sets to the sparrow functions, and the use of a GeneSetDb is an implementation detail that is internal to the package. This makes it possible to carefully migrate the internals without disrupting the use of sparrow by the community since the interface won't change (and that interface now supports a BiocSet).

Thanks for your consideration.

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/sparrow to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot bioc-issue-bot added WARNINGS and removed OK labels Oct 7, 2021
@kevinrue
Copy link

kevinrue commented Oct 7, 2021

Again, I'm not the reviewer here, but as a fellow package developer, I cannot stress how much I appreciate hearing updates from someone taking BiocSet for a test run. Your willingness to do this is providing evidence that might convince/comfort/motivate others (including me) that BiocSet is a viable infrastructure ready to be re-used across the Bioconductor project, not just a concept package that just looks pretty.

From the sounds of it, you are also providing the feedback that BiocSet developers/maintainers have been needing to further consider functionality that it might miss to cater for the needs of developers holding back still.

Thanks for your work and hard effort!

@lianos
Copy link
Author

lianos commented Oct 7, 2021

Thanks again for the thoughtful comments, @kevinrue !

@hpages according to the build report, the linux buildbot "nebbiolo2" seems to be taking ~5x longer to run the code in the package than the other two, and has timed out on the R CMD check run, which has now slapped a warning label on the submission.

I'll wait to make some meatier changes primarily to the documentation to push another commit to trigger the re-build. If this "warning" label is holding up the package review, though, I can try to tickle the build bot again to see if I get some better luck on the next build.

@hpages
Copy link
Contributor

hpages commented Oct 11, 2021

Thanks @llrs and @kevinrue for chiming in.

@kevinrue Agree with you on the importance of exposing BiocSet thru more real world use cases and gathering feedback that will ultimately allow the container to improve and convince. Steve's work will help tremendously achieve that goal.

@lianos All is fine. I didn't realize that GeneSetDb was mostly used behind the scene and that migrating the internals of the package to use BiocSet instead of GeneSetDb wouldn't turn out to be so disruptive after all. Thanks for adding GeneSetDb <-> BiocSet conversion functions. They will definitely help interoperability and pave the way to a future migration to BiocSet. Your other changes/additions sound good too. The vignette is very nice.

Best,
H.

@hpages hpages added 3a. accepted will be ingested into Bioconductor daily builder for distribution and removed 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place labels Oct 11, 2021
@bioc-issue-bot
Copy link
Collaborator

Your package has been accepted. It will be added to the
Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor
community. If you are interested in becoming a Bioconductor package
reviewer, please see Reviewers Expectations.

@lianos
Copy link
Author

lianos commented Oct 12, 2021

@hpages that's great news, thanks for your feedback and taking the time to review the package!

Thank you also to @llrs and @kevinrue for sharing your thoughts, as well.

@lshep
Copy link
Contributor

lshep commented Oct 12, 2021

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/lianos.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("sparrow"). The package 'landing page' will be created at

https://bioconductor.org/packages/sparrow

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

@lshep lshep closed this as completed Oct 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution WARNINGS
Projects
None yet
Development

No branches or pull requests

8 participants