Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GRaNIE #2556

Closed
10 tasks done
chrarnold opened this issue Mar 7, 2022 · 40 comments
Closed
10 tasks done

GRaNIE #2556

chrarnold opened this issue Mar 7, 2022 · 40 comments
Assignees
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution OK

Comments

@chrarnold
Copy link

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Copy link
Collaborator

Hi @chrarnold

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: GRaNIE
Title: GRaNIE: Reconstruction cell type specific gene regulatory networks including enhancers using chromatin accessibility and RNA-seq data
Version: 0.99.0
Encoding: UTF-8
Authors@R: c(person("Christian", "Arnold", email =
        "chrarnold@web.de", role = c("cre","aut")),
        person("Judith", "Zaugg", email =
        "judith.zaugg@embl.de", role = c("aut")),
        person("Rim", "Moussa", email =
        "rim.moussa01@gmail.com", role = "aut"),
        person("Armando", "Reyes-Palomares", email =
        "armandorp@gmail.com", role = "ctb"),
        person("Giovanni", "Palla", email =
        "giov.pll@gmail.com", role = "ctb"),
        person("Maksim", "Kholmatov", email =
        "maksim.kholmatov@embl.de", role = "ctb"))
Description: Genetic variants associated with diseases often affect non-coding regions, thus likely having a regulatory role. To understand the effects of genetic variants in these regulatory regions, identifying genes that are modulated by specific regulatory elements (REs) is crucial. The effect of gene regulatory elements, such as enhancers, is often cell-type specific, likely because the combinations of transcription factors (TFs) that are regulating a given enhancer have celltype specific activity. This TF activity can be quantified with existing tools such as diffTF and captures differences in binding of a TF in open chromatin regions. Collectively, this forms a gene regulatory network (GRN) with cell-type and data-specific TF-RE and RE-gene links. Here, we reconstruct such a GRN using bulk RNAseq and open chromatin (e.g., using ATACseq or ChIPseq for open chromatin marks) and optionally TF activity data. Our network contains different types of links, connecting TFs to regulatory elements, the latter of which is connected to genes in the vicinity or within the same chromatin domain (TAD). We use a statistical framework to assign empirical FDRs and weights to all links using a permutation-based approach.
Imports:
    futile.logger,
    checkmate,
    patchwork,
    reshape2,
    data.table,
    matrixStats,
    Matrix,
    GenomicRanges,
    RColorBrewer,
    colorspace,
    ComplexHeatmap,
    DESeq2,
    csaw,
    circlize,
    robust,
    progress,
    utils,
    methods,
    stringr,
    scales,
    BiocManager,
    BiocParallel,
    igraph,
    S4Vectors,
    ggplot2,
    rlang,
    Biostrings,
    GenomeInfoDb,
    IRanges,
    SummarizedExperiment,
    forcats,
    gridExtra,
    limma,
    purrr,
    tidyselect,
    readr,
    grid,
    tidyr,
    dplyr,
    stats,
    grDevices,
    graphics,
    magrittr,
    tibble,
    viridis,
    BiocFileCache
Depends:
    R (>= 4.1.0),
    tidyverse,
    topGO
Suggests:
    knitr,
    BSgenome.Hsapiens.UCSC.hg19,
    BSgenome.Hsapiens.UCSC.hg38,
    BSgenome.Mmusculus.UCSC.mm10,
    BSgenome.Mmusculus.UCSC.mm9,
    TxDb.Hsapiens.UCSC.hg19.knownGene,
    TxDb.Hsapiens.UCSC.hg38.knownGene,
    TxDb.Mmusculus.UCSC.mm10.knownGene,
    TxDb.Mmusculus.UCSC.mm9.knownGene,
    org.Hs.eg.db,
    org.Mm.eg.db,
    IHW,
    biomaRt,
    clusterProfiler,
    ReactomePA,
    DOSE,
    ChIPseeker,
    testthat (>= 3.0.0),
    BiocStyle
VignetteBuilder: knitr
biocViews: Software, GeneExpression, GeneRegulation, NetworkInference, GeneSetEnrichment, BiomedicalInformatics, Genetics, Transcriptomics, ATACSeq, RNASeq, GraphAndNetwork, Regression, Transcription, ChIPSeq
License: Artistic-2.0
LazyData: false
URL: https://grp-zaugg.embl-community.io/GRaNIE
BugReports: https://git.embl.de/grp-zaugg/GRaNIE/issues
RoxygenNote: 7.1.2
Config/testthat/edition: 3

@bioc-issue-bot bioc-issue-bot added the 1. awaiting moderation submitted and waiting clearance to access resources label Mar 7, 2022
@vjcitn
Copy link
Collaborator

vjcitn commented Mar 14, 2022

Quitting from lines 212-213 (workflow.Rmd) 
Error: processing vignette 'workflow.Rmd' failed with diagnostics:
cannot open file '/g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_RNA.raw.pdf'
--- failed re-building ‘workflow.Rmd’

I am going to pass this package into single package build so that you can fix it. thanks for your submission.

@vjcitn vjcitn added the pre-check passed pre-review performed and ready to be added to git label Mar 14, 2022
@chrarnold
Copy link
Author

Hi there, in the vignette, there is no explicit path being mentioned (I just checked again), I am a bit confused where the absolute path you posted here comes from?

@vjcitn
Copy link
Collaborator

vjcitn commented Mar 14, 2022

we'll find out in testing locally i know it is not in vignette it has something to do with templating

@lshep
Copy link
Contributor

lshep commented Mar 14, 2022

FWIW I can also reproduce the error locally on my machine

Quitting from lines 212-213 (workflow.Rmd) 
Error: processing vignette 'workflow.Rmd' failed with diagnostics:
cannot open file '/g/scb2/zaugg/carnold/Projects/GRN_pipeline/src/GRaNIE/vignettes/output/plots/PCA_sharedSamples_RNA.raw.pdf'
--- failed re-building ‘workflow.Rmd’

SUMMARY: processing the following file failed:
  ‘workflow.Rmd’

Error: Vignette re-building failed.
Execution halted

@chrarnold
Copy link
Author

Ok I will try this today on my local machine and see where it comes from! I am on it, thanks.

@chrarnold
Copy link
Author

I know the reason now. In order to fix it, I'd like to query here what the maximum time is the vignette code is allowed to run.

@lshep
Copy link
Contributor

lshep commented Mar 16, 2022

I believe the time limit is 10 min for software packages but ideally far less than that. Remember it is often sufficient to use a small dummy dataset to show functionality and proof of principle.

@chrarnold
Copy link
Author

chrarnold commented Mar 19, 2022

I pushed a new version 0.99.1 to the Git today that results in 0 errors or warnings on an external machine with the newest R devel and Biocondutor 3.15. I hope the same is true for the build machines :). I optimized a few things and can summarize if needed. Thanks @lshep and @vjcitn for your input so far!

The issue with BiocFileCache and making sure it redownloads the example object only when needed is still ongoing, but not affecting the package review.

@bioc-issue-bot
Copy link
Collaborator

A reviewer has been assigned to your package. Learn what to expect
during the review process.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. It is required to push a
version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to active your
Bioconductor Git Credentials Account

@bioc-issue-bot bioc-issue-bot added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed 1. awaiting moderation submitted and waiting clearance to access resources pre-check passed pre-review performed and ready to be added to git labels Mar 21, 2022
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/GRaNIE to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@chrarnold
Copy link
Author

chrarnold commented Mar 21, 2022

Thank you for assigning a reviewer quickly here! I already checked the build report and uploaded a new version 0.99.2 to Github (not Bioconductor yet) which reduces the repo size a lot, I hope I am below the limit now - not sure I can decrease it further with little effort, but let's see what the next build report says. I am not sure but I have an idea where the Windows error comes from (I pushed a first attempt to fix it but not sure about it), I do not have a Windows machine currently available to test it unfortunately. I am hoping this can be solved after the first comments have been laid out, hopefully.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: d0b1deb67edb0f2cddf050c3491bd6dd78141b4b

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/GRaNIE to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@chrarnold
Copy link
Author

I checked the new build report, and I noticed 2 things:

  1. Timeout for R CMD check after 15 minutes (I'd like to see what takes up the most time, but this is unfortunately not shown)
  2. The Package Source tarball exceeds Bioconductor size of 5 Mb (13 MB)

I do not know how to satisfy these with a reasonable effort.

  1. My package is quite large, has many functions and each of them, as required and recommended, with @examples to show the usage o the functions. Even with the small example dataset that I use, this takes 30 seconds for a function to run, and collectively adds up running time for R CMD build. Similarly for using testthat. I cannot use a much smaller test dataset because then, the resulting gene-regulatory network becomes meaningless and so tiny that no useful enrichment etc can be run on it, it would look bad in a vignette.
  2. Similarly, because I intended to show in the workflow and package details vignette all main output plots along with explanations, the images I am including in the HTMLs are just above 5 Mb and I cant really do much about it... It is literally over 50 or so png images, each with varying size. The "inst" folder in the tarball alone is therefore exceeding the 5 MB. I can completely remove the workflow vignette and put it only on our dedicated package website with a stable URL, but isnt this against the whole philosophy a bit?

Please advise what to do, I cant revamp the whole idea and package just because of these quite stringent criteria :(.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 197f7c94eb0d68284743301fa87a02299dc16631

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, TIMEOUT".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/GRaNIE to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@vjcitn
Copy link
Collaborator

vjcitn commented Mar 22, 2022

Hi, thanks for the submission, we really hope it is possible to review it. But you'll have to do some work. The 5MB limit is met by over 1000 packages submitted since the standard was imposed. You have 5+MB of png and that is quite uncommon. You might want to serialize a pdf with the detailed information. But the size limit is firm. As for timeout, again all packages have to meet that standard. There is a longtests option (https://bioconductor.org/developers/how-to/long-tests/) that may be useful to you. Consider applying to the developer mentoring program https://bioconductor.org/developers/new-developer-program/ ...

@chrarnold
Copy link
Author

chrarnold commented Mar 22, 2022

Thanks, I will try my best - but I am going on vacation for 3.5 weeks on Saturday, only returning mid April, so not sure acceptance of the package can be finalized by April 20th, lets see.

Maybe I can explain to you how we currently do it with the PNGs: The functions that produce plots produce a PDF file with multiple (often many) pages. In the detailed Workflow vignette, we run all main functions, and only show selected plots by including a PNG of the page of the output PDF. This contributes mainly to the large size. While it is possible to just plot to the currently active graphics device(and not to a PDF), if I did so in the Vignette, then the vignette would contain a lot of plots I dont want to show. Alternatively, I could not show any example plots, but this is again not really ideal and defeats the purpose a bit. I could decrease the resolution of the images much more, but then readability suffers. Do you have any other ideas how to go about it?

For the time limit: I managed to reduce it a bit, and now, a timeout occurs on only one of the machines, I presume no timeout can occur on any of the machines? Is the testing via testthat etc also part of the time limit? Thanks for the longtest option - I quickly checked it but not sure how much it helps: The running time comes mainly from the examples, while the "tests" contribute only a little. The description from the link you sent says: "The maximum amount of time that R CMD check is allowed to spend on each package is 40 min.". Is this really 40 minutes, it seems it is 15 according to the build reports?

@vjcitn
Copy link
Collaborator

vjcitn commented Mar 22, 2022

The graphics problem you describe needs to be solved in another way. I don't have a specific solution at the moment, you might ask in slack on developer forum channel.

the longtests should behave as advertised; if you are doing too many examples, cut it down and provide code for users to run additional examples as needed.

@chrarnold
Copy link
Author

chrarnold commented Mar 22, 2022

Thanks, for the examples that is fine - I recall that 80% of the functions must have a running example (and I tried to follow), otherwise a Bioc warning is raised - but if this is more negotiable and the more time-consuming ones can be skipped that would be a start.

Is it possible, if ok with Bioc, to only include a "package details" Vignette (see hre: https://grp-zaugg.embl-community.io/GRaNIE/articles/packageDetails.html) as part o the repo and do the full Workflow vignette outside of the Bioc package (and provide a link for users so they can go to the website and see it there)? That would solve many problems and users would be happy too because I can include more of the output as part of the vignette:)

Sorry for the many discussions and troubles, just trying to find a good way that works for all involved ;)

@vjcitn
Copy link
Collaborator

vjcitn commented Mar 25, 2022

Bioconductor submissions should achieve a certain amount of conventionality. This makes it easier for users to explore and understand what they are getting involved with. Many developers have made efforts to introduce function and data combinations that demonstrate package capabilities conveniently in the vignette, which typically includes extensive coverage of key functions in the package, that run in the available time. Please take the advice from the bioc-devel dialogue and try to meet Bioconductor standards for your package. I would say the "package details" vignette, while very nice and informative, does not meet the standard, and that meeting it would be good for both the software and the intended user base.

@lshep
Copy link
Contributor

lshep commented Mar 25, 2022

I think trying to split up the package or having a more minimal dataset in the package but then submitting an accompanied workflow package with the more extensive, detailed workflow vignette are two very good options mentioned on the bioc-devel conversation.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 8c502bd1c5fccbbcd136bade51c2e10189196c12

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/GRaNIE to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@chrarnold
Copy link
Author

chrarnold commented Apr 9, 2022

Hi all, I finally had time to work on the issues, and I think they are resolved now. All plot functions can now plot to the currently active graphics device, page-specifically, eliminating the need to store any images in the repo. I also removed the other vignettes as suggested. The dataset used in the workflow is more minimal now, and it seems to complete within the allowed time frame. I am on vacation, and available on the 12th again for further work if needed. Thanks for the input so far, and sorry for the initial troubles @lshep .

@lshep lshep assigned vjcitn and unassigned lshep Apr 14, 2022
@vjcitn
Copy link
Collaborator

vjcitn commented Apr 25, 2022

I am going to accept this package. But please revise the vignette so that the abstract tells the reader WHAT IS ACTUALLY GOING TO BE DONE, SCIENTIFICALLY, FROM THE POINT OF VIEW OF SOMEONE WHO HAS NO IDEA WHAT "GRaNIE" IS. The introductory paragraph could benefit from similar elaborations.

@vjcitn vjcitn added 3a. accepted will be ingested into Bioconductor daily builder for distribution and removed 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place labels Apr 25, 2022
@bioc-issue-bot
Copy link
Collaborator

Your package has been accepted. It will be added to the
Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor
community. If you are interested in becoming a Bioconductor package
reviewer, please see Reviewers Expectations.

@vjcitn
Copy link
Collaborator

vjcitn commented Apr 25, 2022

and do not name your vignette "workflow.Rmd". Use a less generic name.

@lshep
Copy link
Contributor

lshep commented Apr 25, 2022

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/chrarnold.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("GRaNIE"). The package 'landing page' will be created at

https://bioconductor.org/packages/GRaNIE

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.

@lshep lshep closed this as completed Apr 25, 2022
@chrarnold
Copy link
Author

Thanks a lot, we are happy about this! I will revise the vignette today before the freeze.

About the renaming: Would GRaNIE_workflow.RMd ? Not sure which other name reflects better of what the vignettes shows. Some packages such as DESeq just name the vignette as the package name, would that be more appropriate yes? Thanks!

@vjcitn
Copy link
Collaborator

vjcitn commented Apr 25, 2022 via email

@chrarnold
Copy link
Author

OK thanks! Another quick question: The original Git repository, what do I do with it? In the future, I commit changes directly to the Bioconductor remotes and there is no need for the Git repository, correct? I'd like to avoid and remove redundancy and complexity so I am thinking about retiring the Git repo then. What are best practices here?

@chrarnold
Copy link
Author

I pushed a new commit 5 minutes ago, but I do not see the automated "Received a valid push on git.bioconductor.org" from the bot. Is this because the issue was closed or because of another reason?

@lshep
Copy link
Contributor

lshep commented Apr 25, 2022

Yes once it's accepted and closed it doesn't build on demand. It will be reflected when it shows in the daily build report

@chrarnold
Copy link
Author

OK thanks for the quick reply, let's hope there is no error. Locally all looked good and I changed only few things, there shouldnt be anything new :). I will wait for the report then thank you :)

@lshep
Copy link
Contributor

lshep commented Apr 25, 2022

No problem. Here is the main build check page for convenience https://www.bioconductor.org/checkResults/

@chrarnold
Copy link
Author

Hi, I noticed that GRaNIE appears only in Bioconductor 3.16 (devel) in the reports but not in Bioconductor 3.15 (release). From my understanding, shouldnt it be in the release branch also, given it built without errors before the deadline on April 20th or so?

@lshep
Copy link
Contributor

lshep commented Apr 28, 2022

we investigated the issue and identified the cause.
The recently accepted new packages should be available after the next build report posting.
Sorry for the inconvenience.

@chrarnold
Copy link
Author

I see perfect thanks for the quick reply!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3a. accepted will be ingested into Bioconductor daily builder for distribution OK
Projects
None yet
Development

No branches or pull requests

4 participants