-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CustomGenome #3224
Comments
Hi @mtekman Thanks for submitting your package. We are taking a quick The DESCRIPTION file for this package is:
|
Hi, I'm just providing some thoughts/comments since I was also dealing a lot with Ensembl annotations.
The general idea to create custom annotations or add additional sequence/gene information to reference genomes is however really nice. |
Hello @jorainer, thanks for the comments!
Exactly, I was surprised it wasn't already a package!
Ah that's why these reviews are golden. I wish I had heard about this before I implemented the backend for Ensembl downloads that I did. library(AnnotationHub)
ah = AnnotationHub(cache = "/my/dest/cache")
ah[ah$species == "Mus musculus" &
ah$dataprovider == "Ensembl" &
ah$genome=="GRCm38" &
ah$sourcetype %in% c("GTF", "ensembl")] The problem I'm seeing is how to pair a given genome with a given GTF file? With Ensembl this is pretty unambiguous, but with AnnotationHub I'm overwhelmed with choice. I think this is something I will implement into the next version of this tool, since right now this is very much geared towards Ensembl downloads.
I had never heard of this, and it sounds pretty useful for some future analyses. I have some angst related to how it checks for data consistency: https://bioconductor.org/packages/release/bioc/vignettes/BiocFileCache/inst/doc/BiocFileCache.html
It sounds like it doesn't do any consistency checks with remote checksums, which is a main indicator for me on whether a large file downloaded isn't truncated. Have you had any issues with this in your code? Also, how does BiocFileCache interact with AnnotionHub? I have the impression that AnnotationHub has it's own caching. |
BiocFileCache is used as the backend caching mechanism in AnnotationHub. |
You'll need an |
Your package has been added to git.bioconductor.org to continue the IMPORTANT: Please read this documentation for setting Bioconductor utilized your github ssh-keys for git.bioconductor.org |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "WARNINGS, ERROR". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Please correct ERROR, Warning, Notes from the build report before a reviewer is assigned for in depth review. |
@lshep Thanks, I've fixed the issue! Sorry for the long delay, it was a busy month |
Please push the changes to git.bioconductor.org to trigger a new build report. see #3224 (comment) |
Received a valid push on git.bioconductor.org; starting a build for commit id: 5b002c807b9b8c1b5591e01bd3473c47e6eefb7d |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "WARNINGS, ERROR". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Received a valid push on git.bioconductor.org; starting a build for commit id: 17a2b45aa9cd1500f950ee794216c1253d217ad1 |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "ERROR". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Received a valid push on git.bioconductor.org; starting a build for commit id: a20b98307ed8f4bd284a7a691d337a4e190a7fe5 |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "WARNINGS, ERROR". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Received a valid push on git.bioconductor.org; starting a build for commit id: 4ad4318a3cef1df2d87ea278507046de3e0a0851 |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "TIMEOUT, ERROR". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Received a valid push on git.bioconductor.org; starting a build for commit id: 41a09577a658c8b6e5783c78b3dad502377a79e2 |
Received a valid push on git.bioconductor.org; starting a build for commit id: 1d973511bf3aa102b4e5d390c63a4b2bf0b18ffa |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "TIMEOUT, ERROR". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Received a valid push on git.bioconductor.org; starting a build for commit id: 036cc2b9ac273b5ec21b03509e6bafce23782da3 |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "WARNINGS, ERROR". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
@lshep I'm not sure how to get around these errors. The main problem with them is that sometimes a file downloads incomplete and then the checksum fails, but sometimes this works fine (as is the case here: #3224 (comment)) What do I need to do to get this to pass, other than removing more test checks that work fine on my local devices? |
Do you implement any sort of caching so that if a file works it doesnt have to be redownloaded the next time? See BiocFileCache |
I don't, but the code does check to see if the file has already been downloaded to the temp folder. I think the main issue is the sheer size of these files. Using BiocFileCache wouldn't fix this problem if I understand correctly, since if I run this N times over N push/commits path <- tempfile()
bfc <- BiocFileCache(path, ask = FALSE)
bfcadd(bfc, "genome", fpath="https://www.ensembl.com/big/file.fasta.gz")
bfcadd(bfc, "othergenome", fpath="https://www.ensembl.com/big/file2.fasta.gz")
## later:
mygenome1 <- bfcquery(bfc, "genome")
mygenome2 <- bfcquery(bfc, "othergenome") it would still initialise N different independent caches and still have to initially download all files that the start of each test. This doesn't change much from my initial code which downloads to a fixed folder for genomes and re-uses the resources over subsequent calls. Is there anything else I can do? |
If the size of the files are large then caching will be a necessity and likely a requirement by a reviewer. I would recommend a package specific directory that we general allow in cases like this. See UniProt.ws example:
But also be very clear to the user how much space they will need to run example data / vignettes before the start -- understanding that it can be variable once used with other data. |
Received a valid push on git.bioconductor.org; starting a build for commit id: 864159a746289e7ca5b23bf12a1ea6b6c15b7115 |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "WARNINGS". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Received a valid push on git.bioconductor.org; starting a build for commit id: a6a69c13a70649825f07ac8b37769633974ea3b1 |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "TIMEOUT, WARNINGS". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Received a valid push on git.bioconductor.org; starting a build for commit id: 630ed6ce4fe1602ac605460188da79966a45eaf5 |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "WARNINGS". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
This version implements BiocFileCache (within get_genome_files) and retires
the manual checksum based methods of confirming a cached file.
I see some warnings, but otherwise I think it's ready to be reviewed again.
|
A reviewer has been assigned to your package for an indepth review. |
Hi @mtekman, I started to review CustomGenome but it soon became clear that it needs some improvements before the review can proceed. I always start my package review by working through the vignette, like a new user would when they encounter your package. BiocCheck::BiocCheck('CustomGenome_0.99.10.tar.gz')
# Omitting some output
* Checking vignette directory...
* WARNING: Evaluate more vignette chunks.
10 out of 13 code chunks = 76% unevaluated
10 non-exec code chunk(s) (e.g., '```r')
# Omitting some more output Instead, the vignette has static copy+pasted code and output, and the problem with that is it can easily become out of sync with the actual code in the package. library(CustomGenome)
data("user_sequences")
#> Warning in data("user_sequences"): data set 'user_sequences' not found Continuing on, the next chunk contains invalid code: homo_sap = get_genome_files(species="homo_sapiens",
output_folder="genomes")
#> Error in get_genome_urls(...): unused argument (output_folder = "genomes") I corrected that error (using homo_sap = get_genome_files(species="homo_sapiens",
cache_folder="genomes")
#> Downloading: Mus_musculus.GRCm39.105.gtf.gz At this point I stopped my review because it's clear the package is not yet ready. A few other comments from what I saw when reading the first part of the vignette:
Please take a read of https://contributions.bioconductor.org/ to ensure that your package complies with this advice. Cheers, |
Hi @PeteHaitch Thanks a lot for the review, I guess I need to work a lot on the vignette. I somehow assumed that a reproducible vignette was not a neccessity, especially for large datasets -- but I can see that I neglected it so long that it did indeed fall out of sync with the code. Cheers to @lshep for the R_user_dir suggestion, I somehow overlooked that comment when I reimplemented the caching. I'll get back on this soon, thanks again |
Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
I understand that by submitting my package to Bioconductor,
the package source and all review commentary are visible to the
general public.
I have read the Bioconductor Package Submission
instructions. My package is consistent with the Bioconductor
Package Guidelines.
I understand Bioconductor Package Naming Policy and acknowledge
Bioconductor may retain use of package name.
I understand that a minimum requirement for package acceptance
is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
Passing these checks does not result in automatic acceptance. The
package will then undergo a formal review and recommendations for
acceptance regarding other Bioconductor standards will be addressed.
My package addresses statistical or bioinformatic issues related
to the analysis and comprehension of high throughput genomic data.
I am committed to the long-term maintenance of my package. This
includes monitoring the support site for issues that users may
have, subscribing to the bioc-devel mailing list to stay aware
of developments in the Bioconductor community, responding promptly
to requests for updates from the Core team in response to changes in
R or underlying software.
I am familiar with the Bioconductor code of conduct and
agree to abide by it.
I am familiar with the essential aspects of Bioconductor software
management, including:
months, for bug fixes.
(optionally via GitHub).
For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.
The text was updated successfully, but these errors were encountered: