-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
funomics package #3322
Comments
Thanks for submitting your package. We are taking a quick The DESCRIPTION file for this package is:
|
Please make sure R CMD build, R CMD check, and BiocCheck can run on your package without ERROR. Your package currently can not be built as it is missing a NAMESPACE file. Could you please provide an abstract/intro section in your vignette that provides motivation for inclusion in Bioconductor and when appropriate a review and comparison to existing Bioconductor packages with similar functionality or scope. It will also be important to show interoperability with some of the standard Bioconductor classes for pathway analysis and gene sets like KEGGREST, Gostats, BiocSet/GSEAset, etc. See also common classes or results of pathway biocview |
Thanks for the revision. The NAMESPACE was not in the repository because it was mistakenly included in the .gitignore file, I already fixed it. The R CMD build, check and BiocCheck are passed smoothly. I have updated the intro section in the vignette with a motivation and comparison with existing Bioconductor packages. As for the interoperability, the inputs for the package are essentially an omics data matrix or dataframe and the molecular sets in the format of a list of lists. This list of lists is a generic format, not tied to any particular package or class and can be obtained by processing the downloads from https://data.broadinstitute.org/gsea-msigdb/msigdb/release/ or https://mips.helmholtz-muenchen.de/corum/#download/ as described in the vignette section "Real data: Where to find molecular sets?". To better clarify, I added to the vignette both an example of how to read the downloads from those sources to get a list of lists containing pathways and molecules (e.g. genes), and another code snippet for how to retrieve pathway gene sets as a list of lists from KEGGREST, but the idea is that the user uses their preferred source of molecular sets as desired for their analyses and these are just some examples. Regarding GOstats or GSEAset, these packages do not align with the current scenario because they are meant to test enrichment and thus require a "universe of genes". Please note that enrichment is not the purpose of funomics package. All changes have been uploaded to the repository linked in this issue. |
Thanks for this submission. I understand that the package code handles data matrices or data frames and lists of lists. So we have, in the vignette,
The aim of Bioconductor data structure designs is to hide the complexity of code like this. The |
Thanks a lot for your revision and comment. This code from the vignette corresponds to the generation of simulated data: A matrix X of omics abundance (e.g., gene counts) is generated, as well as a list of lists containing the genes involved in pathways. I think this code would not be less complex if using a SummarizedExperiment object instead of a matrix and a list of lists. The main value of SummarizedExperiment is to have sample annotations and the counts matrix in the same object, as you mention indeed. However, in the context of this package, the pathway information is independent of sample-specific annotations typically stored in the colData slot of a SummarizedExperiment object (e.g., treatment, diagnosis). The focus in the vignette is on demonstrating the core functionalities of More importantly, the pathway list of lists contains information about the molecular sets and their constituent molecules (e.g., gene IDs), and is typically downloaded or obtained from external sources (as described in the vignette, there is a section on this, and how to integrate programatically with KEGGREST), hence a separate list aligns well with how pathway information is often obtained in practice, beyond the simulated pathway list in the vignette. Therefore, a data matrix/dataframe and a separate list of molecular sets provides flexibility to work with various data formats, such as for users whose data does not necessarily come from a SummarizedExperiment object. Note that if the actual data is in I've also updated the repository with a slight change in the package name (funomics -> funOmics). Please, let me know how i should proceed. |
You are correct on all counts. But the contribution guidelines are clear: Interoperate with other Bioconductor packages by re-using common data structures (see Common Bioconductor Methods and Classes) and existing infrastructure (e.g., rtracklayer::import() for input of common genomic files). I get that you are reluctant to do this, and you explain that users who already have SummarizedExperiment or gene set data structures can disassemble them to use your package. The "ecosystem" and "interoperability" concepts are somewhat vague, but wouldn't your package be appropriate for CRAN? We have limited review resources and since you are not going to use Bioconductor data structures to any advantage, it would be better if our reviewers could pass on this one. Close the issue if you agree. Thanks again. |
Thank you for your feedback. I understand and appreciate the emphasis on Bioconductor contribution guidelines. While I initially considered CRAN due to the package's flexibility in handling diverse data formats, I've been advised and encouraged that 'funomics' core functionality of aggregating omics data into pathway-level features aligns better with Bioconductor's focus. The package funomics interoperates with the Bioconductor ecosystem in several ways. It leverages KEGGREST for retrieving pathway information, a common Bioconductor package for accessing KEGG resources. Additionally, funomics potentially complements packages like pathifier by offering alternative functionalities for pathway-level aggregation (e.g., user-defined pooling operators). Indeed, funomics package provides a kind of wrapper for the pathifier function "quantify_pathways_deregulation". I've also realized pathifier package performs somewhat similar operations, and does not require input from SummarizedExperiments object, hence I guess it is not a strict requirement? Nevertheless I understand the benefits of SummarizedExperiment for data management. Disassembling and reassembling SummarizedExperiment objects within funomics could be technically feasible, and I'm not reluctant to work on this, but it introduces unnecessary complexity (at this stage, this would artificially force the package to use it without real necessity). However, I'm open to exploring ways to integrate SummarizedExperiment or other Bioconductor structures in future versions of the package based on specific use cases and user feedback where using it provides added value to their analyses (this is why I point to the use of github issues to suggest improvements). Given funomics' focus on functional-level omics data aggregation and its interoperability with Bioconductor packages, I believe it can still be a valuable addition to the bioconductor repository and community even if at this stage, it does not take SummarizedExperiments objects as input. It operates with other packages, demonstrates utility in the field, and can also interact with the assay slot of SummarizedExperiments objects, so it's compatible with bioconductor structures. I'm happy to discuss this further and address any additional questions you might have. If absolutely required, as mentioned I can artifically force funomics to take SummarizedExperiment object as input, disassembling it internally, and assembling a SummarizedExperiment as an output, but I believe doing this just for the sake of using this class is not necessarily a good practice. |
Thank you. The only "necessity" I am thinking of at this point is fairness. Many developers of packages in the ecosystem took the time to achieve interoperability and did not get reviewed until this was accomplished. I think your package has to meet the same standard. Please revise. |
Hi, I'm not sure I follow your last comment. I pointed some examples in which funomics package interacts with other packages of the bioconductor ecosystem, hence the intereoperability. I also pointed an example bioconductor package using omics data where SummarizedExperiment in particular is not used. Most importantly, I explained why using SummarizedExperiment input does not add much value at this stage, indeed if only SummarizedExperiment structures are accepted as input, it would limit its flexibility and restrict its use to this specific data format, which is not necessarily the data format all potential users of funomics implement. Then, I don't really understand what is the requirement that is missing for the package to move forward. Are there alternative ways funomics can demonstrate interoperability with the Bioconductor ecosystem beyond SummarizedExperiment integration? or can you elaborate on how this SummarizedExperiment integration could make sense given the functionality and current structure of the package? |
I would argue then show an example using one of the many available SummarizedExperiment objects rather than a toss away comment in the vignette as a potential extension of the "real work" example in the vignette. There are many predefined SummarizedExperiments provided in packages and in the ExpeirmentHub. Also, the work around I generally suggest for people that have extensively coded out generalized structures rather than Bioconductor structures, if their documentation and vignette are thorough and code fairly sound and they argue its not as much value, is to create wrapper function around Bioconductor objects therefore simply adding a secondary function specific for the Bioconductor class structure; or simply check for the input to a function and if its a SummarizedExperiment ( Similarly you use KEGGREST (which is good) but then there is a complicated looping section to force it into your structure.
As Vince said above, this is the complexity we like to try to minimize for users. Where we would suggest again, a function that would take care of this formatting for a user. We will move this into the next stage of review but would expect some additional work to make it easier for end users. |
As far as the renaming, the DESCRIPTION and the repo must match exactly including capitalization. Please also rename the repo and update the url in the first comment. |
Your package has been added to git.bioconductor.org to continue the IMPORTANT: Please read this documentation for setting Bioconductor utilized your github ssh-keys for git.bioconductor.org |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "ABNORMAL". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
I think this has to do with the change of capitalization. I thought it would work to just update. I'll look into it more this weekend and get back to you on if we can change it on our end or what steps to take next. |
I think everything is straightened out. You should receive a new report shortly |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "WARNINGS". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Thanks both for your comments and understanding:
Since there is this "abnormality", just to avoid messing the bioconductor git I will keep updating my personal repo ( https://github.com/elisagdelope/funOmics) until you let me know about next steps. |
Received a valid push on git.bioconductor.org; starting a build for commit id: e7b15bd312400e3ec980cb4be6c815e086a82ebf |
An example using the SummarizedExperiment object from airway data and a new function get_kegg_sets() to retrieve pathway gene sets from KEGG have been implemented. Since SummarizedExperiment assays must be of the same dimensions, and I think the vignette is now more complete, and shows effectively how Re the "abnormal" label, it seems like it still assigns it when I push. |
I'll check why the abnormal tag is still being added but it looks like it was resolved since you pushed and received a build report so please continue to do so. |
Thanks, from the report I see this note "License stub is invalid DCF." and I don't really understand what is to be done here, as I do have the MIT license file in the repository, and it is indicated in the DESCRIPTION file. Regarding the 'suppressWarnings'/'*Messages' if possible (found 2 times), I have used it on suppressMessages(AnnotationDbi::mapIds function()) that is used internally in get_kegg_sets() function to avoid displaying messages coming from a gene id internal mapping that could confound the user (as those messages are not related to the core purpose of get_kegg_sets(), and are not always shown, only if using homo sapiens samples). I think having them displayed is quite unclear in the context of get_kegg_sets, hence the suppressMessages. Please let me know about these, and if there's anything else that needs to be addressed. |
Hi, I addressed the concerns on interoperability and added a function to get KEGG sets as suggested. The package build without warnings/errors. Are there any updates on the package revision? |
There will be. Feel free to make further improvements. |
Early in the vignette we have
I commented previously that code like this in a vignette is unwelcome. Later in the vignette you The vignette mentions
Are any of these tasks illustrated in the package or in literature? Please clarify. |
It seems to me that there are some omissions in the package that the package checking system is missing. |
I do think we are getting closer to acceptance. We appreciate your efforts thus man
vignette
and your summary statistics
this is where defining a class structure and having dedicated helper functions |
Received a valid push on git.bioconductor.org; starting a build for commit id: 12285408e915dfc77416557de3aea7a375caa335 |
Pending restructure of the vignette to highlight the use case on airway data as compared to simulated data. |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. Congratulations! The package built without errors or warnings Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
Thank you. Let us know when the vignette is updated and we will take a look again. |
Received a valid push on git.bioconductor.org; starting a build for commit id: b8f4ac8e63e0c502dbb434dcebf19a2b5fac3c8e |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. On one or more platforms, the build results were: "skipped, ERROR". Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
I'm currently facing an issue with the NMF package. One of the functions in funOmics uses this package. The package is imported both in the NAMESPACE file and in the functions.R file, where the functions are located (in roxygen2 format @import, @importFrom). I have also tried to call the functions as NMF::nmf() and NMF::coef() instead of just nmf() and coef(), however the error persists. The error (see below) only gets removed if I explicitly load the library itself, like, if I have library(NMF) before calling the package function that, internally, uses one of its functions. I have explored a range of possible solutions but i have not been successful. What would you recommend? I know it's not ideal but, is it fine if i add library(NMF) in the vignette? Weirdly enough, this issue only happens with this library, not with any other. Error traceback:
When in vignette:
|
feel free to post this at the developer forum on slack |
Received a valid push on git.bioconductor.org; starting a build for commit id: 5d40c695f25e5d6bf3ba8d6b618095e89a2e1f58 |
Dear Package contributor, This is the automated single package builder at bioconductor.org. Your package has been built on the Bioconductor Single Package Builder. Congratulations! The package built without errors or warnings Please see the build report for more details. The following are build products from R CMD build on the Single Package Builder: Links above active for 21 days. Remember: if you submitted your package after July 7th, 2020, |
I think now the vignette is quite illustrative and straightforward to follow. Looking forward to your review. |
Your package has been accepted. It will be added to the Thank you for contributing to Bioconductor! Reviewers for Bioconductor packages are volunteers from the Bioconductor |
The default branch of your GitHub repository has been added to Bioconductor's To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/elisagdelope.keys is not empty), then no further steps are required. Otherwise, do the following: See further instructions at https://bioconductor.org/developers/how-to/git/ for working with this repository. See especially https://bioconductor.org/developers/how-to/git/new-package-workflow/ to keep your GitHub and Bioconductor repositories in sync. Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at https://bioconductor.org/checkResults/ (Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using https://bioconductor.org/packages/funOmics If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further. |
Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
I understand that by submitting my package to Bioconductor,
the package source and all review commentary are visible to the
general public.
I have read the Bioconductor Package Submission
instructions. My package is consistent with the Bioconductor
Package Guidelines.
I understand Bioconductor Package Naming Policy and acknowledge
Bioconductor may retain use of package name.
I understand that a minimum requirement for package acceptance
is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
Passing these checks does not result in automatic acceptance. The
package will then undergo a formal review and recommendations for
acceptance regarding other Bioconductor standards will be addressed.
My package addresses statistical or bioinformatic issues related
to the analysis and comprehension of high throughput genomic data.
I am committed to the long-term maintenance of my package. This
includes monitoring the support site for issues that users may
have, subscribing to the bioc-devel mailing list to stay aware
of developments in the Bioconductor community, responding promptly
to requests for updates from the Core team in response to changes in
R or underlying software.
I am familiar with the Bioconductor code of conduct and
agree to abide by it.
I am familiar with the essential aspects of Bioconductor software
management, including:
months, for bug fixes.
(optionally via GitHub).
For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.
The text was updated successfully, but these errors were encountered: