Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tRNAscan2GRanges #607

Closed
8 tasks done
FelixErnst opened this issue Jan 11, 2018 · 38 comments
Closed
8 tasks done

tRNAscan2GRanges #607

FelixErnst opened this issue Jan 11, 2018 · 38 comments

Comments

@FelixErnst
Copy link

FelixErnst commented Jan 11, 2018

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For help with submitting your package, please subscribe and post questions
to the bioc-devel mailing list.

@bioc-issue-bot
Copy link
Collaborator

Hi @FelixErnst

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: tRNAscan2GRanges
Title: Imports a tRNAscan-SE result file as GRanges object
Version: 0.99.0
Authors@R: person("Felix G.M.", 
        "Ernst", 
        email = "felix.gm.ernst@outlook.com", 
        role = c("aut", "cre"))
Description: The package imports the result of tRNAscan-SE as a GRanges object.
Depends: R (>= 3.5)
License: GPL-3 + file LICENSE
Encoding: UTF-8
LazyData: true
biocViews:
  Software,
  WorkflowStep,
  Preprocessing
Imports:
  BiocGenerics,
  Biostrings,
  GenomicRanges,
  GenomeInfoDb,
  assertive,
  S4Vectors
Collate:
  tRNAScan2GRanges.R
RoxygenNote: 6.0.1
Suggests: 
  BiocStyle, 
  knitr,
  rmarkdown,
  testthat
VignetteBuilder: knitr

@bioc-issue-bot
Copy link
Collaborator

Your package has been approved for building. Your package is
now submitted to our queue.

IMPORTANT: Please read the instructions for setting
up a push hook on your repository, or further changes to your
repository will NOT trigger a new build.

@bioc-issue-bot bioc-issue-bot added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed 1. awaiting moderation labels Jan 11, 2018
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@lawremi
Copy link

lawremi commented Jan 11, 2018

There is a lot of low-hanging fruit for optimization and cleanup. The vignette needs to be developed: what is the tRNA-SE format, where is it encountered, what are some typical things people do with the data once they've imported it? The vignette section on the "Literature" seems to contain the wrong content.

@FelixErnst
Copy link
Author

FelixErnst commented Jan 12, 2018

@lawremi I added some more detailed information in the vignette. Could you specify, what you mean with cleanup?

Apart from the little example case scenario, which is now part of the vignette, a typical usage case will be part of a follow up package, but I wanted to split this functionality since it is straight forward and contained.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

be41728 version bump

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@bioc-issue-bot bioc-issue-bot added OK and removed ERROR labels Jan 12, 2018
@lawremi
Copy link

lawremi commented Jan 12, 2018

By cleanup I mean the parser code. It should be vectorized, rather than looping over rows and blocks in R.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

abcc7b8 vectorized text block parsing

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@FelixErnst
Copy link
Author

Thanks for insisting. I previously didn't come up with a vectorized solution. Learned something :)

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

bdc433b version bump
f543b13 Merge branch 'master' of https://github.com/FelixE...
ed62486 version bump

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

adac41a version bump

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

c8d668d version bump

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

3697940 version bump

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

ed0d0e0 added tRNAscan2GFF function to produce gff3 compli...
6272eb6 removed clashes with gff3 defined columns

@FelixErnst
Copy link
Author

I added some visualization to the vignette and removed some clashes with gff3 naming conventions. Currently I don't have anything more to add.

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@FelixErnst
Copy link
Author

FelixErnst commented Jan 25, 2018

@hpages @lawremi
Hi. just wanted to ask if you have any additional points I should take care of. Thanks

@hpages
Copy link
Contributor

hpages commented Jan 26, 2018

Hi @FelixErnst ,
I started to look at tRNAscan2GRanges and will provide some feedback here today.
H.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

04bf25f corrected spelling mistakes in the vignette

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

b3a1fb2 - vignette text changes - fixed spelling mistakes

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details.

@FelixErnst
Copy link
Author

Hi @hpages, any news?

@FelixErnst
Copy link
Author

Hi @hpages
I am sorry to bother you again. Is there any feedback you can provide? Thanks!
Felix

@hpages
Copy link
Contributor

hpages commented Feb 10, 2018

Hi @FelixErnst,

I got distracted, sorry.

I took a first look at tRNAscan2GRanges and have some feedback for you. See below.

Please let me know or ask on the bioc-devel mailing list if you have any question or concern about this.

Thanks,
H.

  1. Interface design and naming:

    • tRNAscan2GFF() does something like some_transformation(tRNAscan2GRanges()) i.e. it transforms the GRanges object returned by tRNAscan2GRanges() before returning it to the user. Have you considered making this transformation available as a separate function? This would allow the user who called tRNAscan2GRanges() to switch to the GFF3-compliant format without having to reload the data from the file, which could be costly if the file is big. Also, in contexts where the original file is not accessible anymore, one could still transform an object that was previously loaded with tRNAscan2GRanges().

    • The functionality provided by tRNAscan2GFF() could be merged into tRNAscan2GRanges() by adding a toggle to the former e.g. an as.GFF argument (FALSE by default).

    • The name of the tRNAscan2GRanges() function makes it sound like it performs some kind of coercion when it's in fact an I/O function. read_tRNAscan() or import_tRNAscan() would be better names for it. Or read_tRNAscanAsGRanges() if you want the name to convey the type of object that is returned (but is that important?)

    • Now that the tRNAscan2GFF name is available, it could be used for the function that transforms the GRanges object returned by read_tRNAscan() to the GFF3-compliant format.

    • To summarize, read_tRNAscan() would do:

        tRNAscan <- .read_tRNAscan_from_file(...)  # Does what current tRNAscan2GRanges()
                                                   # does but is now internal
        if (as.GFF)
            tRNAscan <- tRNAscan2GFF(tRNAscan)
        return(tRNAscan)
      
    • So the user still sees 2 functions, read_tRNAscan() and tRNAscan2GFF(), but now they provide orthogonal functionalities and they also have better names.

    • Since the tRNAscan2GRanges package was named after its central function, it feels like maybe it should also be renamed. One obstacle though is that underscores are not allowed in package names, and readtRNAscan is not very readable. So what about something like tRNAscanImporter or tRNAscanReader?

    "There are only two hard things in Computer Science: cache invalidation and naming things."
    http://www.meerkat.com/2017/12/naming-things-hard/

  2. The automatic names on the returned GRanges object are not helpful or meaningful so should be removed.

  3. Use the right storage for the right type of data:

    • Metadata columns phase, tRNA_length, tRNA_anticodon.start, tRNA_anticodon.end, tRNAscan_intron.start, tRNAscan_intron.end, tRNAscan_intron.locstart, and tRNAscan_intron.locend, should be integer vectors (so with NAs for missing values).

    • The source and type metadata columns should be factors.

    • The score metadata column should be a numeric column (so with NAs instead of dots for missing scores).

    • Note that this will bring the GFF3-compliant GRanges object closer to what rtracklayer::import.gff3() returns. It's actually interesting to compare the object one gets after exporting and re-importing it (with rtracklayer::export.gff3() / rtracklayer::import.gff3()) with the original object.

  4. Coding:

    • Something already mentioned by Michael is that your code could avoid loops and be more efficient by taking advantage of the fact that many things in R are vectorized and fast. For example:

        ## Instead of:
        vapply(intron_length, function(l) if (l > 0) {l + 1} else 0, numeric(1))
        ## do:
        ifelse(intron_length > 0, intron_length + 1, 0)
        ## which is much faster.
      
    • Don't place curly braces around the call to standardGeneric() in the definition of your S4 generics. This makes them look as non-standard generic functions. I don't think that's what you want.

    • Please do not use :: inside your own package for calling things defined inside it. This is not needed and is actually discouraged.

  5. When opening a man page with ? in a terminal, text is not properly formatted which makes it hard to read e.g.:

         Description:
         
           ‘gettRNAscanSummary()’: creates an DataFrame with aggregates of
           information ‘plottRNAscanSummary()’: If ggplot2 is installed a
           plot of the data is generated
         
         ...
         
         Value:
         
           ‘gettRNAscanSummary()’: DataFrame ‘plottRNAscanSummary()’: list of
           ggplots per column of data
    
    • The plottRNAscanSummary() example should really show a plot. Right now it doesn't.
  6. Vignette:

    • "Importing as GRanges" would be more accurate than "Converting to GRanges".

    • Explicitly do library(Biostrings) before using its functionalities (doing library(rtracklayer) does not attach Biostrings to the search() path).

    • Once you've taken care of explicitly calling library() on each package you're going to use, you don't need (and should not) use fully qualified names like Biostrings::writeXStringSet. They're distracting in a vignette. Same thing for the man pages.

    • When opening the HTML vignette in a browser, the output of head(gr, 2) is too wide and doesn't fit in the grey box so is wrapped. Making the browser window wider doesn't help (it doesn't make the body of the vignette wider, which is unfortunate). Maybe report this issue or ask for help to the BiocStyle people: https://github.com/Bioconductor/BiocStyle

@FelixErnst
Copy link
Author

@hpages Thanks for the feedback.

In any case: After renaming the package, the github URL would change. Would I need to resubmit it under the new name?

@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

64b9620 restructuring based on BioC submission comments

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@bioc-issue-bot bioc-issue-bot added ERROR and removed OK labels Feb 10, 2018
@bioc-issue-bot
Copy link
Collaborator

Received a valid push; starting a build. Commits are:

f6bd49e BiocCheck fixes

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

@FelixErnst
Copy link
Author

First things first: Thanks again for the feedback. First submissions are always tricky and the feedback is so important to learn from ones "mistakes".

  1. I renamed the package to tRNAscanImport and the main function import.tRNAscanAsGRanges analogous to import.gff3 from the rtracklayer package . As suggested a direct conversion to a gff3 compatible GRanges object can be triggered by setting as.gff3 = TRUE and tRNAscan2GFF expects a GRanges as input, so it can be converted from compatible input independently of the origin of the GRanges object.

  2. Do you mean the ID column? That is a bit of functionality, which one has to have, since tRNA have sometimes multiple copies per chromosome. I followed the naming convention SGD uses. In my opinion, that is the most robust naming scheme, to ensure unique IDs.

  3. Done on the types. Apart from the numeric/integer values, the export.gff3/import.gff3 looks much alike the output of tRNAscan2GFF.

  4. Done

  5. Done

  6. Regarding the width issue: I will open an issue one the BiocStyle repo. The reason for the wrapping probably is, that the character vector in tRNA_str does not have a space in in it. So I don't know, how that can be resolved in BiocStyle. It could be solved by extending the XStringSet for something like DotBStringSet (dot bracket annotation of nucleotide secondary structure), which would than behave like DNAStringSet masking longer sequences.

Since the build broke because of some path not found, I assume it was because of the renaming. The Webhook also does not work anymore, since the BioC build system does not know about the package with the new name. ("Sorry, you haven't told us about this repository, please go to https://github.com/Bioconductor/Contributions/issues/new .")

I opened another issue with renamed package: #645
Is that the right procedure or should/can I do something else?

@hpages
Copy link
Contributor

hpages commented Feb 10, 2018

Yes, opening a new issue is the right procedure. I'll close this one and follow-up on the new one. Thanks!

@FelixErnst FelixErnst mentioned this issue Feb 23, 2018
8 tasks
@lshep lshep removed the 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place label Mar 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants