Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(inactive) rd4 initial submission #2732

Closed
10 tasks done
pamelarussell opened this issue Jul 28, 2022 · 40 comments
Closed
10 tasks done

(inactive) rd4 initial submission #2732

pamelarussell opened this issue Jul 28, 2022 · 40 comments
Assignees
Labels
3c. inactive no activity from submitter for extended period of time and package not in a place to be accepted WARNINGS

Comments

@pamelarussell
Copy link

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Copy link
Collaborator

Hi @pamelarussell

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: rd4
Title: R Bindings for Reading and Querying Dense Depth Data Dump (D4) Files
Version: 0.99.0
Authors@R: c(
        person("Pamela", "Russell", , 
     "pam@fulcrumgenomics.com", role = c("aut", "cre")), 
        person("Seth", "Stadick", , "sstadick@gmail.com", role = c("aut"))
    )
Description: The Dense Depth Data Dump (D4) format provides fast analysis and 
    compact storage of quantitative genomics datasets. rd4 provides R bindings 
    for reading and querying D4 files. For full details on the format, see Hou 
    et al. (https://doi.org/10.1038/s43588-021-00085-0).
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.0
Depends:
    GenomicRanges,
    wrapr
Suggests: 
    knitr,
    rmarkdown,
    data.table,
    testthat (>= 3.0.0)
SystemRequirements: GNU make
biocViews: DataRepresentation, DataImport, Sequencing, Coverage
VignetteBuilder: knitr

@bioc-issue-bot bioc-issue-bot added the 1. awaiting moderation submitted and waiting clearance to access resources label Jul 28, 2022
@vjcitn
Copy link
Collaborator

vjcitn commented Aug 4, 2022

Is this really necessary when using/checking the package:

* building 'rd4_0.99.0.tar.gz'

1/11 packages newly attached/loaded, see sessionInfo() for details.
* installing to library '/home/stvjc/R-4-2-bc316-dist/lib/R/library'
* installing *source* package 'rd4' ...
** using staged installation
** libs
rm -Rf rd4.so ./rust/target/release/librd4.a entrypoint.o
gcc -I"/home/stvjc/R-4-2-bc316-dist/lib/R/include" -DNDEBUG   -I/usr/local/include   -fpic  -g -O2  -c entrypoint.c -o entrypoint.o
# In some environments, ~/.cargo/bin might not be included in PATH, so we need
# to set it here to ensure cargo can be invoked. It is appended to PATH and
# therefore is only used if cargo is absent from the user's PATH.
export PATH="/home/stvjc/perl5/bin:/home/stvjc/GCLOUD/google-cloud-sdk/bin:/home/stvjc/.local/bin:/home/stvjc/bin:/home/stvjc/perl5/bin:/home/stvjc/GCLOUD/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/stvjc/.dotnet/tools:/home/stvjc/bin:/home/stvjc/.local/bin:/home/stvjc/bin:/home/stvjc/.local/bin:/home/stvjc/.cargo/bin" && \
	cargo build --lib --release --manifest-path=./rust/Cargo.toml --target-dir ./rust/target
   Compiling libc v0.2.127
   Compiling autocfg v1.1.0
   Compiling cfg-if v1.0.0
   Compiling once_cell v1.13.0
   Compiling proc-macro2 v1.0.43
   Compiling unicode-ident v1.0.3
   Compiling quote v1.0.21
   Compiling log v0.4.17
   Compiling memchr v2.5.0
...

I like the rest of the dev core have no experience with rust but I would hope that one could configure
the system in a reasonably persistent way for repeated use of the package. Is this not the case?

@pamelarussell
Copy link
Author

Hi @vjcitn, this comes from code that is auto-generated by rextendr and only executes during rd4 installation. This adds cargo to the PATH (if it isn't already, which is not typical) within the active shell so required crates can be fetched. The modified PATH does not persist outside that shell. Hope this answers the concern; please let me know if not.

@jwokaty
Copy link

jwokaty commented Aug 12, 2022

@vjcitn I'll add rust to the linux builder and see how it goes. I'll follow up when it's done.

@jwokaty
Copy link

jwokaty commented Aug 12, 2022

Rust is installed. That was pretty quick! I didn't do any special configurations, though.

@vjcitn
Copy link
Collaborator

vjcitn commented Aug 12, 2022

That's great @jwokaty . Can we also see if rextendr from CRAN will install? I will continue working on this submission, it is not ready to go into build system but if rextendr installs ok we will be ready.

@jwokaty
Copy link

jwokaty commented Aug 12, 2022

rextendr installed without issue.

@vjcitn
Copy link
Collaborator

vjcitn commented Aug 13, 2022

@vjcitn vjcitn added the 3e. pending pre-review changes review has indicated blocking concern that needs attention label Aug 13, 2022
@pamelarussell
Copy link
Author

Thanks @vjcitn , I have updated the vignette per the guidelines.

@vjcitn vjcitn added pre-check passed pre-review performed and ready to be added to git and removed 3e. pending pre-review changes review has indicated blocking concern that needs attention labels Aug 15, 2022
@hpages
Copy link
Contributor

hpages commented Aug 24, 2022

Hi @pamelarussell ,

rd4 requires Rust and Cargo to be present on the system so this needs to be reported in the DESCRIPTION file in the SystemRequirements field. On Ubuntu these can easily be installed with sudo apt-get install rustc cargo so we will do this on our Linux builder.

In addition to listing the external system requirements in the SystemRequirements field, we ask package contributors to add a small INSTALL file in the top-level folder of their package where they explain how to install those system requirements for the various OS's that they wish to support (typically Debian-based and RedHat-based systems, Windows, and Mac). Ideally the file should provide simple commands that the user can copy/paste on their machine to do the job (e.g. sudo apt-get install rustc cargo on a Debian-based system).

Thanks,
H.

@hpages
Copy link
Contributor

hpages commented Aug 24, 2022

@vjcitn @jwokaty It doesn't seem that rd4 actually depends or needs CRAN package rextendr. My understanding is that rextendr was used at some point by the developers of rd4 to generate static R code that is included in the package. A little bit in the same way that developers use Autoconf to generate a configure file. Not something that the end user needs.

@hpages
Copy link
Contributor

hpages commented Aug 24, 2022

@pamelarussell

The src/python/requirements.txt file in your package contains the following:

numpy==1.22.3
pyd4==0.3.6

Does this mean that rd4 also requires Python and those Python modules to be available on the user machine? If that's the case then these requirements should also be listed in SystemRequirements.

Thanks,
H.

@pamelarussell
Copy link
Author

@hpages thanks, I have updated SystemRequirements and added INSTALL. I have also removed src/python which is not needed.

You are correct that rextendr was used in the development process and is not needed for package installation.

@hpages
Copy link
Contributor

hpages commented Aug 24, 2022

Thanks! Please make sure to remove any material that is not needed from the package.

@pamelarussell
Copy link
Author

@hpages I have gone through again and there is no other unneeded material in the repo.

@bioc-issue-bot
Copy link
Collaborator

A reviewer has been assigned to your package. Learn what to expect
during the review process.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. It is required to push a
version bump to git.bioconductor.org to trigger a new build.

Bioconductor utilized your github ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to active your
Bioconductor Git Credentials Account

@bioc-issue-bot bioc-issue-bot added 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place and removed 1. awaiting moderation submitted and waiting clearance to access resources pre-check passed pre-review performed and ready to be added to git labels Aug 26, 2022
@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, skipped".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/rd4 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@jwokaty
Copy link

jwokaty commented Aug 26, 2022

I'm looking into this.

@jwokaty
Copy link

jwokaty commented Aug 26, 2022

@pamelarussell Can I ask you to change your INSTALL such that the installation instructions use sudo apt-get install rustc cargo? I installed this way and was able to install without issue. We also typically install with apt-get, so we recommend this method instead.

# Results from nebbiolo2:
R CMD build --keep-empty-dirs --no-resave-data rd4
* checking for file ‘rd4/DESCRIPTION’ ... OK
* preparing ‘rd4’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to build vignettes
* creating vignettes ... OK
* cleaning src
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘rd4_0.99.0.tar.gz’

@hpages
Copy link
Contributor

hpages commented Aug 26, 2022

@pamelarussell As mentioned earlier people should preferably use sudo apt-get install rustc cargo on an Ubuntu/Debian system. Also please provide the yum command that they should use on an RPM-based Linux system. Once you're done, please bump the version of the package to trigger a new build. Thanks!

@pamelarussell
Copy link
Author

@hpages I have made these updates and bumped the version. For RPM, I have recapitulated the recommendation to use yum update -y to update the system followed by the curl-based rustup method.

@hpages
Copy link
Contributor

hpages commented Aug 29, 2022

Hi @pamelarussell,

  1. Please provide the exact yum command that people should use to install Rust and Cargo on an RPM-based Linux system

    You'll need to figure out the names of the corresponding rpm packages. Like apt-get on Debian-based Linux systems, yum is the standard/uniform way to install additional software on an RPM-based Linux system. This is by far the easiest/cleanest/safest way to add or remove software so is what everybody should preferably use unless there is a good reason to use something else. Besides, what would be the point of updating the database of available software with yum update -y if yum is not going to be used to install the software?

  2. The same applies to GNU Make (note the proper capitalization)

    Nobody should ever need to manually download/compile/install GNU Make from a source tarball obtained at https://ftp.gnu.org/gnu/make/. Instead, people should always use the version provided with their system. On Linux, you can safely assume that people using R/Bioconductor already have it (like they already have a C/C++ compiler), otherwise they wouldn't be able to install most CRAN or Bioconductor packages. Anyways, it seems that the only reason GNU Make is listed in SystemRequirements: is to avoid an R CMD check warning that otherwise would be triggered because some Makefiles that get processed during installation use GNU extensions. For this particular case, you don't need to provide detailed installation instructions in your INSTALL file. However, if you really want to show people how to get GNU Make, then please provide the standard command (i.e. apt-get install make or yum install make). On Mac, people should obtain the make command by installing Xcode or the Apple's Command Line Tools (a subset of Xcode).

  3. Simplify the package installation process

    When I install the package I see the following output:

    * installing to library ‘/home/hpages/R/R-4.2.r82318/library’
    * installing *source* package ‘rd4’ ...
    ** using staged installation
    ** libs
    rm -Rf rd4.so ./rust/target/release/librd4.a entrypoint.o
    /usr/bin/gcc -I"/home/hpages/R/R-4.2.r82318/include" -DNDEBUG   -I/usr/local/include   -fpic  -O3 -c entrypoint.c -o entrypoint.o
    # In some environments, ~/.cargo/bin might not be included in PATH, so we need
    # to set it here to ensure cargo can be invoked. It is appended to PATH and
    # therefore is only used if cargo is absent from the user's PATH.
    export PATH="/home/hpages/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin:/home/hpages/.cargo/bin" && \
    cargo build --lib --release --manifest-path=./rust/Cargo.toml --target-dir ./rust/target
        Updating git repository `https://github.com/38/d4-format`
        Updating git repository `https://github.com/extendr/extendr`
        Updating crates.io index
      Downloaded futures-task v0.3.24
      Downloaded socket2 v0.4.6
      Downloaded futures-util v0.3.24
      Downloaded futures-channel v0.3.24
      Downloaded futures-sink v0.3.24
      Downloaded futures-io v0.3.24
      Downloaded futures-core v0.3.24
      Downloaded 7 crates (275.6 KB) in 1.13s
       Compiling libc v0.2.132
       Compiling autocfg v1.1.0
       Compiling cfg-if v1.0.0
       Compiling once_cell v1.13.1
       Compiling proc-macro2 v1.0.43
       Compiling unicode-ident v1.0.3
       Compiling quote v1.0.21
       Compiling log v0.4.17
       Compiling memchr v2.5.0
       Compiling cc v1.0.73
       Compiling pin-project-lite v0.2.9
       Compiling syn v1.0.99
       Compiling bytes v1.2.1
       Compiling futures-core v0.3.24
       Compiling itoa v1.0.3
       Compiling glob v0.3.0
       Compiling version_check v0.9.4
       Compiling untrusted v0.7.1
       Compiling spin v0.5.2
       Compiling crossbeam-utils v0.8.11
       Compiling futures-task v0.3.24
       Compiling fnv v1.0.7
       Compiling quick-error v1.2.3
       Compiling unicode-width v0.1.9
       Compiling regex-syntax v0.6.27
       Compiling futures-util v0.3.24
       Compiling serde_derive v1.0.144
       Compiling serde v1.0.144
       Compiling bindgen v0.55.1
       Compiling strsim v0.8.0
       Compiling hashbrown v0.12.3
       Compiling vec_map v0.8.2
       Compiling httparse v1.7.1
       Compiling futures-sink v0.3.24
       Compiling rustls v0.20.6
       Compiling futures-io v0.3.24
       Compiling libR-sys v0.3.0
       Compiling ansi_term v0.12.1
       Compiling futures-channel v0.3.24
       Compiling matches v0.1.9
       Compiling tinyvec_macros v0.1.0
       Compiling pin-utils v0.1.0
       Compiling termcolor v1.1.3
       Compiling bitflags v1.3.2
       Compiling rayon-core v1.9.3
       Compiling cfg-if v0.1.10
       Compiling lazy_static v1.4.0
       Compiling percent-encoding v2.1.0
       Compiling rustc-hash v1.1.0
       Compiling lazycell v1.3.0
       Compiling scopeguard v1.1.0
       Compiling peeking_take_while v0.1.2
       Compiling shlex v0.1.1
       Compiling try-lock v0.2.3
       Compiling httpdate v1.0.2
       Compiling crc32fast v1.3.2
       Compiling encoding_rs v0.8.31
       Compiling pkg-config v0.3.25
       Compiling tower-service v0.3.2
       Compiling unicode-bidi v0.3.8
       Compiling ryu v1.0.11
       Compiling adler v1.0.2
       Compiling base64 v0.13.0
       Compiling ppv-lite86 v0.2.16
       Compiling serde_json v1.0.85
       Compiling extendr-engine v0.3.1 (https://github.com/extendr/extendr?branch=master#c6929ebe)
       Compiling ipnet v2.5.0
       Compiling either v1.8.0
       Compiling extendr-api v0.3.1 (https://github.com/extendr/extendr?branch=master#c6929ebe)
       Compiling mime v0.3.16
       Compiling smallvec v1.9.0
       Compiling paste v1.0.8
       Compiling tracing-core v0.1.29
       Compiling tokio v1.20.1
       Compiling slab v0.4.7
       Compiling memoffset v0.6.5
       Compiling indexmap v1.9.1
       Compiling crossbeam-epoch v0.9.10
       Compiling rayon v1.5.3
       Compiling num-traits v0.2.15
       Compiling libloading v0.7.3
       Compiling http v0.2.8
       Compiling nom v5.1.2
       Compiling humantime v1.3.0
       Compiling textwrap v0.11.0
       Compiling clang-sys v1.3.3
       Compiling ring v0.16.20
       Compiling tinyvec v1.6.0
       Compiling form_urlencoded v1.0.1
       Compiling miniz_oxide v0.5.3
       Compiling rustls-pemfile v1.0.1
       Compiling tracing v0.1.36
       Compiling http-body v0.4.5
       Compiling want v0.3.0
       Compiling unicode-normalization v0.1.21
       Compiling aho-corasick v0.7.18
       Compiling atty v0.2.14
       Compiling which v3.1.1
       Compiling num_cpus v1.13.1
       Compiling socket2 v0.4.6
       Compiling mio v0.8.4
       Compiling getrandom v0.2.7
       Compiling memmap v0.7.0
       Compiling crossbeam-channel v0.5.6
       Compiling flate2 v1.0.24
       Compiling clap v2.34.0
       Compiling idna v0.2.3
       Compiling regex v1.6.0
       Compiling rand_core v0.6.3
       Compiling d4-framefile v0.3.6 (https://github.com/38/d4-format?rev=871de45#871de457)
       Compiling ordered-float v2.10.0
       Compiling url v2.2.2
       Compiling cexpr v0.4.0
       Compiling rand_chacha v0.3.1
       Compiling env_logger v0.7.1
       Compiling crossbeam-deque v0.8.2
       Compiling tokio-util v0.7.3
       Compiling rand v0.8.5
       Compiling webpki v0.22.0
       Compiling sct v0.7.0
       Compiling h2 v0.3.14
       Compiling webpki-roots v0.22.4
       Compiling extendr-macros v0.3.1 (https://github.com/extendr/extendr?branch=master#c6929ebe)
       Compiling tokio-rustls v0.23.4
       Compiling hyper v0.14.20
       Compiling hyper-rustls v0.23.0
       Compiling d4-hts v0.3.6 (https://github.com/38/d4-format?rev=871de45#871de457)
       Compiling serde_urlencoded v0.7.1
       Compiling reqwest v0.11.11
       Compiling d4 v0.3.6 (https://github.com/38/d4-format?rev=871de45#871de457)
       Compiling rd4 v0.1.0 (/home/hpages/pkgreviews/rd4.review/rd4/src/rust)
        Finished release [optimized] target(s) in 2m 07s
    /usr/bin/gcc -shared -L/home/hpages/R/R-4.2.r82318/lib -L/usr/local/lib -o rd4.so entrypoint.o -L./rust/target/release -lrd4 -L/home/hpages/R/R-4.2.r82318/lib -lR
    installing to /home/hpages/R/R-4.2.r82318/library/00LOCK-rd4/00new/rd4/libs
    ** R
    ** inst
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** installing vignettes
    ** testing if installed package can be loaded from temporary location
    ** checking absolute paths in shared objects and dynamic libraries
    ** testing if installed package can be loaded from final location
    ** testing if installed package keeps a record of temporary installation path
    * DONE (rd4)
    

    That is a lot of stuff that gets downloaded and compiled! Are you sure that all these things are needed for normal/proper use of the package? For example AFAIK extendr was only needed during development to generate static R code so I wonder why it shows up here during a standard end-user install. I also see that core OS components like libc and cc get downloaded and compiled. What for? No package should ever need to do that.

    Please make sure to simplify rd4 installation process so that only what's actually needed gets downloaded/compiled.

Please address and bump the version again (not sure why your earlier version bump didn't trigger a build, make sure to push your changes to git@git.bioconductor.org:packages/rd4 in addition to GitHub).

Thanks,
H.

@vjcitn
Copy link
Collaborator

vjcitn commented Aug 30, 2022

Just a note to confirm that I see the same compilation process, but I did not have any downloads on my installation attempt. Perhaps a cache is checked. Apropos libc and cc, these are name collisions -- not core OS components, but see, e.g., https://crates.io/crates/cc

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: c29f6e0268bb54f9d3eeadcaab208ccbdd7e4664

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/rd4 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot bioc-issue-bot added OK and removed ERROR labels Aug 31, 2022
@pamelarussell
Copy link
Author

@hpages I have updated the yum commands to install Rust and Cargo per your instructions, and removed install instructions for GNU Make.

A couple of points regarding the installation of multiple Rust crates:

  • The Rust community prefers small, tightly focused crates, hence the large number of transitive dependencies even for a relatively small package.
  • Note the distinction between rextendr, the R package that was used in development and has been removed from DESCRIPTION, vs. extendr, a Rust crate that is still needed for compilation.

@hpages
Copy link
Contributor

hpages commented Sep 8, 2022

Thanks for the clarifications and the changes to the INSTALL file. Looks like we finally got a successful install/build/check on our Linux builder 😃

Note that we strongly advice against installing Bioconductor packages from GitHub as this generally results in "version mismatches" and incompatibilities for people using the current release version of Bioconductor (most people are). This is because installing from GitHub will get them the devel version of a package which has no guarantee to be compatible with the release version of the Bioconductor ecosystem. Please remove section 2.2 from the vignette.

Here is some feedback about the functionality provided by the package:

  1. A print() method for D4Source objects would be nice. Right now, printing such object provides no valuable information about what's in it:

    > d4source
    <pointer: 0x55f05a358370>
    attr(,"class")
    [1] "D4Source"    "D4SourceEnv"
    

    At a very minimum, it could print the path to the file.

  2. Returning the chromosome names and lengths in a nested list is not "the R way":

    > chrs <- get_chroms(d4source)
    > str(chrs[1:3])
    List of 3
     $ :List of 2
      ..$ name: chr "chr1"
      ..$ size: int 248956422
     $ :List of 2
      ..$ name: chr "chr2"
      ..$ size: int 242193529
     $ :List of 2
      ..$ name: chr "chr3"
      ..$ size: int 198295559
    

    For this kind of data "the R way" is to use a named integer vector or a 2-column data.frame (a standard data.frame, not a data.table object). For even better integration to the Bioconductor ecosystem, maybe consider providing a seqinfo() getter that returns the chromosome and genome information in a Seqinfo object.

  3. Please remove reliance on the data.table package. It is not needed.

  4. The use of "0-based, half-open region coordinates" for the low-level D4Source extractors (e.g. query(), mean.D4Source(), median.D4Source()) is very unfortunate. The Bioconductor ecosystem uses the "1-based closed intervals" convention everywhere, as established in the IRanges/GenomicRanges packages. Introducing the "0-based, half-open region coordinates" convention for those extractors will confuse the end user and will be error prone.

  5. The names of the update_*() functions (update_mean, update_median, etc...) is a little bit awkward given what they do. They don't update the mean or median. Instead they compute the mean or median from the supplied D4Source object for each range in the supplied GRanges object, and store the results in a metadata column that gets added to the latter. So what actually gets updated is the GRanges object. Maybe names like update_with_mean, update_with_median, etc... would help convey what these functions really do? In addition, it's generally expected that a function that updates something would take this something as its first argument.

  6. The update_*() functions should store the mean/median/etc... in atomic metadata columns instead of metadata columns of type list:

    > granges
    GRanges object with 2 ranges and 5 metadata columns:
          seqnames            ranges strand |   mean median percentile_25
             <Rle>         <IRanges>  <Rle> | <list> <list>        <list>
      [1]     chr1 17027501-17027600      + |   2.48      0             0
      [2]     chr1 17027701-17027800      + |  42.23     41            37
          percentile_50 percentile_75
                 <list>        <list>
      [1]             0             4
      [2]            41            50
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths
    
  7. Finally how about providing an analog to rtracklayer::import.bed(), rtracklayer::import.bedGraph(), or rtracklayer::import.bw(), for D4 files or D4Source objects? The ability to import the full content of a D4 file as a GRanges object would be incredibly useful as it would make it possible to use a D4 file as a drop-in replacement for bed, bedGraph, or BigWig, in any Bioconductor workflow where these types of files are currently used!

Thanks,
H.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 96ae1f69200a4d92edf7d16d65172a14a90b0656

@pamelarussell
Copy link
Author

@hpages thanks for the useful feedback. We have addressed all of these in the latest update.

  1. Remove instructions to install from GitHub
  2. print() method for D4Source class
  3. Replace get_chroms() with seqinfo() getter
  4. Remove data.table dependency
  5. Replace 0-based half-open coordinates with 1-based closed coordinates
  6. Rename update_*() methods and move object being updated to first argument
  7. update_*() methods store the metadata in atomic columns instead of list columns
  8. rtracklayer::import.*() analog: to_granges() function to convert D4Source data to a GRanges object

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings
on all platforms.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/rd4 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@hpages
Copy link
Contributor

hpages commented Sep 16, 2022

Hi @pamelarussell,

Thanks for those changes. print.D4Source is not exported so is ignored when printing a D4Source object:

> print(d4source)
<pointer: 0x55d2175dd2b0>
attr(,"class")
[1] "D4Source"    "D4SourceEnv"

This is an S3 method so it needs to be exported using the following special syntax:

S3method(print, D4Source)

Note that this is what must appear in the NAMESPACE file. Don't know how that translates into roxygen2 syntax (I don't use roxygen2).

Also please consider turning to_granges into an S3 import() method for D4Source objects (import.D4Source()), and try to provide an API that is closer to the import.bed(), import.bedGraph(), and import.bw() methods. In particular, note the which argument provided by those methods to let the user specify which regions to load.

Finally, is the genome/assembly information stored in a D4 file? If so, it should be included in the Seqinfo object returned by seqinfo():

> seqinfo(d4source)
Seqinfo object with 25 sequences from an unspecified genome:
  seqnames seqlengths isCircular genome
  chr1      248956422       <NA>   <NA>
  chr2      242193529       <NA>   <NA>
  chr3      198295559       <NA>   <NA>
  chr4      190214555       <NA>   <NA>
  chr5      181538259       <NA>   <NA>
  ...             ...        ...    ...
  chr21      46709983       <NA>   <NA>
  chr22      50818468       <NA>   <NA>
  chrX      156040895       <NA>   <NA>
  chrY       57227415       <NA>   <NA>
  chrM          16569       <NA>   <NA>

Thanks,
H.

@bioc-issue-bot
Copy link
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: 9d2b1af2fb40253a27947f1e9db72582a8c661c9

@pamelarussell
Copy link
Author

@hpages We have pushed updates for these suggestions:

  1. Correctly export print.D4Source(); add to vignette and tests
  2. Replace to_granges() with import(), which has an optional which parameter taking a GRanges or GRangesList object, similar to the import* methods in rtracklayer.
  3. Unfortunately genome information is not stored in the D4 format, so the Seqinfo object cannot capture this.

Thanks!

@bioc-issue-bot
Copy link
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details. This link will be active
for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/rd4 to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@bioc-issue-bot bioc-issue-bot added WARNINGS and removed OK labels Oct 4, 2022
@hpages
Copy link
Contributor

hpages commented Oct 5, 2022

Thanks for the changes.

Still a few problems (8. is a major one):

  1. Performance. The example.d4 is very small (510910 bytes, 170 lines) but loading it in R with import() takes more than 3 min on my machine and consumes about 9Gb of memory! This is not good. Such poor performance makes the package unfit for real-world data. Hopefully that's something that can be addressed. Performance would need to be improved by at least an order of magnitude!

  2. Not sure why the to_granges() internal helper has the ellipsis (...) in its argument list. It's not used, not needed, and is just sitting there waiting to bite you or whoever will maintain the package in the future. Try rd4:::to_granges(d4source, "chr1", start=1001, end=999) to see what I mean 😉

  3. import() is an S4 generic function defined in the BiocIO package, with many S4 methods defined in BiocIO and rtracklayer. Defining your own import() function will clash with that i.e. it will break the import() functionality provided by BiocIO and rtracklayer if these packages are loaded before your package, or it will break your import() if they are loaded after:

    library(rd4)
    d4source <- D4Source(system.file('extdata', 'example.d4', package = 'rd4'))
    import(d4source)  # works
    
    library(rtracklayer)
    import(d4source)  # no longer works!
    # Error in (function (classes, fdef, mtable)  : 
    #   unable to find an inherited method for function ‘import’ for signature ‘"D4Source", "missing", "missing"’
    

    As I said previously, you need to provide an import() method for D4Source objects. In 5 steps:

    • Add BiocIO to your Depends field and import the import function from it.
    • Rename your own import function -> import.D4Source.
    • Fix import.D4Source's argument list so that it's compatible with the import() generic function in BiocIO. Concretely, this means that the first 3 arguments of import.D4Source must be con, format, and text. It's ok if you don't support format and text, but you must still have them. Also, if you don't support them, make sure that the function raises an error in case the user supplies them. Don't just ignore them! Finally note that the ellipsis is not needed so please remove it.
    • Define the import() S4 method for D4Source objects with:
      setMethod("import", "D4Source", import.D4Source)
      
    • Export import.D4Source and the import() S4 method for D4Source objects.
  4. Move GenomeInfoDb and GenomicRanges from Imports to Depends. Basic things like seqinfo() on a D4Source object, or strand() on the GRanges object returned by import(), are expected to work out-of-the-box after loading the rd4 package. Right now they don't i.e. the user has to explicitly load GenomeInfoDb and GenomicRanges first. Most users wouldn't even know that they need to do that and would be quite confused!

  5. The seqinfo() method for D4Source objects needs to be exported and documented.

  6. The print() method for D4Source objects needs to print one more new line character (\n). Otherwise the R prompt (>) gets printed at the end of the last line instead of in a new line by itself:

    > d4source
    <D4Source object>
    Source file: /home/hpages/R/R-4.2.r82318/library/rd4/extdata/example.d4
    Number of chromosomes: 25
    Total chromosome length: 3088286401
    Number of tracks: 2> 
    

    See the > at the end of the last line?

  7. These should work (and return an empty GRanges object with the metadata column):

    import(d4source, which=GRanges("chr1", IRanges(1001, 1000)))
    # Error in `[<-`(`*tmp*`, curr_interval_idx, 1, value = curr_start) : 
    #   subscript out of bounds
    
    rd4:::to_granges(d4source, chr="chr1", start=1001, end=1000)
    # Error in `[<-`(`*tmp*`, curr_interval_idx, 1, value = curr_start) : 
    #   subscript out of bounds
    

    This hint from BiocCheck:

    • Checking coding practice...
      • NOTE: Avoid 1:...; use seq_len() or seq_along()

    gives the clue about this bug.

  8. import() needs to do better checking and error reporting w.r.t. to the which argument. For example the error the user gets here is cryptic and not very helpful:

    import(d4source, which=1:10)
    # Error in (function (classes, fdef, mtable)  : 
    #   unable to find an inherited method for function ‘reduce’ for signature ‘"integer"’
    

    Note that the use of tryCatch() here is not good practice. In this situation it's usually better/safer to write code that is explicit about what objects are supported e.g.:

    if (is.null(which)) {
        to_granges(d4source, track = track, colname = colname)
    } else {
        if (is(which, "GRangesList")) {
            which <- unlist(which, use.names=FALSE)
        } else if (!is(which, "GRanges")) {
            stop("'which' must be NULL, or a GRanges or GRangesList object")
        }
        ranges_reduced <- reduce(which)
        ...
    }
    

    This is also consistent with what the man page says about what which should be.

Thanks,
H.

@hpages
Copy link
Contributor

hpages commented Nov 17, 2022

Hi @pamelarussell , are you planning to follow up on this submission? Thanks

@pamelarussell
Copy link
Author

Hi @hpages, thanks for following up. The remainder of the project is being managed by @tfenne and @nh13 as I have moved on to a new role. I'm happy to continue to answer questions here.

@nh13
Copy link

nh13 commented Nov 18, 2022

CC: @arq5x

@hpages
Copy link
Contributor

hpages commented Dec 13, 2022

@pamelarussell @nh13 @arq5x No follow-up so far. I'm closing this submission. Please re-open when you are ready to resume it.

Best,
H.

@hpages hpages closed this as not planned Won't fix, can't repro, duplicate, stale Dec 13, 2022
@bioc-issue-bot bioc-issue-bot removed the 2. review in progress assign a reviewer and a more thorough review of package code and documentation taking place label Dec 13, 2022
@hpages hpages added the 3c. inactive no activity from submitter for extended period of time and package not in a place to be accepted label Dec 13, 2022
@bioc-issue-bot
Copy link
Collaborator

This issue is being closed because there has been no progress
for an extended period of time. You may reopen the issue when
you have the time to actively participate in the review /
submission process. Please also keep in mind that a package
accepted to Bioconductor requires a commitment on your part to
ongoing maintenance.

Thank you for your interest in Bioconductor.

@bioc-issue-bot bioc-issue-bot changed the title rd4 initial submission (inactive) rd4 initial submission Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3c. inactive no activity from submitter for extended period of time and package not in a place to be accepted WARNINGS
Projects
None yet
Development

No branches or pull requests

6 participants