Skip to content

Fixing package infrastructure and making output portable for amRml#24

Open
amcim wants to merge 3 commits into
mainfrom
remotes-install
Open

Fixing package infrastructure and making output portable for amRml#24
amcim wants to merge 3 commits into
mainfrom
remotes-install

Conversation

@amcim
Copy link
Copy Markdown
Contributor

@amcim amcim commented May 12, 2026

Description

amRdata did not successfully run on a user install. This PR resolves 2 bugs relating to package infrastructure and relativizes the filepath for the duckdb that will be used for amRml to quickly query the parquet output.

What kind of change(s) are included?

  • Added R/imports.R which includes importing the := operator from data.table

  • Added @export flags to 2 functions retrieveMetadata and retrieveGenomes. These functions are being called in the vignette so a user should have access to them

  • Made the parquet-backed DuckDB portable: views now reference parquet files by base filename, and the build sets file_search_path to resolve them during schema inference. amRml must call DBI::dbExecute(con, sprintf("SET file_search_path='%s'", dirname(parquet_duckdb_path))) after dbConnect, and a PR in amRml was made that does precisely that.

  • Feature (adds or updates new capabilities)

  • Bug fix (fixes an issue).

  • Enhancement (adds functionality).

  • Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have searched for existing content to ensure this is not a duplicate.
  • I have performed a self-review of these additions (including spelling, grammar, and related).
  • I have added comments to my code to help provide understanding.
  • I have added a test which covers the code changes found within this PR.
  • I have deleted all non-relevant text in this pull request template.
  • Reviewer assignment: Tag a relevant team member to review and approve the changes.

@amcim amcim self-assigned this May 12, 2026
@amcim amcim requested a review from eboyer221 May 14, 2026 22:35
Copy link
Copy Markdown
Contributor

@eboyer221 eboyer221 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amcim I reviewed and pushed a small follow-up commit. The @export tags on retrieveMetadata and retrieveGenomes weren't actually exporting because devtools::document() hadn't been re-run, so I regenerated NAMESPACE. I also added R/imports.R with a roxygen @importFrom tag for data.table :=, otherwise the next document() run would have dropped the importFrom line from NAMESPACE.

One thing still open related to the parquet portability fix: The basename() change drops the absolute paths from the view definitions (good), but file_search_path only sticks while the connection is open, so a fresh dbConnect() doesn't remember it. So when amRml reopens the .duckdb, the views can't find the parquets and it errors out (confirmed with a small repro). I wasn't sure whether this requires additional changes in this PR, especially considering where you are taking the architecture with ExperimentHubs and Nextflow, so I'm just flagging this potential issue here to bring it to your attention in case it does warrant some resolution at this point.

@amcim
Copy link
Copy Markdown
Contributor Author

amcim commented May 15, 2026

@amcim I reviewed and pushed a small follow-up commit. The @export tags on retrieveMetadata and retrieveGenomes weren't actually exporting because devtools::document() hadn't been re-run, so I regenerated NAMESPACE. I also added R/imports.R with a roxygen @importFrom tag for data.table :=, otherwise the next document() run would have dropped the importFrom line from NAMESPACE.

One thing still open related to the parquet portability fix: The basename() change drops the absolute paths from the view definitions (good), but file_search_path only sticks while the connection is open, so a fresh dbConnect() doesn't remember it. So when amRml reopens the .duckdb, the views can't find the parquets and it errors out (confirmed with a small repro). I wasn't sure whether this requires additional changes in this PR, especially considering where you are taking the architecture with ExperimentHubs and Nextflow, so I'm just flagging this potential issue here to bring it to your attention in case it does warrant some resolution at this point.

I forgot to write in the notes of this PR that this PR works together with PR 22 in amRml (I noted this on amRml's side) JRaviLab/amRml#22

In that PR I added 4 lines of code that will resolve this as you say.

Good catch on the @importFrom tag. When the amRml PR is approved ill merge both at the same time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants