Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExperimentHubData:::makeExperimentHubMetadata should determine package name from DESCRIPTION #1

Closed
LTLA opened this issue Jan 9, 2021 · 9 comments

Comments

@LTLA
Copy link

LTLA commented Jan 9, 2021

Currently, it seems to use the pathToPackage argument to determine the package name. This leads to stuff like:

ExperimentHubData:::makeExperimentHubMetadata(".", "2.6.0/metadata-giladi-hsc.csv")
## missing or NA values for 'Coordinate_1_based set to TRUE'
## Loading valid species information.
## Error in AnnotationHubData:::.readMetadataFromCsv(pathToPackage, fileName = fileName) : 
##   RDataPath must start with package name: .

If you have the pathToPackage, you should be able to read.dcf the DESCRIPTION to get the actual package name.

To be honest, it would be even better if I could just supply the path to the CSV file, and the function would automatically backtrack through the directories to determine the package root. The current design is annoying as it precludes autocompletion of the path.

@lshep
Copy link
Contributor

lshep commented Jan 10, 2021

Except not everyone stores files on our S3 and making conventions aren't always user determined. We make this constraint to organize data but hosted elsewhere may not. Also, we still wouldn't necessarily know the subdirectory structure determined by the user when uploading the data

@LTLA
Copy link
Author

LTLA commented Jan 10, 2021

I don't understand; this has nothing to do with how the user uploads the data, but on how the metadata files within the R package are validated. For example, ExperimentHubData:::makeExperimentHubMetadata calls AnnotationHubData:::.readMetadataFromCsv, and this already assumes an R package-like subdirectory structure - for example:

https://github.com/Bioconductor/AnnotationHubData/blob/a6ef95239a917b9defeedfc4be8d457a43563173/R/AnnotationHubMetadata-class.R#L36

So if you're going to do that already, why not have

package <- basename(pathToPackage)

be a bit more intelligent and do something like:

package <- unname(read.dcf(file.path(pathToPackage, "DESCRIPTION"))[,"Package"])

@lshep
Copy link
Contributor

lshep commented Jan 10, 2021

The error you originally referred to had to do with the validation of the rdatapath in the metadata. So your saying you had the package name in the rdatapath of the metadata but because you validated with a ".", It produced an error?

@lshep
Copy link
Contributor

lshep commented Jan 10, 2021

Three rdatapath in the metadata directly relates to how the file is find on S3 so it is related to the data upload.

@LTLA
Copy link
Author

LTLA commented Jan 11, 2021

So your saying you had the package name in the rdatapath of the metadata but because you validated with a ".", It produced an error?

Yes. The key point I am trying to make is that the function should not use the basename of the path to the package on the user's file system as the package name. There is no guarantee that will be the case; for example, my directories are organized like:

scRNAseq/
    release/ # BioC-release
    devel/ # BioC-devel
    package/ # master branch
    Checks/

So there is no way to run makeExperimentHubMetadata because none of the basenames actually end with the package name. I had to do the workaround of creating a symbolic link just to keep the function happy:

scRNAseq/
    release/ # BioC-release
    devel/ # BioC-devel
    package/ # master branch
    Checks/
    scRNAseq/ # -> package/

This would not be necessary if the function just extracted the package name from file.path(pathToPackage, "DESCRIPTION").

Three rdatapath in the metadata directly relates to how the file is find on S3 so it is related to the data upload.

I have no issue with the inclusion of the package name in the RDataPath. It's just the behavior of this function that is causing me problems. You might say that I shouldn't be messing around with ::: internals, but the EHub upload instructions say otherwise.

@lshep
Copy link
Contributor

lshep commented Jan 11, 2021

Ok. Yes, now I see your point and agree it should be looked at. Thank you for the extra explanation.

@lshep
Copy link
Contributor

lshep commented Feb 1, 2021

You should now be able to run makeExpeirmentHubData on "." and it will validate the package name appropriately based on the DESCRIPTION:: package. Let me know if you have any further troubles.

@lshep lshep closed this as completed Feb 1, 2021
@lshep
Copy link
Contributor

lshep commented Feb 1, 2021

Not propagated yet -- but with most recent versions of AnnotationHubData: 1.21.3 and ExperimentHubData: 1.17.1

@LTLA
Copy link
Author

LTLA commented Feb 1, 2021

Okay great. I'll test this out with the next batch of uploads in scRNAseq.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants