
copying package data, etc. to official server repo #24

Closed
anngvu opened this issue Jun 7, 2019 · 1 comment

Comments


anngvu commented Jun 7, 2019

Currently, the app directory is in the package, so once the package is installed the app can be served with shiny::runApp(system.file(package = "DIVE")). The app uses the data that ships with the package. Some of this data needs to be updatable from the curator module, but package data is normally updated by updating the package itself, and it's not good practice to overwrite installed package data.

Relevant threads:
https://stackoverflow.com/questions/4018519/update-the-dataset-in-an-installed-package
https://stackoverflow.com/questions/14711277/modifying-r-package-data

So all the data in the package needs to be copied to a repo where it can be read and re-written. The data in the package will serve as the last official version.
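A minimal sketch of what that copy step might look like, assuming the writable copy lives under tools::R_user_dir() (available in R >= 4.0); seed_data_dir and the directory layout are hypothetical names for illustration, not actual DIVE code:

```r
# Seed a writable data directory from the read-only copy installed with
# the package; the installed copy remains the last official version.
seed_data_dir <- function(source_dir, target_dir) {
  if (!dir.exists(target_dir)) {
    dir.create(target_dir, recursive = TRUE)
    file.copy(list.files(source_dir, full.names = TRUE), target_dir)
  }
  target_dir
}

# In the app, something like:
# data_dir <- seed_data_dir(
#   source_dir = system.file("data", package = "DIVE"),
#   target_dir = tools::R_user_dir("DIVE", which = "data")
# )
```

The curator module would then read and write files under the target directory, leaving the data shipped inside the installed package untouched.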


anngvu commented Aug 30, 2019

Refinement to the issue definition: it's not a good idea to include all the primary data as part of the package, considering the following advice:

What I would try to do is to put a mechanism to acquire this data in the package, but separate the (changing ?) data from the code.
Packages are not first and foremost a means to direct data acquisition, in particular for changing data sets. Most packages include fixed data to demonstrate or illustrate a method or implementation.

Thus, there are real scalability issues.

  1. The data will eventually grow beyond what is typically included in an R package. There are data packages with large datasets, e.g. https://github.com/waldronlab/curatedTCGAData, but even packages like these are not meant to keep absorbing high-throughput data that keeps multiplying.

  2. So far, including the batch-curated data has been OK for showing how the implementation of certain modules would work, but in the future data updates will happen relatively frequently and would also force package updates.
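The acquisition mechanism suggested in the quoted advice could be as simple as a staleness check plus a download, keeping the changing data out of the package entirely. A sketch, where fetch_data, the URL, and the refresh interval are all hypothetical (the real source would be the server repo this issue proposes):

```r
# Re-download a data file only when the local copy is missing or older
# than max_age_days; otherwise serve the cached copy.
is_stale <- function(path, max_age_days = 7) {
  !file.exists(path) ||
    difftime(Sys.time(), file.mtime(path), units = "days") > max_age_days
}

fetch_data <- function(url, dest, max_age_days = 7) {
  if (is_stale(dest, max_age_days)) {
    download.file(url, dest, mode = "wb")
  }
  dest
}
```

With this split, package releases carry only code plus a small demonstration dataset, while the frequently updated primary data lives on the server and is refreshed independently of the package version.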

@anngvu anngvu closed this as completed Dec 9, 2019