Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEV/TST Unbundle and remove data files #336

Closed
drewejohnson opened this issue Sep 8, 2019 · 1 comment
Closed

DEV/TST Unbundle and remove data files #336

drewejohnson opened this issue Sep 8, 2019 · 1 comment
Labels
developer Developer-focused changes testing Related to our test suite
Projects
Milestone

Comments

@drewejohnson
Copy link
Collaborator

Currently, all of our test files and example files are distributed with the python package

serpent-tools/setup.py

Lines 63 to 66 in 88b5707

'package_data': {
'serpentTools.data': ['data/{}'.format(ext) for ext in DATA_EXTS],
},
'include_package_data': True,

meaning people receive a lot of matlab files that they may never need with every version. Bundling these files inside the python package really only serves to make the example notebooks work well, through the serpentTools.data.getFile function. However, since the matlab files have a lot of lines, this has the effect of making this project appear to those on GitHub that we are a Matlab project. This is more a personal annoyance, but the main issue is that we do not need to bundle these files inside the python package.

I propose that we remove these files from the python package and from the repository entirely. I propose we do this by making a new repository CORE-GATECH-GROUP/serpent-data or something like that, and include this as a git submodule. This is not unheard of for python packages. The seaborn visualization package uses the seaborn-data repository to fetch files for examples. Although seaborn downloads the files with searborn.load_dataset into a dedicated directory rather than using a submodule.

After adding this new repository as a git submodule, we will have access to all the data files, just in a different directory. This keeps the content of this repository strictly focused on the python side, while requiring developers to fetch a submodule for full testing. Thankfully git makes this pretty easy; developers can obtain submodules during the cloning process:

$ git clone --recurse-submodules

or, for existing developers who already have cloned serpentTools:

$ git submodule init
$ git submodule update

Impact

The serpentTools.data module will be removed. This is heavily used in our testing and in the example notebooks. We can use a pytest fixture to ensure that all the files are loaded, e.g. the developer has fetched the submodule and obtained the correct test files. We can fix the notebooks by reading the files as if they appeared in the working directory and using serpentTools.read on them. This will clear up some of the prefacing we include in our example notebooks telling people not to use serpentTools.readDataFile.

For travis testing, we can simply link the files from the submodule into the temporary directory so that commands like serpentTools.read("example_det0.m") will be valid.

We should also add some information in our developer guide on adding new test files and updating this submodule. Since we nearly have all the readers done, we shouldn't have to add extra files.

The biggest frustration I can see is as follows: I add a new reader that needs a new file. I then have to add the file to serpent-data through a PR, get that PR approved, update the submodule for serpentTools to use the updated serpent-data repository before I submit the new PR so that Travis can use the newest data repository.

Related reading

@drewejohnson drewejohnson added testing Related to our test suite developer Developer-focused changes labels Sep 8, 2019
@drewejohnson drewejohnson added this to To do in Main Sep 8, 2019
@drewejohnson drewejohnson added this to the 0.10.0 milestone Nov 25, 2019
drewejohnson added a commit to drewejohnson/serpent-tools that referenced this issue Jan 17, 2020
The serpentTools.data.getFile and readDataFile functions now
rely on the evnironment variable SERPENT_TOOLS_DATA. This will allow
the files to be stored in any location, including outside the
repository, but more importantly, outside the python package.

Since this change is done inside the getFile function, none of the
tests have to be modified at all. The travis build commands will
have to be slightly modified to use this directory, and those
changes will come shortly.

Notes have been added to the developer guide to indicate the use
of this variable, and how to set it.

Related: GH Issue CORE-GATECH-GROUP#336
@drewejohnson drewejohnson moved this from To do to In progress in Main Jan 17, 2020
@drewejohnson
Copy link
Collaborator Author

Going to consider this closed with #376 and (to a lesser degree) #404. Submodules may be the way to go in the future, but that is yet to be seen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
developer Developer-focused changes testing Related to our test suite
Projects
Main
In progress
Development

No branches or pull requests

1 participant