Ultimately, this project will be installable via pip with the command below. This is not yet implemented.
pip install dlgwas
This project is currently pre-release. Instructions on use will be added in the future.
The files of primary interest in this project are notebooks, external data, processed data, models, and reports. Additional files are produced by nbdev for building documentation, testing, and other tasks. This project is intended to be applicable to multiple organisms. This long-term goal motivates encapsulating data relevant to a species in a species subfolder, even though only Zea mays is considered at present.
Below is an outline to illustrate the project’s target structure.
- Notebooks (nbs)
  - Where possible, analysis will be done in Jupyter notebooks.
  - Notebooks are named in snake case with:
    - The expected run order
    - The species, if applicable (using KEGG naming conventions)
    - A brief description
  - Duplicate notebook numbers are allowed for now (e.g. 01_zma_kegg_download and 01_taes_kegg_download) for parallel tasks, but reserving a block of notebook numbers may end up being better (e.g. 01-10 for zma, 11-20 for taes).
- External Data (ext_data)
  - Subfolders for different species (Arabidopsis ath, Wheat taes, and Maize zma shown here)
  - Subfolders may contain data from public databases (cyverse, panzea, kegg, etc.) or data from specific studies.
  - Study data should be named according to the citation (e.g. buckler_et_al_2009) rather than the repository that the data is stored in (e.g. figshare, zenodo).
- Data (data)
  - Cleaned or otherwise transformed data from ext_data should be kept here.
  - Computational artifacts (e.g. pickled objects) that are expensive to recompute should also be stored here.
  - The folder layout isn't settled yet. It will follow one of two aims:
    - To make the origin of produced objects clear, folders will have names matching the notebooks that created them.
    - To make the use of produced objects easy, folders will have names matching those in ext_data.
- Models (models)
  - Computationally expensive models are to be saved here.
  - Folders will have names matching the notebooks that created them.
- Reports (reports)
  - Figures, tables, and other human-readable artifacts are to be stored here.
  - Folders will have names matching the notebooks that created them.
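The notebook naming convention and the matching output folders can be sketched as a small helper. This is only an illustration under stated assumptions, not part of the package: the names `parse_notebook_name`, `artifact_dirs`, and `KNOWN_SPECIES` are hypothetical, and the species set simply mirrors the three codes mentioned in the outline. The data folder is left out because its layout is not yet decided.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Hypothetical set of species codes taken from the outline above.
KNOWN_SPECIES = {"ath", "taes", "zma"}


class NotebookName(NamedTuple):
    order: int               # expected run order
    species: Optional[str]   # species code, or None if not species-specific
    description: str         # brief snake-case description


def parse_notebook_name(filename: str) -> NotebookName:
    """Split a notebook filename into run order, species, and description."""
    stem = Path(filename).stem
    match = re.fullmatch(r"(\d+)_(.+)", stem)
    if match is None:
        raise ValueError(f"{filename!r} does not start with a run-order number")
    order, rest = int(match.group(1)), match.group(2)
    first, _, remainder = rest.partition("_")
    if first in KNOWN_SPECIES and remainder:
        return NotebookName(order, first, remainder)
    return NotebookName(order, None, rest)


def artifact_dirs(filename: str, root: Path = Path(".")) -> dict:
    """Return the models and reports folders named after a notebook."""
    stem = Path(filename).stem
    return {kind: root / kind / stem for kind in ("models", "reports")}
```

For example, `parse_notebook_name("01_zma_kegg_download.ipynb")` yields run order 1, species `zma`, and description `kegg_download`, while `index.ipynb` is rejected because it carries no run-order number.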
Illustrative Directory Structure:
.
├── nbs
│   ├── 00_core.ipynb
│   ├── 01_zma_kegg_download.ipynb
│   ├── ...
│   └── index.ipynb
├── ext_data
│   ├── ath
│   ├── taes
│   └── zma
│       ├── buckler_et_al_2009
│       ├── e2p2_computed
│       ├── ensemble
│       ├── kegg
│       ├── panzea
│       └── plant_reactome
├── data
│   ├── zma ?
│   │   └── ...
│   └── 01_zma_kegg_download ?
│       └── ...
├── models
│   └── 01_zma_kegg_download
│       └── ...
└── reports
    └── 01_zma_kegg_download
        └── ...
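An empty skeleton of this layout could be created with a few lines of Python. This is a minimal sketch, assuming the top-level folders and species codes shown in the illustration; the function name `create_skeleton` and the two constants are hypothetical, and the project itself may set up its folders differently.

```python
from pathlib import Path

# Assumed from the illustrative structure above, not from project code.
TOP_LEVEL = ("nbs", "ext_data", "data", "models", "reports")
SPECIES = ("ath", "taes", "zma")


def create_skeleton(root: Path) -> list:
    """Create the top-level folders and one ext_data subfolder per species."""
    created = []
    for name in TOP_LEVEL:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(path)
    for code in SPECIES:
        path = root / "ext_data" / code
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_skeleton(Path("."))` from the repository root leaves existing folders untouched, since `exist_ok=True` makes the call idempotent.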