dlgwas

Ultimately, this project will be installable via pip with the command below. Note that this is not yet implemented.

Install

pip install dlgwas

How to use

This project is currently pre-release. Instructions on use will be added in the future.

Project Structure

The files of primary interest in this project are notebooks, external data, processed data, models, and reports. Additional files are produced by nbdev for building documentation, testing, and other tasks. This project is intended to be applicable to multiple organisms. This long-term goal motivates encapsulating data relevant to a species in a species subfolder, even though only Zea mays is considered at present.

Below is an outline to illustrate the project’s target structure.

Notebooks (nbs)
- Where possible, analysis will be done in Jupyter notebooks.
- Notebooks are named in snake case with:
  1. The expected run order
  2. The species if applicable (using KEGG naming conventions)
  3. A brief description
- Duplicate notebook numbers are allowed for now for parallel tasks (e.g. 01_zma_kegg_download and 01_taes_kegg_download), but reserving a block of notebook numbers per species may end up being better (e.g. 01-10 for zma, 11-20 for taes).
External Data (ext_data)
- One subfolder per species (Arabidopsis: ath, wheat: taes, and maize: zma are shown here).
- Subfolders may contain data from public databases (cyverse, panzea, kegg, etc.) or data from specific studies.
- Study data should be named according to the citation (e.g. buckler_et_al_2009) rather than the repository that the data is stored in (e.g. figshare, zenodo, etc.).
Data (data)
- Cleaned or otherwise transformed data from ext_data should be kept here.
- Computational artifacts (e.g. pickled objects) that are expensive to recompute should also be stored here.
- The data storage layout isn’t settled yet. It will follow one of two aims:
  - To make the origin of produced objects clear, folders would have names matching the notebooks that created them.
  - To make produced objects easy to use, folders would have names matching those in ext_data.
Models (models)
- Computationally expensive models are to be saved here.
- Folders will have names matching the notebooks that created them.
Reports (reports)
- Figures, tables, and other human readable artifacts are to be stored here.
- Folders will have names matching the notebooks that created them.
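
The notebook naming convention described above (run order, optional species code, brief description) can be checked mechanically. The following is a minimal sketch, not part of dlgwas: the function `parse_notebook_name`, the `NotebookName` type, and the `KNOWN_SPECIES` set are hypothetical names introduced for illustration, and the set only contains the three species codes mentioned in this README.

```python
import re
from typing import NamedTuple, Optional

# Assumption: only the species codes named in this README are recognized.
KNOWN_SPECIES = {"ath", "taes", "zma"}


class NotebookName(NamedTuple):
    order: int               # expected run order, e.g. 1 for "01_..."
    species: Optional[str]   # species code, or None if not species specific
    description: str         # remaining snake_case description


# A numeric prefix, an underscore, then a snake_case remainder.
_PATTERN = re.compile(r"^(\d+)_([a-z0-9_]+)\.ipynb$")


def parse_notebook_name(filename: str) -> Optional[NotebookName]:
    """Split a notebook filename into its parts, or return None if it
    does not follow the numbered snake_case convention (e.g. index.ipynb)."""
    match = _PATTERN.match(filename)
    if match is None:
        return None
    order = int(match.group(1))
    rest = match.group(2)
    head, _, tail = rest.partition("_")
    # Treat the first token as a species only if it is a known code
    # and a description follows it.
    if head in KNOWN_SPECIES and tail:
        return NotebookName(order, head, tail)
    return NotebookName(order, None, rest)
```

With this sketch, `parse_notebook_name("01_zma_kegg_download.ipynb")` yields order 1, species `zma`, and description `kegg_download`, while a species-independent notebook such as `00_core.ipynb` yields a species of `None`.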

Illustrative Directory Structure:

.
├── nbs
│   ├── 00_core.ipynb
│   ├── 01_zma_kegg_download.ipynb
│   ├── ...
│   └── index.ipynb
├── ext_data
│   ├── ath
│   ├── taes
│   └── zma
│       ├── buckler_et_al_2009
│       ├── e2p2_computed
│       ├── ensemble
│       ├── kegg
│       ├── panzea
│       └── plant_reactome
├── data
│   ├── zma                  ?
│   │   └── ...
│   └── 01_zma_kegg_download ?
│       └── ...
├── models
│   └── 01_zma_kegg_download
│       └── ...
└── reports
    └── 01_zma_kegg_download
        └── ...
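
As a rough illustration of the layout above, the sketch below creates the top-level folders, one ext_data subfolder per species, and the per-notebook folders under models and reports. It is an assumption-laden example rather than project code: `scaffold`, `TOP_LEVEL`, and `SPECIES` are hypothetical names, and subfolders of data are deliberately not created because that layout is still undecided.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Folder names taken from the illustrative structure in this README.
TOP_LEVEL = ("nbs", "ext_data", "data", "models", "reports")
SPECIES = ("ath", "taes", "zma")


def scaffold(root: Path, notebook_stems: Iterable[str] = ()) -> List[Path]:
    """Create the target directory layout under `root` and return the
    list of directories that make it up."""
    wanted = [root / name for name in TOP_LEVEL]
    # One ext_data subfolder per species.
    wanted += [root / "ext_data" / code for code in SPECIES]
    # models/ and reports/ folders are named after the creating notebook.
    for stem in notebook_stems:
        wanted.append(root / "models" / stem)
        wanted.append(root / "reports" / stem)
    for path in wanted:
        path.mkdir(parents=True, exist_ok=True)  # safe to re-run
    return wanted


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp), ["01_zma_kegg_download"]):
            print(path.relative_to(tmp))
```

Because `mkdir` is called with `exist_ok=True`, running the function again on an existing tree leaves it unchanged.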

DanielKick-USDA/dlgwas
