
This project explores using deep learning for a GWAS-like analysis, leveraging the NAM population. It has been suspended in favor of `EnvDL` (my NIFA-funded project).

DanielKick-USDA/dlgwas


dlgwas

Ultimately this project will be installable via pip with the following command. Note that this is not yet implemented.

Install

pip install dlgwas

How to use

This project is currently pre-release. Instructions on use will be added in the future.

Project Structure

The files of primary interest in this project are notebooks, external data, processed data, models, and reports. Additional files are produced by nbdev for building documentation, testing, and other tasks. This project is intended to be applicable to multiple organisms. This long-term goal motivates keeping data relevant to a species in a species subfolder, even though only Zea mays is considered at present.

Below is an outline to illustrate the project’s target structure.

  • Notebooks (nbs)
    • Where possible, analysis will be done in Jupyter notebooks.
    • Notebooks are named in snake case with
      1. The expected run order
      2. The species if applicable (using KEGG naming conventions)
      3. A brief description
    • Duplicate notebook numbers are allowed for now for parallel tasks (e.g. 01_zma_kegg_download and 01_taes_kegg_download), but reserving a block of notebook numbers per species may end up being better (e.g. 01-10 for zma, 11-20 for taes).
  • External Data (ext_data)
    • One subfolder per species (Arabidopsis as ath, wheat as taes, and maize as zma shown here).
    • Subfolders may contain data from public databases (CyVerse, Panzea, KEGG, etc.) or data from specific studies.
    • Study data should be named according to the citation (e.g. buckler_et_al_2009) rather than the repository that the data is stored in (e.g. figshare, Zenodo, etc.).
  • Data (data)
    • Cleaned or otherwise transformed data from ext_data should be kept here.
    • Computational artifacts (e.g. pickled objects) that are expensive to recompute should also be stored here.
    • The folder layout for data is not settled yet. One of two conventions will be adopted:
      • To make the origin of produced objects clear, folders will have names matching the notebooks that created them.
      • To make the use of produced objects easy, folders will have names matching those in ext_data.
  • Models (models)
    • Computationally expensive models are to be saved here.
    • Folders will have names matching the notebooks that created them.
  • Reports (reports)
    • Figures, tables, and other human readable artifacts are to be stored here.
    • Folders will have names matching the notebooks that created them.
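
As a rough illustration of the notebook naming convention above, a name can be split into its run order, optional species code, and description. This is only a sketch and not part of the package: `parse_notebook_name`, `NotebookName`, and `KNOWN_SPECIES` are hypothetical names, and the species set is assumed from the three codes mentioned in this README.

```python
import re
from typing import NamedTuple, Optional

# Assumption: only the species codes mentioned in this README are recognized.
KNOWN_SPECIES = {"ath", "taes", "zma"}


class NotebookName(NamedTuple):
    order: int               # expected run order, e.g. 1 for "01_..."
    species: Optional[str]   # species code, or None if not applicable
    description: str         # remaining snake_case description


# A leading run of digits, an underscore, then the rest of the snake_case name.
_PATTERN = re.compile(r"^(?P<order>\d+)_(?P<rest>[a-z0-9_]+)$")


def parse_notebook_name(stem: str) -> Optional[NotebookName]:
    """Parse a notebook file stem; return None if it has no run-order prefix."""
    match = _PATTERN.match(stem)
    if match is None:
        return None
    order = int(match.group("order"))
    rest = match.group("rest")
    first, _, remainder = rest.partition("_")
    # Treat the first token as a species only if it is a known code
    # and something is left over to serve as the description.
    if first in KNOWN_SPECIES and remainder:
        return NotebookName(order, first, remainder)
    return NotebookName(order, None, rest)
```

Under these assumptions, `01_zma_kegg_download` parses to order 1, species `zma`, and description `kegg_download`, while `00_core` has no species and `index` is not treated as a numbered notebook.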

Illustrative Directory Structure:

.
├── nbs
│   ├── 00_core.ipynb
│   ├── 01_zma_kegg_download.ipynb
│   ├── ...
│   └── index.ipynb
├── ext_data
│   ├── ath
│   ├── taes
│   └── zma
│       ├── buckler_et_al_2009
│       ├── e2p2_computed
│       ├── ensemble
│       ├── kegg
│       ├── panzea
│       └── plant_reactome
├── data
│   ├── zma                  ?
│   │   └── ...
│   └── 01_zma_kegg_download ?
│       └── ...
├── models
│   └── 01_zma_kegg_download
│       └── ...
└── reports
    └── 01_zma_kegg_download
        └── ...    
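
The rule that models and reports live in folders named after the notebook that created them could be captured in a small helper. This is a minimal sketch under that assumption; `notebook_output_dirs` is a hypothetical function, not something the project provides, and it deliberately leaves out `data` because that layout is still undecided.

```python
from pathlib import Path
from typing import Dict


def notebook_output_dirs(root: Path, notebook_stem: str) -> Dict[str, Path]:
    """Return the models and reports folders for a given notebook.

    Hypothetical helper: each folder is named after the notebook stem,
    e.g. models/01_zma_kegg_download under the project root.
    """
    return {
        "models": root / "models" / notebook_stem,
        "reports": root / "reports" / notebook_stem,
    }


def ensure_dirs(dirs: Dict[str, Path]) -> None:
    """Create the folders if they do not exist yet."""
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
```

A notebook could call `ensure_dirs(notebook_output_dirs(Path(".."), "01_zma_kegg_download"))` near its top, assuming it runs from inside `nbs`, so its outputs always land in matching folders.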
