development docs draft (#396)
* development docs draft

* added intro for docker_style_guide.md

* add to docker_style_guide.md

* add to wdl_style_guide.md

* renamed dev_guide to development_guide

* rewrote overview, added visuals for directory structure

* Minor fixes, Added Workflow Deployment in README.md

---------

Co-authored-by: bshifaw <bshifaw@broadinstitute.com>
bshifaw and bshifaw committed May 3, 2023
1 parent 261a3b9 commit e4fb5c3
Showing 4 changed files with 408 additions and 0 deletions.
69 changes: 69 additions & 0 deletions docs/development_guide/README.md
@@ -0,0 +1,69 @@
# Development Guide Overview

This development guide provides information on the structure of the repository, testing infrastructure,
style guides, and contributing guidelines. The hope is that this guide will help developers
create and maintain code that is consistent with the rest of the repository.

## Repository Structure

The repository includes files such as the LICENSE and README.md files, which provide legal and informational
overviews of the repository. Other directories such as docker, docs, resources, scripts,
site, test, and wdl contain various files and directories that are important for
building and testing the software in the repository.
See [Repository Structure](./repo_structure.md) documentation for further
details and [Contributing Guidelines](#contributing-guidelines) for
information on how to contribute to the repository.

## Workflow Scripts

All workflow scripts are located in the `wdl` directory. They are written in WDL 1.0
and are intended to run on Google Cloud Platform via the Cromwell scientific workflow engine.
The WDL scripts are divided into three subdirectories: `tasks`, `structs`, and `pipelines`.
Tasks are further divided by analysis type, and pipelines by sequencing platform and then by analysis type.
See [Repository Structure](./repo_structure.md) for more information on directory structure.

## Docker Containers

The WDL workflows in this repository are designed to be run using Docker containers.
This provides a number of advantages, including the ability to run the workflows on a
variety of platforms and the ability to isolate the workflows from each other.
Many of these workflows use specialized containers that are built from the Dockerfiles in the
`docker` directory, which also contains other scripts for building containers for several tools.
The Docker images are pushed to a Google Container Registry (GCR) repository,
`us.gcr.io/broad-dsp-lrma`, for internal use. External users who want to run workflows
with these containers should build the images and, if needed, push them to a registry they have access to.
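
For external users, the build-and-push step could be sketched as follows. This is a minimal illustration, not the repository's own build tooling: the registry path `us.gcr.io/my-project`, the tag `0.1.0`, and the helper name `build_and_push_commands` are all assumptions. The sketch only composes standard `docker build` and `docker push` command lines.

```python
from typing import List


def build_and_push_commands(
    registry: str, name: str, tag: str, context_dir: str
) -> List[List[str]]:
    """Compose docker commands that build an image and push it to a registry.

    All values are supplied by the caller; nothing here is specific to
    this repository's Makefiles.
    """
    image = f"{registry}/{name}:{tag}"
    return [
        ["docker", "build", "-t", image, context_dir],
        ["docker", "push", image],
    ]


if __name__ == "__main__":
    # Hypothetical registry and tag; replace with a registry you can push to.
    for command in build_and_push_commands(
        "us.gcr.io/my-project", "lr-bwa", "0.1.0", "docker/lr-bwa"
    ):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` to execute it.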

## Testing

Scripts for running tests are located in the `test` directory and are run using tox, a
Python-based test automation tool that can run tests in a variety of environments.
The tests run through GitHub Actions, which are configured in the `.github/workflows` directory
and are triggered by pushes to the repository and by pull requests.

## Workflow Deployment

The workflows in this repository are deployed to Terra using Dockstore, a platform for
sharing Docker containers and workflows. The workflows are registered using the
`.dockstore.yml` file in the root directory of the repository. This file
contains information about the workflows, including the location of the WDL files and, if
available, the location of example input JSON files. The workflows published in Dockstore
are automatically updated when changes are made to the repository. To add a new workflow
to this repository and have it published in Dockstore, update the `.dockstore.yml` file
in your feature branch. This should be enough for the Dockstore GitHub App to automatically
add the version of the workflow from your branch to Dockstore.


## Contributing Guidelines

Please adhere to the following best practices if contributing to this repository:

1. **Read Style Guide**: Before making any changes to the code, it's important to read the style guide. The style guide contains information on how to write code that is consistent with the rest of the codebase. See [WDL Style Guide](./wdl_style_guide.md) and [Docker Style Guide](./docker_style_guide.md) for more information.
2. **Create a new branch**: When making contributions to a repository, it's important to create a new branch for each change you make. The name of the branch should begin with your initials followed by an underscore and a short description of the change. For example, if Janet Sully is making a change to the README file, the name might be `js_update_readme`.
3. **Keep commits small and focused**: When making changes to the code, it's important to keep your commits small and focused on a specific task. This makes it easier for others to review your changes and also makes it easier to roll back changes if necessary.
4. **Write clear commit messages**: When committing changes to the repository, it's important to write clear and concise commit messages that describe what changes were made. This helps others understand the changes you made and why you made them.
5. **Test your changes**: Before submitting your changes, make sure to test them thoroughly to ensure they work as intended. This helps reduce the chance of introducing bugs or issues into the codebase.
6. **Submit a pull request**: Once you have made your changes and tested them, submit a pull request to the main repository. Make sure to include a clear description of the changes you made and why you made them. This makes it easier for others to review and merge your changes into the main repository.
7. **Add reviewers**: Once you have submitted your pull request, add reviewers to the pull request. This will notify them that you have submitted a pull request and they should review it. It's important to add at least one reviewer to your pull request.
8. **Merging pull requests**: Once your pull request has been reviewed and approved, it can be merged into the main repository. It's important to merge pull requests using the "Squash and merge" option. This will squash all commits in the pull request into a single commit, which makes it easier to track changes in the repository.
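
The branch naming convention in step 2 can be checked mechanically. The sketch below is one possible interpretation, not a rule enforced by the repository: the guideline does not specify letter case or allowed characters, so the pattern (two or three lowercase initials, an underscore, then lowercase words joined by underscores) and the helper name `is_valid_branch_name` are assumptions.

```python
import re

# Assumed pattern: 2-3 lowercase initials, an underscore, then a short
# description made of lowercase letters/digits separated by underscores.
BRANCH_PATTERN = re.compile(r"[a-z]{2,3}_[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if the branch name matches the assumed convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_branch_name("js_update_readme")` accepts the branch name used in the guideline, while a name without the initials prefix, such as `update-readme`, is rejected.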
85 changes: 85 additions & 0 deletions docs/development_guide/docker_style_guide.md
@@ -0,0 +1,85 @@
# Docker Style Guide
This document is a guide for writing Dockerfiles for the pipelines.
It provides a list of best practices for writing Dockerfiles and a list of common
mistakes to avoid. The guide is intended for those who want to contribute
Dockerfiles to the pipelines repository.


## Dockerfiles Organization
All Docker-related resources should be placed in the `docker` directory of the repository. The `docker` directory
contains a subdirectory for each Dockerfile and its related resources. The subdirectory name should
start with an abbreviation of the data type the tool will process, followed by the name of the tool.
For example, the Docker image for the `bwa` aligner that will process long reads would be placed in the `docker/lr-bwa` directory.

## Docker Subdirectory Folder
Each Docker subdirectory should contain the following files and folders:

- `Dockerfile`: The Dockerfile for the Docker image.
- `Makefile`: A Makefile for building the Docker image.

Optionally, the subdirectory may contain the following files and folders:

- `README.md`: A README file for the Docker image.
- `environment.yml`: A conda environment file for installing dependencies.
- Any resource files (e.g. Python scripts) needed to build the Docker image.

Example Directory Tree:

```Text
docker
|__lr-bwa
| |__Dockerfile
| |__Makefile
| |__README.md
|__lr-pb
|__lr-ont
```
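
The required layout described above can be verified with a short script. This is a minimal sketch under stated assumptions: the helper name `missing_required_files` is hypothetical, and only the two required files named in this guide (`Dockerfile` and `Makefile`) are checked.

```python
from pathlib import Path
from typing import List

# Files this guide lists as required in every docker subdirectory.
REQUIRED_FILES = ("Dockerfile", "Makefile")


def missing_required_files(subdir: Path) -> List[str]:
    """Return the required file names that are absent from a docker subdirectory."""
    return [name for name in REQUIRED_FILES if not (subdir / name).is_file()]
```

Calling `missing_required_files(Path("docker/lr-bwa"))` returns an empty list when both files are present.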


## Dockerfile Guidelines
This section outlines the guidelines for creating Dockerfiles, including the format, structure, and best practices for creating efficient and maintainable Docker images.
The Docker documentation is a valuable resource for learning about Dockerfiles, and the following
link is a good starting point for general best practices: [Docker Best Practices](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/).
In addition to the Docker best practices, use the following guidelines when creating Dockerfiles for the pipelines repository.

- When appropriate, add a comment preceding each Docker instruction to explain its purpose.
```Dockerfile
# copy other resources
COPY ./environment.yml /

# install conda packages
RUN conda env create -f /environment.yml && conda clean -a
ENV PATH=/opt/conda/envs/lr-pb/bin/:/root/google-cloud-sdk/bin/:${PATH}

# install system utilities, then the Google Cloud SDK (which provides gsutil)
RUN apt-get update && apt-get install -y curl git-lfs time datamash
RUN curl https://sdk.cloud.google.com | bash
```

- Specify a maintainer for the Docker image. The `MAINTAINER` instruction is deprecated, so use a `LABEL` instead.
```Dockerfile
FROM continuumio/miniconda3

LABEL maintainer="Barbra Mills"
```

- Specify version numbers for all packages installed in the Docker image.
```Dockerfile
RUN conda install python=3.6.9
RUN conda create -n venv python=3.6.9
```


## Image Naming and Tagging Guidelines
This section outlines the guidelines for naming and tagging Docker images, including the format, structure, and best practices for creating consistent and descriptive image names and tags.

* Use descriptive names: Choose a name that clearly identifies the image and its purpose. Avoid using generic names like "docker-image" or "latest".
* Name should match directory name: When possible, the name of the Docker image should match the name of the `docker` subdirectory it is located in.
* Use lowercase letters: Docker image names should be in lowercase letters.
* Use semantic versioning: Follow the semantic versioning pattern (major.minor.patch) to ensure consistency and compatibility between different versions of the image.
* Avoid special characters: Avoid using special characters in the image name or tag, as they may cause issues with some systems or platforms.
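
The naming and tagging guidelines above could be checked along these lines. This is a sketch, not an official validator: the helper name `check_image` and both regular expressions are assumptions. Hyphens are allowed in names because the `lr-bwa` example in this guide uses one, and the tag pattern accepts only plain `major.minor.patch` versions.

```python
import re
from typing import List

# Assumed name format: lowercase letters and digits, optionally hyphen-separated.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Plain semantic version: major.minor.patch with no leading zeros.
SEMVER_PATTERN = re.compile(r"(?:0|[1-9][0-9]*)\.(?:0|[1-9][0-9]*)\.(?:0|[1-9][0-9]*)")


def check_image(name: str, tag: str, subdir_name: str) -> List[str]:
    """Return a list of guideline violations for an image name and tag."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must use lowercase letters, digits, and hyphens")
    if name != subdir_name:
        problems.append("name should match the docker subdirectory name")
    if not SEMVER_PATTERN.fullmatch(tag):
        problems.append("tag should follow major.minor.patch")
    return problems
```

An empty list means the name and tag satisfy all three checks; for instance, `check_image("lr-bwa", "0.1.2", "lr-bwa")` reports no problems, while a tag such as `latest` is flagged.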


## Testing and CI/CD Guidelines
_This section should provide guidelines for testing Docker images and containers, including best practices for creating automated tests and integrating with CI/CD pipelines to ensure consistent builds and deployments.
TBD: This section is still under development._
129 changes: 129 additions & 0 deletions docs/development_guide/repo_structure.md
@@ -0,0 +1,129 @@
# Repository Structure

```angular2html
├── cloudbuild.yaml
├── LICENSE
├── README.md
├── mkdocs.yml
├── requirements.txt
├── tox.ini
├── VERSION
├── docker
│   └── ...
├── docs
│   └── ...
├── resources
│   └── ...
├── scripts
│   └── ...
├── terra
│   └── ...
├── test
│   └── ...
└── wdl
    └── ...
```


The repository's files and directories are as follows:

* LICENSE: The license for the repository.
* README.md: The top-level README, which provides an overview of the repository.
* VERSION: The version number of the repository.
* cloudbuild.yaml: A Cloud Build configuration file that defines how the repository is built.
* docker: Contains Dockerfiles for building docker images used by pipelines.
* docs: Contains documentation for the pipelines and a developer's guide.
* requirements.txt: A file listing the Python dependencies for the pipelines.
* resources: A directory containing resources used by the pipelines.
* mkdocs.yml: A configuration file for the mkdocs documentation generator.
* scripts: Contains scripts used by the repository (e.g. webpage creation).
* test: Contains tests for the pipelines.
* tox.ini: A configuration file for the tox test runner.
* wdl: Contains WDL files.

## WDL Directory Structure

```angular2html
└── wdl
    ├── pipelines
    │   └── ...
    ├── structs
    │   └── ...
    └── tasks
        └── ...
```
The WDL directory is further divided into subdirectories. The subdirectories are as follows:

* tasks: Contains WDL files with a list of tasks to be imported and used by pipeline WDLs.
* pipelines: Contains WDL files with workflow blocks.
* structs: Contains WDL structs for the pipelines.

### Tasks Directory Structure
The `tasks` directory has an additional level of subdirectories that organize WDL tasks by analysis type. The subdirectories are as follows:

```angular2html
└── wdl
    └── tasks
        ├── alignment
        │   └── ...
        ├── annotation
        │   └── ...
        ├── assembly
        │   └── ...
        ├── epigenomics
        │   └── ...
        ├── preprocessing
        │   └── ...
        ├── qc
        │   └── ...
        ├── transcriptomics
        │   └── ...
        ├── utility
        │   └── ...
        ├── variantcalling
        │   └── ...
        └── visualization
            └── ...
```

### Pipelines Directory Structure
The `pipelines` directory has two additional levels of subdirectories that organize WDL workflows, first by platform and then by analysis type.

The first-level subdirectories are as follows:

```angular2html
└── wdl
    └── pipelines
        ├── Illumina
        │   └── ...
        ├── PacBio
        │   └── ...
        ├── ONT
        │   └── ...
        └── TechAgnostic
            └── ...
```

The second-level subdirectories, shown here for `Illumina`, are as follows:

```angular2html
└── wdl
    └── pipelines
        └── Illumina
            ├── alignment
            │   └── ...
            ├── annotation
            │   └── ...
            ├── assembly
            │   └── ...
            ├── epigenomics
            │   └── ...
            ├── multianalysis
            │   └── ...
            ├── preprocessing
            │   └── ...
            ├── utility
            │   └── ...
            └── variantcalling
                └── ...
```