
swheelan/python-practice


Reproducible analysis example - Python

This is an example project repository that illustrates what a reproducible analysis might look like, as discussed in more detail in the Reproducibility in Cancer Informatics course.
It can be used as a template or otherwise borrowed from.

This example analysis:

It also has its own Docker image and GitHub Actions workflows to aid reproducibility.


Requirements

To run this analysis you will need git and Docker installed on your computer. Both are widely used tools for reproducible work, so they will be useful to you well beyond this repository.
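Before starting, you can check that both tools are on your PATH. This is a minimal sketch using only the Python standard library; the function name `missing_tools` is hypothetical and not part of the repository.

```python
import shutil


def missing_tools(tools: tuple[str, ...] = ("git", "docker")) -> list[str]:
    """Return the tools (hypothetical helper) that cannot be found on PATH."""
    # shutil.which returns None when an executable is not found on PATH.
    return [tool for tool in tools if shutil.which(tool) is None]


# Usage: an empty list means git and Docker are both available.
# print(missing_tools())
```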

How to run the analysis

To re-run this analysis within its Docker image, open up your Terminal/Command Prompt.

  1. First, obtain a local copy of this repository by git clone-ing it:
git clone https://github.com/jhudsl/reproducible-python-example.git
  2. Navigate to the top of the repository:
cd reproducible-python-example
  3. Run the analysis with the following command:
docker run \
  --mount type=bind,target=/home/jovyan/work,source="$PWD" \
  jhudsl/reproducible-python \
  jupyter nbconvert --execute work/make_heatmap.ipynb --to notebook --inplace
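If you prefer to drive this step from Python (for example, from a larger pipeline script), the sketch below builds the same `docker run` command and calls it with `subprocess`. It is not part of the repository: the function names are hypothetical, and it assumes Docker is on your PATH and that you call it from the top of the repository.

```python
import shlex
import subprocess
from pathlib import Path

# Image name and notebook path are copied from the command above.
IMAGE = "jhudsl/reproducible-python"
NOTEBOOK = "work/make_heatmap.ipynb"


def build_docker_command(repo_dir: Path) -> list[str]:
    """Return the argv (hypothetical helper) that re-runs the notebook in Docker."""
    # Bind-mount the repository to /home/jovyan/work inside the container.
    mount = f"type=bind,target=/home/jovyan/work,source={repo_dir.resolve()}"
    return [
        "docker", "run",
        "--mount", mount,
        IMAGE,
        "jupyter", "nbconvert", "--execute", NOTEBOOK,
        "--to", "notebook", "--inplace",
    ]


def run_analysis(repo_dir: Path) -> None:
    """Run the analysis; raises CalledProcessError if Docker exits non-zero."""
    cmd = build_docker_command(repo_dir)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)


# Usage (from the top of the repository): run_analysis(Path.cwd())
```

Passing an argument list instead of a shell string means a repository path containing spaces does not need extra quoting.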

make_heatmap.ipynb

Input

The dataset used by this analysis is downloaded from refine.bio through its API, already processed and quantile normalized. It is RNA-seq data from 19 acute myeloid leukemia (AML) mouse models.
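refine.bio performs the quantile normalization, so the notebook does not need to. To make the term concrete, here is a generic, minimal pure-Python sketch of quantile normalization. It is not refine.bio's implementation, which may differ (for example, in how it handles ties or which reference distribution it normalizes against), and the function name is hypothetical.

```python
from statistics import mean


def quantile_normalize(columns: list[list[float]]) -> list[list[float]]:
    """Give every sample the same value distribution (hypothetical helper).

    `columns` holds one list of expression values per sample, all the same
    length (one value per gene). Ties are broken by original position, which
    is a simplification; some implementations average tied ranks instead.
    """
    n_genes = len(columns[0])
    if any(len(col) != n_genes for col in columns):
        raise ValueError("all samples must have the same number of genes")

    # 1. Sort each sample, then average across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n_genes)]

    # 2. Replace each value with the mean for its rank within its own sample.
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

For two samples `[5, 2, 3]` and `[4, 1, 6]`, the rank means are 1.5, 3.5 and 5.5, so both samples end up containing exactly those three values, each placed according to the gene's rank within its own sample.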

Output

Two directories are created by this analysis to hold the output:

  • plots/ - contains the heatmap PNG: aml_heatmap.png
  • results/ - contains a TSV file listing the most variable genes: top_90_var_genes.tsv
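As a rough illustration of how a "top 90 most variable genes" table can be produced, the sketch below ranks genes by their sample variance across samples and writes a TSV. It is an assumption-laden sketch, not the notebook's code: the function names, the `gene`/`variance` column headers, and the use of sample variance are all hypothetical.

```python
import csv
from pathlib import Path
from statistics import variance


def top_variable_genes(
    expression: dict[str, list[float]], n: int = 90
) -> list[tuple[str, float]]:
    """Return the n genes with the highest sample variance, highest first.

    Hypothetical helper: `expression` maps gene IDs to one value per sample
    (at least two samples are needed to compute a variance).
    """
    scored = [(gene, variance(values)) for gene, values in expression.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


def write_tsv(rows: list[tuple[str, float]], path: str) -> None:
    """Write (gene, variance) rows to a tab-separated file with a header row."""
    # Create results/ (or any parent directory) if it does not exist yet.
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene", "variance"])
        writer.writerows(rows)


# Usage: write_tsv(top_variable_genes(expr), "results/top_90_var_genes.tsv")
```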

conda

Package management for this project is done with conda. If you don't have conda, you will need to install it first. This article is a great short introduction to conda. You can create the conda environment by running this command from the top of the repository:

conda env create --file environment.yml

Then you can activate your conda environment using this command:

conda activate reproducible-python
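To confirm from inside Python or a notebook that you are running in this environment, you can check the interpreter's prefix. This sketch assumes the environment is named `reproducible-python` (as in the command above) and was created in conda's default `envs/` directory, where a named environment lives at `<conda root>/envs/<name>`; the helper name is hypothetical.

```python
import sys
from pathlib import Path


def in_conda_env(expected: str = "reproducible-python", prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter's prefix ends in the expected env name.

    Hypothetical helper; assumes the default <conda root>/envs/<name> layout.
    """
    return Path(prefix).name == expected


# Usage: in_conda_env() should be True once the environment is active.
```

Checking `sys.prefix` rather than a shell variable also works inside a Jupyter kernel, because it reflects the Python interpreter that is actually running your code.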

Now you can start JupyterLab using this command:

jupyter lab

Working from JupyterLab, use the "Reproducible Python" kernel. Develop and install new packages as you need them. To update the conda environment file with the new packages you installed, run this command:

conda env export --from-history > environment.yml

Be sure to add the environment.yml file to any commits and pull requests, since that file is what records the package changes to your environment!
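The export step can also be scripted. The sketch below runs `conda env export --from-history`, drops the machine-specific `prefix:` line that conda appends (an optional choice, so the file does not contain a path from your computer), and saves the result. The function name and the `cmd` parameter are hypothetical; `cmd` exists only so the command can be swapped out, for example when testing.

```python
import subprocess
from pathlib import Path

EXPORT_CMD = ("conda", "env", "export", "--from-history")


def export_environment(
    path: str = "environment.yml", cmd: tuple[str, ...] = EXPORT_CMD
) -> str:
    """Save the exported environment to `path` (hypothetical helper)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    # Drop the local "prefix:" line so the file is portable across machines.
    lines = [
        line for line in result.stdout.splitlines() if not line.startswith("prefix:")
    ]
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text)
    return text


# Usage (with the environment active): export_environment()
```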

Docker

Running the Python Docker image for development purposes

With your current directory being the top of this repository, run this command in your Terminal:

docker run --rm -v "$(pwd)":/home/jovyan/work -e JUPYTER_ENABLE_LAB=yes -p 8888:8888 jhudsl/reproducible-python

Then open the URL that the output prints (you may have to try both links; sometimes only one of them works). If you do not have the image locally, this command will first pull the latest version from Docker Hub.

Rebuilding the Docker image locally

If you prefer to build the image locally, or have otherwise modified the Dockerfile and want to test if it builds, you can run this command from the top of the repository:

docker build -f docker/Dockerfile . -t jhudsl/reproducible-python

Running docker images should show jhudsl/reproducible-python listed among your images.
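To check for the image from a script instead of by eye, you can ask `docker images` for one `repository:tag` line per image and search that output. This is a sketch with hypothetical helper names; only `local_images` needs Docker installed.

```python
import subprocess


def image_present(listing: str, repository: str = "jhudsl/reproducible-python") -> bool:
    """Check "repository:tag" lines for a repository (hypothetical helper)."""
    # rsplit keeps registry host:port prefixes intact; tags never contain ":".
    return any(line.rsplit(":", 1)[0] == repository for line in listing.splitlines())


def local_images() -> str:
    """Return one "repository:tag" line per local image (requires Docker)."""
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Usage: image_present(local_images())
```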

GitHub Actions

There are two main GitHub Actions workflows in this repository:

  • docker-management.yml - Tests that the Docker image builds whenever changes to the Dockerfile are added to a pull request.
  • run-py-notebook.yml - Re-runs the analysis by running make_heatmap.ipynb within the Docker image (using the command described above).

Both workflows can also be run manually. The Docker management workflow can additionally push the rebuilt Docker image to Docker Hub if you set dockerhubpush to true.
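Manual runs are normally started from the repository's Actions tab, but they can also be triggered through GitHub's REST API endpoint for creating a workflow dispatch event. The sketch below only builds the request. The owner, repository, branch (`main`) and token are assumptions you would replace with your own; the token needs permission to run Actions on the repository, and the workflow must accept a `dockerhubpush` input.

```python
import json
import urllib.request

API = "https://api.github.com"


def build_dispatch_request(
    owner: str,
    repo: str,
    workflow_file: str,
    ref: str,
    inputs: dict[str, str],
    token: str,
) -> urllib.request.Request:
    """Build a POST to the workflow dispatch endpoint (hypothetical helper)."""
    url = f"{API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Usage (assumed owner/repo/branch; sends a real request):
# req = build_dispatch_request("jhudsl", "reproducible-python-example",
#                              "docker-management.yml", "main",
#                              {"dockerhubpush": "true"}, token)
# urllib.request.urlopen(req)
```

Workflow dispatch inputs are sent as strings, which is why `dockerhubpush` is passed as `"true"` rather than a boolean.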

Styling with Black

The Docker container and conda environment are equipped with the Black code formatter for styling. To format each Python file here, use these commands:

python -m black make_heatmap.ipynb
python -m black util/color_key.py
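Rather than listing files by hand, you can collect every `.py` and `.ipynb` file and pass them to Black in one call. This is a sketch with a hypothetical helper name; it skips Jupyter's `.ipynb_checkpoints` copies. Black only formats notebooks when it is installed with its `jupyter` extra.

```python
import sys
from pathlib import Path


def black_command(root: str = ".") -> list[str]:
    """Build a `python -m black` call for project files (hypothetical helper)."""
    files = sorted(
        str(path)
        for pattern in ("*.py", "*.ipynb")
        for path in Path(root).rglob(pattern)
        # Skip Jupyter checkpoint copies, which are not part of the project.
        if ".ipynb_checkpoints" not in path.parts
    )
    return [sys.executable, "-m", "black", *files]


# Usage: subprocess.run(black_command(), check=True)
# Insert "--check" after "black" to report files without rewriting them.
```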
