"Upstream" analysis covers the steps needed to go from raw sequencing output (usually FASTQ files) to a format that a researcher can interpret visually (e.g., bigWig files). These upstream pipelines allow wet-lab scientists to analyse their own data reproducibly without any prior knowledge of bioinformatics. The pipelines are built with the Snakemake workflow framework and are designed both to be user-friendly and to address the problem of reproducibility in genomic data analysis.
Further information is supplied in the README files for each of these pipelines.
Reference genomes
This pipeline is designed to streamline downloading and indexing reference genomes for use in the other pipelines.
Build Calibration Genome
This pipeline is designed to streamline concatenating a reference genome with a spike-in genome and indexing the result, for use in the Calibrated ChIP-seq pipeline.
genetics/CATCH-UP
Designed for the upstream analysis of bulk ChIP-seq and ATAC-seq data.
genetics/Calibrated ChIP-seq
Designed specifically for the analysis of ChIP-seq data from different experimental or biological conditions, where rigorous normalisation is required to compare samples across those conditions.
genetics/tCaptureC
A pipeline for the analysis of both Capture-C and Tiled Capture-C data. It combines the previously published HiC-Pro and HiCPlotter tools into one streamlined analysis.
transcriptomics/Bulk-RNA-Seq
Designed for the analysis of bulk RNA-seq data using the STAR RNA-seq mapping tool.
All of the upstream pipelines can be run using the upstream conda environment; please follow the installation instructions detailed below. Using this shared environment makes the analysis highly reproducible. First, clone the repository and enter its main folder:
git clone git@github.com:Genome-Function-Initiative-Oxford/UpStreamPipeline.git
cd UpStreamPipeline
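The SSH URL above only works if an SSH key is registered with your GitHub account; otherwise, cloning over HTTPS is a common alternative. A minimal bash sketch of picking either URL follows; the function upstream_repo_url is hypothetical, and only the repository path is taken from this README.

```shell
# Hypothetical helper: print the UpStreamPipeline clone URL for a given protocol.
# Only the repository path comes from this README; the function is illustrative.
upstream_repo_url() {
  local repo="Genome-Function-Initiative-Oxford/UpStreamPipeline.git"
  case "${1:-ssh}" in
    ssh)   printf 'git@github.com:%s\n' "$repo" ;;
    https) printf 'https://github.com/%s\n' "$repo" ;;
    *)     printf 'unknown protocol: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Example (requires network access; HTTPS needs no SSH key):
# git clone "$(upstream_repo_url https)" && cd UpStreamPipeline
```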
Check whether Anaconda, Miniconda, or Mambaforge is already installed, using:
which conda
If it is installed, the output will be a path similar to the following (the exact location depends on where conda was installed):
~/anaconda3/condabin/conda
If Anaconda, Miniconda, or Mambaforge is not installed, we recommend installing Mambaforge, since it includes mamba, a faster drop-in replacement for conda that downloads packages in parallel.
Download the Mambaforge installer (the Anaconda or Miniconda installers can be used instead):
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
- Run the installer as follows, and follow the on-screen instructions:
sh Mambaforge-Linux-x86_64.sh
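The wget command above fetches the Linux x86_64 installer only. The bash sketch below builds the URL for the current machine instead. It assumes (this README does not confirm it) that other release assets follow the same <name>-<OS>-<arch>.sh pattern, with macOS builds labelled MacOSX; installer_url is a hypothetical helper. The distribution name is a parameter because the Miniforge project has deprecated Mambaforge in favour of Miniforge3, which also ships mamba.

```shell
# Hypothetical helper: print the installer URL for this machine.
# Assumption: assets are named "<name>-<OS>-<arch>.sh" like the Linux x86_64
# file in this README, with macOS builds labelled "MacOSX".
installer_url() {
  local name="${1:-Mambaforge}"
  local os="${2:-$(uname -s)}"
  local arch="${3:-$(uname -m)}"
  # uname reports macOS as "Darwin", but the installers use "MacOSX".
  case "$os" in
    Darwin) os="MacOSX" ;;
  esac
  printf 'https://github.com/conda-forge/miniforge/releases/latest/download/%s-%s-%s.sh\n' \
    "$name" "$os" "$arch"
}

# Example (requires network access):
# url=$(installer_url)
# wget "$url" && sh "${url##*/}" -b -p "$HOME/mambaforge"
# "$HOME/mambaforge/bin/conda" init   # batch mode (-b) skips shell initialisation
```

In the commented example, -b runs the installer non-interactively and -p sets the install prefix; because batch mode does not initialise your shell, conda init is run afterwards.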
Activate the conda 'base' environment (if it is not already active):
conda activate base
There are two ways to create the upstream conda environment:
- Using mamba (if Mambaforge was installed); follow the on-screen instructions:
mamba env create --file=envs/upstream.yml
- Using conda; follow the on-screen instructions:
conda env create --file=envs/upstream.yml
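Choosing between the two commands can also be scripted. This is a minimal bash sketch; pick_conda_frontend and create_upstream_env are hypothetical names, and the YAML path is the one used above.

```shell
# Choose mamba when it is available (faster), otherwise fall back to conda.
# 'command -v' succeeds for executables on PATH and for shell functions,
# such as the one 'conda init' defines.
pick_conda_frontend() {
  if command -v mamba >/dev/null 2>&1; then
    echo mamba
  elif command -v conda >/dev/null 2>&1; then
    echo conda
  else
    echo "neither mamba nor conda was found on PATH" >&2
    return 1
  fi
}

# Hypothetical wrapper: create the upstream environment with whichever tool was found.
# Run it from inside the UpStreamPipeline folder so envs/upstream.yml resolves.
create_upstream_env() {
  local tool
  tool=$(pick_conda_frontend) || return 1
  "$tool" env create --file=envs/upstream.yml
}
```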
Once the upstream environment has been created, it needs to be activated:
conda activate upstream
You can now run all of our upstream pipelines from this environment. Enjoy!
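Job scripts can check that the environment is really active before launching a pipeline. A minimal bash sketch, relying on the CONDA_DEFAULT_ENV variable that conda activate sets to the active environment's name; require_upstream_env is a hypothetical helper.

```shell
# Guard for scripts: stop early unless the upstream environment is active.
# 'conda activate' sets CONDA_DEFAULT_ENV to the name of the active environment.
require_upstream_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "upstream" ]; then
    echo "Run 'conda activate upstream' first (active: ${CONDA_DEFAULT_ENV:-none})" >&2
    return 1
  fi
}

# Example: put this near the top of a job script, before calling a pipeline.
# require_upstream_env || exit 1
```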
CATCH-UP has been successfully tested on the following operating systems: Ubuntu, CentOS, macOS (Intel CPU), and Windows. Unfortunately, it cannot currently be installed on macOS with Apple silicon (M-series) CPUs. If you encounter an error during installation, please open an issue so that we can provide a general solution for all users.
If required for publication, package versions within the environment can be exported as follows:
conda env export > upstream_environment_versions.yml
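conda env export also accepts options that change what is recorded: --no-builds omits platform-specific build strings, and --from-history lists only the packages that were explicitly requested. A minimal bash sketch that writes a dated export with --no-builds; the function name and file-name pattern are hypothetical.

```shell
# Hypothetical helper: export the upstream environment to a dated YAML file.
# --no-builds drops platform-specific build strings, which makes the file easier
# to reuse on another operating system; remove it to record exact builds.
export_upstream_versions() {
  local out="upstream_environment_versions_$(date +%Y%m%d).yml"
  conda env export --name upstream --no-builds > "$out" || return 1
  echo "$out"
}
```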
If changes are made to the pipelines, you can update your local copy of the repository by entering the main folder and pulling the updates:
# Enter the main folder
cd UpStreamPipeline
# Pull updates
git pull
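git pull can stop with conflicts if tracked files, such as a configuration file you edited, have local changes. A minimal bash sketch that refuses to pull until tracked files are clean; update_upstream is a hypothetical helper, and untracked files (e.g., pipeline outputs) are ignored by the check.

```shell
# Hypothetical helper: pull the latest changes only if no tracked file has
# local modifications, so edits (e.g. to configuration files) are not mixed in.
update_upstream() {
  local repo="${1:-UpStreamPipeline}"
  local changes
  # --untracked-files=no ignores new files such as pipeline outputs.
  changes=$(git -C "$repo" status --porcelain --untracked-files=no) || return 1
  if [ -n "$changes" ]; then
    echo "Uncommitted changes in $repo; commit or 'git stash' them before pulling:" >&2
    echo "$changes" >&2
    return 1
  fi
  git -C "$repo" pull
}
```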
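Alternatively, remove the cloned repository and clone it again as described above. Run the command below from the directory that contains UpStreamPipeline, not from inside it.
Warning: use rm -rf carefully, as it permanently deletes the folder and everything in it without asking for confirmation!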
rm -rf UpStreamPipeline
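Because rm -rf does not ask for confirmation, a small guard can make the removal harder to misdirect. This is a hypothetical bash wrapper, not part of the repository: it only deletes a directory named UpStreamPipeline that contains a .git folder.

```shell
# Hypothetical safety wrapper around 'rm -rf': only delete a directory that is
# named UpStreamPipeline and contains a .git folder, i.e. looks like the clone.
remove_upstream_clone() {
  local repo="${1:-UpStreamPipeline}"
  local name="${repo%/}"    # drop a trailing slash
  name="${name##*/}"        # keep the last path component
  if [ "$name" != "UpStreamPipeline" ] || [ ! -d "$repo/.git" ]; then
    echo "Refusing to remove '$repo': it does not look like an UpStreamPipeline clone" >&2
    return 1
  fi
  rm -rf -- "$repo"
}

# Example: remove_upstream_clone UpStreamPipeline, then clone again as above.
```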
When using this repository, use the default terminal and, if you are logged in to a server, do not load any modules (e.g., with module load).
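On clusters that use Environment Modules or Lmod, loaded modules are listed in the LOADEDMODULES variable, and module purge unloads them all. A minimal bash sketch that warns before running a pipeline; check_no_modules is a hypothetical name, and the reasoning assumed here is that loaded modules may put their own tools ahead of the upstream environment on PATH.

```shell
# Hypothetical check: warn if any environment modules are loaded.
# Both Environment Modules and Lmod keep a colon-separated list of loaded
# modules in LOADEDMODULES.
check_no_modules() {
  if [ -n "${LOADEDMODULES:-}" ]; then
    echo "Loaded modules: ${LOADEDMODULES//:/ }" >&2
    echo "Run 'module purge' (or open a fresh terminal) before using the pipelines." >&2
    return 1
  fi
}
```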
If you have any suggestions, spot any errors, or have any questions regarding the pipelines, please do not hesitate to contact us.