This repository provides analysis of the human cleavage site annotation resulting from running the pipeline at https://github.com/Mayrlab/hcl-utrome. It characterizes cleavage sites with respect to the number of cell types in which they are detected, the presence of cleavage factor motifs, PhastCons conservation scores, and APARENT2 scores. It is particularly concerned with comparing cleavage sites that are both annotated in GENCODE and detected in Microwell-seq data against those found only in GENCODE or only in Microwell-seq data.
It also contains some preprocessing code that created the metadata tables used in the pipeline.
The results are reported in a preprint: Fansler et al., bioRxiv, 2023.
The folders in the repository have the following purposes:
- analysis - primary source code and rendered HTMLs of R Markdown or IPython notebooks
- envs - Conda environment YAML files for recreating the execution environment
- img - output images
- metadata - output tables used in pipeline
- qc - quality control data from the pipeline outputs
- scripts - miscellaneous scripts for data format conversions
All code is expected to be executed with this repository as the present working directory. If opening as an R Project in RStudio, make sure to set the Project folder as the working directory.
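As a minimal sketch of the working-directory requirement above, the following check confirms that the current directory contains the folders listed earlier before any analysis is run. The helper name check_repo_root is hypothetical and not part of the repository; it also assumes all six folders are present in your checkout.

```shell
# Hypothetical helper (not part of the repository): checks that the current
# directory contains the top-level folders described in this README.
check_repo_root() {
  missing=0
  for d in analysis envs img metadata qc scripts; do
    if [ ! -d "$d" ]; then
      echo "missing folder: $d" >&2
      missing=1
    fi
  done
  # Returns 0 only when every expected folder was found.
  return "$missing"
}

if check_repo_root; then
  echo "OK: running from the repository root"
else
  echo "Please cd into the repository root first" >&2
fi
```

If the check fails, change into the repository folder (or set the RStudio Project folder as the working directory) and run it again.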
The primary source code is found in the analysis folder.
Files are numbered in the original order of execution, though the order does not
imply strict necessity (most analyses here can be independently executed).
The analysis/processing/reformat_annots.Rmd was run before the pipeline, and
generated metadata outputs that were used in the pipeline.
The R instances used to execute the files were captured both in the rendered RMDs themselves
(see Runtime Details section in HTMLs) and as YAML files in the envs folder.
To recreate the environment on arbitrary platforms (Linux or macOS), we recommend
using Micromamba and the minimal YAML (*.min.yaml):
micromamba create -n bioc_3_16 -f envs/bioc_3_16.min.yaml
micromamba activate bioc_3_16
A fully-solved environment capture is also provided (*.full.yaml). This is only
expected to recreate on the osx-64 platform and is primarily intended for exact
replication and a statement of record.
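The choice between the two YAML files can be sketched as a small platform check. This is only an illustration: the helper pick_env_yaml is hypothetical, and the file name envs/bioc_3_16.full.yaml is an assumption inferred from the *.full.yaml pattern, so check the envs folder for the actual name. The script prints the suggested command rather than running it.

```shell
# Hypothetical helper: suggest which environment YAML to use.
# $1 is the OS name (as from `uname -s`), $2 the machine type (`uname -m`).
pick_env_yaml() {
  if [ "$1" = "Darwin" ] && [ "$2" = "x86_64" ]; then
    # osx-64: the fully-solved capture is expected to recreate here.
    # File name is an assumption based on the *.full.yaml pattern.
    echo "envs/bioc_3_16.full.yaml"
  else
    # Any other platform: fall back to the minimal YAML.
    echo "envs/bioc_3_16.min.yaml"
  fi
}

yaml=$(pick_env_yaml "$(uname -s)" "$(uname -m)")
echo "micromamba create -n bioc_3_16 -f $yaml"
```

The minimal YAML remains the recommended starting point on any platform; the full capture is mainly useful when exact package versions matter.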