This is the repository for the paper "A comprehensive long-read isoform analysis platform and sequencing resource for breast cancer", published in Science Advances (2022).
The repository contains the pipeline used to analyze RNA isoforms and ORF products sequenced from breast cancer samples using PacBio long-read sequencing.
- GTF file containing isoforms passing quality control
- FASTA file with isoform sequences
- SQANTI2 annotation of isoforms
- Open reading frames predicted by TransDecoder
Quality control was performed using custom scripts, including redundancy removal, minimal junction coverage, and 3' end reliability. See 1-Quality_control.
- Original ToFU GTF
- SQANTI2 GTF file with indel correction
- SQANTI2 annotation of all isoforms, with isoform and junction tables
- Table of full-length counts per sample
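As a rough illustration of the kind of filtering mentioned above (minimal junction coverage and 3' end reliability), the following minimal sketch filters a tab-separated isoform annotation table. It is not the code in 1-Quality_control: the column names `isoform`, `min_cov`, and `perc_A_downstream_TTS` are assumed to follow the SQANTI classification table layout, both thresholds are hypothetical, and redundancy removal is not shown.

```python
import csv
import io

# Hypothetical thresholds for illustration only; not the values used in the paper.
MIN_JUNCTION_COV = 3
MAX_PERC_A_DOWNSTREAM = 80.0


def passes_qc(row):
    """Return True if an isoform row passes both illustrative filters."""
    min_cov = row.get("min_cov", "NA")
    perc_a = row.get("perc_A_downstream_TTS", "NA")
    # Assumption: "NA" means the value does not apply (e.g. mono-exonic
    # isoforms have no junctions), so the corresponding filter is skipped.
    junction_ok = min_cov == "NA" or float(min_cov) >= MIN_JUNCTION_COV
    end_ok = perc_a == "NA" or float(perc_a) <= MAX_PERC_A_DOWNSTREAM
    return junction_ok and end_ok


def filter_isoforms(handle):
    """Read a tab-separated annotation table and return the IDs that pass."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["isoform"] for row in reader if passes_qc(row)]


# Tiny made-up table standing in for an annotation file.
example = (
    "isoform\tmin_cov\tperc_A_downstream_TTS\n"
    "PB.1.1\t10\t20.0\n"
    "PB.1.2\t1\t15.0\n"
    "PB.2.1\tNA\t95.0\n"
    "PB.3.1\tNA\t30.0\n"
)
kept = filter_isoforms(io.StringIO(example))
print(kept)
```

To run the same sketch on a real file, pass an open file handle to `filter_isoforms` instead of the in-memory example. The scripts in 1-Quality_control remain the reference for the criteria actually applied.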
The data and source code (pipeline V.1) can also be downloaded from Zenodo using this link.