# Examples of alternative polyadenylation (APA) usage

This notebook uses the `Integrative Transcriptomics Viewer` (ITV), a modified version of the `genomeview` package with customizations for displaying transcriptome data. It can be installed from [`https://github.com/MethodsDev/ITV`](https://github.com/MethodsDev/ITV).

Here we plot three examples of APA sites where a known transcript is expressed, but with a much shorter 3' UTR. In each case the APA site appears in the reference annotation in association with other transcripts, but this specific structure is not annotated, so these reads are assigned to the longer isoform. This causes a 5' enrichment of transcript coverage. We plot three ribosomal protein genes simply because they were three of the highest-expressing examples of this phenomenon.
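
To make the coverage argument concrete, here is a minimal toy sketch, not part of the analysis itself. It assumes a hypothetical 10-base transcript with an unannotated APA site at position 6, and made-up read counts; the names `coverage`, `TRANSCRIPT_LEN`, and `SHORT_END` are illustrative only.

```python
def coverage(length, reads):
    """Per-base read depth over a transcript of `length` bases.

    Each read is a half-open (start, end) interval in transcript
    coordinates, with position 0 at the 5' end.
    """
    depth = [0] * length
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


# Hypothetical numbers, chosen only for illustration
TRANSCRIPT_LEN = 10  # annotated isoform, including the long 3' UTR
SHORT_END = 6        # unannotated APA site that truncates the 3' UTR

# 2 reads span the full annotated isoform; 8 reads stop at the APA site
# but are still assigned to the same (longer) isoform
reads = [(0, TRANSCRIPT_LEN)] * 2 + [(0, SHORT_END)] * 8

depth = coverage(TRANSCRIPT_LEN, reads)
print(depth)  # [10, 10, 10, 10, 10, 10, 2, 2, 2, 2]
```

Because the truncated reads all count toward the longer isoform, depth past the APA site drops sharply, which is the 5' enrichment visible in the plots below.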

In [None]:
from pathlib import Path

from integrative_transcriptomics_viewer.convenience import Configuration
from integrative_transcriptomics_viewer.export import save

from mdl.sc_isoform_paper import today
from mdl.sc_isoform_paper.constants import MASSEQ_FILENAMES


In [None]:
root_dir = Path.home()
sh_dir = root_dir / "sh_scripts"

data_path = root_dir / "data" / "masseq"
annotated_path = data_path / "20250124_annotated"

If you need to sort and index a BED file:

```
sort -k1,1 -k2,2n unsorted.bed > sorted.bed
bgzip sorted.bed
tabix -p bed sorted.bed.gz
```
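
If you would rather do the sorting step from Python, the following is a rough sketch of an equivalent to `sort -k1,1 -k2,2n`. It is an illustration under assumptions: `sort_bed_lines` is a hypothetical helper, chromosome names are compared by plain string order (which may differ from your shell's locale collation), and blank lines and `#`/`track`/`browser` header lines are dropped rather than sorted. Compression and indexing are still left to `bgzip` and `tabix`.

```python
def sort_bed_lines(lines):
    """Return BED records sorted by chromosome name, then numeric start."""
    headers = ("#", "track", "browser")
    # keep only data records; header and blank lines are dropped
    records = [ln for ln in lines if ln.strip() and not ln.startswith(headers)]

    def sort_key(line):
        fields = line.split("\t")
        return fields[0], int(fields[1])

    return sorted(records, key=sort_key)


# Small made-up example
unsorted = [
    "chr2\t5\t10\n",
    "chr1\t200\t300\n",
    "# a comment line\n",
    "chr1\t30\t40\n",
]
sorted_records = sort_bed_lines(unsorted)
print("".join(sorted_records), end="")
```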

In [None]:
reference_path = root_dir / "reference"

genome_path = reference_path / "GRCh38" / "GRCh38.fasta"
gtf_path = reference_path / "GRCh38.gencode.v39.annotation.basic.gtf"
bed_gencode = reference_path / "GRCh38.gencode.v39.annotation.basic.sorted.bed.gz"

figure_path = root_dir / "202501_figures"

In [None]:
%%time
human_ref = Configuration(
    genome_fasta=genome_path,
    bed_annotation=[str(bed_gencode)],
    gtf_annotation=gtf_path,
)


### Merging BAM files

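ITV requires a single BAM file per sample, so we now merge the per-sample BAMs we've been working with up to this point. The cell below only writes the `samtools` commands to a shell script; run that script before continuing.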

In [None]:
merged_out_dir = data_path / f"{today}_merged"
merged_out_dir.mkdir(exist_ok=True)

# write one merge and one index command per sample; the script is run separately
with open(sh_dir / f"{today}_merge_cmds.sh", "w") as out:
    for i in MASSEQ_FILENAMES:
        merged_bam = merged_out_dir / f"{MASSEQ_FILENAMES[i]}.bam"
        print(
            f"samtools merge --threads 12 -o {merged_bam}",
            *annotated_path.glob(f"*.skera.{i}.*bam"),
            file=out,
        )
        print(f"samtools index {merged_bam}", file=out)
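
One thing the cell above does not guard against is a glob that matches nothing, which would write a `samtools merge` command with no inputs. The sketch below shows one possible way to build the same commands with that check. It assumes `MASSEQ_FILENAMES` behaves like a dict mapping a sample key to an output name (as the indexing above suggests); `build_merge_commands` and the input file names in the demo are hypothetical.

```python
import shlex
import tempfile
from pathlib import Path


def build_merge_commands(filenames, annotated_dir, merged_dir, threads=12):
    """Return samtools merge/index command strings, one pair per sample.

    `filenames` is assumed to map a sample key to an output name.
    Raises FileNotFoundError if a sample has no input BAMs.
    """
    commands = []
    for key, name in filenames.items():
        inputs = sorted(Path(annotated_dir).glob(f"*.skera.{key}.*bam"))
        if not inputs:
            raise FileNotFoundError(f"no annotated BAMs found for key {key!r}")
        out_bam = Path(merged_dir) / f"{name}.bam"
        merge = ["samtools", "merge", "--threads", str(threads), "-o", str(out_bam)]
        commands.append(shlex.join(merge + [str(p) for p in inputs]))
        commands.append(shlex.join(["samtools", "index", str(out_bam)]))
    return commands


# Demo on empty placeholder files in a temporary directory (names are made up)
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    for fname in ("m1.skera.0.annotated.bam", "m2.skera.0.annotated.bam"):
        (tmp / fname).touch()
    cmds = build_merge_commands({0: "pipseq_8x"}, tmp, tmp / "merged")
    for cmd in cmds:
        print(cmd)
```

Sorting the inputs keeps the generated script stable between runs, and `shlex.join` quotes any path that contains spaces or shell metacharacters.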


In [None]:
# after merging
bams_dict = {
    'pipseq_8x': merged_out_dir / "pipseq_8x.bam",
    '10x_5p': merged_out_dir / "10x_5p.bam",
    '10x_3p': merged_out_dir / "10x_3p.bam",
}
bams_dict

In [None]:
save(
    human_ref.plot_exons(
        bams_dict=bams_dict,
        feature="RPLP1",
        with_reads=False,
        with_coverage=True,
    ),
    figure_path / "supp_fig10a_rplp1.svg",
    output_format="svg",
)

In [None]:
save(
    human_ref.plot_exons(
        bams_dict=bams_dict,
        feature="RPL11",
        with_reads=False,
        with_coverage=True,
    ),
    figure_path / "supp_fig10b_rpl11.svg",
    output_format="svg",
)

In [None]:
save(
    human_ref.plot_exons(
        bams_dict=bams_dict,
        feature="RPL13",
        with_reads=False,
        with_coverage=True,
    ),
    figure_path / "supp_fig10c_rpl13.svg",
    output_format="svg",
)
