## Identification of TSMs within human microRNA genes

Retrieve the sequences corresponding to human microRNAs from other mammalian genomes. The coordinates come from the downloaded miRBase file and the sequences from the .maf alignments for 99 vertebrates in the UCSC database (https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz100way/maf/; last modified on 6 March 2015). The species file was edited from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/multiz100way/README.txt. The retrieved combined sequences have to be at least 30 bp long and at least 80% identical to at least one other sequence from a species other than human.

In [None]:
python3 scripts/python/main/download_maf_alignment.py data/210420_microRNAs_loc.txt data/species_mammals.txt data/chr/  results/ucsc_seq/ results/maf_sequence/ results/sequence_info/

Convert gzip genome files to bgzip files.

In [None]:
bash scripts/bash/bgzip.sh

Index genome files.

In [None]:
bash scripts/bash/faidx.sh

As .maf alignments are presented as synteny blocks rather than continuous sequences, the continuous sequences are extracted for the microRNA loci based on the start and end coordinates. An additional 50 bases are added to both ends.

In [None]:
python3 scripts/python/main/continuous_sequences.py results/ucsc_seq/ data/species_mammals.txt results/sequence_info/
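
The extraction of a continuous sequence from block coordinates can be sketched as follows. This is a minimal illustration, not the code of continuous_sequences.py: the function names, the 0-based half-open coordinates and the in-memory chromosome string (standing in for an indexed genome file) are all assumptions.

```python
def locus_span(blocks, chrom_length, flank=50):
    """Return the 0-based, half-open span covering all blocks plus flanks.

    `blocks` is a list of (start, end) coordinates of the synteny blocks
    for one species (hypothetical input format).
    """
    start = min(s for s, _ in blocks)
    end = max(e for _, e in blocks)
    # Clip the flanks so the span never leaves the chromosome.
    return max(0, start - flank), min(chrom_length, end + flank)


def continuous_sequence(chrom_seq, blocks, flank=50):
    """Slice one continuous sequence out of a chromosome string."""
    start, end = locus_span(blocks, len(chrom_seq), flank)
    return chrom_seq[start:end]


chrom = "ACGT" * 100              # toy chromosome, 400 bases
blocks = [(120, 150), (160, 200)]  # two blocks with a gap between them
seq = continuous_sequence(chrom, blocks)
```

The gap between the two blocks is included in the result, which is the point of extracting from the genome rather than concatenating the blocks.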

The nodes in the given phylogenetic tree need to be renamed to match headers in fasta sequence files. A separate tree file is created for every alignment file.

In [None]:
mkdir results/alignments1
python3 scripts/python/main/rename.py data/species_tree.tree results/mafseqs_full/ results/alignments1/ results/alignments1/

Align the sequences and remove those taxa from the tree that are not present in the alignment file. The script is run in parallel.

In [None]:
bash scripts/bash/run_alignments1.sh

Go through the alignments and remove sequences that are very divergent from the other sequences. Sequences are left out if 1) they are more than 1.3 times longer than the human sequence, 2) they contain ten or more Ns in a row, or 3) their sequence identity to human is <50% or <60% when using, respectively, the human and the target sequence as the reference.

In [None]:
mkdir results/alignments2
python3 scripts/python/main/clean_and_copy_seqs.py results/alignments2/ results/alignments1/ data/210420_microRNAs_loc.txt
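
The three rejection criteria can be sketched as a single predicate. This is an illustration under assumptions, not the code of clean_and_copy_seqs.py: the function name is hypothetical, and "using the human or the target sequence as the reference" is interpreted here as dividing the number of matching columns by the ungapped length of the human or the target sequence.

```python
import re


def should_reject(aligned_human, aligned_target):
    """Return True if the target sequence fails any of the three criteria.

    Both arguments are rows of the same alignment, with '-' as the gap
    character (assumed input format).
    """
    human = aligned_human.replace("-", "")
    target = aligned_target.replace("-", "")
    if not human or not target:
        return True
    # 1) More than 1.3 times longer than the human sequence.
    if len(target) > 1.3 * len(human):
        return True
    # 2) Ten or more Ns in a row.
    if re.search(r"N{10,}", target, flags=re.IGNORECASE):
        return True
    # 3) Identity below 50% (human as reference) or 60% (target as reference).
    matches = sum(
        1 for h, t in zip(aligned_human, aligned_target) if h == t and h != "-"
    )
    return matches / len(human) < 0.5 or matches / len(target) < 0.6
```

For example, `should_reject("AAAAAAAAAA", "AAAATTTTTT")` returns `True`, because only four of the ten human bases are matched.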

Realign the cleaned preliminary alignments, remove unnecessary branches from trees, and infer ancestral sequences for each alignment.

In [None]:
bash scripts/bash/alignments2.sh

Identify TSM patterns by running fpatool for all alignments2/*_pagan.fas files. The script systematically traverses the trees and returns each parent-child node pair. TSMs are identified between these pairs by running fpa. The target and source TSMs have to share at least 80% sequence identity and the TSM solution has to be better than the standard forward solution. The minimum number of mismatches in a mutation cluster is five and the minimum length is 15.


In [None]:
mkdir results/fpa_microRNA
bash scripts/bash/run_fpa_microRNA.sh
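
The systematic traversal that yields every parent-child node pair can be sketched as below. The nested-dictionary tree and the function name are assumptions made for illustration; the actual scripts read the tree files produced in the earlier steps.

```python
def parent_child_pairs(node):
    """Yield (parent, child) name pairs in pre-order for a nested-dict tree."""
    for child in node.get("children", []):
        yield node["name"], child["name"]
        # Recurse so that the child is in turn used as a parent.
        yield from parent_child_pairs(child)


# Toy tree: ((human, chimp)anc1, mouse)root
tree = {
    "name": "root",
    "children": [
        {"name": "anc1", "children": [{"name": "human"}, {"name": "chimp"}]},
        {"name": "mouse"},
    ],
}
pairs = list(parent_child_pairs(tree))
```

Each pair would then be passed to fpa, with the parent as the reference and the child as the query.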

Run quality control for the identified TSMs, collect information and save the TSM cases into a Mongo database. The fpa solution alignments are saved into the directory results/fpa_solutions/, with file names corresponding to the Ensembl identifiers. The manually edited alignments are saved in fpa_solution_pub.
For the target regions, the quality control compares the reference and its sister (80% sequence identity), the reference and the great-grandparent (80%), and the great-grandparent and the reference's sister (70%). For the source regions, it compares the query and the reference (80%), the reference and the great-grandparent (70%), the reference and its sister (80%), and the great-grandparent and the reference's sister (70%). In addition, if the query node has children, the target regions of the query and child nodes are compared (80%) to ensure that the TSM pattern is inherited.

In [None]:
mkdir results/fpa_solutions/
python3 scripts/python/main/TSM_information_microRNAs.py ./results/fpa_microRNA/ TSMs_microRNA ./results/alignments2/ ./results/alignments2/ ./data/210420_microRNAs_loc.txt ./results/fpa_solutions/

Predict secondary structures for all the studied node sequences with RNAfold.

In [None]:
python3 scripts/python/main/predict_structures.py results/alignments2/ results/structures/

Draw TSM cases in microRNAs, which are at least 15 bases long and contain 5 or more mismatches.

In [None]:
python3 scripts/python/main/draw_figs.py results/figs/ TSMs_microRNA results/alignments2/ results/alignments2/ results/structures/ data/210420_microRNAs_loc.txt data/microRNA_products.txt

Collect information on microRNAs (location in relation to genes) from .gff3 file.

In [None]:
python3 scripts/python/main/microRNA_gene_context.py data/TSM_microRNAs.txt data/Homo_sapiens.GRCh38.105.gff3 >& data/microRNA_genetic_context.txt

Collect information for microRNAs. The TSMs were manually checked and edited. The manually edited TSM list is saved to the file results/TSM_expl_in_microRNAs.txt.

In [None]:
python3 scripts/python/main/collect_TSM_microRNA_info.py data/miRNA.dat data/210420_microRNAs_loc.txt results/TSMs_in_microRNA.txt TSMs_microRNA data/microRNA_genetic_context.txt

Draw figures of the secondary structures using RNAplot. The TSM target and source sites are colored.

In [None]:
python3 scripts/python/main/draw_structures.py results/struct_figs/ results/structures/ TSMs_microRNA results/alignments2/

Count how many microRNA loci are primate-specific (sequence at most 90% identical to the human microRNA sequence) and how many TSM-origin microRNA loci have acquired mutations after their emergence.

In [None]:
python3 scripts/python/main/primate_specific_microRNAs.py data/210420_microRNAs_loc.txt results/TSM_expl_in_microRNAs.txt results/alignments2/ TSMs_microRNA


#### Create alignment for hsa-mir-7856 including the upstream LINE2 insertion

We wanted to examine the genomic context around the microRNA locus, including the LINE2 insertion, so the criteria used for the other microRNAs were not suitable.

Download a sequence

In [None]:
python3 scripts/python/main/download_maf_alignment_hsa_mir_7856.py data/210420_microRNAs_loc.txt data/species_mammals.txt data/chr/ results/ucsc_seq_hsa_mir_7856/  results/maf_sequence_hsa_mir_7856/ results/sequence_info_hsa_mir_7856/

Get the full length sequence

In [None]:
python3 scripts/python/main/continuous_sequences_hsa_mir_7856.py results/ucsc_seq_hsa_mir_7856/ data/species_mammals.txt results/sequence_info_hsa_mir_7856/

Rename tree nodes

In [None]:
python3 scripts/python/main/rename.py data/species_tree.tree results/maf_sequence_full_hsa_mir_7856/ results/alignments1_hsa_mir_7856/ results/alignments1_hsa_mir_7856/

Align sequences. The sequence set was manually edited so that only primates were left. *Pongo abelii* and *Chlorocebus sabaeus* were removed, as the former likely aligned with some other LINE and the latter was very short.

In [None]:
python3 scripts/python/main/alignments1.py results/alignments1_hsa_mir_7856/ results/alignments1_hsa_mir_7856/ results/alignments1_hsa_mir_7856/ ENSG00000278281

The alignment was cut at sites 59 and 2944, and the sequence between them was analysed.

In [None]:
python3 scripts/python/main/run_fpa.py ENSG00000278281 results/fpa_hsa_mir_7856/ results/alignments2_hsa_mir_7856/ results/alignments2_hsa_mir_7856/

Predict secondary structures.

In [None]:
python3 scripts/python/main/predict_structures.py results/alignments2_hsa_mir_7856/ results/structures_hsa_mir_7856/

Draw figure.

In [None]:
python scripts/python/main/draw_figs_from_database_hsa_mir_7856.py results/fig_hsa_mir_7856/ results/alignments2_hsa_mir_7856/ results/alignments2_hsa_mir_7856/ results/structures_hsa_mir_7856/ data/210420_microRNAs_loc.txt data/microRNA_products.txt

#### Create alignment for hsa-mir-605

Download sequences. As the hsa-mir-605 has a long insertion next to a microRNA locus, we take more genomic context than 50 bases.

In [None]:
python3 scripts/python/main/download_maf_alignment_hsa_mir_605.py data/210420_microRNAs_loc.txt data/species_mammals.txt data/chr/ results/ucsc_seq_hsa_mir_605/  results/maf_sequence_hsa_mir_605/ results/sequence_info_hsa_mir_605/

Get continuous sequences

In [None]:
python3 scripts/python/main/continuous_sequences_hsa_mir_605.py results/ucsc_seq_hsa_mir_605/ data/species_mammals.txt results/sequence_info_hsa_mir_605/

Rename tree nodes. The sequences of *Otolemur garnettii*, *Heterocephalus glaber*, *Bos taurus*, *Eptesicus fuscus*, *Odobenus rosmarus divergens*, *Leptonychotes weddellii*, *Canis lupus* and *Oryctolagus cuniculus* were removed.

In [None]:
python3 scripts/python/main/rename.py data/species_tree.tree results/maf_sequence_full_hsa_mir_605/ results/alignments1_hsa_mir_605/ results/alignments1_hsa_mir_605/

Align sequences.

In [None]:
python3 scripts/python/main/alignments1.py results/alignments1_hsa_mir_605/ results/alignments1_hsa_mir_605/ results/alignments1_hsa_mir_605/ ENSG00000207813

The alignment was cut at sites 316 and 530, and the sequence between them was analysed.

In [None]:
python3 scripts/python/main/run_fpa.py  ENSG00000207813 results/fpa_hsa_mir_605/ results/alignments2_hsa_mir_605/ results/alignments2_hsa_mir_605/

Predict secondary structures.

In [None]:
python3 scripts/python/main/predict_structures.py results/alignments2_hsa_mir_605/ results/structures_hsa_mir_605/

Draw figure.

In [None]:
python scripts/python/main/draw_figs_from_database_hsa_mir_605.py results/fig_hsa_mir_605/ results/alignments2_hsa_mir_605/ results/alignments2_hsa_mir_605/ results/structures_hsa_mir_605/ data/210420_microRNAs_loc.txt data/microRNA_products.txt

#### Create alignment for hsa-mir-3688

Download sequences

In [None]:
python3 scripts/python/main/download_maf_alignment_hsa_mir_3688.py data/210420_microRNAs_loc.txt data/species_mammals.txt data/chr/ results/ucsc_seq_hsa_mir_3688/  results/maf_sequence_hsa_mir_3688/ results/sequence_info_hsa_mir_3688/

Get continuous sequences

In [None]:
python3 scripts/python/main/continuous_sequences_hsa_mir_3688.py results/ucsc_seq_hsa_mir_3688/ data/species_mammals.txt results/sequence_info_hsa_mir_3688/

Rename tree nodes. *Pan troglodytes*, *Macaca mulatta* and *Sus scrofa* were removed.

In [None]:
python3 scripts/python/main/rename.py data/species_tree.tree results/maf_sequence_full_hsa_mir_3688/ results/alignments1_hsa_mir_3688/ results/alignments1_hsa_mir_3688/

Align sequences.

In [None]:
python3 scripts/python/main/alignments1.py results/alignments1_hsa_mir_3688/ results/alignments1_hsa_mir_3688/ results/alignments1_hsa_mir_3688/ ENSG00000264105

The alignment was cut at sites 7 and 340, and the sequence between these indexes was analysed.

In [None]:
python3 scripts/python/main/run_fpa.py  ENSG00000264105 results/fpa_hsa_mir_3688/ results/alignments2_hsa_mir_3688/ results/alignments2_hsa_mir_3688/

Predict secondary structures.

In [None]:
python3 scripts/python/main/predict_structures.py results/alignments2_hsa_mir_3688/ results/structures_hsa_mir_3688/

Draw figure

In [None]:
python scripts/python/main/draw_figs_from_database_hsa_mir_3688.py results/fig_hsa_mir_3688/ results/alignments2_hsa_mir_3688/ results/alignments2_hsa_mir_3688/ results/structures_hsa_mir_3688/ data/210420_microRNAs_loc.txt data/microRNA_products.txt

### Create a heatmap for identified TSM-associated microRNAs

Cut the sequences to match the pri-microRNA hairpin and predict the secondary structures.

In [None]:
python3 scripts/python/main/minimum_free_energy_for_hairpins.py results/TSM_expl_in_microRNAs_full.txt results/alignments2/ results/hairpin_struct/ data/210420_microRNAs_loc.txt

Create a heatmap from the free energies of microRNA loci in which a TSM explains the full-length microRNA (at most nine unexplained bases). Species outside primates that belong to the same taxonomic group are merged and the highest free energy is picked. Taxonomic groups that do not contain at least ten species are left out of the analysis.

In [None]:
python3 scripts/python/main/heatmap_merge_species_minimum.py results/TSM_expl_in_microRNAs_full.txt data/species_tree.tree results/alignments2/ results/hairpin_struct/  results/TSM_microRNA_heatmap_merge_minimum.svg results/TSM_microRNA_heatmap_merge.tree
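
The merging of non-primate species by taxonomic group can be sketched as follows. This follows the description above (highest free energy per group, groups with fewer than ten species dropped) and is not the code of heatmap_merge_species_minimum.py. The function name, the input dictionaries and the choice to count group size over species that have a free-energy value are assumptions.

```python
from collections import defaultdict


def merge_groups(energies, species_to_group, min_species=10):
    """Collapse per-species free energies into one value per taxonomic group.

    energies: {species: free energy in kcal/mol} for one microRNA locus.
    species_to_group: {species: taxonomic group} for non-primate species.
    """
    by_group = defaultdict(list)
    for species, energy in energies.items():
        by_group[species_to_group[species]].append(energy)
    # Keep only sufficiently large groups and pick the highest energy.
    return {
        group: max(values)
        for group, values in by_group.items()
        if len(values) >= min_species
    }


# Toy data: ten rodents (kept) and three bats (dropped).
energies = {f"rodent{i}": -float(i) for i in range(1, 11)}
energies.update({"bat1": -20.0, "bat2": -25.0, "bat3": -30.0})
groups = {name: ("Rodentia" if name.startswith("rodent") else "Chiroptera")
          for name in energies}
merged = merge_groups(energies, groups)
```

Primates would be kept as individual columns and handled outside this function.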

Draw a heatmap containing all the species, without merging taxa. As above, only fully explained microRNAs are included.

In [None]:
python3 scripts/python/main/heatmap_free_energy.py results/TSM_expl_in_microRNAs_full.txt data/species_tree.tree results/alignments2/ results/hairpin_struct/  results/TSM_microRNA_heatmap_merge_minimum_all.svg results/TSM_microRNA_heatmap_merge_all.tree 


### Draw genomic context for TSM-associated microRNA genes


Collect information on the transposons within 300,000 bases around the microRNA loci.

In [None]:
python3 scripts/python/main/identify_transposons_in_microRNAs.py results/TSM_expl_in_microRNAs.txt data/hg38.fa.out data/ results/TSM_expl_in_microRNAs.txt

Collect information on genes within +/-50,000 bases around the microRNA genes and create a shorter gff3 file. Create a separate file 'transcript_symbols.txt', in which the Ensembl transcript symbols are in the left column and the gene symbols in the right column.

In [None]:
python3 scripts/python/main/make_short_gff3.py

Visualize the genomic context of the microRNAs.

In [None]:
Rscript scripts/R/draw_gene_figs_spescase2.R

## Identification of TSMs within mouse microRNA genes

The coordinates of the mouse pri-miRNAs and their corresponding products were downloaded from miRBase, https://mirbase.org/download/mmu.gff3 (on 26 September 2023; genome version mm10). As the genomic alignments were downloaded from Ensembl, the coordinates were parsed and lifted over from mm10 to mm39. The script requires the liftover Python package.

In [1]:
python3 scripts/python/main/parse_mouse_coord.py data/mmu.gff3


The genome alignments (EPO_EXTENDED) and the corresponding phylogenetic trees of the microRNA loci (+/-50 bases) for Murinae taxa were downloaded from Ensembl on 27 September 2023.

In [None]:
python3 scripts/python/main/ensembl_miR_alignments_EXT.py results/mouse_mirbasedb2/mouse_mir_coords.txt mus_musculus murinae results/mouse_mirbasedb2/Ensembl_seq/ results/mouse_mirbasedb2/Ensembl_seq/

The files are copied to results/mouse_mirbasedb2/alignments1/. As the headers of the sequences and the trees do not match, the '_+_' and '_-_' in the alignments and the [+] and [-] in the trees need to be replaced.

In [None]:
cp results/mouse_mirbasedb2/Ensembl_seq/* results/mouse_mirbasedb2/alignments1/.

In [None]:
sed 's/\_+\_//g' results/mouse_mirbasedb2/alignments1/*

In [None]:
sed 's/\_-\_//g' results/mouse_mirbasedb2/alignments1/*

In [None]:
sed 's/\[//g' results/mouse_mirbasedb2/alignments1/*

In [None]:
sed 's/\]//g' results/mouse_mirbasedb2/alignments1/*

In [None]:
sed 's/\+//g' results/mouse_mirbasedb2/alignments1/*

In [None]:
sed 's/\+//g' results/mouse_mirbasedb2/alignments1/*

The sequences were realigned similarly to the human sequences.

In [None]:
bash scripts/bash/alignments1_mouse_mirgenedb.sh

Sequence quality is checked. Sequences are rejected if they contain more than 10 Ns, if the sequence identity is less than 0.5 (number of identical bases / length of the alignment) or if the identity is less than 0.6 (number of identical bases / length of the non-mouse sequence). In addition, the number of extra bases around the microRNA locus is checked.

In [None]:
python3 scripts/python/main/clean_and_copy_seqs_ens.py results/mouse_mirbasedb2/alignments2/ results/mouse_mirbasedb2/alignments1/ results/mouse_mirbasedb2/microRNA_mir_info.txt mus_muscul

Sequences that passed the quality check are realigned and the ancestral sequences are inferred.

In [None]:
bash scripts/bash/alignments2_mouse_mirgenedb.sh

The phylogenetic trees are systematically screened and each parent-child node sequence pair is run through FPA to identify TSM cases.

In [None]:
bash scripts/bash/run_fpa_microRNA_mousedb.sh

The information for the identified TSMs is collected and stored in the Mongo database. Unlike in the human data, the TSMs were allowed to be located between the root and its child sequence, and for this reason the quality check for the inferred ancestral sequences was not performed. First, a directory is created for the fpa solutions.

In [None]:
mkdir results/mouse_mirbasedb2/fpa_solutions/

In [None]:
 python scripts/python/main/TSM_information_microRNAs_mouse.py results/mouse_mirbasedb2/fpa_results/ mousedb2_TSM  results/mouse_mirbasedb2/alignments2/  results/mouse_mirbasedb2/alignments2/  results/mouse_mirbasedb2/microRNA_mir_info.txt  results/mouse_mirbasedb2/fpa_solutions/ mus_musculus

Secondary structures were predicted for the identified TSM cases (i.e. parent, query and sister nodes)

In [None]:
python3 scripts/python/main/predict_structures_mouse.py results/mouse_mirbasedb2/alignments2/ results/mouse_mirbasedb2/structures/

TSM cases at least 15 bases long with at least 5 mismatches were collected, and sequence alignments with phylogenetic, pri-miRNA and microRNA product information were produced.

In [None]:
 python3 scripts/python/main/draw_figs_mirgenedb.py results/mouse_mirbasedb2/figs/ mousedb2_TSM results/mouse_mirbasedb2/alignments2/ results/mouse_mirbasedb2/alignments2/ results/mouse_mirbasedb2/structures/  results/mouse_mirbasedb2/microRNA_mir_info.txt results/mouse_mirbasedb2/microRNA_mir_info.txt  mus_musculus

The secondary structures for the data S2 were drawn.

In [None]:
python3 scripts/python/main/draw_structures_mouse.py results/mouse_mirbasedb2/struct_figs/ results/mouse_mirbasedb2/structures/ mousedb2_TSM results/mouse_mirbasedb2/alignments2/ mus_musculus 6

The names of the genes close to the identified TSMs in mouse miRNA loci were parsed. The original gff3 file was GRCm39.109.gff3, downloaded from Ensembl: https://ftp.ensembl.org/pub/release-109/gff3/mus_musculus/Mus_musculus.GRCm39.109.gff3.gz

In [None]:
python3 scripts/python/main/make_short_gff3_mouse.py > results/mouse_mirbasedb2/shortened_mouse.gff3

The RepeatMasker output file for the mouse genome (mm39) was downloaded from the UCSC database: https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/mm39.fa.out.gz. The transposons within +/- 50,000 bases of the TSM-associated miRNAs were identified. The file identifiers_mir.txt contains coordinate information only for the TSM-associated pri-miRNAs.

In [None]:
python3 scripts/python/main/identify_transposons_in_microRNAs.py results/mouse_mirbasedb2/identifiers_mir.txt  data_revision/mm39.fa.out results/mouse_mirbasedb2/ results/mouse_mirbasedb2/identifiers_mir.txt

Finally, figures of the genomic context were drawn. This script requires Bioconductor; use the conda environment bioconductor.yml.

In [None]:
Rscript scripts/R/draw_gene_figs_spescase2_mouse.R

## Identification of TSMs across gene regions

Identify non-coding and coding gene coordinates ("ncRNA_gene" and "gene") from the human genome (.gff3 file). Merge overlapping genes and split the gene regions into 1000-base chunks with 25 extra bases at both ends, so that adjacent chunks overlap by 50 bases.

In [None]:
python scripts/python/main/get_gene_chunks.py data/Homo_sapiens.GRCh38.105.gff3 results/all_genes_chunks.txt
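
The merging and chunking can be sketched as below. This is a minimal illustration, not the code of get_gene_chunks.py: the function names and the 0-based half-open coordinates are assumptions, as is the choice to add the flank also at the outer ends of a gene region.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def chunk_region(start, end, size=1000, flank=25):
    """Split a region into `size`-base cores, each padded by `flank` bases."""
    chunks = []
    for core_start in range(start, end, size):
        core_end = min(core_start + size, end)
        chunks.append((max(0, core_start - flank), core_end + flank))
    return chunks


genes = [(100, 500), (400, 2600), (5000, 5200)]
regions = merge_intervals(genes)          # first two genes overlap
chunks = chunk_region(*regions[0])
```

With a 25-base flank on each side, the end of one chunk lies 50 bases past the start of the next, which gives the 50-base overlap described above.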

Extract the sequences for the chunks from the .maf alignment. Extract only those sequences that share at least 80% sequence identity with some other sequence in the data set and are at least 100 bases long.

In [None]:
python3 scripts/python/main/download_alignments_maf3_all_genes.py results/all_genes_chunks.txt data/species_mammals.txt data/chr/ results/ucsc_seq_all/ results/maf_sequence_all/ results/sequence_info_all/

Get the full length sequences. 

In [None]:
bash scripts/bash/all_full_length_sequences.sh

Create a tree for each alignment. Rename leaf nodes to match the alignment titles and remove those leaves that are not included.

In [None]:
python3 scripts/python/main/rename.py data/species_tree.tree results/maf_sequence_full_all/ results/alignments1_all/ results/alignments1_all/

Align sequences with pagan.

In [None]:
bash scripts/bash/run_alignments1_all.sh

Go through the alignments and remove sequences that are more than 30% shorter than the human sequence at either end, that contain more than 50 Ns, or whose sequence identity to the human sequence is less than 30% (number of matches / length of the target sequence).

In [None]:
bash scripts/bash/copy_and_clean_seqs_als.sh

Realign sequences and predict ancestral sequences

In [None]:
bash scripts/bash/run_alignments2_all.sh >& all_align.log &

Run FPA2 on the aligned sequences.

In [None]:
bash scripts/bash/all_genes_fpa.sh

Run quality control for the TSM cases. Here the reference can be located at the root. The quality control compares the target site between the reference and sister nodes (80%) and the source site between the reference and sister nodes (80%) and between the reference and query nodes (80%). The source site passes if either the query or the sister comparison passes. In addition, the target site of the query is compared with the children of the query, if any (80%; at least one child has to pass).

In [None]:
python3 scripts/python/main/TSM_information_all_genes.py results/fpa_all/ fpa_all results/alignments2_all/ results/alignments2_all/ results/fpa_solutions_all/
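
The way the individual comparisons combine into a pass/fail decision can be sketched as a small predicate. The function and parameter names are hypothetical and the identities are assumed to be precomputed fractions between 0 and 1; this is not the code of TSM_information_all_genes.py.

```python
def passes_qc(target_ref_sister, source_ref_sister, source_ref_query,
              target_query_children, threshold=0.8):
    """Combine the per-region identities into one quality-control decision.

    target_query_children: identities between the query target site and
    each child of the query (empty list if the query is a leaf).
    """
    target_ok = target_ref_sister >= threshold
    # The source site passes if either comparison passes.
    source_ok = (source_ref_sister >= threshold
                 or source_ref_query >= threshold)
    # With children present, at least one of them has to pass.
    children_ok = (not target_query_children
                   or any(i >= threshold for i in target_query_children))
    return target_ok and source_ok and children_ok
```

For example, `passes_qc(0.9, 0.7, 0.85, [0.6, 0.82])` returns `True`: the source site passes through the query comparison and one child passes the 80% threshold.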

Identify TSMs whose target and source sites overlap less than 50% with simple repeats. The simple repeats are identified by running Dustmasker on each chunk. Write these TSMs into a .bed file.

In [None]:
python3 scripts/python/main/get_TSM_coordinates_repeats.py fpa_all TSMs results/alignments2_all/ results/all_genes_TSM_bedfiles/all_TSMs_no_repeats2.bed results/maf_sequence_full_all/
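
The repeat-overlap filter can be sketched as follows. The function names and the half-open interval convention are assumptions, and the repeat intervals stand in for parsed Dustmasker output. Requiring both the target and the source site to stay below 50% is one reading of the criterion.

```python
def covered_fraction(start, end, repeats):
    """Fraction of [start, end) covered by the union of repeat intervals."""
    covered = 0
    cursor = start  # everything left of the cursor is already counted
    for r_start, r_end in sorted(repeats):
        r_start = max(r_start, cursor)
        r_end = min(r_end, end)
        if r_end > r_start:
            covered += r_end - r_start
            cursor = r_end
    return covered / (end - start)


def keep_tsm(target, source, repeats, max_fraction=0.5):
    """Keep a TSM if both sites overlap repeats by less than max_fraction."""
    return (covered_fraction(*target, repeats) < max_fraction
            and covered_fraction(*source, repeats) < max_fraction)


repeats = [(90, 105), (103, 108), (118, 130)]
fraction = covered_fraction(100, 120, repeats)
```

Tracking a cursor avoids counting bases twice when the repeat intervals themselves overlap.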

Predict secondary structures for the TSM cases (between the target and source sites) that do not overlap with a simple repeat, compare the structures and classify the impact of each TSM on the secondary structure. Also collect information on the exon/intron/UTR context and on where in the phylogenetic tree the TSM has happened.

In [None]:
python3 scripts/python/main/TSM_structural_analysis.py  results/all_genes_TSM_bedfiles/all_TSMs_no_repeats2.bed fpa_all TSMs results/all_genes_TSMs_structures2/ results/alignments2_all/ data/species_tree.tree results/genome_annotations/ &> results/TSM_all_genes_analysis2.txt &


## TSMs in human variation data and their association with TEs and genomic features

As the TSM coordinates may be partially located in masked regions, they are screened against variant cluster coordinates, which have to be located completely in unmasked regions. Overlaps between the variant clusters and TSMs are identified and the overlap coordinates are recorded.

In [None]:
python3 scripts/python/main/identify_unmasked_variant_regions_in_TSMs.py  data/tsms_unmasked_merged.txt data/tsms_unmasked_merged.txt results/unmasked_variant_reg_tsms.txt

Run 100 replicates of random coordinates from unmasked regions, corresponding to the structure of original TSMs.

In [None]:
bash scripts/bash/run_background_corrected.sh
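
Drawing random background coordinates from unmasked regions can be sketched as below. This is an illustration under assumptions, not the code behind run_background_corrected.sh: "structure of the original TSMs" is reduced here to the interval lengths, a single chromosome is used, and the function name is hypothetical.

```python
import random


def sample_background(tsm_lengths, unmasked, rng):
    """Draw one random interval per TSM, fully inside an unmasked region.

    tsm_lengths: lengths of the original TSM intervals.
    unmasked: list of (start, end) half-open unmasked regions.
    rng: a random.Random instance, so that replicates are reproducible.
    """
    sampled = []
    for length in tsm_lengths:
        candidates = [(s, e) for s, e in unmasked if e - s >= length]
        if not candidates:
            raise ValueError(f"no unmasked region can hold {length} bases")
        # Weight regions by the number of possible start positions.
        weights = [e - s - length + 1 for s, e in candidates]
        s, e = rng.choices(candidates, weights=weights, k=1)[0]
        start = rng.randint(s, e - length)
        sampled.append((start, start + length))
    return sampled


rng = random.Random(0)
unmasked = [(0, 100), (500, 520)]
replicates = [sample_background([15, 40], unmasked, rng) for _ in range(100)]
```

Weighting by the number of valid start positions makes every admissible placement equally likely, rather than favouring short regions.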

Run gene region enrichment analysis

In [None]:
python3 scripts/python/main/full_genome_enrichment.py results/ results/genome_wide_background_corrected_tsms/ results/genome_annotations/ 3709 370900 unmasked_variant_reg_tsms.txt background

Analyze correlation in distances between the transposons and TSMs 

Sort TSM coordinates

In [None]:
bedtools sort -i results/unmasked_variant_reg_tsms.txt > results/unmasked_variant_reg_tsms_sorted.txt

Distances between TSMs and different transposons

In [None]:
bedtools closest -a results/unmasked_variant_reg_tsms_sorted.txt -b data/hg38_SINE.fa.out -D b > results/distance_unmasked_tsms_variants_sine.txt

In [None]:
bedtools closest -a results/unmasked_variant_reg_tsms_sorted.txt -b data/hg38_LINE.fa.out -D b > results/distance_unmasked_tsms_variants_line.txt

In [None]:
bedtools closest -a results/unmasked_variant_reg_tsms_sorted.txt -b data/hg38_LTR.fa.out -D b > results/distance_unmasked_tsms_variants_ltr.txt

Compute distances between random genome coordinates and transposons (bedtools)

In [None]:
bash scripts/bash/run_bedtools_corrected_tsms.sh

Make plots

In [None]:
Rscript scripts/R/boxplot.R

In [None]:
Rscript scripts/R/boxplot_sine.R

In [None]:
Rscript scripts/R/boxplot_ltr.R