Script for analysing "L1-seq" data generated by targeted L1-specific sequencing (Ewing and Kazazian 2010, doi:10.1101/gr.106419.110)
To run l1seq.py, the following packages must be installed on your system:
pysam:
pip install pysam
numpy:
pip install numpy
align:
git clone https://github.com/adamewing/align
cd align
python setup.py build
python setup.py install
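A quick way to confirm the packages above are importable is sketched below. It is only an illustration: it assumes Python 3.4 or later, that it is run with the same interpreter used for l1seq.py, and that the adamewing/align package installs a module importable as "align". The helper name missing_modules is hypothetical and not part of l1seq.

```python
import importlib.util

# Module names taken from the install steps above. "align" is an ASSUMED
# import name for the adamewing/align package.
REQUIRED_MODULES = ("pysam", "numpy", "align")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules found.")
```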
Instructions, assuming the L1-seq reads have already been aligned to the human reference genome (e.g. with bwa or bowtie2):
Optional: If multiple samples are to be analysed, merge them into a single BAM, maintaining distinct read groups for each sample. This can be accomplished using samtools merge:
samtools merge -r merged_samples.bam sample1.bam sample2.bam sampleN.bam
samtools index merged_samples.bam
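When there are many samples, the two samtools commands above can be assembled in a script rather than typed by hand. The following is a minimal sketch, assuming samtools is on the PATH. The function names merge_commands and run_merge are hypothetical, and the default output name is simply the one used in the example above.

```python
import subprocess


def merge_commands(sample_bams, merged_bam="merged_samples.bam"):
    """Build the samtools merge and index commands as argument lists."""
    return [
        # -r attaches a read group tag to each alignment, as in the command above
        ["samtools", "merge", "-r", merged_bam] + list(sample_bams),
        ["samtools", "index", merged_bam],
    ]


def run_merge(sample_bams, merged_bam="merged_samples.bam"):
    """Run merge then index, stopping at the first command that fails."""
    for cmd in merge_commands(sample_bams, merged_bam):
        subprocess.run(cmd, check=True)

# Example call (not executed here):
# run_merge(["sample1.bam", "sample2.bam", "sampleN.bam"])
```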
Build mappability tabix:
cd ref
./make_human_mappability.sh
Run l1seq.py:
./l1seq.py \
-b l1seq.alignment.bam \
-m ref/hsMap50bp.bed.gz \
--ref ref/hg19.primate.L1.bed.gz \
--nonref ref/hg19.nonref.L1.bed.gz \
> l1seq.results.tsv
- The memory footprint may be quite large when run over a large BAM file (e.g. when many samples are merged). The -c/--chrom option may be used to limit the run to a single chromosome, thus decreasing the memory requirement.
- This tool is not intended for any other data type (e.g. capture sequencing, WGS). TEBreak (https://github.com/adamewing/tebreak) is one option for performing these analyses.
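One way to use the -c/--chrom option to keep memory low is to run l1seq.py once per chromosome and write each result to its own file. The sketch below makes several assumptions that are not confirmed here: that -c takes a chromosome name matching the BAM header, that hg19-style names (chr1 ... chr22, chrX, chrY) are in use, and that per-chromosome outputs are acceptable for downstream work. The function names and the output naming scheme are hypothetical.

```python
import subprocess

# ASSUMED hg19-style chromosome names; adjust to match the BAM header.
CHROMS = ["chr%d" % i for i in range(1, 23)] + ["chrX", "chrY"]


def l1seq_command(bam, chrom, script="./l1seq.py"):
    """Build the l1seq.py command from the run step, restricted to one chromosome."""
    return [
        script,
        "-b", bam,
        "-m", "ref/hsMap50bp.bed.gz",
        "--ref", "ref/hg19.primate.L1.bed.gz",
        "--nonref", "ref/hg19.nonref.L1.bed.gz",
        "-c", chrom,  # ASSUMPTION: the option takes the chromosome name as its value
    ]


def output_name(chrom, prefix="l1seq.results"):
    """Hypothetical naming scheme: one TSV per chromosome."""
    return "%s.%s.tsv" % (prefix, chrom)


def run_per_chromosome(bam, chroms=CHROMS):
    """Run l1seq.py chromosome by chromosome, redirecting stdout to a file each time."""
    for chrom in chroms:
        with open(output_name(chrom), "w") as out:
            subprocess.run(l1seq_command(bam, chrom), stdout=out, check=True)

# Example call (not executed here):
# run_per_chromosome("l1seq.alignment.bam")
```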