# Bioliquid Alignment

To align the latest batch of data, jump to section `Bioliquid Second Run`.

### Bioliquid alignment troubleshooting

- Problem description: the file received from Bioliquid does not display properly in IGV.

- To troubleshoot, we first reproduce the workflow on the 1000 Genomes data, then review the Bioliquid data.

- Download the 1000 Genomes file
`s3cmd get s3://aretian-genomics/bams/IB002/mother.chr7.bam`

- Extract chr7, which contains the gene associated with cystic fibrosis
    - CFTR gene location: 117,120,016 to 117,308,718
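
The coordinates above can be turned into the region string that `samtools view` accepts. The following is a minimal sketch; the helper names `region_string` and `region_length` are hypothetical, and the interval is assumed to be 1-based and inclusive.

```python
def region_string(chrom: str, start: int, end: int) -> str:
    """Format a 1-based inclusive interval as a samtools-style region."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # The ',' format spec inserts thousands separators, as in the notebook cells.
    return f"{chrom}:{start:,}-{end:,}"


def region_length(start: int, end: int) -> int:
    """Number of bases in a 1-based inclusive interval."""
    return end - start + 1


# Coordinates of the gene region quoted above
CFTR_START, CFTR_END = 117_120_016, 117_308_718

cftr_region = region_string("chr7", CFTR_START, CFTR_END)
cftr_length = region_length(CFTR_START, CFTR_END)
print(cftr_region, cftr_length)
```

The resulting string can be passed as the region argument of `samtools view` on an indexed or coordinate-sorted BAM.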

# 1000 Genomes data

In [5]:
# View chr7 data
!samtools view mother.chr7.bam | head

# Create index: this takes too long
#!samtools index mother.chr7.bam

In [9]:
# This file is too large to view on IGV
# Extract selected location
!samtools view -b mother.chr7.bam "chr7:117,120,016-117,308,718" > mother_small.bam

# Create index
!samtools index mother_small.bam

In [17]:
!samtools view mother_small.bam > mother.txt

# Bioliquid First Run

In [14]:
!samtools view ../data/raw/bioliquid_3runs.sort.bam | head

b32bab1e-0d5f-468d-8224-138ab61a39d8	16	1	10001	41	9176S55M2I2M1I40M1I11M1D5M1D5M2I3M5I3M1I6M1I12M1I4M2I29M2I14M1D20M1I4M1D5M1I5M1D2M2D3M1I44M1I13M5D17M1I13M1I13M1I12M1I5M2I2M1D5M2I10M1I2M1I21M16I9M7I1M1I5M1I6M1I19M8I6M1D5M1I18M4I1M1D4M1I11M3I5M2I9M3I1M1I28M1I38M1D5M1I7M1D46M5I1M1I4M4I7M1I6M4I5M1I3M2D25M1D12M2D8M3I2M2I33M1D6M1I5M6I11M1I9M1D4M2D26M3D2M2D9M4I4M5I6M1I1M2I9M1D14M46I30M1I9M1D18M1I57M1I14M1I10M2D14M1D3M3D24M1I2M2D8M1I14M10D10M3D4M1I14M1D21M1I10M2D10M2D24M1I33M3D23M2D4M1I31M3D25M1D18M1D72M1D3M1I44M1D1M1D15M1D60M1D14M1D34M1I60M1D1M2D9M1I7M1I25M1D12M1I18M1D19M2D46M5D49M1D2M1I36M1I8M1D2M1D22M1D27M1D38M1I1M2I59M3I43M1I15M1I8M1D2M1D12M1D59M1I39M2D27M2D5M4I10M1D10M1D5M1I28M4D22M1D4M1I14M1D161M1D3M1D15M1D100M1I3M1D72M2I48M1I5M4I28M1D5M2D62M1D4M1I35M1I3M1I22M2I9M5D2M1D23M1D40M2I111M1D3M1D43M2D13M1D36M1D9M1D1M2D140M3D4M1D283M2D33M1D56M1I25M1D17M1I25M4I16M1D2M1I18M2D33M1D3M1D15M2D3M2D42M1D66M1I3M1I2M1I52M1I27M1I10M2D34M4I2M2D4M1D40M1D1M1D13M1D32M1D8M1I12M2D15M1D151M1D16M7I63M1D14M2D2M1

In the first read above, the mapped position is 10001 on reference `1`, which is unexpected, since the read should be placed on chr17. This suggests that the data is not correctly aligned to the genome.
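
To make the columns of the record above easier to read, here is a small sketch that splits the first six tab-separated SAM fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR) and decodes the strand bit of the flag. The helper `parse_sam_prefix` is a hypothetical name, not part of samtools, and the CIGAR is cut to its first operations for brevity.

```python
from typing import NamedTuple


class SamPrefix(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str


def parse_sam_prefix(line: str) -> SamPrefix:
    """Parse the first six tab-separated columns of a SAM record."""
    qname, flag, rname, pos, mapq, cigar = line.rstrip("\n").split("\t")[:6]
    return SamPrefix(qname, int(flag), rname, int(pos), int(mapq), cigar)


def is_reverse(flag: int) -> bool:
    """Bit 0x10 of the SAM flag marks a read aligned to the reverse strand."""
    return bool(flag & 0x10)


# First columns of the record shown above; the CIGAR is truncated for illustration.
example = "b32bab1e-0d5f-468d-8224-138ab61a39d8\t16\t1\t10001\t41\t9176S55M2I2M"
record = parse_sam_prefix(example)
print(record.rname, record.pos, record.mapq, is_reverse(record.flag))
```

Note that the reference name in this record is `1` rather than `chr1`. Region queries written as `chr7:...` only match when the BAM header uses `chr`-prefixed names.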

## Running the bioliquid pipeline

In [None]:
### Basecalling for each run: guppy/4.5.2-cpu
guppy_basecaller  --input_path $path --save_path BaseCall --flowcell FLO-MIN111 --kit SQK-LSK110 --min_qscore 7 -r --fast5_out --records_per_fastq 0 --cpu_threads_per_caller 8 --num_callers 4

### Concatenate the FASTQ files from the 3 runs:
cat /scratch/lab_genomica/mtormo/202103*/BaseCall/pass/*fastq > bioliquid_3runs.fastq

### Mapping (tested with two versions, 2.11 and 2.18, with identical results): minimap2/2.11-foss-2016b and the Ensembl GRCh38 assembly
minimap2 -x map-ont -t 32 -a /homes/users/mtormo/lab_genomica/Genomes/hsapiens_hg38-GRCh38_ensembl/Homo_sapiens.GRCh38.dna.primary_assembly.mmi bioliquid_3runs.fastq > bioliquid_3runs.sam

### Convert to BAM and sort: SAMtools/1.6-foss-2016b
samtools view -bSh bioliquid_3runs.sam > bioliquid_3runs.bam
samtools sort -@ 32 bioliquid_3runs.bam > bioliquid_3runs.sort.bam
samtools index bioliquid_3runs.sort.bam
samtools flagstat bioliquid_3runs.sort.bam > bioliquid_3runs.sort.bam.flag
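
The mapping and sorting steps above write an intermediate SAM and an unsorted BAM to disk. As an alternative, the sketch below streams `minimap2` output directly into `samtools sort`. It is an illustration under assumptions, not the pipeline that was actually run: the function names are hypothetical, `ref.mmi` is a placeholder path, and both tools are assumed to be on `PATH`.

```python
import subprocess
from typing import List, Tuple


def build_commands(reference: str, fastq: str, out_bam: str, threads: int = 8
                   ) -> Tuple[List[str], List[str], List[str]]:
    """Return the minimap2, samtools sort and samtools index argument lists."""
    align = ["minimap2", "-x", "map-ont", "-t", str(threads), "-a", reference, fastq]
    # A trailing "-" tells samtools sort to read the alignments from stdin.
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    index = ["samtools", "index", out_bam]
    return align, sort, index


def run_pipeline(reference: str, fastq: str, out_bam: str, threads: int = 8) -> None:
    """Stream minimap2 output into samtools sort, then index the sorted BAM."""
    align, sort, index = build_commands(reference, fastq, out_bam, threads)
    aligner = subprocess.Popen(align, stdout=subprocess.PIPE)
    subprocess.run(sort, stdin=aligner.stdout, check=True)
    aligner.stdout.close()
    if aligner.wait() != 0:
        raise subprocess.CalledProcessError(aligner.returncode, align)
    subprocess.run(index, check=True)


# Placeholder reference path; the FASTQ and BAM names follow the cell above.
align_cmd, sort_cmd, index_cmd = build_commands(
    "ref.mmi", "bioliquid_3runs.fastq", "bioliquid_3runs.sort.bam", threads=32
)
print(" ".join(align_cmd), "|", " ".join(sort_cmd))
```

Calling `run_pipeline(...)` with real paths would execute the commands. The example only prints them, so it runs without the tools installed.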

In [None]:
samtools view -bSh bioliquid_aligned_tormo.sam > bioliquid_aligned_tormo.bam
samtools sort -@8 bioliquid_aligned_tormo.bam > bioliquid_aligned_tormo.sort.bam

# Bioliquid Second Run

Run this pipeline for the Bioliquid Second Run.

Last run: 6/4/2021

In [3]:
# Concatenate all fastq reads
cat ~/work/data/processed/basecall-high/pass/*fastq > ~/work/data/processed/basecall-high/bioliquid_run_2.fastq

## Align to genome

In [16]:
%%bash
# minimap2 -x map-ont -t 32 -a ~/work/data/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz bioliquid_3runs.fastq.gz > bioliquid_aligned_tormo.sam
minimap2 -x map-ont -a ~/work/data/raw/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz ~/work/data/processed/basecall-high/bioliquid_run_2.fastq > ~/work/data/bioliquid-run-2-aligned.sam

bash: line 2:   201 Killed                  minimap2 -x map-ont -a ~/work/data/raw/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz ~/work/data/processed/basecall-high/bioliquid_run_2.fastq > ~/work/data/bioliquid-run-2-aligned.sam


CalledProcessError: Command 'b'# minimap2 -x map-ont -t 32 -a ~/work/data/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz bioliquid_3runs.fastq.gz > bioliquid_aligned_tormo.sam\nminimap2 -x map-ont -a ~/work/data/raw/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz ~/work/data/processed/basecall-high/bioliquid_run_2.fastq > ~/work/data/bioliquid-run-2-aligned.sam\n'' returned non-zero exit status 137.
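
The exit status 137 in the error above follows the shell convention of 128 plus a signal number, so it corresponds to signal 9. The sketch below decodes such statuses; `describe_exit_status` is a hypothetical helper. A SIGKILL on a memory-hungry job is often sent by the kernel's out-of-memory killer, although the log itself does not confirm that this is what happened here.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Explain a shell exit status, decoding the 128 + N signal convention."""
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"failed with exit code {status}"


message = describe_exit_status(137)
print(message)
```

On Linux this reports that the process was terminated by SIGKILL.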

## Convert to BAM, sort, index and get stats

In [2]:
%%bash
samtools view -bSh ~/work/data/0531-aligned.sam > ~/work/data/0531.bam

In [3]:
%%bash
samtools sort -@ 32 ~/work/data/0531.bam > ~/work/data/0531.sort.bam
samtools index ~/work/data/0531.sort.bam
samtools flagstat ~/work/data/0531.sort.bam > ~/work/data/0531.sort.bam.flag

[bam_sort_core] merging from 0 files and 32 in-memory blocks...


# Read overlap analysis

In [None]:
minimap2 -x map-ont -t 8 -a /Downloads/GENOMA bioliquid_3runs.fastq > bioliquid_aligned.sam

In [None]:
# Convert to bam
samtools view -bSh bioliquid_aligned_tormo.sam > bioliquid_aligned_tormo.bam
samtools sort bioliquid_aligned_tormo.bam > bioliquid_aligned_tormo.sort.bam
samtools index bioliquid_aligned_tormo.sort.bam

In [None]:
# View what is going on
samtools view bioliquid_aligned_tormo.sort.bam | head

In [None]:
samtools view -b bioliquid.sorted.bam "chr17:23,000,000-27,000,000" > bioliquid_chr17_pompe_tormo.bam

In [2]:
!samtools view ../data/raw/bioliquid.sorted.bam "chr7:117,120,016-117,308,718" | head

9c0a5a6e-b3fc-4065-8bb4-f00724715ed6	0	chr7	117090365	60	29S49M1I16M1I18M1D199M1D348M2I91M1I1M1I85M1D3M2I11M1I111M1I1M2I17M1D6M1I13M1I58M1I12M1D12M1I6M2I9M1D113M1D14M1D9M2I8M1D6M1I7M4I295M1D129M1D37M1D5M2I21M1D4M1D47M3I4M1I103M1D31M3D55M5D98M4D11M2D2M1I39M2I149M3I30M1I4M1I29M1D5M4D4M1D5M1D49M1D102M1D75M1D5M1I16M3I61M1D24M1I40M1I126M1D72M1D71M1D93M1D53M1I140M2D13M1D48M1D5M1I40M1D13M5D97M1D33M1D28M1D28M1D23M2D25M1I13M1I78M1I3M1D108M1D26M2D4M1D139M1D7M1I27M1D6M1I17M1I16M1I12M1I64M1D4M1D23M1D50M1D38M1D122M1D67M1D10M3I8M1D66M1D73M1I6M1D30M1D20M1D60M1I71M2D1M2D6M1I19M1D2M1D17M2D3M2D15M1I90M1D31M1I35M1I64M2I7M1I29M2I34M2D32M1D56M1I87M3I6M2I6M1D291M2D54M1D5M1I59M1D38M1I4M1D15M1D4M2D50M1I28M1I8M1I14M2D33M1I1M1I9M1I3M1D47M1I151M1I22M1I95M1D82M1I19M1I55M2I77M1D29M1I66M1D144M1I47M4D52M2I31M1D28M3D10M1D36M1D76M4D11M2D6M1D2M1I80M1I43M1I58M2D32M2I21M2D6M2D30M2I34M1I2M1I36M2D50M11I1M1I110M1D9M1D5M1I3M1D4M1D13M1D30M1D6M1I2M2D30M1I15M1I18M1I21M1D19M3D1M1D43M1D84M3D27M1D33M3D3M1D3M1D8M1D51M1I15M2D93M1D58
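
The read above starts at position 117,090,365, upstream of the requested start at 117,120,016. It is still returned because `samtools view` reports every read that overlaps the region, and the span of a read on the reference is determined by its CIGAR string. The sketch below computes that span; the helper names are hypothetical, and only the first eight CIGAR operations of the record are used, to keep the example short.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")    # operations that advance along the reference
QUERY_CONSUMING = set("MIS=X")  # operations that advance along the read


def cigar_lengths(cigar: str) -> tuple:
    """Return (reference_bases, query_bases) consumed by a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    ref = sum(int(n) for n, op in ops if op in REF_CONSUMING)
    query = sum(int(n) for n, op in ops if op in QUERY_CONSUMING)
    return ref, query


def overlaps(pos: int, ref_len: int, start: int, end: int) -> bool:
    """True if a read at 1-based POS spanning ref_len bases touches [start, end]."""
    read_end = pos + ref_len - 1
    return pos <= end and read_end >= start


# First eight operations of the CIGAR above, used only as a short illustration.
ref_len, query_len = cigar_lengths("29S49M1I16M1I18M1D199M")
print(ref_len, query_len)
```

Applying `cigar_lengths` to the full CIGAR and passing the result to `overlaps` with the record's POS would confirm whether the read reaches the queried interval.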

In [3]:
read_0 = "TATGTGTGTGCTCTTCGTTCAGTTACGTATTGCTATAACTTTCCTACCTAGAACTCTGCCCCTTGTAGTCATTTCACGTTTTCTCAAACATTGAGATACTGTGTGGAGCAGGCTTTTGCTGGGTGCTCTGTGTTGGCTGATGGAGGATTGAGACACAGTGTGTGTGCTTGTAGGGAAGAGGTTCCCAGAGCCTGGAGCACCTTGTCTTAGCAAAGTTTCTATTTCTCTTCAATGTAACTATGTGGAAGGTTTGCTTTGTGTACAAATAGACATTTATTTTTTCTTCCTGGTATTAAATAGTTTATGAGAAACAACAAACTTGTAAAATAGTCTTCAATCATAAATGAAAGCATATTTTAAATAATTAAAAATATAATAAGACTTACTTAGGTCTTTCCTCTGTTCTTGACCAGGGTGGAGATCTTCATTATCAAGTTTCCTGTAACTAAAAGGGAACAGATCATCTGAGATCTTGCTCCACTTGTCCTTGAGGGTTATTAAGGTTTGTGTGTGACTGGGCATTGGATGGTCCTTCATAATATCAGGGAAGGTTGACAGATGGCCTCCCTCCAGCAAGGGAGAAGGTGCCATGAAGACTGAACACTGCATTGTTCCCACCAATACAGTTTGCGCCAGTATAGTTTCCCCGACCACAACTGAGGAGAAAGAATGTGCTCAGCTCCACTCCTCAGCAGAGCCAGATGCCCAGTACAGCAGAAACAGTGTACCTGTAAAGTTGCCTAGGCGACATCGGAGAATTGGAAGGAAGAGTAGACAGAATGATTTATGAGGGTTTCTTGCCATAGGAATGGGTCTAAAATTATACTTCCATGTCTTTTCCTAAGAAAATAAATAGCACCAGATTTTATCAAGGGGATTGTGATAGTGAGAACCTCAAGACCTTATCAGACCCCAAGGAGACAGGGGCCAAGGGCAAGGGGAGGGGCAGCAGAAGCAACAAATGAGGAAAGAGGGAATTGCTAATATTGTGTTTTAATTCCTAATTGGAACTTGATGCTCTGCCTTGATGTAAAACTGATTTTTAAAAAACATGCAAATATTGTGTATTTAAAAATGGGACTAAAAAAAAACCACTTTACTGCAGGTCAAATGAGATATTTTATTTCCATATTAAAAAAACAGCTCCTCTCAGCATCTACTTTAGTATTAGCACATTTGCAAAGACAGTTACATGCCAGCATGTGAAATAGCATGTCATGGTAGCAAGGGAGAACTTGTATTTATTTATAAAAACAATCAAAAAGAAAACGCTTAAGAGGGGCACATAGAGTTGGTGTATCTTAGCACCCCTTCTCGGCATACTGCAATTAGCCAGCATTATCAGATGTCAATGTTGATTTTTATTTTCATGAATTATTATGTTGGTGATGGTATCAGTTTTATATTGCTGCCTAACAAATTTCCGTAAATATAGCAGTTTAACACACACTTCTTATCTCTCAGTTTCTGTGTCAGGAGTCCAGGCATGGCTTAGATGGGTCTTCTGCTGTGGGCCTCACAAGGCTATAATCAAGGTGTTGACCAGGCTGCTATCTCAATTGCAAGTTCAGTAGGGGAGAATCTGCTTCCAGTCTCACTCAGGCTGTTGGCAGTAATCATTGCTGTAGCACTGAGGGCTTCAGTTTCTGACTGGCTGTTGTAGAGGCTGCCTGTGGTCCACGTGGCCACTTCATTAAGCCATAAGGAGGATCTCTCTGTCTGCCACCAACATAGAATCTACATCAAGTTGTTTGTTAGAAGCAAGTCCATCTCCATGTCCCACCCACATTCAAGGGAAAGGGGTCACATAAAGTTATGGAGAGTGAGCTGGGTGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCACGCCACTGCACTCATCCTGGGCGACAGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAGAAGTTAT
GAATACCAGCCAAAAAAAAAAAAATTATGAATACCAGCAGAGCAGGGATCGTTGGAGTCACTTTAGCGTTTGTCTGTCACAGGGTCATTCTGAGGAAAAGTTCTTTCATTTTAAAAGCTGTGGTGGAAATCTGTGAAAATTAGCAGGAGTGTAAACTAAGCCATTTGAGCTTCCATTTTTTTCATTCCCTTGTCCTTTGGTAAATGAGGAAGGGCAGAAGTAAAGTTGCAGCCTTACTTAATAGATGCTGACAGGAAGGAAGAGAGAAGTTAGTAAGGTTGAAAGAACAATGAAATGAGAGAAGATGATATTCTCTATTATCACATGAGGGAAAAATACAAATGATGGATTAAACTATTGTTGGATATGTTCCCTACCTTACTTTGGAGGTCTGGACAACTGAGCGAATCCCTGCTAGACTAACTTATCTCGTAATGTGCTTTCTATGGGAGGGCACCTTTCAGTAACTAAATCAGAAATGTTGCTTTGCATGAAATTGACAATTTCTTAATCCACCCTGATAGCCCAGTCTTCGGGCCTGGAAGGTACTGTGGCCTTGATCTTCATCCCTTGCAATGAAAACATTGCTAACCAGCATGCTTAGGTTATGGTTTAGGCTTTAACATGGCTTATGCGTAAGCCATAAAAAGACTTGAAGGAGAAAAGGAATAAACAGGATCACAGGTGGTTCCCATATGGCCTGTGGCTCAATTGATTGTTCCACGGTCTGTGGTAAGGCAGTAATTTGGAGATTTAGGAGCACATGTTAGGTTTCAGGGACATTTACTTTGCCTTCTAAAATGGCTGCTTTCAAAATCACCCTCAGGTTTAGTACTACCCCAAGCAACTAGAATCCAGATAAAAGGACCCTTCTATGAAATGATTCTTCATTTGTTTTGTAGGGCATGGTTATTCACAGGACTAATCACAATGTGGATGTTTTACATCCTGGGCTGTACACTGTAGCACAGTGCTTGGCACATAAAGATGCTCAGTTAATGCTAATATGACAGCTACTACTACCATTGTTAACATTTACACATTCAGCTCAGTCTCCCTGCATTCTACAACTATAGTAATATAAGATGCCAAATCAGGCCGGGTGCAGTGGCTCACACCTGTAATGCCAGCACTTTGGGAGGCTGAGTGGGTGAATCAGTGAGGTCAGGAGTTTGAGACTAGCCTGGTTAACATAGTGAAACCCCGCATCTCTACTAAAACTTCAAAATTAGCCGGGGATGGTGGCACATGCCTGTAATCCCAGCTACTTGGGAGGCTGAAGCAGGGGAATCACTTGAACCCGGAAGGCGGAGGTTGCAGTCAGCCAAGATTGCACCACTGTTCCAGCCGGGGCGCGGAGTGAGACTCCGTCTCAAAAAAACAACAACAACAACAAAAAAAATAAAAGAAAAAACAGCCGTCATAATACATATTATTTGATGAGGTCTGCGTATCAGGCACTTGCAATATTATGTTTAGCTCTTGTGACAACTATGCAAGATAGGTATTACTTTTAAAATTTTATAGACAATGAAACAATCTCAAATGTACACGAATACTTTTGTAAAGAGATTTTCACCCACATTGATTTACAAGAAAGTACATAAACCAAAGCCCCCTGTCTTTATGTGTTTTCATCAGCCATATTACCATCTAGGCATTGCATAAACTCTAGAGAGAAGTCCTCCAAAAATCATTTTTGGTAATTTTGGGGGAAAACACACAGCTGTCCTGATTTTTAGAAGTGCTATTTGGATCCATGTTAATTTGTCTCATCATAGTGGAAACTGGAGCTCACCATTCACTACTAGTCAATCAATATTCCATGTATTTGTATACCATCACTTTAGATAGTAATTTCACATATCTAATTCCCCACCAGTGCCTTAACTGCGCTTCACTTCCTTCCCTGGGCATTCACCCTCAGCCCAGGCTGTGGTACTCTGCATGGAATGGCCTACCGCTCTACCCCTCACTGTTCCCTAGTCTCTGCAGCCATC
AGTAACTTTACCTGGTTAACTTTCTATTCAGCCTACAAGTCTCAGATGAGATGCCTTTTCTTGGCAGAAATTTTCTTAAATTGTCCATTCCTTTTGGTGCCCTGTCCTTGTGATTTTCCTAATATGTCCTTTTCCCATTTGCTTATCCCGATGACTTGCTTTCTCTCACCCATTGGATTGTGAGCCTCTTGTGGTCAGGGGCAGTGCTCTGTAAGCTGCTGTGTCCCAAATCTGGCCCAGTGTAGGCACTCGCAGCTATAGACTGATGTTAAGAGAAAATGCACATTTCATCTCAGCCTCAAGCAGTTCTGGGAAACAGATTGGAAACCAAAGCTCTGCAGAACGTGGGACTCTCTCAGGGCCATCACAACACTGTTGTTGGTCTCATGTTTGGTGACTGGGTCTCCTATTCCTGGTCTCTTTCCTAGGCATAATGCTTTTATATAAAGTCCCTTCCATTGTTTTTTGTTTGTTTTCTTTTTTCAGCCTAAATAACTTAGTTTCTCTAAACTTTTCTCCCAGGGACTCTTTTTAACCCTTTTTTTAATTCTGTTGATTATTATCTTAATAACTTTTATTTTTTTTCCATTTTGCATGTCATATTTTAGCAAAGCATAAAAGGAACACGGCACAAAGCACACCCATATTTTTGGATGCTGTGGATTTCATCATGCTGCTTATTCCATTACTGTGTATAAGTACCTCCAAGGCATTAATGCTGCCTTCCTCCTTCATTTGAAGACTTCCTGTGCAAGGTGGAATATACGTAAGGAGGCAAACAGACTGGGTTATATGCCTGCTCTGCTTTTACAGAGGCCTCTTCCAGGAGTGTAATACGGGGGTTGCTCATACTCTGAAGAAGATAGTGGCAGGCTATCTCATGAGGAGCCAGAACGTGGCTGGCTCTCCAGACATGGCTTGGTGGGTGCCACGTGATTCCTGGGGTGAGCCTTCTGGTGTGAATTCCCTGCTCACTGGGGTGATTCTTCACTTCCCACAGTTCAACCTGCTGTATTATCCTCTTACCTATTCTTTCTGTGATCCATAGAGGTAATTTAATTTTCCAGTCCATGTACCTACCCTGCCTACTTAGTTTCTTATTCAGTGCCACACTTAATTCCTTCACATTTACTGATTAATTAAATGAGAAGACTATGCCAGGTGAAAAGTTTCAAGCATCTTCAGAACTCTACATGATGCATTACCCCTGAGGCTGCCTTTCAATAACTGAGGTGATATTTTGAGCAGTGTGACCTGTTAGAGGTGCCCAGCAGGTCCGATGAAAAGCCCTCTGATTCTTGGAAATAGTGCATTAGTAAAGTATTATAAGTTTATTTTCACAAAGCTAGATTAGTTGTTACATGTTGGTTTTTGTTTTGCCTAGCCCTAACAAGTATGGAGGTGACCTAATTGTGTATTCTATAAGGGATCTGGGGATATCTGGCTGGGTGGGTGGCTCACACCTGTAATCCCAACAATTTGGGAGGCCGAGGTGGGCGGATCACCTGAGGTCAGGAGTTTGAGAGAGGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCCAGGTGTGGTGGCAGGTGCCTGCAATCTCAGCTACTCCGGAGGCTAATGCAGGGGAATCACTTGAACCCGGGAGGTAGAGGTTGCAGTGAGCCAAGATTGTGCCACTGCACTCCAGCCTGGGCAACAGAGCGAGATTCTGCCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAATATCAGGAATATCCATTTTATGCTCAAGCTCACATACCTCACAGTTTTCTGGTCCAATTTTTAGGCACTTTATCAGGCCCTCATATGTTTCAAAAATAATTGCTAATGACTTTGATGAAGCTAGGCCAAGTATTTTTTGGTTTTAGGCATTGGGCTATAGTTTGCCACCTTCCTAATTTAATAGAAGAATTTTTAAACTCTGATTCTCCCCCTTCTCAGGGTGGCTTTACTGCCTTTCCCATTCT
AGTGCTTCACAGAAATGACAAGCTCACAGGGGACTTATCTGAGGAAAAGGCCGGAGTAAAAATAAGTACAATGTTAAAAAAATCTATCTTATAGTATCATTTATTTTAGAGCTTCCTCTCCTTTTCTAATGAAAGGCTGCTGTAGTTTCCTTTTGTGCTTTTTTTGCTGAAGGCTTTTCAGTAATATTCCCGTGTGTTGCCTGTGATGCTAAAAGCATGAGCTTGGGGGCAGGTTGACTGGCATTCAGGTATTTGTTTCAGCCTCCAGCCGCAAGACAGAGGCGAATAATATTGATCTCATGGAGCTGAAATGAAAATTAACTTTTCTAATCTGTGAAAATGCTTTGTTATAATCCTTAAATACATGAAATCATGGTTAAAATAGCGAGTACCAAGTGCTGACATTATGCCCACAATTGCCACATGCCATGTCCTTATGATTTTTGCCAGATGTTTAAATGAGATTCTAAATGAATCAGGTCTCTAAATGGGCATCTCCTACTCTCTAGGTGTTTCTGTTTCTGCTTCTCTGTTTTTTCTGTTTGTATCTCCATTTATTTTAATGCCTACCATTATGTGAAGTCTGCCACCTTCCTATACATGGATATACCGAGAAATACATTTTAATTATTAATTATTATTTGATATATAATCTTAAAAACAAATGAATGGAATCTTTATTTTTAATCTCTTTTAAAAAAACTCAATTTTTTTTTCACTTACTGATTAAATCTTGAGTCTTTTGCCTCCAGTGGATCAGTGATTTTTCAGCAGAAAATCTTTCCTCTCCATTGCTTTGTGCTTTTGTTGCTAGGCAGTCAACAGCAGGGCTACTAAAGCACTTCTAATTTAAGACAAATCTTTTCCTCTATTTTAGAAATGGATTTCAATGGTGTTCAGTTTGCAGAAACCTACTGAAAGGTATATGGTACAAATATGAATGTTTATAATTCTCCAGAATAACAAAATGAATATTCTGCTGGCTTTTTTTTTTTTTTGAGATGCAGTTTTGTTGTTGCCCAGCTGGAGTGCAATGGTGTGATCTTGACTCACGGCAATTCCGCCTCCCGGGTTCAAACGATTCTCCTGCCTCAGCCTCCCAAGTTGCTGGGATTACAGGCACCCACCATGCCCAATTTTTTGTATTTGGTGACAATGGGGTTTTGCCATGTTGGCTAGGCTGGTCTCAAACTCCTGACCTCAGGTGATCCACCCACCTCGGCCTCCCAAAGTGTTTAGGATTACAGGCTTGAGCCACTCTGCTTGGCCTCTGCTGACTTTTTAAAAAAATAACAGGGAATGTGGGGTCACTCATAGTGTGGAGGGAAATTTTGAGATTTAAGCACCTCTGAATCATATTGGCAGTTATTGCAGCTCTAGGGGTGTTTTGTGCATTTGGTTTTTTCCTTTTAAAAAATCTCCTTCCTTTGCCACGAACAAACTGTTGCATGGGAAATGACATATACATCTATATATATATATATATATATGTATGACATATCACTGTATATATATGACATATCACTATATATATATATATATATATATATATATATATATATATATATATATATTTTTTTTTTTTGATGGCCAGGCAAAGCCTCCTCAACTCAACTGTAGCTTCCTCCTCTTACCTCGCAGTAAGCTGATGACTACTTCTGTGCCTTCTTCTACCTTCCAAAAGTTAAGTTGCCATTAGTCGAGTTGTAAATCTGACTGAACACATTGTGTTAGTCCAGTCTATGTGACTTAGATAAAATACCACAGGTCAAACAAATTCATTTTAAGAGTAGGTGCTAATCAAAGGCGGGCTTCTCTGAGAGGATGCTTGGGTGACAGAAAGAATGGAACCAGTGTCTCAACATCCATGTCTTTGTTGCTCAGTCTCCTAACCTCCTCAACCTTGCATCTGCTTAGAAAGGCCCATGCTGTCTCCTTGGCATCTTGAGGCAGGTCCTTTCCTCTGTTTAGATTGTCCTCCCATCCCTCCTGTCCACTTT
GGTTTGGTTAACTTTAGTTATTCTTCAGGATTCAGCTAATATGTGGTCCCTGATTTCCTCCTGCCACCCAGCCTGGGATGCATCCCTCTTAACATGTTTTCATAAT"

In [10]:
read_2 = "TATGTGTTTGCTCTTCGTTCAGTTACGTATTGCTATAACTTTCCTACCTAGAACTCTGCCCCTTGTAGTCATTTCACGTTTTCTCAAACATTGAGATACTGTGTGGAGCAGGCTTTTGCTGGGTGCTCTGTGTTGGCTGATGGAGGATTGAGACACAGTGTGTGTGCTTGTAGGGAAGAGGTTCCCAGAGCCTGGAGCACCTTGTCTTAGCAAAGTTTCTATTTCTCTTCAATGTAACTATGTGGAAGGTTTGCTTTGTGTACAAATAGACATTTATTTTTTCTTCCTGGTATTAAATAGTTTATGAGAAACAACAAACTTGTAAAATAGTCTTCAATCATAAATGAAAGCATATTTTAAATAATTAAAAATATAATAAGACTTACTTAGGTCTTTCCTCTGTTCTTGACCAGGGTGGAGATCTTCATTATCAAGTTTCCTGTAACTAAAAGGGAACAGATCATCTGAGATCTTGCTCCACTTGTCCTTGAGGGTTATTAAGGTTTGTGTGTGACTGGGCATTGGATGGTCCTTCATAATATCAGGGAAGGTTGACAGATGGCCTCCCTCCAGCAAGGGAGAAGGTGCCATGAAGACTGAACACTGCATTGTTCCCACCAATACAGTTTGCGCCAGTATAGTTTCCCCGACCACAACTGAGGAGAAAGAATGTGCTCAGCTCCACTCCTCAGCAGAGCCAGATGCCCAGTACAGCAGAAACAGTGTACCTGTAAAGTTGCCTAGGCGACATCGGAGAATTGGAAGGAAGAGTAGACAGAATGATTTATGAGGGTTTCTTGCCATAGGAATGGGTCTAAAATTATACTTCCATGTCTTTTCCTAAGAAAATAAATAGCACCAGATTTTATCAAGGGGATTGTGATAGTGAGAACCTCAAGACCTTATCAGACCCCAAGGAGACAGGGGCCAAGGGCAAGGGGAGGGGCAGCAGAAGCAACAAATGAGGAAAGAGGGAATTGCTAATATTGTGTTTTAATTCCTAATTGGAACTTGATGCTCTGCCTTGATGTAAAACTGATTTTTAAAAAACATGCAAATATTGTGTATTTAAAAATGGGACTAAAAAAAAACCACTTTACTGCAGGTCAAATGAGATATTTTATTTCCATATTAAAAAAACAGCTCCTCTCAGCATCTACTTTAGTATTAGCACATTTGCAAAGACAGTTACATGCCAGCATGTGAAATAGCATGTCATGGTAGCAAGGGAGAACTTGTATTTATTTATAAAAACAATCAAAAAGAAAACGCTTAAGAGGGGCACATAGAGTTGGTGTATCTTAGCACCCCTTCTCGGCATACTGCAATTAGCCAGCATTATCAGATGTCAATGTTGATTTTTATTTTCATGAATTATTATGTTGGTGATGGTATCAGTTTTATATTGCTGCCTAACAAATTTCCGTAAATATAGCAGTTTAACACACACTTCTTATCTCTCAGTTTCTGTGTCAGGAGTCCAGGCATGGCTTAGATGGGTCTTCTGCTGTGGGCCTCACAAGGCTATAATCAAGGTGTTGACCAGGCTGCTATCTCAATTGCAAGTTCAGTAGGGGAGAATCTGCTTCCAGTCTCACTCAGGCTGTTGGCAGTAATCATTGCTGTAGCACTGAGGGCTTCAGTTTCTGACTGGCTGTTGTAGAGGCTGCCTGTGGTCCACGTGGCCACTTCATTAAGCCATAAGGAGGATCTCTCTGTCTGCCACCAACATAGAATCTACATCAAGTTGTTTGTTAGAAGCAAGTCCATCTCCATGTCCCACCCACATTCAAGGGAAAGGGGTCACATAAAGTTATGGAGAGTGAGCTGGGTGCCTGTAGTCCCAGCTACTTGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCAAGATCACGCCACTGCACTCATCCTGGGCGACAGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAGAAGTTAT
GAATACCAGCCAAAAAAAAAAAAATTATGAATACCAGCAGAGCAGGGATCGTTGGAGTCACTTTAGCGTTTGTCTGTCACAGGGTCATTCTGAGGAAAAGTTCTTTCATTTTAAAAGCTGTGGTGGAAATCTGTGAAAATTAGCAGGAGTGTAAACTAAGCCATTTGAGCTTCCATTTTTTTCATTCCCTTGTCCTTTGGTAAATGAGGAAGGGCAGAAGTAAAGTTGCAGCCTTACTTAATAGATGCTGACAGGAAGGAAGAGAGAAGTTAGTAAGGTTGAAAGAACAATGAAATGAGAGAAGATGATATTCTCTATTATCACATGAGGGAAAAATACAAATGATGGATTAAACTATTGTTGGATATGTTCCCTACCTTACTTTGGAGGTCTGGACAACTGAGCGAATCCCTGCTAGACTAACTTATCTCGTAATGTGCTTTCTATGGGAGGGCACCTTTCAGTAACTAAATCAGAAATGTTGCTTTGCATGAAATTGACAATTTCTTAATCCACCCTGATAGCCCAGTCTTCGGGCCTGGAAGGTACTGTGGCCTTGATCTTCATCCCTTGCAATGAAAACATTGCTAACCAGCATGCTTAGGTTATGGTTTAGGCTTTAACATGGCTTATGCGTAAGCCATAAAAAGACTTGAAGGAGAAAAGGAATAAACAGGATCACAGGTGGTTCCCATATGGCCTGTGGCTCAATTGATTGTTCCACGGTCTGTGGTAAGGCAGTAATTTGGAGATTTAGGAGCACATGTTAGGTTTCAGGGACATTTACTTTGCCTTCTAAAATGGCTGCTTTCAAAATCACCCTCAGGTTTAGTACTACCCCAAGCAACTAGAATCCAGATAAAAGGACCCTTCTATGAAATGATTCTTCATTTGTTTTGTAGGGCATGGTTATTCACAGGACTAATCACAATGTGGATGTTTTACATCCTGGGCTGTACACTGTAGCACAGTGCTTGGCACATAAAGATGCTCAGTTAATGCTAATATGACAGCTACTACTACCATTGTTAACATTTACACATTCAGCTCAGTCTCCCTGCATTCTACAACTATAGTAATATAAGATGCCAAATCAGGCCGGGTGCAGTGGCTCACACCTGTAATGCCAGCACTTTGGGAGGCTGAGTGGGTGAATCAGTGAGGTCAGGAGTTTGAGACTAGCCTGGTTAACATAGTGAAACCCCGCATCTCTACTAAAACTTCAAAATTAGCCGGGGATGGTGGCACATGCCTGTAATCCCAGCTACTTGGGAGGCTGAAGCAGGGGAATCACTTGAACCCGGAAGGCGGAGGTTGCAGTCAGCCAAGATTGCACCACTGTTCCAGCCGGGGCGCGGAGTGAGACTCCGTCTCAAAAAAACAACAACAACAACAAAAAAAATAAAAGAAAAAACAGCCGTCATAATACATATTATTTGATGAGGTCTGCGTATCAGGCACTTGCAATATTATGTTTAGCTCTTGTGACAACTATGCAAGATAGGTATTACTTTTAAAATTTTATAGACAATGAAACAATCTCAAATGTACACGAATACTTTTGTAAAGAGATTTTCACCCACATTGATTTACAAGAAAGTACATAAACCAAAGCCCCCTGTCTTTATGTGTTTTCATCAGCCATATTACCATCTAGGCATTGCATAAACTCTAGAGAGAAGTCCTCCAAAAATCATTTTTGGTAATTTTGGGGGAAAACACACAGCTGTCCTGATTTTTAGAAGTGCTATTTGGATCCATGTTAATTTGTCTCATCATAGTGGAAACTGGAGCTCACCATTCACTACTAGTCAATCAATATTCCATGTATTTGTATACCATCACTTTAGATAGTAATTTCACATATCTAATTCCCCACCAGTGCCTTAACTGCGCTTCACTTCCTTCCCTGGGCATTCACCCTCAGCCCAGGCTGTGGTACTCTGCATGGAATGGCCTACCGCTCTACCCCTCACTGTTCCCTAGTCTCTGCAGCCATC
AGTAACTTTACCTGGTTAACTTTCTATTCAGCCTACAAGTCTCAGATGAGATGCCTTTTCTTGGCAGAAATTTTCTTAAATTGTCCATTCCTTTTGGTGCCCTGTCCTTGTGATTTTCCTAATATGTCCTTTTCCCATTTGCTTATCCCGATGACTTGCTTTCTCTCACCCATTGGATTGTGAGCCTCTTGTGGTCAGGGGCAGTGCTCTGTAAGCTGCTGTGTCCCAAATCTGGCCCAGTGTAGGCACTCGCAGCTATAGACTGATGTTAAGAGAAAATGCACATTTCATCTCAGCCTCAAGCAGTTCTGGGAAACAGATTGGAAACCAAAGCTCTGCAGAACGTGGGACTCTCTCAGGGCCATCACAACACTGTTGTTGGTCTCATGTTTGGTGACTGGGTCTCCTATTCCTGGTCTCTTTCCTAGGCATAATGCTTTTATATAAAGTCCCTTCCATTGTTTTTTGTTTGTTTTCTTTTTTCAGCCTAAATAACTTAGTTTCTCTAAACTTTTCTCCCAGGGACTCTTTTTAACCCTTTTTTTAATTCTGTTGATTATTATCTTAATAACTTTTATTTTTTTTCCATTTTGCATGTCATATTTTAGCAAAGCATAAAAGGAACACGGCACAAAGCACACCCATATTTTTGGATGCTGTGGATTTCATCATGCTGCTTATTCCATTACTGTGTATAAGTACCTCCAAGGCATTAATGCTGCCTTCCTCCTTCATTTGAAGACTTCCTGTGCAAGGTGGAATATACGTAAGGAGGCAAACAGACTGGGTTATATGCCTGCTCTGCTTTTACAGAGGCCTCTTCCAGGAGTGTAATACGGGGGTTGCTCATACTCTGAAGAAGATAGTGGCAGGCTATCTCATGAGGAGCCAGAACGTGGCTGGCTCTCCAGACATGGCTTGGTGGGTGCCACGTGATTCCTGGGGTGAGCCTTCTGGTGTGAATTCCCTGCTCACTGGGGTGATTCTTCACTTCCCACAGTTCAACCTGCTGTATTATCCTCTTACCTATTCTTTCTGTGATCCATAGAGGTAATTTAATTTTCCAGTCCATGTACCTACCCTGCCTACTTAGTTTCTTATTCAGTGCCACACTTAATTCCTTCACATTTACTGATTAATTAAATGAGAAGACTATGCCAGGTGAAAAGTTTCAAGCATCTTCAGAACTCTACATGATGCATTACCCCTGAGGCTGCCTTTCAATAACTGAGGTGATATTTTGAGCAGTGTGACCTGTTAGAGGTGCCCAGCAGGTCCGATGAAAAGCCCTCTGATTCTTGGAAATAGTGCATTAGTAAAGTATTATAAGTTTATTTTCACAAAGCTAGATTAGTTGTTACATGTTGGTTTTTGTTTTGCCTAGCCCTAACAAGTATGGAGGTGACCTAATTGTGTATTCTATAAGGGATCTGGGGATATCTGGCTGGGTGGGTGGCTCACACCTGTAATCCCAACAATTTGGGAGGCCGAGGTGGGCGGATCACCTGAGGTCAGGAGTTTGAGAGAGGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCCAGGTGTGGTGGCAGGTGCCTGCAATCTCAGCTACTCCGGAGGCTAATGCAGGGGAATCACTTGAACCCGGGAGGTAGAGGTTGCAGTGAGCCAAGATTGTGCCACTGCACTCCAGCCTGGGCAACAGAGCGAGATTCTGCCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAATATCAGGAATATCCATTTTATGCTCAAGCTCACATACCTCACAGTTTTCTGGTCCAATTTTTAGGCACTTTATCAGGCCCTCATATGTTTCAAAAATAATTGCTAATGACTTTGATGAAGCTAGGCCAAGTATTTTTTGGTTTTAGGCATTGGGCTATAGTTTGCCACCTTCCTAATTTAATAGAAGAATTTTTAAACTCTGATTCTCCCCCTTCTCAGGGTGGCTTTACTGCCTTTCCCATTCT
AGTGCTTCACAGAAATGACAAGCTCACAGGGGACTTATCTGAGGAAAAGGCCGGAGTAAAAATAAGTACAATGTTAAAAAAATCTATCTTATAGTATCATTTATTTTAGAGCTTCCTCTCCTTTTCTAATGAAAGGCTGCTGTAGTTTCCTTTTGTGCTTTTTTTGCTGAAGGCTTTTCAGTAATATTCCCGTGTGTTGCCTGTGATGCTAAAAGCATGAGCTTGGGGGCAGGTTGACTGGCATTCAGGTATTTGTTTCAGCCTCCAGCCGCAAGACAGAGGCGAATAATATTGATCTCATGGAGCTGAAATGAAAATTAACTTTTCTAATCTGTGAAAATGCTTTGTTATAATCCTTAAATACATGAAATCATGGTTAAAATAGCGAGTACCAAGTGCTGACATTATGCCCACAATTGCCACATGCCATGTCCTTATGATTTTTGCCAGATGTTTAAATGAGATTCTAAATGAATCAGGTCTCTAAATGGGCATCTCCTACTCTCTAGGTGTTTCTGTTTCTGCTTCTCTGTTTTTTCTGTTTGTATCTCCATTTATTTTAATGCCTACCATTATGTGAAGTCTGCCACCTTCCTATACATGGATATACCGAGAAATACATTTTAATTATTAATTATTATTTGATATATAATCTTAAAAACAAATGAATGGAATCTTTATTTTTAATCTCTTTTAAAAAAACTCAATTTTTTTTTCACTTACTGATTAAATCTTGAGTCTTTTGCCTCCAGTGGATCAGTGATTTTTCAGCAGAAAATCTTTCCTCTCCATTGCTTTGTGCTTTTGTTGCTAGGCAGTCAACAGCAGGGCTACTAAAGCACTTCTAATTTAAGACAAATCTTTTCCTCTATTTTAGAAATGGATTTCAATGGTGTTCAGTTTGCAGAAACCTACTGAAAGGTATATGGTACAAATATGAATGTTTATAATTCTCCAGAATAACAAAATGAATATTCTGCTGGCTTTTTTTTTTTTTTGAGATGCAGTTTTGTTGTTGCCCAGCTGGAGTGCAATGGTGTGATCTTGACTCACGGCAATTCCGCCTCCCGGGTTCAAACGATTCTCCTGCCTCAGCCTCCCAAGTTGCTGGGATTACAGGCACCCACCATGCCCAATTTTTTGTATTTGGTGACAATGGGGTTTTGCCATGTTGGCTAGGCTGGTCTCAAACTCCTGACCTCAGGTGATCCACCCACCTCGGCCTCCCAAAGTGTTTAGGATTACAGGCTTGAGCCACTCTGCTTGGCCTCTGCTGACTTTTTAAAAAAATAACAGGGAATGTGGGGTCACTCATAGTGTGGAGGGAAATTTTGAGATTTAAGCACCTCTGAATCATATTGGCAGTTATTGCAGCTCTAGGGGTGTTTTGTGCATTTGGTTTTTTCCTTTTAAAAAATCTCCTTCCTTTGCCACGAACAAACTGTTGCATGGGAAATGACATATACATCTATATATATATATATATATATGTATGACATATCACTGTATATATATGACATATCACTATATATATATATATATATATATATATATATATATATATATATATATATTTTTTTTTTTTGATGGCCAGGCAAAGCCTCCTCAACTCAACTGTAGCTTCCTCCTCTTACCTCGCAGTAAGCTGATGACTACTTCTGTGCCTTCTTCTACCTTCCAAAAGTTAAGTTGCCATTAGTCGAGTTGTAAATCTGACTGAACACATTGTGTTAGTCCAGTCTATGTGACTTAGATAAAATACCACAGGTCAAACAAATTCATTTTAAGAGTAGGTGCTAATCAAAGGCGGGCTTCTCTGAGAGGATGCTTGGGTGACAGAAAGAATGGAACCAGTGTCTCAACATCCATGTCTTTGTTGCTCAGTCTCCTAACCTCCTCAACCTTGCATCTGCTTAGAAAGGCCCATGCTGTCTCCTTGGCATCTTGAGGCAGGTCCTTTCCTCTGTTTAGATTGTCCTCCCATCCCTCCTGTCCACTTT
GGTTTGGTTAACTTTAGTTATTCTTCAGGATTCAGCTAATATGTGGTCCCTGATTTCCTCCTGCCACCCAGCCTGGGATGCATCCCTCTTAACATGTTTTCATAAT"

In [17]:
read_0[:30]

'TATGTGTGTGCTCTTCGTTCAGTTACGTAT'

In [18]:
read_1[:30]

'TATGTGTGTGCTCTTCGTTCAGTTACGTAT'

In [19]:
read_2[:30]

'TATGTGTTTGCTCTTCGTTCAGTTACGTAT'

In [13]:
read_0[-10:]

'TTTTCATAAT'

In [14]:
read_1[-10:]

'TTTTCATAAT'

In [15]:
read_2[-10:]

'TTTTCATAAT'
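
Comparing slices by eye works for short prefixes, but the differences can also be located programmatically. The sketch below lists the positions where two equal-length sequences differ, using the 30-base prefixes printed above for `read_0` and `read_2`. The helper `mismatch_positions` is a hypothetical name.

```python
def mismatch_positions(a: str, b: str) -> list:
    """Return 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The 30-base prefixes printed above for read_0 and read_2
prefix_0 = "TATGTGTGTGCTCTTCGTTCAGTTACGTAT"
prefix_2 = "TATGTGTTTGCTCTTCGTTCAGTTACGTAT"

diffs = mismatch_positions(prefix_0, prefix_2)
print(diffs)
```

The same function can be applied to the full reads when they have equal length. Reads of different lengths would need a proper alignment instead.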