# SARS-CoV-2 alignments

The SARS-CoV-2 virus has a small, simple genome that was sequenced frequently during the pandemic.

We can look at this sequencing data in JBrowse and locate SNPs that could correspond to
particular evolutionary strains of the virus (such as Omicron), and that could even reveal new strains.

## Data

In [12]:
!ls -1 data/sars-cov2/

NC_045512.2.fasta.gz
NC_045512.2.fasta.gz.fai
NC_045512.2.fasta.gz.gzi
out.sorted.bam
out.sorted.bam.csi
SRR29225150.fastq.gz


The `data/sars-cov2` directory, relative to this notebook, contains the following files:
- `NC_045512.2.fasta.gz*` : the SARS-CoV-2 reference genome from [NCBI](https://www.ncbi.nlm.nih.gov/nuccore/1798174254), and its indices generated by `samtools faidx`
- `SRR29225150.fastq.gz` : an Oxford Nanopore long-read sequencing run of the virus, part of experiment [SRX24744563](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR29225150&display=download)
- `out.sorted.bam` and `out.sorted.bam.csi` : the BAM file and its index, generated from the FASTQ file using [`minimap2`](https://github.com/lh3/minimap2) and `samtools` as follows:
```
minimap2 -x map-ont -a NC_045512.2.fasta.gz SRR29225150.fastq.gz | samtools sort -o out.sorted.bam --write-index -
```    
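The same pipeline can also be driven from Python, which is handy when the alignment needs to be regenerated from inside the notebook. The following is a minimal sketch, assuming `minimap2` and `samtools` are installed and on the `PATH`; the helper names `build_commands` and `run_alignment` are hypothetical and not part of any library.

```python
import shlex
import subprocess


def build_commands(reference, reads, out_bam):
    """Return the minimap2 and samtools argument lists used by the pipeline."""
    # Same flags as the shell command above: Nanopore preset, SAM output.
    minimap2_cmd = ["minimap2", "-x", "map-ont", "-a", reference, reads]
    # Sort the SAM stream read from stdin ("-") and write the index alongside.
    samtools_cmd = ["samtools", "sort", "-o", out_bam, "--write-index", "-"]
    return minimap2_cmd, samtools_cmd


def run_alignment(reference, reads, out_bam):
    """Pipe minimap2 output into samtools sort; raise if either step fails."""
    minimap2_cmd, samtools_cmd = build_commands(reference, reads, out_bam)
    aligner = subprocess.Popen(minimap2_cmd, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(samtools_cmd, stdin=aligner.stdout)
    # Close our copy of the pipe so minimap2 is notified if samtools exits early.
    aligner.stdout.close()
    sorter_status = sorter.wait()
    aligner_status = aligner.wait()
    if aligner_status != 0 or sorter_status != 0:
        raise RuntimeError(
            f"pipeline failed (minimap2={aligner_status}, samtools={sorter_status})"
        )


# Show the equivalent shell pipeline without running it.
aligner_args, sorter_args = build_commands(
    "NC_045512.2.fasta.gz", "SRR29225150.fastq.gz", "out.sorted.bam"
)
print(shlex.join(aligner_args), "|", shlex.join(sorter_args))
```

Calling `run_alignment("NC_045512.2.fasta.gz", "SRR29225150.fastq.gz", "out.sorted.bam")` from within `data/sars-cov2` would then run the two tools without going through a shell.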

## Browsing

We display an LGV (linear genome view) widget that includes the reference sequence and the alignment track.

In [1]:
from pyjb import LGVWidget, Fasta, Bam

In [4]:
# SARS-CoV-2 reference assembly and alignment track
sarsCov2 = Fasta(
    sequence='data/sars-cov2/NC_045512.2.fasta.gz',
    name="sars-cov2",
)
align = Bam(
    name='Alignments',
    track='data/sars-cov2/out.sorted.bam',
    index_file='data/sars-cov2/out.sorted.bam.csi',
    index_type='CSI',
    assembly=sarsCov2
)
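
Because the paths above are relative to the notebook, it can help to confirm that every file and index is present before the widget is built. This is a small sketch using only the standard library; `missing_files` is a hypothetical helper, and the list of required files simply mirrors the paths used in this notebook.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths (as strings) that do not exist as regular files."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


# Files referenced by the assembly and the alignment track in this notebook.
required = [
    "data/sars-cov2/NC_045512.2.fasta.gz",
    "data/sars-cov2/NC_045512.2.fasta.gz.fai",
    "data/sars-cov2/NC_045512.2.fasta.gz.gzi",
    "data/sars-cov2/out.sorted.bam",
    "data/sars-cov2/out.sorted.bam.csi",
]

missing = missing_files(required)
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```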

In [5]:
LGVWidget(
    assembly=sarsCov2, 
    tracks=[align]
)

LGVWidget(assembly=Fasta(sequence='data/sars-cov2/NC_045512.2.fasta.gz', name='sars-cov2', type='bgzipFasta', …
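
Positions where most reads disagree with the reference are the candidate SNPs mentioned in the introduction. To complement visual inspection, the idea can be sketched in plain Python: given the reference base and the read bases observed at one position, report the dominant non-reference base. The function name `candidate_snp` and the thresholds `min_depth` and `min_fraction` are illustrative assumptions, not values from this notebook or from any variant caller; in practice the bases would come from a pileup of `out.sorted.bam`.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def candidate_snp(ref_base, bases, min_depth=10, min_fraction=0.8):
    """Return the dominant non-reference base at a position, or None.

    `bases` holds the read bases observed at that position. The thresholds
    are arbitrary illustrative defaults.
    """
    # Normalise case and ignore anything that is not A, C, G or T.
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # too few reads to say anything
    base, count = counts.most_common(1)[0]
    if base != ref_base.upper() and count / depth >= min_fraction:
        return base
    return None


# Toy column: 11 reads show T and 1 read shows C where the reference has C.
column = list("TTTTTTTTTTTC")
print(candidate_snp("C", column))
```

Nanopore reads have a comparatively high per-base error rate, so a real analysis would rely on a dedicated variant caller rather than fixed thresholds like these.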