# PyVIF: Python Virus Integration Finder

PyVIF detects virus integration sites in the human genome from PacBio capture sequencing data.

The reads must first be mapped to the human genome and to the virus genome(s) with [Minimap2](https://github.com/lh3/minimap2).
For instance, on hg38:
```
minimap2 -t 8 \
         -La -x map-pb hg38.fasta subreads.fastq \
         | samtools sort -@ 7 \
         -o human.bam \
         && samtools index human.bam
```
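The virus mapping can be produced with the same pipeline by swapping the reference and the output name. As a minimal sketch, the helper below only builds the shell command shown above as a string, so it can be reused for both references. The function name `mapping_command` and the file name `virus.fasta` are assumptions made for illustration; they are not part of PyVIF.

```python
import shlex


def mapping_command(reference, reads, out_bam, threads=8):
    """Return the minimap2 | samtools pipeline shown above as one shell string."""
    # samtools -@ counts additional threads, hence threads - 1 as in the example.
    sort_threads = max(threads - 1, 1)
    return (
        f"minimap2 -t {threads} -La -x map-pb "
        f"{shlex.quote(reference)} {shlex.quote(reads)} "
        f"| samtools sort -@ {sort_threads} -o {shlex.quote(out_bam)} "
        f"&& samtools index {shlex.quote(out_bam)}"
    )


# "virus.fasta" is a hypothetical name for the virus reference file.
for reference, bam in [("hg38.fasta", "human.bam"), ("virus.fasta", "virus.bam")]:
    print(mapping_command(reference, "subreads.fastq", bam))
```

The printed commands can be pasted into a shell, or passed to `subprocess.run(..., shell=True, check=True)` on a machine where `minimap2` and `samtools` are installed.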

These two mappings are then analysed with PyVIF.

In [None]:
from pyvif import bamtools, paftools

## Show basic metrics

#### Read the virus bam file

In [None]:
virus_df = bamtools.bam_to_paf("virus.bam", add_unmapped=True)
virus_paf = paftools.PAF(virus_df)

#### Plot read lengths

In [None]:
virus_paf.plot_length()

#### Plot number of passes

In [None]:
virus_paf.plot_number_pass()
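To give an intuition for what a number of passes means, the sketch below counts subreads per ZMW from PacBio subread names, which follow the `movie/zmw/start_end` pattern. This is an illustration only: whether `plot_number_pass()` derives its values this way is an assumption, and the helper names as well as the second read name are hypothetical.

```python
from collections import Counter


def parse_subread_name(name):
    """Split a PacBio subread name 'movie/zmw/start_end' into its parts."""
    movie, zmw, span = name.split("/")
    start, end = (int(x) for x in span.split("_"))
    return movie, int(zmw), start, end


def passes_per_zmw(read_names):
    """Count how many subreads each (movie, ZMW) pair produced."""
    counts = Counter()
    for name in read_names:
        movie, zmw, _, _ = parse_subread_name(name)
        counts[(movie, zmw)] += 1
    return counts


names = [
    "m54063_170105_132732/12387219/0_7197",
    "m54063_170105_132732/12387219/7240_14300",  # hypothetical second subread
    "m54063_170105_132732/42271690/2964_10201",
]
counts = passes_per_zmw(names)
for (movie, zmw), n in sorted(counts.items()):
    print(movie, zmw, n)
```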

## Find breakpoints

In [None]:
from pyvif import bp_finder

In [None]:
bp_found = bp_finder.BreakpointFinder(human="human.bam", virus="virus.bam")

Alternatively, you can reuse the previous `virus_df`. Remember to remove unmapped reads with the `dropna()` method.

In [None]:
bp_found = bp_finder.BreakpointFinder(human="human.bam", virus=virus_df.dropna())
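The effect of `dropna()` can be seen on a toy table. The sketch below assumes that unmapped reads appear as rows with missing alignment fields, which is what the `add_unmapped=True` option suggests. The column names are illustrative and are not PyVIF's actual schema.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the table returned by bam_to_paf(..., add_unmapped=True).
# Column names are assumptions made for this example.
toy_df = pd.DataFrame(
    {
        "read": ["read_1", "read_2", "read_3"],
        "target": ["virus", np.nan, "virus"],
        "t_start": [120.0, np.nan, 4500.0],
    }
).set_index("read")

# dropna() discards every row that contains at least one missing value,
# so only reads with a complete alignment record remain.
mapped_only = toy_df.dropna()
print(len(toy_df), "reads in total,", len(mapped_only), "mapped")
```

Note that `dropna()` returns a new DataFrame and leaves the original untouched, so `virus_df` can still be used for the basic metrics above.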

Check how many palindromic reads were removed:

In [None]:
print(len(bp_found.palindromics))

In [None]:
bp_found.plot_bp_positions()

#### Where reads are aligned

In [None]:
bp_found.plot_positions()

In [None]:
print(bp_found.clustering_breakpoints.__doc__)

In [None]:
bp_found.clustering_breakpoints(human_thd=3, virus_thd=2)

In [None]:
bp_found.summarize_human_clustering()

In [None]:
bp_found.get_bp_in_cluster(0)

In [None]:
bp_found.get_alignment_in_cluster(0)

#### Plot where breakpoint connections are located

In [None]:
bp_found.plot_connections_locations(0)

In [None]:
connected = bp_found.get_bp_connections(0)

In [None]:
connected.loc[connected['virus_clust'] != 3]

In [None]:
bp_found.paf.loc['m54063_170105_132732/12387219/0_7197']

In [None]:
bp_found.paf.loc['m54063_170105_132732/42271690/2964_10201']

## PyVIF report

The `pyvif` command runs a complete analysis.

```pyvif --human human.bam --virus virus.bam --output pyvif_report.html```

If the design contains control genes, a mapping on those controls can be added.

```pyvif --human human.bam --virus virus.bam --control control.bam --output pyvif_report.html```
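When the report is generated from a script, the two command lines above can be assembled programmatically. The sketch below is a minimal example that uses only the options shown in this section; the helper name `pyvif_command` is hypothetical.

```python
import shlex


def pyvif_command(human, virus, output, control=None):
    """Build the pyvif argument list, adding --control only when it is given."""
    cmd = ["pyvif", "--human", human, "--virus", virus]
    if control is not None:
        cmd += ["--control", control]
    cmd += ["--output", output]
    return cmd


cmd = pyvif_command(
    "human.bam", "virus.bam", "pyvif_report.html", control="control.bam"
)
print(shlex.join(cmd))
```

To actually run the analysis, the list can be passed to `subprocess.run(cmd, check=True)` in an environment where `pyvif` is installed.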