## Introduction

### What is introgression

Introgression is the transfer of genetic material from one lineage into another through interbreeding between two different species.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/Homo_sapiens_lineage.svg/1200px-Homo_sapiens_lineage.svg.png" alt="human_introgression" width="500"/>

**Figure 1 Interbreeding between archaic and modern humans.** This figure is from https://en.wikipedia.org/wiki/Interbreeding_between_archaic_and_modern_humans.

### Strategies for identifying introgressed regions

There are two major strategies for identifying introgressed regions.
1. If we have genomes from the source population, for example, Neanderthal genomes, we can compare modern human genomes with them directly.
2. If we do not have source genomes, for example, when we want to identify introgression from an unknown lineage (ghost introgression), we can apply statistical models to find regions with unusual patterns of variation.

Several tools can identify introgression without source populations.
- [SkovHMM](https://github.com/LauritsSkov/Introgression-detection)
- [sstar](https://github.com/admixVIE/sstar)
- [SPrime](https://github.com/browning-lab/sprime)

In this tutorial, we will try SPrime.

## SPrime pipeline

<img src="https://ars.els-cdn.com/content/image/1-s2.0-S2666166721002574-fx1.jpg" alt="sprimepipeline"/>

**Figure 2 The SPrime pipeline.** This figure is from [Zhou and Browning (2021)](https://doi.org/10.1016/j.xpro.2021.100550).


### Download SPrime

We can download `SPrime` using the command `wget`.

```
wget https://faculty.washington.edu/browning/sprime.jar
```

### Run SPrime

Here, we provide two datasets for detecting introgressed fragments in the CEU population (Utah residents with Northern and Western European ancestry) with `SPrime`, using the YRI population (Yoruba in Ibadan, Nigeria) as the outgroup.
1. Biallelic SNPs on chromosome 21 from the YRI and CEU populations.
2. Biallelic SNPs in the TLR cluster on chromosome 4 from the YRI and CEU populations.

Then we can use the following commands.

```
java -Xmx1g -jar sprime.jar gt=./data/chr21.YRI.CEU.biallelic.snps.vcf.gz outgroup=./data/YRI.list map=./data/plink.chr21.GRCh37.map out=chr21.CEU.introgressed
```
and
```
java -Xmx1g -jar sprime.jar gt=./data/chr4.YRI.CEU.TLR.biallelic.snps.vcf.gz outgroup=./data/YRI.list map=./data/plink.chr4.GRCh37.map out=chr4.CEU.TLR.introgressed
```

- The argument `gt` specifies a VCF file containing the genetic data we want to use.
- The argument `outgroup` specifies a file listing the samples that we assume carry no introgression, for example, the YRI population.
- The argument `map` specifies a file containing a genetic map associated with the VCF file we want to use.
- The argument `out` specifies the prefix of the output files.

Other `SPrime` arguments are described in its [manual](https://github.com/browning-lab/sprime), or can be listed by running the command `java -jar sprime.jar`.
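When both datasets need the same arguments, it can be convenient to assemble the commands programmatically. The following is a minimal sketch that only builds and prints the two command lines shown above. The helper `build_sprime_command` is a hypothetical name introduced for illustration and is not part of `SPrime`.

```python
import shlex


def build_sprime_command(jar, gt, outgroup, genetic_map, out, memory="1g"):
    """Return the SPrime command line as a list of arguments.

    Hypothetical helper: it only uses the gt, outgroup, map, and out
    arguments described above.
    """
    return [
        "java", f"-Xmx{memory}", "-jar", jar,
        f"gt={gt}",
        f"outgroup={outgroup}",
        f"map={genetic_map}",
        f"out={out}",
    ]


# Output prefix -> (VCF file, genetic map), taken from the commands above.
datasets = {
    "chr21.CEU.introgressed": (
        "./data/chr21.YRI.CEU.biallelic.snps.vcf.gz",
        "./data/plink.chr21.GRCh37.map",
    ),
    "chr4.CEU.TLR.introgressed": (
        "./data/chr4.YRI.CEU.TLR.biallelic.snps.vcf.gz",
        "./data/plink.chr4.GRCh37.map",
    ),
}

for out_prefix, (vcf, gmap) in datasets.items():
    cmd = build_sprime_command("sprime.jar", vcf, "./data/YRI.list", gmap, out_prefix)
    # Print the command; to actually run it, pass cmd to
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Passing the arguments as a list, rather than one long string, avoids shell quoting problems if a file path contains spaces.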

### Process SPrime output

`SPrime` writes two output files for each dataset. The file ending in `.log` contains a summary of the analysis, and the file ending in `.score` contains the results we want to examine.

An example of a `.score` file is shown below.

```
CHROM   POS     ID      REF     ALT     SEGMENT ALLELE  SCORE
21      14410518        rs186656431     T       C       2       1       470440
21      14415190        rs199840792     G       T       2       1       470440
21      14426851        rs138088002     T       C       2       1       470440
```
- The `CHROM` column is the name of the chromosome.
- The `POS` column is the position of the variant.
- The `ID` column is the name of the variant.
- The `REF` column is the reference allele of the variant.
- The `ALT` column is the alternative allele of the variant.
- The `SEGMENT` column is the index of the introgressed segment.
- The `ALLELE` column is the putatively introgressed allele (0 for the `REF` allele, 1 for the `ALT` allele).
- The `SCORE` column is the *S'* score.
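To get a quick feel for a score file, we can count how many variants were assigned to each segment. The sketch below assumes the file is whitespace-delimited with the header line shown above. The function name `count_variants_per_segment` is hypothetical, and the three example rows are the ones listed above.

```python
from collections import Counter


def count_variants_per_segment(lines):
    """Count the variants in each (CHROM, SEGMENT) pair of a score file."""
    header = None
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # the first non-empty line holds the column names
            continue
        row = dict(zip(header, fields))
        counts[(row["CHROM"], int(row["SEGMENT"]))] += 1
    return dict(counts)


example = """\
CHROM   POS     ID      REF     ALT     SEGMENT ALLELE  SCORE
21      14410518        rs186656431     T       C       2       1       470440
21      14415190        rs199840792     G       T       2       1       470440
21      14426851        rs138088002     T       C       2       1       470440
"""

for (chrom, segment), n in count_variants_per_segment(example.splitlines()).items():
    print(f"chromosome {chrom}, segment {segment}: {n} variants")
```

For a real file, the same function can be called on an open file handle, for example `count_variants_per_segment(open("chr21.CEU.introgressed.score"))`.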

However, we want to convert the score file into [BED format](https://genome.ucsc.edu/FAQ/FAQformat.html#format1), which is easier to visualize.

Here we provide a custom Python script, `process_sprime_output.py`, which converts a score file into a BED file with the following commands.

```
python process_sprime_output.py chr21.CEU.introgressed.score chr21.CEU.introgressed.bed
```
and
```
python process_sprime_output.py chr4.CEU.TLR.introgressed.score chr4.CEU.TLR.introgressed.bed
```
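The contents of `process_sprime_output.py` are not reproduced here. As a rough sketch of what such a conversion might involve, the code below collapses the per-variant rows of a score file into one line per segment. It assumes that a segment spans from its first to its last variant position and that all variants in a segment share one score. The function name `score_to_bed` is hypothetical, and the provided script may differ, for example in how it handles the 0-based start coordinate of the BED format.

```python
def score_to_bed(score_lines):
    """Collapse score-file rows into (chrom, start, end, segment, score) tuples.

    Assumption: positions are kept as reported by SPrime, without any
    conversion to 0-based BED start coordinates.
    """
    header = None
    segments = {}  # (chrom, segment index) -> [start, end, score]
    for line in score_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # column names from the first non-empty line
            continue
        row = dict(zip(header, fields))
        key = (row["CHROM"], int(row["SEGMENT"]))
        pos = int(row["POS"])
        if key not in segments:
            segments[key] = [pos, pos, int(float(row["SCORE"]))]
        else:
            segments[key][0] = min(segments[key][0], pos)
            segments[key][1] = max(segments[key][1], pos)
    # Order the segments by chromosome name, then by start position.
    ordered = sorted(segments.items(), key=lambda item: (item[0][0], item[1][0]))
    return [
        (chrom, start, end, index, score)
        for (chrom, index), (start, end, score) in ordered
    ]


example = """\
CHROM   POS     ID      REF     ALT     SEGMENT ALLELE  SCORE
21      14410518        rs186656431     T       C       2       1       470440
21      14415190        rs199840792     G       T       2       1       470440
21      14426851        rs138088002     T       C       2       1       470440
"""

for record in score_to_bed(example.splitlines()):
    print("\t".join(str(value) for value in record))
```

The example input contains only the three rows shown earlier, so the resulting segment is shorter than the one `SPrime` would report from the full score file.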

### Check results

We can simply print out `chr4.CEU.TLR.introgressed.bed` in the terminal with the command `cat chr4.CEU.TLR.introgressed.bed`.

```
4       38760338        38905731        0       239319
```
- The first column is the name of the chromosome.
- The second column is the start position of the introgressed region.
- The third column is the end position of the introgressed region.
- The fourth column is the index of the introgressed segment from `SPrime`.
- The fifth column is the *S'* score estimated by `SPrime`. `SPrime` reports a region as introgressed only if this score is larger than a threshold (the default is 100,000).

This region corresponds to the TLR cluster, for which [Dannemann et al. (2015)](http://dx.doi.org/10.1016/j.ajhg.2015.11.015) suggested adaptive introgression.

<img src="https://els-jbs-prod-cdn.jbs.elsevierhealth.com/cms/attachment/6110f2d8-da32-4138-85bb-13e4e79c7b05/gr1.jpg" />

**Figure 3 The introgressed region encompassing the genes *TLR10*, *TLR1*, and *TLR6* on chromosome 4.** This figure is from [Dannemann et al. (2015)](http://dx.doi.org/10.1016/j.ajhg.2015.11.015).

For chromosome 21, we can use [the UCSC Genome Browser](https://genome.ucsc.edu/) to visualize the results.
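The UCSC Genome Browser names chromosomes with a `chr` prefix (for example, `chr21`), whereas the BED files above use plain numbers. The sketch below shows one possible way to prepare a custom track from such a BED file. It assumes the five-column layout described earlier and writes only the first four columns, because the BED score field is expected to lie between 0 and 1000 and *S'* scores are much larger. The function name `bed_to_ucsc_track` and the track name are hypothetical. The genetic map file names suggest that the coordinates are on GRCh37 (hg19), so that assembly should presumably be selected in the browser.

```python
def bed_to_ucsc_track(bed_lines, track_name="SPrime_segments"):
    """Return custom-track lines: a track header plus four-column BED records."""
    out = [
        f'track name="{track_name}" '
        'description="Putative introgressed segments from SPrime"'
    ]
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        chrom, start, end, segment = fields[:4]
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom  # UCSC-style chromosome name
        out.append("\t".join([chrom, start, end, f"segment_{segment}"]))
    return out


# The chromosome 4 record shown above, used here as example input.
example_bed = ["4       38760338        38905731        0       239319"]

for track_line in bed_to_ucsc_track(example_bed):
    print(track_line)
```

The printed lines can be saved to a text file and uploaded to the browser as a custom track.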