## OPT Tutorial

Our paper, "Evidence of off-target probe binding in the 10x Genomics Xenium v1 Human Breast Gene Expression Panel compromises accuracy of spatial transcriptomic profiling," discusses off-target binding in the Xenium v1 Human Breast Gene Expression Panel. Here, we walk through the OPT pipeline we used to obtain our results and show how to use OPT effectively on your own probe sequences.

First, head to the [README.md](https://github.com/JEFworks-Lab/off-target-probe-tracker) and install the OPT pipeline with the necessary dependencies.

We provide the three reference annotations used in our paper: GENCODE, RefSeq, and CHESS. They are located in the "data" folder within the "off-target-probe-tracker" directory and must be decompressed before use. From the root of that directory, you can do this as follows:

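```python
# Run from the root of the off-target-probe-tracker directory.
# Each `!` command runs in its own shell, so `!cd` would not persist;
# pass the folder path to gunzip directly instead.

# for GENCODE
!gunzip data/gencode/*.gz

# for RefSeq
!gunzip data/refseq/*.gz

# for CHESS
!gunzip data/chess/*.gz
```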

You will also need to unzip the Xenium breast panel probe FASTA file:

```python
!gunzip data/probes/*.gz
```
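If `gunzip` is not available (for example, on Windows), the same step can be done with Python's standard `gzip` and `shutil` modules. This is a minimal sketch, assuming the `data/<annotation>` folder layout above; the helper name `gunzip_dir` is our own, and unlike `gunzip` it leaves the original `.gz` files in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_dir(folder):
    """Decompress every *.gz file in `folder`, writing each output next to it.

    Returns the list of decompressed paths. Unlike the `gunzip` command,
    the original .gz files are kept.
    """
    written = []
    for gz_path in sorted(Path(folder).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # drop the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large FASTA/GFF files are not read into memory at once
        written.append(out_path)
    return written


# Decompress the annotation and probe folders used in this tutorial (skipping any that are missing).
for folder in ["data/gencode", "data/refseq", "data/chess", "data/probes"]:
    if Path(folder).is_dir():
        print(folder, "->", [p.name for p in gunzip_dir(folder)])
```

Streaming through `shutil.copyfileobj` keeps memory use flat even for full transcriptome FASTA files.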

If you have your own probe sequences you would like to check, we recommend placing them in the "data/probes" folder for consistency, so that you can reuse the commands in this tutorial with minimal changes. Note that OPT currently expects each probe's FASTA header to use the following format:

`>gene_id|gene_name|accession`
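Before running OPT on your own probes, it can help to confirm that every header follows this three-field format. The sketch below is not part of OPT; it simply splits each header on `|` and reports any header that does not yield three non-empty fields. The function name `check_probe_headers` and the example headers are made up for illustration.

```python
def check_probe_headers(fasta_text):
    """Return (line_number, header) pairs for FASTA headers that do not match
    >gene_id|gene_name|accession (exactly three non-empty, '|'-separated fields)."""
    bad = []
    for line_no, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        fields = line[1:].strip().split("|")
        if len(fields) != 3 or not all(f.strip() for f in fields):
            bad.append((line_no, line))
    return bad


# Hypothetical probes: the second header is missing its accession field.
example = """>GENE1_ID|GENE1|ACCESSION1
ACGTACGTACGT
>GENE2_ID|GENE2
TTGGCCAATTGG
"""
for line_no, header in check_probe_headers(example):
    print(f"line {line_no}: unexpected header {header!r}")
# prints: line 3: unexpected header '>GENE2_ID|GENE2'
```

To check a real file, pass its contents, for example `check_probe_headers(open(path).read())`.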

As discussed in the [README.md](https://github.com/JEFworks-Lab/off-target-probe-tracker), there are 3 main modules in OPT: `flip`, `track`, and `stat`.

The `flip` module puts all of your probes on the same strand as their intended target genes. We highly recommend running all probes through this module to make sure they are correctly oriented for downstream alignment. To run the Xenium probes through the `flip` module, use the following command:

```python
!opt -o xenium/ -p 10 flip -i data/probes/xenium_human_breast_gene_expression_panel_probe_sequences.fasta -a data/gencode/gencode.v47.basic.annotation.fmted.gff -f data/gencode/gencode.v47.basic.annotation.fmted.fa
```

Here, `-o` creates a new output directory named "xenium" and `-p` allows OPT to use 10 cores. The required inputs are the probe sequences (`-i`), the GFF of the reference annotation (`-a`), and the FASTA of the reference annotation (`-f`); we use GENCODE in this example.
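Conceptually, moving a probe onto the other strand means taking its reverse complement. The sketch below shows only that basic operation on a DNA string, with made-up names and sequence; it is not OPT's code, and how `flip` decides which probes need flipping (using the annotation passed with `-a` and `-f`) is not shown here.

```python
# Map each base to its Watson-Crick complement; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, i.e. the same
    sequence read 5'->3' on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


probe = "AACCGGTTA"  # hypothetical probe sequence
flipped = reverse_complement(probe)
print(flipped)  # TAACCGGTT
assert reverse_complement(flipped) == probe  # flipping twice restores the original
```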

This writes several files to the "xenium" directory, including `fwd_oriented.fa`, which contains every probe in the orientation OPT needs for the next steps. After running, the output should look like this:

Next, we will use the `track` module to align all the `fwd_oriented.fa` probes to the transcriptome as follows:

```python
!opt -o xenium/ -p 10 track -q xenium/fwd_oriented.fa -a data/gencode/gencode.v47.basic.annotation.fmted.gff -t data/gencode/gencode.v47.basic.annotation.fmted.fa
```

Here, we reuse the "xenium" output directory and again allow 10 cores with `-p`. The inputs are the forward-oriented probe sequences (`-q`), the GFF of the reference annotation (`-a`), and the FASTA of the reference annotation (`-t`); again, we use GENCODE. The terminal output should look something like this:

This writes a TSV file, `probe2targets.tsv`, that lists the genes and transcripts each probe aligns to. To make this file easier to interpret and to extract summary statistics, we run the `stat` module:

```python
!opt -o xenium/ stat -i xenium/probe2targets.tsv -q xenium/fwd_oriented.fa -s data/gene_synonyms.csv
```

Here, we reuse the "xenium" output directory and provide `probe2targets.tsv` (`-i`), the forward-oriented probes (`-q`), and, optionally, a gene synonym file (`-s`), so that hits to the same gene under a different name are not counted as off-targets. The terminal output should again look something like this:

From the `stat` module we get a few important files.
1. `collapsed_summary.tsv` - gives results similar to Table 1 of the paper: the target gene, the number of probes for that gene, the genes those probes aligned to, the number of alignment hits, and the number of probes aligned to each gene. This table is a good place to see which genes are affected by off-target activity.
2. `probe2targets.tsv` - although this file is produced by the `track` module, it contains important details such as which probes aligned to which genes and the CIGAR string for each alignment. It also reports the transcript type each probe aligned to (e.g., protein-coding or lncRNA).
3. `stat_missed_genes.txt` and `stat_missed_probes.txt` - list the genes and probes without alignments.
4. `stat_off_target_genes.txt` and `stat_off_target_probes.txt` - list the genes and probes with off-target activity.
5. `stat_summary.tsv` - a table similar to `collapsed_summary.tsv`, except that each line describes a single probe alignment instead of a collapsed per-gene summary.
6. More to come!
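If you only care about a subset of genes, you can cross-reference them against `stat_off_target_genes.txt`. The sketch below assumes that `stat` writes this file to the `-o` directory and that it lists one gene name per line; check your own output, since the exact layout is not described here. The helper names and gene names are hypothetical.

```python
from pathlib import Path


def read_gene_list(path):
    """Read a plain-text gene list, ASSUMING one gene name per line; blank lines are skipped."""
    return {line.strip() for line in Path(path).read_text().splitlines() if line.strip()}


def flag_genes(genes_of_interest, off_target_genes):
    """Return the genes of interest that also appear in the off-target list, sorted."""
    return sorted(set(genes_of_interest) & set(off_target_genes))


# Hypothetical usage after running `stat` with -o xenium/:
# off_target = read_gene_list("xenium/stat_off_target_genes.txt")
# print(flag_genes(["GENE_A", "GENE_B"], off_target))
```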

We hope this tutorial shows you how to use OPT and interpret its results. Note also that the `-pl` flag changes the number of errors allowed at the terminal ends of the probes during alignment; to use it, add the flag to the `track` command. Feel free to reach out or open an issue with questions, comments, or concerns!