# All-vs-All Dotplot Between Two Sequence Indices

This tutorial demonstrates how to compare sequences from **two separate FASTA files** (e.g. two genome assemblies) using `CrossIndexPaf`.  Each FASTA file is assigned to its own group, and pairwise alignments are computed between all sequences in group A and all sequences in group B.

## Overview

1. Create a `CrossIndexPaf` and load two FASTA files into groups A and B.
2. Retrieve all cross-group PAF alignments with `get_paf_all()`.
3. Build a `PafAlignment` from those records for contig reordering.
4. Plot the sorted all-vs-all dotplot with `DotPlotter`.

In [None]:
from rusty_dot import SequenceIndex
from rusty_dot.dotplot import DotPlotter
from rusty_dot.paf_io import CrossIndexPaf, PafAlignment, PafRecord

## 1. Create example sequences

For demonstration, we build two small sets of sequences in memory. With real data you would instead call `cross.load_fasta('genome_a.fasta', group='a')` and `cross.load_fasta('genome_b.fasta', group='b')`.
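If you start from FASTA files but want to keep the dictionary-based loops used in this notebook, a minimal parser is enough to turn each file into a `{name: sequence}` dict. The sketch below is a hypothetical helper, not part of rusty_dot. It assumes the record name is the first whitespace-delimited token of each header line, and it upper-cases sequences so they match the examples below.

```python
def parse_fasta(lines):
    """Parse FASTA text lines into an insertion-ordered {name: sequence} dict.

    Hypothetical helper (not part of rusty_dot). The record name is the first
    whitespace-delimited token after '>'; wrapped sequence lines are joined
    and upper-cased. Blank lines and any lines before the first header are ignored.
    """
    records = {}
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            if name is not None:
                records[name] = ''.join(chunks).upper()  # close previous record
            fields = line[1:].split()
            name = fields[0] if fields else ''
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = ''.join(chunks).upper()  # close the final record
    return records


def read_fasta(path):
    """Read a FASTA file from disk with parse_fasta."""
    with open(path) as handle:
        return parse_fasta(handle)
```

With this, `genome_a = read_fasta('genome_a.fasta')` produces a dict of the same shape as the in-memory example, so the `add_sequence` loops below work unchanged. Duplicate record names within one file silently overwrite each other here.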

In [None]:
# Sequences for "genome A" — three contigs of different lengths
genome_a = {
    'contigA1': 'ACGTACGTACGTACGTACGT' * 10,  # 200 bp
    'contigA2': 'TACGTACGTACGTACGTACG' * 5,  # 100 bp
    'contigA3': 'GCGCGCGCGCGCGCGCGCGC' * 3,  # 60 bp
}

# Sequences for "genome B" — three contigs
genome_b = {
    'contigB1': 'ACGTACGTACGTACGTACGT' * 8,  # 160 bp  (same ACGT repeat as contigA1 and contigA2)
    'contigB2': 'GCGCGCGCGCGCGCGCGCGC' * 4,  # 80 bp   (same GC repeat as contigA3)
    'contigB3': 'TTTTAAAAAGGGGCCCCTTTT' * 2,  # 42 bp   (no matches)
}
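Before building any index, it helps to confirm which panels of the dotplot should contain matches. The pure-Python sketch below (not part of rusty_dot) counts the distinct k-mers that each genome A contig shares with each genome B contig, checking both strands of the B contig, with `k=10` to match the index built in the next step.

```python
from itertools import product


def kmer_set(seq, k):
    """Distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(str.maketrans('ACGT', 'TGCA'))[::-1]


def shared_kmer_table(group_a, group_b, k=10):
    """Count the distinct k-mers each A contig shares with each B contig.

    A k-mer counts as shared if it occurs in the B contig on either strand.
    Returns {(name_a, name_b): count} for every cross-group pair.
    """
    table = {}
    for (name_a, seq_a), (name_b, seq_b) in product(group_a.items(), group_b.items()):
        b_kmers = kmer_set(seq_b, k) | kmer_set(revcomp(seq_b), k)
        table[(name_a, name_b)] = len(kmer_set(seq_a, k) & b_kmers)
    return table
```

Running `shared_kmer_table(genome_a, genome_b)` on the example data gives non-zero counts only for contigA1 and contigA2 against contigB1 (4 each: the four phases of the `ACGT` repeat) and for contigA3 against contigB2 (2: `GCGCGCGCGC` and `CGCGCGCGCG`); contigB3 shares nothing with genome A. Because it counts distinct k-mers rather than match positions, this is only a rough proxy for how dense each panel will look.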

## 2. Build a CrossIndexPaf

`CrossIndexPaf` holds one internal `SequenceIndex` that contains sequences from *both* groups, using internal prefixes (`a:` / `b:`) to prevent name collisions.

In [None]:
cross = CrossIndexPaf(k=10)

for name, seq in genome_a.items():
    cross.add_sequence(name, seq, group='a')

for name, seq in genome_b.items():
    cross.add_sequence(name, seq, group='b')

print(cross)
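The prefixing and all-vs-all pairing described for `CrossIndexPaf` can be pictured with the hypothetical sketch below. `prefixed` and `cross_group_pairs` are illustrative names, not part of rusty_dot, and the sketch does not claim to mirror how `CrossIndexPaf` builds its keys internally beyond the `a:` / `b:` convention. The point is that a name such as `chr1` present in both assemblies becomes two distinct keys, and only A-versus-B pairs are enumerated.

```python
from itertools import product


def prefixed(name, group):
    """Internal key for a sequence, e.g. ('chr1', 'a') -> 'a:chr1'."""
    return f'{group}:{name}'


def cross_group_pairs(names_a, names_b):
    """Every (A, B) pair to compare; A-vs-A and B-vs-B pairs are never produced."""
    return [(prefixed(a, 'a'), prefixed(b, 'b')) for a, b in product(names_a, names_b)]
```

For example, `cross_group_pairs(['chr1', 'chr2'], ['chr1'])` returns `[('a:chr1', 'b:chr1'), ('a:chr2', 'b:chr1')]`, and the three-by-three example in this notebook yields nine cross-group pairs.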

## 3. Retrieve all cross-group PAF alignments

In [None]:
paf_lines = cross.get_paf_all(merge=True)

print(f'Total PAF lines: {len(paf_lines)}')
for line in paf_lines[:5]:
    print(line)
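PAF (Pairwise mApping Format, from minimap2) is tab-separated, with 12 mandatory columns followed by optional `TAG:TYPE:VALUE` fields; coordinates are 0-based and end-exclusive. Assuming `get_paf_all` emits standard PAF lines, the hypothetical parser below shows what each mandatory column holds. In practice use `PafRecord.from_line` as in the next step; its attribute names may differ from the ones chosen here.

```python
from typing import NamedTuple


class PafCore(NamedTuple):
    """The 12 mandatory PAF columns (hypothetical helper, not rusty_dot's PafRecord)."""
    query_name: str
    query_len: int
    query_start: int      # 0-based
    query_end: int        # exclusive
    strand: str           # '+' or '-'
    target_name: str
    target_len: int
    target_start: int     # 0-based
    target_end: int       # exclusive
    residue_matches: int
    block_len: int
    mapq: int             # 255 means "missing"


# Column indices (0-based) that hold integers; the rest are strings.
_INT_COLUMNS = {1, 2, 3, 6, 7, 8, 9, 10, 11}


def parse_paf_core(line):
    """Parse the mandatory columns of one PAF line; optional tags (column 13 onward) are ignored."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) < 12:
        raise ValueError(f'expected at least 12 PAF columns, got {len(fields)}')
    values = [int(v) if i in _INT_COLUMNS else v for i, v in enumerate(fields[:12])]
    return PafCore(*values)
```

For instance, `[parse_paf_core(line) for line in paf_lines]` gives tuples whose `query_end - query_start` is the aligned span on the query.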

## 4. Build a PafAlignment for contig reordering

Parse the raw PAF strings into `PafRecord` objects so that `PafAlignment.reorder_contigs` can order the contigs to maximise collinearity.

In [None]:
records = [PafRecord.from_line(line) for line in paf_lines]
aln = PafAlignment.from_records(records)

q_sorted, t_sorted = aln.reorder_contigs(
    query_names=cross.query_names,
    target_names=cross.target_names,
)

print('Sorted query (genome A) contigs:', q_sorted)
print('Sorted target (genome B) contigs:', t_sorted)

> **Note:** Contigs with no cross-group matches (e.g. `contigB3`) are placed at the end, sorted by descending length.
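The tutorial describes `reorder_contigs` only as maximising collinearity, so its exact algorithm is not shown here. To illustrate the general idea, the sketch below swaps in a simple barycentre heuristic: queries are laid out longest-first on one concatenated axis, each target is placed at the match-length-weighted mean query position of its hits, and targets with no hits go last by descending length, as the note above describes. `barycentre_order` and its plain-tuple hit format are hypothetical and are not rusty_dot's implementation.

```python
def barycentre_order(query_lengths, target_lengths, hits):
    """Order contigs for a dotplot with a simple barycentre heuristic (illustrative only).

    query_lengths / target_lengths: {name: length}
    hits: iterable of (query, q_start, q_end, target, t_start, t_end) tuples.
    Returns (query_order, target_order).
    """
    # Queries: longest first, each given an offset on one concatenated axis.
    q_order = sorted(query_lengths, key=lambda n: -query_lengths[n])
    q_offset, offset = {}, 0
    for name in q_order:
        q_offset[name] = offset
        offset += query_lengths[name]

    # Accumulate a match-length-weighted mean of each target's hit midpoints.
    weight_sum, weighted_pos = {}, {}
    for q, q_start, q_end, t, _t_start, _t_end in hits:
        w = max(q_end - q_start, 1)  # avoid zero weights for empty spans
        mid = q_offset[q] + (q_start + q_end) / 2
        weight_sum[t] = weight_sum.get(t, 0) + w
        weighted_pos[t] = weighted_pos.get(t, 0) + w * mid

    matched = sorted(weight_sum, key=lambda t: weighted_pos[t] / weight_sum[t])
    # Targets without any hit are appended, longest first.
    unmatched = sorted((t for t in target_lengths if t not in weight_sum),
                       key=lambda t: -target_lengths[t])
    return q_order, matched + unmatched
```

With hits of contigA1 against contigB1 and contigA3 against contigB2, this returns `(['contigA1', 'contigA2', 'contigA3'], ['contigB1', 'contigB2', 'contigB3'])`. Real implementations typically also reorder the query side and account for strand, which this sketch ignores.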

## 5. Build a combined index for plotting

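`DotPlotter` requires a single `SequenceIndex` containing all sequences to be plotted. We create one by adding the sequences from both genomes; this is safe here because every contig name in the two genomes is distinct, so no group prefixes are needed.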

In [None]:
combined_idx = SequenceIndex(k=10)

for name, seq in genome_a.items():
    combined_idx.add_sequence(name, seq)

for name, seq in genome_b.items():
    combined_idx.add_sequence(name, seq)

print(combined_idx)
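With real assemblies, both FASTA files may use the same names (for example `chr1`), and putting them into one index unprefixed would make at least one of them ambiguous in the plot. A small guard such as the hypothetical `merge_groups` helper below, which is not part of rusty_dot, fails loudly on a clash before anything is added to the index; renaming the offending contigs is then up to you.

```python
def merge_groups(*groups):
    """Merge {name: seq} dicts into one, raising ValueError on duplicate names."""
    merged = {}
    for group in groups:
        clashes = merged.keys() & group.keys()  # names already taken by earlier groups
        if clashes:
            raise ValueError(f'duplicate sequence names: {sorted(clashes)}')
        merged.update(group)
    return merged
```

It can replace the two loops above: iterate over `merge_groups(genome_a, genome_b).items()` and call `combined_idx.add_sequence(name, seq)` for each pair.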

## 6. Plot the all-vs-all dotplot with relative scaling

Pass `scale_sequences=True` so that each subplot's width and height are proportional to the lengths of the compared sequences.

In [None]:
from IPython.display import Image

plotter = DotPlotter(combined_idx)

plotter.plot(
    query_names=q_sorted,
    target_names=t_sorted,
    output_path='/tmp/cross_index_dotplot.png',
    figsize_per_panel=4.0,
    scale_sequences=True,
    title='Genome A vs Genome B — collinearity-sorted contigs',
    dpi=100,
)

# Display the saved PNG inline in the notebook
Image('/tmp/cross_index_dotplot.png')
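To see what "proportional to sequence length" means for the layout, the hypothetical sketch below computes panel sizes in inches. It assumes one row per query and one column per target, and that the longest sequence gets exactly `figsize_per_panel` inches; both are assumptions for illustration, and `DotPlotter`'s actual layout rules may differ.

```python
def panel_sizes(query_lengths, target_lengths, figsize_per_panel=4.0):
    """Column widths and row heights (inches) proportional to sequence length.

    Hypothetical layout: one row per query, one column per target, and the
    longest sequence on either axis gets exactly figsize_per_panel inches.
    """
    longest = max([*query_lengths, *target_lengths])
    heights = [figsize_per_panel * n / longest for n in query_lengths]
    widths = [figsize_per_panel * n / longest for n in target_lengths]
    return widths, heights
```

For the example contigs (queries of 200, 100 and 60 bp; targets of 160, 80 and 42 bp) this gives row heights of 4.0, 2.0 and 1.2 inches and column widths of 3.2, 1.6 and 0.84 inches. In matplotlib the same ratios could be passed as `gridspec_kw={'width_ratios': widths, 'height_ratios': heights}` to `plt.subplots`, with `figsize=(sum(widths), sum(heights))`.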

## Using `CrossIndexPaf.reorder_contigs` directly

`CrossIndexPaf` also provides a convenience wrapper, `reorder_contigs`, which internally calls `SequenceIndex.optimal_contig_order`, so you do not need to build a `PafAlignment` first.

In [None]:
q_opt, t_opt = cross.reorder_contigs()
print('Optimal query order:', q_opt)
print('Optimal target order:', t_opt)