2 changes: 1 addition & 1 deletion .github/workflows/python.yml
@@ -113,7 +113,7 @@ jobs:

   release:
     name: Release
-    if: ${{ startsWith(github.ref, 'refs/tags/python_bindings-v') }}
+    if: ${{ startsWith(github.ref, 'refs/tags/py_lib_tsalign-v') }}
     runs-on: ubuntu-latest
     needs: [linux, windows, macos, sources]
     permissions:
132 changes: 128 additions & 4 deletions python_bindings/README.md
@@ -2,9 +2,133 @@

[![PyPI](https://img.shields.io/pypi/v/tsalign)](https://pypi.org/project/tsalign/)

-These bindings are still very minimal and are subject to improvement and/or breaking changes with future versions.
Python bindings for the template switch aligner. Aligns two DNA sequences
while detecting template switches — short-range translocations where a query
region is copied from (or aligns to) a different location, possibly on the
reverse complement strand.
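
To make the definition concrete, the following pure-Python sketch builds a query that contains a template switch onto the reverse complement strand. It does not call `tsalign`; the helper `reverse_complement`, the example sequence, and the chosen coordinates are illustrative assumptions rather than anything taken from the library.

```python
# Illustrative only: constructs a reverse-complement template switch by hand.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over ACGTN."""
    return seq.translate(COMPLEMENT)[::-1]


reference = "GATTACAGGCTTAACC"

# Copy reference[3:9] from the opposite strand and splice it in place of
# reference[9:12]; the flanking sequence is left untouched.
switched = reverse_complement(reference[3:9])
query = reference[:9] + switched + reference[12:]

print(reference)
print(query)
```

An aligner without template-switch support would have to explain the spliced region through substitutions and gaps, whereas a template-switch-aware aligner can describe it as one jump to the other strand and back.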

-## Usage
-Install with `pip install tsalign`.
## Installation

-The most important function is `tsalign.align(reference, query, **settings)`. On the object that is returned, you can e.g. call `.stats()` or `.cigar()`.
```bash
pip install tsalign
```

## Quick start

```python
import tsalign

result = tsalign.align("ACGTACGT", "ACGACGT")
print(result.cigar()) # compact alignment string
print(result.stats()) # cost, duration, node counts, …
```

## Aligner options

Create an `Aligner` once and reuse it for many sequences:

```python
aligner = tsalign.Aligner(
    min_length_strategy="preprocess_lookahead",  # default: "lookahead"
    chaining_strategy="lower_bound",             # default: "none"
    total_length_strategy="maximise",            # default: "maximise"
    no_ts=False,                                 # set True for plain gap-affine
)

result = aligner.align("ACGTACGT", "ACGACGT")
```
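
The reuse pattern can be sketched as a small batch helper. Only the call `aligner.align(reference, query)` is taken from the text above; `align_pairs` and the `EchoAligner` stand-in are hypothetical names, introduced so the sketch runs without `tsalign` installed. In real use you would pass a `tsalign.Aligner` instead of the stand-in.

```python
from typing import Any, Iterable, List, Tuple


def align_pairs(aligner: Any, pairs: Iterable[Tuple[str, str]]) -> List[Any]:
    """Align every (reference, query) pair with one shared aligner object."""
    return [aligner.align(reference, query) for reference, query in pairs]


class EchoAligner:
    """Hypothetical stand-in for tsalign.Aligner; it only echoes its inputs."""

    def align(self, reference: str, query: str) -> Tuple[str, str]:
        return (reference, query)


pairs = [("ACGTACGT", "ACGACGT"), ("GATTACA", "GATACA")]
results = align_pairs(EchoAligner(), pairs)
print(len(results))
```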

## Custom cost configuration

Costs are specified in `.tsa` format. Use `sample_tsa_config/config.tsa`
as a starting point and consult the main repository README for a description
of each parameter.

```python
aligner = tsalign.Aligner(costs_file="sample_tsa_config/config.tsa")
result = aligner.align("ACGTACGT", "ACGACGT")
```

You can also pass the cost string directly:

```python
with open("my_costs.tsa") as f:
    cost_str = f.read()
aligner = tsalign.Aligner(costs=cost_str)
```

## Restricting the alignment range

Use `AlignmentRange` to align only a window of the input sequences:

```python
from tsalign import Aligner, AlignmentRange

aligner = Aligner()
result = aligner.align(
    "NNNACGTACGTNNN",
    "ACGACGT",
    range=AlignmentRange(reference_start=3, reference_end=11),
)
print(result.cigar())
```

Individual start/limit keyword arguments are also accepted when `range` is
not provided:

```python
result = aligner.align(
    "NNNACGTACGTNNN",
    "ACGACGT",
    reference_start=3,
    reference_limit=11,
)
```
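
To see which bases such a window would cover, here is a pure-Python sketch. It assumes the coordinates are 0-based with an exclusive limit, as in Python slicing. The README does not state this convention, so verify it against the library before relying on it.

```python
# Assumption: start is 0-based and inclusive, limit is exclusive.
reference = "NNNACGTACGTNNN"
reference_start, reference_limit = 3, 11

window = reference[reference_start:reference_limit]
print(window)
```

Under that assumption the window is `ACGTACGT`, so the flanking `N` bases on both sides are excluded from the alignment.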

## Working with alignment operations

`result.alignments()` returns a typed list of `(count, op)` pairs:

```python
from tsalign import align, TemplateSwitchEntranceOp, TemplateSwitchExitOp

result = align(reference, query)
for count, op in result.alignments():
    if isinstance(op, TemplateSwitchEntranceOp):
        print(f"Template switch: {op.direction}, primary={op.primary}, offset={op.first_offset}")
    elif isinstance(op, TemplateSwitchExitOp):
        print(f"Exit, anti-primary gap: {op.anti_primary_gap}")
    else:
        # SimpleAlignmentOp: a basic edit in the primary or secondary track
        print(f"{count}x {op.kind}")
```
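
Because each entry is a `(count, op)` pair, run lengths can be totalled per operation type with ordinary Python. The sketch below uses two empty stand-in classes (`MatchOp` and `GapOp`, both hypothetical) in place of the real operation types so that it runs without `tsalign`. With the library installed, you would pass `result.alignments()` instead of the hand-written list.

```python
from collections import Counter
from typing import Any, Iterable, Tuple


def total_by_type(pairs: Iterable[Tuple[int, Any]]) -> Counter:
    """Sum the run lengths of (count, op) pairs, keyed by op class name."""
    totals: Counter = Counter()
    for count, op in pairs:
        totals[type(op).__name__] += count
    return totals


class MatchOp:  # hypothetical stand-in, not a tsalign class
    pass


class GapOp:  # hypothetical stand-in, not a tsalign class
    pass


example = [(3, MatchOp()), (1, GapOp()), (4, MatchOp())]
totals = total_by_type(example)
print(totals)
```

Keying on the class name lumps all simple edits together, since they share the `SimpleAlignmentOp` type. Keying on `op.kind` for those entries may give a more useful breakdown.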

## Visualisation

```python
result = tsalign.align(reference, query)
result.viz_template_switches() # prints ASCII art to stdout
```

## Limiting search resources

```python
result = aligner.align(
    reference,
    query,
    cost_limit=100,        # return None if cost would exceed this
    memory_limit=500_000,  # return None if memory exceeds this number of bytes
)
if result is None:
    print("No alignment found within limits")
```
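
Since a limited search returns `None` on failure, one possible pattern is to retry with progressively larger budgets. This is a sketch, not a tuning recommendation: `align_with_budgets` and the `FakeAligner` stand-in are hypothetical, and only the `cost_limit` keyword and the `None` return value come from the example above.

```python
from typing import Any, Iterable, Optional


def align_with_budgets(
    aligner: Any, reference: str, query: str, cost_limits: Iterable[int]
) -> Optional[Any]:
    """Try each cost limit in order and return the first non-None result."""
    for limit in cost_limits:
        result = aligner.align(reference, query, cost_limit=limit)
        if result is not None:
            return result
    return None


class FakeAligner:
    """Hypothetical stand-in: succeeds only when cost_limit is at least 50."""

    def align(self, reference: str, query: str, cost_limit: int) -> Optional[str]:
        return f"aligned within {cost_limit}" if cost_limit >= 50 else None


found = align_with_budgets(FakeAligner(), "ACGTACGT", "ACGACGT", [10, 50, 100])
print(found)
```

Starting with a small budget keeps easy pairs cheap, at the price of repeating work for pairs that need a larger limit.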

## Accepted sequence types

Any object whose `str()` representation is a valid DNA string (characters `A`, `C`, `G`, `T`, `N`) is
accepted, including Biopython's `Bio.Seq.Seq`:

```python
from Bio.Seq import Seq
result = tsalign.align(Seq("ACGTACGT"), Seq("ACGACGT"))
```
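
A pre-check along these lines can catch invalid input before it reaches the aligner. It is a sketch that assumes the accepted alphabet is exactly the uppercase characters `ACGTN`; whether lowercase letters or other IUPAC codes are accepted is not stated here. `is_dna_like` is a hypothetical helper, not part of `tsalign`.

```python
VALID_BASES = frozenset("ACGTN")  # assumption: uppercase only


def is_dna_like(obj: object) -> bool:
    """Return True if str(obj) contains only the characters A, C, G, T, N."""
    return set(str(obj)) <= VALID_BASES


print(is_dna_like("ACGTACGT"))
print(is_dna_like("ACGU"))
```

Because the check goes through `str()`, it applies equally to plain strings and to wrapper objects such as Biopython sequences.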
2 changes: 2 additions & 0 deletions python_bindings/pyproject.toml
@@ -13,3 +13,5 @@ classifiers = [
 dynamic = ["version"]
 [tool.maturin]
 features = ["pyo3/extension-module"]
+module-name = "tsalign._tsalign"
+python-source = "python"