# New cut implementation

The key change is that a cut is now represented as `((cut_watson, ovhg), enz)`:

- `cut_watson` is a non-negative integer in `[0,len(seq))`, where `seq` is the sequence that will be cut. It is the position of the cut on the watson strand, using the full sequence as a reference. By "full sequence" I mean the one you would get from `str(Dseq)`. See the example below.
- `ovhg` is the overhang left after the cut. It has the same meaning as `ovhg` in the `Bio.Restriction` enzyme objects, or as the `ovhg` property of pydna's `Dseq`.
- `enz` is the enzyme object. It's not necessary to perform the cut, but can be used to keep track of which enzyme was used.
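
To make the relation between `cut_watson` and `ovhg` concrete, here is a minimal sketch in plain Python (no pydna needed). It assumes that the cut on the crick strand sits at `cut_watson - ovhg` in full-sequence coordinates, which is consistent with the EcoRI and PacI examples in this notebook. The function name `crick_cut_position` is hypothetical and is not part of pydna.

```python
def crick_cut_position(cut_watson, ovhg, seq_len, circular=False):
    """Position of the crick-strand cut, in full-sequence coordinates.

    Assumption: the crick cut is offset from the watson cut by -ovhg.
    """
    cut_crick = cut_watson - ovhg
    if circular:
        # Wrap around the origin for circular molecules
        cut_crick %= seq_len
    return cut_crick


# EcoRI (G^AATTC, ovhg = -4) on the linear sequence 'aaGAATTCaa':
# watson is cut after 'aaG' (position 3), crick after 'aaGAATT' (position 7)
print(crick_cut_position(3, -4, 10))

# PacI (TTAAT^TAA, ovhg = 2) on the circular sequence 'TTAAaaTTAA':
# the crick cut wraps around the origin
print(crick_cut_position(1, 2, 10, circular=True))
```

A negative `ovhg` puts the crick cut to the right of the watson cut (5' overhang), a positive one puts it to the left (3' overhang).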

`Dseq.cut` is now implemented like this:

```python
cutsites = self.get_cutsites(*enzymes)
cutsite_pairs = self.get_cutsite_pairs(cutsites)
return tuple(self.apply_cut(*cs) for cs in cutsite_pairs)
```

Let's go through it step by step.

In [1]:
from pydna.dseq import Dseq
from Bio.Restriction import EcoRI, PacI

dseq = Dseq('aaGAATTCaaGAATTCaa')

# get_cutsites normalizes the format of the enzymes argument and returns the cut positions
# found by enz.search, along with the enzyme and its overhang. Positions are made zero-based
# instead of one-based.

print('get_cutsites output:', dseq.get_cutsites([EcoRI]))
print('EcoRI.search output:', EcoRI.search(dseq), '< (positions are 1-based)')
print('EcoRI.ovhg:', EcoRI.ovhg)
print()

# Below are two examples of circular sequences with a cutsite that spans the origin.
dseq = Dseq('TTCaaGAA', circular=True)
print('get_cutsites output:', dseq.get_cutsites([EcoRI]))
print('EcoRI.search output:', EcoRI.search(dseq, linear=False), '< (positions are 1-based)')
print('EcoRI.ovhg:', EcoRI.ovhg)
print()

dseq = Dseq('TTAAaaTTAA', circular=True)
print('get_cutsites output:', dseq.get_cutsites([PacI]))
print('PacI.search output:', PacI.search(dseq, linear=False), '< (positions are 1-based)')
print('PacI.ovhg:', PacI.ovhg)
print()


get_cutsites output: [((3, -4), EcoRI), ((11, -4), EcoRI)]
EcoRI.search output: [4, 12] < (positions are 1-based)
EcoRI.ovhg: -4

get_cutsites output: [((6, -4), EcoRI)]
EcoRI.search output: [7] < (positions are 1-based)
EcoRI.ovhg: -4

get_cutsites output: [((1, 2), PacI)]
PacI.search output: [2] < (positions are 1-based)
PacI.ovhg: 2



Note in the printed output above that, for a cutsite that spans the origin, the position lies on the left side of the origin if `ovhg` is negative, and on the right side if it is positive.
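
The positions of origin-spanning sites can be reproduced by hand. The sketch below is plain Python and is not how `Bio.Restriction` searches internally: it simply appends the start of the sequence to its end so that a site crossing the origin becomes visible, then wraps the cut position with a modulo. The function name and the `cut_offset` parameter (number of bases of the recognition site that lie to the left of the watson cut) are hypothetical.

```python
def circular_cut_positions(seq, site, cut_offset):
    """Zero-based watson cut positions of `site` in a circular `seq`."""
    n = len(seq)
    # Extend the sequence so that sites spanning the origin can be matched
    extended = (seq + seq[: len(site) - 1]).upper()
    positions = []
    start = extended.find(site)
    while start != -1 and start < n:
        # Wrap the cut position back into [0, n)
        positions.append((start + cut_offset) % n)
        start = extended.find(site, start + 1)
    return positions


# EcoRI: G^AATTC, one base to the left of the watson cut
print(circular_cut_positions('TTCaaGAA', 'GAATTC', 1))

# PacI: TTAAT^TAA, five bases to the left of the watson cut
print(circular_cut_positions('TTAAaaTTAA', 'TTAATTAA', 5))
```

For EcoRI the watson cut falls before the origin (position 6 of 8), while for PacI it falls after it (position 1 of 10), matching the `get_cutsites` output above.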

Below, you can see that the `cut_watson` is defined with respect to the "full sequence"

In [2]:
# `cut_watson` is defined with respect to the "full sequence"
for ovhg in [-1, 0, 1]:
    dseq = Dseq.from_full_sequence_and_overhangs('aaGAATTCaa', ovhg, 0)
    print(repr(dseq))
    print('ovhg:', ovhg, '>>', dseq.get_cutsites([EcoRI]))
    print()


Dseq(-10)
aaGAATTCaa
 tCTTAAGtt
ovhg: -1 >> [((3, -4), EcoRI)]

Dseq(-10)
aaGAATTCaa
ttCTTAAGtt
ovhg: 0 >> [((3, -4), EcoRI)]

Dseq(-10)
 aGAATTCaa
ttCTTAAGtt
ovhg: 1 >> [((3, -4), EcoRI)]



Cuts are only returned if both the recognition site and the overhang lie on the double-stranded part of the sequence.

In [8]:

seq = Dseq('GAATTC')
print(seq.get_cutsites([EcoRI]))

seq = Dseq.from_full_sequence_and_overhangs('GAATTC', -1, 0)
print(seq.get_cutsites([EcoRI]))

[((1, -4), EcoRI)]
[]
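
The rule can be pictured with a simplified model in plain Python. This is only a sketch for linear sequences, not pydna's actual check: `left_ss` and `right_ss` are hypothetical parameters counting the single-stranded bases at each end of the full sequence, regardless of which strand protrudes.

```python
def site_is_double_stranded(site_start, site_len, seq_len, left_ss=0, right_ss=0):
    """True if the whole recognition site lies in the double-stranded region."""
    ds_start = left_ss           # first double-stranded position
    ds_end = seq_len - right_ss  # one past the last double-stranded position
    return ds_start <= site_start and site_start + site_len <= ds_end


# 'GAATTC' fully double-stranded: the site is usable
print(site_is_double_stranded(0, 6, 6))

# 'GAATTC' with one single-stranded base on the left: the site is rejected
print(site_is_double_stranded(0, 6, 6, left_ss=1))
```

For EcoRI the overhang lies inside the recognition site, so checking the site alone is enough in this example. An enzyme that cuts outside its recognition site would need the overhang span checked as well.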




## Pairing cutsites

A fragment produced by restriction is represented by a tuple of length 2 that may contain cutsites or `None`:

- Two cutsites: represents the extraction of a fragment between those two cutsites, in that orientation. To represent the opening of a circular molecule with a single cutsite, we put the same cutsite twice. See below.
- `None`, cutsite: represents the extraction of a fragment between the left edge of a linear sequence and the cutsite.
- cutsite, `None`: represents the extraction of a fragment between the cutsite and the right edge of a linear sequence.
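
The pairing rules above can be sketched in a few lines of plain Python. This is an illustration that reproduces the pairs shown in this notebook, not necessarily the exact body of `get_cutsite_pairs`. The function name `pair_cutsites` is hypothetical, and enzymes are stood in for by strings so that the example runs without pydna.

```python
def pair_cutsites(cutsites, circular=False):
    """Pair consecutive cutsites into fragment boundaries."""
    if not cutsites:
        return []
    if circular:
        # Close the circle: the last fragment ends at the first cutsite
        padded = [*cutsites, cutsites[0]]
    else:
        # None stands for the left and right edges of a linear molecule
        padded = [None, *cutsites, None]
    return list(zip(padded, padded[1:]))


# Linear molecule with two cutsites: three fragments
ecori_sites = [((3, -4), 'EcoRI'), ((11, -4), 'EcoRI')]
for pair in pair_cutsites(ecori_sites):
    print(pair)

# Circular molecule with one cutsite: the same cutsite appears twice
print(pair_cutsites([((6, -4), 'EcoRI')], circular=True))
```

With `n` cutsites, a linear molecule gives `n + 1` pairs and a circular one gives `n`.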

## Generating the sequence

To get the fragment, we use the function `dseq.apply_cut`, passing the two elements of the tuple as arguments.

In [6]:
dseq = Dseq('aaGAATTCaaGAATTCaa')
cutsites = dseq.get_cutsites([EcoRI])

cutsite_pairs = dseq.get_cutsite_pairs(cutsites)
pair_types = ['None, cutsite', 'cutsite, cutsite', 'cutsite, None']

for pair, pair_type in zip(cutsite_pairs, pair_types):
    print('>', pair_type, ':', pair)
    print(repr(dseq.apply_cut(*pair)))
    print()

# Opening a circular sequence
print('Circular molecule')
dseq = Dseq('TTCaaGAA', circular=True)
cutsites = dseq.get_cutsites([EcoRI])
cutsite_pairs = dseq.get_cutsite_pairs(cutsites)
print('> cutsite, cutsite :', cutsite_pairs[0])
print(repr(dseq.apply_cut(*cutsite_pairs[0])))

> None, cutsite : (None, ((3, -4), EcoRI))
Dseq(-7)
aaG
ttCTTAA

> cutsite, cutsite : (((3, -4), EcoRI), ((11, -4), EcoRI))
Dseq(-12)
AATTCaaG
    GttCTTAA

> cutsite, None : (((11, -4), EcoRI), None)
Dseq(-7)
AATTCaa
    Gtt

Circular molecule
> cutsite, cutsite : (((6, -4), EcoRI), ((6, -4), EcoRI))
Dseq(-12)
AATTCaaG
    GttCTTAA
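
The linear fragments printed above can be reproduced with string slicing. The sketch below is plain Python and only handles a blunt, linear parent sequence. It is not pydna's `apply_cut`, and it assumes that the crick cut sits at `cut_watson - ovhg`. The names `strand_cuts` and `fragment_strands` are hypothetical, and enzymes are stood in for by strings.

```python
COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def strand_cuts(cutsite, edge):
    """Return (watson_cut, crick_cut); None means the edge of the molecule."""
    if cutsite is None:
        return edge, edge
    (cut_watson, ovhg), _enzyme = cutsite
    return cut_watson, cut_watson - ovhg


def fragment_strands(seq, left, right):
    """Return (watson, crick as displayed left to right, ovhg) of a fragment."""
    w_left, c_left = strand_cuts(left, 0)
    w_right, c_right = strand_cuts(right, len(seq))
    watson = seq[w_left:w_right]
    # Bottom strand as drawn in the Dseq repr (complement, not reversed)
    crick = seq[c_left:c_right].translate(COMPLEMENT)
    # Left overhang of the fragment: how far watson is shifted relative to crick
    return watson, crick, w_left - c_left


seq = 'aaGAATTCaaGAATTCaa'
first, second = ((3, -4), 'EcoRI'), ((11, -4), 'EcoRI')
for left, right in [(None, first), (first, second), (second, None)]:
    print(fragment_strands(seq, left, right))
```

The three tuples correspond to the three linear fragments shown above: the strands match the two rows of each `Dseq` representation, and a left overhang of `-4` corresponds to the crick row being indented by four positions.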


In [7]:
# Note that the fragments at the edges keep the overhangs of the parent sequence:
dseq = Dseq.from_full_sequence_and_overhangs('aaGAATTCaaGAATTCaa', 1, 1)
f1, f2, f3 = dseq.cut([EcoRI])
print(repr(f1))
print()
print(repr(f3))


Dseq(-7)
 aG
ttCTTAA

Dseq(-7)
AATTCaa
    Gt
