# Partial Digest Algorithm


The Partial Digest Problem (PDP) is a classic problem in computational biology: given the multiset of pairwise distances between points on a line, reconstruct the positions of those points. In DNA restriction mapping, the points represent restriction sites in a DNA sequence, and the distances are the lengths of the fragments produced by a partial digest of that DNA with a restriction enzyme. The Partial Digest Algorithm is a recursive backtracking approach to this problem. It tries a placement for each point and backtracks when a placement is inconsistent with the remaining distances. This works well on typical inputs, although the worst-case running time is exponential in the number of points.

Formally, the input is a multiset D containing the n(n - 1)/2 distances between every pair of n points. We need to find positions for the points whose pairwise distances generate exactly this multiset, with every repeated distance appearing the right number of times.
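
To make the input concrete, here is a minimal sketch of the forward direction, going from known positions to the distance multiset. The helper name `pairwise_distances` is an assumption introduced for illustration and is not used by the algorithm itself.

```python
from itertools import combinations

def pairwise_distances(points):
    """Return the sorted multiset of distances between every pair of points."""
    return sorted(abs(a - b) for a, b in combinations(points, 2))

# Five points give 5 * 4 / 2 = 10 distances, with repeats kept.
print(pairwise_distances([0, 3, 6, 8, 10]))
# [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
```

The Partial Digest Problem asks for the reverse: recovering the points from a list like the one printed above.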

Here's how we can implement it step by step:

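1. Identify the largest distance L in D. It must be the distance between the two outermost points, so remove it from D and place points at positions 0 and L.
2. Recursively place the remaining points. The largest remaining distance must be measured from one of the two endpoints, so the new point lies either at that distance from 0 or at that distance from L. Try each candidate and remove its distances to all already-placed points from D.
3. Backtrack if a candidate requires a distance that is not in D.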


In [1]:
from collections import Counter

## Removing Distances

We need a helper function that removes a list of distances from the multiset `D` while respecting multiplicity: a distance that occurs twice in `D` can be removed at most twice. The function returns a new list with the distances removed, or `None` if any required distance is missing.

In [2]:
def remove_distances(D, distances):
    D_count = Counter(D)  # Multiset behavior using Counter
    for d in distances:
        if D_count[d] > 0:
            D_count[d] -= 1
        else:
            return None  # If we can't remove the required distance, return None
    return list(D_count.elements())  # Return the remaining multiset as a list
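
The same multiset removal can also be expressed with `Counter` subtraction. The sketch below is an alternative formulation, not the notebook's helper; the name `multiset_subtract` is hypothetical, and it returns a sorted list for readability.

```python
from collections import Counter

def multiset_subtract(D, distances):
    """Remove `distances` from `D` as multisets; return None if any is missing."""
    have, need = Counter(D), Counter(distances)
    # Every needed distance must occur in D at least as often as requested.
    if any(have[d] < count for d, count in need.items()):
        return None
    # Counter subtraction keeps only the strictly positive counts.
    return sorted((have - need).elements())

print(multiset_subtract([2, 2, 3, 5], [2, 3]))  # [2, 5]
print(multiset_subtract([2, 2, 3, 5], [3, 3]))  # None
print(multiset_subtract([2, 2], [2, 2]))        # []
```

The last call shows why callers should test the result with `is not None`: an empty list means the removal succeeded and used up every distance, whereas `None` means it failed.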


## Recursive Placement Function

The `place_point` function takes the largest remaining distance, `max_dist`, and tries to place a new point either at position `max_dist` (at distance `max_dist` from 0) or at position `L - max_dist` (at distance `max_dist` from `L`). A placement is accepted only if the distances from the new point to every already-placed point can be removed from `D`. The function returns a list of positions that satisfies the distance multiset `D`, or `None` if no solution exists.

In [3]:
def place_point(D, positions, L):
    if not D:
        return positions  # If D is empty, we've placed all points correctly

    max_dist = max(D)  # The largest remaining distance in D

    # Try placing a new point at max_dist from 0 (right side placement)
    possible_point = max_dist
    new_distances = [abs(possible_point - p) for p in positions]
    new_D = remove_distances(D, new_distances)

    if new_D is not None:
        positions.append(possible_point)
        result = place_point(new_D, positions, L)
        if result:
            return result
        positions.pop()  # Backtrack if placing at max_dist didn't work

    # Try placing the new point at L - max_dist (left side placement)
    possible_point = L - max_dist
    new_distances = [abs(possible_point - p) for p in positions]
    new_D = remove_distances(D, new_distances)

    if new_D is not None:
        positions.append(possible_point)
        result = place_point(new_D, positions, L)
        if result:
            return result
        positions.pop()  # Backtrack if placing at L - max_dist didn't work

    return None  # If neither placement worked, return None


## Main Function

The main function sets up the initial conditions (placing points at 0 and `L`) and calls the recursive placement function.

In [4]:
def partial_digest(D):
    D = list(D)  # Work on a copy so the caller's list is not modified
    L = max(D)  # The largest distance corresponds to the distance between the outermost points
    D.remove(L)  # Remove one occurrence of L from the multiset
    positions = [0, L]  # Place points at 0 and L

    # Start the recursive placement
    result = place_point(D, positions, L)

    return sorted(result) if result else None
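
Because n points produce n(n - 1)/2 pairwise distances, the length of the input already determines how many points a solution must contain. The following is a small sketch of that sanity check; the function name `num_points` is an assumption introduced here and is not part of the code above.

```python
from math import isqrt

def num_points(num_distances):
    """Return n with n * (n - 1) / 2 == num_distances, or None if no such n exists."""
    # Solving n^2 - n - 2m = 0 gives n = (1 + sqrt(1 + 8m)) / 2.
    discriminant = 1 + 8 * num_distances
    root = isqrt(discriminant)
    if root * root != discriminant:
        return None  # This many distances cannot come from any point set
    return (1 + root) // 2

print(num_points(10))  # 5
print(num_points(9))   # None
```

A check like this could be run before the search starts, so that an input of the wrong size is rejected without any backtracking.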


## Testing the Algorithm

In [5]:
D = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
positions = partial_digest(D)
print(f"Positions: {positions}")

Positions: [0, 3, 6, 8, 10]
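
A solution is never unique in the strict sense: mirroring the points around the midpoint of the line leaves every pairwise distance unchanged. The sketch below checks this for the result printed above. The helper names `mirror` and `distance_multiset` are hypothetical and introduced only for this check.

```python
from collections import Counter
from itertools import combinations

def distance_multiset(points):
    """Count how often each pairwise distance occurs."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))

def mirror(positions):
    """Reflect the positions so that the rightmost point maps to 0."""
    length = max(positions)
    return sorted(length - p for p in positions)

D = [2, 2, 3, 3, 4, 5, 6, 7, 8, 10]
found = [0, 3, 6, 8, 10]
mirrored = mirror(found)

print(mirrored)                                # [0, 2, 4, 7, 10]
print(distance_multiset(found) == Counter(D))     # True
print(distance_multiset(mirrored) == Counter(D))  # True
```

Which of the two orientations the search returns depends only on the order in which the candidate placements are tried.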
