## Using `Range` to Annotate a Sequence Region

In this example, we define contiguous regions on a sequence using the `Range` location type in SBOL. A `Range` specifies a span of positions between a start and an end point; both positions are one-based and inclusive. This is useful for annotating features such as coding sequences or regulatory elements.

We will demonstrate two `Range` annotations on a DNA sequence:
1. **Inline orientation**: A region from positions 5 to 20 on the forward strand.
2. **Reverse complement orientation**: A region from positions 25 to 41, read on the reverse strand. The positions still refer to coordinates of the stored (forward) sequence.

Together, the two annotations show how SBOL records features on either strand of a single sequence.

In [12]:
import sbol2
from Bio.Seq import Seq  # Biopython for reverse complement

# Create an SBOL document
doc = sbol2.Document()

# Set the default namespace (homespace) used for the URIs of new objects
sbol2.setHomespace('https://github.com/SynBioDex/SBOL-Notebooks')


# Create a Sequence object that holds the DNA string
sequence_elements = 'ATGCGTACGTAGCTAGTCTGATCGTAGCTAGTCGATGCAGGGC'
seq = sbol2.Sequence('example_sequence')
seq.elements = sequence_elements
seq.encoding = sbol2.SBOL_ENCODING_IUPAC

# Add the sequence to the document
doc.addSequence(seq)

# Create a ComponentDefinition for the sequence
comp_def = sbol2.ComponentDefinition('example_component', sbol2.BIOPAX_DNA)
comp_def.sequences = [seq.persistentIdentity]

# Add the ComponentDefinition to the document
doc.addComponentDefinition(comp_def)

# --- Range 1: Forward Strand (SBOL_ORIENTATION_INLINE) ---
# Create a SequenceAnnotation for a region on the forward strand
forward_annotation = sbol2.SequenceAnnotation('forward_region')

# Define a Range from position 5 to 20 on the forward strand
# Note - by default, the orientation is set to Inline (i.e. Forward)
forward_range = sbol2.Range('forward_range', 5, 20)

# Add the Range to the SequenceAnnotation
forward_annotation.locations.add(forward_range)

# Add the SequenceAnnotation to the ComponentDefinition
comp_def.sequenceAnnotations.add(forward_annotation)

# --- Range 2: Reverse Complement (SBOL_ORIENTATION_REVERSE_COMPLEMENT) ---
# Create a SequenceAnnotation for a region on the reverse complement strand
reverse_annotation = sbol2.SequenceAnnotation('reverse_complement_region')

# Define a Range from position 25 to 41 on the reverse complement strand
reverse_range = sbol2.Range('reverse_complement_range', 25, 41)
reverse_range.orientation = sbol2.SBOL_ORIENTATION_REVERSE_COMPLEMENT

# Add the Range to the SequenceAnnotation
reverse_annotation.locations.add(reverse_range)

# Add the SequenceAnnotation to the ComponentDefinition
comp_def.sequenceAnnotations.add(reverse_annotation)

In [13]:
# Validate the document to ensure compliance with SBOL standards
doc.validate()

'Valid.'

In [14]:
# Save the document to an SBOL file
doc.write('range_example.xml')

'Valid.'

## Extracting and Displaying Annotated Regions

After defining the `Range` annotations, we can now extract and display the regions from the sequence. 

1. **Original Sequence**: First, we print the original sequence so that we can reference it while extracting the annotated regions.
2. **Forward Strand Region**: For the inline (forward) annotation, we slice the sequence from position 5 to position 20. SBOL positions are one-based and inclusive, while Python slices are zero-based and exclude the end index, so only the start position is shifted by subtracting 1.
3. **Reverse Complement Sequence**: We then compute the reverse complement of the entire sequence to properly interpret the reverse strand region.
4. **Reverse Complement Region**: For the reverse complement annotation (positions 25 to 41), we extract the region from the reverse complement sequence. A forward position `p` in a sequence of length `n` sits at zero-based index `n - p` in the reverse complement, so the slice runs from `n - end` up to, but not including, `n - start + 1`.

These steps allow us to directly extract the regions of interest for both forward and reverse orientations from the sequence.
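
The index arithmetic behind steps 2 and 4 can be checked without SBOL or Biopython. The sketch below is not part of the notebook's own code: `reverse_complement`, `forward_slice`, and `mirrored_slice` are hypothetical helper names, and the complement table assumes an uppercase sequence containing only A, C, G, and T.

```python
# Translation table for complementing unambiguous DNA bases (assumption: A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def forward_slice(start, end):
    """Convert a one-based inclusive (start, end) pair to a Python slice."""
    return slice(start - 1, end)


def mirrored_slice(start, end, length):
    """Slice selecting the same region on the reverse complement of the full sequence."""
    return slice(length - end, length - start + 1)


dna = "ATGCGTACGTAGCTAGTCTGATCGTAGCTAGTCGATGCAGGGC"
start, end = 25, 41

# Route 1: reverse complement the whole sequence, then slice with mirrored indices.
via_mirror = reverse_complement(dna)[mirrored_slice(start, end, len(dna))]
# Route 2: slice the forward strand, then reverse complement only that piece.
via_subseq = reverse_complement(dna[forward_slice(start, end)])

print(via_mirror)                # CCTGCATCGACTAGCTA
print(via_mirror == via_subseq)  # True
```

Both routes select the same bases, so slicing first and reverse complementing only the sub-sequence avoids the mirrored index calculation altogether.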

In [15]:
# --- Extract and display the annotated regions ---
# Get the original sequence
print(f"Original sequence: {sequence_elements}")

# Adjust the forward strand indices for zero-based indexing
forward_region = sequence_elements[forward_range.start-1:forward_range.end]
print(f"Forward strand region (positions {forward_range.start} to {forward_range.end}): {forward_region}")

# Compute the reverse complement of the sequence
reverse_complement_sequence = str(Seq(sequence_elements).reverse_complement())
print(f"Reverse complement sequence: {reverse_complement_sequence}")

# Map the forward-strand positions onto the reverse complement: position p in a
# sequence of length n sits at zero-based index n - p
seq_len = len(sequence_elements)
reverse_region = reverse_complement_sequence[seq_len - reverse_range.end:seq_len - reverse_range.start + 1]
print(f"Reverse complement region (positions {reverse_range.start} to {reverse_range.end}): {reverse_region}")

Original sequence: ATGCGTACGTAGCTAGTCTGATCGTAGCTAGTCGATGCAGGGC
Forward strand region (positions 5 to 20): GTACGTAGCTAGTCTG
Reverse complement sequence: GCCCTGCATCGACTAGCTACGATCAGACTAGCTACGTACGCAT
Reverse complement region (positions 25 to 41): CCTGCATCGACTAGCTA


## Using a Helper Function

In this notebook, we've shown how to extract regions of a sequence based on the start and end positions, accounting for the orientation (either forward or reverse complement). 

To simplify this process, especially for larger and more complex designs, we can create a helper function that automates this. This helper function takes an SBOL `Sequence` object and a `Range` object as inputs and returns the correct sub-sequence. It automatically handles:
- **Zero-based indexing**: Converts the one-based biological indexing to Python's zero-based indexing for accurate slicing.
- **Orientation**: Detects whether the sub-sequence is in the forward or reverse complement orientation, ensuring that the appropriate sub-sequence is returned.

In [16]:
def extract_sequence_from_range(sequence_obj, range_obj):
    """
    Extracts a sub-sequence based on the SBOL Sequence object and Range object.

    :param sequence_obj: The SBOL Sequence object (contains the DNA sequence as 'elements').
    :param range_obj: The SBOL Range object (contains start, end, and orientation).
    :return: The extracted sub-sequence (reverse complement if specified).
    """
    # Get the sequence string from the Sequence object
    sequence = sequence_obj.elements
    
    # Extract start, end, and orientation from the Range object
    start = range_obj.start
    end = range_obj.end
    orientation = range_obj.orientation
    
    # Adjust for zero-based indexing for slicing
    sub_sequence = sequence[start - 1:end]

    # If orientation is reverse complement, reverse complement the sub-sequence only
    if orientation == sbol2.SBOL_ORIENTATION_REVERSE_COMPLEMENT:
        return str(Seq(sub_sequence).reverse_complement())
    
    # If orientation is inline, return the sub-sequence directly
    return sub_sequence

In [17]:
forward_sequence = extract_sequence_from_range(seq, forward_range)
print(f"Forward strand region (positions {forward_range.start} to {forward_range.end}): {forward_sequence}")
reverse_sequence = extract_sequence_from_range(seq, reverse_range)
print(f"Reverse complement region (positions {reverse_range.start} to {reverse_range.end}): {reverse_sequence}")

Forward strand region (positions 5 to 20): GTACGTAGCTAGTCTG
Reverse complement region (positions 25 to 41): CCTGCATCGACTAGCTA
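
One caveat with slice-based extraction is that Python truncates out-of-range slices silently, so a `Range` that extends past the end of the sequence would return a shortened result instead of raising an error. A minimal sketch of a guard that could run before extraction is shown below. `check_range_bounds` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins that only mimic the `start` and `end` attributes of a `Range`, so the example runs without `sbol2`. It also assumes `start <= end`, as in the examples above.

```python
from types import SimpleNamespace


def check_range_bounds(sequence_length, range_obj):
    """Raise ValueError unless 1 <= start <= end <= sequence_length."""
    start, end = range_obj.start, range_obj.end
    if not (1 <= start <= end <= sequence_length):
        raise ValueError(
            f"Range {start}..{end} does not fit a sequence of length {sequence_length}"
        )


# Hypothetical stand-ins for Range objects (only start and end are modelled).
ok_range = SimpleNamespace(start=25, end=41)
bad_range = SimpleNamespace(start=25, end=50)

check_range_bounds(43, ok_range)  # within bounds, returns without error

try:
    check_range_bounds(43, bad_range)
except ValueError as err:
    print(err)  # Range 25..50 does not fit a sequence of length 43
```

With real objects, the same check could be called as `check_range_bounds(len(seq.elements), reverse_range)` before `extract_sequence_from_range`.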
