## Assembly Plan with Plasmid Selection from Abstract Design

Assumptions:
- The user selects MoClo assembly with the BsaI restriction enzyme.
- The user creates the abstract design in SBOLCanvas without including a circular backbone glyph/component.
- The user selects the desired backbone ahead of time, in this case: https://synbiohub.org/user/ryangreer/gonzaloplasmids/module1/1
- All plasmids have defined fusion sites, marked with the role http://identifiers.org/so/SO:0001953


In [287]:
import sbol2build as s2b
import sbol2
from typing import List

#### Pull Collection of Plasmids from SynBioHub (SBH)

In [288]:
plasmid_collection = sbol2.Document()

sbh = sbol2.PartShop('https://synbiohub.org')
sbh.pull('https://synbiohub.org/user/ryangreer/gonzaloplasmids/gonzaloplasmids_collection/1/c4869067a3286b13552d65bfe9387a81e92f2972/share', plasmid_collection)

In [289]:
for md in plasmid_collection.moduleDefinitions:
    print(f"Plasmid: {md.displayId}")
    for fc in md.functionalComponents:
        print(f"Top Component: {fc.displayId}")
        definition = plasmid_collection.getComponentDefinition(fc.definition)

        print("Contains:")
        for component in definition.components:
            print(f"\t{component.displayId}")


    print("\n")

Plasmid: module3
Top Component: GFP_Plas_3
Contains:
	Cir_qxow_5
	Scar_C_2
	GFP_3
	Scar_D_4


Plasmid: module1
Top Component: UJHDBOTD_3
Contains:
	Scar_F_4
	RFP_cassette_3
	Scar_A_2
	Cir_qxow_5


Plasmid: module5
Top Component: pro_25
Contains:
	J23101_3
	Scar_A_2
	Cir_qxow_5
	Scar_B_4


Plasmid: module2
Top Component: rbs_3
Contains:
	Scar_C_4
	Scar_B_2
	Cir_qxow_5
	B0034_3


Plasmid: module4
Top Component: term_25
Contains:
	Cir_qxow_5
	Scar_F_4
	Scar_D_2
	B0015_3




### Import Abstract Design from Local

In [290]:
abstract_design = sbol2.Document()
abstract_design.read('tests/test_files/abstract_design.xml')

In [291]:
for md in abstract_design.moduleDefinitions:
    print(f"Plasmid: {md.displayId}")
    for fc in md.functionalComponents:
        print(f"Top Component: {fc.displayId}")
        definition = abstract_design.getComponentDefinition(fc.definition)

        print("Contains:")
        for component in definition.components:
            print(f"\t{component.displayId}")

        print("SequenceConstraints:")
        for constraint in definition.sequenceConstraints:
            print(f"\t{constraint.restriction}, subject: {constraint.subject}, object: {constraint.object}")

        print("Annotations:")
        for annotation in definition.sequenceAnnotations:
            range = annotation.locations.getRange()
            print(f"\t{annotation.component}, ({range.start}, {range.end})")


    print("\n")

Plasmid: basic_TU
Top Component: cYYPhACA_28
Contains:
	B0015_4
	GFP_3
	B0034_2
	J23101_1
SequenceConstraints:
	http://sbols.org/v2#precedes, subject: https://sbolcanvas.org/cYYPhACA/GFP_3/1, object: https://sbolcanvas.org/cYYPhACA/B0015_4/1
	http://sbols.org/v2#precedes, subject: https://sbolcanvas.org/cYYPhACA/B0034_2/1, object: https://sbolcanvas.org/cYYPhACA/GFP_3/1
	http://sbols.org/v2#precedes, subject: https://sbolcanvas.org/cYYPhACA/J23101_1/1, object: https://sbolcanvas.org/cYYPhACA/B0034_2/1
Annotations:
	https://sbolcanvas.org/cYYPhACA/B0034_2/1, (36, 56)
	https://sbolcanvas.org/cYYPhACA/B0015_4/1, (774, 902)
	https://sbolcanvas.org/cYYPhACA/GFP_3/1, (57, 773)
	https://sbolcanvas.org/cYYPhACA/J23101_1/1, (1, 35)




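Algorithm:
1) Load the abstract design (drawn in SBOLCanvas without a backbone).
2) Pull the collection of candidate plasmids (open question: identify them by circular roles?).
3) Sort the components within the design using its `precedes` constraints.
4) Take the user's backbone selection as a constraint on the first part's fusion site (this informs plasmid selection).
5) Repeat for every subsequent part, matching each plasmid to the fusion site left by the previous one.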


#### Sort Components in Design With Precedes SequenceConstraints

TODO: look into `.getInSequentialOrder`, which may make the manual sort below unnecessary.

In [292]:
def get_design_component_order(design: sbol2.ComponentDefinition, doc: sbol2.Document) -> List[str]:
    """
    Return the component URIs of a design, ordered by its 'precedes' constraints.

    This function sorts the components of a given SBOL `ComponentDefinition` by analyzing its
    `sequenceConstraints`. Each constraint defines a directional relationship (subject precedes object)
    between two components. The linear order is rebuilt by repeatedly inserting a subject directly
    before its already-placed object, or an object directly after its already-placed subject,
    until every constraint has been consumed.

    Args:
        design (sbol2.ComponentDefinition): component definition of abstract design
        doc (sbol2.Document): document containing the design (currently unused)

    Returns:
        List[str]: component URIs ordered according to the 'precedes'
        relationships defined in the sequence constraints.

    Raises:
        ValueError: if the constraints do not link into a single connected chain.
    """
    constraint_list = list(design.sequenceConstraints)
    if not constraint_list:
        return []

    # Seed the order with the first constraint, then place the rest around it.
    ordered_component_list = [constraint_list[0].subject, constraint_list[0].object]
    pending = constraint_list[1:]

    while pending:
        unplaced = []
        for constraint in pending:
            subject, obj = constraint.subject, constraint.object
            if obj in ordered_component_list and subject not in ordered_component_list:
                # Subject precedes an already-placed object: insert just before it.
                ordered_component_list.insert(ordered_component_list.index(obj), subject)
            elif subject in ordered_component_list and obj not in ordered_component_list:
                # Object follows an already-placed subject: insert just after it.
                ordered_component_list.insert(ordered_component_list.index(subject) + 1, obj)
            elif subject not in ordered_component_list:
                # Neither end is placed yet; retry on the next pass.
                unplaced.append(constraint)
        if len(unplaced) == len(pending):
            # A full pass placed nothing, so looping again would never terminate.
            raise ValueError("precedes constraints do not form a single connected chain")
        pending = unplaced

    return ordered_component_list
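
The insertion-based sort above expects the `precedes` constraints to link into one chain. As a hedged alternative, the same ordering can be sketched as a topological sort (Kahn's algorithm) over plain `(subject, object)` pairs. The function name `order_by_precedes` is hypothetical and not part of sbol2 or sbol2build, and the short ids stand in for the full component URIs.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Tuple


def order_by_precedes(constraints: Iterable[Tuple[str, str]]) -> List[str]:
    """Topologically sort ids from (subject, object) 'precedes' pairs."""
    successors = defaultdict(list)
    indegree = {}
    for subject, obj in constraints:
        successors[subject].append(obj)
        indegree.setdefault(subject, 0)
        indegree[obj] = indegree.get(obj, 0) + 1

    # Start from every component that nothing else precedes.
    ready = deque(uri for uri, degree in indegree.items() if degree == 0)
    ordered = []
    while ready:
        uri = ready.popleft()
        ordered.append(uri)
        for nxt in successors[uri]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any component left unplaced sits on a cycle.
    if len(ordered) != len(indegree):
        raise ValueError("precedes constraints contain a cycle")
    return ordered


# The three constraints printed for basic_TU, in the order they were listed.
pairs = [("GFP_3", "B0015_4"), ("B0034_2", "GFP_3"), ("J23101_1", "B0034_2")]
print(order_by_precedes(pairs))
```

Unlike the insertion approach, this version does not depend on which constraint comes first, and it reports cyclic constraints instead of looping.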

In [None]:
def extract_design_parts(design: sbol2.ComponentDefinition, doc: sbol2.Document) -> List[sbol2.ComponentDefinition]:
    component_list = [c for c in design.getInSequentialOrder()]
    return [doc.getComponentDefinition(component.definition) for component in component_list]

extract_design_parts(definition, abstract_design)

https://sbolcanvas.org/cYYPhACA/J23101_1/1
https://sbolcanvas.org/cYYPhACA/B0034_2/1
https://sbolcanvas.org/cYYPhACA/GFP_3/1
https://sbolcanvas.org/cYYPhACA/B0015_4/1


[<sbol2.componentdefinition.ComponentDefinition at 0x118bdb770>,
 <sbol2.componentdefinition.ComponentDefinition at 0x118bdbbf0>,
 <sbol2.componentdefinition.ComponentDefinition at 0x118bdb890>,
 <sbol2.componentdefinition.ComponentDefinition at 0x118bdb9b0>]

In [None]:
orderedComponentURIList = get_design_component_order(definition, abstract_design)
print(orderedComponentURIList)
orderedDefinitionList = []

for URI in orderedComponentURIList:    
    for comp in definition.components:        
        if str(comp) == URI:
            print(comp)
            part_def = abstract_design.getComponentDefinition(comp.definition)
            orderedDefinitionList.append(part_def)


https://sbolcanvas.org/cYYPhACA/J23101_1/1
https://sbolcanvas.org/cYYPhACA/B0034_2/1
https://sbolcanvas.org/cYYPhACA/GFP_3/1
https://sbolcanvas.org/cYYPhACA/B0015_4/1


## Finding plasmids containing parts of interest

Define Fusion Site Constants

In [294]:
FUSION_SITES = {
    "A": "GGAG",
    "B": "TACT",
    "C": "AATG",
    "D": "AGGT",
    "E": "GCTT",
    "F": "CGCT",
    "G": "TGCC",
    "H": "ACTA",
}
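
Matching a scar sequence to its letter is a reverse lookup on this table. A minimal sketch, assuming sequences may arrive in either case (the backbone output later in the notebook shows `cgct`); `SITE_BY_SEQUENCE` and `fusion_site_letter` are hypothetical names introduced only for illustration.

```python
from typing import Optional

FUSION_SITES = {
    "A": "GGAG",
    "B": "TACT",
    "C": "AATG",
    "D": "AGGT",
    "E": "GCTT",
    "F": "CGCT",
    "G": "TGCC",
    "H": "ACTA",
}

# Invert the table once so each overhang sequence maps to its letter.
SITE_BY_SEQUENCE = {seq: letter for letter, seq in FUSION_SITES.items()}


def fusion_site_letter(sequence: str) -> Optional[str]:
    """Return the fusion-site letter for a 4-bp overhang, or None if unknown."""
    return SITE_BY_SEQUENCE.get(sequence.strip().upper())


print(fusion_site_letter("cgct"))  # F
print(fusion_site_letter("GGAG"))  # A
print(fusion_site_letter("TTTT"))  # None
```

Returning `None` for an unrecognized overhang lets the caller decide whether to skip the scar or raise an error.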

In [None]:
def extract_fusion_sites(plasmid: sbol2.ComponentDefinition, doc: sbol2.Document) -> List[sbol2.ComponentDefinition]:
    """Return the definitions of all subcomponents of `plasmid` that carry the fusion-site role (SO:0001953)."""
    fusion_sites = []
    for component in plasmid.components:
        definition = doc.getComponentDefinition(component.definition)
        if "http://identifiers.org/so/SO:0001953" in definition.roles:
            fusion_sites.append(definition)

    return fusion_sites

### Establish MocloPlasmid Class

In [None]:
class MocloPlasmid:
    def __init__(self, name: str, definition: sbol2.ComponentDefinition, doc: sbol2.Document):
        self.definition = definition
        self.fusion_sites = self.match_fusion_sites(doc)
        self.name = name + "".join(f"_{s}" for s in self.fusion_sites)


    def match_fusion_sites(self, doc: sbol2.Document) -> List[str]:
        fusion_site_definitions = extract_fusion_sites(self.definition, doc)
        fusion_sites = []
        for site in fusion_site_definitions:
            sequence_obj = doc.getSequence(site.sequences[0])
            sequence = sequence_obj.elements

            for key, seq in FUSION_SITES.items():
                if seq == sequence.upper():
                    fusion_sites.append(key)

        fusion_sites.sort()
        return fusion_sites

    def __repr__(self) -> str:
        return (
            f"MocloPlasmid:\n"
            f"  Name: {self.name}\n"
            f"  Definition: {self.definition.identity}\n"
            f"  Fusion Sites: {self.fusion_sites or 'Not found'}"
        )

### Construct Plasmid Dictionary by identifying plasmids containing abstract design parts
- Key: the displayId of a part in the abstract design. Value: the list of MocloPlasmid objects whose plasmid contains that part.

In [None]:
plasmid_dict = {}
for part in orderedDefinitionList:
    for plasmid in plasmid_collection.moduleDefinitions:
        toplevel_URI = plasmid.functionalComponents[0].definition
        toplevel_definition = plasmid_collection.getComponentDefinition(toplevel_URI)

        for component in toplevel_definition.components:
            if component.definition == str(part):
                fusion_sites = [site.name for site in extract_fusion_sites(toplevel_definition, plasmid_collection)]
                print(f"found: {component.definition} in {plasmid} with {fusion_sites}")
                plasmid_dict.setdefault(part.displayId, [])

                componentName = plasmid_collection.getComponentDefinition(component.definition).name


                plasmid_dict[part.displayId].append(
                    MocloPlasmid(componentName, toplevel_definition, plasmid_collection)
                )

found: https://synbiohub.org/user/ryangreer/gonzaloplasmids/J23101/1 in https://synbiohub.org/user/ryangreer/gonzaloplasmids/module5/1 with ['Scar_A', 'Scar_B']
found: https://synbiohub.org/user/ryangreer/gonzaloplasmids/B0034/1 in https://synbiohub.org/user/ryangreer/gonzaloplasmids/module2/1 with ['Scar_C', 'Scar_B']
found: https://synbiohub.org/user/ryangreer/gonzaloplasmids/GFP/1 in https://synbiohub.org/user/ryangreer/gonzaloplasmids/module3/1 with ['Scar_C', 'Scar_D']
found: https://synbiohub.org/user/ryangreer/gonzaloplasmids/B0015/1 in https://synbiohub.org/user/ryangreer/gonzaloplasmids/module4/1 with ['Scar_F', 'Scar_D']
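
The nested loop above rescans the whole collection once per design part. A hedged sketch of an alternative is to build a part-to-plasmid index in a single pass and then look each part up. It uses plain dictionaries rather than sbol2 objects; `PLASMID_PARTS` is transcribed from the collection listing printed earlier (scars and the `Cir_qxow` backbone component omitted, numeric suffixes dropped), and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Non-scar, non-backbone contents of each plasmid in the collection.
PLASMID_PARTS = {
    "module1": ["RFP_cassette"],
    "module2": ["B0034"],
    "module3": ["GFP"],
    "module4": ["B0015"],
    "module5": ["J23101"],
}


def index_plasmids_by_part(plasmid_parts: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each part name to the plasmids that contain it (one pass)."""
    index = defaultdict(list)
    for plasmid, parts in plasmid_parts.items():
        for part in parts:
            index[part].append(plasmid)
    return dict(index)


def candidates_for(design: Iterable[str], index: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look up candidate plasmids for each design part, keeping design order."""
    return {part: index.get(part, []) for part in design}


index = index_plasmids_by_part(PLASMID_PARTS)
print(candidates_for(["J23101", "B0034", "GFP", "B0015"], index))
```

An empty list in the result marks a design part that no plasmid in the collection carries, which the nested loop would otherwise skip silently.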


In [298]:
for key,value in plasmid_dict.items():
    print(key)
    print(value)

J23101
[MocloPlasmid:
  Name: J23101_A_B
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/pro/1
  Fusion Sites: ['A', 'B']]
B0034
[MocloPlasmid:
  Name: B0034_B_C
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/rbs/1
  Fusion Sites: ['B', 'C']]
GFP
[MocloPlasmid:
  Name: GFP_C_E
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/GFP_Plas/1
  Fusion Sites: ['C', 'E']]
B0015
[MocloPlasmid:
  Name: B0015_E_F
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/term/1
  Fusion Sites: ['E', 'F']]


### Get Selected Backbone and Check Compatibility

In [299]:
backbone_doc = sbol2.Document()
sbh.pull("https://synbiohub.org/user/ryangreer/gonzaloplasmids/module1/1/9d94ff284e1963ad6a2b00e99520ee5437994370/share", backbone_doc)

In [300]:
backbone_URI = backbone_doc.moduleDefinitions[0].functionalComponents[0].definition
backbone_definition = backbone_doc.getComponentDefinition(backbone_URI)

backbone = MocloPlasmid("cir_backbone", backbone_definition, backbone_doc)

Scar_A, GGAG, A
Scar_F, cgct, F


## Iterate Through Plasmid Dict to Find Matching Sites

In [304]:
selected_plasmids = []
match_to = backbone
match_idx = 0

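# Walk the parts in design order; each selected plasmid must start on the
# fusion site where the previous one (initially the backbone) left off.
for key in plasmid_dict:
    for plasmid in plasmid_dict[key]:
        if plasmid.fusion_sites[0] == match_to.fusion_sites[match_idx]:
            print(f"matched {plasmid.name} with {match_to.name} on fusion site {plasmid.fusion_sites[0]}!")
            selected_plasmids.append(plasmid)
            match_to = plasmid
            match_idx = 1
            break
    else:
        # No candidate for this part starts on the required fusion site.
        print(f"no compatible plasmid found for {key}")

print("\nFinal List!", selected_plasmids, backbone)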

matched J23101_A_B with cir_backbone_A_F on fusion site A!
matched B0034_B_C with J23101_A_B on fusion site B!
matched GFP_C_E with B0034_B_C on fusion site C!
matched B0015_E_F with GFP_C_E on fusion site E!

Final List! [MocloPlasmid:
  Name: J23101_A_B
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/pro/1
  Fusion Sites: ['A', 'B'], MocloPlasmid:
  Name: B0034_B_C
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/rbs/1
  Fusion Sites: ['B', 'C'], MocloPlasmid:
  Name: GFP_C_E
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/GFP_Plas/1
  Fusion Sites: ['C', 'E'], MocloPlasmid:
  Name: B0015_E_F
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/term/1
  Fusion Sites: ['E', 'F']] MocloPlasmid:
  Name: cir_backbone_A_F
  Definition: https://synbiohub.org/user/ryangreer/gonzaloplasmids/UJHDBOTD/1
  Fusion Sites: ['A', 'F']
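
The loop above checks that each plasmid starts where the previous one ended, but it does not confirm that the last insert closes back onto the backbone. A minimal sketch of that extra check follows, using a small stand-in dataclass instead of `MocloPlasmid`. The `Candidate` class and `select_chain` function are hypothetical, and the sketch assumes the backbone's sorted sites read as (where the first insert starts, where the last insert must end), as in the `A`/`F` backbone used here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    sites: Tuple[str, str]  # (upstream, downstream) fusion-site letters


def select_chain(candidates: Dict[str, List[Candidate]], backbone: Candidate) -> List[Candidate]:
    """Greedily pick one plasmid per part so adjacent fusion sites match.

    Raises ValueError if a part has no compatible plasmid, or if the last
    insert does not end on the backbone's second fusion site.
    """
    expected = backbone.sites[0]
    chain = []
    for part, options in candidates.items():
        match = next((c for c in options if c.sites[0] == expected), None)
        if match is None:
            raise ValueError(f"no plasmid for {part} starts with site {expected}")
        chain.append(match)
        expected = match.sites[1]
    if expected != backbone.sites[1]:
        raise ValueError(f"assembly ends on {expected}, backbone needs {backbone.sites[1]}")
    return chain


# Fusion sites taken from the plasmid_dict output above.
backbone = Candidate("cir_backbone", ("A", "F"))
candidates = {
    "J23101": [Candidate("J23101_A_B", ("A", "B"))],
    "B0034": [Candidate("B0034_B_C", ("B", "C"))],
    "GFP": [Candidate("GFP_C_E", ("C", "E"))],
    "B0015": [Candidate("B0015_E_F", ("E", "F"))],
}
print([c.name for c in select_chain(candidates, backbone)])
```

The selection is greedy and does not backtrack, so with several candidates per part it can miss a valid chain that a search over all combinations would find.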
