SEP 013 -- Sequence insertion and replacement #28
This SEP introduces a new Component.sourceLocations field as a way of allowing only part of a sequence to be imported, rather than its entirety.
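To make the proposed field concrete, here is a minimal Python sketch of what a Component with source locations might import. It is an illustration only and assumes several things not stated in the proposal. The `Component` and `Range` classes are hypothetical simplifications, not the pySBOL or libSBOLj API. Coordinates are 1-based and inclusive, as in SBOL's `Range`. Multiple locations are concatenated in the order listed. Importing only the flanks of a region is shown as one possible way such a field could express a deletion.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Range:
    # Hypothetical stand-in for an SBOL Range: 1-based, inclusive coordinates.
    start: int
    end: int


@dataclass
class Component:
    # Hypothetical, simplified Component: `definition_sequence` holds the
    # elements of the Sequence of the definition it instantiates.
    definition_sequence: str
    source_locations: List[Range] = field(default_factory=list)


def imported_sequence(component: Component) -> str:
    """Return the portion of the definition's sequence this Component imports.

    With no source locations the whole sequence is imported (current SBOL
    behavior); otherwise only the listed ranges are, joined in list order.
    """
    seq = component.definition_sequence
    if not component.source_locations:
        return seq
    parts = []
    for r in component.source_locations:
        if not 1 <= r.start <= r.end <= len(seq):
            raise ValueError(
                f"range {r.start}..{r.end} outside sequence of length {len(seq)}"
            )
        # Convert 1-based inclusive coordinates to a half-open Python slice.
        parts.append(seq[r.start - 1 : r.end])
    return "".join(parts)


# Importing only the flanks of positions 4-9 drops "CCCGGG" from the result.
genome = "ATGCCCGGGTTTAAA"
edited = Component(genome, [Range(1, 3), Range(10, 15)])
print(imported_sequence(edited))  # ATGTTTAAA
```

Leaving `source_locations` empty keeps today's behavior, where the entire sequence is imported, so existing documents would be unaffected under this assumption.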
Proposal document: https://github.com/SynBioDex/SEPs/blob/master/sep_013.md
So the typical workflow is to start with a cell whose genome has some level of characterization, e.g. fully sequenced, partially sequenced, or only sequenced around the genes of interest. The scientist designs CRISPRs or TALENs to make, say, a knockout. We can model this part, so far so good. As soon as they do the experiment, they want to go in and characterize the cells. So they grow up their cells, split them, and then do three main types of measurements:
Most people who are engineering cells first want to check their cell population to see how many cells were edited. They then go through the remaining cells, isolate them, and grow them up separately. The goal is to get as quickly as possible to a single isolated colony in which all the cells carry the same genetic changes. So: edit, run GCD, cell sort, run the GCD assay, and select the most promising wells, i.e. those whose cells show the highest penetrance (again measured with GCD). Repeat cell sorting on the most successful wells, followed by GCD analysis, to arrive at a genetically stable population of cells with the same genotype. Then characterize by CE sequencing and do NGS sequencing to find possible off-target edits.
The types of data one would want to track include:
What would be interesting to manage in the standard is the evidence for the final clonal isolate. I have no doubt you could track all of this data, but you would publish or share your cells with that final data set. So the standard would need the ability to link information about the experiments done to the cell with the resulting changed host context:
Please let me know if you have any questions on this.
I'd like to suggest moving this back to the 2.3 milestone, as we need this capability for representation of strains in SD2 right now. We are likely to begin using this functionality soon, and it would be nice to do it in the SBOL namespace rather than an experimental namespace.
There are a couple of reasons why I consider this for SBOL 3.0. First, it would make more sense to introduce sourceLocation to Component at the same time we introduce location to Component, which is scheduled for SBOL 3.0. Second, this is going to have some non-trivial library support implications. We have functions and validation rules that rely on being able to derive a top-level sequence from the sequences of its components. These will need to be updated, which will take some care to ensure it is done properly. These changes are more involved than the other SEPs slated for 2.3, so they could slow down 2.3 support a bit.
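The library impact described here comes from code that rebuilds a parent's sequence from its subcomponents. The following minimal Python sketch shows how such a derivation, and a validation check built on it, might change once source locations are honored. Everything in it is hypothetical: the function names, the tuple-based data model, and the assumption that subcomponents are already listed in order on a single strand. Real SBOL libraries must also work out order and orientation from the document itself and handle many more cases.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical data model: (elements, source_ranges). source_ranges is None to
# import the whole sequence, or a list of 1-based inclusive (start, end) pairs.
SubComponent = Tuple[str, Optional[List[Tuple[int, int]]]]


def derive_sequence(subcomponents: Sequence[SubComponent]) -> str:
    """Concatenate subcomponent sequences, honoring any source ranges.

    Assumes the subcomponents are already in order on a single strand.
    """
    pieces = []
    for elements, ranges in subcomponents:
        if ranges is None:
            # Existing behavior: the full sequence of the subcomponent is used.
            pieces.append(elements)
            continue
        for start, end in ranges:
            if not 1 <= start <= end <= len(elements):
                raise ValueError(
                    f"range {start}..{end} outside sequence of length {len(elements)}"
                )
            pieces.append(elements[start - 1 : end])
    return "".join(pieces)


def sequence_is_consistent(declared: str, subcomponents: Sequence[SubComponent]) -> bool:
    # Validation check: the parent's declared sequence must equal the derived one.
    return declared == derive_sequence(subcomponents)


parts = [("TTGACA", None), ("ATGAAATAA", [(1, 3), (7, 9)])]
print(sequence_is_consistent("TTGACAATGTAA", parts))     # True
print(sequence_is_consistent("TTGACAATGAAATAA", parts))  # False
```

A rule that ignores source ranges would accept the second, full-length sequence and reject the first; honoring the ranges flips both results, which is the kind of change existing validation code would need to absorb.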
That being said, SBOL 3.0 has larger ramifications, so a slower 2.3 is still going to be faster than waiting for 3.0. If we do decide to push this one up, then I would be inclined to consider adding location to Component at the same time.