SEP 013 -- Sequence insertion and replacement #28

jakebeal opened this Issue Jan 17, 2017 · 4 comments



jakebeal commented Jan 17, 2017

This SEP introduces a new Component.sourceLocations field as a way of allowing only part of a sequence to be imported, rather than its entirety.
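To make the idea concrete, here is a minimal sketch of importing only part of a sequence. It is not the SBOL data model or any library API: `Range` and `import_part` are hypothetical names, the example sequence is made up, and the 1-based inclusive coordinates are an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Hypothetical source location: 1-based, inclusive coordinates (assumed)."""
    start: int
    end: int


def import_part(source: str, source_location: Optional[Range] = None) -> str:
    """Return the whole source sequence, or only the part a source location selects."""
    if source_location is None:
        # No source location given: the sequence is imported in its entirety.
        return source
    if not (1 <= source_location.start <= source_location.end <= len(source)):
        raise ValueError("source location falls outside the source sequence")
    # Convert 1-based inclusive coordinates to a Python slice.
    return source[source_location.start - 1:source_location.end]


plasmid = "ATGCCCGGGTTTAAATAG"  # made-up example sequence
whole = import_part(plasmid)
part = import_part(plasmid, Range(4, 9))
```

Here `whole` is the full sequence, while `part` holds only positions 4 through 9 of it.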

Proposal document:

graik changed the title from "SEP 013 -- Representing modified sequences" to "SEP 013 -- Sequence insertion and replacement" Jan 30, 2017



MadraghRua commented Mar 28, 2017

Hi folks,
I had a discussion with Jake about this a couple of weeks ago and thought I'd add a few thoughts here.
As you probably know, we are working on gene editing, insertion, and knockout:
Editing - changing an existing gene, typically by changing nucleotides
Insertion - adding a new gene or domain
Knockout - killing or removing an existing gene

The typical workflow is to start with a cell whose genome is characterized to some level, e.g. fully sequenced, partially sequenced, or only sequenced around genes of interest. The scientist designs CRISPRs or TALENs to do, say, a knockout. We can model this part, so far so good. As soon as they do the experiment, they want to go in and characterize the cells. So they grow up their cells, split them, and then do three main types of measurements:

  1. Genome cleavage assay - a PCR-based assay where you compare the size of the targeted region to a wild-type control. If it has changed after a knockout or knock-in experiment, you can deduce a) the average size change to the target and b) the degree to which this change has penetrated the genome of your cell population, e.g. 90% penetrance (very successful) vs 1% penetrance (probably not very successful, or a reflection of a difficult-to-edit cell line). You are looking at a picture of a gel and analyzing the bands to infer the degree of penetrance of the cleavage event across the target region.
  2. Sequencing assay - you amplify the region around your target and sequence across where the change takes place. This is typically CE analysis, usually analyzed using software tools like TIDE, and gives you a sequence-based result of the effect of the editing event.
  3. NGS sequencing analysis - you perform editing in the presence of a short oligo that you introduce into the experiment. The technique to be aware of is GUIDE-seq. The oligo can be inserted anywhere that the CRISPR complex cuts, at both on-target and off-target sites. You sequence off the short oligo, and this gives you the sequence of the genome around which a CRISPR cleavage/oligo insertion event took place. The results are typically in BAM files and other associated data, e.g. Excel. Ideally you are correlating this to your annotated genome to infer what the possible effects of the editing were upon the target and other possible off-target sites throughout the genome.

Most people who are engineering cells want to first check their cell population to see how many cells were edited. They will then go through the remaining cells, isolate them, and grow them up separately. The goal is to get as quickly as possible to a single isolated colony where all the cells have the same genetic changes. So: edit, do GCD, cell sort, GCD assay, select the most promising wells where cells have the highest penetrance (again with GCD). Repeat cell sorting on the most successful wells, followed by GCD analysis, to arrive at a genetically stable population of cells with the same genotype. Then characterize by CE sequencing and do NGS sequencing to find possible off-targets.

The types of data one would want to track include:
the planned change
initial evidence of editing
cell sorting - what cells went where
clonal analysis - again GCD
(maybe repeat the last two steps a few times to get to a suitable level of penetrance of the edit within the clonal isolate)
CE sequencing across the target
NGS sequencing for evidence of off-target events

What would be interesting to manage in the standard is the evidence for the final clonal isolate. I have no doubt you could track all this data, but you would publish or share your cells with that final data set. So the standard would need the ability to link information on the experiments done to the cell and the resulting changed host context:
Host context -
The initial genetic background
The changed host genetic background
The new genetic background linked to the associated experimental context

Experimental context:
the type of experiment done
the original data generated by the experiment
the analysis of the experiment
the final conclusion of the experiment and its link to the host context
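The linkage described above could be sketched roughly as follows. This is only an illustration of the shape of the data, not a proposed SBOL encoding: the class names, field names, and all example values (including the file names) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experiment:
    """Experimental context (hypothetical structure)."""
    kind: str              # the type of experiment done
    raw_data: List[str]    # references to the original data generated
    analysis: str          # the analysis of the experiment
    conclusion: str        # the final conclusion of the experiment


@dataclass
class HostContext:
    """Host context linking genetic backgrounds to supporting experiments."""
    initial_background: str
    changed_background: str
    evidence: List[Experiment] = field(default_factory=list)

    def add_evidence(self, experiment: Experiment) -> None:
        # Link an experiment to the changed host background it supports.
        self.evidence.append(experiment)

    def experiment_kinds(self) -> List[str]:
        return [e.kind for e in self.evidence]


# Placeholder values throughout; none of these come from a real data set.
host = HostContext("parent cell line", "clonal isolate with planned knockout")
host.add_evidence(Experiment("genome cleavage assay", ["gel_image.png"],
                             "band analysis", "edit detected in population"))
host.add_evidence(Experiment("CE sequencing", ["trace.ab1"],
                             "TIDE analysis", "edit confirmed at target"))
host.add_evidence(Experiment("NGS sequencing", ["reads.bam"],
                             "off-target search", "off-target sites reviewed"))
```

Keeping the evidence list on the host context means the final clonal isolate can be shared together with the experiments that support it.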

Please let me know if you have any questions on this.

jakebeal added this to the SBOL 2.3 milestone Jun 19, 2018

cjmyers modified the milestones: SBOL 2.3, SBOL 3.0 Jun 28, 2018

cjmyers assigned udp and unassigned AGMoreno Jun 28, 2018




jakebeal commented Jul 3, 2018

I'd like to suggest moving this back to the 2.3 milestone, as we need this capability for representation of strains in SD2 right now. We are likely to begin using this functionality soon, and it would be nice to do it in the SBOL namespace rather than an experimental namespace.




cjmyers commented Jul 3, 2018

There are a couple of reasons that I consider this for SBOL 3.0. First, it would make more sense to introduce sourceLocation to Component at the same time we introduce location to Component, which is scheduled for SBOL 3.0. Second, this is going to have some non-trivial library support implications. We have functions and validation rules that rely upon being able to derive a sequence at the top level using the sequences of its components. These will need to be updated, which is going to take some care to ensure that it is done properly. These changes are a bit more involved than the other SEPs slated for 2.3, so it could slow down 2.3 support a bit.

That being said, SBOL 3.0 has larger ramifications, so a slower 2.3 is still going to be faster than waiting for 3.0. If we do decide to push this one up, then I would be inclined to consider adding location to Component at the same time.
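As a rough sketch of the library impact, deriving a top-level sequence from its parts would have to honor an optional source range on each part. The code below is an illustration only, not how any SBOL library implements derivation or validation: `derive_sequence` and `is_consistent` are hypothetical names, the sequences are made up, and 1-based inclusive ranges are an assumption.

```python
from typing import List, Optional, Tuple

# A part is (sequence, optional source range); the range is 1-based inclusive (assumed).
Part = Tuple[str, Optional[Tuple[int, int]]]


def derive_sequence(parts: List[Part]) -> str:
    """Concatenate parts in order, taking only the source range where one is given."""
    pieces = []
    for sequence, source_range in parts:
        if source_range is None:
            pieces.append(sequence)  # whole sequence imported
        else:
            start, end = source_range
            pieces.append(sequence[start - 1:end])  # only the selected region
    return "".join(pieces)


def is_consistent(declared: str, parts: List[Part]) -> bool:
    """Hypothetical validation check: declared top-level sequence matches the derived one."""
    return derive_sequence(parts) == declared


genome = "AAAACCCCGGGG"  # made-up host sequence
insert = "TTT"           # made-up inserted sequence

# Insertion: keep positions 1-4, add the insert, keep positions 5-12.
insertion = derive_sequence([(genome, (1, 4)), (insert, None), (genome, (5, 12))])

# Replacement: positions 5-8 of the host are dropped in favor of the insert.
replacement = derive_sequence([(genome, (1, 4)), (insert, None), (genome, (9, 12))])
```

A derivation routine that ignored the source ranges would concatenate the full host sequence twice, which is why the existing functions and validation rules would need updating.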




NeilWipat commented Oct 11, 2018

Update from COMBINE 2018

Awaiting revisions by Jake
