bring in all gene and transcript data, but disallow history (or use different uniquenames) if OGS #2401

Closed
wants to merge 52 commits into from


nathandunn
Contributor

@nathandunn nathandunn commented Mar 11, 2020

fixes #2372

Make this work for GFF3 tracks: in https://github.com/GMOD/Apollo/blob/develop/grails-app/services/org/bbop/apollo/FeatureService.groovy#L1293, make sure that we pull in the additional properties when they are present (gene_product, provenance, comments, dbxref, go, etc.).

Note: have to reload once official track is set.


TEST:

fill out all the way export

  • delete exon, casting error
  • fix "show visible only"
  • deleting an entire, fully annotated gene does not work if there is a GO annotation on the gene (and/or)
  • an annotated gene with ALL annotation pieces should be deletable (currently has issues with GO annotations)
  • add data to EVERY gene and EVERY transcript and export as GFF3
    • confirm output has all details from above in GFF3 (minus the gene)
    • import into non-OGS and do basic structural and functional annotations
    • import into OGS and do basic structural and functional annotations
  • test structural edits
  • test history
  • test other exports (FASTA, GFF3, etc.)
  • test history of loaded versus unloaded

  • test for non-coding
  • test for single-level
  • test for variant
  • verify that this doesn't break NCList data
  • test against GFF3 loaded NCList data (--type mRNA)
  • verify use_name
  • write test for use_cds by modifying CDS in a GFF3 and using it as another example
  • for feature_event, set organism to current organism (single script?)

  • fill out all details
  • fill out dbxrefs (+ pmid):
    • update fails
    • delete switches when it shouldn't
  • fill out comments (+ canned)
    • fix canned comments
  • fill out attributes (+ canned)
  • fill out suggested names
  • add GO (gene / transcript)
  • edit GO (gene / transcript)
  • delete GO (gene / transcript)
  • add gene product (gene / transcript)
    • including historical names
  • edit gene product (gene / transcript)
  • delete gene product (gene / transcript)
  • add provenance (gene / transcript)
  • edit provenance (gene / transcript)
  • delete provenance (gene / transcript)

  • history should add properly when doing actions (one at a time)
  • undo and redo work, but require a reload
  • undo and redo without reload; inhibit undo / redo if not a full gene
  • ~~ delete should remove feature_events or mark as deleted / change uniquename~~
  • load attributes for everything left (needs an exclude list), or is this handled automatically in properties? Note that the keys will have to be cast to lower-case on input

  • showHistory should only reflect a single event if OGS when added
  • converted IDs are problematic when going back and forth, so we need to add an organism ID and match on it if present
  • using OGS, I need to make organisms unique in featureEvent code and only include those with
  • update REST doc
  • Feature.findByUniqueName() should be changed to use sequence or Organism in method (63)
  • load gene_product
  • load go_annotation (fix and test similar to gene_product)
  • GO transcript selection does not update
  • migrate gff3 code over to use the independent services
  • align with column 9 spec GFF3: note that the keys are assigned to lower-case
  • load provenance (fix and test similar to gene_product)
  • transcript should use existing name
  • load comment
  • handle synonyms
  • handle dbxref (added)
  • when creating a new track with an "official" track, it is not visible to the annotator (see below), the reference sequence, or the organism panel as an annotation
  • cannot delete a new official track now (see below)
    • ADD sequence-specific check
  • pass in useCDS and useName and add test
  • deleteFeatures with gene product on transcript (I think)
  • descriptions for both are preserved
  • status works for both
  • need to add a new feature (error seen)
  • gene name and symbol are preserved
  • create annotation with that track, making sure that it sends the Gene as a parent ID
  • in JSONUtils.copyOfficialData build a parent_id (with the unique object) and a parent object
  • implement adding GO to the export, similar to gene_product and provenance, to aid with round-trip (#2400 to add GO)
  • exported GFF3 should contain all of the attributes
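
The checklist items about loading the remaining attributes, keeping an exclude list, and lower-casing keys on input can be sketched as follows. This is an illustration in Python, not Apollo's Groovy code: `parse_column9`, `RESERVED_KEYS`, the mixed-case `GeneAttr1` key, and the choice to lower-case only the non-reserved keys are all assumptions. The reserved tag names themselves come from the GFF3 specification.

```python
from urllib.parse import unquote

# Reserved column 9 tags defined by the GFF3 spec; assumed here to keep their case.
RESERVED_KEYS = {"ID", "Name", "Alias", "Parent", "Target", "Gap",
                 "Derives_from", "Note", "Dbxref", "Ontology_term", "Is_circular"}


def parse_column9(column9, exclude=()):
    """Parse a GFF3 attribute column into a dict of key -> list of values."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # Custom (non-reserved) keys are cast to lower-case on input.
        if key not in RESERVED_KEYS:
            key = key.lower()
        # Hypothetical exclude list: keys that should not become properties.
        if key in exclude:
            continue
        # Multiple values are comma-separated; each is percent-decoded.
        attributes[key] = [unquote(v) for v in value.split(",")]
    return attributes


example = ("ID=GB40762-RA;Name=genename1-00001;GeneAttr1=1111;"
           "status=under review;owner=ndunn@me.com")
parsed = parse_column9(example, exclude={"owner"})
```

With this input, `parsed` keeps `ID` and `Name` as written, stores the custom key as `geneattr1`, and drops `owner`.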

@nathandunn nathandunn added this to In progress in 2.6.0 LTS via automation Mar 11, 2020
@nathandunn nathandunn changed the title from "all of the data kind of comes through" to "bring in all gene and transcript data" Mar 11, 2020
@nathandunn
Contributor Author

test input:

{"track":"Group1.10","features":[{"location":{"fmin":450633,"fmax":450881,"strand":-1},"type":{"cv":{"name":"sequence"},"name":"mRNA"},"name":"transcriptname1","orig_id":"GB40762-RA","seq_id":"Group1.10","source":null,"score":null,"phase":null,"owner":"ndunn@me.com","parent":{"location":{"fmin":450633,"fmax":450881,"strand":-1},"seq_id":"Group1.10","source":null,"type":{"cv":{"name":"sequence"},"name":"gene"},"score":null,"phase":null,"owner":"ndunn@me.com","symbol":"genesymbol1","go_annotations":"rank=1;aspect=BP;term=GO:0008015;db_xref=generef1:12312312;evidence=ECO:0000353;gene_product_relationship=RO:0004035;negate=true;note=[\\"gene note 1\\"];based_on=[\\"geneprefixwith1:123123\\"];last_updated=2020-03-12 11:52:04.339;date_created=2020-03-12 11:52:04.339,rank=2;aspect=MF;term=GO:0018742;db_xref=gogeneref2:22222;evidence=ECO:0000316;gene_product_relationship=RO:0002326;negate=false;note=[\\"gene note 2 a\\",\\"gene note 2\\"];based_on=[\\"gogeneprefix2:2222\\",\\"gogeneprefix2:33333\\"];last_updated=2020-03-12 11:53:54.202;date_created=2020-03-12 11:53:54.202","description":["gene comment 1","gene comment 2"],"name":"genename1","date_creation":"2020-03-12","provenance":"rank=1;field=SYNONYM;db_xref=geneprovref:1111;evidence=ECO:0000315;note=null;based_on=[\\"geneprove1:2222\\",\\"geneprove1:1111\\"];last_updated=2020-03-12 11:57:07.316;date_created=2020-03-12 11:57:07.316,rank=2;field=ATTRIBUTE;db_xref=geneprovref2:2222;evidence=ECO:0000269;note=null;based_on=[\\"geneprovwith2:333\\"];last_updated=2020-03-12 11:57:33.296;date_created=2020-03-12 11:57:33.296","alias":["genesyn1","genesyn2"],"geneattr1":"1111","gene_product":"rank=1;term=geneproduct2;db_xref=genereference2:11111;evidence=ECO:0000318;alternate=true;note=[];based_on=[\\"genewith2:11111\\"];last_updated=2020-03-12 11:55:20.712;date_created=2020-03-12 
11:55:20.712,rank=2;term=geneproduc1;db_xref=genereference:1111;evidence=ECO:0000250;alternate=true;note=[];based_on=[\\"genewith1:2222\\",\\"genewith1:1111\\"];last_updated=2020-03-12 11:54:43.307;date_created=2020-03-12 11:54:43.307","uniquename":"958d7047-7342-4275-b91d-ddf68ecd3b80","date_last_modified":"2020-03-12","dbxref":["genename1dbxref:1212","genename2dbxref:2222"],"status":"under review","geneattr2":"2222"},"description":["trans comment 2 v1","trans comment 2 v 2"],"date_creation":"2020-03-12","transattr1":"1111","provenance":"rank=1;field=ATTRIBUTE;db_xref=transref1:11111;evidence=ECO:0000316;note=null;based_on=[\\"transattr1v2:2222\\",\\"transattr1:1111\\"];last_updated=2020-03-12 12:03:05.811;date_created=2020-03-12 12:03:05.811,rank=2;field=SYNONYM;db_xref=transprovref:1111;evidence=ECO:0000353;note=null;based_on=[\\"transprov1v2:2222\\",\\"transprov1:1111\\"];last_updated=2020-03-12 12:02:06.013;date_created=2020-03-12 12:02:06.013","transattr2":"2222","alias":["transyn2","transyn1"],"gene_product":"rank=1;term=trans prod 1;db_xref=transprodref1:1111;evidence=ECO:0000318;alternate=true;note=[];based_on=[\\"transprod1wtih2:2222\\",\\"transprod1with1:1111\\"];last_updated=2020-03-12 12:01:01.382;date_created=2020-03-12 12:01:01.38,rank=2;term=trans prod 2;db_xref=transref2:2222;evidence=ECO:0000315;alternate=false;note=[];based_on=[\\"trandprod2:33333\\"];last_updated=2020-03-12 12:01:28.469;date_created=2020-03-12 12:01:28.469","uniquename":"GB40762-RA","date_last_modified":"2020-03-12","dbxref":["transdbxref:22222","transdbxref2:11111"],"status":"under 
review","parent_id":"958d7047-7342-4275-b91d-ddf68ecd3b80","children":[{"location":{"fmin":450633,"fmax":450681,"strand":-1},"type":{"cv":{"name":"sequence"},"name":"exon"},"name":"c41dab40-ff33-42a5-898d-c97244f4eb4e","orig_id":"c41dab40-ff33-42a5-898d-c97244f4eb4e"},{"location":{"fmin":450788,"fmax":450881,"strand":-1},"type":{"cv":{"name":"sequence"},"name":"exon"},"name":"f862fe8b-ec8a-4188-a697-4269a5afd476","orig_id":"f862fe8b-ec8a-4188-a697-4269a5afd476"},{"location":{"fmin":450633,"fmax":450881,"strand":-1},"type":{"cv":{"name":"sequence"},"name":"CDS"}}]}],"operation":"add_transcript","clientToken":"14918555161196787143"}
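
In the payload above, `go_annotations`, `gene_product`, and `provenance` each pack several ranked entries into one string: fields are separated by `;` and entries by `,`. A rough Python sketch of how such a string could be split back apart is shown here. `parse_ranked_entries` is a hypothetical helper, the list quoting in the sample is simplified, and the sketch assumes that no field value contains `;`.

```python
import re


def parse_ranked_entries(value):
    """Split a 'rank=1;...,rank=2;...' string into one dict per ranked entry."""
    # Commas also occur inside bracketed lists, so split only on a comma
    # that is immediately followed by 'rank=<digits>;'.
    entries = re.split(r",(?=rank=\d+;)", value)
    parsed = []
    for entry in entries:
        fields = {}
        for field in entry.split(";"):
            key, _, val = field.partition("=")
            fields[key] = val
        parsed.append(fields)
    return parsed


# Sample modeled on the gene-level provenance value in the payload.
provenance = (
    "rank=1;field=SYNONYM;db_xref=geneprovref:1111;evidence=ECO:0000315;note=null;"
    "based_on=['geneprove1:2222','geneprove1:1111'];"
    "last_updated=2020-03-12 11:57:07.316;date_created=2020-03-12 11:57:07.316,"
    "rank=2;field=ATTRIBUTE;db_xref=geneprovref2:2222;evidence=ECO:0000269;note=null;"
    "based_on=['geneprovwith2:333'];"
    "last_updated=2020-03-12 11:57:33.296;date_created=2020-03-12 11:57:33.296"
)
entries = parse_ranked_entries(provenance)
```

The lookahead keeps the comma inside `based_on=[...]` intact while still separating the two ranked entries.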

…a location (eg. genes) and not removing things from transcripts properly
@nathandunn
Contributor Author

notes:

                 name                 |        class         
--------------------------------------+----------------------
 genename1-00001                      | org.bbop.apollo.MRNA
 c41dab40-ff33-42a5-898d-c97244f4eb4e | org.bbop.apollo.Exon
 f862fe8b-ec8a-4188-a697-4269a5afd476 | org.bbop.apollo.Exon
 0395c20f-7f6b-49ef-854e-d34d746522cd | org.bbop.apollo.CDS
(4 rows)

apollo-none=# select this_.name,this_.class from feature this_ inner join feature_location featureloc1_ on this_.id=featureloc1_.feature_id inner join sequence sequence_a2_ on featureloc1_.sequence_id=sequence_a2_.id where ((sequence_a2_.organism_id=39969));

If I extend this to MRNA and other transcript types, it returns the mRNA as the top-level feature (not what we want).

@nathandunn
Contributor Author

This was the original one:

select s.name, f.id,f.name,f.class from feature f join feature_location fl on fl.feature_id = f.id join Sequence s on s.id = fl.sequence_id where class like '%Gene' ;
   name    |  id   |   name    |        class         
-----------+-------+-----------+----------------------
 Group1.10 | 22137 | genename1 | org.bbop.apollo.Gene

vs

=# select f.id,f.name,f.class from feature f where class like '%Gene' ;
  id   |             name             |        class         
-------+------------------------------+----------------------
  7259 | Apple3                       | org.bbop.apollo.Gene
  7799 | mRNA8198a                    | org.bbop.apollo.Gene
  7786 | mRNA8198                     | org.bbop.apollo.Gene
  8124 | Group11.18-00001aasdfasdasdf | org.bbop.apollo.Gene
 20477 | GB55593-RA                   | org.bbop.apollo.Gene
 22137 | genename1                    | org.bbop.apollo.Gene
  8433 | GB55426-RAa                  | org.bbop.apollo.Gene
  8369 | GB42840-RAasdasdfa           | org.bbop.apollo.Gene
 14211 | GB45005-RA                   | org.bbop.apollo.Gene
  8011 | Group11.18-00001asdf         | org.bbop.apollo.Gene
  7769 | mRNA8197a                    | org.bbop.apollo.Gene
  7756 | mRNA8197                     | org.bbop.apollo.Gene
  8398 | GB50198-RAasdfasdf           | org.bbop.apollo.Gene
  8542 | GB55428-RA                   | org.bbop.apollo.Gene
 20522 | GB55595-RA                   | org.bbop.apollo.Gene
 20442 | GB42659-RA                   | org.bbop.apollo.Gene
 20571 | GB55427-RA                   | org.bbop.apollo.Gene
 20782 | Apple3                       | org.bbop.apollo.Gene

Which is weird, as this doesn't show up:

[screenshot]

even though the GFF3 is legal:

##gff-version 3
##sequence-region Group1.10 1 1405242
Group1.10	.	gene	450634	450881	.	-	.	owner=ndunn@me.com;symbol=genesymbol1;Alias=genesyn1,genesyn2;Note=gene comment 1,gene comment 2;geneattr1=1111;description=gene description 1;ID=958d7047-7342-4275-b91d-ddf68ecd3b80;date_last_modified=2020-03-12;Name=genename1;status=under review;geneattr2=2222;date_creation=2020-03-12
Group1.10	.	mRNA	450634	450881	.	-	.	owner=ndunn@me.com;Parent=958d7047-7342-4275-b91d-ddf68ecd3b80;description=["trans comment 2 v1"%2C"trans comment 2 v 2"];ID=GB40762-RA;orig_id=GB40762-RA;date_last_modified=2020-03-12;Name=genename1-00001;status=under review;date_creation=2020-03-12
Group1.10	.	exon	450634	450681	.	-	.	Parent=GB40762-RA;ID=2cec439d-d950-45fc-91db-1ea259fffc68;Name=c41dab40-ff33-42a5-898d-c97244f4eb4e
Group1.10	.	CDS	450789	450881	.	-	0	Parent=GB40762-RA;ID=e6576f93-9091-4dd6-9590-cc6a3e55e325;Name=e6576f93-9091-4dd6-9590-cc6a3e55e325
Group1.10	.	CDS	450634	450681	.	-	0	Parent=GB40762-RA;ID=e6576f93-9091-4dd6-9590-cc6a3e55e325;Name=e6576f93-9091-4dd6-9590-cc6a3e55e325
Group1.10	.	exon	450789	450881	.	-	.	Parent=GB40762-RA;ID=2aa426c7-a391-4589-a574-4826cae4392e;Name=f862fe8b-ec8a-4188-a697-4269a5afd476
###
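
One difference between the two queries in this comment is that the first goes through inner joins on feature_location and sequence, so any gene row without a matching location row is dropped from its result. The Python/SQLite sketch below reproduces only that general effect. The two-table schema is a drastic simplification rather than Apollo's schema, and whether missing locations explain the behaviour seen here is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feature (id INTEGER PRIMARY KEY, name TEXT, class TEXT);
    CREATE TABLE feature_location (id INTEGER PRIMARY KEY, feature_id INTEGER);
    INSERT INTO feature VALUES (1, 'genename1', 'org.bbop.apollo.Gene');
    INSERT INTO feature VALUES (2, 'GB55593-RA', 'org.bbop.apollo.Gene');
    -- Only the first gene gets a location row.
    INSERT INTO feature_location VALUES (10, 1);
""")

# Inner join: genes without a location row disappear from the result.
inner = conn.execute(
    "SELECT f.name FROM feature f "
    "JOIN feature_location fl ON fl.feature_id = f.id "
    "WHERE f.class LIKE '%Gene' ORDER BY f.id"
).fetchall()

# Left join: every gene is kept; a missing location shows up as NULL (None).
left = conn.execute(
    "SELECT f.name, fl.id FROM feature f "
    "LEFT JOIN feature_location fl ON fl.feature_id = f.id "
    "WHERE f.class LIKE '%Gene' ORDER BY f.id"
).fetchall()
```

A left join like the second query is one way to list which gene rows lack a location.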

@nathandunn
Contributor Author

Need to confirm that the gene location is not sent or otherwise created on upload (which I think is the case).

@nathandunn
Contributor Author

nathandunn commented Mar 12, 2020

Sent this info:

{"track":"Group1.10","features":[{"location":{"fmin":450633,"fmax":450881,"strand":-1},"type":{"cv":{"name":"sequence"},"name":"mRNA"},"name":"transcriptname1","orig_id":"GB40762-RA","seq_id":"Group1.10","source":null,"score":null,"phase":null,"owner":"ndunn@me.com","parent":{"location":{"fmin":450633,"fmax":450881,"strand":-1},"seq_id":"Group1.10","source":null,"type":{"cv":{"name":"sequence"},"name":"gene"},"score":null,"phase":null,"owner":"ndunn@me.com","symbol":"genesymbol1","go_annotations":"rank=1;aspect=BP;term=GO:0008015;db_xref=generef1:12312312;evidence=ECO:0000353;gene_product_relationship=RO:0004035;negate=true;note=['gene note 1'];based_on=['geneprefixwith1:123123'];last_updated=2020-03-12 11:52:04.339;date_created=2020-03-12 11:52:04.339,rank=2;aspect=MF;term=GO:0018742;db_xref=gogeneref2:22222;evidence=ECO:0000316;gene_product_relationship=RO:0002326;negate=false;note=['gene note 2 a','gene note 2'];based_on=['gogeneprefix2:2222','gogeneprefix2:33333'];last_updated=2020-03-12 11:53:54.202;date_created=2020-03-12 11:53:54.202","description":["gene comment 1","gene comment 2"],"name":"genename1","date_creation":"2020-03-12","provenance":"rank=1;field=SYNONYM;db_xref=geneprovref:1111;evidence=ECO:0000315;note=null;based_on=['geneprove1:2222','geneprove1:1111'];last_updated=2020-03-12 11:57:07.316;date_created=2020-03-12 11:57:07.316,rank=2;field=ATTRIBUTE;db_xref=geneprovref2:2222;evidence=ECO:0000269;note=null;based_on=['geneprovwith2:333'];last_updated=2020-03-12 11:57:33.296;date_created=2020-03-12 11:57:33.296","alias":["genesyn1","genesyn2"],"geneattr1":"1111","gene_product":"rank=1;term=geneproduct2;db_xref=genereference2:11111;evidence=ECO:0000318;alternate=true;note=[];based_on=['genewith2:11111'];last_updated=2020-03-12 11:55:20.712;date_created=2020-03-12 11:55:20.712,rank=2;term=geneproduc1;db_xref=genereference:1111;evidence=ECO:0000250;alternate=true;note=[];based_on=['genewith1:2222','genewith1:1111'];last_updated=2020-03-12 
11:54:43.307;date_created=2020-03-12 11:54:43.307","uniquename":"958d7047-7342-4275-b91d-ddf68ecd3b80","date_last_modified":"2020-03-12","dbxref":["genename1dbxref:1212","genename2dbxref:2222"],"status":"under review","geneattr2":"2222"},"description":["trans comment 2 v1","trans comment 2 v 2"],"date_creation":"2020-03-12","transattr1":"1111","provenance":"rank=1;field=ATTRIBUTE;db_xref=transref1:11111;evidence=ECO:0000316;note=null;based_on=['transattr1v2:2222','transattr1:1111'];last_updated=2020-03-12 12:03:05.811;date_created=2020-03-12 12:03:05.811,rank=2;field=SYNONYM;db_xref=transprovref:1111;evidence=ECO:0000353;note=null;based_on=['transprov1v2:2222','transprov1:1111'];last_updated=2020-03-12 12:02:06.013;date_created=2020-03-12 12:02:06.013","transattr2":"2222","alias":["transyn2","transyn1"],"gene_product":"rank=1;term=trans prod 1;db_xref=transprodref1:1111;evidence=ECO:0000318;alternate=true;note=[];based_on=['transprod1wtih2:2222','transprod1with1:1111'];last_updated=2020-03-12 12:01:01.382;date_created=2020-03-12 12:01:01.38,rank=2;term=trans prod 2;db_xref=transref2:2222;evidence=ECO:0000315;alternate=false;note=[];based_on=['trandprod2:33333'];last_updated=2020-03-12 12:01:28.469;date_created=2020-03-12 12:01:28.469","uniquename":"GB40762-RA","date_last_modified":"2020-03-12","dbxref":["transdbxref:22222","transdbxref2:11111"],"status":"under 
review","parent_id":"958d7047-7342-4275-b91d-ddf68ecd3b80","children":[{"location":{"fmin":450788,"fmax":450881,"strand":-1},"type":{"cv":{"name":"sequence"},"name":"exon"},"name":"f862fe8b-ec8a-4188-a697-4269a5afd476","orig_id":"f862fe8b-ec8a-4188-a697-4269a5afd476"},{"location":{"fmin":450633,"fmax":450681,"strand":-1},"type":{"cv":{"name":"sequence"},"name":"exon"},"name":"c41dab40-ff33-42a5-898d-c97244f4eb4e","orig_id":"c41dab40-ff33-42a5-898d-c97244f4eb4e"},{"location":{"fmin":450633,"fmax":450881,"strand":-1},"type":{"cv":{"name":"sequence"},"name":"CDS"}}]}],"operation":"add_transcript","clientToken":"14918555161196787143"}

Received:

{"features":[{"owner":"ndunn@me.com","parent_name":"genename1","uniquename":"GB40762-RA","description":"[\"trans comment 2 v1\",\"trans comment 2 v 2\"]","type":{"cv":{"name":"sequence"},"name":"mRNA"},"parent_type":{"cv":{"name":"sequence"},"name":"gene"},"date_creation":1584049789873,"sequence":"Group1.10","children":[{"owner":"None","parent_name":"genename1-00001","uniquename":"9e3fe24b-75c0-4936-bc17-581e2362a720","type":{"cv":{"name":"sequence"},"name":"CDS"},"parent_type":{"cv":{"name":"sequence"},"name":"mRNA"},"date_creation":1584049789862,"sequence":"Group1.10","parent_id":"GB40762-RA","name":"9e3fe24b-75c0-4936-bc17-581e2362a720","location":{"strand":-1,"id":51296,"fmin":450633,"fmax":450881},"id":51295,"properties":[],"date_last_modified":1584049790375},{"owner":"None","parent_name":"genename1-00001","uniquename":"0d82ff59-e67c-4bf2-b3aa-f010fea2a43d","type":{"cv":{"name":"sequence"},"name":"exon"},"parent_type":{"cv":{"name":"sequence"},"name":"mRNA"},"date_creation":1584049789817,"sequence":"Group1.10","parent_id":"GB40762-RA","name":"c41dab40-ff33-42a5-898d-c97244f4eb4e","location":{"strand":-1,"id":51292,"fmin":450633,"fmax":450681},"id":51291,"properties":[{"name":"orig_id","type":{"cv":{"name":"feature_property"}},"value":"c41dab40-ff33-42a5-898d-c97244f4eb4e"}],"date_last_modified":1584049789856},{"owner":"None","parent_name":"genename1-00001","uniquename":"ab8363e6-bb84-4ae3-9b15-e152f9607539","type":{"cv":{"name":"sequence"},"name":"exon"},"parent_type":{"cv":{"name":"sequence"},"name":"mRNA"},"date_creation":1584049789752,"sequence":"Group1.10","parent_id":"GB40762-RA","name":"f862fe8b-ec8a-4188-a697-4269a5afd476","location":{"strand":-1,"id":51288,"fmin":450788,"fmax":450881},"id":51287,"properties":[{"name":"orig_id","type":{"cv":{"name":"feature_property"}},"value":"f862fe8b-ec8a-4188-a697-4269a5afd476"}],"date_last_modified":1584049789812}],"parent_id":"958d7047-7342-4275-b91d-ddf68ecd3b80","name":"genename1-00001","location":{"strand":-1,"i
d":51286,"fmin":450633,"fmax":450881},"id":51285,"properties":[{"name":"orig_id","type":{"cv":{"name":"feature_property"}},"value":"GB40762-RA"}],"date_last_modified":1584049790488,"status":"under review"}],"sequenceAlterationEvent":false,"operation":"ADD"}

GFF3:

##gff-version 3
##sequence-region Group1.10 1 1405242
Group1.10	.	gene	450634	450881	.	-	.	owner=ndunn@me.com;symbol=genesymbol1;Alias=genesyn1,genesyn2;Note=gene comment 1,gene comment 2;geneattr1=1111;description=gene description 1;ID=958d7047-7342-4275-b91d-ddf68ecd3b80;date_last_modified=2020-03-12;Name=genename1;status=under review;geneattr2=2222;date_creation=2020-03-12
Group1.10	.	mRNA	450634	450881	.	-	.	owner=ndunn@me.com;Parent=958d7047-7342-4275-b91d-ddf68ecd3b80;description=["trans comment 2 v1"%2C"trans comment 2 v 2"];ID=GB40762-RA;orig_id=GB40762-RA;date_last_modified=2020-03-12;Name=genename1-00001;status=under review;date_creation=2020-03-12
Group1.10	.	exon	450634	450681	.	-	.	Parent=GB40762-RA;ID=2cec439d-d950-45fc-91db-1ea259fffc68;Name=c41dab40-ff33-42a5-898d-c97244f4eb4e
Group1.10	.	CDS	450789	450881	.	-	0	Parent=GB40762-RA;ID=e6576f93-9091-4dd6-9590-cc6a3e55e325;Name=e6576f93-9091-4dd6-9590-cc6a3e55e325
Group1.10	.	CDS	450634	450681	.	-	0	Parent=GB40762-RA;ID=e6576f93-9091-4dd6-9590-cc6a3e55e325;Name=e6576f93-9091-4dd6-9590-cc6a3e55e325
Group1.10	.	exon	450789	450881	.	-	.	Parent=GB40762-RA;ID=2aa426c7-a391-4589-a574-4826cae4392e;Name=f862fe8b-ec8a-4188-a697-4269a5afd476
###
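
Comparing the sent payload with the GFF3, `fmin` 450633 / `fmax` 450881 in the JSON correspond to start 450634 / end 450881 in the GFF3. That is consistent with zero-based, half-open coordinates in the JSON and one-based, inclusive coordinates in GFF3. A minimal sketch of the conversion, with hypothetical function names:

```python
def to_gff3(fmin, fmax):
    """Convert zero-based, half-open (fmin, fmax) to one-based, inclusive (start, end)."""
    return fmin + 1, fmax


def from_gff3(start, end):
    """Convert one-based, inclusive (start, end) back to zero-based, half-open (fmin, fmax)."""
    return start - 1, end


# Values taken from the mRNA and the second exon in this comment.
mrna_gff3 = to_gff3(450633, 450881)
exon_json = from_gff3(450789, 450881)
```

The feature length is the same in both systems: `fmax - fmin` equals `end - start + 1`.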

@nathandunn
Contributor Author

nathandunn commented Mar 12, 2020

Deleting the same GFF3 works when it is loaded as a non-official track:

##gff-version 3
##sequence-region Group1.10 1 1405242
Group1.10	.	gene	450634	450881	.	-	.	owner=ndunn@me.com;ID=d47490d8-3bae-4bc8-9993-3bbea99b0887;date_last_modified=2020-03-12;Name=transcriptname1;date_creation=2020-03-12
Group1.10	.	mRNA	450634	450881	.	-	.	owner=ndunn@me.com;Parent=d47490d8-3bae-4bc8-9993-3bbea99b0887;ID=5e8e5a1d-cbd9-4b82-ad37-7cb90b7fbc13;orig_id=GB40762-RA;date_last_modified=2020-03-12;Name=transcriptname1-00001;date_creation=2020-03-12
Group1.10	.	exon	450634	450681	.	-	.	Parent=5e8e5a1d-cbd9-4b82-ad37-7cb90b7fbc13;ID=ee3d76d8-ddbd-4fab-8a52-4e5e92ac55f5;Name=ee3d76d8-ddbd-4fab-8a52-4e5e92ac55f5
Group1.10	.	exon	450789	450881	.	-	.	Parent=5e8e5a1d-cbd9-4b82-ad37-7cb90b7fbc13;ID=661971f6-4e22-4c17-9629-8ebf28ca7786;Name=661971f6-4e22-4c17-9629-8ebf28ca7786
Group1.10	.	CDS	450789	450881	.	-	0	Parent=5e8e5a1d-cbd9-4b82-ad37-7cb90b7fbc13;ID=537de717-7f55-44cb-80d5-f0f2177499c7;Name=537de717-7f55-44cb-80d5-f0f2177499c7
Group1.10	.	CDS	450634	450681	.	-	0	Parent=5e8e5a1d-cbd9-4b82-ad37-7cb90b7fbc13;ID=537de717-7f55-44cb-80d5-f0f2177499c7;Name=537de717-7f55-44cb-80d5-f0f2177499c7
###

When we add it as an official track:

##gff-version 3
##sequence-region Group1.10 1 1405242
Group1.10	.	gene	450634	450881	.	-	.	owner=ndunn@me.com;symbol=genesymbol1;Alias=genesyn1,genesyn2;Note=gene comment 1,gene comment 2;geneattr1=1111;description=gene description 1;ID=958d7047-7342-4275-b91d-ddf68ecd3b80;date_last_modified=2020-03-12;Name=genename1;status=under review;geneattr2=2222;date_creation=2020-03-12
Group1.10	.	mRNA	450634	450881	.	-	.	owner=ndunn@me.com;Parent=958d7047-7342-4275-b91d-ddf68ecd3b80;description=["trans comment 2 v1"%2C"trans comment 2 v 2"];ID=GB40762-RA;orig_id=GB40762-RA;date_last_modified=2020-03-12;Name=genename1-00001;status=under review;date_creation=2020-03-12
Group1.10	.	exon	450634	450681	.	-	.	Parent=GB40762-RA;ID=2cec439d-d950-45fc-91db-1ea259fffc68;Name=c41dab40-ff33-42a5-898d-c97244f4eb4e
Group1.10	.	CDS	450789	450881	.	-	0	Parent=GB40762-RA;ID=e6576f93-9091-4dd6-9590-cc6a3e55e325;Name=e6576f93-9091-4dd6-9590-cc6a3e55e325
Group1.10	.	CDS	450634	450681	.	-	0	Parent=GB40762-RA;ID=e6576f93-9091-4dd6-9590-cc6a3e55e325;Name=e6576f93-9091-4dd6-9590-cc6a3e55e325
Group1.10	.	exon	450789	450881	.	-	.	Parent=GB40762-RA;ID=2aa426c7-a391-4589-a574-4826cae4392e;Name=f862fe8b-ec8a-4188-a697-4269a5afd476
###

I get the error:

Hibernate operation: could not execute statement; SQL [n/a]; ERROR: update or delete on table "feature" violates foreign key constraint "fk_72kmd92rdc6gne0nrh026o1j0" on table "feature_relationship" Detail: Key (id)=(22137) is still referenced from table "feature_relationship".; nested exception is org.postgresql.util.PSQLException: ERROR: update or delete on table "feature" violates foreign key constraint "fk_72kmd92rdc6gne0nrh026o1j0" on table "feature_relationship" Detail: Key (id)=(22137) is still referenced from table "feature_relationship".

which makes no sense as the database structure looks identical:

apollo-none=# select f.name, f.class, ch.name,ch.class,  fr.* from feature f join feature_relationship fr on fr.parent_feature_id = f.id join feature_location fl on fl.feature_id=f.id join sequence s on s.id=fl.sequence_id join organism o on o.id=s.organism_id join feature ch on fr.child_feature_id=ch.id  where o.common_name='bee9' ;
      name       |        class         |                 name                 |        class         |  id   | version | child_feature_id | parent_feature_id | rank | value 
-----------------+----------------------+--------------------------------------+----------------------+-------+---------+------------------+-------------------+------+-------
 genename1-00001 | org.bbop.apollo.MRNA | c41dab40-ff33-42a5-898d-c97244f4eb4e | org.bbop.apollo.Exon | 68326 |       0 |            68323 |             68321 |    0 | 
 genename1-00001 | org.bbop.apollo.MRNA | f862fe8b-ec8a-4188-a697-4269a5afd476 | org.bbop.apollo.Exon | 68330 |       0 |            68327 |             68321 |    0 | 
 genename1-00001 | org.bbop.apollo.MRNA | 0dc1741d-3660-42cb-ae72-f5f53cf2d6d2 | org.bbop.apollo.CDS  | 68333 |       0 |            68331 |             68321 |    0 | 
 genename1       | org.bbop.apollo.Gene | genename1-00001                      | org.bbop.apollo.MRNA | 68336 |       0 |            68321 |             68319 |    0 | 
(4 rows)

However, it may be that we are replicating unique names.

When loaded fresh, however, it's not a problem, so something else is going on during the deletion process.
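
The error message says that the feature row being deleted is still referenced by a row in feature_relationship. The Python/SQLite sketch below shows only that general mechanism: the delete is rejected while a relationship row points at the feature, and it succeeds once those rows are removed first. The schema is heavily simplified, `delete_gene` is a hypothetical helper, and none of this diagnoses why Apollo's own deletion leaves a reference behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE feature (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE feature_relationship (
        id INTEGER PRIMARY KEY,
        parent_feature_id INTEGER NOT NULL REFERENCES feature(id),
        child_feature_id INTEGER NOT NULL REFERENCES feature(id)
    );
    INSERT INTO feature VALUES (68319, 'genename1');
    INSERT INTO feature VALUES (68321, 'genename1-00001');
    INSERT INTO feature_relationship VALUES (68336, 68319, 68321);
""")


def delete_gene(connection, gene_id):
    """Try to delete a feature row and report whether the database allowed it."""
    try:
        connection.execute("DELETE FROM feature WHERE id = ?", (gene_id,))
        return "deleted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


# Rejected: the gene is still the parent in a relationship row.
first_attempt = delete_gene(conn, 68319)

# Remove every relationship that references the gene, then retry.
conn.execute(
    "DELETE FROM feature_relationship "
    "WHERE parent_feature_id = ? OR child_feature_id = ?",
    (68319, 68319),
)
second_attempt = delete_gene(conn, 68319)
remaining = conn.execute("SELECT COUNT(*) FROM feature").fetchone()[0]
```

A check like this against the real tables, run just before the failing delete, would show which feature_relationship rows still reference the gene.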

@nathandunn
Contributor Author

De novo features as official tracks are actually okay.

Removing gene_products, provenance, etc. doesn't seem to have an effect.

Deleting the status gives a different result, possibly because it's not deleting an orphan (as we share the same one, I think).

Something is wrong with the uniqueNames... not sure.

@nathandunn nathandunn moved this from In progress to To do in 2.6.0 LTS Apr 10, 2020
@nathandunn nathandunn moved this from To do to In progress in 2.6.0 LTS Apr 10, 2020
@nathandunn nathandunn marked this pull request as draft April 13, 2020 20:10
@nathandunn
Contributor Author

nathandunn commented Apr 15, 2020

Original Test GFF3:

##gff-version 3
##sequence-region Group11.18 1 4736299
Group11.18	.	gene	2583914	2593823	.	+	.	owner=admin@local.host;symbol=genesymbol;go_annotations=rank%3D1%3Baspect%3DMF%3Bterm%3DGO:0051018%3Bdb_xref%3Dgenegoreference:asdfasf1%3Bevidence%3DECO:0000315%3Bgene_product_relationship%3DRO:0002327%3Bnegate%3Dtrue%3Bnote%3D["genegonote"]%3Bbased_on%3D["genegowith:123213"]%3Blast_updated%3D2020-04-15 11:31:12.753%3Bdate_created%3D2020-04-15 11:31:12.753;description=genedescriptioin;geneattrributekey=asdfasdf;Name=genename;date_creation=2020-04-15;provenance=rank%3D1%3Bfield%3DDESCRIPTION%3Bdb_xref%3Dgeneprovenanceref:asdfasdf%3Bevidence%3DECO:0000353%3Bnote%3D["geneprovenancenote"]%3Bbased_on%3D["geneprovenanceiwth:12313"]%3Blast_updated%3D2020-04-15 11:32:18.564%3Bdate_created%3D2020-04-15 11:32:18.564;Alias=genealias;Note=genecomment;gene_product=rank%3D1%3Bterm%3Dgenegeneproduct%3Bdb_xref%3Dgenegeneproductreference:123132%3Bevidence%3DECO:0000316%3Balternate%3Dfalse%3Bnote%3D["asdfasdfafs"]%3Bbased_on%3D["genegeneproductwith:asdf"]%3Blast_updated%3D2020-04-15 11:31:46.603%3Bdate_created%3D2020-04-15 11:31:46.603;ID=1424751f-1276-41da-9d34-350499e940c6;date_last_modified=2020-04-15;Dbxref=genedbxref:asdfasdf
Group11.18	.	mRNA	2583914	2593823	.	+	.	owner=admin@local.host;Parent=1424751f-1276-41da-9d34-350499e940c6;go_annotations=rank%3D1%3Baspect%3DMF%3Bterm%3DGO:0140399%3Bdb_xref%3Dtranscriptgoref:123123%3Bevidence%3DECO:0000315%3Bgene_product_relationship%3DRO:0002326%3Bnegate%3Dfalse%3Bnote%3D["transcriptgonote"]%3Bbased_on%3D["transcriptgo:1231313"]%3Blast_updated%3D2020-04-15 11:27:43.295%3Bdate_created%3D2020-04-15 11:27:43.295;description=transcriptdescription;orig_id=GB45157-RA;Name=transcriptname;date_creation=2020-04-15;provenance=rank%3D1%3Bfield%3DNAME%3Bdb_xref%3Dtranscriptprovenanceref:1231321%3Bevidence%3DECO:0000316%3Bnote%3D["transcriptprovenancenote"]%3Bbased_on%3D["transcriptnamewithprov:123132"]%3Blast_updated%3D2020-04-15 11:29:54.618%3Bdate_created%3D2020-04-15 11:29:54.618;transcriptattribtute=asdfasdf;Alias=transcriptalias;Note=transcriptcomment;gene_product=rank%3D1%3Bterm%3Dtranscriptgeneproductnote%3Bdb_xref%3Dtranscriptgeneproductreference:123131%3Bevidence%3DECO:0000319%3Balternate%3Dtrue%3Bnote%3D["transcriptgeneproductnote"]%3Bbased_on%3D["transcriptwithgeneproductprefix:123123"]%3Blast_updated%3D2020-04-15 11:28:30.814%3Bdate_created%3D2020-04-15 11:28:30.814;ID=GB45157-RA;date_last_modified=2020-04-15;Dbxref=transcriptdbxref:12312312
Group11.18	.	exon	2583914	2584376	.	+	.	Parent=GB45157-RA;ID=b48e9c2e-76b9-4462-b786-347e3d931f64;Name=b48e9c2e-76b9-4462-b786-347e3d931f64
Group11.18	.	exon	2591333	2593823	.	+	.	Parent=GB45157-RA;ID=7d791cf8-1fb2-4da9-8423-f3b4f0183b28;Name=7d791cf8-1fb2-4da9-8423-f3b4f0183b28
Group11.18	.	exon	2584843	2585034	.	+	.	Parent=GB45157-RA;ID=33d284d0-5f50-4c4c-be55-49ab6c92c29a;Name=33d284d0-5f50-4c4c-be55-49ab6c92c29a
Group11.18	.	CDS	2584185	2584376	.	+	0	Parent=GB45157-RA;ID=01831f8e-c457-4733-80dc-7f4f07fc2277;Name=01831f8e-c457-4733-80dc-7f4f07fc2277
Group11.18	.	CDS	2584843	2585034	.	+	0	Parent=GB45157-RA;ID=01831f8e-c457-4733-80dc-7f4f07fc2277;Name=01831f8e-c457-4733-80dc-7f4f07fc2277
Group11.18	.	CDS	2591333	2591707	.	+	0	Parent=GB45157-RA;ID=01831f8e-c457-4733-80dc-7f4f07fc2277;Name=01831f8e-c457-4733-80dc-7f4f07fc2277
###
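
In this GFF3 the nested values (`go_annotations`, `provenance`, `gene_product`) are percent-encoded so that their own `=` and `;` do not clash with the column 9 separators (`%3D` is `=`, `%3B` is `;`). A small Python sketch of decoding one such value follows. `decode_nested_attribute` is a hypothetical helper, and it assumes a single ranked entry whose field values contain no `;`.

```python
from urllib.parse import unquote


def decode_nested_attribute(encoded):
    """Percent-decode a column 9 value and split it into key/value fields."""
    decoded = unquote(encoded)
    fields = {}
    for field in decoded.split(";"):
        key, _, value = field.partition("=")
        fields[key] = value
    return fields


# Leading part of the gene-level go_annotations value from the GFF3 above.
encoded = (
    "rank%3D1%3Baspect%3DMF%3Bterm%3DGO:0051018%3Bdb_xref%3Dgenegoreference:asdfasf1"
    "%3Bevidence%3DECO:0000315%3Bgene_product_relationship%3DRO:0002327%3Bnegate%3Dtrue"
)
go = decode_nested_attribute(encoded)
```

Decoding has to happen after column 9 has been split on `;`, otherwise the nested fields would be mistaken for top-level attributes.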

@nathandunn
Contributor Author

Note that the mRNA is the same, but the gene is missing from the imported evidence:

Group11.18	.	mRNA	2583914	2593823	.	+	.	owner=admin@local.host;go_annotations=rank%3D1%3Baspect%3DMF%3Bterm%3DGO:0140399%3Bdb_xref%3Dtranscriptgoref:123123%3Bevidence%3DECO:0000315%3Bgene_product_relationship%3DRO:0002326%3Bnegate%3Dfalse%3Bnote%3D["transcriptgonote"]%3Bbased_on%3D["transcriptgo:1231313"]%3Blast_updated%3D2020-04-15 11:27:43.295%3Bdate_created%3D2020-04-15 11:27:43.295;description=transcriptdescription;orig_id=GB45157-RA;Name=transcriptname;date_creation=2020-04-15;provenance=rank%3D1%3Bfield%3DNAME%3Bdb_xref%3Dtranscriptprovenanceref:1231321%3Bevidence%3DECO:0000316%3Bnote%3D["transcriptprovenancenote"]%3Bbased_on%3D["transcriptnamewithprov:123132"]%3Blast_updated%3D2020-04-15 11:29:54.618%3Bdate_created%3D2020-04-15 11:29:54.618;transcriptattribtute=asdfasdf;Alias=transcriptalias;Note=transcriptcomment;gene_product=rank%3D1%3Bterm%3Dtranscriptgeneproductnote%3Bdb_xref%3Dtranscriptgeneproductreference:123131%3Bevidence%3DECO:0000319%3Balternate%3Dtrue%3Bnote%3D["transcriptgeneproductnote"]%3Bbased_on%3D["transcriptwithgeneproductprefix:123123"]%3Blast_updated%3D2020-04-15 11:28:30.814%3Bdate_created%3D2020-04-15 11:28:30.814;ID=GB45157-RA;date_last_modified=2020-04-15;Dbxref=transcriptdbxref:12312312
Group11.18	.	exon	2583914	2584376	.	+	.	Parent=GB45157-RA;ID=b48e9c2e-76b9-4462-b786-347e3d931f64;Name=b48e9c2e-76b9-4462-b786-347e3d931f64
Group11.18	.	exon	2591333	2593823	.	+	.	Parent=GB45157-RA;ID=7d791cf8-1fb2-4da9-8423-f3b4f0183b28;Name=7d791cf8-1fb2-4da9-8423-f3b4f0183b28
Group11.18	.	exon	2584843	2585034	.	+	.	Parent=GB45157-RA;ID=33d284d0-5f50-4c4c-be55-49ab6c92c29a;Name=33d284d0-5f50-4c4c-be55-49ab6c92c29a
Group11.18	.	CDS	2584185	2584376	.	+	0	Parent=GB45157-RA;ID=01831f8e-c457-4733-80dc-7f4f07fc2277;Name=01831f8e-c457-4733-80dc-7f4f07fc2277
Group11.18	.	CDS	2584843	2585034	.	+	0	Parent=GB45157-RA;ID=01831f8e-c457-4733-80dc-7f4f07fc2277;Name=01831f8e-c457-4733-80dc-7f4f07fc2277
Group11.18	.	CDS	2591333	2591707	.	+	0	Parent=GB45157-RA;ID=01831f8e-c457-4733-80dc-7f4f07fc2277;Name=01831f8e-c457-4733-80dc-7f4f07fc2277

@nathandunn
Contributor Author

nathandunn commented Apr 15, 2020

round-trip GFF3 is:

##gff-version 3
##sequence-region Group11.18 1 4736299
Group11.18	.	gene	2583914	2593823	.	+	.	owner=admin@local.host;symbol=genesymbol;go_annotations=rank%3D1%3Baspect%3DMF%3Bterm%3DGO:0051018%3Bdb_xref%3Dgenegoreference:asdfasf1%3Bevidence%3DECO:0000315%3Bgene_product_relationship%3DRO:0002327%3Bnegate%3Dtrue%3Bnote%3D['genegonote']%3Bbased_on%3D['genegowith:123213']%3Blast_updated%3D2020-04-15 13:11:23.753%3Bdate_created%3D2020-04-15 13:11:23.753;description=genedescriptioin;Name=genename;date_creation=2020-04-15;provenance=rank%3D1%3Bfield%3DDESCRIPTION%3Bdb_xref%3Dgeneprovenanceref:asdfasdf%3Bevidence%3DECO:0000353%3Bnote%3D['geneprovenancenote']%3Bbased_on%3D['geneprovenanceiwth:12313']%3Blast_updated%3D2020-04-15 13:11:23.747%3Bdate_created%3D2020-04-15 13:11:23.747;Alias=genealias;Note=genecomment;gene_product=rank%3D1%3Bterm%3Dgenegeneproduct%3Bdb_xref%3Dgenegeneproductreference:123132%3Bevidence%3DECO:0000316%3Balternate%3Dtrue%3Bnote%3D['asdfasdfafs']%3Bbased_on%3D['genegeneproductwith:asdf']%3Blast_updated%3D2020-04-15 13:11:23.742%3Bdate_created%3D2020-04-15 13:11:23.742;ID=1424751f-1276-41da-9d34-350499e940c6;date_last_modified=2020-04-15;Dbxref=genedbxref:asdfasdf
Group11.18	.	mRNA	2583914	2593823	.	+	.	owner=admin@local.host;Parent=1424751f-1276-41da-9d34-350499e940c6;go_annotations=rank%3D1%3Baspect%3DMF%3Bterm%3DGO:0140399%3Bdb_xref%3Dtranscriptgoref:123123%3Bevidence%3DECO:0000315%3Bgene_product_relationship%3DRO:0002326%3Bnegate%3Dtrue%3Bnote%3D['transcriptgonote']%3Bbased_on%3D['transcriptgo:1231313']%3Blast_updated%3D2020-04-15 13:11:23.7%3Bdate_created%3D2020-04-15 13:11:23.7;description=transcriptdescription;orig_id=GB45157-RA;Name=transcriptname;date_creation=2020-04-15;provenance=rank%3D1%3Bfield%3DNAME%3Bdb_xref%3Dtranscriptprovenanceref:1231321%3Bevidence%3DECO:0000316%3Bnote%3D['transcriptprovenancenote']%3Bbased_on%3D['transcriptnamewithprov:123132']%3Blast_updated%3D2020-04-15 13:11:23.534%3Bdate_created%3D2020-04-15 13:11:23.534;Alias=transcriptalias;Note=transcriptcomment;gene_product=rank%3D1%3Bterm%3Dtranscriptgeneproductnote%3Bdb_xref%3Dtranscriptgeneproductreference:123131%3Bevidence%3DECO:0000319%3Balternate%3Dtrue%3Bnote%3D['transcriptgeneproductnote']%3Bbased_on%3D['transcriptwithgeneproductprefix:123123']%3Blast_updated%3D2020-04-15 13:11:23.42%3Bdate_created%3D2020-04-15 13:11:23.42;ID=GB45157-RA;date_last_modified=2020-04-15;Dbxref=transcriptdbxref:12312312
Group11.18	.	exon	2583914	2584376	.	+	.	Parent=GB45157-RA;ID=09e598c5-1a2c-480f-93da-2ee4a363a371;Name=b48e9c2e-76b9-4462-b786-347e3d931f64
Group11.18	.	CDS	2584185	2584376	.	+	0	Parent=GB45157-RA;ID=46c8c230-abc2-4757-886b-392f5b29a46f;Name=46c8c230-abc2-4757-886b-392f5b29a46f
Group11.18	.	CDS	2584843	2585034	.	+	0	Parent=GB45157-RA;ID=46c8c230-abc2-4757-886b-392f5b29a46f;Name=46c8c230-abc2-4757-886b-392f5b29a46f
Group11.18	.	CDS	2591333	2591707	.	+	0	Parent=GB45157-RA;ID=46c8c230-abc2-4757-886b-392f5b29a46f;Name=46c8c230-abc2-4757-886b-392f5b29a46f
Group11.18	.	exon	2591333	2593823	.	+	.	Parent=GB45157-RA;ID=8143a0c0-3162-4082-9cd4-d3ce59797c1c;Name=7d791cf8-1fb2-4da9-8423-f3b4f0183b28
Group11.18	.	exon	2584843	2585034	.	+	.	Parent=GB45157-RA;ID=419ff462-7c94-4a8f-84c6-0a07655ee2b9;Name=33d284d0-5f50-4c4c-be55-49ab6c92c29a
###
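
The `go_annotations`, `provenance`, and `gene_product` values in the GFF3 above appear to be `key=value;...` lists that were percent-encoded (`%3D` for `=`, `%3B` for `;`) so that each fits inside a single column-9 attribute. Below is a minimal Python sketch of unpacking one of them, assuming that encoding holds for every nested value and that there is one annotation per attribute. `parse_attributes` and `parse_nested` are hypothetical helper names, not Apollo code, and the sample string is a shortened copy of the mRNA line.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 column-9 string into a dict of raw (still encoded) values."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = value
    return attrs


def parse_nested(value):
    """Percent-decode a nested 'k=v;k=v' value and split it into a dict."""
    nested = {}
    for field in unquote(value).split(";"):
        if not field:
            continue
        key, _, val = field.partition("=")
        nested[key] = val
    return nested


# Shortened from the round-trip mRNA line (assumption: same layout as the full line).
column9 = (
    "ID=GB45157-RA;Name=transcriptname;"
    "go_annotations=rank%3D1%3Baspect%3DMF%3Bterm%3DGO:0140399"
    "%3Bevidence%3DECO:0000315%3Bnegate%3Dtrue"
)

attrs = parse_attributes(column9)
go = parse_nested(attrs["go_annotations"])
print(go["term"], go["evidence"], go["negate"])
```

Splitting the outer string on `;` first is safe here only because the inner separators are encoded; the inner values are decoded after the outer split.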

@nathandunn
Contributor Author

nathandunn commented Apr 15, 2020

The round trip seems to pull in everything except the custom attributes (e.g. `transcriptattribtute=asdfasdf` is not in the round-trip output). That is fine, and potentially a good feature, since importing arbitrary attributes may be problematic.
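
One way to check this kind of round trip is to compare the attribute keys of a feature before and after. This is a rough Python sketch, not part of the PR; `attribute_keys` is a hypothetical helper, and the two strings are shortened copies of the mRNA lines from the comments above.

```python
def attribute_keys(column9):
    """Return the set of attribute keys in a GFF3 column-9 string."""
    return {field.partition("=")[0] for field in column9.split(";") if field}


# Shortened mRNA attributes from the original export.
exported = (
    "owner=admin@local.host;description=transcriptdescription;"
    "transcriptattribtute=asdfasdf;Alias=transcriptalias;ID=GB45157-RA"
)

# Shortened mRNA attributes from the round-trip export.
round_trip = (
    "owner=admin@local.host;Parent=1424751f-1276-41da-9d34-350499e940c6;"
    "description=transcriptdescription;Alias=transcriptalias;ID=GB45157-RA"
)

dropped = attribute_keys(exported) - attribute_keys(round_trip)
added = attribute_keys(round_trip) - attribute_keys(exported)
print("dropped:", sorted(dropped))
print("added:", sorted(added))
```

This only compares keys. Comparing values would need extra care, because nested values such as timestamps and list quoting differ between the two exports.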

@nathandunn
Contributor Author

I think it is going to be problematic to introduce this without a refactor.

Projects
No open projects
2.6.0 LTS
  
Done
Development

Successfully merging this pull request may close these issues.

import / export entire gene evidence as model (should work) with output evidence with gene models
1 participant