allow upload and download of JSON for functional annotation #2601

Closed
3 tasks
nathandunn opened this issue Mar 31, 2021 · 15 comments · Fixed by #2617

@nathandunn
Contributor

Should support both downloading and uploading as JSON of functional annotation data. Minimally this will include:

  • GO
  • Gene Product
  • Provenance

This may also include attributes, PMIDs, etc., though that may not be necessary if there is no immediate need.

@nathandunn nathandunn added this to To do in 2.6.4 LTS via automation Mar 31, 2021
@nathandunn
Contributor Author

@mbc32

from google doc:

I can add this to the user-interface pretty easily. However, I have some questions:
(1) Do we upload all GO annotations at once or individually, and then separately for Gene Product and Provenance, or do we upload them all together? If done individually, are you doing them one annotation at a time or in a batch?
(2) Putting this in the UI is very doable, but if you are getting it from a remote service and want to plug it in here, doing it via a script that pulls from one web service and adds to another might make more sense. Perhaps even a "load annotation from URL"? If not, doing it from the UI is easy.

@nathandunn
Contributor Author

nathandunn commented Apr 12, 2021

Funtional_annotation_workflow.pdf

response:

  1. All data should be loaded from a single JSON file
  2. When the data is exported from the main database, it may contain errors and be incomplete. For example, some organisms were annotated a long time ago and do not have the with_in keyword. Other problems are GO terms without evidence and obsolete GO terms. Would it be possible to load the data directly into the web form without checking, so the annotators can correct it before it is saved to the database?

@nathandunn
Contributor Author

@mbc32 I would propose loading the output JSON into its own app (or even something like the JSON beautifier) in tree mode.

Once fixed and validated, I would use the web service to load the JSON feature by feature.

If it's too slow, I can write an endpoint for bulk loading. We might be able to add one to the python-apollo library as well.

Anyway, that is my 2 cents on that, but happy to discuss further.

@mbc32
Contributor

mbc32 commented May 4, 2021

Hi @nathandunn
If you could implement the functionality below before you leave, it would be very helpful. Once the functionality is there, we can modify it later when the format of the JSON file has been finalized. If you need more information, please ask me.
Notes from VEuPathDB Apollo meeting:
The aim is to provide a mechanism that enables the annotator to load functional annotation for a single gene via the user interface (Open annotation) from a JSON file.

  1. There should be one mechanism to load 'GO', 'Gene Product', 'Provenance' from a single JSON file.
  2. The format of the JSON file can be finalized later.

@nathandunn
Contributor Author

FYI @rbuels

@nathandunn
Contributor Author

@mbc32 I'd like to have an idea of the JSON you'll have to upload so that you know what it converts into.

If we are going to do it this way, I would probably do something like:

{ go:  [ { } , {} ], provenance: [{ }, {}], gene_product: [{}, {}] }

where the empty objects are the valid annotations already supported by the existing web services. That being said, if you are pulling these out of a database, it would be trivial (and possibly cleaner) to call a web service to do the same thing, but I'm unsure what the curator workflow is, how they are pulling JSON, etc. They would have to be aware of the uniqueName, however.
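As a rough illustration of the envelope sketched above, the following Python snippet builds a payload with the three top-level keys (`go`, `provenance`, `gene_product`) and serializes it. The helper name `build_payload` and the field names inside each annotation object are assumptions made for the example only; the real objects would be whatever the existing web services accept.

```python
import json

# Section names come from the sketch above; everything inside each
# annotation object is a hypothetical placeholder.
SECTIONS = ("go", "provenance", "gene_product")


def build_payload(go=None, provenance=None, gene_product=None):
    """Return a dict holding one list of annotation objects per section."""
    supplied = {"go": go, "provenance": provenance, "gene_product": gene_product}
    payload = {}
    for name in SECTIONS:
        items = supplied[name] or []
        if not all(isinstance(item, dict) for item in items):
            raise TypeError(f"every entry in '{name}' must be a JSON object")
        payload[name] = list(items)
    return payload


payload = build_payload(
    go=[{"goTerm": "GO:0005515"}],             # hypothetical field name
    gene_product=[{"productName": "kinase"}],  # hypothetical field name
)
text = json.dumps(payload, indent=2)
```

Keeping all three keys present, even when a list is empty, means a consumer can iterate over every section without checking for missing keys.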

I could also add it here: https://github.com/galaxy-genome-annotation/python-apollo/

so the command would be: arrow annotations add_go <json_file> etc.

@nathandunn
Contributor Author

What are you pulling in from the existing database?

@mbc32
Contributor

mbc32 commented May 4, 2021

Hi @nathandunn

  1. The JSON schema looks fine to me
  2. We would be getting all existing functional annotation for a gene, then correcting it or adding to it in Apollo.
    Would there be any way to load the data directly into the web forms from a JSON file, bypassing the database and data checks?
  3. For this to work the annotator should not be required to find and copy the uniqueName.

@nathandunn
Contributor Author

nathandunn commented May 4, 2021 via email

@mbc32
Contributor

mbc32 commented May 5, 2021

Hi @nathandunn
I made a JSON schema and example:
apollo_FA_schema_example.tar.gz

There are some points:

  1. Each section must be optional, e.g. for now no genes in VEuPathDB have provenance.
  2. For genes we also need the gene name and symbol, but those belong to a different part of the web service, the Annotation Service (setName, setSymbol). Not sure how to include them.
  3. A gene can have several transcript isoforms. Not sure if we should nest the JSON: {gene:[{/transcript/},{}]}
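The requirement that each section be optional could be handled on the reading side roughly as follows. This is a minimal sketch, not the attached schema: the function name `read_sections` is hypothetical, and the three section names are assumed to match the envelope discussed earlier in the thread.

```python
import json

# Assumed section names; the attached schema may differ.
KNOWN_SECTIONS = ("go", "gene_product", "provenance")


def read_sections(text):
    """Parse uploaded JSON, treating every section as optional."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    unknown = sorted(set(data) - set(KNOWN_SECTIONS))
    if unknown:
        raise ValueError(f"unexpected sections: {unknown}")
    # Missing sections become empty lists so callers need no special cases.
    return {name: data.get(name, []) for name in KNOWN_SECTIONS}


# A gene with GO annotation only, and no provenance or gene product.
sections = read_sections('{"go": [{"goTerm": "GO:0005515"}]}')
```

Rejecting unknown top-level keys is a design choice for the sketch; a more lenient loader could ignore them instead.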

@nathandunn
Contributor Author

@mbc32 I think your schema makes sense. What I'm missing is the process. I'm a little leery of pushing raw JSON through a UI (that's why we have a UI!), though we can do it if that is what is needed.

My understanding of the process is:

  1. you create an annotation in Apollo from structural data
  2. you pull the functional annotations from an existing database where symbol and name match
  3. you push the functional annotation into Apollo matching the name / symbol, etc.

I'm assuming you will do this at the gene and transcript level both?

@mbc32
Contributor

mbc32 commented May 6, 2021

Hi @nathandunn ,
I agree the workflow may be odd. What we are trying to do is have one place, and one place only, where all the correct information is together at the same time.
Workflow:

  1. Export existing functional annotation from our main database to JSON, e.g. a gene with GO annotation.
  2. Import the JSON into Apollo.
  3. Add or update functional annotation, e.g. update the product name, add a new GO term, delete a GO term.
  4. Export the functional annotation from Apollo via GFF.
  5. Load the new functional annotation into our main database, overwriting any existing annotation.

@nathandunn
Contributor Author

@mbc32 I think your workflow makes a lot of sense.

I'm just wondering for step 2 if having a command-line interface would make more sense?

For step 1 and 5, are you doing that via an interface or via scripts?

Are curators doing all of this one at a time or are you doing a bulk load of functional annotations?

Thanks.

@mbc32
Contributor

mbc32 commented May 7, 2021

Hi @nathandunn
Step 2 is done by the annotator while working on a single gene, so doing it via the user interface would be best.
We still have to implement the functionality for step 1.
Step 5 is done in bulk once we release our data.

@nathandunn
Contributor Author

nathandunn commented May 7, 2021

If With / From is not provided, the import should add NOT_PROVIDED:UNKNOWN, and the same for the reference.
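A minimal sketch of that defaulting rule in Python, assuming each GO annotation is a dict. The key names `withOrFrom` and `reference`, and the choice to store With / From as a list, are assumptions for illustration and not necessarily what the server code uses.

```python
PLACEHOLDER = "NOT_PROVIDED:UNKNOWN"


def fill_missing(annotation):
    """Return a copy with placeholders for an absent With / From or reference."""
    filled = dict(annotation)
    # Treat a missing key, None, an empty string, or an empty list as absent.
    if not filled.get("withOrFrom"):
        filled["withOrFrom"] = [PLACEHOLDER]
    if not filled.get("reference"):
        filled["reference"] = PLACEHOLDER
    return filled


incomplete = {"goTerm": "GO:0005515"}  # hypothetical field name
completed = fill_missing(incomplete)
```

The function copies the input rather than mutating it, so the original exported record stays available for comparison.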

nathandunn added a commit that referenced this issue May 7, 2021
2.6.4 LTS automation moved this from To do to Done May 10, 2021
nathandunn added a commit that referenced this issue May 10, 2021
* fixes #2601 when complete

* added reasonable UI

* kind of validating

* added example

* works, but only for the first annotation

* fixed compilation errors

* fixed empty references and withOrFrom

* added

* updated

* server code working

* updated rest doc and added reload

* fixed messages

* fixed messages

* fixed null references

* fixed formatting

* fixed deletions

* removed console logs

* updated REST api

* added changelog

* fixed calls to clear

2 participants