Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide an admin facility to add articles using biopax file from url #881

Merged
merged 15 commits into from
Nov 19, 2020

Conversation

metincansiper
Copy link
Contributor

@metincansiper metincansiper commented Oct 14, 2020

@maxkfranz
Copy link
Member

The entity names are not rendered on entities but when I click them I see that their name and also see that they are actually associated. It must be a UI issue in rendering entities.
When I click on an interaction it still does not show the last UI step. I think that it also probably is a UI issue similar to the case of entities.

You could do a diff on doc.json() to see what's different between a BioPAX-generated document and a manually created one:

  1. Create a document via BioPAX.
  2. Create a document with the same entities and interactions as (1) manually.
  3. Diff the JSON: doc1.json() versus doc2.json()

@maxkfranz
Copy link
Member

@metincansiper & @jvwong : To test this, it might be simplest to merge this into unstable so that this can be tried out on unstable.factoid.baderlab.org. The unstable instance should use an environment variable to use the latest factoid-converters.

@metincansiper, do you have a publicly-accessible instance of the latest factoid-converters?

@maxkfranz
Copy link
Member

The entity names are not rendered on entities but when I click them I see that their name and also see that they are actually associated. It must be a UI issue in rendering entities.

An element may have to be complete (ele.complete()). This normally marks that the user has performed some action to complete the entity (e.g. clicking the interaction type manually versus leaving it the default value -- in both cases there is still an associated type).

@metincansiper
Copy link
Contributor Author

metincansiper commented Oct 15, 2020

An element may have to be complete (ele.complete()). This normally marks that the user has performed some action to complete the entity (e.g. clicking the interaction type manually versus leaving it the default value -- in both cases there is still an associated type).

Thank you, the problem has been fixed after calling ele.complete() on elements.

@metincansiper
Copy link
Contributor Author

@metincansiper, do you have a publicly-accessible instance of the latest factoid-converters?

No. Is the factoid-converters deployed automatically to "http://biopax.baderlab.org/convert" from either master or unstable of factoid-converters?

@metincansiper
Copy link
Contributor Author

One other thing is what must be the email address used here (eb1564d#diff-24330a3dafb9473fe683be4b0a77ec948baf915b8a675a9fd6981009332aa77aR91). I did not want to assign it to the pc help email address to avoid the email that would be sent. I wanted to use an address most probably non-existing like pc@biofactoid.com.

The other question is should we avoid sending emails in a way when multiple articles are created through a biopax file url.

@maxkfranz
Copy link
Member

maxkfranz commented Oct 15, 2020 via email

@jvwong
Copy link
Member

jvwong commented Oct 15, 2020

@metincansiper, do you have a publicly-accessible instance of the latest factoid-converters?

No. Is the factoid-converters deployed automatically to "http://biopax.baderlab.org/convert" from either master or unstable of factoid-converters?

The unstable/master instances (running on XXX.XXX.81.174 ) can access a local converters instance exposed on port 9999. Currently, it is accessing the prod converter version, but the factoid-unstable.sh can be updated to point to local. Just tell me what branch/commit to run.

@maxkfranz
Copy link
Member

maxkfranz commented Oct 15, 2020

I think it's this branch here biopax_to_factoid: https://github.com/PathwayCommons/factoid-converters/tree/biopax_to_factoid

It's got a dockerfile, so that's probably the easiest way to deploy it. Manually setting up all the java stuff on a vm would be a pain.

@jvwong
Copy link
Member

jvwong commented Oct 15, 2020

... It may also be a good idea to have a flag in the document itself to indicate that these documents are generated from PC (something like source: pc | author).

We've met this naming problem before. Here's some options:

  • BioPAX: uses a bp:standardName of PhosphoSitePlus in bp:Provenance
  • Identifiers.org: provides a prefix phosphosite.protein and name PhosphoSite Protein

So any of those work for me.

@jvwong
Copy link
Member

jvwong commented Oct 15, 2020

I think it's this branch here biopax_to_factoid: https://github.com/PathwayCommons/factoid-converters/tree/biopax_to_factoid

It's got a dockerfile, so that's probably the easiest way to deploy it. Manually setting up all the java stuff on a vm would be a pain.

I've got pathwaycommons/factoid-converters:biopax_to_factoid running. Turns out the unstable.factoid.baderlab.org instance was always pointing at a local converters via BIOPAX_CONVERTER_URL="http://localhost:9999/convert/v2/"

@metincansiper
Copy link
Contributor Author

The document creation route could have a new parameter (i.e. email: true | false) to control whether emails are sent. We may need a similar flag for avoiding tweets on submission.
It may also be a good idea to have a flag in the document itself to indicate that these documents are generated from PC (something like source: pc | author).

I added the controls for email on doc creation and the source. However, I guess that there may not be any need for updating anything to disable tweet on submit. I think so because the tweeting is done when the submit is done through requesting this service endpoint (https://github.com/PathwayCommons/factoid/blob/unstable/src/server/routes/api/document/index.js#L1426). In our case I just call doc.submit(). Am I right?

@jvwong
Copy link
Member

jvwong commented Oct 16, 2020

The document creation route could have a new parameter (i.e. email: true | false) to control whether emails are sent. We may need a similar flag for avoiding tweets on submission.
It may also be a good idea to have a flag in the document itself to indicate that these documents are generated from PC (something like source: pc | author).

I added the controls for email on doc creation and the source. However, I guess that there may not be any need for updating anything to disable tweet on submit. I think so because the tweeting is done when the submit is done through requesting this service endpoint (https://github.com/PathwayCommons/factoid/blob/unstable/src/server/routes/api/document/index.js#L1426). In our case I just call doc.submit(). Am I right?

Yes.

The editor TaskView coordinates what happens after the user clicks the submit button: i.e. set the doc status 'public' which leads to 1. updating related papers 2. sending thank you email and 3. tweeting it

@jvwong
Copy link
Member

jvwong commented Oct 16, 2020

I think it's this branch here biopax_to_factoid: https://github.com/PathwayCommons/factoid-converters/tree/biopax_to_factoid
It's got a dockerfile, so that's probably the easiest way to deploy it. Manually setting up all the java stuff on a vm would be a pain.

I've got pathwaycommons/factoid-converters:biopax_to_factoid running. Turns out the unstable.factoid.baderlab.org instance was always pointing at a local converters via BIOPAX_CONVERTER_URL="http://localhost:9999/convert/v2/"

I think I figured out that your converter endpoint /convert/v2/biopax-url-to-json wants an archive of some sort (tar.gz)? Is this true?

@jvwong
Copy link
Member

jvwong commented Oct 16, 2020

There's no grounding information coming out of that converter: see PathwayCommons/factoid-converters#14

Right now, we could probably fake it, but we need some sort of object that looks like the grounding information:

Currently:

{
			"type": "protein",
			"name": "Oct1",
			"id": "e35b8c32-165e-446c-be2e-c3a4e946ffaa"
		}

Desired:

  {
    "namespace": "ncbi",
    "type": "ggp",
    "dbName": "NCBI Gene",
    "dbPrefix": "ncbigene",
    "id": "6580",
    "organism": "9606",
    "organismName": "Homo sapiens",
    "name": "SLC22A1",
    "synonyms": [
      "HOCT1",
      "OCT1",
      "oct1_cds",
      "solute carrier family 22 member 1",
      "organic cation transporter 1",
      "solute carrier family 22 (organic cation transporter), member 1"
    ],
...
  },

Copy link
Member

@jvwong jvwong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grounding should be solved by mapping rather than search.

name
};

return searchGrounding( opts ).then( res => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BioPAX source will provide an bp:EntityReference with bp:UnificationXref , in the case of PhosphoSitePlues, some uniprot knowledgebase id.

We'll have to map those rather than search.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.

wget --quiet \
  --method POST \
  --header 'accept: application/json' \
  --header 'content-type: application/x-www-form-urlencoded' \
  --body-data 'from=ACC&to=P_ENTREZGENEID&query=P14873&format=tab' \
  --output-document \
  - https://www.uniprot.org/uploadlists/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@metincansiper & @jvwong, can this be marked as resolved?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. We can always pop in the grounding-search /map here instead, now or later @metincansiper

@metincansiper
Copy link
Contributor Author

How many documents are there in total? If the 149 are a small number and they don't include any foundational articles, then it's fine to just discard them.

@maxkfranz ena entities are in 147 different documents out of 16557 documents. 16557 is the number of documents remaining after excluding the ones including entities with no db xref. Before excluding them there are 16567 documents in total. I think 147 looks pretty small as compare to whole as you mentioned.

@maxkfranz
Copy link
Member

OK, that makes sense, Metin. I think it’s fine to exclude those documents for now. We can reevaluate later if needed.

At the end of the conversion, we should have a list of excluded documents that we can review manually. Maybe a CSV that we could import into a spreadsheet (something like id, name, pc url, columns)

@metincansiper
Copy link
Contributor Author

OK, that makes sense, Metin. I think it’s fine to exclude those documents for now. We can reevaluate later if needed.

Okay, then I think that the code is ready for review.

@maxkfranz
Copy link
Member

OK. While Jeff and I review, could you put together that spreadsheet I mentioned?

It looks like there’s also a conflict that needs to be resolved before merging.

Thanks

@metincansiper
Copy link
Contributor Author

OK. While Jeff and I review, could you put together that spreadsheet I mentioned?

Okay, is this row you mentioned (id, name, pc url, columns) all for a document? What do you expect to have in columns field?

@maxkfranz
Copy link
Member

I just meant that it should include these columns:

  • ID
  • Name
  • PC URL

If you think of other columns that would be useful, feel free to add them.

@metincansiper
Copy link
Contributor Author

Actually paxtools provide me with the pubmed ids of interactions and I make the documents by grouping interactions by their pubmed ids. I can query the document name as well by using the same way as it is done in factoid. However, I guess there may not be a PC URL for these documents/pmid's. What do you think?

@maxkfranz
Copy link
Member

The pubmed id is the main thing, so that’s fine

@maxkfranz
Copy link
Member

A list with just the pubmed ids would be fine

@metincansiper
Copy link
Contributor Author

The pubmed id is the main thing, so that’s fine

Okay, where were you meaning to download(factoid client side, factoid server side or java server side)?

@maxkfranz
Copy link
Member

It doesn't matter to me. Wherever it's easiest. You could even just log them to the console and copy to Google Sheets.

We just need the list once. The list generation probably won't be a proper, reusable feature of the system, since we don't expect to need it more than once.

Does that make sense?

@metincansiper
Copy link
Contributor Author

metincansiper commented Nov 12, 2020

It doesn't matter to me. Wherever it's easiest. You could even just log them to the console and copy to Google Sheets.
We just need the list once. The list generation probably won't be a proper, reusable feature of the system, since we don't expect to need it more than once.
Does that make sense?

Sure I created the Google Sheet with pmids: https://docs.google.com/spreadsheets/d/1Aafdwi3f9pmlPCD_zrGIrUo_s3hp9nIEdbux8pAThig/edit?usp=sharing

@maxkfranz
Copy link
Member

Great, Metin. Could you add a column that just links out to pubmed for the ID? We'll need to go over the articles on Pubmed to determine whether any of them are important.

@metincansiper
Copy link
Contributor Author

Great, Metin. Could you add a column that just links out to pubmed for the ID? We'll need to go over the articles on Pubmed to determine whether any of them are important.

I updated the document in that way.

@maxkfranz maxkfranz self-requested a review November 16, 2020 16:30
import fs from 'fs';
import { URLSearchParams } from 'url';

import { exportToZip, EXPORT_TYPES } from './export';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file needs the conflicts to be resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxkfranz I resolved the conflict and merged the current unstable in this branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks Metin

Copy link
Member

@maxkfranz maxkfranz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there are any red flags, let's merge this in for testing on the master instance.

@jvwong
Copy link
Member

jvwong commented Nov 17, 2020

@metincansiper
Copy link
Contributor Author

@jvwong
Copy link
Member

jvwong commented Nov 18, 2020

@metincansiper does this depend on https://github.com/PathwayCommons/factoid-converters/tree/biopax_to_factoid ?

@jvwong yes

OK, then let's get that converter code in as well.

@maxkfranz
Copy link
Member

Great. Let's merge in the convertors to the latest, and let's merge this in. Then we can tag and release.

The email template has been updated to the latest version of the text from the Google Doc. I verified that the emails are sent by hitting the refresh button on the admin UI, so we're ready to go. Once we've deployed prod, we can send out the emails for one of the already submitted factoids.

@metincansiper
Copy link
Contributor Author

OK, then let's get that converter code in as well.

I was not able to merge and push the changes into master directly. I got the following logs:

remote: Resolving deltas: 100% (1/1), completed with 1 local object.
remote: error: GH006: Protected branch update failed for refs/heads/master.
remote: error: Required status check "continuous-integration/travis-ci" is expected.
To https://github.com/PathwayCommons/factoid-converters.git
 ! [remote rejected] master -> master (protected branch hook declined)
error: failed to push some refs to 'https://github.com/PathwayCommons/factoid-converters.git'

Therefore, I created a PR instead (PathwayCommons/factoid-converters#18). I am not allowed to merge it in yet. It is pending as below right now, I am not able to click somewhere to visit the travis run:

Screen Shot 2020-11-18 at 3 08 24 PM

@maxkfranz
Copy link
Member

@metincansiper, I've made you an admin for the convertors repo, so you should be able to override and merge form now on, if you need. Travis is just really slow lately. It looks like the builds haven't even started.

@metincansiper metincansiper merged commit 4303103 into unstable Nov 19, 2020
@metincansiper
Copy link
Contributor Author

Okay I merged both of this one and the PR in factoid-converters.

@jvwong jvwong deleted the pc2_data branch December 9, 2020 17:17
@metincansiper
Copy link
Contributor Author

@jvwong considering this comment (#881 (comment)) and so PathwayCommons/factoid-converters#14:

We are currently searching the grounding by the xrefs coming from converter and using the results as here (https://github.com/PathwayCommons/factoid/blob/unstable/src/server/routes/api/document/index.js#L1675).

Here is the example of an xref currently coming from factoid converters (I created a PR for adding the org field it is not merged yet):

xref:  { db: 'hgnc symbol', id: 'EGFR', org: '9606' }

and this is the grounding that is retrieved in factoid side using that xref:

{
  namespace: 'ncbi',
  type: 'ggp',
  dbName: 'NCBI Gene',
  dbPrefix: 'ncbigene',
  id: '1956',
  organism: '9606',
  organismName: 'Homo sapiens',
  name: 'EGFR',
  synonyms: [
    'ERBB',
    'ERBB1',
    'ERRP',
    'HER1',
    'NISBD2',
    'PIG61',
    'mENA',
    'epidermal growth factor receptor',
    'avian erythroblastic leukemia viral (v-erb-b) oncogene homolog',
    'cell growth inhibiting protein 40',
    'cell proliferation-inducing protein 61',
    'epidermal growth factor receptor tyrosine kinase domain',
    'erb-b2 receptor tyrosine kinase 1',
    'proto-oncogene c-ErbB-1',
    'receptor tyrosine-protein kinase erbB-1'
  ],
  dbXrefs: [
    { db: 'MIM', id: '131550' },
    { db: 'HGNC', id: 'HGNC:3236' },
    { db: 'Ensembl', id: 'ENSG00000146648' }
  ],
  typeOfGene: 'protein-coding'
}

Are you suggesting to send the grounding as the second json directly from the converters instead of sending the xref from converter (and so not retrieving grounding from the factoid based on the xref as we are doing now but using the input from the converter as is)?

@jvwong
Copy link
Member

jvwong commented Feb 26, 2021

I think things have improved a lot since that comment and maybe its OK now.

I think the original problem was that there wasn't any grounding info at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants