Information 'note' for organism-gene terms: has_gene_template #161

Closed
nataled opened this issue Sep 27, 2019 · 10 comments

@nataled
Collaborator

nataled commented Sep 27, 2019

The sentence at the top of organism-gene term pages that indicates what the page is describing (all products of geneG in organismO) currently relies on the parent term's PRO-short-label for the name of geneG. This should instead use the name given in the has_gene_template line, which will increase the number of terms with such a message by over 30,000. It would also be possible to keep the current mechanism ONLY if there is no has_gene_template line for the term of interest and ONLY if the parent term is gene level. Fewer than 400 terms would be covered by that addition, so it might be okay to skip it and rely only on the has_gene_template line.

Caveat: Some entries can be encoded by multiple genes (the entry whose statement is quoted below is one example). For these, we have two choices:

  1. Suppress the message in such cases, or
  2. List them out.

For the example above, the statement currently is "This page represents a class of proteins encompassing all the protein products of the VCY1 gene in human." but it would become "This page represents a class of proteins encompassing all the protein products of the VCY and VCY1B genes in human."

The term with the largest number of genes (37) is PR:P62593. The distribution of terms by number of genes is:

Genes per term    Number of terms
1                 98293
2                 248
3                 30
4                 10
5                 10
6                 3
7                 7
8                 2
9                 1
10                2
12                3
13                1
14                2
16                2
17                1
18                2
19                1
24                3
37                1
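
To gauge how many terms the multi-gene caveat touches, the distribution above can be tallied directly. The following is a minimal sketch; the counts are copied from the list above, and the helper name `count_terms_above` is only illustrative.

```python
# Genes per term -> number of terms, copied from the distribution above.
distribution = {
    1: 98293, 2: 248, 3: 30, 4: 10, 5: 10, 6: 3, 7: 7, 8: 2, 9: 1, 10: 2,
    12: 3, 13: 1, 14: 2, 16: 2, 17: 1, 18: 2, 19: 1, 24: 3, 37: 1,
}


def count_terms_above(dist, cutoff):
    """Number of terms whose gene count is strictly greater than `cutoff`."""
    return sum(n_terms for n_genes, n_terms in dist.items() if n_genes > cutoff)


total_terms = sum(distribution.values())
multi_gene_terms = count_terms_above(distribution, 1)

print(f"{multi_gene_terms} of {total_terms} terms have more than one gene")
# prints "329 of 98622 terms have more than one gene"
```

With a cutoff of 1, only 329 terms would need any special handling.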

@nataled nataled added the Web Site Report issue with PRO website (proconsortium.org) label Sep 27, 2019
@nataled
Collaborator Author

nataled commented Sep 25, 2020

@nataled to find examples where there is no has_gene_template line for an org-gene term.

@nataled
Collaborator Author

nataled commented Sep 25, 2020

Karen suggests we simply use the phrase "This page represents a class of proteins encompassing all the protein products of multiple gene templates (listed below) in human." This could be applied at whatever cutoff, but considering that there are relatively few cases, just make the cutoff 1.

@nataled
Collaborator Author

nataled commented Oct 9, 2020

Final decision: "This page represents a class of proteins encompassing all the protein products of the multiple gene templates listed below in human."

@Julie-Cowart
Collaborator

I found an example that has no gene template (https://proconsortium.org/app_test/entry/PR%3A000044795/). I am fine with letting it revert to the current behavior (use the parent's short name). That is easy enough, but I will have to add an explicit check that the parent category is gene.

We currently don't show any message for terms with high-level parents (such as PR:000000001). This is because we use a different, very simplified template for these pages rather than the organism-gene page template: the logic for finding forms etc. goes up to the parent, and that wouldn't work for such high-level parents. For example, https://proconsortium.org/app/entry/PR%3AP62593/. If we use the gene-template name, that is no longer a problem. I assume we want this to work, but it will be more complex and so will need some extra testing.

@Julie-Cowart
Collaborator

Now implemented on the test site. Here are some cases; the changes can be compared using the linked URL and the production version. I list the parent PR:000000001 examples separately since they were suppressed before, while with other parents the short name was used regardless of category (now we check whether the category is gene).

  • 1 gene template (always show message)
    • same parent short name and gene template name PR:Q15796 - no change
    • parent short name and gene-template name are slightly different PR:000044798 - slight name change
    • with family parent PR:P11035 - fixes misleading message
    • with PR:000000001 parent PR:000044542 - adds missing message
  • multiple gene templates (always show message but say 'multiple')
    • with gene parent PR:Q9UEU5 - improves misleading message
    • with family parent PR:P84247 - improves misleading message
    • with PR:000000001 parent PR:P62593 - adds missing message
  • no gene template (message is conditional)
    • with gene parent: PR:000044795 - no change since this is fallback
    • with family parent PR:P25240 - drops misleading message
    • with organism-seqgroup parent: PR:A4GBY0 - drops misleading message
    • with PR:000000001 parent PR:000010532 - no change since no message possible
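
The cases above amount to a small decision rule. The following is a rough sketch of that rule, not the site's actual code: the function name `gene_message`, its parameters, and the `"gene"` category string are all hypothetical. The single-template and multiple-template wording follows the sentences quoted earlier in this thread.

```python
PREFIX = ("This page represents a class of proteins encompassing "
          "all the protein products of ")


def gene_message(gene_templates, parent_category, parent_short_label, organism):
    """Return the header sentence for an organism-gene page, or None.

    gene_templates: names from the term's has_gene_template lines (may be empty).
    parent_category / parent_short_label: describe the parent term (hypothetical fields).
    """
    if len(gene_templates) == 1:
        # One gene template: always show the message, using the template name.
        return f"{PREFIX}the {gene_templates[0]} gene in {organism}."
    if len(gene_templates) > 1:
        # Several gene templates: always show the message, but say 'multiple'.
        return f"{PREFIX}the multiple gene templates listed below in {organism}."
    if parent_category == "gene" and parent_short_label:
        # No gene template: fall back to the parent's short label,
        # but only when the parent is a gene-level term.
        return f"{PREFIX}the {parent_short_label} gene in {organism}."
    # No gene template and a family, organism-seqgroup, or PR:000000001 parent.
    return None
```

For example, `gene_message([], "family", "XYZ", "human")` returns `None`, which corresponds to the "drops misleading message" cases in the list.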

@nataled
Collaborator Author

nataled commented Oct 14, 2020

The 'no gene template' cases could possibly have a bit more done with them. First, the reason the has_gene_template line would be missing is that there is no entry for the relevant gene in any model organism database, Ensembl, or NCBI GeneID. Second, the PRO-short-label for organism-gene terms is intended to be the name of the gene in the prototype organism, usually human. We used the parent short label because the same applies there: it is the gene name in the prototype organism, but without the organism name to confound things. The vast majority of the terms with family or PR:000000001 parents could be fixed to show the correct message, but some cases would take work: either revising the label, deriving rules that remove the organism name from all short labels (there's a hodgepodge of types), or handling the simple cases and ignoring the rest.

Another possibility is for me to create a look-up table for all genes, which could be regenerated for each release.

@nataled
Collaborator Author

nataled commented Oct 18, 2020

I've created the aforementioned look-up table. It provides the correct gene name to display on this line for all organism-gene PRO terms that have only a single gene template. I will produce this with every release. You can check it out on hershey at /home/dnatale/data/PRO_orggenes.dat
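
As an illustration of how such a table might be consumed, here is a minimal sketch. The file format is not described in this thread, so the layout used here (one tab-separated `PRO ID<TAB>gene name` pair per line) is purely an assumption, as are the function name `parse_orggenes` and the placeholder IDs.

```python
def parse_orggenes(text):
    """Parse look-up table text into {PRO ID: gene name}.

    ASSUMPTION: each non-empty line is 'PRO_ID<TAB>gene_name'. The real
    PRO_orggenes.dat layout may differ.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        pro_id, gene_name = line.split("\t", 1)
        table[pro_id] = gene_name.strip()
    return table


# Placeholder content, not real PRO data.
sample = "PR:EXAMPLE1\tGENE1\nPR:EXAMPLE2\tGENE2\n"
lookup = parse_orggenes(sample)
print(lookup.get("PR:EXAMPLE1"))  # prints "GENE1"
```

A term missing from the table (`lookup.get(...)` returning `None`) would then fall through to whatever fallback behavior is chosen.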

@nataled
Collaborator Author

nataled commented Dec 4, 2020

This file will henceforth be available from the appropriate internal release folder, /data/pir/projects/pro/releaseNN

@nataled nataled added the Discuss label Dec 4, 2020
@nataled
Collaborator Author

nataled commented Dec 4, 2020

Example for organism-gene with 'protein' parent and without has_gene_template : PR:Q94FT4

@nataled
Collaborator Author

nataled commented Dec 11, 2020

The 'no gene template' (and other cases) can be improved further, but the current solution will already greatly improve the information. Accordingly, a new issue has been created, #215.

@nataled nataled closed this as completed Dec 11, 2020