Cell types qualified by anatomical location #455

Open
cmungall opened this issue Sep 21, 2020 · 20 comments

Comments

@cmungall
Collaborator

On the Translator Relay call, the SPOKE team mentioned difficulties in mapping some types, e.g. food and cell types.

In fact we have cell types: https://biolink.github.io/biolink-model/docs/Cell

Copied from Zoom chat:

it is the anatomy localized AnatomyCellType that we have had some trouble with, e.g.
identifier: UBERON:0002185/CL:0002368
name: respiratory epithelial cells in bronchus

I can comment further but input requested from SPOKE team

@karthiksoman

It would be great if we could sort out this data modeling issue. In SPOKE, "AnatomyCellType" nodes have a heterogeneous ID built from two ontology identifiers. For example, we have the SPOKE ID "UBERON:0001155/CL:0000111" for the "peripheral nerve/ganglion in colon" node. So we are not sure how to formulate a CURIE to feed into the node normalization service to fetch the preferred Biolink ID.
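To make the ID problem concrete, here is a minimal sketch of how such a composite identifier could be split into its two CURIEs before any lookup. The function name `split_composite_id` is hypothetical, and the sketch assumes the "anatomy first, cell type second" order seen in the example above; it does not call the node normalization service.

```python
from typing import Tuple


def split_composite_id(composite_id: str) -> Tuple[str, str]:
    """Split a SPOKE-style 'UBERON:.../CL:...' ID into (anatomy CURIE, cell CURIE)."""
    # Assumption: exactly one "/" separates the two CURIEs.
    anatomy_curie, cell_curie = composite_id.split("/")
    for curie, prefix in ((anatomy_curie, "UBERON"), (cell_curie, "CL")):
        if not curie.startswith(prefix + ":"):
            raise ValueError(f"expected a {prefix} CURIE, got {curie!r}")
    return anatomy_curie, cell_curie


print(split_composite_id("UBERON:0001155/CL:0000111"))
# ('UBERON:0001155', 'CL:0000111')
```

Each half is an ordinary CURIE on its own; the open question in this thread is how to represent the combination.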

@karthiksoman

In addition to the above query, it would be great if you could let us know whether there is a way to map the following 4 node types to a Biolink entity. Currently I do not find an appropriate Biolink mapping for these node types.
(1) SideEffect
(2) Food
(3) PharmacologicClass
(4) Nutrient

@cmungall
Collaborator Author

I see. What you are describing is what is called post-composition or post-coordination.

This is often confusing because the ontology will frequently pre-compose the term you need. For example, the original example was respiratory epithelial cell in the bronchus. This is already coordinated in CL as http://purl.obolibrary.org/obo/CL_0002328, and if you load the ontology KG you will see the part-of to bronchus and the is-a (transitive) to respiratory epithelial cell.

If you don't have the concept pre-composed then we can quickly make one for you. In fact we have set up a workflow for you whereby you give us a 2 column TSV and we give you a CL ID, either existing or newly created.

However, there may still be scenarios where you want to post-compose (CL->anatomy or otherwise). Here is what I recommend:

  • this is still a cell type just like any other cell type in CL, so just classify as bl:cell
  • create a bl:subClassOf link to the core CL concept (e.g. neuron)
  • create a part-of link to the anatomical concept (e.g. colon)
  • it's up to you what ID scheme you use. However, I might recommend that we do something like hash the OWL class expression (in this case 'peripheral nerve cell' AND part-of SOME colon).

There is a lot of literature on this topic; I will summarize some of it here later. In general I would recommend pre-composition for your scenario.
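The post-composition recommendations above can be sketched roughly as follows. This is an illustration under assumptions, not an agreed scheme: the `EXAMPLE` prefix, the textual form of the class expression, the choice of SHA-256 truncated to 16 hex characters, and the spellings `biolink:Cell`, `biolink:subclass_of` and `biolink:part_of` (standing in for bl:cell, bl:subClassOf and part-of) are all placeholders.

```python
import hashlib


def post_composed_cell(cell_curie, anatomy_curie, prefix="EXAMPLE"):
    """Build a node plus two edges for '<cell> AND part-of SOME <anatomy>'."""
    # Assumption: a fixed textual form of the class expression is what gets hashed,
    # so the same pair always yields the same ID.
    expression = f"{cell_curie} and part_of some {anatomy_curie}"
    digest = hashlib.sha256(expression.encode("utf-8")).hexdigest()[:16]
    node_id = f"{prefix}:{digest}"
    node = {"id": node_id, "category": "biolink:Cell", "expression": expression}
    edges = [
        # Link to the core CL concept.
        {"subject": node_id, "predicate": "biolink:subclass_of", "object": cell_curie},
        # Link to the anatomical concept.
        {"subject": node_id, "predicate": "biolink:part_of", "object": anatomy_curie},
    ]
    return node, edges


node, edges = post_composed_cell("CL:0000111", "UBERON:0001155")
print(node["id"], [edge["predicate"] for edge in edges])
```

Hashing the expression keeps the ID stable across weekly rebuilds without anyone having to maintain a registry, at the cost of an opaque identifier.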

@cmungall
Collaborator Author

> Currently I do not find an appropriate biolink mapping for these node types.

We may want to make specific tickets

> (1) SideEffect

Would PhenotypicFeature work here?

> (2) Food

See #248

> (3) PharmacologicClass
> (4) Nutrient

I'll get to these later, we have ongoing discussion about roles

@karthiksoman

Thank you Chris for the reply. This is indeed helpful. Pre-composition seems to be a pretty elegant solution. I have two questions about the pre-composition workflow that you mentioned.
(1) Firstly, regarding the two column TSV that you mentioned, I presume each column is the ID of a particular class. For example, in the case of "UBERON:0001155/CL:0000111", one column will be "UBERON:0001155" and the other column will be "CL:0000111". Please correct me if this is not the case.
(2) Where can I pursue the workflow that you mentioned? Is it some sort of pre-composition service or Python code?
It would be great if you could clarify these.
Thank you.
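As an illustration of the layout presumed in question (1), the following sketch writes composite IDs out as a two-column TSV. The header names `anatomy_id` and `cell_type_id` and the column order are assumptions; the thread does not fix a format.

```python
import csv
import io


def composite_ids_to_tsv(composite_ids):
    """Serialize 'UBERON:.../CL:...' IDs as a two-column, tab-separated table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Hypothetical header row; drop it if a headerless file is expected.
    writer.writerow(["anatomy_id", "cell_type_id"])
    for composite_id in composite_ids:
        anatomy_id, cell_type_id = composite_id.split("/")
        writer.writerow([anatomy_id, cell_type_id])
    return buffer.getvalue()


print(composite_ids_to_tsv(["UBERON:0001155/CL:0000111", "UBERON:0002185/CL:0002368"]))
```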

@cmungall
Collaborator Author

cmungall commented Oct 9, 2020

I can coordinate getting the cell types pre-composed in CL. Any format will do. If you have a list already I'll take a look to see if they are appropriate for pre-composition. There are no hard and fast rules here, but generally "kidney macrophage" is a reasonable term and "epithelial cell of left pinky" is not.

@karthiksoman

Thanks Chris for the reply. I haven't made the list yet, but I can make it and am more than happy to send it to you. We were thinking of applying this pre-composition process to the "AnatomyCellType" SPOKE nodes during the weekly update of SPOKE. In that case, can you please let us know if there is a way to automate the pre-composition process?

@nlharris
Contributor

Does anyone know if there is a way to automate the pre-composition process?

@karthiksoman

As per the discussion we had during the data modelling meeting (held on Sept-09-2021), I am stating here the broader context in which we use the "AnatomyCellType" node type in our knowledge graph.
We connect "AnatomyCellType" with "Gene" using the "AnatomyCellType-expresses-Gene" edge type. This edge represents gene expression in a specific cell type in a specific tissue, taken from the Human Protein Atlas.
Apart from that, we also connect "AnatomyCellType" nodes to their respective "Anatomy" and "CellType" nodes (using "AnatomyCellType-isin-Anatomy" and "AnatomyCellType-isin-CellType").
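The three edge types described above can be pictured with a small sketch. The triple layout and the function name are illustrative only, and `GENE:EXAMPLE` is a made-up placeholder rather than a real gene identifier; only the edge type names come from the description.

```python
def anatomy_cell_type_edges(composite_id, gene_id):
    """Return (subject, edge_type, object) triples for one AnatomyCellType node."""
    anatomy_id, cell_type_id = composite_id.split("/")
    return [
        # Expression of a gene in this cell type within this tissue.
        (composite_id, "AnatomyCellType-expresses-Gene", gene_id),
        # Links back to the two component nodes.
        (composite_id, "AnatomyCellType-isin-Anatomy", anatomy_id),
        (composite_id, "AnatomyCellType-isin-CellType", cell_type_id),
    ]


for triple in anatomy_cell_type_edges("UBERON:0001155/CL:0000111", "GENE:EXAMPLE"):
    print(triple)
```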

@nlharris
Contributor

We discussed this today during the Help Desk call.

@sierra-moxon
Member

@karthiksoman - any chance you have a file for this yet?

@karthiksoman

Hi Sierra. I am attaching the AnatomyCellType file from SPOKE to post-compose to Cell Ontology, as we discussed. Please let me know if there is anything else that needs to be done from our side.
SPOKE_AnatomyCelltype_file_to_postcompose_Nov_18_2021.csv

@sierra-moxon
Member

@karthiksoman - Have we done the resolution necessary here? Or are you expecting more terms to be added/submitted?

@karthiksoman

@sierra-moxon Thanks for the reminder. The last update from my end was to consolidate all the Anatomy-Cell type nodes in SPOKE and share them with you, so that they could be post-composed to Cell Ontology. From my end, there are no further additions. May I know if they are now post-composed in Cell Ontology?

@mbrush
Collaborator

mbrush commented May 11, 2022

Hi all. I want to revisit this decision to pre-compose anatomy-specific cell type classes in light of recent work toward capturing more statement semantics in qualifiers. One of the guiding principles we established for this work was to not create dependencies on external ontologies when a concept can be represented by post-composition with subject/object qualifiers. For example, we decided (with Chris's recommendation) that we would represent exposures to some entity using the pattern Entity (qualifier:'Exposure'), rather than use or submit the term to an ontology like ECTO. The same goes for things like 'severe bleeding' (post-compose, rather than request a term from HP) or 'late stage ebola' (post-compose, rather than submit a term request to a disease ontology). Similarly, the pattern that was proposed for anatomy-specific cell types was to use the existing CL term for the anatomy-agnostic cell type as the S/O node IRI, and use a subject_location qualifier to capture an Uberon term indicating the anatomical context of this cell type - e.g. Macrophage (qualifier:kidney).

@cmungall Your recommendation here to add pre-composed terms to CL and use them as subject node IRIs seems to go against this principle. Perhaps you see something different about the anatomy-specific cell type use case that makes pre-composing acceptable here (e.g. ease of getting into the relevant ontology, where there is precedent for these types of classes). But I just want to be careful that we are as principled and consistent as possible in how we represent a given type of semantic in our models. For example, we have already encountered other cases where we want to constrain the anatomical context of other types of entities (biological processes, molecular activities, medical procedures), and there we resorted to post-composing (as it was deemed not appropriate to create anatomy-specific classes in an existing ontology in these cases). What are the implications if we allow anatomical context to be pre-composed for cell types, but not for other types of entities/concepts? Having SPOKE do this means that pre-composition becomes the standard way to represent anatomy-specific cell types - so other data providers also need to go through the process of requesting terms from CL and waiting for IRIs to be minted.

I am happy to revisit our earlier principles/decisions - and consider if we might allow for an approach where we use pre-composition in some cases but not others - if we can find some principled rationale to guide such decisions. Maybe this is something we want to discuss on a DM call soon - as we are collecting more and more use cases where there is a potential for pre-composition - and we need to establish some clear rules for when to do this. And also consider how we might create/use tools to translate between pre- and post-composed representations - if this provides a solution to the problem.
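As a rough sketch of what translating between the two representations could look like, the following uses a lookup table from (cell type, anatomy) pairs to pre-composed CL terms. The single table entry is the bronchus example from earlier in this thread; the dictionary layout and the key name `subject_location` as used here are assumptions, not a Biolink API.

```python
# (anatomy-agnostic CL term, Uberon term) -> pre-composed CL term.
# Only entry: respiratory epithelial cell + bronchus, from earlier in the thread.
PRECOMPOSED = {("CL:0002368", "UBERON:0002185"): "CL:0002328"}


def to_pre_composed(cell_curie, anatomy_curie, table=PRECOMPOSED):
    """Return the pre-composed CL term for a qualified pair, or None if unknown."""
    return table.get((cell_curie, anatomy_curie))


def to_post_composed(curie, table=PRECOMPOSED):
    """Unfold a pre-composed term into a core term plus a location qualifier."""
    for (cell_curie, anatomy_curie), pre_composed in table.items():
        if pre_composed == curie:
            return {"id": cell_curie, "subject_location": anatomy_curie}
    # Terms missing from the table are passed through without a qualifier.
    return {"id": curie, "subject_location": None}


print(to_pre_composed("CL:0002368", "UBERON:0002185"))
print(to_post_composed("CL:0002328"))
```

In practice such a table would presumably be derived from the ontology's logical definitions rather than written by hand.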

Thoughts @cmungall @sierra-moxon @mikebada?

@cmungall
Collaborator Author

cmungall commented May 19, 2022

@karthiksoman - I'm looking at some of the pairs

In many cases the anatomical qualifier is redundant: an OWL reasoner tells us the composed concept is equivalent to the CL type:

UBERON:0000473,testis,CL:0000178,Leydig cell

UBERON:0000473,testis,CL:0000216,Sertoli cell

Same is true for:

UBERON:0000970,eye,CL:0000575,corneal epithelial cell
UBERON:0000970,eye,CL:0000142,vitreous cell
UBERON:0000970,eye,CL:0002224,lens epithelial cell
UBERON:0000970,eye,CL:0011004,lens fiber cell
UBERON:0002370,thymus,CL:0000883,thymic cortical macrophage
UBERON:0000966,retina,CL:0000740,retinal ganglion cell

For this one:

UBERON:0000966,retina,CL:0000149,pigment cell

I would avoid using very general functional terms like "pigment cell".

I think this is the pre-composed concept you want:

image

In contrast, an OWL reasoner would tell us that this is unsatisfiable:

UBERON:0002370,thymus,CL:0000336,medullary chromaffin cell of adrenal gland

I believe the thymus and adrenal gland are spatially disjoint (though functionally related).

There are a whole host of terms like this:

UBERON:0000002,uterine cervix,CL:1001586,mammary gland glandular cell
UBERON:0001155,colon,CL:1001586,mammary gland glandular cell
UBERON:0002114,duodenum,CL:1001586,mammary gland glandular cell
UBERON:0001295,endometrium,CL:1001586,mammary gland glandular cell
UBERON:0001301,epididymis,CL:1001586,mammary gland glandular cell
UBERON:0003889,fallopian tube,CL:1001586,mammary gland glandular cell
UBERON:0002110,gall bladder,CL:1001586,mammary gland glandular cell
UBERON:0001132,parathyroid gland,CL:1001586,mammary gland glandular cell
UBERON:0002367,prostate gland,CL:1001586,mammary gland glandular cell
UBERON:0001052,rectum,CL:1001586,mammary gland glandular cell
UBERON:0001829,major salivary gland,CL:1001586,mammary gland glandular cell
UBERON:0000998,seminal vesicle,CL:1001586,mammary gland glandular cell
UBERON:0002108,small intestine,CL:1001586,mammary gland glandular cell
UBERON:0000945,stomach,CL:1001586,mammary gland glandular cell
UBERON:0002046,thyroid gland,CL:1001586,mammary gland glandular cell
UBERON:0002369,adrenal gland,CL:1001586,mammary gland glandular cell
UBERON:0001154,vermiform appendix,CL:1001586,mammary gland glandular cell
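A crude way to surface rows like these without a reasoner is to look for a single CL term paired with many different anatomical sites. The sketch below does this on a few of the rows listed above; the function name and the `min_sites` threshold are hypothetical, and this heuristic is not a substitute for the OWL satisfiability check described in this comment.

```python
import csv
from collections import defaultdict

# A few rows copied from the lists above: anatomy ID, anatomy label, CL ID, CL label.
ROWS = """\
UBERON:0000473,testis,CL:0000216,Sertoli cell
UBERON:0001155,colon,CL:1001586,mammary gland glandular cell
UBERON:0002114,duodenum,CL:1001586,mammary gland glandular cell
UBERON:0001052,rectum,CL:1001586,mammary gland glandular cell
"""


def cell_types_spanning_many_sites(text, min_sites=3):
    """Return cell types paired with at least `min_sites` distinct anatomy IDs."""
    sites = defaultdict(set)
    for anatomy_id, _anatomy_label, cell_id, cell_label in csv.reader(text.splitlines()):
        sites[(cell_id, cell_label)].add(anatomy_id)
    return {key: sorted(ids) for key, ids in sites.items() if len(ids) >= min_sites}


print(cell_types_spanning_many_sites(ROWS))
```

A flagged term is only a candidate for review; some cell types legitimately occur in many sites.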

@karthiksoman

@cmungall Thanks for these details. Yes, I understand there is redundancy in the Anatomy qualifier. So, are you suggesting that we could leave such redundant entities as they are now (e.g. UBERON:0000473,testis,CL:0000216,Sertoli cell) and pre-compose only the others (e.g. UBERON:0001154,vermiform appendix,CL:1001586,mammary gland glandular cell)?

@cmungall
Collaborator Author

What I am saying is that for this one:

UBERON:0000473,testis,CL:0000216,Sertoli cell

You can just use the node CL:0000216 in your graph. It is already part of the testis; you get this edge when you bring in CL.

This one:

UBERON:0001154,vermiform appendix,CL:1001586,mammary gland glandular cell

doesn't make sense unless we are talking about some kind of metastatic cell
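The Sertoli cell case can be sketched as a simple collapse rule: when CL already places the cell type in the stated anatomy, keep the plain CL node. The `KNOWN_PART_OF` table below is a hand-written stand-in for edges that would come from loading CL, and the sketch checks only direct membership, not the transitive closure a reasoner would use.

```python
# Stand-in for part-of edges obtained from CL (Sertoli cell -> testis, as stated above).
KNOWN_PART_OF = {"CL:0000216": {"UBERON:0000473"}}


def simplify_pair(anatomy_id, cell_id, part_of=KNOWN_PART_OF):
    """Collapse an anatomy/cell pair to the CL node when the anatomy is already implied."""
    if anatomy_id in part_of.get(cell_id, set()):
        return cell_id
    # Otherwise keep the composite ID so the pair can be reviewed separately.
    return f"{anatomy_id}/{cell_id}"


print(simplify_pair("UBERON:0000473", "CL:0000216"))  # CL:0000216
print(simplify_pair("UBERON:0001154", "CL:1001586"))  # UBERON:0001154/CL:1001586
```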

@karthiksoman

@cmungall Got it, Chris.
Regarding the second example (appendix and mammary gland glandular cell), when I checked SPOKE, it showed the name for that AnatomyCellType node as "glandular cells in appendix" with the identifier "UBERON:0001154/CL:1001586". However, CL:1001586 corresponds to "mammary gland glandular cell" in Cell Ontology. So in this example I think that, instead of the CL ID for "appendix glandular cell", the node was given the CL ID for "mammary gland glandular cell". (Then the question arises: if there is a CL ID for "appendix glandular cell", why do we need a separate AnatomyCellType node for that entity?)
I can raise this mismatch during our internal SPOKEtech discussion. I see that you have given a number of such examples above. Thank you for pointing this out. It would be great if you could let me know if you come across more such cases, as this will help us fine-tune things.

@sierra-moxon
Member

@karthiksoman - do you think you have enough info here to update the data in SPOKE accordingly, or are there anatomical entities that are still needed after Chris's comments?
