New OBCI #10

Closed
Reeya123 opened this issue Apr 18, 2023 · 4 comments
@Reeya123

Mar 7, 2023

Darren:

A few notes on what's new (aside from more terms) and some considerations:

  1. "Normal" ranges: These turn out to be fairly useless. I can look at three different resources and find three different ranges, because they depend not only on the specific biospecimen used, but also on how they were measured, the subject population, sex, age, etc.
    We should not call these "normal" ranges; we should call them "reference" ranges instead. I have some text mining and EHR mining ideas for getting these reference ranges. Jorge Sepulveda can do this with help from our student.

Raja:

The nomenclature doesn't matter. I can prove that these are of no use unless tied to very specific scenarios. I personally think it would be irresponsible to present these without all the caveats they will require (markerDB made a very big error related to reference ranges), but gathering that information will be difficult. Text mining might help in some cases.

Darren:

  1. I revised many of the relation names to be more descriptive (so indicated_by_level_of --> indicated_by_difference_in_level_of, and indicated_by_increased_level_of --> indicated_by_above_normal_level_of).

  2. Added (most) definitions.

  3. Made biomarkers sample-dependent when appropriate. I came to the realization that annotating a biomarker with biospecimen and disease separately is not good enough. For example, suppose that there is a biomarker defined as 'indicated_by_above_normal_level_of unicornase'. Suppose further that a paper shows that high levels of unicornase are diagnostic for COVID-19, and the study was done using blood. Another paper finds that high levels of unicornase are diagnostic for diabetes, this time using urine. I previously would have had something like this:

DEFINITION:
indicated_by_above_normal_level_of unicornase

ANNOTATIONS:
sampled_from blood
sampled_from urine
diagnostic_for COVID-19
diagnostic_for diabetes

The problem is that one might assume that both COVID-19 and diabetes can be diagnosed using blood or urine, which might not be the case, or even that the diagnosis for COVID-19 was done using urine, which is incorrect. My solution was to roll the biospecimen into the definition:

DEFINITION 1:
indicated_by_above_normal_level_of unicornase in blood

ANNOTATION 1:
diagnostic_for COVID-19

DEFINITION 2:
indicated_by_above_normal_level_of unicornase in urine

ANNOTATION 2:
diagnostic_for diabetes

Note that this lets us say different things about each of these scenarios.
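The ambiguity described above can be made concrete with a small sketch. It assumes a hypothetical dictionary encoding of the two models (the field names `specimen` and `definition` are illustrative, not part of the ontology) and simply enumerates which specimen/disease pairings each model permits.

```python
from itertools import product

# Flat model: one biomarker record with independent annotation lists.
flat = {
    "definition": "indicated_by_above_normal_level_of unicornase",
    "sampled_from": ["blood", "urine"],
    "diagnostic_for": ["COVID-19", "diabetes"],
}

# Nothing in the flat record ties a specimen to a disease, so a consumer
# can only enumerate every combination, including unsupported ones.
flat_pairs = set(product(flat["sampled_from"], flat["diagnostic_for"]))

# Specimen-specific model: the biospecimen is part of each definition.
# The "specimen" key is a hypothetical convenience field for this sketch.
specific = [
    {
        "definition": "indicated_by_above_normal_level_of unicornase in blood",
        "specimen": "blood",
        "diagnostic_for": ["COVID-19"],
    },
    {
        "definition": "indicated_by_above_normal_level_of unicornase in urine",
        "specimen": "urine",
        "diagnostic_for": ["diabetes"],
    },
]
specific_pairs = {
    (b["specimen"], d) for b in specific for d in b["diagnostic_for"]
}

# Pairings the flat model allows but neither paper supports.
unsupported = flat_pairs - specific_pairs
print(sorted(unsupported))
```

The flat model admits four pairings, two of which (diabetes via blood, COVID-19 via urine) were never reported; the specimen-specific model admits only the two that were.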

  4. Changed how biomarkers are named. With the added biospecimen info, the previous format would have made it something like 'increased level of unicornase in blood', but instead I chose to focus on the assessed entity: 'unicornase in blood above normal'. Because displays are based on alphabetical ordering, this lets you see at a glance whether a particular entity in a particular biospecimen has both above normal and below normal biomarkers.

  5. Revised the 'sampled_from' relation to 'determined_using_sample_from' so that the biomarker can be related directly to the biospecimen. So, for a biomarker "unicornase in blood above normal" we'd have

biomarker and (indicated_by_above_normal_level_of some unicornase) and (determined_using_sample_from some blood)

I did try Dan's suggestion again, which would be

biomarker
and (indicated_by_above_normal_level_of some (unicornase and
sampled_from some blood))

but a reasoner won't automatically classify that as a 'blood biomarker', even if I configure blood biomarkers (currently defined as 'biomarker and determined_using_sample_from some blood') to use the (now-old) sampled_from relation.
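A toy illustration of why the two forms classify differently, assuming a hypothetical tuple encoding of class expressions and a purely structural check. This is not an OWL reasoner and does far less than one; it only shows that in the nested form the biospecimen restriction sits under another property rather than on the biomarker itself, so it cannot match the 'blood biomarker' definition at the top level.

```python
# Class expressions as nested tuples (a hypothetical encoding):
#   "name"                      -> named class
#   ("some", prop, filler)      -> existential restriction
#   ("and", expr1, expr2, ...)  -> intersection

def conjuncts(expr):
    """Return the top-level conjuncts of an expression."""
    if isinstance(expr, tuple) and expr[0] == "and":
        return list(expr[1:])
    return [expr]

def matches(sub, sup):
    """True if conjunct `sub` structurally satisfies conjunct `sup`."""
    if isinstance(sub, str) or isinstance(sup, str):
        return sub == sup
    if sub[0] == "some" and sup[0] == "some":
        return sub[1] == sup[1] and is_subsumed(sub[2], sup[2])
    return False

def is_subsumed(expr, definition):
    """True if every conjunct of `definition` is matched by a conjunct of `expr`."""
    return all(
        any(matches(c, d) for c in conjuncts(expr))
        for d in conjuncts(definition)
    )

# Form with the biospecimen related directly to the biomarker.
direct_form = (
    "and", "biomarker",
    ("some", "indicated_by_above_normal_level_of", "unicornase"),
    ("some", "determined_using_sample_from", "blood"),
)
blood_biomarker = (
    "and", "biomarker", ("some", "determined_using_sample_from", "blood")
)

# Nested form, checked against a definition using the old relation.
nested_form = (
    "and", "biomarker",
    ("some", "indicated_by_above_normal_level_of",
     ("and", "unicornase", ("some", "sampled_from", "blood"))),
)
old_blood_biomarker = ("and", "biomarker", ("some", "sampled_from", "blood"))

print(is_subsumed(direct_form, blood_biomarker))      # direct form matches
print(is_subsumed(nested_form, old_blood_biomarker))  # nested form does not
```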

Can we replace "normal" with "reference" everywhere?

@Reeya123

Mar 13, 2023

Raja:

Sure. Still means the same thing.

@Reeya123

Mar 7, 2023

Darren:

DISCUSSION POINT:
In the latest implementation I created what is called a 'shadow hierarchy' for disease-based biomarkers. A shadow hierarchy is one that mimics another. With the shadow hierarchy, we'll have biomarkers that reflect the DO hierarchy:

disease                          biomarker of disease
  disease by infectious agent      biomarker of disease by infectious agent
    COVID-19                         biomarker of COVID-19

While this facilitates browsing, it comes at the cost of extra work. I don't have a strong feeling one way or the other, but users might have a preference. Without the shadow hierarchy, one could still get to the biomarkers of interest by doing a DL query.
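The trade-off can be sketched roughly as follows. The parent map is a simplified, hypothetical fragment of the DO hierarchy (intermediate classes omitted), and the biomarker annotations and function names are illustrative only. The first part derives the shadow hierarchy mechanically; the second shows the query-style alternative that needs no shadow classes.

```python
# Simplified, hypothetical fragment of the DO hierarchy: child -> parent.
do_parent = {
    "disease by infectious agent": "disease",
    "COVID-19": "disease by infectious agent",
    "diabetes": "disease",
}

def shadow(term):
    """Name of the shadow class that mirrors a disease term."""
    return f"biomarker of {term}"

# Shadow hierarchy: the same shape as the disease hierarchy, renamed.
shadow_parent = {shadow(c): shadow(p) for c, p in do_parent.items()}

def descendants(term, parent_map):
    """All terms at or below `term` in a child -> parent map."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_map.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

# Hypothetical biomarker -> disease annotations.
diagnostic_for = {
    "unicornase in blood above normal": "COVID-19",
    "unicornase in urine above normal": "diabetes",
}

def biomarkers_for(disease):
    """Query-style lookup: biomarkers for a disease or any of its subclasses."""
    targets = descendants(disease, do_parent)
    return sorted(b for b, d in diagnostic_for.items() if d in targets)

print(shadow_parent["biomarker of COVID-19"])
print(biomarkers_for("disease by infectious agent"))
```

Every new disease term adds a shadow class to maintain, whereas the lookup reuses the disease hierarchy as it stands.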

@Reeya123

Mar 13, 2023

Raja:

I don't know enough about the use case you are proposing to weigh in. For now, whatever you are doing seems okay.
The use case is when you suspect a particular medical condition is the issue, and you want to order tests to verify.

Let's not do normal or reference ranges in the ontology. They can be annotations in a biomarker DB.

@Reeya123

Mar 13, 2023

Darren:

Done properly, it can be useful in a database. It won't be useful for an ontology (even if done perfectly) since you can't do math with it.
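As a rough sketch of what "doing math" with a reference range might look like on the database side, the record below carries the context fields mentioned earlier in the thread (biospecimen, measurement method, population, sex, age). The schema, field names, and all values are hypothetical and exist only to illustrate the numeric comparison.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical database record; a range is only meaningful with its context."""
    analyte: str
    specimen: str
    method: str
    population: str
    sex: str
    age_min: int
    age_max: int
    low: float
    high: float
    unit: str

def classify(value, ref):
    """Compare a measured value (same unit as `ref`) against the range."""
    if value < ref.low:
        return "below reference"
    if value > ref.high:
        return "above reference"
    return "within reference"

# All values here are invented for illustration.
example = ReferenceRange(
    analyte="unicornase",
    specimen="blood",
    method="hypothetical assay",
    population="hypothetical cohort",
    sex="female",
    age_min=18,
    age_max=65,
    low=1.0,
    high=5.0,
    unit="U/L",
)

print(classify(7.2, example))
```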

@Reeya123 Reeya123 assigned Reeya123 and danlymangw and unassigned Reeya123 Apr 25, 2023
@Reeya123 Reeya123 reopened this Apr 26, 2023