Add literature utils #4

Merged
merged 7 commits into from
Jan 31, 2024

Conversation

@cthoyt (Member) commented Jan 31, 2024

This PR adds a workflow for searching literature, retrieving abstracts, annotating them, and summarizing the results.

Specifically, it demonstrates how quickly the pre-defined lexica can be reused. For example, the following code queries for "diabetes" (limited to 300 recent articles), grounds the abstracts using the phenotype lexicon, and counts the resulting references and co-occurrences:

```python
from biolexica import load_grounder
from biolexica.literature import annotate_abstracts_from_search
from biolexica.literature.analyze import count_cooccurrences, count_references

# Search the literature and ground the retrieved abstracts with the phenotype lexicon
query = "diabetes"
grounder = load_grounder("phenotype")
annotated_articles = annotate_abstracts_from_search(query, grounder=grounder, limit=300)

# Analysis: tally grounded references and their co-occurrences
reference_counter = count_references(annotated_articles)
co_occurrence_counter = count_cooccurrences(annotated_articles)
```

Occurrences

| Reference | Name | Count |
|---|---|---|
| doid:9351 | diabetes mellitus | 95 |
| doid:9352 | type 2 diabetes mellitus | 54 |
| mesh:D013812 | treatment | 52 |
| doid:4 | disease | 49 |
| mesh:D012380 | Role | 31 |
| efo:0001461 | control | 29 |
| efo:0000246 | age | 28 |
| mesh:D001244 | Association | 26 |
| mondo:0021137 | not rare | 25 |
| efo:0003919 | risk factor | 24 |

Co-occurrences

| Left Reference | Left Name | Right Reference | Right Name | Count |
|---|---|---|---|---|
| doid:9351 | diabetes mellitus | mesh:D013812 | treatment | 32 |
| doid:4 | disease | doid:9351 | diabetes mellitus | 31 |
| doid:9351 | diabetes mellitus | doid:9352 | type 2 diabetes mellitus | 21 |
| doid:4 | disease | mesh:D013812 | treatment | 20 |
| doid:9351 | diabetes mellitus | mondo:0021137 | not rare | 18 |
| doid:9351 | diabetes mellitus | mesh:D012380 | Role | 17 |
| doid:9352 | type 2 diabetes mellitus | efo:0000246 | age | 16 |
| doid:9352 | type 2 diabetes mellitus | efo:0001461 | control | 16 |
| doid:9351 | diabetes mellitus | efo:0003919 | risk factor | 16 |
| doid:4 | disease | doid:9352 | type 2 diabetes mellitus | 15 |
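
For anyone wondering how tallies like the two tables above could be computed, here is a minimal sketch using only the standard library. It is not the `biolexica` implementation: the input shape (one set of CURIE strings per abstract) and the `*_sketch` function names are assumptions made for illustration, and the real annotated article objects may look different. The sketch counts each reference at most once per article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one set of grounded CURIEs per abstract.
# This shape is assumed; biolexica's annotated articles may differ.
articles = [
    {"doid:9351", "mesh:D013812", "doid:4"},
    {"doid:9351", "doid:9352"},
    {"doid:9351", "mesh:D013812"},
]


def count_references_sketch(articles):
    """Count the number of articles in which each reference appears."""
    counter = Counter()
    for refs in articles:
        counter.update(refs)
    return counter


def count_cooccurrences_sketch(articles):
    """Count unordered pairs of references appearing in the same article."""
    counter = Counter()
    for refs in articles:
        # Sorting gives every pair a single canonical (left, right) key.
        counter.update(combinations(sorted(refs), 2))
    return counter


reference_counts = count_references_sketch(articles)
pair_counts = count_cooccurrences_sketch(articles)
```

Calling `reference_counts.most_common(10)` or `pair_counts.most_common(10)` would then give rows in the same descending order as the tables.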

A few things are immediately obvious:

  1. The lexica construction needs some notion of exclude lists to remove frequently occurring but non-specific entities like "Disease" and "Role".
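
As a rough illustration of the exclude-list idea, the sketch below filters generic entries out of a reference counter after the fact. This is an assumption about one possible approach, not an existing `biolexica` feature: the point above proposes handling exclusions during lexica construction, whereas this sketch applies them post hoc to the counts. The names `EXCLUDE` and `drop_excluded` are hypothetical.

```python
from collections import Counter

# Hypothetical exclude list built from the generic entries in the results:
# doid:4 ("disease") and mesh:D012380 ("Role").
EXCLUDE = {"doid:4", "mesh:D012380"}


def drop_excluded(counter, exclude):
    """Return a copy of a reference counter without the excluded CURIEs."""
    return Counter({ref: n for ref, n in counter.items() if ref not in exclude})


# A few rows from the occurrence results, keyed by CURIE.
counts = Counter(
    {"doid:9351": 95, "doid:9352": 54, "doid:4": 49, "mesh:D012380": 31}
)
filtered = drop_excluded(counts, EXCLUDE)
```

Filtering at lexicon construction time would also keep these terms out of the co-occurrence pairs, which post hoc filtering of a reference counter does not do.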


codecov bot commented Jan 31, 2024

Codecov Report

Attention: 164 lines in your changes are missing coverage. Please review.

Comparison is base (7b16129) 23.83% compared to head (d965793) 13.17%.

❗ Current head d965793 differs from pull request most recent head 874d926. Consider uploading reports for the commit 874d926 to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| src/biolexica/literature/annotate.py | 0.00% | 53 Missing ⚠️ |
| src/biolexica/literature/retrieve.py | 0.00% | 46 Missing and 1 partial ⚠️ |
| src/biolexica/literature/search.py | 0.00% | 24 Missing ⚠️ |
| src/biolexica/literature/\_\_main\_\_.py | 0.00% | 15 Missing ⚠️ |
| src/biolexica/literature/analyze.py | 0.00% | 11 Missing ⚠️ |
| src/biolexica/api.py | 23.07% | 8 Missing and 2 partials ⚠️ |
| src/biolexica/literature/\_\_init\_\_.py | 0.00% | 4 Missing ⚠️ |

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main       #4       +/-   ##
===========================================
- Coverage   23.83%   13.17%   -10.67%     
===========================================
  Files           6       12        +6     
  Lines         172      334      +162     
  Branches       37       63       +26     
===========================================
+ Hits           41       44        +3     
- Misses        129      286      +157     
- Partials        2        4        +2     
```


@cthoyt cthoyt merged commit ce27dde into main Jan 31, 2024
6 checks passed
@cthoyt cthoyt deleted the literature branch January 31, 2024 16:12