add in NER stats from SemMedDB to the semmeddb2 API #606
Comments
FYI: I also saw some possible predicate info and sentence-predication confidence info:
originally posted here
Example #1:
Statistics of all NER stats
So NER shows high confidence in the connection between the entity texts and concepts. A threshold around 800 seems weak.
Statistics of predication list lengths (i.e.
STAT | predication_count |
---|---|
TOTAL | 24481939 |
MIN | 1 |
MAX | 64451 |
MEAN | 3.65 |
MEDIAN | 1 |
2.5TH PERCENTILE | 1 |
25TH PERCENTILE | 1 |
50TH PERCENTILE | 1 |
75TH PERCENTILE | 2 |
97.5TH PERCENTILE | 15 |
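
For anyone who wants to recompute a table like this, here is a minimal sketch. It assumes `TOTAL` is the number of documents and that percentiles use the nearest-rank method; neither is confirmed in this thread. `summarize_counts` and `percentile` are hypothetical helper names, and the input list is toy data, not SemMedDB data.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize_counts(counts):
    """Summarize per-document predication list lengths (hypothetical helper)."""
    ordered = sorted(counts)
    summary = {
        "TOTAL": len(ordered),  # assumption: TOTAL counts documents
        "MIN": ordered[0],
        "MAX": ordered[-1],
        "MEAN": round(statistics.mean(ordered), 2),
        "MEDIAN": statistics.median(ordered),
    }
    for pct in (2.5, 25, 50, 75, 97.5):
        summary[f"{pct}TH PERCENTILE"] = percentile(ordered, pct)
    return summary


# Toy list lengths for illustration only.
toy_counts = [1, 1, 1, 1, 2, 2, 3, 5, 15, 40]
for stat, value in summarize_counts(toy_counts).items():
    print(f"{stat} | {value} |")
```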
The document with the max predication_count is exactly C0023884-PART_OF-C0034693 (Liver PART_OF Rattus norvegicus), which caused the BSONObjectTooLarge error in MongoDB.
If we apply a threshold of 1000 to the length of predication lists, 4,293 documents out of 24,481,939 (i.e. 0.0175%) will be affected.
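
A rough sketch of how such a threshold could be checked and applied. It assumes "affected" means a list strictly longer than the threshold, and that affected lists would simply be truncated; the thread does not say what should happen to those documents. `affected_fraction` and `cap_predications` are hypothetical names.

```python
def affected_fraction(counts, threshold):
    """Return (number, percentage) of documents whose list exceeds threshold."""
    affected = sum(1 for c in counts if c > threshold)
    return affected, 100 * affected / len(counts)


def cap_predications(doc, threshold=1000):
    """Return a copy of doc with its predication list cut to `threshold` items.

    Assumption: truncation is only one possible policy for oversized lists.
    """
    capped = dict(doc)
    capped["predication"] = doc["predication"][:threshold]
    return capped


# The percentage quoted above follows directly from the two counts.
print(f"{100 * 4293 / 24481939:.4f}%")
```

Capping the list length also keeps each document comfortably below MongoDB's 16 MB BSON document limit, which is what the max-length document ran into.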
14 predication records, involving 10
@colleenXu @andrewsu
{
  "_id": "...",
  "predication": [
    {
      "object_score": <int>,
      "object_text": <str>,
      "subject_score": <int>,
      "subject_text": <str>
    },
    # omitted
  ]
}
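
With documents in this shape, filtering on the NER scores could look roughly like the sketch below. `filter_by_ner_score` is a hypothetical helper, and the default cut-off of 800 is only the value mentioned earlier in this thread (where it was described as weak), not a recommendation. The sample document is toy data.

```python
def filter_by_ner_score(doc, min_score=800):
    """Keep only predications whose subject and object scores reach min_score.

    Assumption: both scores must pass; the thread does not settle this.
    """
    kept = [
        p
        for p in doc.get("predication", [])
        if p["subject_score"] >= min_score and p["object_score"] >= min_score
    ]
    return {**doc, "predication": kept}


# Toy document following the schema above; values are made up.
toy_doc = {
    "_id": "toy-id",
    "predication": [
        {"subject_text": "liver", "subject_score": 1000,
         "object_text": "rats", "object_score": 1000},
        {"subject_text": "hepatic", "subject_score": 694,
         "object_text": "rat", "object_score": 861},
    ],
}
print(len(filter_by_ner_score(toy_doc)["predication"]))
```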
Super, this looks great, thanks!
Now that we've created the new https://biothings.ncats.io/semmeddb2 API as part of #569 to investigate filtering strategies to improve signal/noise, let's also join in information about the Named Entity Recognition (NER) from the PREDICATION_AUX table (https://lhncbc.nlm.nih.gov/ii/tools/SemRep_SemMedDB_SKR/dbinfo.html):
I can really only imagine us using the SUBJECT_TEXT and SUBJECT_SCORE values (plus the corresponding OBJECT_ values), so let's focus on those. We can add these values to the predication object at the same level as the predication_id.
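
To make the proposal concrete, here is a minimal sketch of the join. It assumes each PREDICATION_AUX row is available as a dict with a `PREDICATION_ID` column that matches `predication_id` in the document, and that the new keys are lowercased column names; both are assumptions for illustration. `attach_ner_fields` is a hypothetical helper, not part of the existing parser, and the rows are toy data.

```python
TEXT_FIELDS = ("SUBJECT_TEXT", "OBJECT_TEXT")
SCORE_FIELDS = ("SUBJECT_SCORE", "OBJECT_SCORE")


def attach_ner_fields(doc, aux_rows):
    """Copy NER text/score values onto each matching predication object."""
    # Assumption: aux rows are keyed by a PREDICATION_ID column.
    aux_by_id = {row["PREDICATION_ID"]: row for row in aux_rows}
    merged = []
    for pred in doc["predication"]:
        pred = dict(pred)  # leave the input document untouched
        row = aux_by_id.get(pred["predication_id"])
        if row is not None:
            for field in TEXT_FIELDS:
                pred[field.lower()] = row[field]
            for field in SCORE_FIELDS:
                pred[field.lower()] = int(row[field])
        merged.append(pred)
    return {**doc, "predication": merged}


# Toy inputs; identifiers and values are made up.
toy_doc = {"_id": "toy-id", "predication": [{"predication_id": 1}, {"predication_id": 2}]}
toy_aux = [
    {"PREDICATION_ID": 1, "SUBJECT_TEXT": "liver", "SUBJECT_SCORE": "1000",
     "OBJECT_TEXT": "rats", "OBJECT_SCORE": "888"},
]
print(attach_ner_fields(toy_doc, toy_aux))
```

Predications without a matching aux row are passed through unchanged, so the join never drops records.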