Update scispacy version on streamlit demo #342

Open
JohnGiorgi opened this issue Mar 23, 2021 · 8 comments
Labels: bug (Something isn't working)

@JohnGiorgi (Contributor) commented Mar 23, 2021

I am getting different results for the same input text when I use the streamlit demo vs. when I run the code locally. The text in question:

text = "The structural unit of the secretory Na+-K+-2Cl- cotransporter (NKCC1) is a homodimer."

NER results using "en_ner_jnlpba_md" on streamlit demo

[screenshot of the demo's NER output]

Then, running things locally:

import spacy

# Load the specialized JNLPBA NER model (installed separately, see below).
model = "en_ner_jnlpba_md"
nlp = spacy.load(model)

doc = nlp("The structural unit of the secretory Na+-K+-2Cl- cotransporter (NKCC1) is a homodimer.")

# Print each recognized entity and its label.
for ent in doc.ents:
    print(f"{ent.text}\t{ent.label_}")

"""
The above prints:

NKCC1	DNA
homodimer	PROTEIN
"""

My pyproject.toml has the following dependencies:

scispacy = "^0.4.0"
en_ner_jnlpba_md = {url = "https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_ner_jnlpba_md-0.4.0.tar.gz"}

Any idea what might be causing this? I consider the streamlit demo response to be more correct, and am interested in getting the same result locally!


Also, I am only showing one example here, but I found I could quickly come up with other examples where the streamlit demo's specialized NER results were better (IMO) than the results I got locally. A second example is:

text = "Fourteen residues of the U1 snRNP-specific U1A protein are required for homodimerization, cooperative RNA binding, and inhibition of polyadenylation."
JohnGiorgi changed the title from "Different results for identical input on streamlit demo Vs. running locally" to "Different results for identical input on streamlit demo vs. running locally" on Mar 23, 2021
@MichalMalyska (Contributor) commented Mar 23, 2021

The streamlit demo seems to be running en_core_sci_lg version 0.2.4, while the one you can currently download is 0.4.0; that might be the issue (I ran into this a couple of times when dealing with incompatibilities with spacy 2.X):

"lang":"en"
"name":"core_sci_lg"
"version":"0.2.4"
"spacy_version":">=2.2.1"

EDIT: The only way I know of to get a specific version of the models is to wget the old link and pip install from the file; you can't specify versions when pip installing from the links, as it automatically gets overwritten with the most up-to-date one.

Second EDIT: I just tried pip install with the specific-version link, and it works.
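
For example, pinning the older model means pointing pip at the versioned release directly (assuming a v0.3.0 release exists at the same URL pattern as the v0.4.0 link above):

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.3.0/en_ner_jnlpba_md-0.3.0.tar.gz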

@JohnGiorgi (Contributor, Author) commented Mar 23, 2021

You are not referring to the "specialized NER model" here though, right? (specifically, "en_ner_jnlpba_md")

I can see that the streamlit demo loads two spacy models, spacy_model (here) and ner_model (here). My question is about ner_model, which should be unaffected by spacy_model.

AFAICT, the results of the specialized NER in the streamlit demo depend only on ner_model (see here and here).

@MichalMalyska (Contributor) commented Mar 23, 2021

You're totally right. I couldn't find which version of en_ner_jnlpba_md they are using on the streamlit demo, but given that en_core_sci_lg was older, it wouldn't surprise me if en_ner_jnlpba_md was too.

EDIT:

With version 0.3.0 of en_ner_jnlpba_md and spacy 2.3.2, I got:

secretory Na+-K+-2Cl- cotransporter	PROTEIN
NKCC1	PROTEIN
homodimer	PROTEIN

while with 0.4.0 (and spacy 3.0.5), I got:

NKCC1	DNA
homodimer	PROTEIN
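
A quick way to confirm which model and spacy versions are actually in play locally is to check the loaded pipeline's metadata (standard spacy attributes, nothing scispacy-specific):

import spacy

nlp = spacy.load("en_ner_jnlpba_md")

# The packaged model records its own version and the spacy range it was built for.
print("spacy:", spacy.__version__)
print("model:", nlp.meta["name"], nlp.meta["version"])
print("requires:", nlp.meta["spacy_version"])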

@JohnGiorgi (Contributor, Author) commented

@MichalMalyska Yeah, you are totally right. When I load the 0.3.0 version of the model and code, I get results that closely match the streamlit demo for both of my examples. Weird, because at least for these examples, the output of the 0.4.0 model/code is worse (IMO).

dakinggg changed the title from "Different results for identical input on streamlit demo vs. running locally" to "Update scispacy version on streamlit demo" on Mar 24, 2021
dakinggg added the bug label on Mar 24, 2021
@MichalMalyska (Contributor) commented

@danielkingai2 I guess the bigger underlying problem is why the 0.4.0 models are so much worse than the older versions.

@MichalMalyska (Contributor) commented

I think this could be one reason:
explosion/spaCy#8138

@dakinggg (Collaborator) commented

I did my best to match everything to the old versions, and I don't think our reported accuracy dropped much, but there are a bunch of hyperparameters that we haven't really done any search over; we just tried to use whatever spacy uses. If you want to play around with retraining with different hyperparameters, all the training scripts should be clear from project.yml.
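
For anyone who wants to try: spacy v3 projects are driven from project.yml, so retraining is roughly the two commands below. The workflow name "all" is an assumption on my part; substitute whatever workflows scispacy's project.yml actually defines.

# Auto-generate docs for the commands and workflows defined in project.yml
python -m spacy project document

# Run a named workflow from project.yml (workflow name assumed here)
python -m spacy project run all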

@mbrunecky commented

As the original author of explosion/spaCy#8138 (which has been closed), I am still trying to figure out what changed.
I have a case where the 'accuracy' in the downstream application has dropped over 20%, despite the spaCy training validation scores dropping less than 5%. There is a clear, consistent case where, for a triplet of entities such as:
JOHN BROWN and JANE BROWN as trustees of JOHN AND JANE FAMILY TRUST
spaCy 2 correctly predicts all three entities above, whereas spaCy 3 only predicts the first one (JOHN BROWN) in 200 out of 1000 test documents.
Honnibal suggested there was some change in 'dropping entities' that cannot be predicted, and perhaps that change is doing more than envisioned. I am trying to see if I can reproduce the same behavior using other data sets.
