new install attempt with SNOMED model throws spacy/thinc config validation error #293

Closed
ryleyb opened this issue Jan 13, 2023 · 4 comments


ryleyb commented Jan 13, 2023

I'm just trying to get up and running with the code from the tutorial for SNOMED. I have downloaded mc_modelpack_snomed_int_16_mar_2022_25be3857ba34bdd5.zip. Unfortunately, I did not get far. I ran this code:

from medcat.cat import CAT

# Download the model_pack from the models section in the github repo.
cat = CAT.load_model_pack('<path to file>/mc_modelpack_snomed_int_16_mar_2022_25be3857ba34bdd5.zip')

Output is this:

Traceback (most recent call last):
  File "./test.py", line 4, in <module>
    cat = CAT.load_model_pack('<path>/mc_modelpack_snomed_int_16_mar_2022_25be3857ba34bdd5.zip')
  File "./venv/lib/python3.9/site-packages/medcat/cat.py", line 352, in load_model_pack
    cat = cls(cdb=cdb, config=cdb.config, vocab=vocab, meta_cats=meta_cats, addl_ner=addl_ner)
  File "./venv/lib/python3.9/site-packages/medcat/cat.py", line 101, in __init__
    self._create_pipeline(self.config)
  File "./venv/lib/python3.9/site-packages/medcat/cat.py", line 109, in _create_pipeline
    self.pipe.add_tagger(tagger=tag_skip_and_punct,
  File "./venv/lib/python3.9/site-packages/medcat/pipe.py", line 66, in add_tagger
    self._nlp.add_pipe(component_factory_name, name=name, first=True)
  File "./venv/lib/python3.9/site-packages/spacy/language.py", line 776, in add_pipe
    pipe_component = self.create_pipe(
  File "./venv/lib/python3.9/site-packages/spacy/language.py", line 660, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "./venv/lib/python3.9/site-packages/thinc/config.py", line 746, in resolve
    resolved, _ = cls._make(
  File "./venv/lib/python3.9/site-packages/thinc/config.py", line 795, in _make
    filled, _, resolved = cls._fill(
  File "./venv/lib/python3.9/site-packages/thinc/config.py", line 850, in _fill
    filled[key], validation[v_key], final[key] = cls._fill(
  File "./venv/lib/python3.9/site-packages/thinc/config.py", line 929, in _fill
    filled, final = cls._update_from_parsed(validation, filled, final)
  File "./venv/lib/python3.9/site-packages/thinc/config.py", line 951, in _update_from_parsed
    filled[key], final[key] = cls._update_from_parsed(
  File "./venv/lib/python3.9/site-packages/thinc/config.py", line 947, in _update_from_parsed
    filled[key] = value
  File "./venv/lib/python3.9/site-packages/medcat/config.py", line 34, in __setitem__
    setattr(self, arg, val)
  File "./venv/lib/python3.9/site-packages/pydantic/main.py", line 445, in __setattr__
    raise ValidationError([error_], self.__class__)
pydantic.error_wrappers.ValidationError: 1 validation error for Config
linking -> filters -> cuis
  value is not a valid set (type=type_error.set)

This is what is installed in my Python env (a fresh Python 3.9 virtual environment in which all I've done is pip install medcat):

Package                  Version
------------------------ -----------
aiofiles                 0.8.0
aiohttp                  3.8.3
aiosignal                1.3.1
anyio                    3.6.2
appnope                  0.1.3
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
arrow                    1.2.3
asttokens                2.2.1
async-timeout            4.0.2
attrs                    22.2.0
backcall                 0.2.0
beautifulsoup4           4.11.1
bleach                   5.0.1
blis                     0.7.5
catalogue                2.0.8
certifi                  2022.12.7
cffi                     1.15.1
charset-normalizer       2.1.1
click                    8.0.4
comm                     0.1.2
contourpy                1.0.7
cycler                   0.11.0
cymem                    2.0.7
datasets                 2.2.2
debugpy                  1.6.5
decorator                5.1.1
defusedxml               0.7.1
dill                     0.3.4
eland                    8.3.0
elastic-transport        8.4.0
elasticsearch            8.6.0
entrypoints              0.4
executing                1.2.0
fastjsonschema           2.16.2
filelock                 3.9.0
fonttools                4.38.0
fqdn                     1.5.1
frozenlist               1.3.3
fsspec                   2022.11.0
gensim                   4.1.2
huggingface-hub          0.11.1
idna                     3.4
importlib-metadata       6.0.0
interchange              2021.0.4
ipykernel                6.20.1
ipython                  8.8.0
ipython-genutils         0.2.0
ipywidgets               7.6.6
isoduration              20.11.0
jedi                     0.18.2
Jinja2                   3.1.2
joblib                   1.2.0
jsonpickle               2.0.0
jsonpointer              2.3
jsonschema               4.17.3
jupyter_client           7.4.9
jupyter_core             5.1.3
jupyter-events           0.6.3
jupyter_server           2.1.0
jupyter_server_terminals 0.4.4
jupyterlab-pygments      0.2.2
jupyterlab-widgets       1.1.1
kiwisolver               1.4.4
MarkupSafe               2.1.1
matplotlib               3.6.3
matplotlib-inline        0.1.6
medcat                   1.6.0
mistune                  2.0.4
monotonic                1.6
multidict                6.0.4
multiprocess             0.70.12
murmurhash               1.0.9
nbclassic                0.4.8
nbclient                 0.7.2
nbconvert                7.2.7
nbformat                 5.7.3
nest-asyncio             1.5.6
notebook                 6.5.2
notebook_shim            0.2.2
numpy                    1.24.1
packaging                23.0
pandas                   1.5.2
pandocfilters            1.5.0
pansi                    2020.7.3
parso                    0.8.3
pathy                    0.10.1
pexpect                  4.8.0
pickleshare              0.7.5
Pillow                   9.4.0
pip                      22.3.1
platformdirs             2.6.2
preshed                  3.0.8
prometheus-client        0.15.0
prompt-toolkit           3.0.36
psutil                   5.9.4
ptyprocess               0.7.0
pure-eval                0.2.2
py2neo                   2021.2.3
pyarrow                  10.0.1
pycparser                2.21
pydantic                 1.8.2
Pygments                 2.14.0
pyparsing                3.0.9
pyrsistent               0.19.3
python-dateutil          2.8.2
python-json-logger       2.0.4
pytz                     2022.7
PyYAML                   6.0
pyzmq                    25.0.0
regex                    2022.10.31
requests                 2.28.2
responses                0.18.0
rfc3339-validator        0.1.4
rfc3986-validator        0.1.1
scikit-learn             1.1.3
scipy                    1.10.0
Send2Trash               1.8.0
setuptools               65.6.3
six                      1.16.0
smart-open               5.2.1
sniffio                  1.3.0
soupsieve                2.3.2.post1
spacy                    3.1.3
spacy-legacy             3.0.11
srsly                    2.4.5
stack-data               0.6.2
terminado                0.17.1
thinc                    8.0.17
threadpoolctl            3.1.0
tinycss2                 1.2.1
tokenizers               0.12.1
torch                    1.13.1
tornado                  6.2
tqdm                     4.64.1
traitlets                5.8.1
transformers             4.19.4
typer                    0.4.2
typing_extensions        4.4.0
uri-template             1.2.0
urllib3                  1.26.14
wasabi                   0.10.1
wcwidth                  0.2.5
webcolors                1.12
webencodings             0.5.1
websocket-client         1.4.2
widgetsnbextension       3.5.2
xxhash                   3.0.0
yarl                     1.8.2
zipp                     3.11.0
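
For a report like this, the packages that appear in the traceback (medcat, spacy, thinc, pydantic) are the most relevant ones. A minimal sketch for printing just those versions, using only the standard library; the helper name `installed_versions` is made up for illustration:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The four packages that show up in the traceback above.
report = installed_versions(["medcat", "spacy", "thinc", "pydantic"])
for name, version in report.items():
    print(f"{name:10} {version or 'not installed'}")
```
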
ryleyb changed the title from "new install attempt with SNOMED model throws spacy/thnc config validation error" to "new install attempt with SNOMED model throws spacy/thinc config validation error" on Jan 13, 2023

ryleyb (Author) commented Jan 13, 2023

To answer my own question, I did the other suggested example in the tutorial, and added an extra couple lines to fix that issue:

from medcat.vocab import Vocab
from medcat.cdb import CDB
from medcat.cat import CAT
from medcat.meta_cat import MetaCAT

unzip = '<path>/mc_modelpack_snomed_int_16_mar_2022_25be3857ba34bdd5/'
# Load the vocab model you downloaded
vocab = Vocab.load(unzip+'vocab.dat')
# Load the cdb model you downloaded
cdb = CDB.load(unzip+'cdb.dat')

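# Needed to add these two settings:
# reset the CUI filter to an empty set (the stored value fails the set validation above)
cdb.config.linking.filters.cuis = set()
# point spaCy at the model directory inside the unzipped model pack
cdb.config.general.spacy_model = unzip+'spacy_model'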

# Download the mc_status model from the models section below and unzip it
mc_status = MetaCAT.load(unzip+'meta_Status/')
cat = CAT(cdb=cdb, config=cdb.config, vocab=vocab, meta_cats=[mc_status])
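
The two extra lines can also be wrapped in a small helper so that an existing filter is only touched when it is not already a set. This is a rough sketch, not part of medcat: `patch_legacy_config` is a hypothetical name, and the stand-in object built with `SimpleNamespace` only mimics the attribute path used above (`linking.filters.cuis`, `general.spacy_model`) so the snippet runs without a model pack.

```python
import os
from types import SimpleNamespace


def patch_legacy_config(config, model_dir):
    """Hypothetical helper: apply the two fixes from the snippet above."""
    filters = config.linking.filters
    if not isinstance(filters.cuis, set):
        # An empty dict (or None) becomes an empty set; any other
        # iterable of CUIs is converted to a set of its items.
        filters.cuis = set(filters.cuis or ())
    config.general.spacy_model = os.path.join(model_dir, "spacy_model")
    return config


# Stand-in for cdb.config (assumption: the real config has this attribute layout).
legacy = SimpleNamespace(
    linking=SimpleNamespace(filters=SimpleNamespace(cuis={})),
    general=SimpleNamespace(spacy_model="placeholder"),
)
patched = patch_legacy_config(legacy, "unzipped_model_pack")
print(type(patched.linking.filters.cuis).__name__, patched.general.spacy_model)
```

With a real model pack, the call would presumably be `patch_legacy_config(cdb.config, unzip)` just before constructing `CAT`.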

ryleyb closed this as completed Jan 13, 2023

vvcb commented Feb 24, 2023

@ryleyb, I ran into the same problem with medcat==1.7.0 and can confirm that your solution resolved it.

mart-r (Collaborator) commented Feb 24, 2023

The problem is that the model you guys are trying to run is from medcat v1.2. Version 1.3, however, introduced some changes (mostly to do with data validation), which means the old default of {} (which is parsed as a dictionary) is rejected where a set is expected.

I've created a PR (#313) that will allow one to fix the issue for a model pack without in-code manual intervention.
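
For anyone wondering why `{}` trips the validation: in plain Python, empty braces are a dict literal, and an empty set has to be written as `set()`. A quick standard-library illustration (nothing medcat-specific here):

```python
empty_braces = {}    # a dict, even though it looks like it could be a set
empty_set = set()    # the only way to spell an empty set

print(type(empty_braces).__name__)
print(type(empty_set).__name__)

# A strict "is this a set?" check, similar in spirit to the validation
# error in the traceback, therefore rejects the old default.
print(isinstance(empty_braces, set))
print(isinstance(empty_set, set))
```
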

KimBenjaminTang commented

The problem also occurred for me today, but @ryleyb's code snippet fixed it for me as well.
