Merged
45 commits
651d503
CU-8699nk284: Update requirements to v2
mart-r Jul 4, 2025
d8caef7
CU-8699nk284: Add DeID requirement
mart-r Jul 4, 2025
238502a
CU-8699nk284: Update code to be in line with v2
mart-r Jul 4, 2025
f18b216
CU-8699nk284: Update README with v2 link
mart-r Jul 4, 2025
35fe531
CU-8699nk284: Update README with v2 link (2nd place)
mart-r Jul 4, 2025
87893fe
CU-8699nk284: Fix examples tutorial link
mart-r Jul 4, 2025
4b2d5f2
CU-8699nk284: Update model card stuff for v2 compatibility
mart-r Jul 4, 2025
0f9cfc2
CU-8699nk284: Avoid running docker hub push on pull requets
mart-r Jul 4, 2025
3740d4c
CU-8699nk284: Fix typo in import path
mart-r Jul 4, 2025
d3b0f9c
CU-8699nk284: Fix further typo
mart-r Jul 4, 2025
3689b4d
CU-8699nk284: Fix access path for MetaCAT config categery name
mart-r Jul 4, 2025
b6434df
CU-8699nk284: Update to latest medcat v2 release
mart-r Jul 4, 2025
273ce6a
CU-8699nk284: Bump supported version to latest (to fix legacy CDB con…
mart-r Jul 4, 2025
289923d
U-8699nk284: Fix some config access
mart-r Jul 4, 2025
0b60b34
U-8699nk284: Fix some more config access
mart-r Jul 4, 2025
4772ac2
Merge branch 'main' into CU-8699nk284-update-service-to-v2
mart-r Jul 4, 2025
f3666c7
CU-8699nk284: Bump dependency to latest
mart-r Jul 4, 2025
2ac04e4
CU-8699nk284: Update to latest v2 version
mart-r Jul 4, 2025
1da7ebf
Merge branch 'main' into CU-8699nk284-update-service-to-v2
mart-r Jul 4, 2025
c80ca15
CU-8699nk284: Bump to latest medcat version again (this time finally,…
mart-r Jul 4, 2025
cb9d356
CU-8699nk284: Fix config access
mart-r Jul 4, 2025
0370641
CU-8699nk284: Mock CDB load during test to use en_core_web_md instead…
mart-r Jul 4, 2025
2e44eec
CU-8699nk284: [TEMP] Add debug output
mart-r Jul 4, 2025
63081e5
CU-8699nk284: Add client test within mocked / changed spacy model con…
mart-r Jul 4, 2025
14798c4
CU-8699nk284: [TEMP] Add debug (more) output
mart-r Jul 4, 2025
d9becbd
CU-8699nk284: Mock a different method and more generally during testing
mart-r Jul 7, 2025
dc0df31
CU-8699nk284: Fix import during testing
mart-r Jul 7, 2025
28539c3
Merge branch 'main' into CU-8699nk284-update-service-to-v2
mart-r Jul 7, 2025
015205d
CU-8699nk284: Remove (most of) the debug output
mart-r Jul 7, 2025
49af0dc
CU-8699nk284: Bump requirements to latest
mart-r Jul 7, 2025
9130864
CU-8699nk284: Update ultiprocessing method to v2
mart-r Jul 7, 2025
a62f92f
CU-8699nk284: Fix keyword argument name
mart-r Jul 7, 2025
05b02a5
CU-8699nk284: Update multiprocessing
mart-r Jul 7, 2025
34e8292
Merge branch 'main' into CU-8699nk284-update-service-to-v2
mart-r Jul 7, 2025
bf677cc
CU-8699nk284: Update to latest MedCAT version
mart-r Jul 7, 2025
edf108f
Merge branch 'main' into CU-8699nk284-update-service-to-v2
mart-r Jul 8, 2025
ea9834c
CU-8699nk284: Update to latest requirements
mart-r Jul 8, 2025
43a9e28
CU-8699nk284: Use newer python in workflow
mart-r Jul 8, 2025
f6f6528
Merge branch 'main' into CU-8699nk284-update-service-to-v2
mart-r Jul 8, 2025
0ce5060
CU-8699nk284: Bump requirements to latest
mart-r Jul 8, 2025
20f9601
Merge branch 'main' into CU-8699nk284-update-service-to-v2
mart-r Jul 9, 2025
e792b20
CU-8699nk284: Bump requirements to latest
mart-r Jul 9, 2025
d014bb9
Revert "CU-8699nk284: Avoid running docker hub push on pull requets"
mart-r Jul 9, 2025
803d417
CU-8699nk284: Fix Dockerfile for GitHub URL-based installs
mart-r Jul 9, 2025
04919d9
CU-8699nk284: Fix hash attribute path
mart-r Jul 9, 2025
2 changes: 1 addition & 1 deletion .github/workflows/medcat-service_run-tests.yml
@@ -29,7 +29,7 @@ jobs:
       - name: Install Python 3
         uses: actions/setup-python@v5
         with:
-          python-version: 3.9
+          python-version: 3.11
           cache: 'pip' # caching pip dependencies

       - name: Install dependencies
3 changes: 3 additions & 0 deletions medcat-service/Dockerfile
@@ -6,6 +6,9 @@ ENV CRYPTOGRAPHY_DONT_BUILD_RUST=1
 WORKDIR /cat
 COPY ./requirements.txt /cat

+# NOTE: need git for URL based installs
+RUN apt-get update && apt-get install -y git
Collaborator

As a later improvement, I guess we could switch to multi-stage builds like the ones you set up for other things. This at least highlights the build-time vs runtime dependency. Though I get that here it's going to change something like 0.01% of the total image size...

Collaborator Author

Well, in the long (or medium) run, we should be able to remove this. Once we're installing based on PyPI instead of a GH link, we don't need git anymore.


 # Install Python dependencies
 ARG USE_CPU_TORCH=true
 # NOTE: Allow building without GPU so as to lower image size (GPU is disabled by default)
4 changes: 2 additions & 2 deletions medcat-service/README.md
@@ -1,6 +1,6 @@
 # Introduction

-This project implements the [MedCAT](https://github.com/CogStack/MedCAT/) NLP application as a service behind a REST API. The general idea is to be able send the text to MedCAT NLP service and receive back the annotations. The REST API is built using [Flask](https://flask.palletsprojects.com/).
+This project implements the [MedCAT](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/) NLP application as a service behind a REST API. The general idea is to be able send the text to MedCAT NLP service and receive back the annotations. The REST API is built using [Flask](https://flask.palletsprojects.com/).

 Git Branches:
 - devel: development branch, latest updates and features, might be unstable.
@@ -327,4 +327,4 @@ The main settings that can be used to improve the performance when querying larg
 ## MedCAT library
 MedCAT parameters are defined in selected `envs/env_medcat*` file.

-For details on available MedCAT parameters please refer to [the official GitHub repository](https://github.com/CogStack/MedCAT/).
+For details on available MedCAT parameters please refer to [the official GitHub repository](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/).
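
Note on the README text above: it describes sending text to the service and getting annotations back over REST. A minimal client-side sketch of that round trip follows. The `/api/process` path, the `{"content": {"text": ...}}` payload shape, and the `localhost:5000` address are assumptions for illustration, since none of them appear in this diff, and `build_process_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_process_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a single document.

    ASSUMPTION: the endpoint path and payload shape are illustrative only.
    """
    payload = json.dumps({"content": {"text": text}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/process",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_process_request("http://localhost:5000", "Patient has kidney failure.")
# Sending is left to the caller, e.g. urllib.request.urlopen(request),
# once a service instance is actually listening at base_url.
```

Building the request separately from sending it keeps the sketch testable without a running container.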
52 changes: 30 additions & 22 deletions medcat-service/medcat_service/nlp_processor/medcat_processor.py
@@ -10,8 +10,9 @@
 from medcat.cat import CAT
 from medcat.cdb import CDB
 from medcat.config import Config
-from medcat.meta_cat import MetaCAT
-from medcat.utils.ner.deid import DeIdModel
+from medcat.config.config_meta_cat import ConfigMetaCAT
+from medcat.components.addons.meta_cat import MetaCATAddon
+from medcat.components.ner.trf.deid import DeIdModel
 from medcat.vocab import Vocab


@@ -188,7 +189,7 @@ def process_content_bulk(self, content):
         # use generators both to provide input documents and to provide resulting annotations
         # to avoid too many mem-copies
         invalid_doc_ids = []
-        ann_res = []
+        ann_res = {}

         start_time_ns = time.time_ns()

@@ -197,11 +198,14 @@ def process_content_bulk(self, content):
                 ann_res = self.cat.deid_multi_texts(MedCatProcessor._generate_input_doc(content, invalid_doc_ids),
                                                     redact=self.DEID_REDACT)
             else:
-                ann_res = self.cat.multiprocessing_batch_char_size(
-                    MedCatProcessor._generate_input_doc(content, invalid_doc_ids), nproc=self.bulk_nproc)
-
+                text_input = MedCatProcessor._generate_input_doc(content, invalid_doc_ids)
+                ann_res = {
+                    ann_id: res for ann_id, res in
+                    self.cat.get_entities_multi_texts(
+                        text_input, n_process=self.bulk_nproc)
+                }
         except Exception as e:
-            self.log.error(repr(e))
+            self.log.error("Unable to process data", exc_info=e)

         additional_info = {"elapsed_time": str((time.time_ns() - start_time_ns) / 10e8)}
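
Note on the hunk above: the bulk path now collects `(id, result)` pairs into a dict and logs failures with a traceback. The sketch below mirrors that shape under stated assumptions: `fake_multi_texts` is a hypothetical stand-in for the annotator call, not a MedCAT API, and `process_bulk` is an illustrative name.

```python
import logging
import time
from typing import Dict, Iterable, Iterator, Tuple

log = logging.getLogger("bulk_sketch")


def fake_multi_texts(docs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, dict]]:
    """Hypothetical stand-in for the annotator: yields (doc_id, result) pairs."""
    for doc_id, text in docs:
        if not text:
            raise ValueError(f"empty document: {doc_id}")
        yield doc_id, {"text": text, "entities": {}}


def process_bulk(docs: Iterable[Tuple[str, str]]) -> Tuple[Dict[str, dict], float]:
    ann_res: Dict[str, dict] = {}
    start_ns = time.time_ns()
    try:
        # dict() consumes the (id, result) pairs and yields the same mapping
        # as the dict comprehension in the diff
        ann_res = dict(fake_multi_texts(docs))
    except Exception as e:
        # exc_info=e attaches the traceback; logging repr(e) alone drops it
        log.error("Unable to process data", exc_info=e)
    # 10e8 (as written in the diff) equals 1e9, so this is nanoseconds to seconds
    elapsed_s = (time.time_ns() - start_ns) / 1e9
    return ann_res, elapsed_s
```

If the generator raises partway through, the assignment never happens and `ann_res` stays empty, which matches the behaviour of the comprehension in the diff.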

@@ -239,11 +243,12 @@ def _populate_model_card_info(self, config: Config):
         Args:
             config (Config): MedCAT configuration object.
         """
-        self.model_card_info["ontologies"] = config.version.ontology \
-            if (isinstance(config.version.ontology, list)) else str(config.version.ontology)
-        self.model_card_info["meta_cat_model_names"] = [i["Category Name"] for i in config.version.meta_cats] \
-            if (isinstance(config.version.meta_cats, list)) else str(config.version.meta_cats)
-        self.model_card_info["model_last_modified_on"] = str(config.version.last_modified)
+        self.model_card_info["ontologies"] = config.meta.ontology \
+            if (isinstance(config.meta.ontology, list)) else str(config.meta.ontology)
+        self.model_card_info["meta_cat_model_names"] = [
+            cnf.general.category_name for cnf in config.components.addons
+            if (isinstance(cnf, ConfigMetaCAT))]
+        self.model_card_info["model_last_modified_on"] = str(config.meta.last_saved)

     # helper MedCAT methods
     #
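
Note on the hunk above: the MetaCAT names are now derived by filtering the addon configs by type, so the result is always a list (the old code could fall back to a string). A small sketch of that filter, using hypothetical stand-in dataclasses rather than the real MedCAT config classes:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneralSketch:
    category_name: str = ""


@dataclass
class MetaCATConfigSketch:
    """Hypothetical stand-in for ConfigMetaCAT."""
    general: GeneralSketch = field(default_factory=GeneralSketch)


@dataclass
class OtherAddonConfigSketch:
    """Hypothetical stand-in for a non-MetaCAT addon config."""
    name: str = "other"


def meta_cat_names(addon_configs: List[object]) -> List[str]:
    # Keep only MetaCAT-style configs, then read each category name
    return [cnf.general.category_name for cnf in addon_configs
            if isinstance(cnf, MetaCATConfigSketch)]


addons = [
    MetaCATConfigSketch(GeneralSketch("Status")),
    OtherAddonConfigSketch(),
    MetaCATConfigSketch(GeneralSketch("Temporality")),
]
names = meta_cat_names(addons)
```

Configs of other addon types are skipped rather than raising on a missing `general.category_name` attribute.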
@@ -281,7 +286,7 @@ def _create_cat(self):
                 cat.cdb.filter_by_cui(cuis_to_keep)

             if self.app_model.lower() in ["", "unknown", "medmen"]:
-                self.app_model = cat.config.version.id
+                self.app_model = cat.config.meta.hash

             self._populate_model_card_info(cat.config)

@@ -305,13 +310,13 @@ def _create_cat(self):
             spacy_model = os.getenv("SPACY_MODEL", "")

             if spacy_model != "":
-                cdb.config.general["spacy_model"] = spacy_model
+                cdb.config.general.nlp.modelname = spacy_model
             else:
                 logging.warning("SPACY_MODEL environment var not set" +
                                 ", attempting to load the spacy model found within the CDB : "
-                                + cdb.config.general["spacy_model"])
+                                + cdb.config.general.nlp.modelname)

-                if cdb.config.general["spacy_model"] == "":
+                if cdb.config.general.nlp.modelname == "":
                     raise ValueError("No SPACY_MODEL env var declared, the CDB loaded does not have a\
                                      spacy_model set in the config variable! \
                                      To solve this declare the SPACY_MODEL in the env_medcat file.")
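
Note on the hunk above: the spaCy model name is taken from the `SPACY_MODEL` environment variable when set, otherwise from the CDB config, and an empty value in both places is an error. The same precedence can be sketched as a pure function; `resolve_spacy_model` is a hypothetical helper for illustration, not part of the service.

```python
from typing import Mapping


def resolve_spacy_model(env: Mapping[str, str], cdb_model_name: str) -> str:
    """Return the spaCy model name to use, preferring the environment."""
    env_value = env.get("SPACY_MODEL", "")
    if env_value != "":
        # An explicit environment setting overrides whatever the CDB carries
        return env_value
    if cdb_model_name == "":
        # Neither source names a model, so there is nothing to load
        raise ValueError(
            "No SPACY_MODEL env var declared and the CDB config has no model name set"
        )
    return cdb_model_name
```

Passing the environment in as a mapping (for example `os.environ`) keeps the precedence rule easy to exercise in tests.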
@@ -330,18 +335,21 @@ def _create_cat(self):
         if os.getenv("APP_MODEL_META_PATH_LIST", None) is not None:
             self.log.debug("Loading META annotations ...")
             for model_path in os.getenv("APP_MODEL_META_PATH_LIST").split(":"):
-                m = MetaCAT.load(model_path)
+                m = MetaCATAddon.deserialise_from(model_path)
                 meta_models.append(m)

-        if cat:
-            meta_models.extend(cat._meta_cats)
+        # if cat:
+        #     meta_models.extend(cat._meta_cats)

         if self.app_model.lower() in [None, "unknown"]:
-            self.app_model = cdb.config.version.id
+            self.app_model = cdb.config.meta.hash

-        config.general["log_level"] = os.getenv("LOG_LEVEL", logging.INFO)
+        config.general.log_level = os.getenv("LOG_LEVEL", logging.INFO)

-        cat = CAT(cdb=cdb, config=config, vocab=vocab, meta_cats=meta_models)
+        cat = CAT(cdb=cdb, config=config, vocab=vocab)
+        # add MetaCATs
+        for mc in meta_models:
+            cat.add_addon(mc)

         self._populate_model_card_info(cat.config)
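
Note on the `log_level` assignment in the hunk above: `os.getenv` returns a string when `LOG_LEVEL` is set, while the fallback is the integer `logging.INFO`, so the value can be of either type. Whether the MedCAT config accepts both is not shown in this diff. If a single integer type were wanted, one possible normalisation looks like the sketch below; `resolve_log_level` is a hypothetical helper.

```python
import logging
from typing import Optional


def resolve_log_level(raw: Optional[str], default: int = logging.INFO) -> int:
    """Map an environment string such as "debug" or "30" to a numeric level."""
    if raw is None or raw.strip() == "":
        return default
    raw = raw.strip()
    if raw.isdigit():
        return int(raw)
    # For known names getLevelName returns the numeric level;
    # for unknown names it returns a "Level ..." string instead
    level = logging.getLevelName(raw.upper())
    return level if isinstance(level, int) else default
```

A call such as `resolve_log_level(os.getenv("LOG_LEVEL"))` would then always produce an integer.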
2 changes: 1 addition & 1 deletion medcat-service/models/examples/examples.md
@@ -2,7 +2,7 @@

 ## [example-medcat-v1-model-pack][(models/examples/example-medcat-v1-model-pack.zip)
 - This model pack is built by running the MedCAT V1 Tutorial Part 3.1.
-  - https://github.com/CogStack/MedCATtutorials/blob/5a07e4d77da404631cc16b47d3f1c6bd028de396/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb
+  - https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v1-tutorials/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb

 It isn't a trained model, but has the concepts "Kidney Failure" and "Failure of Kidneys" built in

4 changes: 2 additions & 2 deletions medcat-service/requirements.txt
@@ -6,7 +6,7 @@ setuptools==78.1.1
 simplejson==3.19.3
 werkzeug==3.1.3
 setuptools-rust==1.11.0
-medcat==1.16.0
+medcat[meta-cat,spacy,deid] @ git+https://github.com/CogStack/cogstack-nlp.git@refs/tags/medcat/v0.13.5#subdirectory=medcat-v2
 # pinned because of issues with de-id models and past models (it will not do any de-id)
 transformers>=4.34.0,<5.0.0
-requests==2.32.4
+requests==2.32.4