Merged
13 changes: 10 additions & 3 deletions medcat-service/.gitignore
Original file line number Diff line number Diff line change
@@ -1,10 +1,13 @@
# IDE envs and system folders
.DS_Store
**/.DS_Store
.idea
.vscode
.venv
.ruff_cache
**__pycache__
venv
venv-test
tmp_cat.log
__pycache__
.mypy_cache
*.pyc
.pyc
@@ -21,7 +24,11 @@ models/examples/example-deid-model-pack
tmp/*

# env folder
env/*
.env
*.env
**/.env
**/*.env

# log files
tmp_cat.log
medcat.log
53 changes: 34 additions & 19 deletions medcat-service/README.md
@@ -4,17 +4,17 @@ A REST API wrapper for [MedCAT](https://github.com/CogStack/cogstack-nlp/blob/ma

Feel free to ask questions on the github issue tracker or on our [discourse website](https://discourse.cogstack.org) which is frequently used by our development team!

# API specification
## API specification

The API definition follows the one defined in [CogStack GATE NLP Service](https://github.com/CogStack/gate-nlp-service/). Currently, there are three endpoints defined that consume and return data in JSON format:

- *GET* `/api/info` - displays general information about the MedCAT application,
- *POST* `/api/process` - processes the provided documents and returns back the annotations,
- *POST* `/api/process_bulk` - processes the provided list of documents and returns back the annotations.

The full specification is available in the [OpenAPI](https://github.com/CogStack/gate-nlp-service/tree/devel/api-specs) specification.
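As a minimal sketch of how a client might talk to these endpoints, the payload shapes can be built as below. This is an illustration only: the field names follow the examples in this README, and the `post_json` helper is a hypothetical convenience, not part of the service.

```python
import json
import urllib.request

def build_process_payload(text):
    # /api/process takes a single document under "content"
    return {"content": {"text": text}}

def build_bulk_payload(texts):
    # /api/process_bulk takes a list of documents under "content"
    return {"content": [{"text": t} for t in texts]}

def post_json(url, payload):
    # hypothetical helper; assumes the service is reachable at `url`
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `post_json("http://localhost:5000/api/process", build_process_payload("The patient was diagnosed with leukemia."))` would return the annotation JSON, assuming the service is running locally on port `5000`.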


# Running the application
## Running the application

The application can be run either as a standalone Python application or inside a Docker container (recommended).

@@ -23,12 +23,13 @@ The application can be run either as a standalone Python application or as runni
Please note that prior to running the application a number of requirements need to be installed (see: `requirements.txt`).

Two scripts are provided for starting the application:

- `start_service_debug.sh` - starts the application in the development mode
- `start_service_production.sh` - starts the application in 'production' mode using the `gunicorn` server.

## Running in a Docker container

The recommended way to run the application is to use the provided Docker image. The Docker image can either be downloaded from Docker Hub (`cogstacksystems/medcat-service:latest`) or built manually using the provided `Dockerfile`.
Please note that by default the built Docker image will run the Flask application in 'production' mode using the `start_service_production.sh` script.

To build the Docker image manually:
@@ -38,36 +39,40 @@ To build the Docker image manually:
To run the container using the built image:

```
docker run -it -p 5000:5000 \
    --env-file=envs/env_app --env-file=envs/env_medcat \
    -v <models-local-dir>:/cat/models:ro \
    cogstacksystems/medcat-service:latest
```

By default the MedCAT service will be running on port `5000`. MedCAT models will be mounted from the local directory `<models-local-dir>` into the container at `/cat/models`.

### GPU support

If you have a GPU and wish to use it, edit the `docker/docker-compose.yml` file: use the `cogstacksystems/medcat-service-gpu:latest` image or change the `build:` directive to build `../Dockerfile_gpu`.
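As an illustrative sketch only (the service name and file layout here are assumptions; check the actual `docker/docker-compose.yml` in this repo), a GPU-enabled Compose service using the standard device-reservation syntax might look like:

```
services:
  medcat-service:
    image: cogstacksystems/medcat-service-gpu:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```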

### <span style="color:red">IMPORTANT!</span>

If you wish to run this Docker service manually, use the `docker/docker-compose.yml` file and execute `docker compose up -d` from the `docker` folder.

Alternatively, an example script `./docker/run_example_medmen.sh` is provided to run the Docker container with the MedCAT service. The script downloads an example model (using the `./scripts/download_medmen.sh` script), uses an example environment configuration, then builds and starts the service using the provided Docker Compose file. The service <b><span style="color:red">WON'T WORK</span></b> without the model being present.

All models should be mounted from the `models/` folder.

<br>

### Manual docker start-up steps

```
1. cd ./models/
2. bash ./download_medmen.sh
3. cd ../docker/
4. docker compose up -d
DONE!
```

Or, if you wish to use the above-mentioned script (the sample model is downloaded by the script, so you don't need to do anything else):

```
1. cd ./docker/
2. bash ./run_example_medmen.sh
@@ -77,6 +82,7 @@ Or, if you wish to use the above mentioned script ( the sample model is download
# API Example use

Assuming that the application is running on the `localhost` with the API exposed on port `5000`, one can run:

```
curl -XPOST http://localhost:5000/api/process \
-H 'Content-Type: application/json' \
@@ -100,9 +106,9 @@ and the received result:

Additional DE-ID query sample (make sure you have a de-id model loaded):

curl -XPOST http://localhost:5555/api/process \
 -H 'Content-Type: application/json' \
 -d '{"content":{"text":"Patient Information: Full Name: John Michael Doe \n Gender: Male \n Date of Birth: January 15, 1975 (Age: 49) \n Patient ID: 567890123 \n Address: 1234 Elm Street, Springfield, IL 62701 \n Phone Number: (555) 123-4567 \n Email: johnmdoe@example.com \n Emergency Contact: Jane Doe (Wife) \n Phone: (555) 987-6543 \n Relationship: Spouse"}}'

Make sure you have `DEID_MODE=True` set in `envs/env_medcat`.

@@ -114,7 +120,7 @@ curl -XPOST http://localhost:5000/api/process_bulk \
-d '{"content": [{"text":"The patient was diagnosed with leukemia."}, {"text": "The patient was diagnosed with cancer."}] }'
```

Example bulk result:

```
{
@@ -269,48 +275,56 @@ As the changes from MedCAT introduced dictionary annotation/entity output.

The mode in which annotation entities are output in the JSON response. In older versions this was output as a "list" of dicts, so the output would be:

```
{"annotations": [{"id": "0", "cui" : "C1X..", ..}, {"id":"1", "cui": "...."}]}
```

Newer versions of MedCAT (1.2+) output entities as a dict, where the id of the entity is the key and the rest of the data is the value, so for "dict"
the output is:

```
{"annotations": [{"0": {"cui": "C0027361", "id": 0,.....}, "1": {"cui": "C001111", "id": 1......}}]}
```

This setting can be configured in the `./env/env_medcat` file, using the `ANNOTATIONS_ENTITY_OUTPUT_MODE` variable.
By default, the output of these entities is set to respect the output of the MedCAT package, hence the latter will be used. Please change the above-mentioned env variable and make sure your CogStack-NiFi annotation script is adapted accordingly.
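To illustrate the difference between the two modes, here is a small sketch that flattens a "dict"-mode response into the older "list"-mode shape. The field names follow the examples above; this helper is illustrative and not part of the service.

```python
def to_list_mode(response):
    """Flatten 'dict'-mode annotations (entity id -> entity) into a 'list'-mode list of entities."""
    flat = []
    for ent_map in response["annotations"]:
        # each element maps a stringified entity id to the entity data
        flat.extend(sorted(ent_map.values(), key=lambda e: int(e["id"])))
    return {"annotations": flat}
```

For example, `to_list_mode({"annotations": [{"0": {"cui": "C0027361", "id": 0}}]})` yields `{"annotations": [{"cui": "C0027361", "id": 0}]}`.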
<br>
Please note that the returned NLP annotations will depend on the underlying model used. For evaluation, we can only provide a very basic model trained on [MedMentions](https://github.com/chanzuckerberg/MedMentions). Models utilising [SNOMED CT](https://www.england.nhs.uk/digitaltechnology/digital-primary-care/snomed-ct/) or [UMLS](https://www.nlm.nih.gov/research/umls/index.html) may require applying for licenses from the copyright holders.
<br>
<br>

# Configuration
## Configuration

In the current implementation, configuration for both the MedCAT Service application and the MedCAT NLP library is based on environment variables. These are usually provided in two files in the `env` directory:

- `env_app` - configuration of MedCAT Service app,
- `env_medcat` - configuration of MedCAT library.

Both files allow tailoring MedCAT for specific use-cases. When running MedCAT Service, these variables need to be loaded into the current working environment.

## spaCy models

When using MedCAT for a language other than English, it can be useful to use a different spaCy model. A spaCy model can be included in the MedCAT model pack, but when not using that functionality, models can instead be installed in the Docker image. This is done by setting a build-time variable; see the `SPACY_MODELS` variable in [Dockerfile](Dockerfile) for the default value and usage.

## MedCAT Service
## Service Environment vars

MedCAT Service application settings are defined in the `envs/env_app` file.

The following environment variables are available for tailoring the MedCAT Service `gunicorn` server:

- `SERVER_HOST` - specifies the host address (default: `0.0.0.0`),
- `SERVER_PORT` - the port number used (default: `5000`),
- `SERVER_WORKERS` - the number of workers serving the Flask app working in parallel (default: `1` ; only used in production server).
- `SERVER_WORKER_TIMEOUT` - the max timeout (in sec) for receiving response from worker (default: `300` ; only used with production server).

The following environment variables are available for tailoring the MedCAT Service wrapper:

- `APP_MODEL_NAME` - an informative name of the model used by MedCAT (optional),
- `APP_MODEL_CDB_PATH` - the path to the model's concept database,
- `APP_MODEL_VOCAB_PATH` - the path to the model's vocabulary,
- `APP_MODEL_META_PATH_LIST` - the list of paths to meta-annotation models, each separated by `:` character (optional),
- `APP_BULK_NPROC` - the number of threads used in bulk processing (default: `8`),
- `APP_TRAINING_MODE` - whether to run the application with MedCAT in training mode (default: `False`).
- `APP_MEDCAT_MODEL_PACK` - MedCAT Model Pack path; if this parameter has a value IT WILL BE LOADED FIRST OVER EVERYTHING ELSE (CDB, Vocab, MetaCATs, etc.) declared above.
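Combining the variables above, a minimal `envs/env_app` might look like the following sketch. The values are examples only, and the `cdb.dat` filename is an assumption; check your actual model files.

```
# Illustrative app configuration -- example values, not defaults
SERVER_HOST=0.0.0.0
SERVER_PORT=5000
SERVER_WORKERS=2
SERVER_WORKER_TIMEOUT=300

APP_MODEL_NAME=medmen
APP_MODEL_CDB_PATH=/cat/models/medmen/cdb.dat
APP_MODEL_VOCAB_PATH=/cat/models/medmen/vocab.dat
APP_BULK_NPROC=8
APP_TRAINING_MODE=False
```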

## Performance Tuning
@@ -319,6 +333,7 @@ There's a range of factors that might impact the performance of this service, the
The main settings that can be used to improve performance when querying large amounts of documents are: `SERVER_WORKERS` (the number of Flask web workers that can handle parallel requests) and `APP_BULK_NPROC` (threads for annotation processing).

## MedCAT library
MedCAT parameters are defined in the selected `envs/env_medcat*` file.

For details on available MedCAT parameters please refer to [the official GitHub repository](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/).
3 changes: 2 additions & 1 deletion medcat-service/env/app.env
@@ -13,6 +13,7 @@ APP_MODEL_VOCAB_PATH=/cat/models/medmen/vocab.dat

# optionally, can include multiple models for meta tasks, separated using ':'
APP_MODEL_META_PATH_LIST=/cat/models/medmen/Status
APP_MODEL_REL_PATH_LIST=

# MedCAT Model Pack path
# IMPORTANT: if this parameter has value IT WILL BE LOADED FIRST OVER EVERYTHING ELSE (CDB, Vocab, MetaCATs, etc.) declared above.
@@ -39,4 +40,4 @@ APP_TORCH_THREADS=8

# GPU SETTING
# CAUTION, use only if you are using the GPU docker image.
APP_CUDA_DEVICE_COUNT=1
5 changes: 3 additions & 2 deletions medcat-service/env/app_deid.env
@@ -13,11 +13,12 @@ APP_MODEL_VOCAB_PATH=/cat/models/medmen/vocab.dat

# optionally, can include multiple models for meta tasks, separated using ':'
APP_MODEL_META_PATH_LIST=/cat/models/medmen/Status
APP_MODEL_REL_PATH_LIST=

# MedCAT Model Pack path
# IMPORTANT: if this parameter has value IT WILL BE LOADED FIRST OVER EVERYTHING ELSE (CDB, Vocab, MetaCATs, etc.) declared above.
# Respect the same paths as above : /cat/models/model_pack_name.zip
APP_MEDCAT_MODEL_PACK=/cat/models/de_id_base.zip
APP_MEDCAT_MODEL_PACK=

# optionally, an filter the reported concepts by CUIs
# APP_MODEL_CUI_FILTER_PATH=/cat/models/cui_filter.txt
@@ -39,4 +40,4 @@ APP_TORCH_THREADS=8

# GPU SETTING
# CAUTION, use only if you are using the GPU docker image.
APP_CUDA_DEVICE_COUNT=1
2 changes: 1 addition & 1 deletion medcat-service/env/general.env
@@ -5,4 +5,4 @@
# remove if this causes issues on any other platform
# possible values: amd64, arm64
CPU_ARCHITECTURE=amd64
DOCKER_DEFAULT_PLATFORM=linux/${CPU_ARCHITECTURE:-amd64}
25 changes: 5 additions & 20 deletions medcat-service/env/medcat.env
@@ -1,25 +1,10 @@
TYPE=NOT_UMLS

# IMPORTANT : log level set
# CRITICAL - 50, ERROR - 40, WARNING - 30, INFO - 20, DEBUG - 10, NOTSET - 0
LOG_LEVEL=40

NESTED_ENTITIES=False

CNTX_SPAN=9
CNTX_SPAN_SHORT=3
MIN_CUI_COUNT=30000
MIN_CUI_COUNT_STRICT=-1
MIN_ACC=0.2
MIN_ACC_TH=0.2

LEARNING_RATE=0.1
ANNEAL=False
KEEP_PUNCT=":|."
MEDCAT_LOG_LEVEL=40

# can be left empty if you want to use the CDB spacy model, otherwise, possible values are : en_core_sci_lg, en_core_sci_md, etc...
# please note that only SpaCy models that are installed on the docker container are usable (take a look at the .Dockerfile and install any other custom models you may want to use)
SPACY_MODEL=
MEDCAT_SPACY_MODEL=

# IMPORTANT:
# Mode in which annotation entities should be outputted in the JSON response,
@@ -30,7 +15,7 @@ SPACY_MODEL=
# Be mindful of this option as it can affect other services that rely directly on the responses of the service
# (the NiFi groovy scripts and annotation ingester are two such services that process the output, and so they might require further customisation)
# POSSIBLE VALUES: [list, dict], if left empty then "dict" is the default.
ANNOTATIONS_ENTITY_OUTPUT_MODE=dict
MEDCAT_ANNOTATIONS_ENTITY_OUTPUT_MODE=dict

DEID_MODE=False
DEID_REDACT=False
MEDCAT_DEID_MODE=False
MEDCAT_DEID_REDACT=False
25 changes: 5 additions & 20 deletions medcat-service/env/medcat_deid.env
@@ -1,25 +1,10 @@
TYPE=NOT_UMLS

# IMPORTANT : log level set
# CRITICAL - 50, ERROR - 40, WARNING - 30, INFO - 20, DEBUG - 10, NOTSET - 0
LOG_LEVEL=40

NESTED_ENTITIES=False

CNTX_SPAN=9
CNTX_SPAN_SHORT=3
MIN_CUI_COUNT=30000
MIN_CUI_COUNT_STRICT=-1
MIN_ACC=0.2
MIN_ACC_TH=0.2

LEARNING_RATE=0.1
ANNEAL=False
KEEP_PUNCT=:|.
MEDCAT_LOG_LEVEL=40

# can be left empty if you want to use the CDB spacy model, otherwise, possible values are : en_core_sci_lg, en_core_sci_md, etc...
# please note that only SpaCy models that are installed on the docker container are usable (take a look at the .Dockerfile and install any other custom models you may want to use)
SPACY_MODEL=
MEDCAT_SPACY_MODEL=

# IMPORTANT:
# Mode in which annotation entities should be outputted in the JSON response,
@@ -30,7 +15,7 @@ SPACY_MODEL=
# Be mindful of this option as it can affect other services that rely directly on the responses of the service
# (the NiFi groovy scripts and annotation ingester are two such services that process the output, and so they might require further customisation)
# POSSIBLE VALUES: [list, dict], if left empty then "dict" is the default.
ANNOTATIONS_ENTITY_OUTPUT_MODE=dict
MEDCAT_ANNOTATIONS_ENTITY_OUTPUT_MODE=dict

DEID_MODE=True
DEID_REDACT=True
MEDCAT_DEID_MODE=True
MEDCAT_DEID_REDACT=True
26 changes: 22 additions & 4 deletions medcat-service/export_env_vars.sh
@@ -1,4 +1,10 @@
#!/bin/bash
#!/usr/bin/env bash

# Enable strict mode (without -e to avoid exit-on-error)
set -uo pipefail


echo "🔧 Running $(basename "${BASH_SOURCE[0]}")..."

set -a

@@ -10,8 +16,20 @@ env_files=("env/general.env"
)


for env_file in "${env_files[@]}"; do
    if [ -f "$env_file" ]; then
        echo "✅ Sourcing $env_file"
        # shellcheck disable=SC1090
        source "$env_file"
    else
        echo "⚠️ Skipping missing env file: $env_file"
    fi
done

# Disable auto-export
set +a

# Restore safe defaults for interactive/dev shell
set +u
set +o pipefail