chore(medcat-trainer): CU-869a4br6j Create a copy of the v1 medcat-trainer in the v1 folder #97

alhendrickson · 2025-08-13T14:44:23Z

…ainer in the /v1 folder

v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue

+    formattedText () {
+      if (this.loading || !this.text || !this.ents) { return '' }
+      if (this.ents.length === 0) {
+        let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')


To fix the problem, every occurrence of the meta-characters (&, <, >) must be replaced, not just the first. When replace is called with a string pattern it only replaces the first match, so the fix is to use regular expressions with the global (g) flag. Specifically, replace replace('&', '&amp') with replace(/&/g, '&amp;'), and similarly for < and >. This change should be made on line 76, within the formattedText computed property. No new imports or methods are needed, as this is standard JavaScript functionality.

v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue

+    formattedText () {
+      if (this.loading || !this.text || !this.ents) { return '' }
+      if (this.ents.length === 0) {
+        let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')


To fix the problem, we should ensure that all occurrences of the special characters are replaced, not just the first. The best way to do this is to use regular expressions with the global (g) flag in the replace method. Specifically, we should replace all & with &amp;, all < with &lt;, and all > with &gt;. The order of replacements is important: & should be replaced first to avoid double-escaping the ampersands introduced by the other replacements. The fix should be applied to line 76 in the formattedText computed property in v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue. No new imports are needed, as this is standard JavaScript functionality.

v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue

+    formattedText () {
+      if (this.loading || !this.text || !this.ents) { return '' }
+      if (this.ents.length === 0) {
+        let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')


To fix the problem, all occurrences of the special characters should be replaced, not just the first. The best way to do this is to use regular expressions with the global (g) flag in the replace calls. Additionally, the current code escapes < as &gt, so both < and > end up as &gt; < should be escaped as &lt instead. The correct HTML entities, each terminated by a semicolon, are &amp; for &, &lt; for <, and &gt; for >. The fix should use the correct entities and replace all occurrences. This can be done directly in the code shown, within the formattedText computed property in ClinicalText.vue.

No new imports are needed, as this is standard JavaScript functionality.

v1/medcat-trainer/webapp/api/api/data_utils.py

+        if len(proj['cuis']) > 1000:
+            # store large CUI lists in a json file.
+            cuis_file_name = MEDIA_ROOT + '/' + re.sub('/|\.', '_', p.name + '_cuis_file') + '.json'
+            json.dump(proj["cuis"].split(','), open(cuis_file_name, 'w'))


To fix the problem, we need to ensure that any file path constructed from user input is safely contained within the intended directory (MEDIA_ROOT). The best way to do this is to:

Use os.path.join to construct the path, rather than string concatenation.

Normalize the resulting path using os.path.normpath or os.path.realpath.

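Check that the normalized path is inside MEDIA_ROOT (after normalizing both). Compare whole path components, for example with os.path.commonpath, rather than a bare string prefix, which would also accept a sibling directory whose name merely begins with MEDIA_ROOT.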

Optionally, use werkzeug.utils.secure_filename to further sanitize the filename portion.

The changes should be made in v1/medcat-trainer/webapp/api/api/data_utils.py:

Replace the string concatenation for cuis_file_name and ds_file_name with a safe join and normalization.

Add a check that the resulting path is within MEDIA_ROOT.

Raise an exception or handle the error if the check fails.

We will need to import os (if not already imported) and, optionally, werkzeug.utils.secure_filename for robust filename sanitization.
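The steps above could be sketched roughly as follows, using only the standard library. The helper names resolve_inside and dump_cuis are hypothetical and not part of the PR, and media_root is passed in as a parameter as a stand-in for the MEDIA_ROOT setting. The containment check uses os.path.commonpath instead of a plain prefix test. This is an illustration under those assumptions, not the committed fix.

```python
import json
import os
import re


def resolve_inside(media_root: str, file_name: str) -> str:
    """Join file_name onto media_root and refuse results outside media_root.

    Hypothetical helper for illustration only.
    """
    root = os.path.realpath(media_root)
    candidate = os.path.realpath(os.path.join(root, file_name))
    # commonpath compares whole path components, so '/media_evil' is not
    # accepted as being inside '/media' the way startswith() would accept it.
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"path escapes {root!r}: {file_name!r}")
    return candidate


def dump_cuis(media_root: str, project_name: str, cuis: str) -> str:
    """Write a comma-separated CUI string to a JSON file under media_root."""
    # Same substitution as the code under review: '/' and '.' become '_'.
    file_name = re.sub(r"/|\.", "_", project_name + "_cuis_file") + ".json"
    path = resolve_inside(media_root, file_name)
    # A context manager closes the file even if json.dump raises.
    with open(path, "w") as fh:
        json.dump(cuis.split(","), fh)
    return path
```

With this shape, a name such as "../x.json" passed straight to resolve_inside raises ValueError, while dump_cuis keeps the existing naming scheme and only adds the containment check and the explicit file close.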

v1/medcat-trainer/webapp/api/api/metrics.py

+    metrics = ProjectMetrics(project_data, cat)
+    report = metrics.generate_report(meta_ann=loaded_model_pack)
+    report_file_path = f'{MEDIA_ROOT}/{report_name}.json'
+    json.dump(report, open(report_file_path, 'w'))


To fix the problem, we need to ensure that the constructed file path cannot be manipulated by user input to escape the intended directory (MEDIA_ROOT). The best way to do this is to sanitize and validate report_name before using it in the file path. We can use werkzeug.utils.secure_filename to ensure the filename is safe, and then join and normalize the path, checking that it remains within MEDIA_ROOT. This prevents path traversal and other attacks. We should also ensure that the filename does not contain directory separators or other problematic characters.

Steps:

Import secure_filename from werkzeug.utils.

Before constructing report_file_path, sanitize report_name using secure_filename.

Construct the path using os.path.join.

Normalize the path using os.path.normpath.

Check that the resulting path is inside MEDIA_ROOT, comparing whole path components rather than a bare string prefix.

If the check fails, raise an exception or handle the error appropriately.

All changes are to be made in v1/medcat-trainer/webapp/api/api/metrics.py.
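A minimal sketch of these steps, written with pathlib so it runs without third-party packages. sanitize_report_name is a hypothetical, much simpler stand-in for werkzeug.utils.secure_filename, write_report is likewise an illustrative name, and media_root stands in for the MEDIA_ROOT setting. The real change would need to fit the surrounding view code.

```python
import json
import re
from pathlib import Path


def sanitize_report_name(report_name: str) -> str:
    """Hypothetical stand-in for werkzeug.utils.secure_filename."""
    # Keep a conservative character set; separators and spaces become '_'.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", report_name).strip("._")
    if not cleaned:
        raise ValueError(f"unusable report name: {report_name!r}")
    return cleaned


def write_report(media_root: str, report_name: str, report: dict) -> Path:
    """Write report as JSON under media_root, rejecting escaping paths."""
    root = Path(media_root).resolve()
    target = (root / f"{sanitize_report_name(report_name)}.json").resolve()
    try:
        # relative_to raises ValueError when target is not under root.
        target.relative_to(root)
    except ValueError:
        raise ValueError(f"report path escapes {root}") from None
    with target.open("w") as fh:
        json.dump(report, fh)
    return target
```

Sanitizing first and checking containment afterwards is deliberate redundancy: the second check still holds if the sanitizer is later loosened or replaced.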