Skip to content

Conversation

@alhendrickson
Copy link
Collaborator

…ainer in the /v1 folder

formattedText () {
if (this.loading || !this.text || !this.ents) { return '' }
if (this.ents.length === 0) {
let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This replaces only the first occurrence of '>'.

Copilot Autofix

AI 3 months ago

To fix the problem, we need to ensure that all occurrences of the meta-characters (&, <, >) are replaced, not just the first. The best way to do this is to use regular expressions with the global (g) flag in the replace method. Specifically, replace replace('&', '&amp') with replace(/&/g, '&amp;'), and similarly for < and >. This change should be made on line 76, within the formattedText computed property. No new imports or methods are needed, as this is standard JavaScript functionality.

Suggested changeset 1
v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue b/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
--- a/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
+++ b/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
@@ -73,7 +73,7 @@
     formattedText () {
       if (this.loading || !this.text || !this.ents) { return '' }
       if (this.ents.length === 0) {
-        let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')
+        let text = this.text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
         text = text === 'nan' ? '' : text
         return this.addAnnos ? `<div @contextmenu.prevent.stop="showCtxMenu($event)">${text}</div>` : `<div>${text}</div>`
       }
EOF
@@ -73,7 +73,7 @@
formattedText () {
if (this.loading || !this.text || !this.ents) { return '' }
if (this.ents.length === 0) {
let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')
let text = this.text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
text = text === 'nan' ? '' : text
return this.addAnnos ? `<div @contextmenu.prevent.stop="showCtxMenu($event)">${text}</div>` : `<div>${text}</div>`
}
Copilot is powered by AI and may make mistakes. Always verify output.
formattedText () {
if (this.loading || !this.text || !this.ents) { return '' }
if (this.ents.length === 0) {
let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This replaces only the first occurrence of '<'.

Copilot Autofix

AI 3 months ago

To fix the problem, we should ensure that all occurrences of the special characters are replaced, not just the first. The best way to do this is to use regular expressions with the global (g) flag in the replace method. Specifically, we should replace all & with &amp;, all < with &lt;, and all > with &gt;. The order of replacements is important: & should be replaced first to avoid double-escaping the ampersands introduced by the other replacements. The fix should be applied to line 76 in the formattedText computed property in v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue. No new imports are needed, as this is standard JavaScript functionality.


Suggested changeset 1
v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue b/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
--- a/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
+++ b/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
@@ -73,7 +73,7 @@
     formattedText () {
       if (this.loading || !this.text || !this.ents) { return '' }
       if (this.ents.length === 0) {
-        let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')
+        let text = this.text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
         text = text === 'nan' ? '' : text
         return this.addAnnos ? `<div @contextmenu.prevent.stop="showCtxMenu($event)">${text}</div>` : `<div>${text}</div>`
       }
EOF
@@ -73,7 +73,7 @@
formattedText () {
if (this.loading || !this.text || !this.ents) { return '' }
if (this.ents.length === 0) {
let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')
let text = this.text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
text = text === 'nan' ? '' : text
return this.addAnnos ? `<div @contextmenu.prevent.stop="showCtxMenu($event)">${text}</div>` : `<div>${text}</div>`
}
Copilot is powered by AI and may make mistakes. Always verify output.
formattedText () {
if (this.loading || !this.text || !this.ents) { return '' }
if (this.ents.length === 0) {
let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This replaces only the first occurrence of '&'.

Copilot Autofix

AI 3 months ago

To fix the problem, all occurrences of the special characters should be replaced, not just the first. The best way to do this is to use regular expressions with the global (g) flag in the replace calls. Additionally, the current code incorrectly escapes < as &gt (should be &lt) and both < and > are replaced with &gt, which is not correct. The correct HTML entities are &amp; for &, &lt; for <, and &gt; for >. The fix should use the correct entities and replace all occurrences. This can be done directly in the code shown, within the formattedText computed property in ClinicalText.vue.

No new imports are needed, as this is standard JavaScript functionality.


Suggested changeset 1
v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue b/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
--- a/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
+++ b/v1/medcat-trainer/webapp/frontend/src/components/common/ClinicalText.vue
@@ -73,7 +73,10 @@
     formattedText () {
       if (this.loading || !this.text || !this.ents) { return '' }
       if (this.ents.length === 0) {
-        let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')
+        let text = this.text
+          .replace(/&/g, '&amp;')
+          .replace(/</g, '&lt;')
+          .replace(/>/g, '&gt;')
         text = text === 'nan' ? '' : text
         return this.addAnnos ? `<div @contextmenu.prevent.stop="showCtxMenu($event)">${text}</div>` : `<div>${text}</div>`
       }
EOF
@@ -73,7 +73,10 @@
formattedText () {
if (this.loading || !this.text || !this.ents) { return '' }
if (this.ents.length === 0) {
let text = this.text.replace('&', '&amp').replace('<', '&gt').replace('>', '&gt')
let text = this.text
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
text = text === 'nan' ? '' : text
return this.addAnnos ? `<div @contextmenu.prevent.stop="showCtxMenu($event)">${text}</div>` : `<div>${text}</div>`
}
Copilot is powered by AI and may make mistakes. Always verify output.
if len(proj['cuis']) > 1000:
# store large CUI lists in a json file.
cuis_file_name = MEDIA_ROOT + '/' + re.sub('/|\.', '_', p.name + '_cuis_file') + '.json'
json.dump(proj["cuis"].split(','), open(cuis_file_name, 'w'))

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.

Copilot Autofix

AI 3 months ago

To fix the problem, we need to ensure that any file path constructed from user input is safely contained within the intended directory (MEDIA_ROOT). The best way to do this is to:

  1. Use os.path.join to construct the path, rather than string concatenation.
  2. Normalize the resulting path using os.path.normpath or os.path.realpath.
  3. Check that the normalized path starts with MEDIA_ROOT (after normalizing both).
  4. Optionally, use werkzeug.utils.secure_filename to further sanitize the filename portion.

The changes should be made in v1/medcat-trainer/webapp/api/api/data_utils.py:

  • Replace the string concatenation for cuis_file_name and ds_file_name with a safe join and normalization.
  • Add a check that the resulting path is within MEDIA_ROOT.
  • Raise an exception or handle the error if the check fails.

We will need to import os (if not already imported) and, optionally, werkzeug.utils.secure_filename for robust filename sanitization.


Suggested changeset 1
v1/medcat-trainer/webapp/api/api/data_utils.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/api/api/data_utils.py b/v1/medcat-trainer/webapp/api/api/data_utils.py
--- a/v1/medcat-trainer/webapp/api/api/data_utils.py
+++ b/v1/medcat-trainer/webapp/api/api/data_utils.py
@@ -1,6 +1,7 @@
 import json
 import logging
 import re
+import os
 from collections import defaultdict
 from datetime import datetime
 from typing import Dict
@@ -72,7 +73,11 @@
         p.name = proj['name'] + ' IMPORTED'
         if len(proj['cuis']) > 1000:
             # store large CUI lists in a json file.
-            cuis_file_name = MEDIA_ROOT + '/' + re.sub('/|\.', '_', p.name + '_cuis_file') + '.json'
+            # Sanitize filename and ensure path is within MEDIA_ROOT
+            safe_filename = re.sub(r'[^A-Za-z0-9_\-]', '_', p.name + '_cuis_file') + '.json'
+            cuis_file_name = os.path.normpath(os.path.join(MEDIA_ROOT, safe_filename))
+            if not os.path.commonpath([os.path.abspath(cuis_file_name), os.path.abspath(MEDIA_ROOT)]) == os.path.abspath(MEDIA_ROOT):
+                raise Exception("Attempted to write file outside of MEDIA_ROOT")
             json.dump(proj["cuis"].split(','), open(cuis_file_name, 'w'))
             p.cuis = ""
             p.cuis_file.name = cuis_file_name
@@ -94,7 +99,11 @@
             for rel in doc.get('relations', []):
                 rels.add(rel['relation'])
         # escape - filename
-        ds_file_name = MEDIA_ROOT + '/' + re.sub('/|\.', '_', p.name + '_dataset') + '.csv'
+        # Sanitize filename and ensure path is within MEDIA_ROOT
+        safe_ds_filename = re.sub(r'[^A-Za-z0-9_\-]', '_', p.name + '_dataset') + '.csv'
+        ds_file_name = os.path.normpath(os.path.join(MEDIA_ROOT, safe_ds_filename))
+        if not os.path.commonpath([os.path.abspath(ds_file_name), os.path.abspath(MEDIA_ROOT)]) == os.path.abspath(MEDIA_ROOT):
+            raise Exception("Attempted to write file outside of MEDIA_ROOT")
         names = [doc['name'] for doc in proj['documents']]
         if len(set(names)) != len(names):  # ensure names are unique for docs
             names = [f'{i} - {names[i]}' for i in range(len(names))]
EOF
@@ -1,6 +1,7 @@
import json
import logging
import re
import os
from collections import defaultdict
from datetime import datetime
from typing import Dict
@@ -72,7 +73,11 @@
p.name = proj['name'] + ' IMPORTED'
if len(proj['cuis']) > 1000:
# store large CUI lists in a json file.
cuis_file_name = MEDIA_ROOT + '/' + re.sub('/|\.', '_', p.name + '_cuis_file') + '.json'
# Sanitize filename and ensure path is within MEDIA_ROOT
safe_filename = re.sub(r'[^A-Za-z0-9_\-]', '_', p.name + '_cuis_file') + '.json'
cuis_file_name = os.path.normpath(os.path.join(MEDIA_ROOT, safe_filename))
if not os.path.commonpath([os.path.abspath(cuis_file_name), os.path.abspath(MEDIA_ROOT)]) == os.path.abspath(MEDIA_ROOT):
raise Exception("Attempted to write file outside of MEDIA_ROOT")
json.dump(proj["cuis"].split(','), open(cuis_file_name, 'w'))
p.cuis = ""
p.cuis_file.name = cuis_file_name
@@ -94,7 +99,11 @@
for rel in doc.get('relations', []):
rels.add(rel['relation'])
# escape - filename
ds_file_name = MEDIA_ROOT + '/' + re.sub('/|\.', '_', p.name + '_dataset') + '.csv'
# Sanitize filename and ensure path is within MEDIA_ROOT
safe_ds_filename = re.sub(r'[^A-Za-z0-9_\-]', '_', p.name + '_dataset') + '.csv'
ds_file_name = os.path.normpath(os.path.join(MEDIA_ROOT, safe_ds_filename))
if not os.path.commonpath([os.path.abspath(ds_file_name), os.path.abspath(MEDIA_ROOT)]) == os.path.abspath(MEDIA_ROOT):
raise Exception("Attempted to write file outside of MEDIA_ROOT")
names = [doc['name'] for doc in proj['documents']]
if len(set(names)) != len(names): # ensure names are unique for docs
names = [f'{i} - {names[i]}' for i in range(len(names))]
Copilot is powered by AI and may make mistakes. Always verify output.
metrics = ProjectMetrics(project_data, cat)
report = metrics.generate_report(meta_ann=loaded_model_pack)
report_file_path = f'{MEDIA_ROOT}/{report_name}.json'
json.dump(report, open(report_file_path, 'w'))

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a
user-provided value
.
This path depends on a
user-provided value
.

Copilot Autofix

AI 3 months ago

To fix the problem, we need to ensure that the constructed file path cannot be manipulated by user input to escape the intended directory (MEDIA_ROOT). The best way to do this is to sanitize and validate report_name before using it in the file path. We can use werkzeug.utils.secure_filename to ensure the filename is safe, and then join and normalize the path, checking that it remains within MEDIA_ROOT. This prevents path traversal and other attacks. We should also ensure that the filename does not contain directory separators or other problematic characters.

Steps:

  • Import secure_filename from werkzeug.utils.
  • Before constructing report_file_path, sanitize report_name using secure_filename.
  • Construct the path using os.path.join.
  • Normalize the path using os.path.normpath.
  • Check that the resulting path starts with MEDIA_ROOT.
  • If the check fails, raise an exception or handle the error appropriately.

All changes are to be made in v1/medcat-trainer/webapp/api/api/metrics.py.


Suggested changeset 2
v1/medcat-trainer/webapp/api/api/metrics.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/api/api/metrics.py b/v1/medcat-trainer/webapp/api/api/metrics.py
--- a/v1/medcat-trainer/webapp/api/api/metrics.py
+++ b/v1/medcat-trainer/webapp/api/api/metrics.py
@@ -2,6 +2,7 @@
 import logging
 import math
 import os
+from werkzeug.utils import secure_filename
 import warnings
 from collections import Counter
 from typing import List, Dict
@@ -56,7 +57,11 @@
     project_data = retrieve_project_data(projects)
     metrics = ProjectMetrics(project_data, cat)
     report = metrics.generate_report(meta_ann=loaded_model_pack)
-    report_file_path = f'{MEDIA_ROOT}/{report_name}.json'
+    safe_report_name = secure_filename(report_name)
+    report_file_path = os.path.normpath(os.path.join(MEDIA_ROOT, f"{safe_report_name}.json"))
+    # Ensure the report_file_path is within MEDIA_ROOT
+    if not report_file_path.startswith(os.path.abspath(MEDIA_ROOT)):
+        raise Exception("Invalid report name: path traversal detected")
     json.dump(report, open(report_file_path, 'w'))
     apm = AppProjectMetrics()
     apm.report_name_generated = report_name
EOF
@@ -2,6 +2,7 @@
import logging
import math
import os
from werkzeug.utils import secure_filename
import warnings
from collections import Counter
from typing import List, Dict
@@ -56,7 +57,11 @@
project_data = retrieve_project_data(projects)
metrics = ProjectMetrics(project_data, cat)
report = metrics.generate_report(meta_ann=loaded_model_pack)
report_file_path = f'{MEDIA_ROOT}/{report_name}.json'
safe_report_name = secure_filename(report_name)
report_file_path = os.path.normpath(os.path.join(MEDIA_ROOT, f"{safe_report_name}.json"))
# Ensure the report_file_path is within MEDIA_ROOT
if not report_file_path.startswith(os.path.abspath(MEDIA_ROOT)):
raise Exception("Invalid report name: path traversal detected")
json.dump(report, open(report_file_path, 'w'))
apm = AppProjectMetrics()
apm.report_name_generated = report_name
v1/medcat-trainer/webapp/requirements.txt
Outside changed files

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/requirements.txt b/v1/medcat-trainer/webapp/requirements.txt
--- a/v1/medcat-trainer/webapp/requirements.txt
+++ b/v1/medcat-trainer/webapp/requirements.txt
@@ -1,4 +1,5 @@
 --extra-index-url https://download.pytorch.org/whl/cpu/
+werkzeug==3.1.3
 uwsgi==2.0.*
 Django==5.1.*
 django-filter==24.2.*
EOF
@@ -1,4 +1,5 @@
--extra-index-url https://download.pytorch.org/whl/cpu/
werkzeug==3.1.3
uwsgi==2.0.*
Django==5.1.*
django-filter==24.2.*
This fix introduces these dependencies
Package Version Security advisories
werkzeug (pypi) 3.1.3 None
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +308 to +310
return Response({'message': e.args[0] if len(e.args) > 0 else 'Internal Server Error',
'description': e.args[1] if len(e.args) > 1 else '',
'stacktrace': stack}, status=500)

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 3 months ago

To fix this problem, the stack trace should not be included in the response sent to the user. Instead, the stack trace should be logged on the server for debugging purposes, and the user should receive a generic error message. This can be achieved by removing the stacktrace field from the response and using Python's logging module to log the stack trace. The fix should be applied to the exception handler in the prepare_documents function in v1/medcat-trainer/webapp/api/api/views.py. If the logging module is not already imported, it should be imported at the top of the file.


Suggested changeset 1
v1/medcat-trainer/webapp/api/api/views.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/api/api/views.py b/v1/medcat-trainer/webapp/api/api/views.py
--- a/v1/medcat-trainer/webapp/api/api/views.py
+++ b/v1/medcat-trainer/webapp/api/api/views.py
@@ -1,3 +1,4 @@
+import logging
 import traceback
 from smtplib import SMTPException
 from tempfile import NamedTemporaryFile
@@ -304,10 +305,10 @@
                 project.save()
 
     except Exception as e:
-        stack = traceback.format_exc()
+        logging.error("Exception in prepare_documents: %s", traceback.format_exc())
         return Response({'message': e.args[0] if len(e.args) > 0 else 'Internal Server Error',
-                         'description': e.args[1] if len(e.args) > 1 else '',
-                         'stacktrace': stack}, status=500)
+                         'description': e.args[1] if len(e.args) > 1 else ''}, status=500)
+
     return Response({'message': 'Documents prepared successfully'})
 
 
EOF
@@ -1,3 +1,4 @@
import logging
import traceback
from smtplib import SMTPException
from tempfile import NamedTemporaryFile
@@ -304,10 +305,10 @@
project.save()

except Exception as e:
stack = traceback.format_exc()
logging.error("Exception in prepare_documents: %s", traceback.format_exc())
return Response({'message': e.args[0] if len(e.args) > 0 else 'Internal Server Error',
'description': e.args[1] if len(e.args) > 1 else '',
'stacktrace': stack}, status=500)
'description': e.args[1] if len(e.args) > 1 else ''}, status=500)

return Response({'message': 'Documents prepared successfully'})


Copilot is powered by AI and may make mistakes. Always verify output.
return HttpResponseBadRequest('User is not super user, and not allowed to download project outputs')
cdb_id = request.data.get('cdb_id')
if cdb_id is None or len(ConceptDB.objects.filter(id=cdb_id)) == 0:
return HttpResponseBadRequest(f'No CDB found for cdb_id{cdb_id}')

Check warning

Code scanning / CodeQL

Reflected server-side cross-site scripting Medium

Cross-site scripting vulnerability due to a
user-provided value
.

Copilot Autofix

AI 3 months ago

To fix this issue, we should ensure that any user input interpolated into an HTTP response is properly escaped to prevent XSS. The best way to do this in Django is to use the django.utils.html.escape() function to sanitize the cdb_id before including it in the response. This will convert any HTML special characters in cdb_id to their safe, escaped equivalents, preventing malicious code injection. The fix should be applied directly to line 457 in v1/medcat-trainer/webapp/api/api/views.py, and we need to ensure that escape is imported from django.utils.html.

Suggested changeset 1
v1/medcat-trainer/webapp/api/api/views.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/api/api/views.py b/v1/medcat-trainer/webapp/api/api/views.py
--- a/v1/medcat-trainer/webapp/api/api/views.py
+++ b/v1/medcat-trainer/webapp/api/api/views.py
@@ -7,6 +7,7 @@
 from django.core.exceptions import ObjectDoesNotExist
 from django.db import transaction
 from django.http import HttpResponseBadRequest, HttpResponseServerError, HttpResponse
+from django.utils.html import escape
 from django.shortcuts import render
 from django.utils import timezone
 from django_filters import rest_framework as drf
@@ -454,7 +455,7 @@
         return HttpResponseBadRequest('User is not super user, and not allowed to download project outputs')
     cdb_id = request.data.get('cdb_id')
     if cdb_id is None or len(ConceptDB.objects.filter(id=cdb_id)) == 0:
-        return HttpResponseBadRequest(f'No CDB found for cdb_id{cdb_id}')
+        return HttpResponseBadRequest(f'No CDB found for cdb_id{escape(cdb_id)}')
     import_concepts_from_cdb(cdb_id)
     return Response({'message': 'submitted cdb import job.'})
 
EOF
@@ -7,6 +7,7 @@
from django.core.exceptions import ObjectDoesNotExist
from django.db import transaction
from django.http import HttpResponseBadRequest, HttpResponseServerError, HttpResponse
from django.utils.html import escape
from django.shortcuts import render
from django.utils import timezone
from django_filters import rest_framework as drf
@@ -454,7 +455,7 @@
return HttpResponseBadRequest('User is not super user, and not allowed to download project outputs')
cdb_id = request.data.get('cdb_id')
if cdb_id is None or len(ConceptDB.objects.filter(id=cdb_id)) == 0:
return HttpResponseBadRequest(f'No CDB found for cdb_id{cdb_id}')
return HttpResponseBadRequest(f'No CDB found for cdb_id{escape(cdb_id)}')
import_concepts_from_cdb(cdb_id)
return Response({'message': 'submitted cdb import job.'})

Copilot is powered by AI and may make mistakes. Always verify output.
except ProjectAnnotateEntities.DoesNotExist:
return Response(f'Project with id:{project_id} does not exist', 404)
except Exception as e:
return Response({'message': f'{str(e)}'}, 500)

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 3 months ago

To fix the problem, we should avoid returning the raw exception message to the client. Instead, return a generic error message, such as "An internal error has occurred." For debugging purposes, log the exception and stack trace on the server side using Python's logging module. This ensures developers can still diagnose issues without exposing sensitive details to users.

Steps:

  • In the except Exception as e: block (lines 714-715), replace the response with a generic error message.
  • Log the exception and stack trace using the logging module.
  • Add an import for logging at the top of the file if not already present.

Suggested changeset 1
v1/medcat-trainer/webapp/api/api/views.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/api/api/views.py b/v1/medcat-trainer/webapp/api/api/views.py
--- a/v1/medcat-trainer/webapp/api/api/views.py
+++ b/v1/medcat-trainer/webapp/api/api/views.py
@@ -1,3 +1,4 @@
+import logging
 import traceback
 from smtplib import SMTPException
 from tempfile import NamedTemporaryFile
@@ -712,7 +713,8 @@
     except ProjectAnnotateEntities.DoesNotExist:
         return Response(f'Project with id:{project_id} does not exist', 404)
     except Exception as e:
-        return Response({'message': f'{str(e)}'}, 500)
+        logging.error("Error in cache_model for project_id %s: %s", project_id, traceback.format_exc())
+        return Response({'message': 'An internal error has occurred.'}, 500)
 
 
 
EOF
@@ -1,3 +1,4 @@
import logging
import traceback
from smtplib import SMTPException
from tempfile import NamedTemporaryFile
@@ -712,7 +713,8 @@
except ProjectAnnotateEntities.DoesNotExist:
return Response(f'Project with id:{project_id} does not exist', 404)
except Exception as e:
return Response({'message': f'{str(e)}'}, 500)
logging.error("Error in cache_model for project_id %s: %s", project_id, traceback.format_exc())
return Response({'message': 'An internal error has occurred.'}, 500)



Copilot is powered by AI and may make mistakes. Always verify output.
running_pending_report = Task.objects.filter(id=report_id, queue='metrics').first()
completed_report = CompletedTask.objects.filter(id=report_id, queue='metrics').first()
if running_pending_report is None and completed_report is None:
HttpResponseBadRequest(f'Cannot find report_id:{report_id} in either pending, running or complete report lists. ')

Check warning

Code scanning / CodeQL

Reflected server-side cross-site scripting Medium

Cross-site scripting vulnerability due to a
user-provided value
.

Copilot Autofix

AI 3 months ago

To fix the reflected server-side XSS vulnerability, we should escape the user-supplied report_id before including it in the response. In Django, the recommended way to escape HTML is to use django.utils.html.escape. We should import this function and use it to sanitize report_id in the error message returned by HttpResponseBadRequest. The change should be made in the view_metrics function, specifically on line 810. Additionally, the function should return the response object (i.e., return HttpResponseBadRequest(...)) rather than just calling it, to ensure the error is properly sent to the client.

Required changes:

  • Import escape from django.utils.html.
  • Use escape(report_id) in the error message on line 810.
  • Add return before HttpResponseBadRequest(...) on line 810.

Suggested changeset 1
v1/medcat-trainer/webapp/api/api/views.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/api/api/views.py b/v1/medcat-trainer/webapp/api/api/views.py
--- a/v1/medcat-trainer/webapp/api/api/views.py
+++ b/v1/medcat-trainer/webapp/api/api/views.py
@@ -9,6 +9,7 @@
 from django.http import HttpResponseBadRequest, HttpResponseServerError, HttpResponse
 from django.shortcuts import render
 from django.utils import timezone
+from django.utils.html import escape
 from django_filters import rest_framework as drf
 from medcat.utils.helpers import tkns_from_doc
 from rest_framework import viewsets
@@ -807,9 +808,9 @@
         running_pending_report = Task.objects.filter(id=report_id, queue='metrics').first()
         completed_report = CompletedTask.objects.filter(id=report_id, queue='metrics').first()
         if running_pending_report is None and completed_report is None:
-            HttpResponseBadRequest(f'Cannot find report_id:{report_id} in either pending, running or complete report lists. ')
+            return HttpResponseBadRequest(f'Cannot find report_id:{escape(report_id)} in either pending, running or complete report lists. ')
         elif running_pending_report is not None:
-            HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{report_id}')
+            return HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{escape(report_id)}')
         pm_obj = ProjectMetrics.objects.filter(report_name_generated=completed_report.verbose_name).first()
         out = {
             'results': {
EOF
@@ -9,6 +9,7 @@
from django.http import HttpResponseBadRequest, HttpResponseServerError, HttpResponse
from django.shortcuts import render
from django.utils import timezone
from django.utils.html import escape
from django_filters import rest_framework as drf
from medcat.utils.helpers import tkns_from_doc
from rest_framework import viewsets
@@ -807,9 +808,9 @@
running_pending_report = Task.objects.filter(id=report_id, queue='metrics').first()
completed_report = CompletedTask.objects.filter(id=report_id, queue='metrics').first()
if running_pending_report is None and completed_report is None:
HttpResponseBadRequest(f'Cannot find report_id:{report_id} in either pending, running or complete report lists. ')
return HttpResponseBadRequest(f'Cannot find report_id:{escape(report_id)} in either pending, running or complete report lists. ')
elif running_pending_report is not None:
HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{report_id}')
return HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{escape(report_id)}')
pm_obj = ProjectMetrics.objects.filter(report_name_generated=completed_report.verbose_name).first()
out = {
'results': {
Copilot is powered by AI and may make mistakes. Always verify output.
if running_pending_report is None and completed_report is None:
HttpResponseBadRequest(f'Cannot find report_id:{report_id} in either pending, running or complete report lists. ')
elif running_pending_report is not None:
HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{report_id}')

Check warning

Code scanning / CodeQL

Reflected server-side cross-site scripting Medium

Cross-site scripting vulnerability due to a
user-provided value
.

Copilot Autofix

AI 3 months ago

To fix this issue, we should ensure that any user-provided data interpolated into an HTTP response is properly escaped or validated. Since report_id is expected to be an integer, the best fix is to explicitly cast it to an integer before using it in the response. This will prevent any malicious input from being injected into the response. If the cast fails, we should return a generic error message without including the user input. The changes should be made in the view_metrics function in v1/medcat-trainer/webapp/api/api/views.py, specifically on lines where report_id is used in an HTTP response.

No new imports are needed, as Python's built-in int() function suffices for type enforcement.


Suggested changeset 1
v1/medcat-trainer/webapp/api/api/views.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/v1/medcat-trainer/webapp/api/api/views.py b/v1/medcat-trainer/webapp/api/api/views.py
--- a/v1/medcat-trainer/webapp/api/api/views.py
+++ b/v1/medcat-trainer/webapp/api/api/views.py
@@ -803,13 +803,17 @@
 
 @api_view(http_method_names=['GET', 'PUT'])
 def view_metrics(request, report_id):
+    try:
+        safe_report_id = int(report_id)
+    except (ValueError, TypeError):
+        return HttpResponseBadRequest('Invalid report_id.')
     if request.method == 'GET':
-        running_pending_report = Task.objects.filter(id=report_id, queue='metrics').first()
-        completed_report = CompletedTask.objects.filter(id=report_id, queue='metrics').first()
+        running_pending_report = Task.objects.filter(id=safe_report_id, queue='metrics').first()
+        completed_report = CompletedTask.objects.filter(id=safe_report_id, queue='metrics').first()
         if running_pending_report is None and completed_report is None:
-            HttpResponseBadRequest(f'Cannot find report_id:{report_id} in either pending, running or complete report lists. ')
+            return HttpResponseBadRequest(f'Cannot find report_id:{safe_report_id} in either pending, running or complete report lists.')
         elif running_pending_report is not None:
-            HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{report_id}')
+            return HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{safe_report_id}')
         pm_obj = ProjectMetrics.objects.filter(report_name_generated=completed_report.verbose_name).first()
         out = {
             'results': {
EOF
@@ -803,13 +803,17 @@

@api_view(http_method_names=['GET', 'PUT'])
def view_metrics(request, report_id):
try:
safe_report_id = int(report_id)
except (ValueError, TypeError):
return HttpResponseBadRequest('Invalid report_id.')
if request.method == 'GET':
running_pending_report = Task.objects.filter(id=report_id, queue='metrics').first()
completed_report = CompletedTask.objects.filter(id=report_id, queue='metrics').first()
running_pending_report = Task.objects.filter(id=safe_report_id, queue='metrics').first()
completed_report = CompletedTask.objects.filter(id=safe_report_id, queue='metrics').first()
if running_pending_report is None and completed_report is None:
HttpResponseBadRequest(f'Cannot find report_id:{report_id} in either pending, running or complete report lists. ')
return HttpResponseBadRequest(f'Cannot find report_id:{safe_report_id} in either pending, running or complete report lists.')
elif running_pending_report is not None:
HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{report_id}')
return HttpResponseBadRequest(f'Cannot view a running or pending metrics report with id:{safe_report_id}')
pm_obj = ProjectMetrics.objects.filter(report_name_generated=completed_report.verbose_name).first()
out = {
'results': {
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +16 to +49
runs-on: ubuntu-latest
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
ref: ${{ github.ref }}

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install requests pytest build

- name: Install client package in development mode
run: |
cd client
pip install -e .

- name: Run client tests
run: |
cd client
python -m pytest tests/ -v

- name: Build client package
run: |
cd client
python -m build

# Build and test webapp container
build-and-push:

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 3 months ago

To fix the problem, add a permissions block to the workflow file to explicitly set the minimal required permissions for the GITHUB_TOKEN. Since the workflow only checks out code, installs dependencies, runs tests, and builds packages/containers, it does not require any write permissions. The minimal required permission is contents: read, which allows the workflow to read repository contents. This block should be added at the top level of the workflow file (after the name: and before on:), so it applies to all jobs in the workflow. No changes to the jobs or steps are needed.


Suggested changeset 1
.github/workflows/medcat-trainer-v1_ci.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/medcat-trainer-v1_ci.yml b/.github/workflows/medcat-trainer-v1_ci.yml
--- a/.github/workflows/medcat-trainer-v1_ci.yml
+++ b/.github/workflows/medcat-trainer-v1_ci.yml
@@ -1,3 +1,5 @@
+permissions:
+  contents: read
 name: medcat-trainer-v1 ci-build
 
 on:
EOF
@@ -1,3 +1,5 @@
permissions:
contents: read
name: medcat-trainer-v1 ci-build

on:
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +50 to +68
runs-on: ubuntu-latest
needs: test-client
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
ref: ${{ github.ref }}

- name: Build
env:
IMAGE_TAG: ${{ env.RELEASE_VERSION }}
run: |
docker build -t cogstacksystems/medcat-trainer:dev-latest webapp/.
- name: Run Django Tests
env:
IMAGE_TAG: ${{ env.RELEASE_VERSION }}
run: |
# run tests
docker run --rm cogstacksystems/medcat-trainer:dev-latest python manage.py test

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 3 months ago

To fix the problem, add a permissions block to the workflow file .github/workflows/medcat-trainer-v1_ci.yml. The block should be placed at the top level, just after the name and before the on key, so that it applies to all jobs in the workflow. The minimal required permission for this workflow is contents: read, as none of the jobs require write access to the repository. No additional imports or definitions are needed; this is a YAML configuration change only.

Suggested changeset 1
.github/workflows/medcat-trainer-v1_ci.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/medcat-trainer-v1_ci.yml b/.github/workflows/medcat-trainer-v1_ci.yml
--- a/.github/workflows/medcat-trainer-v1_ci.yml
+++ b/.github/workflows/medcat-trainer-v1_ci.yml
@@ -1,3 +1,5 @@
+permissions:
+  contents: read
 name: medcat-trainer-v1 ci-build
 
 on:
EOF
@@ -1,3 +1,5 @@
permissions:
contents: read
name: medcat-trainer-v1 ci-build

on:
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +14 to +54
runs-on: ubuntu-latest
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
ref: 'main'

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install requests pytest build

- name: Install client package in development mode
run: |
cd client
pip install -e .

- name: Run client tests
run: |
cd client
python -m pytest tests/ -v

- name: Build client package
run: |
cd client
python -m build

# - name: Publish dev distribution to Test PyPI
# uses: pypa/gh-action-pypi-publish@v1.4.2
# with:
# password: ${{ secrets.MEDCAT_TRAINER_TEST_PYPI_API_TOKEN }}
# repository_url: https://test.pypi.org/legacy/
# packages_dir: v1/medcat-trainer/client/dist

# Build and test webapp container
build-and-push:

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 3 months ago

To fix the problem, add a permissions block to the workflow file to explicitly restrict the GITHUB_TOKEN permissions. The best way to do this is to add the block at the top level of the workflow, so it applies to all jobs unless overridden. In this case, the jobs only need to read repository contents (for checkout), so set contents: read. This change should be made near the top of the file, after the name: and before the on: block. No additional imports or definitions are needed.

Suggested changeset 1
.github/workflows/medcat-trainer-v1_qa.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/medcat-trainer-v1_qa.yml b/.github/workflows/medcat-trainer-v1_qa.yml
--- a/.github/workflows/medcat-trainer-v1_qa.yml
+++ b/.github/workflows/medcat-trainer-v1_qa.yml
@@ -1,3 +1,5 @@
+permissions:
+  contents: read
 name: medcat-trainer-v1 qa-build
 
 on:
EOF
@@ -1,3 +1,5 @@
permissions:
contents: read
name: medcat-trainer-v1 qa-build

on:
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +55 to +82
runs-on: ubuntu-latest
needs: test-client
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
ref: 'main'

- name: Build
run: |
docker build -t cogstacksystems/medcat-trainer:v1-latest webapp/.

- name: Run Django Tests
env:
IMAGE_TAG: latest
run: |
# run tests
docker run --rm cogstacksystems/medcat-trainer:v1-latest python manage.py test

- name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

- name: Push to DockerHub
run: |
docker push cogstacksystems/medcat-trainer:v1-latest

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 3 months ago

To fix the problem, you should add a permissions block to the workflow file. This can be done at the root level (applies to all jobs) or at the job level (for more granular control). Since neither job appears to require write access to repository contents, issues, or pull requests, the minimal permission of contents: read is appropriate. This change should be made at the top level of the workflow file, immediately after the name: and before the on: block, to ensure all jobs inherit these permissions unless overridden. No additional imports or definitions are required.

Suggested changeset 1
.github/workflows/medcat-trainer-v1_qa.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/medcat-trainer-v1_qa.yml b/.github/workflows/medcat-trainer-v1_qa.yml
--- a/.github/workflows/medcat-trainer-v1_qa.yml
+++ b/.github/workflows/medcat-trainer-v1_qa.yml
@@ -1,3 +1,5 @@
+permissions:
+  contents: read
 name: medcat-trainer-v1 qa-build
 
 on:
EOF
@@ -1,3 +1,5 @@
permissions:
contents: read
name: medcat-trainer-v1 qa-build

on:
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +15 to +60
runs-on: ubuntu-latest
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
ref: "main"

- name: Release Tag
# If GITHUB_REF=refs/tags/medcat-trainer/v0.1.2, this returns v0.1.2. Note it's including the "v" though it probably shouldnt
run: echo "RELEASE_VERSION=${GITHUB_REF##refs/*/}" >> $GITHUB_ENV

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install requests pytest build twine

- name: Install client package in development mode
run: |
cd client
pip install -e .

- name: Run client tests
run: |
cd client
python -m pytest tests/ -v

- name: Build client package
run: |
cd client
python -m build

- name: Publish production distribution to PyPI
if: startsWith(github.ref, 'refs/tags') && ! github.event.release.prerelease
uses: pypa/gh-action-pypi-publish@v1.4.2
with:
# TODO CU-869a25n7e Use Trusted Platform Publisher based PyPI release
password: ${{ secrets.PYPI_API_TOKEN }}
packages_dir: v1/medcat-trainer/client/dist

# Build and test webapp container
build-and-push:

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 3 months ago

To fix the problem, add a permissions block to the workflow file to explicitly set the minimum required permissions for the jobs. The best way to do this is to add the permissions key at the top level of the workflow (just after the name and before on), which will apply to all jobs unless overridden. Since the jobs do not appear to require any write access to repository contents, the minimal permission of contents: read is sufficient. No changes to the jobs themselves are necessary.

Steps:

  • Insert the following at the top of .github/workflows/medcat-trainer-v1_release.yml, after the name field and before on:
    permissions:
      contents: read
  • No additional imports, methods, or definitions are needed.
Suggested changeset 1
.github/workflows/medcat-trainer-v1_release.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/medcat-trainer-v1_release.yml b/.github/workflows/medcat-trainer-v1_release.yml
--- a/.github/workflows/medcat-trainer-v1_release.yml
+++ b/.github/workflows/medcat-trainer-v1_release.yml
@@ -1,3 +1,5 @@
+permissions:
+  contents: read
 name: medcat-trainer-v1 release-build
 
 on:
EOF
@@ -1,3 +1,5 @@
permissions:
contents: read
name: medcat-trainer-v1 release-build

on:
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +61 to +94
runs-on: ubuntu-latest
needs: test-and-publish-client
steps:
- name: Checkout main
uses: actions/checkout@v4
with:
ref: "main"

- name: Release Tag
# If GITHUB_REF=refs/tags/medcat-trainer/v0.1.2, this returns v0.1.2. Note it's including the "v" though it probably shouldnt
run: echo "RELEASE_VERSION=${GITHUB_REF##refs/*/}" >> $GITHUB_ENV
- name: Build
env:
IMAGE_TAG: ${{ env.RELEASE_VERSION }}
run: |
docker build -t cogstacksystems/medcat-trainer:$IMAGE_TAG webapp/.
- name: Run Django Tests
env:
IMAGE_TAG: ${{ env.RELEASE_VERSION }}
run: |
# run tests
docker run --rm cogstacksystems/medcat-trainer:$IMAGE_TAG python manage.py test

- name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKER_HUB_ACCESS_TOKEN }}

- name: Push to DockerHub
env:
IMAGE_TAG: ${{ env.RELEASE_VERSION }}
run: |
docker push cogstacksystems/medcat-trainer:$IMAGE_TAG

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 3 months ago

To fix the problem, add an explicit permissions block to the workflow file. The block should be placed at the top level (applies to all jobs) or at the job level (for each job that does not need write access). Since neither job in this workflow appears to require write access to the repository via the GITHUB_TOKEN, the minimal required permission is contents: read. This should be added near the top of the file, after the name and before or after the on block. No other code changes are needed.


Suggested changeset 1
.github/workflows/medcat-trainer-v1_release.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/medcat-trainer-v1_release.yml b/.github/workflows/medcat-trainer-v1_release.yml
--- a/.github/workflows/medcat-trainer-v1_release.yml
+++ b/.github/workflows/medcat-trainer-v1_release.yml
@@ -1,5 +1,8 @@
 name: medcat-trainer-v1 release-build
 
+permissions:
+  contents: read
+
 on:
   push:
     tags:
EOF
@@ -1,5 +1,8 @@
name: medcat-trainer-v1 release-build

permissions:
contents: read

on:
push:
tags:
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Collaborator

@mart-r mart-r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to make a note of the CodeQL issues for the future.

But since they're already in the other one anyway, I don't see it as a priority.

@alhendrickson alhendrickson merged commit f713469 into main Aug 14, 2025
9 of 10 checks passed
@alhendrickson alhendrickson deleted the chore/prep-medcat-trainer-v2 branch August 14, 2025 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants