threading for instrument records export #1261

Merged
joshunrau merged 42 commits into main from fix-threading-clone-issue on Jan 21, 2026

Conversation

@joshunrau
Collaborator

@joshunrau joshunrau commented Jan 21, 2026

Summary by CodeRabbit

  • Improvements

    • Parallelized export processing for faster exports and clearer error propagation.
    • Export rows now include richer, consistent fields (session details, timestamps, masked subject IDs, usernames).
    • Stronger validation of subject identification data to reduce malformed exports.
  • Refactor

    • Export processing moved to a worker-based pipeline for chunked, concurrent handling.
  • Tests

    • Added unit tests for export behavior, error cases, and output shape.


@coderabbitai
Contributor

coderabbitai Bot commented Jan 21, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a worker-based, multi-threaded export pipeline for instrument records, refactors exportRecords to use workers and a raw-query helper, introduces inter-worker message types, adds unit tests for InstrumentRecordsService, and tightens clinical subject input validation.

Changes

Cohort / File(s) Summary
Export worker & types
apps/api/src/instrument-records/export-worker.ts, apps/api/src/instrument-records/thread-types.ts
New worker implementation and TypeScript message/type definitions for INIT / BEGIN_CHUNK_PROCESSING messaging, record-to-export-row transformation, array-measure expansion, and error propagation.
Service refactor & tests
apps/api/src/instrument-records/instrument-records.service.ts, apps/api/src/instrument-records/__tests__/instrument-records.service.spec.ts
exportRecords refactored to spawn Worker threads, prefetch instruments, chunk records, require explicit ability param, add queryRecordsRaw helper, remove in-service expandData, and add unit tests covering findById and export behavior.
Subject validation util
packages/subject-utils/src/index.ts
Adds internal runtime validator parseClinicalSubjectIdentificationData and uses it in subject hash generation to enforce dateOfBirth/firstName/lastName/sex types.
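
The chunked, worker-based handling summarized above can be sketched as a small helper (a hypothetical outline; the actual chunk size and aggregation logic live in exportRecords and are not shown in this PR summary):

```typescript
// Hypothetical sketch of the chunking step: split fetched records into
// fixed-size slices, one slice per worker thread. The chunk size here is
// an assumption, not the value used in the PR.
function chunkArray<T>(items: T[], chunkSize: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Each chunk would then be handed to its own Worker, and the per-chunk
// promises aggregated, e.g. (await Promise.all(chunkPromises)).flat()
const chunks = chunkArray([1, 2, 3, 4, 5], 2);
```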

Sequence Diagram

sequenceDiagram
    participant Service as InstrumentRecordsService
    participant Worker as Worker Thread
    participant DB as Database
    participant Instruments as Instruments Service

    Service->>DB: queryRecordsRaw(appAbility, groupId?)
    DB-->>Service: records[]
    Service->>Instruments: fetch instruments by ids (bulk)
    Instruments-->>Service: instrumentsMap
    Service->>Worker: INIT (availableInstrumentArray)
    Worker-->>Service: InitialMessage { success: true }
    Service->>Worker: BEGIN_CHUNK_PROCESSING (records chunk)
    Worker->>Worker: expand array measures, map to export rows
    alt expansion error
        Worker-->>Service: WorkerMessage { success: false, error }
    else success
        Worker-->>Service: WorkerMessage { success: true, data: exportRows[] }
    end
    Service->>Service: aggregate chunk results
    Service-->>Client: final exportRows[]

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #1244: Overlaps export pipeline and exported row shape changes (username and export field updates).
  • PR #1146: Related work on array-measure expansion; that PR added in-service expansion, which this PR moves to workers.
  • PR #1253: Touches findById service/controller behavior that the new unit tests cover.
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately describes the main change, implementing multithreading for the instrument records export functionality.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In
`@apps/api/src/instrument-records/__tests__/instrument-records.service.spec.ts`:
- Around line 81-112: The mockRecords in instrument-records.service.spec.ts uses
Date objects for fields like createdAt, date, session.date, subject.dateOfBirth,
and updatedAt while queryRecordsRaw returns serialized date strings; change
those Date instances to ISO date strings (e.g., "2023-01-01T00:00:00.000Z") for
createdAt, date, session.date, subject.dateOfBirth, updatedAt (and any other
date fields) so the mock matches the export shape expected by
expect.any(String); update the mockRecords declaration (and its type assertion
if needed) to accept strings (cast to the same (InstrumentRecord & {...})[] or
to unknown then to that type) so TypeScript still compiles.

In `@apps/api/src/instrument-records/instrument-records.service.ts`:
- Around line 183-193: The INIT message listener (worker.on('message', (message:
InitialMessage) => { ... })) is also matching chunk responses because both
include success, causing duplicate CHUNK_COMPLETE posts; change the INIT handler
to either use worker.once('message', ...) or explicitly gate on message.type ===
'INIT' (or similar discriminant) before posting CHUNK_COMPLETE, and keep the
chunk response handling separate (the inner worker.on('message', (message:
WorkerMessage) => { ... })) so resolve/reject and worker.terminate() only run
for chunk responses.
- Around line 454-460: The $expr currently gets an empty object when groupId is
undefined which is invalid; update the aggregation so the conditional is applied
at the $match level instead of inside $expr: in the method that builds the
pipeline (referencing groupId and the $match stage in
instrument-records.service.ts), make $match either include a filter that
compares groupId to {$toObjectId: groupId} (when groupId is present) or be
omitted/empty (when groupId is absent) rather than passing {} into $expr; ensure
you remove the conditional from $expr and construct $match accordingly so the
pipeline receives a valid expression only when groupId is provided.
- Around line 167-169: The current use of Iterator.prototype.toArray() on
instruments.values() (creating availableInstrumentArray of type InitData) will
crash on Node.js 20 LTS; replace the toArray() call with Array.from(...) to
materialize the iterator (e.g., build availableInstrumentArray =
Array.from(instruments.values()) while preserving the InitData typing) so it
works on Node.js 20.

Comment on lines +167 to +169
const availableInstrumentArray: InitData = instruments
.values()
.toArray()


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed:

# Check which Node.js version the project targets
cat .nvmrc
grep -A 5 '"engines"' apps/api/package.json

# Inspect the toArray() usage in context
sed -n '160,180p' apps/api/src/instrument-records/instrument-records.service.ts
rg -n "availableInstrumentArray" apps/api/src/instrument-records/instrument-records.service.ts -A 3

🌐 Web query: Iterator.prototype.toArray() Node.js support and LTS availability

💡 Result: Iterator.prototype.toArray() landed in Node.js 22.0.0 (released Apr 24, 2024, via V8 12.4's iterator helpers) and is therefore only available on LTS from Node.js 22 onward (LTS since October 2024). It is not available on Node.js 20 LTS. Sources: MDN compatibility data and the Node.js v22.0.0 release notes.

Avoid Iterator.prototype.toArray() to prevent runtime crashes on Node.js 20 LTS.

The project targets Node.js 20 LTS (lts/jod), which does not support Iterator helpers. The toArray() method was added in Node.js 22.0.0 (April 2024). Use Array.from() instead.

Lines 147-149:

♻️ Proposed fix
-    const availableInstrumentArray: InitData = instruments
-      .values()
-      .toArray()
-      .map((item) => {
+    const availableInstrumentArray: InitData = Array.from(instruments.values()).map((item) => {
       return {
         edition: item.internal.edition,
         id: item.id!,
         name: item.internal.name
       };
     });
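
For illustration, the same portable pattern in isolation (the Map contents below are made up, not taken from the project):

```typescript
// Illustrative only: materialize a Map's values without iterator helpers.
// Array.from() accepts any iterable and works on Node.js 20.
const instruments = new Map([
  ['abc123', { id: 'abc123', internal: { edition: 1, name: 'example-instrument' } }]
]);

const availableInstrumentArray = Array.from(instruments.values()).map((item) => ({
  edition: item.internal.edition,
  id: item.id,
  name: item.internal.name
}));
```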

Comment on lines +183 to +193
worker.on('message', (message: InitialMessage) => {
  if (message.success) {
    worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
    worker.on('message', (message: WorkerMessage) => {
      if (message.success) {
        resolve(message.data);
      } else {
        reject(new Error(message.error));
      }
      void worker.terminate();
    });


⚠️ Potential issue | 🟠 Major

INIT listener also handles chunk responses, causing extra CHUNK_COMPLETE posts.

Both responses include success, so the INIT handler will fire on the chunk response and resend work. Use once or differentiate message shapes.

🛠️ Proposed fix
-        worker.on('message', (message: InitialMessage) => {
-          if (message.success) {
-            worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
-            worker.on('message', (message: WorkerMessage) => {
-              if (message.success) {
-                resolve(message.data);
-              } else {
-                reject(new Error(message.error));
-              }
-              void worker.terminate();
-            });
-          }
-        });
+        worker.once('message', (message: InitialMessage) => {
+          if (!message.success) {
+            reject(new Error('Worker init failed'));
+            void worker.terminate();
+            return;
+          }
+          worker.once('message', (message: WorkerMessage) => {
+            if (message.success) resolve(message.data);
+            else reject(new Error(message.error));
+            void worker.terminate();
+          });
+          worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
+        });
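
The listener-stacking behaviour is easy to reproduce with a plain EventEmitter standing in for the Worker (illustrative only; not the PR's code):

```typescript
import { EventEmitter } from 'node:events';

// A plain EventEmitter standing in for the Worker, to show why on() is the
// bug here: the first handler stays registered and fires for later messages.
const worker = new EventEmitter();
let onCalls = 0;
let onceCalls = 0;

worker.on('message', () => { onCalls += 1; });     // stays registered forever
worker.once('message', () => { onceCalls += 1; }); // auto-removed after one call

worker.emit('message'); // simulated INIT response
worker.emit('message'); // simulated chunk response

// onCalls is now 2 and onceCalls is 1: the on() handler also saw the chunk
// response, which is exactly how the duplicate CHUNK_COMPLETE posts arise.
```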

Comment on lines +454 to +460
$match: {
  $expr: groupId
    ? {
        $eq: ['$groupId', { $toObjectId: groupId }]
      }
    : {}
}


⚠️ Potential issue | 🔴 Critical

$expr receives {} when groupId is undefined.

This is likely an invalid expression and can break exports without a group filter. Move the conditional to $match instead.

🛠️ Proposed fix
-      {
-        $match: {
-          $expr: groupId
-            ? {
-                $eq: ['$groupId', { $toObjectId: groupId }]
-              }
-            : {}
-        }
-      },
+      {
+        $match: groupId
+          ? {
+              $expr: { $eq: ['$groupId', { $toObjectId: groupId }] }
+            }
+          : {}
+      },
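
The conditional stage can also be factored into a small helper so its shape is easy to unit-test (hypothetical helper name, mirroring the proposed fix):

```typescript
// Hypothetical helper mirroring the suggested fix: the conditional lives at
// the $match level, so an absent groupId yields an unfiltered {} match
// instead of an invalid empty $expr.
function buildGroupMatchStage(groupId?: string): { $match: Record<string, unknown> } {
  return {
    $match: groupId
      ? { $expr: { $eq: ['$groupId', { $toObjectId: groupId }] } }
      : {}
  };
}
```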


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In
`@apps/api/src/instrument-records/__tests__/instrument-records.service.spec.ts`:
- Around line 81-107: The mockRecords entry uses Date objects for record.date
and session.date but RecordType and the export worker expect ISO strings; update
mockRecords (the object in the test) so that the properties "date" and
"session.date" are ISO strings (e.g., via toISOString or literal ISO strings) to
match the RecordType contract and ensure the export worker produces a string
"timestamp" as asserted in the test.

In `@apps/api/src/instrument-records/instrument-records.service.ts`:
- Around line 160-174: The Worker message handlers currently use
worker.on('message') causing duplicate listeners; change both the outer handler
that listens for the INIT response (the callback receiving InitialMessage) and
the inner handler that listens for CHUNK_COMPLETE (the callback receiving
WorkerMessage) to use once('message') so each handler runs only once, and ensure
the outer INIT handler checks message.success === false and immediately rejects
with message.error and terminates the Worker (calling worker.terminate()) to
avoid cascading failures; keep the existing worker.postMessage({ data:
availableInstrumentArray, type: 'INIT' }) and worker.postMessage({ data: chunk,
type: 'CHUNK_COMPLETE' }) calls but only send CHUNK_COMPLETE from the
INIT-success path once.
- Around line 445-449: The export projection is using the database creation
timestamp field '$createdAt' instead of the measurement timestamp field '$date',
so change the $dateToString entry in the export pipeline/projection to use
'date: { $dateToString: { date: '$date', format: '%Y-%m-%d' } }' (i.e., replace
'$createdAt' with '$date') so exported rows reflect the measurement timestamp;
keep the output field name 'date' unchanged.

In `@packages/subject-utils/src/index.ts`:
- Around line 8-31: The runtime checks in parseClinicalSubjectIdentificationData
are too strict and out of sync with the Zod schema; update the function to
delegate validation to the schema (use the schema's parse/safeParse so
z.coerce.date() can handle strings/numbers, the .min(1) constraints on
firstName/lastName are enforced, and invalid Date instances are rejected by the
schema), remove the manual typeof/instanceof checks, and return the parsed value
from the schema (or throw the schema error) so callers get consistent,
canonicalized data.

Comment on lines +160 to +174
const worker = new Worker(join(import.meta.dirname, 'export-worker.ts'));
worker.postMessage({ data: availableInstrumentArray, type: 'INIT' });

worker.on('message', (message: InitialMessage) => {
  if (message.success) {
    worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
    worker.on('message', (message: WorkerMessage) => {
      if (message.success) {
        resolve(message.data);
      } else {
        reject(new Error(message.error));
      }
      void worker.terminate();
    });
  }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

# Inspect the worker setup and message handlers in context
sed -n '150,190p' apps/api/src/instrument-records/instrument-records.service.ts

# Check the inter-worker message type definitions
cat apps/api/src/instrument-records/thread-types.ts


Use once for Worker message handlers to prevent duplicate listener stacking.

The outer on('message') handler remains registered after the INIT response arrives. When the CHUNK_COMPLETE response is received later, both the original outer handler and the newly registered inner handler fire. The outer handler then incorrectly re-sends CHUNK_COMPLETE and registers another listener, causing duplicate work.

Replace on with once for both message handlers. Additionally, add error handling for failed INIT messages to avoid cascading failures.

Proposed fix
-        worker.on('message', (message: InitialMessage) => {
-          if (message.success) {
-            worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
-            worker.on('message', (message: WorkerMessage) => {
-              if (message.success) {
-                resolve(message.data);
-              } else {
-                reject(new Error(message.error));
-              }
-              void worker.terminate();
-            });
-          }
-        });
+        worker.once('message', (message: InitialMessage) => {
+          if (!message.success) {
+            reject(new Error('Worker init failed'));
+            void worker.terminate();
+            return;
+          }
+          worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
+          worker.once('message', (message: WorkerMessage) => {
+            if (message.success) {
+              resolve(message.data);
+            } else {
+              reject(new Error(message.error));
+            }
+            void worker.terminate();
+          });
+        });

Comment on lines +445 to +449
date: {
  $dateToString: {
    date: '$createdAt',
    format: '%Y-%m-%d'
  }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

# Inspect the export projection in context
sed -n '430,465p' apps/api/src/instrument-records/instrument-records.service.ts

# Check the record schemas to confirm the distinction between date and createdAt
cat packages/schemas/src/instrument-records/instrument-records.ts
cat packages/schemas/src/core/core.ts

# Trace where the raw query feeds the export
rg "queryRecordsRaw" apps/api/src/instrument-records/instrument-records.service.ts -A 5 -B 5


Use the date field instead of createdAt in the export projection.

The date field stores the actual measurement timestamp provided when creating records, while createdAt is the database creation timestamp. Using createdAt causes exports to reflect when records were entered into the system rather than when measurements were taken—problematic for retrospective data and downstream analyses.

Proposed fix
-          date: {
-            $dateToString: {
-              date: '$createdAt',
-              format: '%Y-%m-%d'
-            }
-          },
+          date: {
+            $dateToString: {
+              date: '$date',
+              format: '%Y-%m-%d'
+            }
+          },

Comment on lines +8 to +31
function parseClinicalSubjectIdentificationData({
  dateOfBirth,
  firstName,
  lastName,
  sex
}: {
  [key: string]: unknown;
}): ClinicalSubjectIdentificationData {
  if (!(dateOfBirth instanceof Date)) {
    throw new Error('Invalid or missing dateOfBirth: Expected a Date object.');
  } else if (typeof firstName !== 'string') {
    throw new Error('Invalid or missing firstName: Expected a string.');
  } else if (typeof lastName !== 'string') {
    throw new Error('Invalid or missing lastName: Expected a string.');
  } else if (!(sex === 'MALE' || sex === 'FEMALE')) {
    throw new Error('Invalid or missing sex: Expected "MALE" or "FEMALE".');
  }
  return {
    dateOfBirth,
    firstName,
    lastName,
    sex
  };
}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's understand the repository structure and locate relevant files
find packages/subject-utils -type f -name "*.ts" -o -name "*.js" | head -20

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 222


🏁 Script executed:

# Search for ClinicalSubjectIdentificationData type definition
rg "ClinicalSubjectIdentificationData" packages/subject-utils/ -t typescript

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 117


🏁 Script executed:

# Look for schema definitions, likely in the same directory or a related location
rg "schema\|zod\|yup" packages/subject-utils/ -t typescript -i

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 117


🏁 Script executed:

# Check the full context of the index.ts file
cat packages/subject-utils/src/index.ts

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 2437


🏁 Script executed:

# Search for generateSubjectHash function
rg "generateSubjectHash" packages/subject-utils/ -t typescript -A 10

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 117


🏁 Script executed:

# The type is imported from '@opendatacapture/schemas/subject'
# Let's find this schema file
find . -type f -path "*schemas*subject*" | head -20

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 124


🏁 Script executed:

# Try broader search for the schemas package
find . -type f -name "*.ts" -path "*schemas*" | head -30

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 1526


🏁 Script executed:

# Look for ClinicalSubjectIdentificationData definition
rg "ClinicalSubjectIdentificationData" . -A 5 -B 2 2>/dev/null | head -50

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 3389


🏁 Script executed:

# Check the structure around the repository root
ls -la

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 2620


🏁 Script executed:

# Read the subject schema file to understand ClinicalSubjectIdentificationData definition
cat packages/schemas/src/subject/subject.ts


Length of output: 1226


🏁 Script executed:

# Also check the test file to understand how the function is being used
cat packages/subject-utils/src/test.ts


Length of output: 1207


🏁 Script executed:

# Find all usages of parseClinicalSubjectIdentificationData and generateSubjectHash
rg "parseClinicalSubjectIdentificationData|generateSubjectHash" . --no-heading -B 2 -A 2 2>/dev/null | head -100


Length of output: 6320


🏁 Script executed:

# Check if generateSubjectHash is called with schema-validated data or raw data
rg "generateSubjectHash\(" . --no-heading -B 5 -A 2 2>/dev/null | head -80

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 4458


🏁 Script executed:

# Check the IdentificationForm to see if it validates with schema before calling generateSubjectHash
cat apps/web/src/components/IdentificationForm/IdentificationForm.tsx | head -150


Length of output: 5509


🏁 Script executed:

# Check the StartSessionForm similarly
cat apps/web/src/components/StartSessionForm/StartSessionForm.tsx | head -150


Length of output: 5508


🏁 Script executed:

# Check demo.service.ts to see the data flow there
cat apps/api/src/demo/demo.service.ts | grep -A 20 -B 5 "generateSubjectHash"


Length of output: 2139


Align runtime validation with the schema definition.

The schema declares dateOfBirth: z.coerce.date() (accepting strings, numbers, or Date instances), but the implementation requires instanceof Date, rejecting inputs the schema would coerce. The typeof checks on firstName and lastName also skip the schema's .min(1) constraint, so empty strings pass validation. Finally, an invalid Date instance (NaN timestamp) makes toISOString() throw at runtime.

Delegate validation to the schema:

Proposed fix
-function parseClinicalSubjectIdentificationData({
-  dateOfBirth,
-  firstName,
-  lastName,
-  sex
-}: {
-  [key: string]: unknown;
-}): ClinicalSubjectIdentificationData {
-  if (!(dateOfBirth instanceof Date)) {
-    throw new Error('Invalid or missing dateOfBirth: Expected a Date object.');
-  } else if (typeof firstName !== 'string') {
-    throw new Error('Invalid or missing firstName: Expected a string.');
-  } else if (typeof lastName !== 'string') {
-    throw new Error('Invalid or missing lastName: Expected a string.');
-  } else if (!(sex === 'MALE' || sex === 'FEMALE')) {
-    throw new Error('Invalid or missing sex: Expected "MALE" or "FEMALE".');
-  }
-  return {
-    dateOfBirth,
-    firstName,
-    lastName,
-    sex
-  };
-}
+function parseClinicalSubjectIdentificationData(data: unknown): ClinicalSubjectIdentificationData {
+  const result = $ClinicalSubjectIdentificationData.safeParse(data);
+  if (!result.success) {
+    throw new Error('Invalid clinical subject identification data.');
+  }
+  return result.data;
+}
🤖 Prompt for AI Agents
In `@packages/subject-utils/src/index.ts` around lines 8 - 31, The runtime checks
in parseClinicalSubjectIdentificationData are too strict and out of sync with
the Zod schema; update the function to delegate validation to the schema (use
the schema's parse/safeParse so z.coerce.date() can handle strings/numbers, the
.min(1) constraints on firstName/lastName are enforced, and invalid Date
instances are rejected by the schema), remove the manual typeof/instanceof
checks, and return the parsed value from the schema (or throw the schema error)
so callers get consistent, canonicalized data.
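As a dependency-free illustration, the gap between coercion and a strict instanceof check looks like the following (hypothetical helper mimicking z.coerce.date(), not the project's actual schema):

```typescript
// Hypothetical sketch: accepts the same inputs z.coerce.date() would,
// unlike the strict instanceof check in the current implementation.
function coerceDate(value: unknown): Date | null {
  if (value instanceof Date) {
    return Number.isNaN(value.getTime()) ? null : value;
  }
  if (typeof value === 'string' || typeof value === 'number') {
    const parsed = new Date(value);
    return Number.isNaN(parsed.getTime()) ? null : parsed;
  }
  return null;
}

const raw: unknown = '2000-01-15';
console.log(raw instanceof Date);             // false: the strict check rejects it
console.log(coerceDate(raw) instanceof Date); // true: coercion accepts it
console.log(coerceDate('not a date'));        // null: invalid dates are filtered out
```

Delegating to the schema gives this behavior for free, along with the .min(1) string constraints.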

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@apps/api/src/instrument-records/export-worker.ts`:
- Around line 40-47: The Map declaration for initData uses a key type of `string
| undefined`, which allows accidental lookups like
`instrumentsMap.get(undefined)`; change the key type to `string` for `initData:
Map<string, { edition: number; id: string; name: string }>` and then audit
usages of `initData`/`instrumentsMap` to ensure callers never pass
`undefined`—add guards or filter out undefined ids before calling `set`/`get`
where necessary (e.g., where ids are derived from possibly-absent values) so all
keys are guaranteed to be strings.
- Around line 126-135: The code assumes parentPort is always present and will
throw when imported on the main thread; add a null-check guard before attaching
the message handler so the file is safe to import outside a worker. Surround the
parentPort!.on(...) logic with a conditional that verifies parentPort !== null
(or return early if null) and only calls parentPort.on('message', ...) to
dispatch to handleChunkComplete and handleInit for ParentMessage types; keep the
existing default error for unexpected message types but ensure it only runs
inside the worker context.
- Around line 61-66: The function processRecord currently uses a non-null
assertion on instrumentsMap.get(record.instrumentId) which can throw if the key
is missing; update processRecord to guard by retrieving const instrument =
instrumentsMap.get(record.instrumentId); if (!instrument) return []; (or handle
missing instrument appropriately) before using it, remove the dead commented
line ("//const instrument = instrumentsMap.get(record.instrumentId)!;"), and
ensure downstream logic references the guarded instrument variable instead of
assuming it exists.

In `@apps/api/src/instrument-records/instrument-records.service.ts`:
- Around line 140-142: instrumentIds.values() returns an iterator, so calling
.map() on it fails; convert the iterator to an array before mapping. Replace the
expression that builds instrumentsArray to first create an array from
instrumentIds (e.g., Array.from(instrumentIds) or
Array.from(instrumentIds.values()) or [...instrumentIds]) then map over that
array calling this.instrumentsService.findById(id) (returning
Promise<ScalarInstrument>) and wrap with Promise.all to produce
instrumentsArray. Ensure you reference instrumentIds, instrumentsArray, and
instrumentsService.findById in the updated code.
- Around line 146-148: The code computes numWorkers and chunkSize from
records.length and will produce numWorkers = 0 and chunkSize = Infinity when
records.length === 0; add an early return at the start of the processing
function (before computing numWorkers/chunkSize/chunks) that returns an empty
array or appropriate empty result when records.length === 0, referencing the
existing variables records, numWorkers, and chunkSize so you avoid division by
zero and unnecessary work.
- Around line 418-509: queryRecordsRaw is applying authorization after fetching
data (using appAbility.can), which can load unauthorized records; fix by pushing
the ability constraints into the DB aggregation instead of post-filtering:
derive the same access-filter used by accessibleQuery (or build equivalent
filter from AppAbility) and merge it into the aggregation pipeline's $match
(before projection), or switch to a Prisma query that uses accessibleQuery and
then perform any necessary $lookup/$project only on the already-filtered ids;
target symbols: queryRecordsRaw, accessibleQuery,
instrumentRecordModel.aggregateRaw, appAbility.can and ensure the $match added
to the pipeline encodes the user's allowed conditions so unauthorized documents
are never returned from aggregateRaw.
♻️ Duplicate comments (4)
apps/api/src/instrument-records/instrument-records.service.ts (4)

154-156: Iterator.prototype.toArray() unavailable on Node.js 20 LTS.

The project targets Node.js 20 LTS, but toArray() was only added in Node.js 22, so this call will crash at runtime.

Proposed fix
-    const availableInstrumentArray: InitData = instruments
-      .values()
-      .toArray()
-      .map((item) => {
+    const availableInstrumentArray: InitData = Array.from(instruments.values()).map((item) => {
         return {
           edition: item.internal.edition,
           id: item.id!,
           name: item.internal.name
         };
       });

170-182: Use once to prevent listener stacking and duplicate CHUNK_COMPLETE posts.

Using on instead of once means the outer handler remains registered and fires again on the chunk response, causing duplicate work and potential race conditions.

Proposed fix
-        worker.on('message', (message: InitialMessage) => {
-          if (message.success) {
-            worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' } satisfies ChunkCompleteMessage);
-            worker.on('message', (message: WorkerMessage) => {
-              if (message.success) {
-                resolve(message.data);
-              } else {
-                reject(new Error(message.error));
-              }
-              void worker.terminate();
-            });
-          }
+        worker.once('message', (message: InitialMessage) => {
+          if (!message.success) {
+            reject(new Error('Worker init failed'));
+            void worker.terminate();
+            return;
+          }
+          worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' } satisfies ChunkCompleteMessage);
+          worker.once('message', (message: WorkerMessage) => {
+            if (message.success) {
+              resolve(message.data);
+            } else {
+              reject(new Error(message.error));
+            }
+            void worker.terminate();
+          });
         });
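The difference between on and once is easy to demonstrate with a plain EventEmitter (standalone sketch, not the worker code itself):

```typescript
import { EventEmitter } from 'node:events';

// on() keeps the listener registered for every emit; once() detaches after the first.
const emitter = new EventEmitter();
let onCalls = 0;
let onceCalls = 0;

emitter.on('message', () => { onCalls += 1; });
emitter.once('message', () => { onceCalls += 1; });

emitter.emit('message'); // both listeners fire
emitter.emit('message'); // only the on() listener fires again

console.log(onCalls, onceCalls); // 2 1
```

This is exactly the stacking that causes the duplicate CHUNK_COMPLETE handling described above.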

440-448: $expr: {} is invalid when groupId is undefined.

An empty object in $expr can cause unexpected behavior or errors. Move the conditional to $match level.

Proposed fix
       {
-        $match: {
-          $expr: groupId
-            ? {
-                $eq: ['$groupId', { $toObjectId: groupId }]
-              }
-            : {}
-        }
+        $match: groupId
+          ? {
+              $expr: { $eq: ['$groupId', { $toObjectId: groupId }] }
+            }
+          : {}
       },
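An alternative to the conditional inside $match is to conditionally include the stage itself, so no empty expression is ever emitted (hypothetical standalone sketch of the pipeline-building logic):

```typescript
// Hypothetical helper: emits the $match stage only when a groupId is present,
// so an empty $expr never reaches the aggregation.
function buildStages(groupId?: string): object[] {
  return [
    ...(groupId
      ? [{ $match: { $expr: { $eq: ['$groupId', { $toObjectId: groupId }] } } }]
      : []),
    { $project: { computedMeasures: 1, instrumentId: 1 } }
  ];
}

console.log(buildStages().length);         // 1: $match omitted entirely
console.log(buildStages('abc123').length); // 2: $match included (placeholder id)
```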

452-457: Using $createdAt instead of $date for the timestamp.

The date field stores the actual measurement timestamp, while createdAt is the database insertion time. For retrospective data, these differ significantly.

Proposed fix
           date: {
             $dateToString: {
-              date: '$createdAt',
+              date: '$date',
               format: '%Y-%m-%d'
             }
           },
🧹 Nitpick comments (1)
apps/api/src/instrument-records/export-worker.ts (1)

21-38: Redundant Object.values() on array and unsafe typing.

listEntry is already an array, so Object.values(listEntry) is redundant and confusing—just iterate directly. Also, any[] loses type safety.

Proposed fix
-function expandData(listEntry: any[]): ExpandDataType[] {
+function expandData(listEntry: Array<{ [key: string]: FormTypes.RecordArrayFieldValue }>): ExpandDataType[] {
   const validRecordArrayList: ExpandDataType[] = [];
   if (listEntry.length < 1) {
     throw new Error('Record Array is Empty');
   }
-  for (const objectEntry of Object.values(listEntry)) {
+  for (const objectEntry of listEntry) {
     for (const [dataKey, dataValue] of Object.entries(
       objectEntry as { [key: string]: FormTypes.RecordArrayFieldValue }
     )) {

Comment on lines +40 to +47
let initData: Map<
string | undefined,
{
edition: number;
id: string;
name: string;
}
>;

⚠️ Potential issue | 🟡 Minor

Map key type should not include undefined.

string | undefined as the key type is suspicious—instrumentsMap.get(undefined) would be a bug. This should be string.

Proposed fix
 let initData: Map<
-  string | undefined,
+  string,
   {
     edition: number;
     id: string;
     name: string;
   }
 >;
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/export-worker.ts` around lines 40 - 47, The
Map declaration for initData uses a key type of `string | undefined`, which
allows accidental lookups like `instrumentsMap.get(undefined)`; change the key
type to `string` for `initData: Map<string, { edition: number; id: string; name:
string }>` and then audit usages of `initData`/`instrumentsMap` to ensure
callers never pass `undefined`—add guards or filter out undefined ids before
calling `set`/`get` where necessary (e.g., where ids are derived from
possibly-absent values) so all keys are guaranteed to be strings.

Comment on lines +61 to +66
const processRecord = (record: RecordType): InstrumentRecordsExport => {
const instrument = instrumentsMap.get(record.instrumentId)!;

if (!record.computedMeasures) return [];

//const instrument = instrumentsMap.get(record.instrumentId)!;

⚠️ Potential issue | 🟠 Major

Non-null assertion risks runtime crash; remove dead code.

If record.instrumentId isn't in the map, line 62 throws. Add a guard. Also, line 66 is commented-out code—remove it.

Proposed fix
   const processRecord = (record: RecordType): InstrumentRecordsExport => {
-    const instrument = instrumentsMap.get(record.instrumentId)!;
+    const instrument = instrumentsMap.get(record.instrumentId);
+    if (!instrument) {
+      throw new Error(`Unknown instrument ID: ${record.instrumentId}`);
+    }

     if (!record.computedMeasures) return [];

-    //const instrument = instrumentsMap.get(record.instrumentId)!;
     const rows: InstrumentRecordsExport = [];
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/export-worker.ts` around lines 61 - 66, The
function processRecord currently uses a non-null assertion on
instrumentsMap.get(record.instrumentId) which can throw if the key is missing;
update processRecord to guard by retrieving const instrument =
instrumentsMap.get(record.instrumentId); if (!instrument) return []; (or handle
missing instrument appropriately) before using it, remove the dead commented
line ("//const instrument = instrumentsMap.get(record.instrumentId)!;"), and
ensure downstream logic references the guarded instrument variable instead of
assuming it exists.

Comment on lines +126 to +135
parentPort!.on('message', (message: ParentMessage) => {
switch (message.type) {
case 'CHUNK_COMPLETE':
return handleChunkComplete(message.data);
case 'INIT':
return handleInit(message.data);
default:
throw new Error(`Unexpected message type: ${(message satisfies never as { [key: string]: any }).type}`);
}
});

⚠️ Potential issue | 🟡 Minor

parentPort! assertion fails if file is run outside worker context.

If this file is accidentally imported in the main thread, parentPort is null and this throws. Add a guard for robustness.

Proposed fix
+if (!parentPort) {
+  throw new Error('This module must be run as a worker thread');
+}
+
 parentPort!.on('message', (message: ParentMessage) => {
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/export-worker.ts` around lines 126 - 135, The
code assumes parentPort is always present and will throw when imported on the
main thread; add a null-check guard before attaching the message handler so the
file is safe to import outside a worker. Surround the parentPort!.on(...) logic
with a conditional that verifies parentPort !== null (or return early if null)
and only calls parentPort.on('message', ...) to dispatch to handleChunkComplete
and handleInit for ParentMessage types; keep the existing default error for
unexpected message types but ensure it only runs inside the worker context.
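The guard described above follows the standard worker_threads pattern; a minimal standalone sketch (with a placeholder message shape, not the project's ParentMessage):

```typescript
import { isMainThread, parentPort } from 'node:worker_threads';

// parentPort is null on the main thread, so guarding on it makes the module
// safe to import anywhere; the handler is only attached inside a worker.
if (parentPort) {
  parentPort.on('message', (message: { type: string }) => {
    // ...dispatch on message.type here
    void message;
  });
}

console.log(isMainThread); // true when this file is run directly
```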

Comment on lines +140 to +142
const instrumentsArray = await Promise.all(
instrumentIds.values().map((id) => this.instrumentsService.findById(id) as Promise<ScalarInstrument>)
);

⚠️ Potential issue | 🔴 Critical

Set.prototype.values().map() fails on Node.js 20, where iterator helpers such as .map() are not yet available (they arrived in Node.js 22).

instrumentIds.values() returns an iterator, not an array. Use Array.from() to convert first.

Proposed fix
-    const instrumentsArray = await Promise.all(
-      instrumentIds.values().map((id) => this.instrumentsService.findById(id) as Promise<ScalarInstrument>)
-    );
+    const instrumentsArray = await Promise.all(
+      Array.from(instrumentIds).map((id) => this.instrumentsService.findById(id) as Promise<ScalarInstrument>)
+    );
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/instrument-records.service.ts` around lines
140 - 142, instrumentIds.values() returns an iterator, so calling .map() on it
fails; convert the iterator to an array before mapping. Replace the expression
that builds instrumentsArray to first create an array from instrumentIds (e.g.,
Array.from(instrumentIds) or Array.from(instrumentIds.values()) or
[...instrumentIds]) then map over that array calling
this.instrumentsService.findById(id) (returning Promise<ScalarInstrument>) and
wrap with Promise.all to produce instrumentsArray. Ensure you reference
instrumentIds, instrumentsArray, and instrumentsService.findById in the updated
code.

Comment on lines +146 to +148
const numWorkers = Math.min(cpus().length, Math.ceil(records.length / 100)); // Use up to CPU count, chunk size 100
const chunkSize = Math.ceil(records.length / numWorkers);
const chunks = [];

⚠️ Potential issue | 🟡 Minor

Edge case: zero records results in division by zero chunks.

When records.length is 0, numWorkers becomes 0, and chunkSize becomes Infinity. The loop won't execute, but returning early is cleaner.

Proposed fix
+    if (records.length === 0) {
+      return [];
+    }
+
     const numWorkers = Math.min(cpus().length, Math.ceil(records.length / 100));
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/instrument-records.service.ts` around lines
146 - 148, The code computes numWorkers and chunkSize from records.length and
will produce numWorkers = 0 and chunkSize = Infinity when records.length === 0;
add an early return at the start of the processing function (before computing
numWorkers/chunkSize/chunks) that returns an empty array or appropriate empty
result when records.length === 0, referencing the existing variables records,
numWorkers, and chunkSize so you avoid division by zero and unnecessary work.
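A standalone sketch of the guarded chunking logic (hypothetical helper mirroring the code under review, with the early return applied):

```typescript
import { cpus } from 'node:os';

// Hypothetical helper: returns [] for empty input, otherwise splits records
// into at most cpus().length chunks of roughly 100 or more records each.
function chunkRecords<T>(records: T[]): T[][] {
  if (records.length === 0) return []; // guard: numWorkers would be 0 below
  const numWorkers = Math.min(cpus().length, Math.ceil(records.length / 100));
  const chunkSize = Math.ceil(records.length / numWorkers);
  const chunks: T[][] = [];
  for (let i = 0; i < records.length; i += chunkSize) {
    chunks.push(records.slice(i, i + chunkSize));
  }
  return chunks;
}

console.log(chunkRecords([]).length); // 0: no workers spawned for an empty export
console.log(chunkRecords(Array.from({ length: 250 }, (_, i) => i)).flat().length); // 250
```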

Comment on lines +418 to +509
private async queryRecordsRaw(appAbility: AppAbility, groupId?: string) {
const pipeline = [
{
// Join with Session collection
$lookup: {
as: 'session',
foreignField: '_id',
from: 'SessionModel',
localField: 'sessionId' // Ensure this matches your @map or field name in Prisma
}
},
{ $unwind: { path: '$session', preserveNullAndEmptyArrays: true } },
{
// Join with Subject collection
$lookup: {
as: 'subject',
foreignField: '_id',
from: 'SubjectModel',
localField: 'subjectId'
}
},
{ $unwind: { path: '$subject', preserveNullAndEmptyArrays: true } },
{
$match: {
$expr: groupId
? {
$eq: ['$groupId', { $toObjectId: groupId }]
}
: {}
}
},
{
$project: {
computedMeasures: 1,
date: {
$dateToString: {
date: '$createdAt',
format: '%Y-%m-%d'
}
},
groupId: {
$toString: '$groupId'
},
id: {
$toString: '$_id'
},
instrumentId: 1,
session: {
date: {
$dateToString: {
date: '$session.date',
format: '%Y-%m-%d'
}
},
id: {
$toString: '$session._id'
},
type: '$session.type',
user: { username: '$session.user.username' } // TBD test this works
},
// sessionId: 1,
subject: {
age: {
$dateDiff: {
endDate: '$$NOW',
startDate: '$subject.dateOfBirth',
unit: 'year'
}
},
dateOfBirth: '$subject.dateOfBirth',
groupIds: '$subject.groupIds', // TBD make sure groupIds is string array
id: {
$toString: '$subject._id'
},
sex: '$subject.sex'
}
}
}
];

const records = (await this.instrumentRecordModel.aggregateRaw({ pipeline })) as unknown as unknown[];

/**
* We need to create a shallow copy of all records here, as the aggregateRaw method returns objects
* with the Symbol(nodejs.util.inspect.custom) property defined, which is not serializable.
*/

// eslint-disable-next-line @typescript-eslint/no-unsafe-argument
return (records.filter((record) => appAbility.can('read', record as any)) as RecordType[]).map((record) => ({
...record
}));
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "accessibleQuery" apps/api/src/instrument-records/instrument-records.service.ts -B2 -A2


Length of output: 1550


🏁 Script executed:

# Also check for other methods using appAbility in this file
rg -n "appAbility" apps/api/src/instrument-records/instrument-records.service.ts -B1 -A3


Length of output: 486


🏁 Script executed:

# Check the overall structure of the service
head -100 apps/api/src/instrument-records/instrument-records.service.ts


Length of output: 3899


Post-query filtering bypasses database-level access control.

Unlike other methods in this service (e.g., count(), findById(), findMany()), which use accessibleQuery() to enforce access control at the Prisma query level, queryRecordsRaw() fetches records matching only the groupId filter and applies ability checks afterward via appAbility.can(). This fetches potentially unauthorized records into memory before filtering. Consider pushing ability checks into the aggregation pipeline or documenting if this approach is intentional.

🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/instrument-records.service.ts` around lines
418 - 509, queryRecordsRaw is applying authorization after fetching data (using
appAbility.can), which can load unauthorized records; fix by pushing the ability
constraints into the DB aggregation instead of post-filtering: derive the same
access-filter used by accessibleQuery (or build equivalent filter from
AppAbility) and merge it into the aggregation pipeline's $match (before
projection), or switch to a Prisma query that uses accessibleQuery and then
perform any necessary $lookup/$project only on the already-filtered ids; target
symbols: queryRecordsRaw, accessibleQuery, instrumentRecordModel.aggregateRaw,
appAbility.can and ensure the $match added to the pipeline encodes the user's
allowed conditions so unauthorized documents are never returned from
aggregateRaw.
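One way to push the check into the pipeline (a minimal sketch, assuming the ability's MongoDB conditions can be extracted up front, e.g. via CASL's accessibleBy; all names here are hypothetical):

```typescript
type Stage = { [key: string]: unknown };

// Hypothetical helper: prepends a pre-computed access filter as the first
// $match so aggregateRaw never returns unauthorized documents.
function withAccessFilter(pipeline: Stage[], accessFilter: Stage): Stage[] {
  return [{ $match: accessFilter }, ...pipeline];
}

const base: Stage[] = [{ $project: { computedMeasures: 1 } }];
const secured = withAccessFilter(base, { groupId: { $in: ['<allowed-group-id>'] } });
console.log(Object.keys(secured[0])); // [ '$match' ]
console.log(secured.length);          // 2
```

Filtering in the database also avoids loading unauthorized documents into memory at all, which matters for large exports.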

@joshunrau joshunrau merged commit da7d638 into main Jan 21, 2026
4 of 6 checks passed
