threading for instrument records export #1261

Merged
joshunrau merged 42 commits into main from fix-threading-clone-issue on Jan 21, 2026

Conversation

@joshunrau
Collaborator

@joshunrau joshunrau commented Jan 21, 2026

Summary by CodeRabbit

  • Improvements

    • Parallelized export processing for faster exports and clearer error propagation.
    • Export rows now include richer, consistent fields (session details, timestamps, masked subject IDs, usernames).
    • Stronger validation of subject identification data to reduce malformed exports.
  • Refactor

    • Export processing moved to a worker-based pipeline for chunked, concurrent handling.
  • Tests

    • Added unit tests for export behavior, error cases, and output shape.


@coderabbitai
Contributor

coderabbitai Bot commented Jan 21, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

Adds a worker-based, multi-threaded export pipeline for instrument records, refactors exportRecords to use workers and a raw-query helper, introduces inter-worker message types, adds unit tests for InstrumentRecordsService, and tightens clinical subject input validation.

Changes

Cohort / File(s) Summary
Export worker & types
apps/api/src/instrument-records/export-worker.ts, apps/api/src/instrument-records/thread-types.ts
New worker implementation and TypeScript message/type definitions for INIT / BEGIN_CHUNK_PROCESSING messaging, record-to-export-row transformation, array-measure expansion, and error propagation.
Service refactor & tests
apps/api/src/instrument-records/instrument-records.service.ts, apps/api/src/instrument-records/__tests__/instrument-records.service.spec.ts
exportRecords refactored to spawn Worker threads, prefetch instruments, chunk records, require explicit ability param, add queryRecordsRaw helper, remove in-service expandData, and add unit tests covering findById and export behavior.
Subject validation util
packages/subject-utils/src/index.ts
Adds internal runtime validator parseClinicalSubjectIdentificationData and uses it in subject hash generation to enforce dateOfBirth/firstName/lastName/sex types.
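
The chunked, worker-based handling summarized above can be sketched as a small helper (a hypothetical outline; the actual chunk size and aggregation logic live in exportRecords and are not shown in this PR summary):

```typescript
// Hypothetical sketch of the chunking step: split fetched records into
// fixed-size slices, one slice per worker thread. The chunk size here is
// an assumption, not the value used in the PR.
function chunkArray<T>(items: T[], chunkSize: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Each chunk would then be handed to its own Worker, and the per-chunk
// promises aggregated, e.g. (await Promise.all(chunkPromises)).flat()
const chunks = chunkArray([1, 2, 3, 4, 5], 2);
```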

Sequence Diagram

sequenceDiagram
    participant Service as InstrumentRecordsService
    participant Worker as Worker Thread
    participant DB as Database
    participant Instruments as Instruments Service

    Service->>DB: queryRecordsRaw(appAbility, groupId?)
    DB-->>Service: records[]
    Service->>Instruments: fetch instruments by ids (bulk)
    Instruments-->>Service: instrumentsMap
    Service->>Worker: INIT (availableInstrumentArray)
    Worker-->>Service: InitialMessage { success: true }
    Service->>Worker: BEGIN_CHUNK_PROCESSING (records chunk)
    Worker->>Worker: expand array measures, map to export rows
    alt expansion error
        Worker-->>Service: WorkerMessage { success: false, error }
    else success
        Worker-->>Service: WorkerMessage { success: true, data: exportRows[] }
    end
    Service->>Service: aggregate chunk results
    Service-->>Client: final exportRows[]

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #1244: Overlaps export pipeline and exported row shape changes (username and export field updates).
  • PR #1146: Related work on array-measure expansion; that PR added in-service expansion, which this PR moves to workers.
  • PR #1253: Touches findById service/controller behavior that the new unit tests cover.
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately describes the main change, implementing multithreading for the instrument records export functionality.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In
`@apps/api/src/instrument-records/__tests__/instrument-records.service.spec.ts`:
- Around line 81-112: The mockRecords in instrument-records.service.spec.ts uses
Date objects for fields like createdAt, date, session.date, subject.dateOfBirth,
and updatedAt while queryRecordsRaw returns serialized date strings; change
those Date instances to ISO date strings (e.g., "2023-01-01T00:00:00.000Z") for
createdAt, date, session.date, subject.dateOfBirth, updatedAt (and any other
date fields) so the mock matches the export shape expected by
expect.any(String); update the mockRecords declaration (and its type assertion
if needed) to accept strings (cast to the same (InstrumentRecord & {...})[] or
to unknown then to that type) so TypeScript still compiles.

In `@apps/api/src/instrument-records/instrument-records.service.ts`:
- Around line 183-193: The INIT message listener (worker.on('message', (message:
InitialMessage) => { ... })) is also matching chunk responses because both
include success, causing duplicate CHUNK_COMPLETE posts; change the INIT handler
to either use worker.once('message', ...) or explicitly gate on message.type ===
'INIT' (or similar discriminant) before posting CHUNK_COMPLETE, and keep the
chunk response handling separate (the inner worker.on('message', (message:
WorkerMessage) => { ... })) so resolve/reject and worker.terminate() only run
for chunk responses.
- Around line 454-460: The $expr currently gets an empty object when groupId is
undefined which is invalid; update the aggregation so the conditional is applied
at the $match level instead of inside $expr: in the method that builds the
pipeline (referencing groupId and the $match stage in
instrument-records.service.ts), make $match either include a filter that
compares groupId to {$toObjectId: groupId} (when groupId is present) or be
omitted/empty (when groupId is absent) rather than passing {} into $expr; ensure
you remove the conditional from $expr and construct $match accordingly so the
pipeline receives a valid expression only when groupId is provided.
- Around line 167-169: The current use of Iterator.prototype.toArray() on
instruments.values() (creating availableInstrumentArray of type InitData) will
crash on Node.js 20 LTS; replace the toArray() call with Array.from(...) to
materialize the iterator (e.g., build availableInstrumentArray =
Array.from(instruments.values()) while preserving the InitData typing) so it
works on Node.js 20.

Comment on lines +167 to +169
const availableInstrumentArray: InitData = instruments
.values()
.toArray()


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Scripts executed:

# Check which Node.js version the project targets
cat .nvmrc
grep -A 5 '"engines"' apps/api/package.json

# Inspect the toArray() usage in context
sed -n '160,180p' apps/api/src/instrument-records/instrument-records.service.ts
rg -n "availableInstrumentArray" apps/api/src/instrument-records/instrument-records.service.ts -A 3

🌐 Web query: Iterator.prototype.toArray() Node.js support and LTS availability

💡 Result: Iterator.prototype.toArray() landed in Node.js 22.0.0 (released Apr 24, 2024, via V8 12.4's iterator helpers) and is therefore only available on LTS from Node.js 22 onward (LTS since October 2024). It is not available on Node.js 20 LTS. Sources: MDN compatibility data and the Node.js v22.0.0 release notes.

Avoid Iterator.prototype.toArray() to prevent runtime crashes on Node.js 20 LTS.

The project targets Node.js 20 LTS (lts/jod), which does not support Iterator helpers. The toArray() method was added in Node.js 22.0.0 (April 2024). Use Array.from() instead.

Lines 147-149:

♻️ Proposed fix
-    const availableInstrumentArray: InitData = instruments
-      .values()
-      .toArray()
-      .map((item) => {
+    const availableInstrumentArray: InitData = Array.from(instruments.values()).map((item) => {
       return {
         edition: item.internal.edition,
         id: item.id!,
         name: item.internal.name
       };
     });
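
For illustration, the same portable pattern in isolation (the Map contents below are made up, not taken from the project):

```typescript
// Illustrative only: materialize a Map's values without iterator helpers.
// Array.from() accepts any iterable and works on Node.js 20.
const instruments = new Map([
  ['abc123', { id: 'abc123', internal: { edition: 1, name: 'example-instrument' } }]
]);

const availableInstrumentArray = Array.from(instruments.values()).map((item) => ({
  edition: item.internal.edition,
  id: item.id,
  name: item.internal.name
}));
```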

Comment on lines +183 to +193
worker.on('message', (message: InitialMessage) => {
  if (message.success) {
    worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
    worker.on('message', (message: WorkerMessage) => {
      if (message.success) {
        resolve(message.data);
      } else {
        reject(new Error(message.error));
      }
      void worker.terminate();
    });


⚠️ Potential issue | 🟠 Major

INIT listener also handles chunk responses, causing extra CHUNK_COMPLETE posts.

Both responses include success, so the INIT handler will fire on the chunk response and resend work. Use once or differentiate message shapes.

🛠️ Proposed fix
-        worker.on('message', (message: InitialMessage) => {
-          if (message.success) {
-            worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
-            worker.on('message', (message: WorkerMessage) => {
-              if (message.success) {
-                resolve(message.data);
-              } else {
-                reject(new Error(message.error));
-              }
-              void worker.terminate();
-            });
-          }
-        });
+        worker.once('message', (message: InitialMessage) => {
+          if (!message.success) {
+            reject(new Error('Worker init failed'));
+            void worker.terminate();
+            return;
+          }
+          worker.once('message', (message: WorkerMessage) => {
+            if (message.success) resolve(message.data);
+            else reject(new Error(message.error));
+            void worker.terminate();
+          });
+          worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
+        });
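
The listener-stacking behaviour is easy to reproduce with a plain EventEmitter standing in for the Worker (illustrative only; not the PR's code):

```typescript
import { EventEmitter } from 'node:events';

// A plain EventEmitter standing in for the Worker, to show why on() is the
// bug here: the first handler stays registered and fires for later messages.
const worker = new EventEmitter();
let onCalls = 0;
let onceCalls = 0;

worker.on('message', () => { onCalls += 1; });     // stays registered forever
worker.once('message', () => { onceCalls += 1; }); // auto-removed after one call

worker.emit('message'); // simulated INIT response
worker.emit('message'); // simulated chunk response

// onCalls is now 2 and onceCalls is 1: the on() handler also saw the chunk
// response, which is exactly how the duplicate CHUNK_COMPLETE posts arise.
```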

Comment on lines +454 to +460
$match: {
  $expr: groupId
    ? {
        $eq: ['$groupId', { $toObjectId: groupId }]
      }
    : {}
}


⚠️ Potential issue | 🔴 Critical

$expr receives {} when groupId is undefined.

This is likely an invalid expression and can break exports without a group filter. Move the conditional to $match instead.

🛠️ Proposed fix
-      {
-        $match: {
-          $expr: groupId
-            ? {
-                $eq: ['$groupId', { $toObjectId: groupId }]
-              }
-            : {}
-        }
-      },
+      {
+        $match: groupId
+          ? {
+              $expr: { $eq: ['$groupId', { $toObjectId: groupId }] }
+            }
+          : {}
+      },
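
The conditional stage can also be factored into a small helper so its shape is easy to unit-test (hypothetical helper name, mirroring the proposed fix):

```typescript
// Hypothetical helper mirroring the suggested fix: the conditional lives at
// the $match level, so an absent groupId yields an unfiltered {} match
// instead of an invalid empty $expr.
function buildGroupMatchStage(groupId?: string): { $match: Record<string, unknown> } {
  return {
    $match: groupId
      ? { $expr: { $eq: ['$groupId', { $toObjectId: groupId }] } }
      : {}
  };
}
```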


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In
`@apps/api/src/instrument-records/__tests__/instrument-records.service.spec.ts`:
- Around line 81-107: The mockRecords entry uses Date objects for record.date
and session.date but RecordType and the export worker expect ISO strings; update
mockRecords (the object in the test) so that the properties "date" and
"session.date" are ISO strings (e.g., via toISOString or literal ISO strings) to
match the RecordType contract and ensure the export worker produces a string
"timestamp" as asserted in the test.

In `@apps/api/src/instrument-records/instrument-records.service.ts`:
- Around line 160-174: The Worker message handlers currently use
worker.on('message') causing duplicate listeners; change both the outer handler
that listens for the INIT response (the callback receiving InitialMessage) and
the inner handler that listens for CHUNK_COMPLETE (the callback receiving
WorkerMessage) to use once('message') so each handler runs only once, and ensure
the outer INIT handler checks message.success === false and immediately rejects
with message.error and terminates the Worker (calling worker.terminate()) to
avoid cascading failures; keep the existing worker.postMessage({ data:
availableInstrumentArray, type: 'INIT' }) and worker.postMessage({ data: chunk,
type: 'CHUNK_COMPLETE' }) calls but only send CHUNK_COMPLETE from the
INIT-success path once.
- Around line 445-449: The export projection is using the database creation
timestamp field '$createdAt' instead of the measurement timestamp field '$date',
so change the $dateToString entry in the export pipeline/projection to use
'date: { $dateToString: { date: '$date', format: '%Y-%m-%d' } }' (i.e., replace
'$createdAt' with '$date') so exported rows reflect the measurement timestamp;
keep the output field name 'date' unchanged.

In `@packages/subject-utils/src/index.ts`:
- Around line 8-31: The runtime checks in parseClinicalSubjectIdentificationData
are too strict and out of sync with the Zod schema; update the function to
delegate validation to the schema (use the schema's parse/safeParse so
z.coerce.date() can handle strings/numbers, the .min(1) constraints on
firstName/lastName are enforced, and invalid Date instances are rejected by the
schema), remove the manual typeof/instanceof checks, and return the parsed value
from the schema (or throw the schema error) so callers get consistent,
canonicalized data.

Comment on lines +160 to +174
const worker = new Worker(join(import.meta.dirname, 'export-worker.ts'));
worker.postMessage({ data: availableInstrumentArray, type: 'INIT' });

worker.on('message', (message: InitialMessage) => {
  if (message.success) {
    worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
    worker.on('message', (message: WorkerMessage) => {
      if (message.success) {
        resolve(message.data);
      } else {
        reject(new Error(message.error));
      }
      void worker.terminate();
    });
  }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

# Inspect the worker setup and message handlers in context
sed -n '150,190p' apps/api/src/instrument-records/instrument-records.service.ts

# Check the inter-worker message type definitions
cat apps/api/src/instrument-records/thread-types.ts


Use once for Worker message handlers to prevent duplicate listener stacking.

The outer on('message') handler remains registered after the INIT response arrives. When the CHUNK_COMPLETE response is received later, both the original outer handler and the newly registered inner handler fire. The outer handler then incorrectly re-sends CHUNK_COMPLETE and registers another listener, causing duplicate work.

Replace on with once for both message handlers. Additionally, add error handling for failed INIT messages to avoid cascading failures.

Proposed fix
-        worker.on('message', (message: InitialMessage) => {
-          if (message.success) {
-            worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
-            worker.on('message', (message: WorkerMessage) => {
-              if (message.success) {
-                resolve(message.data);
-              } else {
-                reject(new Error(message.error));
-              }
-              void worker.terminate();
-            });
-          }
-        });
+        worker.once('message', (message: InitialMessage) => {
+          if (!message.success) {
+            reject(new Error('Worker init failed'));
+            void worker.terminate();
+            return;
+          }
+          worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' });
+          worker.once('message', (message: WorkerMessage) => {
+            if (message.success) {
+              resolve(message.data);
+            } else {
+              reject(new Error(message.error));
+            }
+            void worker.terminate();
+          });
+        });

Comment on lines +445 to +449
date: {
  $dateToString: {
    date: '$createdAt',
    format: '%Y-%m-%d'
  }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

# Inspect the export projection in context
sed -n '430,465p' apps/api/src/instrument-records/instrument-records.service.ts

# Check the record schemas to confirm the distinction between date and createdAt
cat packages/schemas/src/instrument-records/instrument-records.ts
cat packages/schemas/src/core/core.ts

# Trace where the raw query feeds the export
rg "queryRecordsRaw" apps/api/src/instrument-records/instrument-records.service.ts -A 5 -B 5


Use the date field instead of createdAt in the export projection.

The date field stores the actual measurement timestamp provided when creating records, while createdAt is the database creation timestamp. Using createdAt causes exports to reflect when records were entered into the system rather than when measurements were taken—problematic for retrospective data and downstream analyses.

Proposed fix
-          date: {
-            $dateToString: {
-              date: '$createdAt',
-              format: '%Y-%m-%d'
-            }
-          },
+          date: {
+            $dateToString: {
+              date: '$date',
+              format: '%Y-%m-%d'
+            }
+          },

Comment on lines +8 to +31
function parseClinicalSubjectIdentificationData({
  dateOfBirth,
  firstName,
  lastName,
  sex
}: {
  [key: string]: unknown;
}): ClinicalSubjectIdentificationData {
  if (!(dateOfBirth instanceof Date)) {
    throw new Error('Invalid or missing dateOfBirth: Expected a Date object.');
  } else if (typeof firstName !== 'string') {
    throw new Error('Invalid or missing firstName: Expected a string.');
  } else if (typeof lastName !== 'string') {
    throw new Error('Invalid or missing lastName: Expected a string.');
  } else if (!(sex === 'MALE' || sex === 'FEMALE')) {
    throw new Error('Invalid or missing sex: Expected "MALE" or "FEMALE".');
  }
  return {
    dateOfBirth,
    firstName,
    lastName,
    sex
  };
}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's understand the repository structure and locate relevant files
find packages/subject-utils -type f -name "*.ts" -o -name "*.js" | head -20

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 222


🏁 Script executed:

# Search for ClinicalSubjectIdentificationData type definition
rg "ClinicalSubjectIdentificationData" packages/subject-utils/ -t typescript

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 117


🏁 Script executed:

# Look for schema definitions, likely in the same directory or a related location
rg "schema\|zod\|yup" packages/subject-utils/ -t typescript -i

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 117


🏁 Script executed:

# Check the full context of the index.ts file
cat packages/subject-utils/src/index.ts

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 2437


🏁 Script executed:

# Search for generateSubjectHash function
rg "generateSubjectHash" packages/subject-utils/ -t typescript -A 10

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 117


🏁 Script executed:

# The type is imported from '@opendatacapture/schemas/subject'
# Let's find this schema file
find . -type f -path "*schemas*subject*" | head -20

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 124


🏁 Script executed:

# Try broader search for the schemas package
find . -type f -name "*.ts" -path "*schemas*" | head -30

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 1526


🏁 Script executed:

# Look for ClinicalSubjectIdentificationData definition
rg "ClinicalSubjectIdentificationData" . -A 5 -B 2 2>/dev/null | head -50

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 3389


🏁 Script executed:

# Check the structure around the repository root
ls -la

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 2620


🏁 Script executed:

# Read the subject schema file to understand ClinicalSubjectIdentificationData definition
cat packages/schemas/src/subject/subject.ts


Length of output: 1226


🏁 Script executed:

# Also check the test file to understand how the function is being used
cat packages/subject-utils/src/test.ts


Length of output: 1207


🏁 Script executed:

# Find all usages of parseClinicalSubjectIdentificationData and generateSubjectHash
rg "parseClinicalSubjectIdentificationData|generateSubjectHash" . --no-heading -B 2 -A 2 2>/dev/null | head -100


Length of output: 6320


🏁 Script executed:

# Check if generateSubjectHash is called with schema-validated data or raw data
rg "generateSubjectHash\(" . --no-heading -B 5 -A 2 2>/dev/null | head -80

Repository: DouglasNeuroInformatics/OpenDataCapture

Length of output: 4458


🏁 Script executed:

# Check the IdentificationForm to see if it validates with schema before calling generateSubjectHash
cat apps/web/src/components/IdentificationForm/IdentificationForm.tsx | head -150


Length of output: 5509


🏁 Script executed:

# Check the StartSessionForm similarly
cat apps/web/src/components/StartSessionForm/StartSessionForm.tsx | head -150


Length of output: 5508


🏁 Script executed:

# Check demo.service.ts to see the data flow there
cat apps/api/src/demo/demo.service.ts | grep -A 20 -B 5 "generateSubjectHash"


Length of output: 2139


Align runtime validation with the schema definition.

The schema declares dateOfBirth: z.coerce.date() (accepting strings, numbers, or Date instances), but the implementation requires instanceof Date, rejecting inputs the schema would coerce. The typeof checks on firstName and lastName also skip the schema's .min(1) constraint, so empty strings pass validation. Finally, an invalid Date instance (NaN timestamp) makes toISOString() throw at runtime.

Delegate validation to the schema:

Proposed fix
-function parseClinicalSubjectIdentificationData({
-  dateOfBirth,
-  firstName,
-  lastName,
-  sex
-}: {
-  [key: string]: unknown;
-}): ClinicalSubjectIdentificationData {
-  if (!(dateOfBirth instanceof Date)) {
-    throw new Error('Invalid or missing dateOfBirth: Expected a Date object.');
-  } else if (typeof firstName !== 'string') {
-    throw new Error('Invalid or missing firstName: Expected a string.');
-  } else if (typeof lastName !== 'string') {
-    throw new Error('Invalid or missing lastName: Expected a string.');
-  } else if (!(sex === 'MALE' || sex === 'FEMALE')) {
-    throw new Error('Invalid or missing sex: Expected "MALE" or "FEMALE".');
-  }
-  return {
-    dateOfBirth,
-    firstName,
-    lastName,
-    sex
-  };
-}
+function parseClinicalSubjectIdentificationData(data: unknown): ClinicalSubjectIdentificationData {
+  const result = $ClinicalSubjectIdentificationData.safeParse(data);
+  if (!result.success) {
+    throw new Error('Invalid clinical subject identification data.');
+  }
+  return result.data;
+}
🤖 Prompt for AI Agents
In `@packages/subject-utils/src/index.ts` around lines 8 - 31, The runtime checks
in parseClinicalSubjectIdentificationData are too strict and out of sync with
the Zod schema; update the function to delegate validation to the schema (use
the schema's parse/safeParse so z.coerce.date() can handle strings/numbers, the
.min(1) constraints on firstName/lastName are enforced, and invalid Date
instances are rejected by the schema), remove the manual typeof/instanceof
checks, and return the parsed value from the schema (or throw the schema error)
so callers get consistent, canonicalized data.
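As a dependency-free illustration, the gap between coercion and a strict instanceof check looks like the following (hypothetical helper mimicking z.coerce.date(), not the project's actual schema):

```typescript
// Hypothetical sketch: accepts the same inputs z.coerce.date() would,
// unlike the strict instanceof check in the current implementation.
function coerceDate(value: unknown): Date | null {
  if (value instanceof Date) {
    return Number.isNaN(value.getTime()) ? null : value;
  }
  if (typeof value === 'string' || typeof value === 'number') {
    const parsed = new Date(value);
    return Number.isNaN(parsed.getTime()) ? null : parsed;
  }
  return null;
}

const raw: unknown = '2000-01-15';
console.log(raw instanceof Date);             // false: the strict check rejects it
console.log(coerceDate(raw) instanceof Date); // true: coercion accepts it
console.log(coerceDate('not a date'));        // null: invalid dates are filtered out
```

Delegating to the schema gives this behavior for free, along with the .min(1) string constraints.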

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@apps/api/src/instrument-records/export-worker.ts`:
- Around line 40-47: The Map declaration for initData uses a key type of `string
| undefined`, which allows accidental lookups like
`instrumentsMap.get(undefined)`; change the key type to `string` for `initData:
Map<string, { edition: number; id: string; name: string }>` and then audit
usages of `initData`/`instrumentsMap` to ensure callers never pass
`undefined`—add guards or filter out undefined ids before calling `set`/`get`
where necessary (e.g., where ids are derived from possibly-absent values) so all
keys are guaranteed to be strings.
- Around line 126-135: The code assumes parentPort is always present and will
throw when imported on the main thread; add a null-check guard before attaching
the message handler so the file is safe to import outside a worker. Surround the
parentPort!.on(...) logic with a conditional that verifies parentPort !== null
(or return early if null) and only calls parentPort.on('message', ...) to
dispatch to handleChunkComplete and handleInit for ParentMessage types; keep the
existing default error for unexpected message types but ensure it only runs
inside the worker context.
- Around line 61-66: The function processRecord currently uses a non-null
assertion on instrumentsMap.get(record.instrumentId) which can throw if the key
is missing; update processRecord to guard by retrieving const instrument =
instrumentsMap.get(record.instrumentId); if (!instrument) return []; (or handle
missing instrument appropriately) before using it, remove the dead commented
line ("//const instrument = instrumentsMap.get(record.instrumentId)!;"), and
ensure downstream logic references the guarded instrument variable instead of
assuming it exists.

In `@apps/api/src/instrument-records/instrument-records.service.ts`:
- Around line 140-142: instrumentIds.values() returns an iterator, so calling
.map() on it fails; convert the iterator to an array before mapping. Replace the
expression that builds instrumentsArray to first create an array from
instrumentIds (e.g., Array.from(instrumentIds) or
Array.from(instrumentIds.values()) or [...instrumentIds]) then map over that
array calling this.instrumentsService.findById(id) (returning
Promise<ScalarInstrument>) and wrap with Promise.all to produce
instrumentsArray. Ensure you reference instrumentIds, instrumentsArray, and
instrumentsService.findById in the updated code.
- Around line 146-148: The code computes numWorkers and chunkSize from
records.length and will produce numWorkers = 0 and chunkSize = Infinity when
records.length === 0; add an early return at the start of the processing
function (before computing numWorkers/chunkSize/chunks) that returns an empty
array or appropriate empty result when records.length === 0, referencing the
existing variables records, numWorkers, and chunkSize so you avoid division by
zero and unnecessary work.
- Around line 418-509: queryRecordsRaw is applying authorization after fetching
data (using appAbility.can), which can load unauthorized records; fix by pushing
the ability constraints into the DB aggregation instead of post-filtering:
derive the same access-filter used by accessibleQuery (or build equivalent
filter from AppAbility) and merge it into the aggregation pipeline's $match
(before projection), or switch to a Prisma query that uses accessibleQuery and
then perform any necessary $lookup/$project only on the already-filtered ids;
target symbols: queryRecordsRaw, accessibleQuery,
instrumentRecordModel.aggregateRaw, appAbility.can and ensure the $match added
to the pipeline encodes the user's allowed conditions so unauthorized documents
are never returned from aggregateRaw.
♻️ Duplicate comments (4)
apps/api/src/instrument-records/instrument-records.service.ts (4)

154-156: Iterator.prototype.toArray() unavailable on Node.js 20 LTS.

The project targets Node.js 20 LTS, but toArray() was only added in Node.js 22, so this call will crash at runtime.

Proposed fix
-    const availableInstrumentArray: InitData = instruments
-      .values()
-      .toArray()
-      .map((item) => {
+    const availableInstrumentArray: InitData = Array.from(instruments.values()).map((item) => {
         return {
           edition: item.internal.edition,
           id: item.id!,
           name: item.internal.name
         };
       });

170-182: Use once to prevent listener stacking and duplicate CHUNK_COMPLETE posts.

Using on instead of once means the outer handler remains registered and fires again on the chunk response, causing duplicate work and potential race conditions.

Proposed fix
-        worker.on('message', (message: InitialMessage) => {
-          if (message.success) {
-            worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' } satisfies ChunkCompleteMessage);
-            worker.on('message', (message: WorkerMessage) => {
-              if (message.success) {
-                resolve(message.data);
-              } else {
-                reject(new Error(message.error));
-              }
-              void worker.terminate();
-            });
-          }
+        worker.once('message', (message: InitialMessage) => {
+          if (!message.success) {
+            reject(new Error('Worker init failed'));
+            void worker.terminate();
+            return;
+          }
+          worker.postMessage({ data: chunk, type: 'CHUNK_COMPLETE' } satisfies ChunkCompleteMessage);
+          worker.once('message', (message: WorkerMessage) => {
+            if (message.success) {
+              resolve(message.data);
+            } else {
+              reject(new Error(message.error));
+            }
+            void worker.terminate();
+          });
         });
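The difference between on and once is easy to demonstrate with a plain EventEmitter (standalone sketch, not the worker code itself):

```typescript
import { EventEmitter } from 'node:events';

// on() keeps the listener registered for every emit; once() detaches after the first.
const emitter = new EventEmitter();
let onCalls = 0;
let onceCalls = 0;

emitter.on('message', () => { onCalls += 1; });
emitter.once('message', () => { onceCalls += 1; });

emitter.emit('message'); // both listeners fire
emitter.emit('message'); // only the on() listener fires again

console.log(onCalls, onceCalls); // 2 1
```

This is exactly the stacking that causes the duplicate CHUNK_COMPLETE handling described above.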

440-448: $expr: {} is invalid when groupId is undefined.

An empty object in $expr can cause unexpected behavior or errors. Move the conditional to $match level.

Proposed fix
       {
-        $match: {
-          $expr: groupId
-            ? {
-                $eq: ['$groupId', { $toObjectId: groupId }]
-              }
-            : {}
-        }
+        $match: groupId
+          ? {
+              $expr: { $eq: ['$groupId', { $toObjectId: groupId }] }
+            }
+          : {}
       },
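An alternative to the conditional inside $match is to conditionally include the stage itself, so no empty expression is ever emitted (hypothetical standalone sketch of the pipeline-building logic):

```typescript
// Hypothetical helper: emits the $match stage only when a groupId is present,
// so an empty $expr never reaches the aggregation.
function buildStages(groupId?: string): object[] {
  return [
    ...(groupId
      ? [{ $match: { $expr: { $eq: ['$groupId', { $toObjectId: groupId }] } } }]
      : []),
    { $project: { computedMeasures: 1, instrumentId: 1 } }
  ];
}

console.log(buildStages().length);         // 1: $match omitted entirely
console.log(buildStages('abc123').length); // 2: $match included (placeholder id)
```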

452-457: Using $createdAt instead of $date for the timestamp.

The date field stores the actual measurement timestamp, while createdAt is the database insertion time. For retrospective data, these differ significantly.

Proposed fix
           date: {
             $dateToString: {
-              date: '$createdAt',
+              date: '$date',
               format: '%Y-%m-%d'
             }
           },
🧹 Nitpick comments (1)
apps/api/src/instrument-records/export-worker.ts (1)

21-38: Redundant Object.values() on array and unsafe typing.

listEntry is already an array, so Object.values(listEntry) is redundant and confusing—just iterate directly. Also, any[] loses type safety.

Proposed fix
-function expandData(listEntry: any[]): ExpandDataType[] {
+function expandData(listEntry: Array<{ [key: string]: FormTypes.RecordArrayFieldValue }>): ExpandDataType[] {
   const validRecordArrayList: ExpandDataType[] = [];
   if (listEntry.length < 1) {
     throw new Error('Record Array is Empty');
   }
-  for (const objectEntry of Object.values(listEntry)) {
+  for (const objectEntry of listEntry) {
     for (const [dataKey, dataValue] of Object.entries(
       objectEntry as { [key: string]: FormTypes.RecordArrayFieldValue }
     )) {

Comment on lines +40 to +47
let initData: Map<
string | undefined,
{
edition: number;
id: string;
name: string;
}
>;

⚠️ Potential issue | 🟡 Minor

Map key type should not include undefined.

string | undefined as the key type is suspicious—instrumentsMap.get(undefined) would be a bug. This should be string.

Proposed fix
 let initData: Map<
-  string | undefined,
+  string,
   {
     edition: number;
     id: string;
     name: string;
   }
 >;
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/export-worker.ts` around lines 40 - 47, The
Map declaration for initData uses a key type of `string | undefined`, which
allows accidental lookups like `instrumentsMap.get(undefined)`; change the key
type to `string` for `initData: Map<string, { edition: number; id: string; name:
string }>` and then audit usages of `initData`/`instrumentsMap` to ensure
callers never pass `undefined`—add guards or filter out undefined ids before
calling `set`/`get` where necessary (e.g., where ids are derived from
possibly-absent values) so all keys are guaranteed to be strings.

Comment on lines +61 to +66
const processRecord = (record: RecordType): InstrumentRecordsExport => {
const instrument = instrumentsMap.get(record.instrumentId)!;

if (!record.computedMeasures) return [];

//const instrument = instrumentsMap.get(record.instrumentId)!;

⚠️ Potential issue | 🟠 Major

Non-null assertion risks runtime crash; remove dead code.

If record.instrumentId isn't in the map, line 62 throws. Add a guard. Also, line 66 is commented-out code—remove it.

Proposed fix
   const processRecord = (record: RecordType): InstrumentRecordsExport => {
-    const instrument = instrumentsMap.get(record.instrumentId)!;
+    const instrument = instrumentsMap.get(record.instrumentId);
+    if (!instrument) {
+      throw new Error(`Unknown instrument ID: ${record.instrumentId}`);
+    }

     if (!record.computedMeasures) return [];

-    //const instrument = instrumentsMap.get(record.instrumentId)!;
     const rows: InstrumentRecordsExport = [];
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/export-worker.ts` around lines 61 - 66, The
function processRecord currently uses a non-null assertion on
instrumentsMap.get(record.instrumentId) which can throw if the key is missing;
update processRecord to guard by retrieving const instrument =
instrumentsMap.get(record.instrumentId); if (!instrument) return []; (or handle
missing instrument appropriately) before using it, remove the dead commented
line ("//const instrument = instrumentsMap.get(record.instrumentId)!;"), and
ensure downstream logic references the guarded instrument variable instead of
assuming it exists.

Comment on lines +126 to +135
parentPort!.on('message', (message: ParentMessage) => {
switch (message.type) {
case 'CHUNK_COMPLETE':
return handleChunkComplete(message.data);
case 'INIT':
return handleInit(message.data);
default:
throw new Error(`Unexpected message type: ${(message satisfies never as { [key: string]: any }).type}`);
}
});

⚠️ Potential issue | 🟡 Minor

parentPort! assertion fails if file is run outside worker context.

If this file is accidentally imported in the main thread, parentPort is null and this throws. Add a guard for robustness.

Proposed fix
+if (!parentPort) {
+  throw new Error('This module must be run as a worker thread');
+}
+
 parentPort!.on('message', (message: ParentMessage) => {
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/export-worker.ts` around lines 126 - 135, The
code assumes parentPort is always present and will throw when imported on the
main thread; add a null-check guard before attaching the message handler so the
file is safe to import outside a worker. Surround the parentPort!.on(...) logic
with a conditional that verifies parentPort !== null (or return early if null)
and only calls parentPort.on('message', ...) to dispatch to handleChunkComplete
and handleInit for ParentMessage types; keep the existing default error for
unexpected message types but ensure it only runs inside the worker context.
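The guard described above follows the standard worker_threads pattern; a minimal standalone sketch (with a placeholder message shape, not the project's ParentMessage):

```typescript
import { isMainThread, parentPort } from 'node:worker_threads';

// parentPort is null on the main thread, so guarding on it makes the module
// safe to import anywhere; the handler is only attached inside a worker.
if (parentPort) {
  parentPort.on('message', (message: { type: string }) => {
    // ...dispatch on message.type here
    void message;
  });
}

console.log(isMainThread); // true when this file is run directly
```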

Comment on lines +140 to +142
const instrumentsArray = await Promise.all(
instrumentIds.values().map((id) => this.instrumentsService.findById(id) as Promise<ScalarInstrument>)
);

⚠️ Potential issue | 🔴 Critical

Set.prototype.values().map() fails on Node.js 20, where iterator helpers such as .map() are not yet available (they arrived in Node.js 22).

instrumentIds.values() returns an iterator, not an array. Use Array.from() to convert first.

Proposed fix
-    const instrumentsArray = await Promise.all(
-      instrumentIds.values().map((id) => this.instrumentsService.findById(id) as Promise<ScalarInstrument>)
-    );
+    const instrumentsArray = await Promise.all(
+      Array.from(instrumentIds).map((id) => this.instrumentsService.findById(id) as Promise<ScalarInstrument>)
+    );
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/instrument-records.service.ts` around lines
140 - 142, instrumentIds.values() returns an iterator, so calling .map() on it
fails; convert the iterator to an array before mapping. Replace the expression
that builds instrumentsArray to first create an array from instrumentIds (e.g.,
Array.from(instrumentIds) or Array.from(instrumentIds.values()) or
[...instrumentIds]) then map over that array calling
this.instrumentsService.findById(id) (returning Promise<ScalarInstrument>) and
wrap with Promise.all to produce instrumentsArray. Ensure you reference
instrumentIds, instrumentsArray, and instrumentsService.findById in the updated
code.

Comment on lines +146 to +148
const numWorkers = Math.min(cpus().length, Math.ceil(records.length / 100)); // Use up to CPU count, chunk size 100
const chunkSize = Math.ceil(records.length / numWorkers);
const chunks = [];

⚠️ Potential issue | 🟡 Minor

Edge case: zero records results in division by zero chunks.

When records.length is 0, numWorkers becomes 0, and chunkSize becomes Infinity. The loop won't execute, but returning early is cleaner.

Proposed fix
+    if (records.length === 0) {
+      return [];
+    }
+
     const numWorkers = Math.min(cpus().length, Math.ceil(records.length / 100));
🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/instrument-records.service.ts` around lines
146 - 148, The code computes numWorkers and chunkSize from records.length and
will produce numWorkers = 0 and chunkSize = Infinity when records.length === 0;
add an early return at the start of the processing function (before computing
numWorkers/chunkSize/chunks) that returns an empty array or appropriate empty
result when records.length === 0, referencing the existing variables records,
numWorkers, and chunkSize so you avoid division by zero and unnecessary work.
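A standalone sketch of the guarded chunking logic (hypothetical helper mirroring the code under review, with the early return applied):

```typescript
import { cpus } from 'node:os';

// Hypothetical helper: returns [] for empty input, otherwise splits records
// into at most cpus().length chunks of roughly 100 or more records each.
function chunkRecords<T>(records: T[]): T[][] {
  if (records.length === 0) return []; // guard: numWorkers would be 0 below
  const numWorkers = Math.min(cpus().length, Math.ceil(records.length / 100));
  const chunkSize = Math.ceil(records.length / numWorkers);
  const chunks: T[][] = [];
  for (let i = 0; i < records.length; i += chunkSize) {
    chunks.push(records.slice(i, i + chunkSize));
  }
  return chunks;
}

console.log(chunkRecords([]).length); // 0: no workers spawned for an empty export
console.log(chunkRecords(Array.from({ length: 250 }, (_, i) => i)).flat().length); // 250
```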

Comment on lines +418 to +509
private async queryRecordsRaw(appAbility: AppAbility, groupId?: string) {
const pipeline = [
{
// Join with Session collection
$lookup: {
as: 'session',
foreignField: '_id',
from: 'SessionModel',
localField: 'sessionId' // Ensure this matches your @map or field name in Prisma
}
},
{ $unwind: { path: '$session', preserveNullAndEmptyArrays: true } },
{
// Join with Subject collection
$lookup: {
as: 'subject',
foreignField: '_id',
from: 'SubjectModel',
localField: 'subjectId'
}
},
{ $unwind: { path: '$subject', preserveNullAndEmptyArrays: true } },
{
$match: {
$expr: groupId
? {
$eq: ['$groupId', { $toObjectId: groupId }]
}
: {}
}
},
{
$project: {
computedMeasures: 1,
date: {
$dateToString: {
date: '$createdAt',
format: '%Y-%m-%d'
}
},
groupId: {
$toString: '$groupId'
},
id: {
$toString: '$_id'
},
instrumentId: 1,
session: {
date: {
$dateToString: {
date: '$session.date',
format: '%Y-%m-%d'
}
},
id: {
$toString: '$session._id'
},
type: '$session.type',
user: { username: '$session.user.username' } // TBD test this works
},
// sessionId: 1,
subject: {
age: {
$dateDiff: {
endDate: '$$NOW',
startDate: '$subject.dateOfBirth',
unit: 'year'
}
},
dateOfBirth: '$subject.dateOfBirth',
groupIds: '$subject.groupIds', // TBD make sure groupIds is string array
id: {
$toString: '$subject._id'
},
sex: '$subject.sex'
}
}
}
];

const records = (await this.instrumentRecordModel.aggregateRaw({ pipeline })) as unknown as unknown[];

/**
* We need to create a shallow copy of all records here, as the aggregateRaw method returns objects
* with the Symbol(nodejs.util.inspect.custom) property defined, which is not serializable.
*/

// eslint-disable-next-line @typescript-eslint/no-unsafe-argument
return (records.filter((record) => appAbility.can('read', record as any)) as RecordType[]).map((record) => ({
...record
}));
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "accessibleQuery" apps/api/src/instrument-records/instrument-records.service.ts -B2 -A2


Length of output: 1550


🏁 Script executed:

# Also check for other methods using appAbility in this file
rg -n "appAbility" apps/api/src/instrument-records/instrument-records.service.ts -B1 -A3


Length of output: 486


🏁 Script executed:

# Check the overall structure of the service
head -100 apps/api/src/instrument-records/instrument-records.service.ts


Length of output: 3899


Post-query filtering bypasses database-level access control.

Unlike other methods in this service (e.g., count(), findById(), findMany()), which use accessibleQuery() to enforce access control at the Prisma query level, queryRecordsRaw() fetches records matching only the groupId filter and applies ability checks afterward via appAbility.can(). This fetches potentially unauthorized records into memory before filtering. Consider pushing ability checks into the aggregation pipeline or documenting if this approach is intentional.

🤖 Prompt for AI Agents
In `@apps/api/src/instrument-records/instrument-records.service.ts` around lines
418 - 509, queryRecordsRaw is applying authorization after fetching data (using
appAbility.can), which can load unauthorized records; fix by pushing the ability
constraints into the DB aggregation instead of post-filtering: derive the same
access-filter used by accessibleQuery (or build equivalent filter from
AppAbility) and merge it into the aggregation pipeline's $match (before
projection), or switch to a Prisma query that uses accessibleQuery and then
perform any necessary $lookup/$project only on the already-filtered ids; target
symbols: queryRecordsRaw, accessibleQuery, instrumentRecordModel.aggregateRaw,
appAbility.can and ensure the $match added to the pipeline encodes the user's
allowed conditions so unauthorized documents are never returned from
aggregateRaw.
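One way to push the check into the pipeline (a minimal sketch, assuming the ability's MongoDB conditions can be extracted up front, e.g. via CASL's accessibleBy; all names here are hypothetical):

```typescript
type Stage = { [key: string]: unknown };

// Hypothetical helper: prepends a pre-computed access filter as the first
// $match so aggregateRaw never returns unauthorized documents.
function withAccessFilter(pipeline: Stage[], accessFilter: Stage): Stage[] {
  return [{ $match: accessFilter }, ...pipeline];
}

const base: Stage[] = [{ $project: { computedMeasures: 1 } }];
const secured = withAccessFilter(base, { groupId: { $in: ['<allowed-group-id>'] } });
console.log(Object.keys(secured[0])); // [ '$match' ]
console.log(secured.length);          // 2
```

Filtering in the database also avoids loading unauthorized documents into memory at all, which matters for large exports.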

@joshunrau joshunrau merged commit da7d638 into main Jan 21, 2026
4 of 6 checks passed
