🛡️ Sentinel: Fix NoSQL Injection and Auth Bypass#48
🛡️ Sentinel: Fix NoSQL Injection and Auth Bypass#48daggerstuff wants to merge 4 commits intostagingfrom
Conversation
- Re-enabled proper authorization middleware in documents routes. - Sanitized query parameters (status, type, category) to prevent NoSQL Injection. - Fixed Vitest configuration to correctly resolve path aliases and setup files. - Documented security learnings in .jules/sentinel.md. Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Reviewer's guide (collapsed on small PRs)Reviewer's GuideReinstates real authorization checks on document routes and sanitizes query parameters used in MongoDB filters to prevent NoSQL injection, along with minor test config/path fixes and adding a security notes file. Sequence diagram for document creation with restored authorizationsequenceDiagram
actor User
participant AstroApp
participant ExpressServer
participant DocumentsRouter
participant AuthMiddleware as requirePermissions
participant DocumentService
participant MongoDB
User->>AstroApp: POST /documents
AstroApp->>ExpressServer: Forward request
ExpressServer->>DocumentsRouter: Route POST /
DocumentsRouter->>AuthMiddleware: requirePermissions([edit])
AuthMiddleware-->>DocumentsRouter: Authorization result
alt User has edit permission
DocumentsRouter->>DocumentService: createDocument(title, type, category, content, description, userId)
DocumentService->>MongoDB: insertOne(document)
MongoDB-->>DocumentService: insertResult
DocumentService-->>DocumentsRouter: createdDocument
DocumentsRouter-->>ExpressServer: 201 Created
ExpressServer-->>AstroApp: Response
AstroApp-->>User: Response body
else Missing or invalid permission
AuthMiddleware-->>DocumentsRouter: Permission denied
DocumentsRouter-->>ExpressServer: 403 Forbidden
ExpressServer-->>AstroApp: Error response
AstroApp-->>User: Permission error
end
Class diagram for routes and auth middleware after security fixclassDiagram
class AuthMiddleware {
+requirePermissions(permissions)
+requireRoles(roles)
}
class DocumentsRouter {
+postRoot(req, res)
+getRoot(req, res)
+putDocument(req, res)
+ensureString(param) string
}
class ProjectsRouter {
+getRoot(req, res)
}
class StrategicPlansRouter {
+getRoot(req, res)
}
class DocumentService {
+createDocument(title, type, category, content, description, userId)
+listDocuments(userId, page, limit, status, type, category, search)
+updateDocument(documentId, title, content, status, description)
}
class ProjectsService {
+listProjects(userId, page, limit, category, status)
}
class StrategicPlansService {
+listStrategicPlans(userId, page, limit, status)
}
AuthMiddleware <.. DocumentsRouter : uses
DocumentsRouter ..> DocumentService : calls
ProjectsRouter ..> ProjectsService : calls
StrategicPlansRouter ..> StrategicPlansService : calls
class SanitizedQueryParams {
+page number
+limit number
+status string
+type string
+category string
}
DocumentsRouter ..> SanitizedQueryParams : constructs
ProjectsRouter ..> SanitizedQueryParams : constructs
StrategicPlansRouter ..> SanitizedQueryParams : constructs
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthroughThis pull request restructures CodeQL security analysis by decomposing monolithic query files into granular, single-concern custom queries for EHR/FHIR compliance. It also integrates proper authentication and role-based access control middleware into API routes, adds defensive query parameter validation for type safety, and documents recently identified security vulnerabilities. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Now that
requirePermissionsis wired back in and aliased asrequirePermission, double-check that its expected argument type (array vs single string) is consistent across all route files to avoid subtle auth misconfigurations. - The string/Type checks for query parameters in
documents.ts,projects.ts, andstrategic-plans.tsare currently duplicated; consider extracting a small shared helper (or middleware) to normalize and sanitize query params in one place to keep future changes consistent.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Now that `requirePermissions` is wired back in and aliased as `requirePermission`, double-check that its expected argument type (array vs single string) is consistent across all route files to avoid subtle auth misconfigurations.
- The string/Type checks for query parameters in `documents.ts`, `projects.ts`, and `strategic-plans.ts` are currently duplicated; consider extracting a small shared helper (or middleware) to normalize and sanitize query params in one place to keep future changes consistent.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
- Re-enabled proper authorization middleware in documents routes. - Sanitized query parameters (status, type, category) to prevent NoSQL Injection. - Fixed Vitest configuration to correctly resolve path aliases and setup files. - Resolved CodeQL CI failures by simplifying qlpack.yml and splitting monolithic queries. - Updated custom CodeQL queries to use modern TaintTracking::Global API and PathGraph. - Scoped CodeQL validation predicates to their enclosing functions. - Corrected multi-language CodeQL configuration to avoid language mismatch errors. - Removed deprecated setup-python-dependencies from CodeQL workflow. - Documented security learnings in .jules/sentinel.md. Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
- Re-enabled proper authorization middleware in documents routes. - Sanitized query parameters (status, type, category) to prevent NoSQL Injection. - Fixed Vitest configuration to correctly resolve path aliases and setup files. - Resolved CodeQL CI failures by simplifying qlpack.yml and splitting monolithic queries. - Updated custom CodeQL queries to use modern TaintTracking::Global API and PathGraph. - Scoped CodeQL validation predicates to their enclosing functions. - Corrected multi-language CodeQL configuration to avoid language mismatch errors. - Synchronized pnpm version across GitHub Actions and package.json. - Removed deprecated setup-python-dependencies from CodeQL workflow. - Documented security learnings in .jules/sentinel.md. Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
- Re-enabled proper authorization middleware in documents routes. - Sanitized query parameters (status, type, category) to prevent NoSQL Injection. - Fixed Vitest configuration to correctly resolve path aliases and setup files. - Resolved CodeQL CI failures by splitting monolithic queries and modernizing TaintTracking API usage. - Fixed pnpm version mismatch in GitHub Actions. - Documented security learnings in .jules/sentinel.md. Co-authored-by: daggerstuff <261005129+daggerstuff@users.noreply.github.com>
| router.post( | ||
| '/', | ||
| requirePermission('edit'), | ||
| requirePermission(['edit']), |
There was a problem hiding this comment.
Bug: The requirePermissions middleware expects a user.permissions array, but the auth middleware only provides user.role, causing all document creation/update requests to fail with a 403 error.
Severity: CRITICAL
Suggested Fix
Modify the authentication middleware in src/lib/auth/auth0-middleware.ts to populate a user.permissions array based on the user's role. This can be done by looking up the permissions associated with the user.role from the access control configuration and attaching them to the user object before calling next().
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/api/routes/documents.ts#L38
Potential issue: The `requirePermissions` middleware, used in document routes, expects a
`permissions` array on the `user` object. However, the authentication middleware in
`auth0-middleware.ts` only attaches a `user.role` string, not a `permissions` array.
Consequently, `user.permissions` is undefined and defaults to an empty array. The
permission check `requiredPermissions.every(p => [].includes(p))` will always fail,
causing the middleware to return a 403 Forbidden error for all users. This effectively
blocks all document creation and update operations.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/codeql/custom-queries/ehr-insecure-auth.ql:
- Around line 16-24: The isAuthenticationMethod(CallExpr call) predicate
currently uses substring matching and must be replaced with semantic resolution:
resolve the call target (e.g., call.getTarget()/getDecl() or
getCallee().getQualifiedName()) and match against known authentication/MFA
callables by qualified name or library-modeled functions instead of
name.matches("%auth%"). Remove the same-function MFA heuristic and instead add
dataflow/state checks that verify server-side MFA completion before sensitive
access—track session/token mutations (e.g., session.mfaComplete,
session["mfaComplete"], or JWT claims like amr contains "mfa") via taint/flow
predicates and require those explicit state checks or validated token claims are
true before allowing protected resource access. Update predicates that reference
isAuthenticationMethod and the MFA check to use the resolved callables and the
new session/token state predicates so detection relies on signature resolution
and explicit server-side MFA state rather than substring heuristics.
In @.github/codeql/custom-queries/ehr-insecure-config.ql:
- Around line 32-35: The isSink predicate is too broad because
call.getCalleeName().matches("%log%") matches any name containing "log" (e.g.,
"login"); replace it with explicit checks for real logger method names by
updating the DataFlow::CallNode condition in isSink to compare
call.getCalleeName() against a whitelist (e.g., "log", "info", "warn", "error",
"debug", "trace", "fatal") using equality or ORed equality expressions so only
actual logging methods are treated as sinks.
In @.github/codeql/custom-queries/ehr-missing-audit-log.ql:
- Around line 28-39: The current hasLogging predicate only checks for any
logCall in the enclosing function, which can falsely mark an ehrOp as audited;
change the predicate to take the specific ehrOp (rename to
hasLoggingForOp(CallExpr ehrOp)) and require the logCall to be syntactically or
dataflow-associated with that ehrOp (e.g., the logCall must be in the same basic
block or occur after ehrOp and/or one of its arguments or receiver identifiers
appears in the logCall arguments); use CallExpr methods
(getEnclosingBlock/getLocation/getArgument/getReceiver) or a simple dataflow
slice to tie logCall to ehrOp instead of just matching on enclosing function so
each ehrOp is verified to have an operation-specific audit call.
In @.github/codeql/custom-queries/ehr-unencrypted-data.ql:
- Around line 42-49: The query currently does not require that the EHR data node
`data` actually flows into the transmission `call`; update the where-clause to
require `DataFlow::localFlow(data, DataFlow::exprNode(call.getAnArgument()))` so
the EHR data is traced into the call's argument, and also ensure the
`encryptCall` check verifies the encryption result reaches the same sink by
requiring `DataFlow::localFlow(DataFlow::exprNode(encryptCall.getAnArgument()),
DataFlow::exprNode(call.getAnArgument()))`; keep the existing predicates
(`isDataTransmissionCall(call)`, `isEHRData(data)`,
`encryptCall.getCalleeName().matches("%encrypt%")`) but add these explicit
data-flow constraints so only transmissions that actually receive unencrypted
EHR data are flagged.
In @.github/codeql/custom-queries/ehr-unsafe-access.ql:
- Around line 26-31: The sink matcher currently only checks
call.getCalleeName(), which misses member calls like axios.get()/axios.post();
update the condition to also detect calls whose callee is a
PropertyAccess/MemberAccess whose qualifier is the identifier "axios" (e.g.
check call.getCallee() instanceof PropertyAccess/MemberAccess and
call.getCallee().getQualifier().getName() = "axios"), so DataFlow::CallNode
flows through both direct function names (request/fetch/axios-like names) and
member calls on the axios object (axios.get/axios.post/etc.).
- Around line 44-55: The current predicate uses
call.getAnArgument().getStringValue() which only matches string literals and
thus misses template/concatenated/variable URLs (e.g.,
`${provider.baseUrl}/${resourceType}` in src/lib/ehr/services/fhir.client.ts);
change the match to also handle non-literal expressions by inspecting the call
argument expression (use call.getAnArgument().getExpr() or the CodeQL value/expr
API) and either (a) recursively resolve concatenations/template expressions and
variables to their string components to apply the url.matches(...) checks, or
(b) use CodeQL dataflow/taint predicates (DataFlow or StringFlow) to detect
values that may flow into the call argument and match those flow sources against
the EHR URL patterns; update the code that references
call.getAnArgument().getStringValue() to use this expression/flow-based check so
dynamically constructed EHR endpoints are detected.
In @.github/codeql/custom-queries/fhir-insecure-ops.ql:
- Around line 16-35: The current predicates isFHIROperation and
hasSecurityContext are too coarse; update isFHIROperation to match FHIR
operations by exact callee names or qualified member accesses (e.g., require
call.getCalleeName() == "batch" or inspect
call.getCallee().asMethod().getDeclaringType().getName() and exact method names
like "transaction", "history", "delete", "patch") instead of substring .matches,
and change hasSecurityContext to require the security CallExpr (securityCall) to
dominate or precede the target CallExpr in control flow (e.g., use cfg/dominance
or check that securityCall is on a dominating basic block or is a guarding
conditional/try that encloses the call) rather than merely existing in the same
enclosing function; reference the predicates isFHIROperation(CallExpr call) and
hasSecurityContext(CallExpr call) and the local variable securityCall when
implementing these stricter checks.
In @.github/codeql/custom-queries/fhir-insecure-search.ql:
- Around line 27-39: The hasInputSanitization predicate is too coarse: it only
checks for any sanitize/escape/validate CallExpr in the same function rather
than verifying the sanitization actually flows into the FHIR search arguments;
update hasInputSanitization to perform a dataflow/taint check from the searchOp
argument expressions to the sanitizeCall (or vice versa) using JavaScript CodeQL
idioms (e.g., convert expressions to DataFlow::Node with .flow() or use
DataFlow::LocalFlow/getASuccessor* style steps or a taint-tracking barrier
pattern similar to ehr-unsafe-access.ql) so that only sanitizeCall instances
that are on the actual data path into isFHIRSearch(searchOp) are considered
sanitization for that call.
In @.github/codeql/custom-queries/fhir-missing-version.ql:
- Around line 16-33: isFHIRClientInit uses overly broad name patterns and
hasVersionCheck only checks for any version-related call in the same function,
causing false positives/negatives; update isFHIRClientInit to narrow matching to
FHIR-specific identifiers (e.g., names containing "FHIR" or "Fhir" or explicit
factory names like "createFhirClient"/"createFHIRClient") and update
hasVersionCheck to require the version-related call to operate on the same
client instance: derive the client target (the variable/receiver produced by the
CallExpr in isFHIRClientInit or its assigned variable) and require that
versionCall’s receiver/target (or the object passed into it) matches that same
client symbol or dataflow origin. Modify the predicates isFHIRClientInit and
hasVersionCheck to compare the client symbol/receiver rather than only checking
enclosingFunction scope so the version check is tied to the specific initialized
client.
In @.github/codeql/custom-queries/fhir-unvalidated-access.ql:
- Around line 20-27: The current callee name matching in the query (the
name.matches(...) list including "%read%" and "%search%") is too permissive and
causes false positives; update the predicate that builds the allowed callee
names (the clauses referencing name.matches("%getResource%") etc.) to use exact
name equality (e.g., name = "getResource", name = "searchResource", name =
"createResource", name = "updateResource", name = "read", name = "vread", name =
"search") instead of substring matches, and apply the same tightening to the
hasValidation predicate by replacing broad patterns like "%validate%",
"%check%", "%verify%" with exact name equality checks for the specific
validation function names you intend to detect so the query matches only precise
callees and aligns with the declared high precision.
- Around line 31-37: The hasValidation predicate is too coarse: it treats any
validate/check/verify CallExpr in the same function as sufficient; update it to
ensure the validateCall actually validates the same argument and occurs before
the access. Specifically, for the predicate hasValidation(CallExpr call) compare
source locations (e.g., call.getLocation().getStartLine() and
validateCall.getLocation().getStartLine()) to require validateCall is before
call, and correlate arguments by extracting the validated value (use
call.getAnArgument() and validateCall.getAnArgument() or compare the same
variable/expr) so only validations that target the same argument suppress the
finding.
In @.jules/sentinel.md:
- Around line 1-2: The Markdown heading "## 2025-05-22 - [Broken Access Control
and NoSQL Injection in API Routes]" in .jules/sentinel.md needs a blank line
after it to satisfy MD022; insert a single empty line immediately following that
heading so the next paragraph or list is separated from the heading.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9d7609e2-e120-4d1a-9cc5-d2fbdee5852a
📒 Files selected for processing (19)
.github/codeql/codeql-config.yml.github/codeql/custom-queries/ehr-insecure-auth.ql.github/codeql/custom-queries/ehr-insecure-config.ql.github/codeql/custom-queries/ehr-missing-audit-log.ql.github/codeql/custom-queries/ehr-security.ql.github/codeql/custom-queries/ehr-unencrypted-data.ql.github/codeql/custom-queries/ehr-unsafe-access.ql.github/codeql/custom-queries/fhir-insecure-ops.ql.github/codeql/custom-queries/fhir-insecure-search.ql.github/codeql/custom-queries/fhir-missing-version.ql.github/codeql/custom-queries/fhir-security.ql.github/codeql/custom-queries/fhir-unvalidated-access.ql.github/codeql/custom-queries/qlpack.yml.github/workflows/codeql.yml.jules/sentinel.mdconfig/vitest.config.tssrc/api/routes/documents.tssrc/api/routes/projects.tssrc/api/routes/strategic-plans.ts
💤 Files with no reviewable changes (3)
- .github/codeql/codeql-config.yml
- .github/codeql/custom-queries/ehr-security.ql
- .github/codeql/custom-queries/fhir-security.ql
| predicate isAuthenticationMethod(CallExpr call) { | ||
| exists(string name | | ||
| name = call.getCalleeName() and | ||
| ( | ||
| name.matches("%login%") or | ||
| name.matches("%authenticate%") or | ||
| name.matches("%auth%") | ||
| ) | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find .github/codeql/custom-queries -name "*.ql" -type f | head -20Repository: daggerstuff/pixelated
Length of output: 544
🏁 Script executed:
cat -n .github/codeql/custom-queries/ehr-insecure-auth.qlRepository: daggerstuff/pixelated
Length of output: 1300
🌐 Web query:
CodeQL authentication detection patterns JavaScript MFA verification best practices
💡 Result:
CodeQL detection patterns to focus on (JavaScript/TypeScript)
-
“Auth present, but unsafe for cookie sessions” → missing CSRF protection
- For Express (and similar cookie-session apps), use/extend CodeQL’s existing “Missing CSRF middleware” query (
js/missing-token-validation) to flag state-changing endpoints (POST/PUT/PATCH/DELETE) that rely on cookies/sessions but do not enforce a CSRF token check. This is explicitly framed as “an additional authentication measure” in the query help. [1]
- For Express (and similar cookie-session apps), use/extend CodeQL’s existing “Missing CSRF middleware” query (
-
“MFA can be bypassed” → multi-step auth not enforced server-side
- Model MFA as a required server-side state (e.g.,
session.mfaComplete === true,amrcontainsmfa,acrmeets a required level) and write CodeQL queries that find:- Protected routes that accept an authenticated session/JWT without checking the MFA-complete marker.
- Code paths where primary login establishes an authenticated session/token before the MFA step is completed (e.g.,
req.session.userId = ...set in step 1, while step 2 is optional).
- OWASP’s MFA testing guidance calls out bypassing step 2 by force-browsing/direct API calls after completing only the first step. [3]
- Also check for inconsistent MFA enforcement across login methods/flows (password login vs. magic link, OAuth/OIDC flows/policies, legacy endpoints). [3]
- Model MFA as a required server-side state (e.g.,
-
“MFA changes are weakly protected” → factor enrollment/reset lacks re-auth
- Write CodeQL queries to find endpoints that modify MFA configuration (add/replace factors, change phone/email used for OTP, disable MFA) that:
- Rely only on the existing session (no reauthentication / no proof of an already-enrolled factor).
- Don’t trigger notifications/audit logging (if your codebase has standard notifier/audit sinks to check for).
- OWASP recommends requiring reauthentication with an existing enrolled factor, not relying solely on the active session, and notifying users for factor changes. [2]
- OWASP ASVS (session-abuse defenses) similarly requires full re-authentication before changing MFA configuration and step-up/secondary verification for highly sensitive actions. [5]
- Write CodeQL queries to find endpoints that modify MFA configuration (add/replace factors, change phone/email used for OTP, disable MFA) that:
-
“Session not hardened around auth/MFA events” → fixation & missing re-auth triggers
- Add checks for whether the app regenerates session identifiers at authentication boundaries (pre-auth → post-auth) and enforces reauthentication after high-risk events.
- OWASP’s Session Management guidance recommends regenerating the session ID and using distinct pre/post-auth session handling to prevent session fixation/binding issues, and requiring reauthentication after high-risk events. [4]
MFA verification best practices (what your CodeQL rules should enforce)
- Server-side enforcement: treat MFA as an authorization requirement for protected actions, not just a UI step; prevent “step 1 only” access. [3]
- Consistency across entry points: every authentication flow that yields access must meet the same MFA requirements (or a deliberate, reviewed policy). [3]
- Secure factor changes: require reauthentication with an existing factor; don’t trust an already-established session for factor replacement; notify users and consider delays/step-up for high-value accounts. [2]
- Step-up for sensitive actions: require additional verification for high-risk operations (transfers, recovery, MFA changes, session termination). [5]
- Harden sessions around auth: regenerate session IDs and enforce reauthentication after risk events. [4]
Sources: CodeQL query help (Missing CSRF middleware) [1]; OWASP MFA Cheat Sheet [2]; OWASP WSTG MFA Testing [3]; OWASP Session Management Cheat Sheet [4]; OWASP ASVS (Defenses Against Session Abuse) [5].
🌐 Web query:
CodeQL high precision security rules substring matching vs function signature analysis
💡 Result:
What “high precision” implies in CodeQL security rules
CodeQL’s @precision metadata (high, very-high) is explicitly about the expected share of true positives vs false positives. GitHub’s default code-scanning query sets heavily favor high/very-high precision, and community security queries are generally only promoted when their false-positive rate is acceptably low. [1], [2]
Substring matching (name-based) vs function-signature analysis (semantic)
1) Substring matching (for example, getName().matches(...), contains, LIKE-style patterns)
What it does: matches on strings (function names, qualified names rendered as strings, source text, etc.). In QL, matches behaves like SQL LIKE (pattern matching), so it’s inherently stringy, not semantic. [4]
Why it often reduces precision (more false positives):
- Name collisions: different APIs can share substrings (common prefixes/suffixes like
encrypt,decode,exec,render). - Overloads/overrides/constructors: substring checks don’t reliably distinguish overloads or inheritance-related targets.
- Refactors/aliases: renames or wrapper functions break heuristics (false negatives) or broaden matches (false positives).
- Regex pitfalls: unanchored patterns can match unintended strings; CodeQL even ships rules warning about missing regex anchors because accidental matches cause incorrect behavior. [5]
When it’s still useful:
- Early exploration (“triage mode”), or when you truly cannot resolve the target (dynamic dispatch patterns in some languages, weak library modeling, generated code edges).
2) Function signature analysis (resolved call targets + parameter/return types)
What it does: matches on the callable element (resolved callee) and its semantic identity (declaring type/module, fully-qualified name, parameter types), not just text.
Why it tends to produce higher precision:
- You can bind to the actual
Callable/Method/Functionreached in the CodeQL database (call graph), then filter by:- Qualified identity (for example,
hasQualifiedName(...)rather than ad-hoc string comparisons). - Exact signature (name + parameter types). CodeQL libraries commonly expose a
getSignature()that returns a signature with fully-qualified types, which is designed for this kind of precise matching. [3]
- Qualified identity (for example,
- You can distinguish:
- overloads (
foo(int)vsfoo(String)) - methods on different receiver types
- interface vs implementation targets
- constructors vs methods
- overloads (
Trade-offs / failure modes:
- Requires good type resolution and library models; precision can degrade if external/framework APIs aren’t modeled (common in taint tracking/source-sink rules).
- Slightly more work to implement: you typically combine signature/qualified-name constraints with dataflow/taint modeling.
Practical guidance for writing high precision CodeQL security rules
- Prefer semantic target matching (resolved call + qualified name + signature) over substring matching. [3]
- If you must use strings/regex:
- anchor patterns and scope them tightly (package/module + name), to avoid accidental matches. [5]
- Treat substring matching as a fallback, and expect it to be harder to justify a
@precision high/very-highlabel if the core detection hinges on broad name patterns. [1], [2]
Revise authentication matching to use semantic resolution instead of substring patterns, and model MFA as server-side state rather than same-function scope.
The rule declares @precision high but relies entirely on substring matching (%auth%, %verify%, %mfa%), which CodeQL documentation identifies as a precision-reducing approach. Substring patterns produce both false positives (matches unrelated functions like "authorize", "verify", "authentication" in non-auth contexts) and false negatives (misses actual auth methods with unconventional names).
More critically, the MFA check at lines 30–35 only verifies that a matching function name exists in the same enclosing function—it does not validate:
- Actual call to MFA verification (control flow)
- MFA completion state (session markers, JWT
amrclaims, or explicit authentication-step tracking) - Whether the MFA check actually guards the protected resource
Replace this with:
- Qualified name matching on known authentication/MFA APIs (library-modeled callables with resolved signatures)
- Data flow analysis to track session/token state changes and verify MFA completion before resource access
- Explicit state checks (e.g.,
session.mfaComplete === true,amrcontainsmfa) rather than relying on function name heuristics
This aligns with OWASP MFA testing guidance, which emphasizes server-side enforcement of MFA state and prevention of "step 1 only" access via bypasses.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/ehr-insecure-auth.ql around lines 16 - 24, The
isAuthenticationMethod(CallExpr call) predicate currently uses substring
matching and must be replaced with semantic resolution: resolve the call target
(e.g., call.getTarget()/getDecl() or getCallee().getQualifiedName()) and match
against known authentication/MFA callables by qualified name or library-modeled
functions instead of name.matches("%auth%"). Remove the same-function MFA
heuristic and instead add dataflow/state checks that verify server-side MFA
completion before sensitive access—track session/token mutations (e.g.,
session.mfaComplete, session["mfaComplete"], or JWT claims like amr contains
"mfa") via taint/flow predicates and require those explicit state checks or
validated token claims are true before allowing protected resource access.
Update predicates that reference isAuthenticationMethod and the MFA check to use
the resolved callables and the new session/token state predicates so detection
relies on signature resolution and explicit server-side MFA state rather than
substring heuristics.
| predicate isSink(DataFlow::Node sink) { | ||
| exists(DataFlow::CallNode call | | ||
| call.getCalleeName().matches("%log%") and | ||
| sink = call.getAnArgument() |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
CodeQL QL string matches() wildcard substring matching documentation
💡 Result:
In CodeQL (QL), string.matches(pattern) uses SQL LIKE-style wildcard matching (it’s not regex):
_matches exactly one character.%matches any sequence of characters (including empty), so substring match is typically done with%...%.- To match literal
_or%, escape with\; a literal backslash is also escaped with\. [1], [2]
Examples:
- Substring:
"foobarbaz".matches("%bar%")[1] - Prefix/suffix:
"filename.cpp".matches("%.cpp")or"Error: ...".matches("Error:%")[1] - Literal
%:"100%".matches("100\\%")[2]
Sources: CodeQL language spec (built-in string.matches) [1] and the standard-library docs (includes escaping example) [2].
Tighten sink matching to real logger methods.
matches("%log%") uses SQL LIKE-style wildcard matching, which matches any string containing "log" as a substring. This classifies non-logger calls like login(), dialog(), or catalog() as sinks, inflating findings for a @precision high rule. Replace with explicit equality checks for known logger method names:
🔧 Proposed fix
predicate isSink(DataFlow::Node sink) {
- exists(DataFlow::CallNode call |
- call.getCalleeName().matches("%log%") and
+ exists(DataFlow::CallNode call, string callee |
+ callee = call.getCalleeName() and
+ (
+ callee = "log" or
+ callee = "info" or
+ callee = "warn" or
+ callee = "error" or
+ callee = "debug" or
+ callee = "trace"
+ ) and
sink = call.getAnArgument()
)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| predicate isSink(DataFlow::Node sink) { | |
| exists(DataFlow::CallNode call | | |
| call.getCalleeName().matches("%log%") and | |
| sink = call.getAnArgument() | |
| predicate isSink(DataFlow::Node sink) { | |
| exists(DataFlow::CallNode call, string callee | | |
| callee = call.getCalleeName() and | |
| ( | |
| callee = "log" or | |
| callee = "info" or | |
| callee = "warn" or | |
| callee = "error" or | |
| callee = "debug" or | |
| callee = "trace" | |
| ) and | |
| sink = call.getAnArgument() | |
| ) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/ehr-insecure-config.ql around lines 32 - 35,
The isSink predicate is too broad because call.getCalleeName().matches("%log%")
matches any name containing "log" (e.g., "login"); replace it with explicit
checks for real logger method names by updating the DataFlow::CallNode condition
in isSink to compare call.getCalleeName() against a whitelist (e.g., "log",
"info", "warn", "error", "debug", "trace", "fatal") using equality or ORed
equality expressions so only actual logging methods are treated as sinks.
| predicate hasLogging(CallExpr call) { | ||
| exists(CallExpr logCall | | ||
| (logCall.getCalleeName().matches("%log%") or | ||
| logCall.getCalleeName().matches("%audit%")) and | ||
| logCall.getEnclosingFunction() = call.getEnclosingFunction() | ||
| ) | ||
| } | ||
|
|
||
| from CallExpr ehrOp | ||
| where | ||
| isEHROperation(ehrOp) and | ||
| not hasLogging(ehrOp) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd -t f "ehr-missing-audit-log.ql"Repository: daggerstuff/pixelated
Length of output: 47
🏁 Script executed:
fd -t f "*.ql"Repository: daggerstuff/pixelated
Length of output: 348
🏁 Script executed:
fd --glob "*.ql"Repository: daggerstuff/pixelated
Length of output: 47
🏁 Script executed:
git ls-files | head -20Repository: daggerstuff/pixelated
Length of output: 563
🏁 Script executed:
git ls-files | grep -i codeqlRepository: daggerstuff/pixelated
Length of output: 777
🏁 Script executed:
cat -n .github/codeql/custom-queries/ehr-missing-audit-log.qlRepository: daggerstuff/pixelated
Length of output: 1322
🏁 Script executed:
cat -n .github/codeql/README.mdRepository: daggerstuff/pixelated
Length of output: 7037
🏁 Script executed:
find .github/codeql -type f -name "*.md" -o -name "*test*" -o -name "*example*"Repository: daggerstuff/pixelated
Length of output: 194
🏁 Script executed:
cat -n .github/codeql/VALIDATION_CHECKLIST.mdRepository: daggerstuff/pixelated
Length of output: 8830
🏁 Script executed:
find . -type f \( -name "*test*" -o -name "*spec*" -o -name "*example*" \) | grep -i "audit\|ehr\|log" | head -20Repository: daggerstuff/pixelated
Length of output: 1156
🏁 Script executed:
cat -n ./src/lib/logging/__tests__/audit-logger.test.tsRepository: daggerstuff/pixelated
Length of output: 816
🏁 Script executed:
cat -n ./src/lib/logging/audit-logger.tsRepository: daggerstuff/pixelated
Length of output: 559
🏁 Script executed:
cat -n ./src/lib/ehr/__tests__/ehr.test.ts | head -100Repository: daggerstuff/pixelated
Length of output: 4342
🏁 Script executed:
rg "getPatientRecord\|updatePatientRecord\|readEHR" --type ts -A 5 | head -50Repository: daggerstuff/pixelated
Length of output: 47
🏁 Script executed:
rg "function.*\(patient\|record\|ehr\|fhir\)" --type ts -i | head -30Repository: daggerstuff/pixelated
Length of output: 47
🏁 Script executed:
rg "CallExpr\|predicate\|isEHROperation" --type ts --type qlRepository: daggerstuff/pixelated
Length of output: 91
Audit logging check is too loose for operation-specific compliance detection.
On lines 28-33, the hasLogging predicate checks if ANY logging call exists in the enclosing function, not whether the specific ehrOp found on lines 36-39 is audited. This allows generic logging (e.g., logger.debug("audit trace")) to suppress warnings for missing audit coverage of individual EHR operations. HIPAA's audit control requirement (§164.312(b)) mandates comprehensive audit trails per PHI access event, not function-level logging presence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/ehr-missing-audit-log.ql around lines 28 - 39,
The current hasLogging predicate only checks for any logCall in the enclosing
function, which can falsely mark an ehrOp as audited; change the predicate to
take the specific ehrOp (rename to hasLoggingForOp(CallExpr ehrOp)) and require
the logCall to be syntactically or dataflow-associated with that ehrOp (e.g.,
the logCall must be in the same basic block or occur after ehrOp and/or one of
its arguments or receiver identifiers appears in the logCall arguments); use
CallExpr methods (getEnclosingBlock/getLocation/getArgument/getReceiver) or a
simple dataflow slice to tie logCall to ehrOp instead of just matching on
enclosing function so each ehrOp is verified to have an operation-specific audit
call.
| from CallExpr call, DataFlow::Node data | ||
| where | ||
| isDataTransmissionCall(call) and | ||
| isEHRData(data) and | ||
| not exists(CallExpr encryptCall | | ||
| encryptCall.getCalleeName().matches("%encrypt%") and | ||
| DataFlow::localFlow(data, DataFlow::exprNode(encryptCall.getAnArgument())) | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd -t f "ehr-unencrypted-data.ql" 2>/dev/nullRepository: daggerstuff/pixelated
Length of output: 47
🏁 Script executed:
find . -name "*.ql" -type f 2>/dev/null | head -20Repository: daggerstuff/pixelated
Length of output: 562
🏁 Script executed:
cat -n ./.github/codeql/custom-queries/ehr-unencrypted-data.qlRepository: daggerstuff/pixelated
Length of output: 1703
Bind EHR data flow to the actual transmission sink.
On Lines 42-49, data is not required to reach call, so the rule can flag unrelated network calls and miss real unencrypted flows. Add explicit data flow constraints to ensure EHR data is traced into the transmission call's argument, and verify encrypted results actually reach the sink.
🔧 Proposed fix
-from CallExpr call, DataFlow::Node data
+from CallExpr call, DataFlow::Node data, Expr sinkArg
where
isDataTransmissionCall(call) and
+ sinkArg = call.getAnArgument() and
isEHRData(data) and
+ DataFlow::localFlow(data, DataFlow::exprNode(sinkArg)) and
not exists(CallExpr encryptCall |
encryptCall.getCalleeName().matches("%encrypt%") and
- DataFlow::localFlow(data, DataFlow::exprNode(encryptCall.getAnArgument()))
+ DataFlow::localFlow(data, DataFlow::exprNode(encryptCall.getAnArgument())) and
+ DataFlow::localFlow(DataFlow::exprNode(encryptCall), DataFlow::exprNode(sinkArg))
)
select call,
"Potential unencrypted EHR data transmission detected. HIPAA compliance requires encryption."📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from CallExpr call, DataFlow::Node data | |
| where | |
| isDataTransmissionCall(call) and | |
| isEHRData(data) and | |
| not exists(CallExpr encryptCall | | |
| encryptCall.getCalleeName().matches("%encrypt%") and | |
| DataFlow::localFlow(data, DataFlow::exprNode(encryptCall.getAnArgument())) | |
| ) | |
| from CallExpr call, DataFlow::Node data, Expr sinkArg | |
| where | |
| isDataTransmissionCall(call) and | |
| sinkArg = call.getAnArgument() and | |
| isEHRData(data) and | |
| DataFlow::localFlow(data, DataFlow::exprNode(sinkArg)) and | |
| not exists(CallExpr encryptCall | | |
| encryptCall.getCalleeName().matches("%encrypt%") and | |
| DataFlow::localFlow(data, DataFlow::exprNode(encryptCall.getAnArgument())) and | |
| DataFlow::localFlow(DataFlow::exprNode(encryptCall), DataFlow::exprNode(sinkArg)) | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/ehr-unencrypted-data.ql around lines 42 - 49,
The query currently does not require that the EHR data node `data` actually
flows into the transmission `call`; update the where-clause to require
`DataFlow::localFlow(data, DataFlow::exprNode(call.getAnArgument()))` so the EHR
data is traced into the call's argument, and also ensure the `encryptCall` check
verifies the encryption result reaches the same sink by requiring
`DataFlow::localFlow(DataFlow::exprNode(encryptCall.getAnArgument()),
DataFlow::exprNode(call.getAnArgument()))`; keep the existing predicates
(`isDataTransmissionCall(call)`, `isEHRData(data)`,
`encryptCall.getCalleeName().matches("%encrypt%")`) but add these explicit
data-flow constraints so only transmissions that actually receive unencrypted
EHR data are flagged.
| exists(DataFlow::CallNode call | | ||
| ( | ||
| call.getCalleeName().matches("%request%") or | ||
| call.getCalleeName().matches("%fetch%") or | ||
| call.getCalleeName().matches("%axios%") | ||
| ) and |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -nP --type=js --type=ts '\baxios\.(get|post|put|patch|delete|request)\s*\('Repository: daggerstuff/pixelated
Length of output: 745
🏁 Script executed:
cat -n .github/codeql/custom-queries/ehr-unsafe-access.ql | head -40Repository: daggerstuff/pixelated
Length of output: 1379
Sink matcher misses common axios.<method>() call patterns.
The pattern %axios% in call.getCalleeName().matches("%axios%") cannot detect axios member calls like axios.get(), axios.post(), etc., because getCalleeName() returns only the method name (e.g., "get"), not the full reference. These patterns are actively used in the codebase (e.g., axios.get() in BusinessIntelligenceService, axios.post() in ExternalThreatFeedIntegration), so the query will miss unsafe data flows through common axios usage, significantly reducing detection coverage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/ehr-unsafe-access.ql around lines 26 - 31, The
sink matcher currently only checks call.getCalleeName(), which misses member
calls like axios.get()/axios.post(); update the condition to also detect calls
whose callee is a PropertyAccess/MemberAccess whose qualifier is the identifier
"axios" (e.g. check call.getCallee() instanceof PropertyAccess/MemberAccess and
call.getCallee().getQualifier().getName() = "axios"), so DataFlow::CallNode
flows through both direct function names (request/fetch/axios-like names) and
member calls on the axios object (axios.get/axios.post/etc.).
| predicate hasInputSanitization(CallExpr call) { | ||
| exists(CallExpr sanitizeCall | | ||
| (sanitizeCall.getCalleeName().matches("%sanitize%") or | ||
| sanitizeCall.getCalleeName().matches("%escape%") or | ||
| sanitizeCall.getCalleeName().matches("%validate%")) and | ||
| sanitizeCall.getEnclosingFunction() = call.getEnclosingFunction() | ||
| ) | ||
| } | ||
|
|
||
| from CallExpr searchOp | ||
| where | ||
| isFHIRSearch(searchOp) and | ||
| not hasInputSanitization(searchOp) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -name "fhir-insecure-search.ql" -type fRepository: daggerstuff/pixelated
Length of output: 120
🏁 Script executed:
cd .github/codeql/custom-queries 2>/dev/null && cat -n fhir-insecure-search.ql || echo "File not found, checking alternative paths..."Repository: daggerstuff/pixelated
Length of output: 1400
🏁 Script executed:
find . -path "*codeql*" -name "*.ql" | head -20Repository: daggerstuff/pixelated
Length of output: 562
🏁 Script executed:
rg "DataFlow::" .github/codeql --type ql -A 2 -B 2Repository: daggerstuff/pixelated
Length of output: 91
🌐 Web query:
CodeQL DataFlow::localFlow JavaScript exprNode sanitization
💡 Result:
In CodeQL for JavaScript/TypeScript, the equivalents of DataFlow::localFlow(DataFlow::exprNode(a), DataFlow::exprNode(b)) are:
1) “exprNode” equivalent in JS
JS doesn’t use DataFlow::exprNode. Convert an Expr to a DataFlow::Node with either:
e.flow()(always works forExpr) (codeql.github.com)DataFlow::valueNode(e)(for expressions or certain declarations) (codeql.github.com)
2) “localFlow” equivalent in JS
JS local flow is typically expressed via single-step edges (getAPredecessor / getASuccessor) plus transitive closure (* / +). (codeql.github.com)
Example helper predicate (local flow within a function):
import javascript
import semmle.javascript.dataflow.DataFlow
predicate localFlowExpr(Expr src, Expr sink) {
sink.flow() = src.flow().getASuccessor*()
}This matches “0 or more” local steps; use + instead of * to require at least one step. (codeql.github.com)
3) Sanitization in JS data flow / taint tracking (what to do instead)
For taint tracking, “sanitizers” are modeled as barriers in a DataFlow::ConfigSig configuration.
- In the modern/shared libraries, the old
isSanitizerpredicate is migrated/renamed toisBarrier. (codeql.github.com) - Old-style
isSanitizerGuard(andisBarrierGuard) have been removed; useisBarrier(+DataFlow::MakeBarrierGuardif you were using guard-style blocking). (codeql.github.com)
Minimal example (barrier at a sanitizer call’s result):
import javascript
import semmle.javascript.dataflow.DataFlow
import semmle.javascript.dataflow.TaintTracking
module Cfg implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node n) { /* ... */ }
predicate isSink(DataFlow::Node n) { /* ... */ }
// Treat sanitize(x) as “cleansing” its output
predicate isBarrier(DataFlow::Node n) {
exists(DataFlow::CallNode c |
c.getCalleeName() = "sanitize" and
n = c
)
}
}
module Flow = TaintTracking::Global<Cfg>;(You can barrier the call result c as above, or barrier specific arguments depending on how your sanitizer works.) (codeql.github.com)
Citations:
- 1: https://codeql.github.com/docs/codeql-language-guides/data-flow-cheat-sheet-for-javascript/
- 2: https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-javascript-and-typescript/
- 3: https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-javascript-and-typescript/
- 4: https://codeql.github.com/docs/codeql-language-guides/analyzing-data-flow-in-javascript-and-typescript/
- 5: https://codeql.github.com/docs/codeql-language-guides/migrating-javascript-dataflow-queries/
- 6: https://codeql.github.com/docs/codeql-language-guides/migrating-javascript-dataflow-queries/
- 7: https://codeql.github.com/docs/codeql-language-guides/data-flow-cheat-sheet-for-javascript/
🏁 Script executed:
rg "flow()" .github/codeql/custom-queries/ -A 2 -B 2Repository: daggerstuff/pixelated
Length of output: 1191
🏁 Script executed:
rg "getASuccessor" .github/codeql/custom-queries/ -A 2 -B 2Repository: daggerstuff/pixelated
Length of output: 47
🏁 Script executed:
cat .github/codeql/custom-queries/fhir-insecure-ops.ql | head -50Repository: daggerstuff/pixelated
Length of output: 1178
Sanitization predicate should verify data flow to search arguments.
Lines 27-33 currently accept any sanitize/escape/validate call in the same function, regardless of whether it actually processes the searchOp arguments on lines 36-39. This allows unrelated sanitization to suppress legitimate insecure-search findings.
The proposed fix contains incorrect CodeQL syntax for JavaScript. In JavaScript/TypeScript CodeQL, DataFlow::exprNode() and DataFlow::localFlow() do not exist. Instead, use .flow() to convert expressions to DataFlow::Nodes and .getASuccessor*() for local data flow steps, or leverage taint-tracking with barrier patterns as shown in ehr-unsafe-access.ql.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/fhir-insecure-search.ql around lines 27 - 39,
The hasInputSanitization predicate is too coarse: it only checks for any
sanitize/escape/validate CallExpr in the same function rather than verifying the
sanitization actually flows into the FHIR search arguments; update
hasInputSanitization to perform a dataflow/taint check from the searchOp
argument expressions to the sanitizeCall (or vice versa) using JavaScript CodeQL
idioms (e.g., convert expressions to DataFlow::Node with .flow() or use
DataFlow::LocalFlow/getASuccessor* style steps or a taint-tracking barrier
pattern similar to ehr-unsafe-access.ql) so that only sanitizeCall instances
that are on the actual data path into isFHIRSearch(searchOp) are considered
sanitization for that call.
| predicate isFHIRClientInit(CallExpr call) { | ||
| exists(string name | | ||
| name = call.getCalleeName() and | ||
| ( | ||
| name.matches("%Client%") or | ||
| name.matches("%FHIRClient%") or | ||
| name.matches("%createClient%") | ||
| ) | ||
| ) | ||
| } | ||
|
|
||
| predicate hasVersionCheck(CallExpr call) { | ||
| exists(CallExpr versionCall | | ||
| (versionCall.getCalleeName().matches("%version%") or | ||
| versionCall.getCalleeName().matches("%compatibility%") or | ||
| versionCall.getCalleeName().matches("%checkVersion%")) and | ||
| versionCall.getEnclosingFunction() = call.getEnclosingFunction() | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find the file and check its complete contents
find . -name "fhir-missing-version.ql" -type fRepository: daggerstuff/pixelated
Length of output: 120
🏁 Script executed:
# Read the complete CodeQL query file
cat -n ./.github/codeql/custom-queries/fhir-missing-version.qlRepository: daggerstuff/pixelated
Length of output: 1437
Semantic linkage between client initialization and version checks is missing, reducing accuracy.
The isFHIRClientInit predicate (lines 16-23) uses broad patterns (%Client%, %FHIRClient%, %createClient%) that will match any function with "Client" in the name, including non-FHIR clients like HttpClient or DatabaseClient.
More critically, hasVersionCheck (lines 27-33) only verifies that some version-related call exists in the same enclosing function—it does not verify that the call is on the specific client object that was initialized. This creates two problems:
- False negatives: If two clients are initialized and only one has a version check, the unversioned client is not flagged (the other version call satisfies the predicate).
- False positives: Pattern-matched non-FHIR clients may be reported.
These predicates should verify that the version call is chained to (or otherwise operates on) the specific initialized client to ensure correctness consistent with the declared "high precision."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/fhir-missing-version.ql around lines 16 - 33,
isFHIRClientInit uses overly broad name patterns and hasVersionCheck only checks
for any version-related call in the same function, causing false
positives/negatives; update isFHIRClientInit to narrow matching to FHIR-specific
identifiers (e.g., names containing "FHIR" or "Fhir" or explicit factory names
like "createFhirClient"/"createFHIRClient") and update hasVersionCheck to
require the version-related call to operate on the same client instance: derive
the client target (the variable/receiver produced by the CallExpr in
isFHIRClientInit or its assigned variable) and require that versionCall’s
receiver/target (or the object passed into it) matches that same client symbol
or dataflow origin. Modify the predicates isFHIRClientInit and hasVersionCheck
to compare the client symbol/receiver rather than only checking
enclosingFunction scope so the version check is tied to the specific initialized
client.
| name.matches("%getResource%") or | ||
| name.matches("%searchResource%") or | ||
| name.matches("%createResource%") or | ||
| name.matches("%updateResource%") or | ||
| name.matches("%read%") or | ||
| name.matches("%vread%") or | ||
| name.matches("%search%") | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, verify the file exists and check its content
ls -la .github/codeql/custom-queries/fhir-unvalidated-access.qlRepository: daggerstuff/pixelated
Length of output: 170
🏁 Script executed:
# Read the file to understand the full context
cat -n .github/codeql/custom-queries/fhir-unvalidated-access.qlRepository: daggerstuff/pixelated
Length of output: 1623
🏁 Script executed:
# Also verify the specific lines mentioned
sed -n '1,35p' .github/codeql/custom-queries/fhir-unvalidated-access.ql | cat -nRepository: daggerstuff/pixelated
Length of output: 1253
Narrow callee matching to reduce false positives and align with declared precision level.
The substring patterns %read% and %search% are too broad and will match unrelated method names like thread, already_read, research, and search_config. This creates false positives that contradict the query's @precision high declaration. Switch to exact name matching for all patterns.
Suggested diff
predicate isFHIRResourceAccess(CallExpr call) {
exists(string name |
name = call.getCalleeName() and
(
- name.matches("%getResource%") or
- name.matches("%searchResource%") or
- name.matches("%createResource%") or
- name.matches("%updateResource%") or
- name.matches("%read%") or
- name.matches("%vread%") or
- name.matches("%search%")
+ name = "getResource" or
+ name = "searchResource" or
+ name = "createResource" or
+ name = "updateResource" or
+ name = "read" or
+ name = "vread" or
+ name = "search"
)
)
}Also review the hasValidation predicate, which uses similarly broad patterns (%validate%, %check%, %verify%).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| name.matches("%getResource%") or | |
| name.matches("%searchResource%") or | |
| name.matches("%createResource%") or | |
| name.matches("%updateResource%") or | |
| name.matches("%read%") or | |
| name.matches("%vread%") or | |
| name.matches("%search%") | |
| ) | |
| predicate isFHIRResourceAccess(CallExpr call) { | |
| exists(string name | | |
| name = call.getCalleeName() and | |
| ( | |
| name = "getResource" or | |
| name = "searchResource" or | |
| name = "createResource" or | |
| name = "updateResource" or | |
| name = "read" or | |
| name = "vread" or | |
| name = "search" | |
| ) | |
| ) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/fhir-unvalidated-access.ql around lines 20 -
27, The current callee name matching in the query (the name.matches(...) list
including "%read%" and "%search%") is too permissive and causes false positives;
update the predicate that builds the allowed callee names (the clauses
referencing name.matches("%getResource%") etc.) to use exact name equality
(e.g., name = "getResource", name = "searchResource", name = "createResource",
name = "updateResource", name = "read", name = "vread", name = "search") instead
of substring matches, and apply the same tightening to the hasValidation
predicate by replacing broad patterns like "%validate%", "%check%", "%verify%"
with exact name equality checks for the specific validation function names you
intend to detect so the query matches only precise callees and aligns with the
declared high precision.
| predicate hasValidation(CallExpr call) { | ||
| exists(CallExpr validateCall | | ||
| (validateCall.getCalleeName().matches("%validate%") or | ||
| validateCall.getCalleeName().matches("%check%") or | ||
| validateCall.getCalleeName().matches("%verify%")) and | ||
| validateCall.getEnclosingFunction() = call.getEnclosingFunction() | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n .github/codeql/custom-queries/fhir-unvalidated-access.ql | head -50Repository: daggerstuff/pixelated
Length of output: 1623
🏁 Script executed:
# Check the exact lines mentioned
sed -n '31,40p' .github/codeql/custom-queries/fhir-unvalidated-access.qlRepository: daggerstuff/pixelated
Length of output: 414
🏁 Script executed:
# Run the verification script from the review
rg -n 'predicate hasValidation|getEnclosingFunction|getLocation|getAnArgument' .github/codeql/custom-queries/fhir-unvalidated-access.qlRepository: daggerstuff/pixelated
Length of output: 181
🏁 Script executed:
# Search for how CodeQL typically handles location/ordering checks
rg -i 'getLocation|getStartLine|getAnArgument' .github/codeql/ -A 2 | head -40Repository: daggerstuff/pixelated
Length of output: 1178
🏁 Script executed:
# Check if there are other CodeQL predicates in the file that might show patterns
cat -n .github/codeql/custom-queries/fhir-unvalidated-access.qlRepository: daggerstuff/pixelated
Length of output: 1623
🌐 Web query:
CodeQL CallExpr getLocation getStartLine parameter checking
💡 Result:
In CodeQL, CallExpr (JavaScript/TypeScript) is a locatable AST node, so you can get its source position via getLocation(), and then line numbers via Location.getStartLine() / getEndLine() (1-based). [1], [2]
For parameter (argument) checking on calls, use the invocation APIs: getArgument(i) to fetch the ith argument, and getNumArgument() to constrain/guard by arity. [3], [4]
Example: filter calls by start line and check arguments
import javascript
from CallExpr call, int line, Expr arg0
where
line = call.getLocation().getStartLine() and
line >= 100 and line <= 140 and -- line-based filter
call.getNumArgument() >= 1 and -- arity guard
arg0 = call.getArgument(0) -- first argument
select call, line, arg0Example: find calls where an expected argument is missing
import javascript
from CallExpr call, int line
where
line = call.getLocation().getStartLine() and
call.getNumArgument() < 2
select call, line, "Call has fewer than 2 arguments."Sources: [1] https://codeql.github.com/docs/codeql-language-guides/working-with-source-locations/ [2] https://codeql.github.com/codeql-standard-libraries/javascript/semmle/javascript/Locations.qll/predicate.Locations%24Locatable%24getStartLine.0.html [3] https://codeql.github.com/docs/codeql-language-guides/codeql-library-for-javascript/ [4] https://codeql.github.com/codeql-standard-libraries/javascript/semmle/javascript/dataflow/Nodes.qll/predicate.Nodes%24InvokeNode%24getNumArgument.0.html
🌐 Web query:
CodeQL data flow analysis validation check pattern
💡 Result:
In CodeQL data-flow/taint-tracking, a “validation check” is typically modeled as a sanitizer (or a barrier) so that flows from sources → sinks are only reported when they do not pass the validation step. (codeql.github.com)
Standard pattern (taint tracking)
- Define a
TaintTracking::Configuration(or the newerDataFlow::ConfigSig-based setup, depending on language/library). - Implement:
isSource(node)– where untrusted data entersisSink(node)– where it becomes dangerousisSanitizer(node)and/orisBarrier(node)– your “validation” step(s) (codeql.github.com)
Minimal sketch:
import <language>.dataflow.TaintTracking
class Config extends TaintTracking::Configuration {
Config() { this = "MyConfig" }
override predicate isSource(DataFlow::Node src) { /* untrusted input */ }
override predicate isSink(DataFlow::Node sink) { /* dangerous use */ }
// Pattern A: validation as a sanitizer (stops taint)
override predicate isSanitizer(DataFlow::Node n) { /* validate()/sanitize() */ }
// Pattern B (optional): validation as a barrier (blocks flow through node)
override predicate isBarrier(DataFlow::Node n) { /* validation gate */ }
}
from Config c, DataFlow::Node src, DataFlow::Node sink
where c.hasFlowPath(src, sink)
select sink, src, "Unvalidated flow reaches sink."When the “validation” is a conditional check (guard)
If validation is done via a boolean check (for example if (isValid(x)) { use(x) }), many CodeQL libraries support sanitizer guards (a check that sanitizes on the “true” branch). Look for predicates like isSanitizerGuard(...) / guard-node modeling in the relevant language pack. (codeql.github.com)
Practical tips to “validate the analysis”
- If you’re unsure whether your sanitizer/barrier is being applied, debug with partial flow / data-flow query debugging to see where the path is (and isn’t) being cut. (codeql.github.com)
Citations:
- 1: https://codeql.github.com/docs/codeql-language-guides/using-flow-labels-for-precise-data-flow-analysis/?utm_source=openai
- 2: https://codeql.github.com/docs/codeql-language-guides/using-flow-labels-for-precise-data-flow-analysis/?utm_source=openai
- 3: https://codeql.github.com/codeql-standard-libraries/javascript/semmle/javascript/security/dataflow/TaintedFormatStringQuery.qll/type.TaintedFormatStringQuery%24Configuration.html?utm_source=openai
- 4: https://codeql.github.com/docs/writing-codeql-queries/debugging-data-flow-queries-using-partial-flow/?utm_source=openai
Validation logic has critical gaps—any validate/check/verify call in the same function suppresses findings, regardless of execution order or validated argument.
The hasValidation predicate currently accepts validation calls that:
- Occur after the resource access (no order guarantee)
- Validate different data than what is accessed (no argument correlation)
This allows false negatives where unvalidated FHIR access is missed if an unrelated validation call exists in the same function.
Suggested fix
predicate hasValidation(CallExpr call) {
exists(CallExpr validateCall |
(validateCall.getCalleeName().matches("%validate%") or
validateCall.getCalleeName().matches("%check%") or
validateCall.getCalleeName().matches("%verify%")) and
validateCall.getEnclosingFunction() = call.getEnclosingFunction() and
+ validateCall.getLocation().getStartLine() < call.getLocation().getStartLine() and
+ exists(Expr arg |
+ arg = call.getAnArgument() and
+ arg = validateCall.getAnArgument()
+ )
)
}The getLocation().getStartLine() and getAnArgument() methods are valid CodeQL APIs already used elsewhere in the codebase.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| predicate hasValidation(CallExpr call) { | |
| exists(CallExpr validateCall | | |
| (validateCall.getCalleeName().matches("%validate%") or | |
| validateCall.getCalleeName().matches("%check%") or | |
| validateCall.getCalleeName().matches("%verify%")) and | |
| validateCall.getEnclosingFunction() = call.getEnclosingFunction() | |
| ) | |
| predicate hasValidation(CallExpr call) { | |
| exists(CallExpr validateCall | | |
| (validateCall.getCalleeName().matches("%validate%") or | |
| validateCall.getCalleeName().matches("%check%") or | |
| validateCall.getCalleeName().matches("%verify%")) and | |
| validateCall.getEnclosingFunction() = call.getEnclosingFunction() and | |
| validateCall.getLocation().getStartLine() < call.getLocation().getStartLine() and | |
| exists(Expr arg | | |
| arg = call.getAnArgument() and | |
| arg = validateCall.getAnArgument() | |
| ) | |
| ) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/codeql/custom-queries/fhir-unvalidated-access.ql around lines 31 -
37, The hasValidation predicate is too coarse: it treats any
validate/check/verify CallExpr in the same function as sufficient; update it to
ensure the validateCall actually validates the same argument and occurs before
the access. Specifically, for the predicate hasValidation(CallExpr call) compare
source locations (e.g., call.getLocation().getStartLine() and
validateCall.getLocation().getStartLine()) to require validateCall is before
call, and correlate arguments by extracting the validated value (use
call.getAnArgument() and validateCall.getAnArgument() or compare the same
variable/expr) so only validations that target the same argument suppress the
finding.
| ## 2025-05-22 - [Broken Access Control and NoSQL Injection in API Routes] | ||
| **Vulnerability:** Placeholder authentication/authorization middleware was being used in several business-critical API routes (`documents.ts`), and unsanitized query parameters were being passed directly to MongoDB filters (`documents.ts`, `projects.ts`, `strategic-plans.ts`). |
There was a problem hiding this comment.
Fix markdown heading spacing (MD022).
Add a blank line after the heading to satisfy markdownlint.
🛠️ Suggested patch
## 2025-05-22 - [Broken Access Control and NoSQL Injection in API Routes]
+
**Vulnerability:** Placeholder authentication/authorization middleware was being used in several business-critical API routes (`documents.ts`), and unsanitized query parameters were being passed directly to MongoDB filters (`documents.ts`, `projects.ts`, `strategic-plans.ts`).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## 2025-05-22 - [Broken Access Control and NoSQL Injection in API Routes] | |
| **Vulnerability:** Placeholder authentication/authorization middleware was being used in several business-critical API routes (`documents.ts`), and unsanitized query parameters were being passed directly to MongoDB filters (`documents.ts`, `projects.ts`, `strategic-plans.ts`). | |
| ## 2025-05-22 - [Broken Access Control and NoSQL Injection in API Routes] | |
| **Vulnerability:** Placeholder authentication/authorization middleware was being used in several business-critical API routes (`documents.ts`), and unsanitized query parameters were being passed directly to MongoDB filters (`documents.ts`, `projects.ts`, `strategic-plans.ts`). |
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 1-1: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.jules/sentinel.md around lines 1 - 2, The Markdown heading "## 2025-05-22 -
[Broken Access Control and NoSQL Injection in API Routes]" in .jules/sentinel.md
needs a blank line after it to satisfy MD022; insert a single empty line
immediately following that heading so the next paragraph or list is separated
from the heading.
🚨 Severity: CRITICAL
💡 Vulnerability: Placeholder authorization middleware in
documents.tsallowed any user to create/edit documents. Unsanitized query parameters in several routes allowed for NoSQL Injection.🎯 Impact: Unauthorized data access and modification; potential data leakage or denial of service via NoSQL operators.
🔧 Fix: Re-enabled
requirePermissionsandrequireRolesmiddleware insrc/api/routes/documents.ts. Sanitizedstatus,type, andcategoryquery parameters usingensureStringor type checks indocuments.ts,projects.ts, andstrategic-plans.ts. Also corrected path aliases and setup file references inconfig/vitest.config.ts.✅ Verification: Ran unit tests for auth middleware (passing 52 tests) and verified code changes with
read_file.PR created automatically by Jules for task 3626314754728858959 started by @daggerstuff
Summary by Sourcery
Restore proper authorization and sanitize query parameters in critical API routes to address security vulnerabilities and align test configuration with the project structure.
Bug Fixes:
Enhancements:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores