Skip to content

fix(metadata): prevent framework misclassification in codebase detection#96

Merged
PatrickSys merged 2 commits intomasterfrom
fix/phase-14-metadata-misclassification
Apr 12, 2026
Merged

fix(metadata): prevent framework misclassification in codebase detection#96
PatrickSys merged 2 commits intomasterfrom
fix/phase-14-metadata-misclassification

Conversation

@PatrickSys
Copy link
Copy Markdown
Owner

What / Why

Fixes false-positive framework classification in get_codebase_metadata for generic Node repos.

What changed:

  • Added framework.indicators to FrameworkInfo as evidence signals
  • Next.js analyzer now builds indicator evidence (dep:*, disk:*) and only claims framework when dep:next is present
  • React analyzer now builds indicator evidence and only claims framework when dep:react is present
  • Angular analyzer now emits indicator evidence under existing guard
  • Indexer merge now uses selectFramework() with a minimum evidence threshold (>= 3 indicators) before preserving a framework claim
  • Added regression tests for no framework on plain Node repos, weak React indicators, and positive indicator assertions

Testing

  • ✅ 21/21 new Phase 14 tests passing
  • ✅ Full suite: 447/447 tests passing
  • ✅ Type check: clean (no errors)
  • ✅ Formatting: Prettier verified

Release notes

Include as a bug fix:

  • Prevents metadata framework misclassification on non-framework projects
  • Improves framework detection reliability via evidence-based thresholding

Fixes false-positive framework claims on generic Node repos by:
- Adding framework.indicators to FrameworkInfo (evidence signals per claim)
- Adding dep guards to Next.js and React analyzers (framework only if core dep present)
- Adding indicator collection to Angular analyzer (inside existing guard)
- Enforcing >=3 indicator threshold in indexer.mergeMetadata() before promoting claims

These changes prevent repos without react/nextjs/@angular/core dependencies from
being incorrectly flagged with a framework type. Framework claims now require
enumerated evidence signals that demonstrate genuine framework presence.

Tests: 21/21 passing (4 new negative tests, 2 indicator assertions, 4 merge E2E)
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 11, 2026

Greptile Summary

This PR introduces evidence-based framework detection: each analyzer now collects indicators (dep-based and disk-based signals) before claiming a framework, and the indexer's selectFramework() requires at least 3 indicators to preserve a claim.

  • P1 — plain JS React regression: The React analyzer only collects 4 dep-based signals. A project with react + react-dom but no @types/react or react-native reaches exactly 2 indicators — silently dropped by the threshold. This is the most common shape of a plain-JavaScript React app and was classified correctly before this PR.
  • P1 — Angular library regression: Angular library projects may have only @angular/core as a direct dep (no angular.json, no tsconfig.app.json), yielding 1–2 indicators and a silent false-negative; unlike React/Next.js, Angular already had a reliable single-dep guard before this change.

Confidence Score: 3/5

Not safe to merge as-is — two P1 regressions cause silent false-negatives for common project shapes (plain JS React apps and Angular library projects).

The Next.js path is solid (6 possible indicators, strong dep:next guard), but the global MIN_INDICATORS = 3 threshold is not well-calibrated for React (only 4 dep signals, disk signals missing) or Angular library projects. Both represent current real defects on the changed path that didn't exist before.

src/analyzers/react/index.ts (missing disk indicators to reliably reach threshold), src/analyzers/angular/index.ts (Angular library projects under-served by current indicator set), tests/indexer-metadata.test.ts (missing react+react-dom 2-indicator test case)

Important Files Changed

Filename Overview
src/analyzers/react/index.ts Adds indicator collection and guard before setting framework, but plain JS React apps (react + react-dom, no @types/react) get only 2 indicators — silently dropped by selectFramework's threshold of 3.
src/analyzers/angular/index.ts Adds indicator evidence under the existing angularCoreVersion guard, but Angular library projects (only @angular/core, no angular.json) can end up with fewer than 3 indicators and have their framework claim silently dropped.
src/analyzers/nextjs/index.ts Wraps framework assignment behind dep:next guard and collects up to 6 indicators (3 dep + 3 disk); a Next.js project will reliably hit the threshold of 3.
src/core/indexer.ts Replaces simple OR-precedence merge with selectFramework() using a MIN_INDICATORS = 3 threshold; logic is correct but drops valid claims silently with no debug logging.
src/types/index.ts Adds generic indicators?: readonly string[] to FrameworkInfo — framework-agnostic field name, no rule violations.
tests/indexer-metadata.test.ts Good new regression tests, but missing coverage for react + react-dom (2-indicator case) which is the most common plain JS React shape and currently silently broken.
tests/nextjs-analyzer.test.ts Adds two well-scoped tests: absence of dep:next suppresses framework, and positive indicator assertions for a full Next.js setup.
tests/react-analyzer.test.ts Adds absence test and positive indicator assertion; the 2-indicator JS React gap is in the integration test, not here.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Analyzer runs detectCodebaseMetadata] --> B{Primary dep present?\ne.g. dep:react / dep:next / angularCoreVersion}
    B -- No --> C[framework = undefined]
    B -- Yes --> D[Collect indicators\ndep:* + disk:*]
    D --> E[Set metadata.framework\nwith indicators array]
    E --> F[mergeMetadata called in indexer]
    F --> G[selectFramework\nbase vs incoming]
    G --> H{indicators.length >= MIN_INDICATORS 3?}
    H -- Neither passes --> I[return undefined\n⚠️ silent drop, no log]
    H -- Only base passes --> J[return base]
    H -- Only incoming passes --> K[return incoming]
    H -- Both pass --> L[return base\nhigher-priority wins]
Loading

Reviews (1): Last reviewed commit: "fix(metadata): prevent framework misclas..." | Re-trigger Greptile

Comment on lines +230 to +234
const indicators: string[] = [];
if (packageInfo.allDependencies.react) indicators.push('dep:react');
if (packageInfo.allDependencies['react-dom']) indicators.push('dep:react-dom');
if (packageInfo.allDependencies['@types/react']) indicators.push('dep:@types/react');
if (packageInfo.allDependencies['react-native']) indicators.push('dep:react-native');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Plain JS React apps silently lose framework detection

The React analyzer collects only four dep-based indicators (dep:react, dep:react-dom, dep:@types/react, dep:react-native). A vanilla JavaScript React app that has react + react-dom but no TypeScript (@types/react) and no React Native ends up with exactly 2 indicators — one below the MIN_INDICATORS = 3 threshold in selectFramework. The framework claim is silently dropped in the merge step, even though react + react-dom together are a very strong signal of a real React project.

Before this PR the old base.framework || incoming.framework path would have kept the claim, so this is a regression for a common project shape. The test suite only covers the 1-indicator case (react alone) and the 3-indicator case (react + react-dom + @types/react); the 2-indicator case is untested and broken.

Possible fixes:

  1. Add at least one disk-based indicator (e.g. presence of src/App.jsx, src/App.tsx, or public/index.html) so a browser React app can always reach 3 signals.
  2. Lower the threshold for the React analyzer to 2, using a per-framework constant instead of the shared MIN_INDICATORS.
  3. Treat dep:react + dep:react-dom together as meeting the bar (the combination is browser-specific and rarely transitive-only).

Comment on lines +937 to +952
const indicators: string[] = ['dep:@angular/core'];
if (allDeps['@angular/common']) indicators.push('dep:@angular/common');
if (allDeps['@angular/compiler-cli']) indicators.push('dep:@angular/compiler-cli');
if (allDeps['@angular/cli']) indicators.push('dep:@angular/cli');
try {
await fs.stat(path.join(rootPath, 'angular.json'));
indicators.push('disk:angular-json');
} catch {
/* absent */
}
try {
await fs.stat(path.join(rootPath, 'tsconfig.app.json'));
indicators.push('disk:tsconfig-app');
} catch {
/* absent */
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Angular library projects may silently lose framework detection

dep:@angular/core is the only guaranteed indicator in this block — the rest are conditional. An Angular library project that lists @angular/core in peerDependencies (no @angular/common, and no CLI-generated angular.json or tsconfig.app.json) ends up with just 1 indicator, well below MIN_INDICATORS = 3. Unlike the old base.framework || incoming.framework path, selectFramework will silently return undefined.

Before this PR, angularCoreVersion alone was sufficient to classify a project as Angular (which is a much tighter guard than just dep:react). The same threshold now uniformly penalises Angular, which had a more reliable single-dep signal to begin with.

Suggested mitigations:

  • Add disk checks for ng-package.json (Angular library projects) and tsconfig.json so library projects get extra indicators.
  • Or use a per-framework threshold so Angular (which already has a strong one-dep guard) can pass at 2 indicators.

Comment on lines +1278 to +1287
): FrameworkInfo | undefined {
const MIN_INDICATORS = 3;
const passes = (f: FrameworkInfo | undefined): f is FrameworkInfo =>
!!f && (f.indicators?.length ?? 0) >= MIN_INDICATORS;

if (passes(base) && passes(incoming)) return base; // higher-priority analyzer wins
if (passes(base)) return base;
if (passes(incoming)) return incoming;
return undefined;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Silent framework drop is hard to diagnose

When selectFramework returns undefined because neither candidate crosses the threshold, there is no log message. If a downstream caller gets metadata.framework === undefined on a project they know is Angular or React, there is no way to see which analyzer ran, how many indicators were collected, or why the claim was rejected.

A single console.debug emitting something like "[selectFramework] dropped framework claim — only N indicators (threshold: MIN_INDICATORS)" would make future misclassification reports trivially diagnosable.

Suggested change
): FrameworkInfo | undefined {
const MIN_INDICATORS = 3;
const passes = (f: FrameworkInfo | undefined): f is FrameworkInfo =>
!!f && (f.indicators?.length ?? 0) >= MIN_INDICATORS;
if (passes(base) && passes(incoming)) return base; // higher-priority analyzer wins
if (passes(base)) return base;
if (passes(incoming)) return incoming;
return undefined;
}
private selectFramework(
base: FrameworkInfo | undefined,
incoming: FrameworkInfo | undefined
): FrameworkInfo | undefined {
const MIN_INDICATORS = 3;
const passes = (f: FrameworkInfo | undefined): f is FrameworkInfo =>
!!f && (f.indicators?.length ?? 0) >= MIN_INDICATORS;
if (passes(base) && passes(incoming)) return base; // higher-priority analyzer wins
if (passes(base)) return base;
if (passes(incoming)) return incoming;
const dropped = base ?? incoming;
if (dropped) {
console.debug(
`[selectFramework] dropped ${dropped.name} claim — ${dropped.indicators?.length ?? 0} indicators (threshold: ${MIN_INDICATORS})`
);
}
return undefined;
}

Comment on lines +155 to +167
const metadata = await indexer.detectMetadata();

expect(metadata.framework).toBeUndefined();
});

it('drops React claim when indicators are below threshold', async () => {
// react alone is only 1 indicator — should not meet the >=3 threshold
await fs.writeFile(
path.join(tempDir, 'package.json'),
JSON.stringify({ name: 'thin-react', dependencies: { react: '^18' } })
);

const indexer = new CodebaseIndexer({ rootPath: tempDir });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Missing test coverage for the 2-indicator React case

The comment on this test says "react alone is only 1 indicator — should not meet the >=3 threshold." The symmetric case — react + react-dom (2 indicators, no @types/react) — is not tested and is the common shape of a plain JavaScript React app. Adding a case like the one below would surface the regression described in react/index.ts:

it('drops React claim for react + react-dom without @types/react (2 indicators)', async () => {
    await fs.writeFile(
        path.join(tempDir, 'package.json'),
        JSON.stringify({ name: 'js-react', dependencies: { react: '^18', 'react-dom': '^18' } })
    );
    const indexer = new CodebaseIndexer({ rootPath: tempDir });
    const metadata = await indexer.detectMetadata();
    // Decide: should this be undefined (current) or 'react' (intended)?
    expect(metadata.framework?.type).toBe('react'); // currently fails
});

This commit fixes two critical P1 regressions in framework detection:

**React regression**: Plain JS React apps (CRA/Vite without TypeScript) produce only
2 indicators (dep:react, dep:react-dom) below the MIN_INDICATORS=3 threshold. Added two
disk-based checks (disk:src-directory, disk:public-index-html) to bring them to 3.

**Angular regression**: Angular library projects declare @angular/core in peerDependencies
only (not in dependencies/devDependencies). Added peerDependencies to the allDeps merge,
and added disk:ng-package-json indicator to identify library projects.

**Debug logging**: selectFramework() now logs when framework claims are dropped below
threshold, aiding production diagnosis.

**Tests added**:
- Plain JS React: react + react-dom + src directory → framework detected
- Angular library: @angular/core in peerDependencies + ng-package.json → framework detected

All 449 tests passing (2 new). TypeScript: clean.
@PatrickSys PatrickSys merged commit ae649dd into master Apr 12, 2026
2 of 3 checks passed
@PatrickSys PatrickSys deleted the fix/phase-14-metadata-misclassification branch April 12, 2026 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant