fix(metadata): prevent framework misclassification in codebase detection#96
Conversation
Fixes false-positive framework claims on generic Node repos by: - Adding framework.indicators to FrameworkInfo (evidence signals per claim) - Adding dep guards to Next.js and React analyzers (framework only if core dep present) - Adding indicator collection to Angular analyzer (inside existing guard) - Enforcing >=3 indicator threshold in indexer.mergeMetadata() before promoting claims These changes prevent repos without react/nextjs/@angular/core dependencies from being incorrectly flagged with a framework type. Framework claims now require enumerated evidence signals that demonstrate genuine framework presence. Tests: 21/21 passing (4 new negative tests, 2 indicator assertions, 4 merge E2E)
Greptile SummaryThis PR introduces evidence-based framework detection: each analyzer now collects
Confidence Score: 3/5Not safe to merge as-is — two P1 regressions cause silent false-negatives for common project shapes (plain JS React apps and Angular library projects). The Next.js path is solid (6 possible indicators, strong dep:next guard), but the global MIN_INDICATORS = 3 threshold is not well-calibrated for React (only 4 dep signals, disk signals missing) or Angular library projects. Both represent current real defects on the changed path that didn't exist before. src/analyzers/react/index.ts (missing disk indicators to reliably reach threshold), src/analyzers/angular/index.ts (Angular library projects under-served by current indicator set), tests/indexer-metadata.test.ts (missing react+react-dom 2-indicator test case)
|
| Filename | Overview |
|---|---|
| src/analyzers/react/index.ts | Adds indicator collection and guard before setting framework, but plain JS React apps (react + react-dom, no @types/react) get only 2 indicators — silently dropped by selectFramework's threshold of 3. |
| src/analyzers/angular/index.ts | Adds indicator evidence under the existing angularCoreVersion guard, but Angular library projects (only @angular/core, no angular.json) can end up with fewer than 3 indicators and have their framework claim silently dropped. |
| src/analyzers/nextjs/index.ts | Wraps framework assignment behind dep:next guard and collects up to 6 indicators (3 dep + 3 disk); a Next.js project will reliably hit the threshold of 3. |
| src/core/indexer.ts | Replaces simple OR-precedence merge with selectFramework() using a MIN_INDICATORS = 3 threshold; logic is correct but drops valid claims silently with no debug logging. |
| src/types/index.ts | Adds generic indicators?: readonly string[] to FrameworkInfo — framework-agnostic field name, no rule violations. |
| tests/indexer-metadata.test.ts | Good new regression tests, but missing coverage for react + react-dom (2-indicator case) which is the most common plain JS React shape and currently silently broken. |
| tests/nextjs-analyzer.test.ts | Adds two well-scoped tests: absence of dep:next suppresses framework, and positive indicator assertions for a full Next.js setup. |
| tests/react-analyzer.test.ts | Adds absence test and positive indicator assertion; the 2-indicator JS React gap is in the integration test, not here. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Analyzer runs detectCodebaseMetadata] --> B{Primary dep present?\ne.g. dep:react / dep:next / angularCoreVersion}
B -- No --> C[framework = undefined]
B -- Yes --> D[Collect indicators\ndep:* + disk:*]
D --> E[Set metadata.framework\nwith indicators array]
E --> F[mergeMetadata called in indexer]
F --> G[selectFramework\nbase vs incoming]
G --> H{indicators.length >= MIN_INDICATORS 3?}
H -- Neither passes --> I[return undefined\n⚠️ silent drop, no log]
H -- Only base passes --> J[return base]
H -- Only incoming passes --> K[return incoming]
H -- Both pass --> L[return base\nhigher-priority wins]
Reviews (1): Last reviewed commit: "fix(metadata): prevent framework misclas..." | Re-trigger Greptile
| const indicators: string[] = []; | ||
| if (packageInfo.allDependencies.react) indicators.push('dep:react'); | ||
| if (packageInfo.allDependencies['react-dom']) indicators.push('dep:react-dom'); | ||
| if (packageInfo.allDependencies['@types/react']) indicators.push('dep:@types/react'); | ||
| if (packageInfo.allDependencies['react-native']) indicators.push('dep:react-native'); |
There was a problem hiding this comment.
Plain JS React apps silently lose framework detection
The React analyzer collects only four dep-based indicators (dep:react, dep:react-dom, dep:@types/react, dep:react-native). A vanilla JavaScript React app that has react + react-dom but no TypeScript (@types/react) and no React Native ends up with exactly 2 indicators — one below the MIN_INDICATORS = 3 threshold in selectFramework. The framework claim is silently dropped in the merge step, even though react + react-dom together are a very strong signal of a real React project.
Before this PR the old base.framework || incoming.framework path would have kept the claim, so this is a regression for a common project shape. The test suite only covers the 1-indicator case (react alone) and the 3-indicator case (react + react-dom + @types/react); the 2-indicator case is untested and broken.
Possible fixes:
- Add at least one disk-based indicator (e.g. presence of
src/App.jsx,src/App.tsx, orpublic/index.html) so a browser React app can always reach 3 signals. - Lower the threshold for the React analyzer to 2, using a per-framework constant instead of the shared
MIN_INDICATORS. - Treat
dep:react+dep:react-domtogether as meeting the bar (the combination is browser-specific and rarely transitive-only).
| const indicators: string[] = ['dep:@angular/core']; | ||
| if (allDeps['@angular/common']) indicators.push('dep:@angular/common'); | ||
| if (allDeps['@angular/compiler-cli']) indicators.push('dep:@angular/compiler-cli'); | ||
| if (allDeps['@angular/cli']) indicators.push('dep:@angular/cli'); | ||
| try { | ||
| await fs.stat(path.join(rootPath, 'angular.json')); | ||
| indicators.push('disk:angular-json'); | ||
| } catch { | ||
| /* absent */ | ||
| } | ||
| try { | ||
| await fs.stat(path.join(rootPath, 'tsconfig.app.json')); | ||
| indicators.push('disk:tsconfig-app'); | ||
| } catch { | ||
| /* absent */ | ||
| } |
There was a problem hiding this comment.
Angular library projects may silently lose framework detection
dep:@angular/core is the only guaranteed indicator in this block — the rest are conditional. An Angular library project that lists @angular/core in peerDependencies (no @angular/common, and no CLI-generated angular.json or tsconfig.app.json) ends up with just 1 indicator, well below MIN_INDICATORS = 3. Unlike the old base.framework || incoming.framework path, selectFramework will silently return undefined.
Before this PR, angularCoreVersion alone was sufficient to classify a project as Angular (which is a much tighter guard than just dep:react). The same threshold now uniformly penalises Angular, which had a more reliable single-dep signal to begin with.
Suggested mitigations:
- Add disk checks for
ng-package.json(Angular library projects) andtsconfig.jsonso library projects get extra indicators. - Or use a per-framework threshold so Angular (which already has a strong one-dep guard) can pass at 2 indicators.
| ): FrameworkInfo | undefined { | ||
| const MIN_INDICATORS = 3; | ||
| const passes = (f: FrameworkInfo | undefined): f is FrameworkInfo => | ||
| !!f && (f.indicators?.length ?? 0) >= MIN_INDICATORS; | ||
|
|
||
| if (passes(base) && passes(incoming)) return base; // higher-priority analyzer wins | ||
| if (passes(base)) return base; | ||
| if (passes(incoming)) return incoming; | ||
| return undefined; | ||
| } |
There was a problem hiding this comment.
Silent framework drop is hard to diagnose
When selectFramework returns undefined because neither candidate crosses the threshold, there is no log message. If a downstream caller gets metadata.framework === undefined on a project they know is Angular or React, there is no way to see which analyzer ran, how many indicators were collected, or why the claim was rejected.
A single console.debug emitting something like "[selectFramework] dropped framework claim — only N indicators (threshold: MIN_INDICATORS)" would make future misclassification reports trivially diagnosable.
| ): FrameworkInfo | undefined { | |
| const MIN_INDICATORS = 3; | |
| const passes = (f: FrameworkInfo | undefined): f is FrameworkInfo => | |
| !!f && (f.indicators?.length ?? 0) >= MIN_INDICATORS; | |
| if (passes(base) && passes(incoming)) return base; // higher-priority analyzer wins | |
| if (passes(base)) return base; | |
| if (passes(incoming)) return incoming; | |
| return undefined; | |
| } | |
| private selectFramework( | |
| base: FrameworkInfo | undefined, | |
| incoming: FrameworkInfo | undefined | |
| ): FrameworkInfo | undefined { | |
| const MIN_INDICATORS = 3; | |
| const passes = (f: FrameworkInfo | undefined): f is FrameworkInfo => | |
| !!f && (f.indicators?.length ?? 0) >= MIN_INDICATORS; | |
| if (passes(base) && passes(incoming)) return base; // higher-priority analyzer wins | |
| if (passes(base)) return base; | |
| if (passes(incoming)) return incoming; | |
| const dropped = base ?? incoming; | |
| if (dropped) { | |
| console.debug( | |
| `[selectFramework] dropped ${dropped.name} claim — ${dropped.indicators?.length ?? 0} indicators (threshold: ${MIN_INDICATORS})` | |
| ); | |
| } | |
| return undefined; | |
| } |
| const metadata = await indexer.detectMetadata(); | ||
|
|
||
| expect(metadata.framework).toBeUndefined(); | ||
| }); | ||
|
|
||
| it('drops React claim when indicators are below threshold', async () => { | ||
| // react alone is only 1 indicator — should not meet the >=3 threshold | ||
| await fs.writeFile( | ||
| path.join(tempDir, 'package.json'), | ||
| JSON.stringify({ name: 'thin-react', dependencies: { react: '^18' } }) | ||
| ); | ||
|
|
||
| const indexer = new CodebaseIndexer({ rootPath: tempDir }); |
There was a problem hiding this comment.
Missing test coverage for the 2-indicator React case
The comment on this test says "react alone is only 1 indicator — should not meet the >=3 threshold." The symmetric case — react + react-dom (2 indicators, no @types/react) — is not tested and is the common shape of a plain JavaScript React app. Adding a case like the one below would surface the regression described in react/index.ts:
it('drops React claim for react + react-dom without @types/react (2 indicators)', async () => {
await fs.writeFile(
path.join(tempDir, 'package.json'),
JSON.stringify({ name: 'js-react', dependencies: { react: '^18', 'react-dom': '^18' } })
);
const indexer = new CodebaseIndexer({ rootPath: tempDir });
const metadata = await indexer.detectMetadata();
// Decide: should this be undefined (current) or 'react' (intended)?
expect(metadata.framework?.type).toBe('react'); // currently fails
});This commit fixes two critical P1 regressions in framework detection: **React regression**: Plain JS React apps (CRA/Vite without TypeScript) produce only 2 indicators (dep:react, dep:react-dom) below the MIN_INDICATORS=3 threshold. Added two disk-based checks (disk:src-directory, disk:public-index-html) to bring them to 3. **Angular regression**: Angular library projects declare @angular/core in peerDependencies only (not in dependencies/devDependencies). Added peerDependencies to the allDeps merge, and added disk:ng-package-json indicator to identify library projects. **Debug logging**: selectFramework() now logs when framework claims are dropped below threshold, aiding production diagnosis. **Tests added**: - Plain JS React: react + react-dom + src directory → framework detected - Angular library: @angular/core in peerDependencies + ng-package.json → framework detected All 449 tests passing (2 new). TypeScript: clean.
What / Why
Fixes false-positive framework classification in
get_codebase_metadatafor generic Node repos.What changed:
framework.indicatorstoFrameworkInfoas evidence signalsdep:*,disk:*) and only claims framework whendep:nextis presentdep:reactis presentselectFramework()with a minimum evidence threshold (>= 3 indicators) before preserving a framework claimTesting
Release notes
Include as a bug fix: