Summary
ReferenceExtractor.IgnoredCallNames is a single HashSet<string> that bundles together keywords from many languages — Makefile (all, clean, install, build, run, help), Gradle/Groovy (apply, task, dependencies, repositories, version, description, group, ext, …), Haskell (putStrLn, putStr, print, Just, Nothing, Left, Right, True, False), R (library, cat, paste, paste0, sprintf, stop, warning, message, tryCatch, …), Terraform (resource, data, variable, module, provider, …), Shell (echo, cd, set, unset, export, source, eval, exec, test, read, …), and "other languages" (require, import, include, raise, lambda).
The set is consulted once per call match regardless of the file's language (ReferenceExtractor.cs:158), so e.g. run() / build() / install() / clean() / help() / apply() / task() / require() / print() / library() / cat() / cd() / read() / test() / source() calls in Python / JavaScript / TypeScript / C# / Go / Rust / Java / Kotlin / Ruby / C / C++ / PHP / Swift / Dart / Scala / Elixir / F# code are silently dropped from symbol_references.
This makes references, callers, callees, and impact return wrong / missing results for any function that happens to share a name with a keyword from an unrelated language.
Repro
mkdir -p /tmp/dogfood/keyword_test && cd /tmp/dogfood/keyword_test
cat > a.py <<'EOF'
def caller():
    run()
    build()
    install()
    clean()
    help()
    print()
    require()
    notexcluded()
    apply()
    task()
EOF
/root/.local/bin/cdidx index .
sqlite3 .cdidx/codeindex.db \
"SELECT symbol_name, line, reference_kind, container_name FROM symbol_references ORDER BY line;"
Observed:
notexcluded|9|call|caller
Only 1 of 10 calls indexed. The other 9 are silently dropped because run, build, install, clean, help, print, require, apply, task all live in the global IgnoredCallNames set.
Downstream effect — none of these work as users would expect:
/root/.local/bin/cdidx callers run --exact --json
/root/.local/bin/cdidx callees caller --exact --json
/root/.local/bin/cdidx references print --exact --json
/root/.local/bin/cdidx impact apply --json
All four return zero / wrong results even though the calls obviously exist in indexed Python source.
Suspected root cause (from reading the source)
src/CodeIndex/Indexer/ReferenceExtractor.cs:
- Lines 22–69: IgnoredCallNames is a single HashSet<string> that mixes Makefile, Gradle, Terraform, R, PowerShell, Haskell, Shell, F#, Java, C#, and "Other languages" entries.
- Lines 155–164 (the CallRegex loop): every match goes through one untyped IgnoredCallNames.Contains(name) check, with no language is "..." guard.
So the per-language comments above each block in the set are aspirational only — at runtime there is no per-language scoping. A Makefile target name and a Python builtin and a Haskell prelude name are all blocked uniformly across all 31 graph-supported languages.
For comparison, EventSubscriptionRegex at line 149 is properly gated by if (language is "csharp"). The keyword exclusion needs the same treatment.
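To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is not the real ReferenceExtractor: the regex is a simplified stand-in for CallRegex, the set is trimmed to a handful of entries, and ExtractCalls and its signature are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical, trimmed-down model of the reported behavior (not the project's code).
var ignoredCallNames = new HashSet<string>(StringComparer.Ordinal)
{
    "if", "for", "while",        // genuinely cross-language keywords
    "run", "build", "install",   // Makefile targets
    "print", "putStrLn",         // Haskell prelude names
    "require",                   // "other languages"
};

// Simplified stand-in for CallRegex: an identifier followed by "(".
var callRegex = new Regex(@"\b([A-Za-z_]\w*)\s*\(");

List<string> ExtractCalls(string source, string language)
{
    var calls = new List<string>();
    foreach (Match m in callRegex.Matches(source))
    {
        var name = m.Groups[1].Value;
        // `language` is never consulted here, which is the problem being reported:
        // a Makefile or Haskell entry suppresses the call in every language.
        if (ignoredCallNames.Contains(name)) continue;
        calls.Add(name);
    }
    return calls;
}

var pythonBody = "run()\nprint()\nnotexcluded()\n";
Console.WriteLine(string.Join(",", ExtractCalls(pythonBody, "python")));
// prints: notexcluded
```

With a filter shaped like this, passing "python" or "makefile" as the language gives the same result, which matches the single surviving row in the repro.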
Suggested fix
Replace the single IgnoredCallNames HashSet with a small lookup keyed by language (or a base set of truly universal control-flow / declaration keywords plus per-language overlays merged at filter time). At minimum:
- all, clean, install, build, run, help should only apply to language is "makefile".
- apply, plugins, dependencies, repositories, allprojects, subprojects, task, buildscript, ext, group, version, description should only apply to gradle.
- resource, data, variable, output, locals, module, provider, terraform, required_providers, backend should only apply to terraform.
- library, cat, paste, paste0, sprintf, stop, warning, message, invisible, tryCatch, withCallingHandlers, next, break, repeat should only apply to r.
- putStrLn, putStr, print, Just, Nothing, Left, Right, True, False, data, newtype, instance, deriving, infixl, infixr, infix, qualified, hiding, forall should only apply to haskell.
- Shell built-ins (echo, exit, cd, set, unset, export, source, eval, exec, test, read, shift, trap, local, declare, readonly) should only apply to shell. Currently shell is not even in SupportedLanguages (lines 12–20), so this whole block is dead weight that only ever blocks calls in the other 31 languages it was never meant for.
- "Other languages" (require, import, include, raise, lambda): require blocks legitimate Ruby require calls, Node require() calls, and any user function named require; print (in the Haskell list above) blocks Python's print everywhere. These need careful per-language scoping.
- Genuinely cross-language keywords (if, else, for, while, switch, catch, lock, do, try, when, sizeof, typeof, return, throw, nameof, await, using, new, class, struct, record, interface, enum, delegate, event, namespace, def, function, func) can stay in a shared base set.
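One possible shape for the "base set plus per-language overlays" option, as a rough sketch only. The names baseIgnored, ignoredByLanguage, and IsIgnoredCall are hypothetical, the entries are abbreviated from the lists above, and the ordinal comparer is an assumption (the comparer used by the current set is not confirmed here).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the suggested layout; entries abbreviated from the lists above.
var baseIgnored = new HashSet<string>(StringComparer.Ordinal)
{
    "if", "else", "for", "while", "switch", "catch", "return", "throw", "new",
};

// Overlays are keyed by the same language identifiers used in `language is "..."` checks.
var ignoredByLanguage = new Dictionary<string, HashSet<string>>(StringComparer.Ordinal)
{
    ["makefile"]  = new(StringComparer.Ordinal) { "all", "clean", "install", "build", "run", "help" },
    ["gradle"]    = new(StringComparer.Ordinal) { "apply", "task", "dependencies", "repositories" },
    ["haskell"]   = new(StringComparer.Ordinal) { "putStrLn", "putStr", "print", "Just", "Nothing" },
    ["r"]         = new(StringComparer.Ordinal) { "library", "cat", "paste", "stop", "warning" },
    ["terraform"] = new(StringComparer.Ordinal) { "resource", "data", "variable", "module", "provider" },
};

// A name is ignored if it is a universal keyword, or if the overlay for the
// file's own language lists it. Languages without an overlay use only the base set.
bool IsIgnoredCall(string name, string language) =>
    baseIgnored.Contains(name)
    || (ignoredByLanguage.TryGetValue(language, out var overlay) && overlay.Contains(name));

Console.WriteLine(IsIgnoredCall("run", "python"));    // False
Console.WriteLine(IsIgnoredCall("run", "makefile"));  // True
Console.WriteLine(IsIgnoredCall("if", "python"));     // True
Console.WriteLine(IsIgnoredCall("print", "haskell")); // True
```

Checking the two sets at lookup time avoids building a merged set per language; precomputing one merged set per language at startup would work equally well if the lookup turns out to be hot.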
Tests should cover at least one positive case per language family — e.g. Python print() must produce a reference row, JS require('x') must produce one, Ruby require 'x' must produce one, C# task.Run() must produce a Run call ref, etc.
Environment
- cdidx v1.10.0 (installed via official install.sh to /root/.local/bin/cdidx)
- Linux, .NET 8
https://claude.ai/code/session_01Bi6Vn3v37ViFbJroJkpUe3