Android XML resource files inflate index: 26k XML files (97% of files, 0% of symbols) #1047

@jcrabapple

Android XML resource files inflate index with zero symbol extraction (26k XML files → 97% of files, 0% of symbols)

On Android projects, XML resource files (layouts, drawables, values, menus) dominate the index by file count but contribute zero symbols, inflating DB size and indexing time with no code intelligence value.

Reproduction

# Any standard Android project with res/ directory
codegraph init .
codegraph status

Actual output

Files:     27,095
Nodes:     46,575
DB Size:   77.93 MB

Nodes by Kind:
  file            27,089      ← 58% of all nodes are plain file records
  import          7,955
  method          5,388
  field           3,320
  class           1,261
  ...

Files by Language:
  xml             26,453      ← 97.6% of all files
  java            636
  properties      3
  yaml            3

The problem

26,453 XML files (Android resources under res/) are indexed as file nodes with zero symbol extraction. Tree-sitter's XML grammar parses each file, but no functions, classes, methods, or fields are extracted: there is nothing useful to extract from <TextView android:layout_width="match_parent" />.

These files:

  • Inflate DB size (78 MB for a 636-Java-file project)
  • Inflate indexing time (2m 39s — much of it parsing XML that yields nothing)
  • Inflate the file node count (27,089 file nodes vs ~19,486 actual symbol nodes)
  • Dilute codegraph_explore result quality ("Found 260 symbols across 124 files" — the file count includes XML matches)
  • Provide zero code intelligence value
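
The shares quoted in this report can be re-derived from the `codegraph status` figures. The sketch below only redoes that arithmetic; the variable names are illustrative and nothing here calls CodeGraph itself.

```python
# Figures copied from the `codegraph status` output in this report.
total_files = 27_095
xml_files = 26_453
total_nodes = 46_575
file_nodes = 27_089

xml_share_of_files = xml_files / total_files     # fraction of indexed files that are XML
file_share_of_nodes = file_nodes / total_nodes   # fraction of nodes that are plain file records
symbol_nodes = total_nodes - file_nodes          # every node that is not a file record

print(f"XML share of files:  {xml_share_of_files:.1%}")
print(f"File share of nodes: {file_share_of_nodes:.1%}")
print(f"Symbol nodes:        {symbol_nodes:,}")
```

XML accounts for about 97.6% of files, while file records account for about 58% of all nodes; the remaining 19,486 nodes are the actual symbols.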

Expected behavior

XML resource files should either:

  1. Be excluded by default — Android res/ directories (and similar resource-only dirs in other ecosystems) contain no executable code symbols. A default exclusion pattern for **/res/**/*.xml (and potentially **/build/**/*.xml) would be appropriate.
  2. Be excluded from symbol count and explore results — if indexed for file-tracking purposes, they shouldn't participate in explore/search result sets since they contribute no symbols.
  3. Be configurable via .codegraphignore — I see that #699 (Feature Request: Support .ignore files to override .gitignore exclusions) and #910 ([Feature Request] Add local-only mode for .codegraph/.gitignore) are open. Either of those would let users exclude res/, but a sensible default for Android projects would help out of the box.
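
To make option 1 concrete, here is a minimal sketch of what a default exclusion could look like. It is not CodeGraph's implementation: `is_default_excluded`, `RESOURCE_DIRS`, and the sample paths are hypothetical, and the check approximates the suggested patterns `**/res/**/*.xml` and `**/build/**/*.xml` by looking at directory components rather than using a real glob engine.

```python
from pathlib import PurePosixPath

# Hypothetical defaults approximating **/res/**/*.xml and **/build/**/*.xml.
RESOURCE_DIRS = {"res", "build"}


def is_default_excluded(path: str) -> bool:
    """Return True if `path` is an .xml file located under a res/ or build/ directory."""
    p = PurePosixPath(path)
    if p.suffix.lower() != ".xml":
        return False
    # p.parts[:-1] holds the directory components; the file name itself is skipped.
    return any(part in RESOURCE_DIRS for part in p.parts[:-1])


# Illustrative paths only; they are not taken from the project in this report.
paths = [
    "app/src/main/res/layout/activity_main.xml",
    "app/src/main/res/values/strings.xml",
    "app/build/intermediates/merged.xml",
    "app/src/main/AndroidManifest.xml",
    "app/src/main/java/org/example/MainActivity.java",
]
kept = [p for p in paths if not is_default_excluded(p)]
print(kept)
```

With this rule, AndroidManifest.xml is still indexed because it does not live under res/, and Java sources are untouched.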

Impact

  • DB is ~5x larger than necessary (78 MB vs ~15 MB if XML were excluded)
  • Indexing is ~2x slower than necessary
  • codegraph status and codegraph explore file counts are misleading (27,095 "files" suggests a much larger codebase than 636 Java source files)
  • Part of the noise in explore results (issue: explore excessive results) comes from XML file matches

Workaround

Users can create a .codegraphignore or add patterns to project config, but this requires knowing the issue exists. The default behavior should be smarter for Android projects.
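
One way to tell whether a project is affected before indexing is to count how many files are XML under a res/ directory. The script below is a rough sketch under stated assumptions: `count_files` is a hypothetical helper, it walks the working tree directly, and it does not apply .gitignore or any CodeGraph rules, so its totals may differ from what `codegraph status` reports.

```python
import os
from collections import Counter


def count_files(root: str) -> Counter:
    """Count files under `root`, tallying XML under any res/ directory separately."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata so it does not skew the totals.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        # Use the path relative to `root` so a "res" component above root is ignored.
        rel_parts = os.path.relpath(dirpath, root).split(os.sep)
        in_res = "res" in rel_parts
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext == ".xml" and in_res:
                counts["res_xml"] += 1
            else:
                counts[ext or "(none)"] += 1
    return counts


if __name__ == "__main__":
    counts = count_files(".")
    total = sum(counts.values())
    if total:
        print(f"XML under res/: {counts['res_xml']} of {total} files "
              f"({counts['res_xml'] / total:.1%})")
```

A high res/ XML share suggests that excluding res/ is worth trying before running codegraph init.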

Environment

  • CodeGraph v1.1.2 (linux-x64, bundled install)
  • Project: Moshidon/Tootsie fork (Mastodon Android client)
  • 26,453 XML files, 636 Java files
