
Bug: File parsing fails silently on Windows when project path contains accented/non-ASCII characters #700

Description

@CesarBecVal

Version

0.8.1

Platform

Windows (x64)

Install channel

GitHub release archive / install.sh / install.ps1

Binary variant

ui

What happened, and what did you expect?

When codebase-memory-mcp indexes a repository on Windows and the absolute path to the repository contains accented characters (e.g., C:\Users...\Proyectos De César...), the indexer correctly identifies and creates File nodes for all files in the project, but silently fails to extract any functions, classes, or other AST-based nodes via tree-sitter.

The files are registered in the index as File nodes (likely via directory traversal), so searching for label: File returns them. Their contents are never successfully parsed by tree-sitter, however, so searching for label: Function returns 0 results and the knowledge graph is left without any semantic structural nodes (Functions, Classes, Methods, etc.). Indexing does not report an explicit error when this happens.

Expected: codebase-memory-mcp should open and parse the contents of files even if their absolute paths contain non-ASCII characters on Windows.

Because codebase-memory-mcp is a static C binary, this is likely an encoding issue in how the C runtime on Windows handles file paths. Narrow-character functions like fopen interpret the path in the process's active ANSI code page rather than as UTF-8, unless the program explicitly uses wide-character APIs (e.g., _wfopen) or makes UTF-8 the active code page. If the C backend passes its UTF-8 path string to fopen, the bytes of a character such as "é" are decoded as different characters, Windows fails to find the file, and the parser silently skips it.
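A minimal sketch of the wide-character approach mentioned above, assuming the backend holds paths as UTF-8 strings and opens files with fopen. The helper name fopen_utf8 is hypothetical and does not come from the codebase-memory-mcp source. On Windows it converts the path to UTF-16 with MultiByteToWideChar and calls _wfopen; elsewhere it falls through to plain fopen.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not from the project): open a file whose path is
 * encoded as UTF-8. Returns NULL on failure, like fopen. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    /* First pass: ask for the required buffer sizes, in wchar_t units,
     * including the terminating NUL. Invalid UTF-8 yields 0. */
    int plen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                   path, -1, NULL, 0);
    int mlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                   mode, -1, NULL, 0);
    if (plen <= 0 || mlen <= 0)
        return NULL;

    wchar_t *wpath = malloc((size_t)plen * sizeof *wpath);
    wchar_t *wmode = malloc((size_t)mlen * sizeof *wmode);
    FILE *fp = NULL;

    /* Second pass: convert for real, then open via the wide-character API
     * so the active ANSI code page is never involved. */
    if (wpath != NULL && wmode != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            path, -1, wpath, plen) > 0 &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            mode, -1, wmode, mlen) > 0) {
        fp = _wfopen(wpath, wmode);
    }

    free(wpath);
    free(wmode);
    return fp;
#else
    /* Non-Windows platforms pass the path bytes through unchanged. */
    return fopen(path, mode);
#endif
}
```

A caller would replace fopen(path, "rb") with fopen_utf8(path, "rb") and should log a warning when NULL comes back, so a file that cannot be opened is not skipped silently. An alternative that needs no code changes at call sites is to embed an application manifest that sets activeCodePage to UTF-8 (supported on Windows 10 version 1903 and later); whether that fits this project's build is an open question.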

Reproduction

1.- On a Windows machine, create a directory with an accented character, e.g., mkdir C:\test-césar
2.- Create a dummy Python file inside it: echo "def hello(): pass" > C:\test-césar\main.py
3.- Run the indexer on this path via the MCP tool or CLI.
4.- Query the graph for files: MATCH (n:File) RETURN n (Returns the main.py file).
5.- Query the graph for functions: MATCH (n:Function) RETURN n (Returns 0 results).
6.- Repeat the steps in a path without accents (e.g., C:\test-cesar), and the function hello is correctly extracted.

Logs


Diagnostics trajectory (memory / performance / leak issues)


Project scale (if relevant)

261 nodes / 249 edges / 232 files

Confirmations

  • I searched existing issues and this is not a duplicate.
  • My reproduction uses shareable code (a dummy snippet or a public OSS repository), not proprietary code.

Labels

  • bug: Something isn't working
  • duplicate: This issue or pull request already exists
  • parsing/quality: Graph extraction bugs, false positives, missing edges
  • priority/backlog: Valuable contribution, lower scheduling urgency; review when maintainer capacity opens.
  • windows: Windows-specific issues
