Multi-language code indexer (written in C11) for semantic search, built on SQLite and tree-sitter.
Languages currently implemented: C, Go, PHP, Python, TypeScript
Database: Creates code-index.db in current working directory
Note: This tool is designed for indexing source code (functions, classes, variables, ...),
not prose or documentation files (.md, .txt, .log, ...).
While you can index any text file, the tool extracts code symbols and won't be useful for natural language text.
On an apt-based Linux distribution (such as Debian), install the following dependencies. (macOS users, see MACOS_SETUP.md.) I have not yet tested on non-apt Linux distributions, so any help here would be appreciated.
apt install libtree-sitter-dev libtree-sitter0 libsqlite3-dev
Clone the repo:
git clone https://github.com/ebcode/SourceMinder.git && cd SourceMinder
Clone the tree-sitter grammars (at least one):
git clone https://github.com/tree-sitter/tree-sitter-c.git
git clone https://github.com/tree-sitter/tree-sitter-go.git
git clone https://github.com/tree-sitter/tree-sitter-php.git
git clone https://github.com/tree-sitter/tree-sitter-python.git
git clone https://github.com/tree-sitter/tree-sitter-typescript.git
Select which languages you want to build (all disabled by default):
./configure --enable-all # All languages (recommended for testing)
./configure --enable-c --enable-typescript --enable-php # Specific languages
./configure --enable-all --disable-php # All but PHP
CC=clang ./configure --enable-c # Custom compiler, only C
make # Build indexers and query tool
sudo make install # Install to /usr/local/bin
Installed binaries: index-c, index-ts, index-php, index-go, index-python, qi
Config files: /usr/local/share/sourceminder/<language>/config/
| Flag | Purpose | Example |
|---|---|---|
| `-i <type...>` | Include context types (OR) | `qi user -i func var prop` |
| `-x <type...>` | Exclude context types (OR) | `qi user -x comment string` |
| `--list-types` | Display full list of context types | `qi --list-types` |
| `-x noise` | Exclude comments & strings | `qi user -x noise` |
| `-f <pattern...>` | Filter by file paths (OR) | `qi user -f .c .h` or `qi user -f src/* lib/*` |
| `-p <pattern>` | Filter by parent symbol | `qi count -p patterns` finds `patterns->count` |
| `-t <pattern>` | Filter by type annotation | `qi '*' -i arg -t 'int *'` finds int pointer args |
| `-m <pattern>` | Filter by modifier | `qi '*' -i func -m static` finds static functions |
| `-s <pattern>` | Filter by scope | `qi '*' -s public` finds public members |
| `-e` | Expand full definitions | `qi getUserById -i func -e` |
| `-C <n>` | Show n context lines | `qi user -C 3` |
| `-A <n>` | Show n lines after | `qi user -A 5` |
| `-B <n>` | Show n lines before | `qi user -B 5` |
| `--and` | Multi-pattern same-line | `qi fprintf stderr --and` |
| `--and <n>` | Multi-pattern within n lines | `qi malloc free --and 10` |
| `--def` | Only definitions | `qi getUserById --def` |
| `--usage` | Only usages | `qi User --usage` |
| `--within <sym>` | Search within function/class | `qi malloc --within handle_request` |
| `--limit <n>` | Limit results | `qi '*' --limit 20` |
| `--toc` | Table of contents | `qi '*' -f file.c --toc` |
| `--columns` | Custom columns | `qi user --columns line symbol context parent` |
| `-v` | Verbose (all columns) | `qi user -v` |
| `--full` | Full context names | `qi user --full` |
IMPORTANT: Multi-Value Arguments (Non-Standard UNIX Pattern)
Unlike most UNIX tools, qi accepts multiple values for both the main pattern and subsequent flags:
# Multiple search patterns (OR logic)
qi user session token --limit 20 # Find user OR session OR token
# Multiple context types (OR logic)
qi user -i func var prop # Functions OR variables OR properties
# Multiple file patterns (OR logic)
qi symbol -f file1.c file2.c utils.h # In any of these files
qi malloc -f *.c *.h # All .c OR .h files
# Combine multiple values across arguments
qi malloc free -i func call -f memory.c allocator.c buffer.c
Comparison with standard UNIX tools:
- Standard: `grep -f pattern.txt file.c` (one file per `-f`)
- qi: `qi symbol -f file1.c file2.c file3.c` (multiple files per `-f`)
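The multi-value convention above can be illustrated with a small Python sketch. It is not qi's actual parser (qi is written in C); `group_args` is a hypothetical helper that assumes every token starting with `-` opens a new flag and that all following tokens belong to that flag until the next one.

```python
def group_args(argv):
    """Group command-line tokens the way the README describes.

    Assumption (not taken from qi's source): leading tokens are search
    patterns, and each flag collects every value up to the next flag.
    """
    groups = {"patterns": []}
    current = "patterns"
    for token in argv:
        if token.startswith("-"):
            # A new flag starts a new (possibly empty) value list.
            current = token
            groups.setdefault(current, [])
        else:
            groups[current].append(token)
    return groups


if __name__ == "__main__":
    print(group_args(["malloc", "free", "-i", "func", "call", "-f", "memory.c"]))
```

Under this assumption an escaped pattern such as `\--help` starts with a backslash rather than a dash, so it stays in the pattern list instead of being read as a flag.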
Pattern wildcards:
- `*` = any characters (e.g., `*Manager`, `get*`, `*user*`)
- `.` = single character (e.g., `.etUser` matches `getUser`, `setUser`)
- No wildcards = exact match (case-insensitive)
- Note: `%` and `_` also work (SQL LIKE syntax), but `*` and `.` are recommended
Escaping special characters:
- `\*` = literal asterisk (e.g., `operator\*` finds `operator*`)
- `\.` = literal period (e.g., `file\.c` finds `file.c`)
- `\--flag` = search for flag-like symbols (e.g., `\--help` finds `--help`)
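Since `%` and `_` are described as SQL LIKE syntax, one plausible reading is that `*` and `.` are translated to them before the query runs. The Python sketch below shows that reading only; `to_like` and `matches` are hypothetical names, and the real translation inside qi may differ.

```python
import sqlite3


def to_like(pattern: str) -> str:
    """Translate a qi-style pattern into a SQL LIKE pattern (assumed mapping)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            literal = pattern[i + 1]
            # Keep the escaped character literal. LIKE metacharacters need
            # their own escape so they are not treated as wildcards.
            out.append("\\" + literal if literal in "%_\\" else literal)
            i += 2
            continue
        # '*' becomes '%', '.' becomes '_', everything else passes through.
        out.append({"*": "%", ".": "_"}.get(ch, ch))
        i += 1
    return "".join(out)


def matches(symbol: str, pattern: str) -> bool:
    """Evaluate the translated pattern with SQLite's LIKE operator."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT ? LIKE ? ESCAPE '\\'", (symbol, to_like(pattern))
        ).fetchone()
    finally:
        conn.close()
    return bool(row[0])
```

SQLite's LIKE is case-insensitive for ASCII characters by default, which lines up with the "exact match (case-insensitive)" behavior described above.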
Examples:
qi 'get*' -i func # Wildcard: finds getUser, getData, etc.
qi 'get\*' -i func # Literal: finds "get*" symbol if it exists
qi '.etUser' # Wildcard: finds getUser, setUser (. = single char)
qi '\--help' # Literal: finds "--help" string
For Developers:
- Browse a project's vocabulary (like a book index)
- Find symbols by context (classes, functions, variables, etc.)
- Track relationships with parent symbols (e.g., `obj.method()`)
- Filter by access modifiers (public, private, protected)
- Query with SQL wildcards for partial matches
qi vs grep - Different tools for different jobs:
- qi - Recommended default for code navigation: symbols (functions, classes, variables, types), structure exploration, precision filtering
- grep - Literal text (error messages, comments), regex patterns, non-code files
A few reasons to prefer defaulting to qi in lieu of grep:
- `--toc` gives instant file overview (like a book's "table of contents")
- Context filtering (`-i func`, `-x noise`) eliminates false positives
- `--within function` enables scoped search (impossible with grep)
- `-e` shows full definitions inline
For Claude Code:
- Faster workflow than grep + Read for code navigation
- Saves tokens with compact output
- Relative paths work directly with Read/Edit tools
- Context-aware filtering reduces noise
- Use `qi -f file.c --toc` before the Read tool to understand structure
- Use `-e` to see full definitions without the Read tool
Beyond basic pattern matching, qi tracks rich metadata about every symbol. These filters enable greater precision:
Find member access, method calls, and struct/class fields:
qi count -p patterns # Find patterns->count or patterns.count
qi 'field%' -p 'config%' # All fields accessed on config objects
qi init -p User # Find User->init or User.init methods
qi '*' -p 'request%' --limit 20 # All members of request-like objects
How it works: Tracks the parent in expressions like obj->field, obj.method(), array[i].field
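To make the parent/member relationship concrete, here is a rough Python sketch. The indexer derives parents from the tree-sitter AST; this example swaps in simple string handling instead, and both the helper name `member_pairs` and the choice of `array` as the parent in `array[i].field` are assumptions made for illustration.

```python
import re


def member_pairs(expr: str):
    """Return (parent, member) pairs for a simple member-access chain.

    Regex-based approximation only: subscripts and call parentheses are
    stripped first, and nested brackets are not handled.
    """
    cleaned = re.sub(r"\[[^\]]*\]|\([^)]*\)", "", expr)
    names = [part.strip() for part in re.split(r"->|\.", cleaned)]
    # Each name is the parent of the name that follows it in the chain.
    return list(zip(names, names[1:]))


if __name__ == "__main__":
    for expr in ("patterns->count", "obj.method()", "array[i].field"):
        print(expr, member_pairs(expr))
```

With pairs like these stored next to each symbol, a query such as `qi count -p patterns` reduces to matching both the member and its recorded parent.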
Filter by type declarations (for refactoring):
qi '*' -i arg -t 'int *' # All int pointer arguments
qi '*' -i var -t 'char *' # All char pointer variables
qi '*' -i arg -t 'OldType%' # Find args using deprecated types
qi '*' -t 'uint32_t' --limit 20 # All uint32_t usage
Use case: Type-based refactoring, finding all uses of a specific type
Filter by access modifiers and storage classes:
qi '*' -i func -m static --limit 10 # All static functions
qi '*' -i var -m const # All const variables
qi '*' -m private # Private members
qi '*' -m inline -i func # Inline functions
Use case: Understanding code visibility, finding optimization candidates
Filter by visibility scope:
qi '*' -s public -i func # Public API functions
qi '*' -s private -i prop # Private properties
qi method -s protected # Protected methods
Use case: API analysis, understanding encapsulation
Search only within specific functions or classes:
qi malloc --within handle_request # malloc calls only in handle_request
qi fprintf --within 'debug%' # fprintf in debug-related functions
qi strcmp --within parse_args # strcmp only in parse_args
qi '*' -i var --within main # All variables declared in main
Use case: Understanding local behavior, debugging specific functions
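Scoped search can be pictured as a line-range check: a hit counts only if its line falls inside the definition range of a matching function. The Python sketch below assumes the index knows each function's start and end line (as the `--toc` ranges suggest); `within` and its data layout are hypothetical and are not qi's implementation.

```python
import fnmatch


def within(occurrences, functions, scope_pattern):
    """Keep occurrences whose line lies inside a function matching scope_pattern.

    occurrences: list of (line, symbol) tuples
    functions:   list of (name, start_line, end_line) tuples
    """
    # Accept both '*' and SQL-style '%' as the "any characters" wildcard.
    glob = scope_pattern.lower().replace("%", "*")
    ranges = [
        (start, end)
        for name, start, end in functions
        if fnmatch.fnmatchcase(name.lower(), glob)
    ]
    return [
        (line, symbol)
        for line, symbol in occurrences
        if any(start <= line <= end for start, end in ranges)
    ]


if __name__ == "__main__":
    funcs = [("handle_request", 10, 40), ("debug_dump", 50, 60)]
    hits = [(12, "malloc"), (45, "malloc"), (55, "fprintf")]
    print(within(hits, funcs, "handle_request"))
```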
Stack filters for maximum precision:
# Find static int pointer arguments in memory functions
qi '*' -i arg -t 'int *' -m static --within 'mem*'
# Find private fields accessed on config objects
qi '*' -p config -s private -i prop
# Find all malloc calls in static helper functions
qi malloc -i call --within '*helper*' -m static
Pro tip: Use -v (verbose) to see all available columns, then craft precise queries
Step 1: Index the SourceMinder codebase
index-c . --once --verbose
Step 2: Query (single and multiple patterns)
qi -f shared/indexer-main.c --toc # Table-of-Contents output
qi handle_import_statement # Exact match
qi '*user*' # Contains "user"
qi '*' -i func --def --limit 10 # First 10 function definitions (quote single * to prevent shell expansion)
# Multiple patterns (OR logic)
qi malloc free calloc # Find any of these
qi user session token -i var --limit 20 # Variables named user/session/token
Step 3: View code
qi getUserById -i func -e # Expand full function
qi getUserById -i func call -e -C 3 # Expand full function, and show context around call sites
Step 4: Background daemon
index-c . & # Watch for changes
qi 'symbol' --limit-per-file 1 --limit 5
Output:
Searching for: symbol
LINE | SYM | CTX
-----+--------+-----
./query-index.c:
67 | symbol | COM
./benchmark/struct-layout-optimized.c:
26 | symbol | PROP
./benchmark/struct-layout-original.c:
17 | symbol | PROP
./c/c_language.c:
23 | Symbol | COM
./go/go_language.c:
83 | Symbol | COM
Found 351 matches (showing first 5)
Result breakdown: ARG (162), COM (68), VAR (52), STR (37), PROP (32)
Tip: Use -i <context> to narrow results
- LINE: Line number
- SYM: The matched symbol (as it appears in the source)
- CTX: Where/how the symbol appears (context)
- Informational notes at the end
Understanding context types is key to effective filtering:
| Type | Short | Definition | Example |
|---|---|---|---|
| class | - | Class declarations | `class UserManager { }` |
| interface | iface | Interface declarations | `interface Storable { }` |
| function | func | Functions and methods | `function getUserById() { }` |
| argument | arg | Function parameters | `function foo(userId: string)` |
| variable | var | Variables and constants | `const cursor = ...` |
| property | prop | Class properties | `this.name = "Alice"` |
| type | - | Type definitions/annotations | `type UserID = string` |
| import | imp | Imported symbols | `import { User } from ...` |
| export | exp | Exported symbols | `export { User }` |
| call | - | Function/method calls | `getUserById(123)` |
| lambda | - | Lambda/arrow functions | `(x) => x * 2` |
| enum | - | Enum declarations | `enum Status { }` |
| case | - | Enum cases | `Active = 1` |
| namespace | ns | Namespace declarations | `namespace Utils { }` |
| trait | - | Trait declarations | `trait Serializable { }` (PHP) |
| comment | com | Words from comments | `// display user info` |
| string | str | Words from string literals | `"user name"` |
| filename | file | Filename without extension | File named `user.ts` |
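The include (`-i`) and exclude (`-x`) semantics for these types can be sketched in Python as follows. The short names come from the table above and `noise` stands for comment plus string, as the flag reference states; the function names `resolve` and `keep` are hypothetical and only illustrate the filtering logic, not qi's code.

```python
# Short names as listed in the context type table.
SHORT_NAMES = {
    "iface": "interface", "func": "function", "arg": "argument",
    "var": "variable", "prop": "property", "imp": "import",
    "exp": "export", "ns": "namespace", "com": "comment",
    "str": "string", "file": "filename",
}
NOISE = {"comment", "string"}  # what "-x noise" is documented to exclude


def resolve(types):
    """Expand short names and the 'noise' shorthand into full type names."""
    resolved = set()
    for name in types:
        if name == "noise":
            resolved |= NOISE
        else:
            resolved.add(SHORT_NAMES.get(name, name))
    return resolved


def keep(context, include=(), exclude=()):
    """Return True if a result with this context type survives the filters."""
    included = resolve(include)
    excluded = resolve(exclude)
    # An empty include list means "no restriction"; otherwise any listed type matches (OR).
    if included and context not in included:
        return False
    return context not in excluded
```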
Usage:
qi user -i func var # Include functions OR variables
qi user -x comment string # Exclude comments AND strings
qi user -x noise # Shorthand for -x comment string
qi user -i func var -x comment # Combine include and exclude
Folder Mode (recommended for projects):
- Recursively indexes all matching files
- Respects ignore lists in `<language>/config/ignore_files.txt`
- Runs as a daemon by default, watching for changes
- Use for: Active development on a codebase
index-c ./src ./lib
File Mode (for specific files):
- Indexes only the specified files
- Ignores ignore lists
- Runs once and exits (no daemon)
- Use for: One-off indexing, testing, or specific file updates
index-c main.c utils.c helper.c
Start daemon:
index-c ./src & # Background process
index-c ./src --silent & # Silent
One-time indexing:
- Useful for one-time analysis, but not for querying and editing in the same session
index-c ./src --once # Index once and exit
Stop daemon:
ps aux | grep index-c # Find process ID
kill <PID> # Kill it
killall index-c index-ts # Or kill all indexers
When to use:
- Daemon: Active development - auto-updates as you edit
- --once: CI/CD pipelines, one-time analysis, manual control
index-<language> <folders...> [options]
Options:
- `--once` - Run once and exit (no daemon)
- `--silent` - Silence output
- `--quiet-init` - Quiet initial indexing, noisy re-indexing on file change
- `--verbose` - Show preflight checks and progress
- `--exclude-dir DIR [DIR...]` - Exclude additional folders
Examples:
index-c ./src --verbose --once
index-c ./src ./lib --quiet-init &
index-c ../ --exclude-dir build dist node_modules
The indexer:
- Scans files matching `<language>/config/file-extensions.txt`
- Skips folders in `<language>/config/ignore_files.txt`
- Extracts symbols via tree-sitter AST parsing
- Tracks parent symbols for member expressions (e.g., `this.target.getBounds()`)
- Captures access modifiers (public, private, protected)
- Filters noise (stopwords, keywords, punctuation, short symbols, pure numbers)
- Stores relative paths from current working directory
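The noise-filtering step in the list above might look roughly like the following Python sketch. The categories (stopwords, keywords, punctuation, short symbols, pure numbers) come from the list; the function name `is_noise`, the `min_length=2` threshold, and the lowercase comparison are assumptions chosen for illustration.

```python
def is_noise(token, stopwords, keywords, min_length=2):
    """Return True if a token would be dropped before indexing.

    min_length is a hypothetical threshold; the README only says that
    "short symbols" are filtered, not how short.
    """
    lowered = token.lower()
    if lowered in stopwords or lowered in keywords:
        return True
    if len(token) < min_length:
        return True
    if token.isdigit():
        # Pure numbers such as "42" carry no symbol information.
        return True
    if not any(ch.isalnum() or ch == "_" for ch in token):
        # Nothing but punctuation, e.g. "->" or ";".
        return True
    return False


if __name__ == "__main__":
    stop, kw = {"the", "and"}, {"return", "static"}
    for tok in ("the", "return", "42", "x", "->", "getUser"):
        print(tok, is_noise(tok, stop, kw))
```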
Multiple language indexers can run in parallel without database locks:
index-ts ./src &
index-c ./lib &
index-php ./app &
SQLite WAL mode is enabled automatically on first run.
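For readers unfamiliar with WAL, the Python sketch below shows how write-ahead logging is switched on for a SQLite database file. It uses Python's standard `sqlite3` module and a temporary file rather than the indexer's C code, so treat it as an illustration of the SQLite setting, not of how the indexers call it.

```python
import os
import sqlite3
import tempfile


def open_with_wal(path):
    """Open a SQLite database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is now in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


# Demo on a throwaway file (WAL does not apply to in-memory databases).
with tempfile.TemporaryDirectory() as tmp:
    conn, mode = open_with_wal(os.path.join(tmp, "code-index.db"))
    conn.close()
print(mode)
```

The WAL setting is stored in the database file, so it only needs to be applied once. In WAL mode readers do not block a writer and a writer does not block readers, although SQLite still allows only one writer at a time.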
qi <pattern...> [options] # One or more patterns (logical OR)
Single pattern matching (wildcard syntax):
qi cursor # Exact: cursor
qi '*manager' # Ends with Manager
qi 'get*' # Starts with get
qi '*session*' # Contains session
qi _etuser # Matches: getUser, setUser
Multiple pattern matching (OR logic - finds any match):
qi malloc free calloc # Find any of these functions
qi user session token --limit 30 # Find user OR session OR token
qi '*error' '*Exception' '*Fail*' # Any error-related symbols
qi 'init*' 'destroy*' 'cleanup*' # Any lifecycle functions
qi '*manager' -i class iface # Only classes or interfaces
qi user -i func var prop # Functions, variables, or properties
qi user -x comment # Exclude comments
qi user -x noise # Exclude comments and strings (recommended default -- see how to configure below)
Single or multiple file patterns (OR logic):
qi user -f .c # All .c files
qi user -f .c .h # .c OR .h files
qi user -f src/* lib/* include/* # Multiple directories
qi user -f auth/*.c session/*.c # .c files in multiple dirs
qi user -f database.c utils.c helpers.c # Specific files
qi getUserById --def # Only definitions (is_definition=1)
qi User --usage # Only usages (is_definition=0)
Use cases:
- Jump to definition: Find where a function is declared
- Find all usages: See everywhere a type is used
- API exploration: List all function definitions
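The `is_definition` flag mentioned above suggests that `--def` and `--usage` are simple filters on a stored column. The Python sketch below demonstrates that idea against an in-memory SQLite table; the table name `symbols`, its other columns, and the sample rows are invented for the example and do not describe the real code-index.db schema.

```python
import sqlite3

# Hypothetical miniature index; only the is_definition column name comes from the README.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols (symbol TEXT, file TEXT, line INTEGER, is_definition INTEGER)"
)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?)",
    [
        ("getUserById", "user.c", 45, 1),
        ("getUserById", "main.c", 12, 0),
        ("getUserById", "auth.c", 88, 0),
    ],
)


def find(symbol, definitions=None):
    """Look up a symbol; definitions=True mimics --def, False mimics --usage."""
    sql = "SELECT file, line FROM symbols WHERE symbol LIKE ?"
    params = [symbol]
    if definitions is not None:
        sql += " AND is_definition = ?"
        params.append(1 if definitions else 0)
    return conn.execute(sql + " ORDER BY file, line", params).fetchall()


if __name__ == "__main__":
    print(find("getUserById", definitions=True))
    print(find("getUserById", definitions=False))
```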
Find lines where ALL patterns co-occur:
qi fprintf err_msg --and # Both on same line
qi user session token --and # All three on same line
qi malloc free --and 10 # Both within 10 lines of each other
Incremental refinement:
qi file_filter # Too many results
qi file_filter '*build*' --and # Refine to lines also containing "build"
Expand full definitions:
qi getUserById -i func -e # Show complete function
Output:
LINE | SYMBOL | CONTEXT
-----+--------------+---------
auth.c:
45 | validateUser | FUNC
45 | bool validateUser(const char* username, const char* password) {
46 | if (!username || !password) {
47 | return false;
48 | }
...
54 | }
Show context lines:
qi cursor -C 2 # 2 lines before and after
qi cursor -A 5 # 5 lines after
qi cursor -B 3 # 3 lines before
Two-step workflow (recommended):
# Step 1: Discover
qi '*auth*' -i func --limit 10
# Step 2: Explore
qi auth -i func --def -e # Expand definitions
Quick overview of all definitions in a file:
qi -f database.c --toc
qi -f c_language.c --toc -i func # Only functions
qi -f .c --toc # Works with extension shorthand (WARNING: This command could produce a LOT of output!)
Output:
database.c:
IMPORTS: sqlite3.h, stdio.h, stdlib.h, string.h
FUNCTIONS:
init_database (15-45)
execute_query (47-89)
close_database (91-98)
TYPES:
DatabaseConfig (5-12)
qi database --columns line context symbol # Basic columns
qi database --columns line context clue symbol # Include clue
qi database --columns symbol line clue # Reorder columns
Available columns: line, context, parent, scope, modifier, clue, namespace, type, symbol
Default behavior:
- Normal: `line`, `symbol`, `context` (abbreviated)
- With `-c`/`--clue`: Adds `clue` column
- With `-v`/`--verbose`: All columns
- With `--full`: Full context type names
qi user --limit 10 # First 10 matches
qi '*' --limit 20 # First 20 symbols
qi '*' --limit-per-file 3 # First 3 symbols in each file
qi '*' --limit-per-file 3 --limit 10 # First 3 symbols in each file, up to 10 total
Installed system-wide: /usr/local/share/sourceminder/<language>/config/
Local development: <language>/config/ (in the project directory)
The indexer checks both locations (local takes precedence).
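The lookup order can be sketched in Python as below. The two directory layouts come from the paths above; the function name `find_config` and its parameters are hypothetical, and the demo uses temporary directories in place of the real project and `/usr/local/share/sourceminder` locations.

```python
import tempfile
from pathlib import Path


def find_config(language, filename, local_root, system_root):
    """Return the first existing config file, preferring the local copy."""
    for root in (local_root, system_root):
        candidate = Path(root) / language / "config" / filename
        if candidate.is_file():
            return candidate
    return None


# Demo with throwaway directories standing in for the two locations.
with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as system:
    for root in (local, system):
        cfg_dir = Path(root) / "c" / "config"
        cfg_dir.mkdir(parents=True)
        (cfg_dir / "file-extensions.txt").write_text(".c\n.h\n")

    picked = find_config("c", "file-extensions.txt", local, system)
    local_wins = picked is not None and Path(local) in picked.parents

    # Remove the local copy: the system-wide file should now be used.
    (Path(local) / "c" / "config" / "file-extensions.txt").unlink()
    picked = find_config("c", "file-extensions.txt", local, system)
    system_fallback = picked is not None and Path(system) in picked.parents

print(local_wins, system_fallback)
```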
Location: <language>/config/file-extensions.txt
Format: One extension per line starting with a dot:
.ts
.tsx
.js
.jsx
No recompilation is needed after editing config files, but you will need to re-index.
Location: <language>/config/ignore_files.txt
Format: One folder per line:
node_modules
dist
build
.git
vendor/legacy
Folders are ignored at any level. Use --exclude-dir for per-run exclusions.
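"Ignored at any level" can be read as: a file is skipped when any run of its directory components equals an ignore entry. The Python sketch below implements that reading; `is_ignored` is a hypothetical helper, and treating a multi-part entry such as `vendor/legacy` as consecutive path components is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath


def is_ignored(path, ignore_entries):
    """Return True if any directory run in path equals an ignore entry."""
    # Only directory components count; the file name itself is not a folder.
    dirs = PurePosixPath(path).parts[:-1]
    for entry in ignore_entries:
        wanted = PurePosixPath(entry).parts
        width = len(wanted)
        # Slide a window of the entry's length over the directory components.
        if any(dirs[i:i + width] == wanted for i in range(len(dirs) - width + 1)):
            return True
    return False


if __name__ == "__main__":
    entries = ["node_modules", "dist", "build", ".git", "vendor/legacy"]
    print(is_ignored("web/node_modules/lib/a.ts", entries))
    print(is_ignored("src/build.c", entries))
```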
Stopwords (shared): shared/config/stopwords.txt
Keywords (per-language): <language>/config/<language>-keywords.txt
Typical sizes (project size: database size, time to index):
- Small (100 files): ~10MB, ~2s
- Medium (1K files): ~100MB, ~10s
- Large (10K files): ~500MB, ~30s
Optimization:
- Use `--silent` for large projects
- Run indexers in parallel with WAL mode
- Use `--exclude-dir` for dependency folders
- Index only source directories (exclude tests, docs)
- Write and read code-index.db to/from a RAM disk (`--db-file /dev/shm/index.db`)
Fast queries:
- Exact matches: `qi getUserById`
- Prefix matches: `qi 'get*'`
- With filters: `qi user -i func -f .c`
Slower queries:
- Wildcard everywhere: `qi '*user*'`
- No filters: `qi '*'`
- Complex AND queries
Optimization:
- Start specific, broaden if needed
- Use `--limit` and `--limit-per-file` while exploring
- Combine context and file filters
- Use `--def` to filter out usages
Indexing:
- Start small - index one directory first
- Use daemon mode for active development
- Exclude build artifacts in ignore list
Querying:
- Start with `-x noise` to exclude comments/strings (and consider placing this line in ~/.smconfig)
- Use `-e` and `-C` to see definitions and context
- Use `--limit` while exploring
- Two-step workflow: discover (--toc, -i func), then expand (-e)
- Combine filters for precision
Example workflow:
# Step 1: Discover (fast, targeted)
qi 'handle*' -i func --def --limit-per-file 2 --limit 10
# Step 2: See table of contents for selected file
qi -f c_language.c --toc
# Step 3: Expand (show code) for selected function
qi handle_do_statement -f c_language.c --def -e
# Code symbols - use qi
qi CursorManager
qi '*manager*' -i class
# Text/messages - use grep
grep "Error: connection failed" *.c
# Exclude noise for cleaner results
qi user -x noise
# Compact output (default) vs full names
qi user # Shows: VAR, FUNC, etc.
qi user --full # Shows: VARIABLE, FUNCTION, etc.
Database locked:
- Another indexer is writing to the database
- Wait, or kill stale indexer processes: `killall index-c index-ts`
- If the problem persists, delete `code-index.db` (and `code-index.db-*` if present) and re-index
No files being indexed:
- Check file extensions: `cat <language>/config/file-extensions.txt`
- Check ignore list: `cat <language>/config/ignore_files.txt`
- Run with `--verbose` to see preflight checks
Daemon not updating:
- Check if running: `ps aux | grep index-c`
- Restart: `killall index-c && ./index-c ./src &`
- Use `--once` mode if daemon is problematic
Too many results:
- Add context filter: `-i func var`
- Add file filter: `-f .c` or `-f src/%`
- Exclude noise: `-x noise`
- Use definition filter: `--def`
- Use `--limit` and `--limit-per-file` to see samples first
No results found:
- Try with wildcards: `qi '*symbol*'`
- Verify file indexed: `qi '*' -f filename.c --limit 1`
- Check if symbol is filtered (qi will output a notice for words in stopwords.txt)
Missing symbols:
- Symbol might be filtered (stopword, keyword, or too short)
- Try wildcards: `qi '*symbol*'`
- Check keywords: `cat <language>/config/<language>-keywords.txt`
undefined reference to ts_language_c:
- Run `./configure` first, then `make clean && make`
No such file: shared/config/stopwords.txt:
- Config files missing
- Run `sudo make install` or verify local config files exist
Warning: ENABLE_C redefined:
- Run `./configure` again, then `make clean && make`
# Check version
qi --version
./index-c --version
# Verbose output
./index-c ./src --verbose --once
# Test with minimal example
echo 'int test() { return 42; }' > /tmp/test.c
./index-c /tmp/test.c
qi test
qi cursor # Exact match
qi 'get*' # Starts with (wildcard)
qi '*Manager' # Ends with (wildcard)
qi '*user*' # Contains (wildcard)
qi '.etUser' # Single char wildcard: getUser, setUser
qi 'get\*' # Literal asterisk (escaped)
qi '\--help' # Search for --help symbol
qi user -f .c # By extension
qi user -f src/* # By directory
qi user -f ./auth/*.c # Specific path
qi user -f .c .h # Multiple patterns (OR)
Find a function definition:
qi getUserById -i func --def -e
Explore a file:
qi -f database.c --toc
Find related code:
qi user session --and 10 -C 3
Refactor a type:
qi '*' -i arg -t "OldType *" # Find all parameters using old type
qi '*' -i arg -t "OldType*" # Verify none remain after refactor
Understand how a type is used:
qi user --def -e # See the definition
qi user --usage --limit 20 # See all usages
Chain your workflow:
qi '*' -f query-index.c --toc # 1. Explore
qi '*user*' -i func --def --limit 10 # 2. Find interesting functions
qi validateUser -i func -e # 3. See implementation
qi validateUser --usage -C 3 # 4. Find usage examples
qi 'validate*' 'login*' --and 20 -i func call # 5. Find related code