semantic_code_search and semantic_navigate fail with 'Unable to embed oversized input' on projects with large data files

## Bug Description

`semantic_code_search` and `semantic_navigate` fail with the error:

```
Unable to embed oversized input after adaptive retries
```

This happens on projects that contain large non-code files (JSON data files, GeoJSON, CSV, etc.) alongside source code. Both tools appear to embed each file's full content, data files included, and that content exceeds the embedding model's context window.

## Environment

- **Context+ version**: latest via `bunx contextplus`
- **OS**: macOS (Apple Silicon, 64 GB RAM)
- **Ollama**: running locally
- **Embed model**: `nomic-embed-text` (context window: 2048 tokens)
- **Chat model**: `gemma2:27b`

## Reproduction

Reproducible on any project whose tree contains both source code files and large data files (JSON > 100KB, GeoJSON, CSV, etc.).

### Steps

1. Configure Context+ as an MCP server with Ollama (`nomic-embed-text`)
2. Open a project with a few JS/TS source files and some large `.json` data files (100KB+)
3. Run `semantic_code_search` with any query:
   ```
   semantic_code_search({ query: "authentication logic", top_k: 3 })
   ```
4. **Result**: `Unable to embed oversized input after adaptive retries`

### What works vs what doesn't

| Tool | Status | Notes |
|------|--------|-------|
| `get_context_tree` | ✅ Works | AST-based, no embeddings |
| `get_file_skeleton` | ✅ Works | AST-based, no embeddings |
| `semantic_identifier_search` | ✅ Works | Embeds function signatures (small) |
| `get_blast_radius` | ✅ Works | Import/usage tracing |
| `semantic_code_search` | ❌ Fails | Tries to embed large data files |
| `semantic_navigate` | ❌ Fails | Same issue |

## Expected Behavior

Context+ should either:
1. **Skip non-code files** (`.json`, `.geojson`, `.csv`, etc.) during embedding, or
2. **Chunk large files** before embedding instead of sending the entire content, or
3. **Add a configurable `max_file_size` threshold** (e.g., 50KB) beyond which files are skipped for embedding, or
4. **Gracefully degrade** — skip files that exceed the model's context window and continue with the rest
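
To illustrate option 2, here is a rough sketch of line-based chunking under a character budget. It is not taken from the Context+ source: `chunkForEmbedding`, `ASSUMED_CHARS_PER_TOKEN`, and the `safetyFactor` parameter are hypothetical names, and the 4-characters-per-token ratio is only a rule of thumb (dense JSON may tokenize worse), hence the conservative default budget.

```typescript
// Hypothetical helper, not part of Context+.
// Assumption: ~4 characters per token, a rough heuristic rather than a
// property of nomic-embed-text's tokenizer.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Split text into chunks that each stay under a character budget derived
// from the model's token window. Lines are kept together where possible.
function chunkForEmbedding(
  text: string,
  maxTokens = 2048,
  safetyFactor = 0.5,
): string[] {
  const maxChars = Math.max(
    1,
    Math.floor(maxTokens * ASSUMED_CHARS_PER_TOKEN * safetyFactor),
  );
  const chunks: string[] = [];
  let current = "";

  for (const line of text.split("\n")) {
    // A single line longer than the budget (e.g. minified JSON) is hard-split.
    for (let i = 0; i < Math.max(line.length, 1); i += maxChars) {
      const piece = line.slice(i, i + maxChars);
      const candidate = current === "" ? piece : current + "\n" + piece;
      if (candidate.length > maxChars && current !== "") {
        // Adding this piece would overflow: close the current chunk first.
        chunks.push(current);
        current = piece;
      } else {
        current = candidate;
      }
    }
  }

  if (current !== "") chunks.push(current);
  return chunks;
}
```

Each chunk could then be embedded on its own, with a chunk that still fails being skipped rather than aborting the whole search (option 4). Leading blank lines are dropped by this sketch, so it does not preserve whitespace exactly.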

## Suggested Fix

The `embeddings.ts` core module could:
- Filter out known data-only extensions (`.json`, `.geojson`, `.csv`, `.xlsx`) from the embedding pipeline
- Add a `CONTEXTPLUS_MAX_EMBED_FILE_SIZE` env var (default ~50KB)
- Or use the existing Tree-sitter parser to detect if a file has meaningful code symbols — if not, skip it
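
A minimal sketch of the first two bullets, assuming the filter runs before a file reaches the embedding call. Everything here is hypothetical: `shouldEmbedFile`, `resolveMaxEmbedBytes`, and `extensionOf` are illustrative names, and `CONTEXTPLUS_MAX_EMBED_FILE_SIZE` is the variable proposed above, not one Context+ currently reads. The environment is passed in as a plain object so the function stays easy to test.

```typescript
// Hypothetical sketch: none of these names come from the Context+ source.
const DATA_ONLY_EXTENSIONS = new Set([".json", ".geojson", ".csv", ".xlsx"]);
const DEFAULT_MAX_EMBED_BYTES = 50 * 1024; // ~50KB, the default proposed above

type Env = Record<string, string | undefined>;

// Read the proposed CONTEXTPLUS_MAX_EMBED_FILE_SIZE value (in bytes),
// falling back to the default when it is unset or not a positive integer.
function resolveMaxEmbedBytes(env: Env): number {
  const raw = env["CONTEXTPLUS_MAX_EMBED_FILE_SIZE"];
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0
    ? parsed
    : DEFAULT_MAX_EMBED_BYTES;
}

// Lowercased extension of the last path segment; "" for dotfiles or no
// extension. Only "/" separators are handled in this sketch.
function extensionOf(filePath: string): string {
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

// Decide whether a file should enter the embedding pipeline.
function shouldEmbedFile(
  filePath: string,
  sizeBytes: number,
  env: Env = {},
): boolean {
  if (DATA_ONLY_EXTENSIONS.has(extensionOf(filePath))) return false;
  return sizeBytes <= resolveMaxEmbedBytes(env);
}
```

In a Node or Bun process the caller would pass `process.env` as the third argument. Skipping `.json` outright would also exclude small config files such as `package.json`, so the size threshold alone may be the safer default.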

## Additional Context

`semantic_identifier_search` works because it only embeds function/class signatures (small strings). The bug is specifically in the file-level embedding pipeline used by `semantic_code_search` and `semantic_navigate`.

The `nomic-embed-text` model has a 2048-token context window. Large data files far exceed this limit.

Great project — the AST tools work beautifully. Looking forward to semantic search handling mixed codebases!
