README.md: 81 changes (69 additions, 12 deletions)
@@ -1,15 +1,26 @@
# semble

Instant local code search for agents.
<h2 align="center">
<img width="30%" alt="semble logo" src="assets/images/semble_logo.png"><br/>
Fast Code Search for Agents
</h2>

## Install
<div align="center">

[Quickstart](#quickstart) •
[Main Features](#main-features) •
[MCP Server](#mcp-server) •
[Benchmarks](#benchmarks)

</div>

Semble is a fast code search library for local and remote repositories. It combines static [Model2Vec](https://github.com/MinishLab/model2vec) embeddings from [potion-code-16M](https://huggingface.co/minishlab/potion-code-16M) with [BM25](https://github.com/xhluca/bm25s) lexical search and a specialized hybrid reranking stack, reaching accuracy close to code-specialized transformer models at a fraction of the cost. As an [MCP server](#mcp-server), it gives agents (Claude Code, Cursor, Codex, OpenCode, etc.) instant access to any codebase: repos are cloned and indexed on demand.
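The README does not say how the hybrid reranking stack merges the lexical and embedding retrievers. As a generic illustration only, and not Semble's actual method, reciprocal rank fusion (RRF) is one common way to combine a BM25 ranking with an embedding ranking. The chunk IDs and the constant `k=60` below are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs with reciprocal rank fusion.

    Each list adds 1 / (k + rank) to a chunk's score, with rank starting at 1.
    Generic technique for illustration; not Semble's own reranking stack.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical rankings from a lexical (BM25) and a semantic (embedding) retriever.
bm25_ranking = ["utils.py:10", "model.py:127", "io.py:5"]
embedding_ranking = ["model.py:127", "hub.py:40", "utils.py:10"]

print(reciprocal_rank_fusion([bm25_ranking, embedding_ranking]))
# ['model.py:127', 'utils.py:10', 'hub.py:40', 'io.py:5']
```

A chunk ranked highly by both retrievers (`model.py:127`) ends up first even though neither list alone is decisive, which is the property hybrid search relies on.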

## Quickstart

```bash
pip install semble
```

## Python API

```python
from semble import SembleIndex

@@ -19,19 +30,29 @@ index = SembleIndex.from_path("./my-project")
# Index a remote git repository
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")

results = index.search("load a pretrained model", top_k=3)
for result in results:
    print(result)
results = index.search("save model to disk", top_k=3)

# Each result exposes the matched chunk
result = results[0]
result.chunk.file_path # "model2vec/model.py"
result.chunk.start_line # 127
result.chunk.end_line # 150
result.chunk.content # "def save_pretrained(self, path: PathLike, ..."
```
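Building on the result fields shown above (`chunk.file_path`, `chunk.start_line`, `chunk.end_line`, `chunk.content`), a small helper can render hits as `path:start-end` locations for an agent prompt or a terminal. `format_hits` is a hypothetical helper, not part of Semble; it relies only on those attributes, so this sketch uses `SimpleNamespace` stand-ins instead of a real index.

```python
from collections.abc import Iterable
from types import SimpleNamespace


def format_hits(results: Iterable, preview_chars: int = 40) -> list[str]:
    """Render search results as 'path:start-end  preview' lines.

    Assumes each result exposes .chunk.file_path, .chunk.start_line,
    .chunk.end_line and .chunk.content, as in the README example.
    """
    lines = []
    for result in results:
        chunk = result.chunk
        # Use the first line of the chunk, stripped and truncated, as a preview.
        first_line = chunk.content.splitlines()[0] if chunk.content else ""
        preview = first_line.strip()[:preview_chars]
        lines.append(f"{chunk.file_path}:{chunk.start_line}-{chunk.end_line}  {preview}")
    return lines


# Stand-in result mirroring the README's example values.
fake = SimpleNamespace(
    chunk=SimpleNamespace(
        file_path="model2vec/model.py",
        start_line=127,
        end_line=150,
        content="def save_pretrained(self, path: PathLike, ...",
    )
)
print(format_hits([fake]))
# ['model2vec/model.py:127-150  def save_pretrained(self, path: PathLike']
```

With a real index, the same helper would be called as `format_hits(index.search("save model to disk", top_k=3))`.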

## MCP server
## Main Features

Semble can run as an MCP server so agents (Claude Code, Cursor, etc.) can search any codebase directly.
- **Fast**: indexes a repo in ~250 ms and answers queries in ~1.5 ms, all on CPU.
- **Accurate**: NDCG@10 of 0.854 on our benchmarks, on par with code-specialized transformer models, at a fraction of the size and cost.
- **Local and remote**: pass a local path or a git URL.
- **MCP server**: drop-in tool for Claude Code, Cursor, Codex, OpenCode, and any other MCP-compatible agent.
- **Lightweight**: CPU-only, minimal dependencies.
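Since the same entry point accepts either a local path or a git URL, a caller (or an MCP tool handler) has to pick between `SembleIndex.from_path` and `SembleIndex.from_git`. The sketch below is a hypothetical heuristic, not Semble's own logic: it treats `http(s)://`, `ssh://` and `git://` URLs, plus scp-style `git@host:owner/repo` strings, as remote, and everything else as a local path. The scheme set and the helper names are assumptions.

```python
import re
from urllib.parse import urlparse

# Assumed set of URL schemes that indicate a remote git repository.
REMOTE_SCHEMES = {"http", "https", "ssh", "git"}
# scp-like syntax accepted by git, e.g. git@github.com:MinishLab/semble.git
SCP_LIKE = re.compile(r"^[\w.-]+@[\w.-]+:")


def is_remote(source: str) -> bool:
    """Return True if `source` looks like a git URL rather than a local path."""
    if urlparse(source).scheme in REMOTE_SCHEMES:
        return True
    return bool(SCP_LIKE.match(source))


def constructor_name(source: str) -> str:
    """Name of the SembleIndex classmethod to call for `source` (hypothetical helper)."""
    return "from_git" if is_remote(source) else "from_path"


for src in ["./my-project", "https://github.com/MinishLab/model2vec", "git@github.com:MinishLab/semble.git"]:
    print(src, "->", constructor_name(src))
```

A Windows path such as `C:\code\proj` parses with scheme `c`, which is not in the remote set, so it is still treated as local. With Semble installed, the result could be used as `getattr(SembleIndex, constructor_name(src))(src)`.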

The agent will clone and index repos on demand as you ask questions. Indexes are cached for the lifetime of the session.
## MCP Server

Semble can run as an MCP server so agents can search any codebase directly. Repos are cloned and indexed on demand, and indexes are cached for the lifetime of the session.
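The caching behaviour described above (index a repo the first time it is requested, then reuse that index for the rest of the session) amounts to memoizing one index per repo. A minimal sketch under that assumption; `SessionIndexCache` and the `build` callable are hypothetical names, and Semble's server may key or store its cache differently.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SessionIndexCache(Generic[T]):
    """Build an index per repo on first use and reuse it for the session."""

    def __init__(self, build: Callable[[str], T]) -> None:
        self._build = build  # e.g. a function that clones/reads and indexes a repo
        self._indexes: dict[str, T] = {}

    def get(self, repo: str) -> T:
        # Only the first request for a given repo pays the indexing cost.
        if repo not in self._indexes:
            self._indexes[repo] = self._build(repo)
        return self._indexes[repo]


builds: list[str] = []


def fake_build(repo: str) -> str:
    """Stand-in builder that records each call instead of indexing."""
    builds.append(repo)
    return f"index for {repo}"


cache = SessionIndexCache(fake_build)
cache.get("https://github.com/MinishLab/model2vec")
cache.get("https://github.com/MinishLab/model2vec")
print(builds)  # ['https://github.com/MinishLab/model2vec']
```

With Semble, `build` could be `SembleIndex.from_git` for remote repos; because the cache lives in memory, it disappears when the session ends.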

### Installation
### Setup

#### Claude Code
```bash
@@ -59,7 +80,43 @@ Add to `~/.opencode/config.json`:
}
```

### Tools

| Tool | Description |
|------|-------------|
| `search` | Search a codebase with a natural-language or code query. Pass `repo` as a git URL or local path. |
| `find_related` | Given a file path and line number, return chunks semantically similar to the code at that location. |
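MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request that carries the tool `name` and an `arguments` object; the agent normally builds this message itself. The sketch below shows what such requests could look like. Only `repo` is named in the table above, so the other argument names (`query`, `file_path`, `line`) are assumptions and may not match Semble's actual tool schemas.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request IDs must be unique per session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# `repo` comes from the tool table; `query`, `file_path` and `line` are assumed names.
search_req = tool_call(
    "search",
    {"query": "save model to disk", "repo": "https://github.com/MinishLab/model2vec"},
)
related_req = tool_call(
    "find_related",
    {"repo": "https://github.com/MinishLab/model2vec", "file_path": "model2vec/model.py", "line": 127},
)
print(search_req)
print(related_req)
```

The server lists the real argument names in the `inputSchema` it returns from `tools/list`, which is the authoritative place to check them.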

## Benchmarks

Retrieval quality (NDCG@10) and speed of each method, measured on ~1,250 queries over 63 repositories in 19 languages.

| Method | NDCG@10 | Index time | Query latency (p50) |
|--------|--------:|-----------:|----------:|
| ripgrep | 0.126 | — | 12 ms |
| ColGREP | 0.693 | 5.8 s | 124 ms |
| CodeRankEmbed | 0.765 | 57 s | 16 ms |
| semble | 0.854 | **263 ms** | **1.5 ms** |
| CodeRankEmbed Hybrid | **0.862** | 57 s | 16 ms |

The 137M-parameter CodeRankEmbed Hybrid leads on NDCG@10 by 0.008, while Semble indexes 218x faster and answers queries 11x faster. See [benchmarks](benchmarks/README.md) for per-language results, ablations, and methodology.
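NDCG@10, the quality metric in the table, rewards placing relevant chunks near the top of the first ten results. A minimal per-query sketch using the common log2-discount formulation; the benchmark's exact gain function, relevance labels and aggregation are described in benchmarks/README.md and may differ, and the relevance lists below are made up for illustration.

```python
import math


def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for ranks i = 1..k."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked: list[float], judged: list[float], k: int = 10) -> float:
    """DCG of the returned ranking divided by DCG of the ideal ordering of all judged relevances.

    `ranked` holds the relevance of each returned result in order; `judged` holds the
    relevance of every labelled chunk for the query, including ones never retrieved.
    """
    ideal = dcg_at_k(sorted(judged, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


# One relevant chunk for the query (judged=[1]).
print(ndcg_at_k([1, 0, 0], judged=[1]))  # 1.0: relevant chunk ranked first
print(ndcg_at_k([0, 0, 1], judged=[1]))  # 0.5: same chunk ranked third
```

Normalizing by the ideal ordering of all judged chunks (not just the returned ones) keeps a method from scoring well by simply missing relevant results. A single benchmark number is then typically the mean of this per-query score.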

## License

MIT

## Citing

If you use Semble in your research, please cite the following:

```bibtex
@software{minishlab2026semble,
author = {{van Dongen}, Thomas and Tulkens, St{\'e}phan},
title = {Semble: Fast Code Search for Agents},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.XXXXXXX},
url = {https://github.com/MinishLab/semble},
license = {MIT}
}
```
Binary file added assets/images/semble_logo.png
pyproject.toml: 2 changes (1 addition, 1 deletion)
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "semble"
description = "Instant local code search for agents."
description = "Fast Code Search for Agents"
authors = [{name = "Thomas van Dongen", email = "thomas123@live.nl"}, { name = "Stéphan Tulkens", email = "stephantul@gmail.com"}]
readme = { file = "README.md", content-type = "text/markdown" }
dynamic = ["version"]
uv.lock: 2 changes (1 addition, 1 deletion)
