Joplin AI Note Clustering Benchmark Plugin

This repository contains a Joplin plugin prototype that benchmarks local text embeddings and performs unsupervised note clustering inside the Joplin plugin sandbox.

Goal

The goal of this plugin is to validate that a fully local AI pipeline can run inside Joplin Desktop and provide useful automatic grouping of notes.

Specifically, it demonstrates:

Embedding generation for note text with Transformers.js (running in a worker)
Optional dimensionality reduction with UMAP
Automatic K selection using silhouette score
Final K-Means clustering and sidebar visualization
End-to-end timing metrics for performance discussion

Tech Stack and Packages

Core Runtime

@huggingface/transformers for embedding inference
onnxruntime-web backend assets, copied into the plugin's dist output (dist/onnx-dist) for the WASM runtime
Web Worker for non-blocking inference

Clustering and Math

@saehrimnir/druidjs for UMAP dimensionality reduction
In-project K-Means implementation
In-project cosine-distance silhouette scoring

Build Tooling

TypeScript
Webpack
copy-webpack-plugin to package static assets (data.json and ONNX runtime files)

Note: The build no longer depends on a tools/ directory. Static assets are copied by webpack during npm run dist.

Model Configuration Used

Current defaults:

Embedding model: Xenova/bge-small-en-v1.5
Display name: BGE-small-en-v1.5
DType: q8
Pooling: mean
Normalization: enabled

These settings are defined in src/modelConfig.ts.
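
The Pooling and Normalization defaults correspond to two small vector operations. The following is a minimal sketch of that math only, assuming token-level outputs are available as plain number arrays. The function names meanPool and l2Normalize are hypothetical and are not taken from src/modelConfig.ts; the sketch also ignores padding masks, which a real inference library would handle.

```typescript
// Illustrative only: one row per token, one column per embedding dimension.
function meanPool(tokens: number[][]): number[] {
  if (tokens.length === 0) throw new Error("cannot pool an empty token list");
  // Average each dimension across all tokens.
  return tokens[0].map((_, d) => {
    let sum = 0;
    for (const token of tokens) sum += token[d];
    return sum / tokens.length;
  });
}

// Scale a vector to unit length; a zero vector is returned unchanged.
function l2Normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? vec.slice() : vec.map((x) => x / norm);
}

// One note embedding = normalized mean of its token vectors.
function embedFromTokens(tokens: number[][]): number[] {
  return l2Normalize(meanPool(tokens));
}
```

With normalization enabled, every note vector has length 1, so the dot product of two vectors equals their cosine similarity.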

Data Format

Input file: src/data.json

Important: this is line-delimited JSON (JSONL-style), even though the file extension is .json.

Each line is one note-like record, for example:

{"title":"Linear Algebra","body":"Vectors, matrices, eigenvalues..."}
{"text":"Raw text-only note format is also supported"}

Accepted fields:

text
or title + body (combined during parsing)
optional label

For fast local testing, the plugin currently limits processing to the first 100 records.
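
The parsing rules above can be sketched as follows. This is an illustration, not the plugin's actual parser: the names NoteRecord and parseNotes are hypothetical, joining title and body with a newline is an assumption, and the limit here counts valid records rather than raw lines.

```typescript
interface NoteRecord {
  text: string;
  label?: string;
}

// Parse line-delimited JSON into note payloads, keeping at most `limit` records.
function parseNotes(raw: string, limit: number = 100): NoteRecord[] {
  const notes: NoteRecord[] = [];
  for (const line of raw.split(/\r?\n/)) {
    if (notes.length >= limit) break;
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    let rec: any;
    try {
      rec = JSON.parse(trimmed);
    } catch {
      continue; // skip malformed lines instead of failing the whole file
    }
    if (typeof rec !== "object" || rec === null) continue;
    let text = "";
    if (typeof rec.text === "string") {
      text = rec.text;
    } else {
      // Assumption: title and body are joined with a newline.
      const parts = [rec.title, rec.body].filter(
        (p) => typeof p === "string" && p.trim() !== ""
      );
      text = parts.join("\n");
    }
    if (text.trim() === "") continue; // nothing to embed
    const note: NoteRecord = { text };
    if (typeof rec.label === "string") note.label = rec.label;
    notes.push(note);
  }
  return notes;
}
```

Skipping malformed lines keeps one bad record from aborting a benchmark run; a stricter parser could throw instead.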

Pipeline Working (Step by Step)

The plugin starts and opens a sidebar panel.
data.json is loaded from the plugin installation directory.
Each line of the data file is parsed into a note text payload.
The worker loads the embedding model and performs a warmup inference.
The main thread sends notes to the worker one by one for embedding.
The worker returns an embedding vector and the inference time for each note.
The plugin optionally runs UMAP to reduce the vector dimensions.
The plugin tries multiple K values (from K=2 up to an adaptive maximum).
For each K, K-Means is executed and the silhouette score is computed.
The best K is the one with the highest silhouette score.
A final K-Means run uses the best K.
Cluster groups and benchmark metrics are rendered in the sidebar.
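
The K sweep in the steps above can be sketched as a small loop. This is a sketch under assumptions: the clustering and scoring functions are passed in as parameters, and adaptiveMaxK is a hypothetical rule (the square root of the note count, capped at 10), since the actual adaptive maximum is not described here.

```typescript
type ClusterFn = (vectors: number[][], k: number) => number[];
type ScoreFn = (vectors: number[][], labels: number[]) => number;

interface KSweepResult {
  bestK: number;
  bestScore: number;
  scores: { k: number; score: number }[];
}

// Hypothetical cap: sqrt(n), at most hardCap, and never more than n - 1.
function adaptiveMaxK(n: number, hardCap: number = 10): number {
  return Math.max(2, Math.min(hardCap, Math.floor(Math.sqrt(n)), n - 1));
}

// Try K = 2..maxK and keep the K with the highest score.
function selectBestK(
  vectors: number[][],
  cluster: ClusterFn,
  score: ScoreFn,
  maxK: number = adaptiveMaxK(vectors.length)
): KSweepResult {
  if (vectors.length < 3) {
    throw new Error("need at least 3 vectors to compare K values");
  }
  const scores: { k: number; score: number }[] = [];
  let bestK = 2;
  let bestScore = -Infinity;
  for (let k = 2; k <= maxK; k++) {
    const labels = cluster(vectors, k);
    const s = score(vectors, labels);
    scores.push({ k, score: s });
    if (s > bestScore) {
      bestScore = s;
      bestK = k;
    }
  }
  return { bestK, bestScore, scores };
}
```

Ties go to the smaller K because a later K replaces the current best only when its score is strictly higher.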

Clustering Method Used

This plugin uses a classic unsupervised clustering stack:

Feature space: transformer embeddings (semantic vectors)
Distance basis: cosine similarity / cosine distance
Optional projection: UMAP (for better separability and lower compute)
Clustering algorithm: K-Means
Model selection metric: silhouette score
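
For reference, a compact K-Means can look like the sketch below. It is not the in-project implementation: the farthest-point seeding, the iteration cap, and the function names are assumptions chosen to keep the example deterministic. It uses squared Euclidean distance, which for unit-length vectors equals 2 × (1 − cosine similarity), so nearest-centroid assignments agree with cosine distance when the inputs are normalized.

```typescript
function sqDist(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return s;
}

// Index of the centroid closest to the point.
function nearestCentroid(point: number[], centroids: number[][]): number {
  let best = 0;
  for (let c = 1; c < centroids.length; c++) {
    if (sqDist(point, centroids[c]) < sqDist(point, centroids[best])) best = c;
  }
  return best;
}

function kMeans(
  points: number[][],
  k: number,
  maxIter: number = 50
): { labels: number[]; centroids: number[][] } {
  if (k < 1 || k > points.length) {
    throw new Error("k must be between 1 and the number of points");
  }
  // Assumed seeding: start at the first point, then repeatedly add the
  // point farthest from all centroids chosen so far.
  const centroids: number[][] = [points[0].slice()];
  while (centroids.length < k) {
    let farthest = 0;
    let farthestDist = -1;
    points.forEach((p, i) => {
      const d = Math.min(...centroids.map((c) => sqDist(p, c)));
      if (d > farthestDist) {
        farthestDist = d;
        farthest = i;
      }
    });
    centroids.push(points[farthest].slice());
  }
  let labels = points.map((p) => nearestCentroid(p, centroids));
  for (let iter = 0; iter < maxIter; iter++) {
    // Update step: move each centroid to the mean of its members.
    for (let c = 0; c < k; c++) {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) continue; // keep the old centroid
      centroids[c] = members[0].map(
        (_, d) => members.reduce((s, m) => s + m[d], 0) / members.length
      );
    }
    // Assignment step: stop once no label changes.
    const next = points.map((p) => nearestCentroid(p, centroids));
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break;
  }
  return { labels, centroids };
}
```

Deterministic seeding makes benchmark runs repeatable; random or k-means++ seeding is a common alternative.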

Why this combination:

Embeddings capture semantic meaning of notes.
K-Means is simple, explainable, and fast enough for a proof of concept.
Silhouette score gives an objective way to pick K.
UMAP can improve cluster geometry and speed for larger sets.
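
To make the silhouette criterion concrete: for each note, a is its mean distance to the other notes in its own cluster, b is the lowest mean distance to any other cluster, and its score is (b − a) / max(a, b). The overall score is the average across notes. The sketch below uses cosine distance; it is an illustration rather than the in-project code, and treating singleton clusters as score 0 is a common convention assumed here.

```typescript
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 1; // assumed convention for zero vectors
  return 1 - dot / Math.sqrt(na * nb);
}

// Mean silhouette over all points; labels must be integers 0..k-1.
function silhouetteScore(vectors: number[][], labels: number[]): number {
  const n = vectors.length;
  const k = Math.max(...labels) + 1;
  const sizes: number[] = [];
  for (let c = 0; c < k; c++) sizes.push(0);
  labels.forEach((l) => sizes[l]++);

  let total = 0;
  for (let i = 0; i < n; i++) {
    const own = labels[i];
    if (sizes[own] <= 1) continue; // singleton cluster contributes 0
    // Sum of distances from point i to every cluster's members.
    const sums: number[] = sizes.map(() => 0);
    for (let j = 0; j < n; j++) {
      if (j !== i) sums[labels[j]] += cosineDistance(vectors[i], vectors[j]);
    }
    const a = sums[own] / (sizes[own] - 1);
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && sizes[c] > 0) b = Math.min(b, sums[c] / sizes[c]);
    }
    if (b === Infinity) continue; // only one non-empty cluster
    const denom = Math.max(a, b);
    if (denom > 0) total += (b - a) / denom;
  }
  return total / n;
}
```

Scores close to 1 indicate tight, well-separated clusters, and negative scores suggest notes assigned to the wrong cluster. This direct version compares every pair of notes, so its cost grows quadratically with note count.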

Performance Reporting

The sidebar reports:

Model load time
Warmup time
Per-note embedding latency
Average latency (excluding warmup)
Total embedding time
Silhouette score for each tested K
Selected best K and final cluster sizes
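
The timing figures can be derived from the raw measurements roughly as follows. This is a sketch with hypothetical names (BenchmarkInput, summarizeBenchmark); it assumes the warmup inference is timed separately, so the average over per-note latencies already excludes it, and that total embedding time is the sum of per-note latencies.

```typescript
interface BenchmarkInput {
  modelLoadMs: number;
  warmupMs: number;
  perNoteMs: number[]; // one entry per embedded note, warmup not included
}

interface BenchmarkSummary {
  modelLoadMs: number;
  warmupMs: number;
  noteCount: number;
  totalEmbeddingMs: number;
  averageLatencyMs: number;
}

function summarizeBenchmark(input: BenchmarkInput): BenchmarkSummary {
  const total = input.perNoteMs.reduce((sum, ms) => sum + ms, 0);
  const count = input.perNoteMs.length;
  return {
    modelLoadMs: input.modelLoadMs,
    warmupMs: input.warmupMs,
    noteCount: count,
    totalEmbeddingMs: total,
    // Avoid dividing by zero when no notes were embedded.
    averageLatencyMs: count === 0 ? 0 : total / count,
  };
}

// Plain-text lines suitable for a simple sidebar listing.
function formatSummary(s: BenchmarkSummary): string[] {
  return [
    "Model load time: " + s.modelLoadMs.toFixed(1) + " ms",
    "Warmup time: " + s.warmupMs.toFixed(1) + " ms",
    "Notes embedded: " + s.noteCount,
    "Average latency (excluding warmup): " + s.averageLatencyMs.toFixed(1) + " ms",
    "Total embedding time: " + s.totalEmbeddingMs.toFixed(1) + " ms",
  ];
}
```

Keeping the warmup out of the average matters because the first inference typically includes one-time initialization cost.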

Observed behavior in this repository

In prior runs, the pipeline ran successfully on both small samples and larger corpora (see the demo section below).
Worker-based inference keeps the UI responsive during embedding.
The main bottleneck is embedding inference time, which grows roughly linearly with note count because notes are embedded one at a time.

Build and Run

npm install
npm run dist

This creates a plugin archive in publish/.

Install in Joplin:

Open Tools -> Options -> Plugins
Choose Install from file
Select the generated .jpl
Restart Joplin

Demo of Pipeline Working

https://drive.google.com/file/d/1VPv44PIQ71v0Q-gJQ-1Qtr8ZV9MiWVbK/view?usp=sharing
