Sagin

Sagin (색인, Korean for "index", pronounced [sæ-gin]) is a self-hosted, unified document search tool for the workplace.

Feature Roadmap

One place for searching all documents
Fast keyword search & full-text search
Similarity search & Summarization
Safe AI integration with data protection policy
Advanced filter by metadata
Knowledge graph by references

Supported Sources

Architecture

Overview

 Stores (Notion, Confluence, etc)
   ▲
   │
   │                    Metadata / Entities (Postgres)
Indexers ────────────►  Search Indexes      (Postgres or MeiliSearch)
   ▲                    Knowledge Graph     (Postgres or Neo4j)
   │                           ▲
   │                           │
   │                           │
  App (UI, System Admin)───────┘

Database

Sagin is designed so that all components can share a single Postgres instance whenever possible:

Job queue by Graphile Worker
Keyword search & similarity search by ParadeDB
Graph queries & page rank by Apache AGE

This is possible thanks to Postgres' capable extension system. However, this setup may be impractical in some environments, for example when organization policy requires a managed Postgres service (which may not allow these extensions), or when a single instance cannot handle the required scale. Sagin therefore tries to provide several options.
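To make those options concrete, here is a hypothetical TypeScript configuration shape. The names `BackendConfig`, `resolveBackends`, `meilisearchUrl`, and `neo4jUrl` are invented for illustration and are not Sagin's actual settings. The sketch defaults to the all-Postgres setup and only asks for extra connection details when an external backend is chosen.

```typescript
// Hypothetical configuration shape: illustrative only, not Sagin's real schema.
type SearchBackend = "paradedb" | "meilisearch";
type GraphBackend = "age" | "neo4j";

interface BackendConfig {
  postgresUrl: string; // always required: metadata, entities, and the job queue live in Postgres
  search: SearchBackend;
  graph: GraphBackend;
  meilisearchUrl?: string; // only needed when search = "meilisearch"
  neo4jUrl?: string; // only needed when graph = "neo4j"
}

// Fill in defaults so that, unless told otherwise, every component shares
// the single Postgres instance (ParadeDB for search, Apache AGE for graphs).
function resolveBackends(
  partial: Partial<BackendConfig> & { postgresUrl: string },
): BackendConfig {
  const config: BackendConfig = {
    ...partial,
    search: partial.search ?? "paradedb",
    graph: partial.graph ?? "age",
  };
  if (config.search === "meilisearch" && !config.meilisearchUrl) {
    throw new Error("meilisearchUrl is required when search = 'meilisearch'");
  }
  if (config.graph === "neo4j" && !config.neo4jUrl) {
    throw new Error("neo4jUrl is required when graph = 'neo4j'");
  }
  return config;
}

// Everything in one Postgres instance:
const allInPostgres = resolveBackends({ postgresUrl: "postgres://localhost:5432/sagin" });
// allInPostgres.search === "paradedb", allInPostgres.graph === "age"

// Keyword search moved to MeiliSearch (7700 is its default port), graph stays on AGE:
const withMeili = resolveBackends({
  postgresUrl: "postgres://localhost:5432/sagin",
  search: "meilisearch",
  meilisearchUrl: "http://localhost:7700",
});
```

Keeping Postgres mandatory mirrors the overview diagram, where metadata and entities always live in Postgres while search indexes and the knowledge graph can move out.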

Search Indexes

Sagin indexers maintain search indexes for documents, powered by either ParadeDB or MeiliSearch (OpenSearch support may be added later).

Why ParadeDB?

Sagin uses ParadeDB because it is not only a Postgres extension but also provides a rich feature set:

It is built on Tantivy and implements accurate BM25 scoring, delivering faster searches with fewer resources than Elasticsearch.

Queries are fully customizable, with hybrid BM25/HNSW scoring and ParadeQL.
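For readers unfamiliar with BM25, the following is a minimal TypeScript sketch of the classic Okapi BM25 formula, using the Lucene-style IDF and the common defaults k1 = 1.2 and b = 0.75. It is not ParadeDB's or Tantivy's code: real engines read these statistics from an inverted index, use proper tokenizers, and may differ in details.

```typescript
// Illustrative Okapi BM25 scorer over whitespace-tokenized text.
// Not ParadeDB/Tantivy code; parameters are the common textbook defaults.
const K1 = 1.2; // term-frequency saturation
const B = 0.75; // strength of document-length normalization

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

function bm25Scores(query: string, docs: string[]): number[] {
  const tokenized = docs.map(tokenize);
  const n = tokenized.length;
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);
  const queryTerms = Array.from(new Set(tokenize(query)));

  // Document frequency: how many documents contain each query term.
  const df = new Map<string, number>();
  for (const term of queryTerms) {
    df.set(term, tokenized.filter((d) => d.includes(term)).length);
  }

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      const dfTerm = df.get(term) ?? 0;
      // Lucene-style IDF: rarer terms weigh more, and the value is never negative.
      const idf = Math.log(1 + (n - dfTerm + 0.5) / (dfTerm + 0.5));
      // Longer-than-average documents are penalized via the length ratio.
      const norm = tf + K1 * (1 - B + (B * doc.length) / avgLen);
      score += (idf * tf * (K1 + 1)) / norm;
    }
    return score;
  });
}

const scores = bm25Scores("search", [
  "postgres full text search",
  "search search search",
  "graph database",
]);
// The second document ranks highest, the third scores 0.
```

Because of the k1 term, repeating a word three times raises the score much less than threefold, which keeps keyword stuffing from dominating the ranking.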

One concern is that the project is still at a very early stage, but it already has enough features and is under very active development. The ParadeDB team responded to our request for a Korean tokenizer in just one day.

Why MeiliSearch?

Sagin uses MeiliSearch because Karrot is a Korean company, and MeiliSearch was the only option that delivered meaningful search quality for Korean out of the box. The others provide only a simple bi-gram tokenizer for Korean.
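To see why a bi-gram tokenizer falls short, the sketch below implements a plain character bi-gram tokenizer in TypeScript (the function name `bigrams` is illustrative). Korean attaches particles and endings directly to words, so overlapping two-syllable windows produce tokens that straddle morpheme boundaries. The input is assumed to be NFC-normalized, so each Hangul syllable is one code point.

```typescript
// Character bi-gram tokenizer, the kind of CJK fallback many engines offer.
// Each whitespace-separated word is split into overlapping two-character
// tokens, with no knowledge of Korean morphology.
function bigrams(text: string): string[] {
  const tokens: string[] = [];
  for (const word of text.split(/\s+/)) {
    const chars = Array.from(word); // iterate by code point, not UTF-16 unit
    if (chars.length === 1) tokens.push(word);
    for (let i = 0; i + 1 < chars.length; i++) {
      tokens.push(chars[i] + chars[i + 1]);
    }
  }
  return tokens;
}

// "검색엔진을 사용한다" ("uses a search engine")
const tokens = bigrams("검색엔진을 사용한다");
// → ["검색", "색엔", "엔진", "진을", "사용", "용한", "한다"]
```

Tokens such as "색엔", "진을", and "용한" cross word or particle boundaries and carry no meaning, which adds noise to matching. A morphological analyzer would instead separate the nouns 검색 ("search") and 엔진 ("engine") from the object particle 을.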

MeiliSearch currently supports either keyword search or similarity search; hybrid search is still on its roadmap. This is acceptable because Sagin currently treats keyword search and AI search as separate features.

MeiliSearch is also licensed under MIT.

Knowledge Graph

TBD

Indexers

Indexers are background workers that crawl content from stores and sync it with the search indexes.

Typically, one indexer is configured per store API, using a single set of credentials (authority). Configuring multiple indexers for the same store is not recommended, for reasons such as rate limiting on the store's API endpoint.

  Notion    Confluence    GitHub
     ▲           ▲           ▲
     │           │           │
     │           │           │
  Indexer     Indexer     Indexer
     │           │           │
     └───────────┼───────────┘
                 │
                 ▼
              Database
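A single incremental sync pass might look like the TypeScript sketch below. Every interface and name here (`StoreClient`, `listChangedPages`, `SearchIndex`, `syncOnce`, the 350 ms delay) is hypothetical and not Sagin's actual code. It crawls pages changed since the last checkpoint, upserts them into the index, and pauses between requests. Because that pacing is per indexer, two indexers pointed at the same store would double the request rate, which is why one indexer per store is recommended.

```typescript
// Hypothetical interfaces for illustration; not Sagin's actual types.
interface StorePage {
  id: string;
  title: string;
  body: string;
  updatedAt: number; // epoch milliseconds
}

interface StoreClient {
  // Returns one batch of pages changed after `since`, plus a cursor for the next batch.
  listChangedPages(
    since: number,
    cursor?: string,
  ): Promise<{ pages: StorePage[]; nextCursor?: string }>;
}

interface SearchIndex {
  upsert(pages: StorePage[]): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// One incremental sync pass. Returns the newest `updatedAt` seen,
// to be used as the `since` checkpoint of the next pass.
async function syncOnce(
  store: StoreClient,
  index: SearchIndex,
  since: number,
  delayMs = 350, // hypothetical pause between store requests
): Promise<number> {
  let cursor: string | undefined;
  let latest = since;
  do {
    const { pages, nextCursor } = await store.listChangedPages(since, cursor);
    await index.upsert(pages);
    for (const page of pages) latest = Math.max(latest, page.updatedAt);
    cursor = nextCursor;
    // Only wait if another request to the store is about to follow.
    if (cursor) await sleep(delayMs);
  } while (cursor);
  return latest;
}
```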

A "Store" is the service where the actual documents are produced, such as Notion or Confluence. A "Source" is a specific locator within a store.

When an admin user registers a source, a corresponding index is created and a document synchronization task is registered. As a result, one indexer manages multiple indexes, one per registered source.
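The store/source/index relationship can be sketched as a small in-memory TypeScript model. All names (`IndexerConfig`, `SourceRecord`, `Registry`, the `indexerId__sourceId` naming scheme) are hypothetical. In Sagin the sync task would presumably go to the job queue (Graphile Worker); here a plain array stands in for it.

```typescript
// Hypothetical data model for illustration only.
interface IndexerConfig {
  id: string;
  store: "notion" | "confluence" | "github";
}

interface SourceRecord {
  id: string;
  indexerId: string; // the indexer responsible for this source's store
  locator: string; // store-specific location (hypothetical field)
}

interface SyncTask {
  indexName: string;
  sourceId: string;
}

class Registry {
  readonly indexes = new Map<string, SourceRecord>(); // index name -> source
  readonly tasks: SyncTask[] = []; // stand-in for a real job queue

  // Registering a source creates its index and queues a sync task for it.
  registerSource(indexer: IndexerConfig, source: SourceRecord): string {
    if (source.indexerId !== indexer.id) {
      throw new Error("source belongs to a different indexer");
    }
    const indexName = `${indexer.id}__${source.id}`;
    this.indexes.set(indexName, source);
    this.tasks.push({ indexName, sourceId: source.id });
    return indexName;
  }

  // One indexer manages every index created for its sources.
  indexesOf(indexerId: string): string[] {
    return Array.from(this.indexes)
      .filter(([, source]) => source.indexerId === indexerId)
      .map(([indexName]) => indexName);
  }
}
```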

Data Protection Policy

Sagin may send data to OpenAI to request text embeddings. To protect sensitive documents, users can disable this by specifying a data protection policy at the indexer, source, or document level. A document is not sent externally if any of these scopes that contain it is marked as protected.
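The "any protected scope wins" rule can be expressed as a simple OR over the three levels. The TypeScript sketch below uses hypothetical names (`ProtectionScopes`, `mayEmbedExternally`, `partitionForEmbedding`) and only decides which documents may leave Sagin; it does not say what happens to the withheld ones.

```typescript
// Hypothetical shapes: each flag is true when that scope's policy is "protected".
interface ProtectionScopes {
  indexerProtected: boolean;
  sourceProtected: boolean;
  documentProtected: boolean;
}

// A document may be sent to an external service (e.g. OpenAI embeddings)
// only if none of its enclosing scopes is marked protected.
function mayEmbedExternally(scopes: ProtectionScopes): boolean {
  return !(scopes.indexerProtected || scopes.sourceProtected || scopes.documentProtected);
}

interface PendingDoc {
  id: string;
  text: string;
  scopes: ProtectionScopes;
}

// Split a batch into documents that may be sent out and ones that must stay in Sagin.
function partitionForEmbedding(docs: PendingDoc[]): {
  external: PendingDoc[];
  withheld: PendingDoc[];
} {
  const external: PendingDoc[] = [];
  const withheld: PendingDoc[] = [];
  for (const doc of docs) {
    (mayEmbedExternally(doc.scopes) ? external : withheld).push(doc);
  }
  return { external, withheld };
}
```

Treating the levels as an OR means a single protected flag anywhere in the hierarchy is enough; a document cannot opt back out of protection set on its source or indexer.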

LICENSE

See LICENSE
