LELA

A standalone pipeline with swappable stages: NER → candidate generation → reranking → disambiguation. It uses file-based storage (JSONL for the knowledge base and outputs) and optional caching in .ner_cache/.

Install

Requirements: Python 3.10-3.12 (Python 3.13 is NOT supported because of vLLM) and CUDA 12.x for GPU support.

python3.10 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Quick start

Web UI (Gradio)

Launch the interactive web interface:

python app.py

Open http://localhost:7860 and configure the pipeline through the UI. See docs/WEB_APP.md for details.

Troubleshooting

If you encounter issues, see docs/TROUBLESHOOTING.md for solutions to common problems including:

PyTorch CUDA mismatch
vLLM installation failures
GPU memory issues

CLI

Prepare a JSONL knowledge base with fields: id, title, description (plus optional metadata).
Create a config file, e.g. config.json:

{
  "loader": {"name": "pdf", "params": {}},
  "ner": {"name": "spacy", "params": {"model": "en_core_web_sm"}},
  "candidate_generator": {"name": "bm25", "params": {}},
  "reranker": {"name": "none", "params": {}},
  "disambiguator": {"name": "popularity", "params": {}},
  "knowledge_base": {"name": "jsonl", "params": {"path": "kb.jsonl"}},
  "cache_dir": ".ner_cache",
  "batch_size": 1
}

Run:

python -m lela.cli --config config.json --input docs/file1.pdf docs/file2.pdf --output outputs.jsonl
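The knowledge base from the first step is a plain JSONL file, so it can be produced with a few lines of Python. The following is a minimal sketch, assuming only the three documented fields (id, title, description). The helper name write_kb and the sample entries are made up for illustration and are not part of lela.

```python
import json
from pathlib import Path

# Illustrative entries only: these ids, titles and descriptions are invented.
ENTRIES = [
    {"id": "Q1", "title": "Paris", "description": "Capital city of France"},
    {"id": "Q2", "title": "Paris Hilton", "description": "American media personality"},
]

# The three fields the CLI section lists for a JSONL knowledge base.
REQUIRED_FIELDS = ("id", "title", "description")


def write_kb(entries, path):
    """Write one JSON object per line, checking the documented fields first."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as fh:
        for entry in entries:
            missing = [f for f in REQUIRED_FIELDS if f not in entry]
            if missing:
                raise ValueError(f"entry {entry!r} is missing {missing}")
            fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return path


if __name__ == "__main__":
    # Matches the "path" used in the example config above.
    write_kb(ENTRIES, "kb.jsonl")
```

Extra keys on an entry are written through unchanged, which fits the "plus optional metadata" note. How lela interprets such keys is not covered here.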

Example: lightweight fuzzy run (no heavy models)

python -m lela.cli \
  --config data/configs/simplewiki_fuzzy_simple.json \
  --input data/docs/simple-english-wiki/corpus.txt \
  --output outputs.jsonl

This uses the simple regex NER, fuzzy candidates, first-candidate disambiguation, and the YAGO-derived KB JSONL.
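A run like the one above writes outputs.jsonl, which can be inspected with the standard library alone. This is a minimal sketch, assuming the per-line fields listed under Notes (id, entities, and per-entity text, entity_id, entity_title). The function name iter_linked_entities is hypothetical and not part of lela.

```python
import json


def iter_linked_entities(path):
    """Yield (document id, mention text, entity_id, entity_title) per entity."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            doc = json.loads(line)
            for ent in doc.get("entities", []):
                yield (
                    doc.get("id"),
                    ent.get("text"),
                    ent.get("entity_id"),
                    ent.get("entity_title"),
                )


if __name__ == "__main__":
    for doc_id, mention, entity_id, title in iter_linked_entities("outputs.jsonl"):
        print(doc_id, mention, entity_id, title, sep="\t")
```

Using .get() keeps the reader tolerant of records where a field is absent, for example a mention that was not resolved to any entity.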

Python API

from lela import Lela

# Load from a JSON config file path
lela = Lela("config.json")
results = lela.run("docs/file1.txt")

# Or pass a dict directly
import json
with open("config.json") as f:
    config = json.load(f)
lela = Lela(config)
results = lela.run("docs/file1.txt", "docs/file2.txt")

Available components

Loaders: text, json, jsonl, pdf, docx, html
NER: spacy, gliner, simple (regex)
Candidate generators: bm25, dense, fuzzy
Rerankers: cross_encoder, none
Disambiguators: popularity, first, llm
Knowledge bases: jsonl, wikipedia, wikidata
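Because every stage is selected by name, a typo in a config only surfaces when the pipeline is built. The sketch below checks a config dict against the names listed above before running anything. It is not part of lela: check_config is a hypothetical helper, the section keys are taken from the example config.json, and it assumes that all six sections are required and that the regex NER is selected with the name "simple".

```python
import json

# Component names copied from the "Available components" list.
AVAILABLE = {
    "loader": {"text", "json", "jsonl", "pdf", "docx", "html"},
    "ner": {"spacy", "gliner", "simple"},
    "candidate_generator": {"bm25", "dense", "fuzzy"},
    "reranker": {"cross_encoder", "none"},
    "disambiguator": {"popularity", "first", "llm"},
    "knowledge_base": {"jsonl", "wikipedia", "wikidata"},
}


def check_config(config):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for section, names in AVAILABLE.items():
        block = config.get(section)
        if not isinstance(block, dict) or "name" not in block:
            problems.append(f"{section}: missing section or 'name' key")
        elif block["name"] not in names:
            problems.append(f"{section}: unknown component {block['name']!r}")
    return problems


if __name__ == "__main__":
    with open("config.json", encoding="utf-8") as fh:
        for problem in check_config(json.load(fh)):
            print(problem)
```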

Data & configs

The data/ directory is gitignored by default. Keep shareable configs in data/configs/ (tracked).
Sample configs provided:
- data/configs/simplewiki_fuzzy_simple.json

Conversion utilities

YAGO labels TSV → JSONL KB:

python -m lela.scripts.convert_yago_labels data/kb/yagoLabels.tsv data/kb/yago_labels_en.jsonl

Notes

Outputs are JSONL (one line per document with resolved entities).
- Each line: id, text, entities (with text, start, end, label, entity_id, entity_title, entity_description, candidates).
Cache lives in .ner_cache/ keyed by file path, mtime, and size.
No dependency on LELA; integration would be optional if added later.
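To illustrate why keying the cache on file path, mtime, and size works, here is a minimal sketch of such a key. It is an assumption about the general technique, not lela's actual implementation: the function name cache_key, the use of SHA-256, and the exact string layout are all made up for this example.

```python
import hashlib
import os


def cache_key(path):
    """Derive a key from absolute path, mtime and size of a file."""
    st = os.stat(path)
    # Any edit that changes the size or modification time yields a new key,
    # so stale cache entries are simply never looked up again.
    raw = f"{os.path.abspath(path)}|{st.st_mtime_ns}|{st.st_size}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

The trade-off of this kind of key is that it avoids reading file contents, which is cheap for large documents, but a file rewritten with identical size and timestamp would not be detected as changed.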
