Skip to content

NDobricic/LELA

Repository files navigation

LELA

Standalone, swappable NER → candidate generation → rerank → disambiguation pipeline. Uses file-based storage (JSONL for KB and outputs) and optional caching in .ner_cache/.

Install

Requirements: Python 3.10-3.12 (Python 3.13 is NOT supported due to vLLM), CUDA 12.x for GPU support

python3.10 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Quick start

Web UI (Gradio)

Launch the interactive web interface:

python app.py

Open http://localhost:7860 and configure the pipeline through the UI. See docs/WEB_APP.md for details.

Troubleshooting

If you encounter issues, see docs/TROUBLESHOOTING.md for solutions to common problems including:

  • PyTorch CUDA mismatch
  • vLLM installation failures
  • GPU memory issues

CLI

  1. Prepare a JSONL knowledge base with fields: id, title, description (plus optional metadata).
  2. Create a config file, e.g. config.json:
{
  "loader": {"name": "pdf", "params": {}},
  "ner": {"name": "spacy", "params": {"model": "en_core_web_sm"}},
  "candidate_generator": {"name": "bm25", "params": {}},
  "reranker": {"name": "none", "params": {}},
  "disambiguator": {"name": "popularity", "params": {}},
  "knowledge_base": {"name": "jsonl", "params": {"path": "kb.jsonl"}},
  "cache_dir": ".ner_cache",
  "batch_size": 1
}
  1. Run:
python -m lela.cli --config config.json --input docs/file1.pdf docs/file2.pdf --output outputs.jsonl

Example: lightweight fuzzy run (no heavy models)

python -m lela.cli \
  --config data/configs/simplewiki_fuzzy_simple.json \
  --input data/docs/simple-english-wiki/corpus.txt \
  --output outputs.jsonl

This uses the simple regex NER, fuzzy candidates, first-candidate disambiguation, and the YAGO-derived KB JSONL.

Python API

from lela import Lela

# Load from a JSON config file path
lela = Lela("config.json")
results = lela.run("docs/file1.txt")

# Or pass a dict directly
import json
config = json.load(open("config.json"))
lela = Lela(config)
results = lela.run("docs/file1.txt", "docs/file2.txt")

Available components

  • Loaders: text, json, jsonl, pdf, docx, html
  • NER: spacy, gliner, simple (regex)
  • Candidate generators: bm25, dense, fuzzy
  • Rerankers: cross_encoder, none
  • Disambiguators: popularity, first, llm
  • Knowledge bases: jsonl, wikipedia, wikidata

Data & configs

  • The data/ directory is gitignored by default. Keep shareable configs in data/configs/ (tracked).
  • Sample configs provided:
    • data/configs/simplewiki_fuzzy_simple.json

Conversion utilities

  • YAGO labels TSV → JSONL KB:
    python -m lela.scripts.convert_yago_labels data/kb/yagoLabels.tsv data/kb/yago_labels_en.jsonl

Notes

  • Outputs are JSONL (one line per document with resolved entities).
    • Each line: id, text, entities (with text, start, end, label, entity_id, entity_title, entity_description, candidates).
  • Cache lives in .ner_cache/ keyed by file path, mtime, and size.
  • No dependency on LELA; integration would be optional if added later.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages