TextDB is a text-centric data management system that supports declarative and scalable text processing, in particular information extraction. It can be viewed as a DBMS designed specifically for text.


  • Disk-based storage and indexing of text documents
  • Various text-centric operators, e.g., keyword search, regex, dictionary-based lookups, fuzzy search, and NLP
  • Index-based query processing
  • Query engine to execute plans consisting of operators
  • GUI for easy query formulation
  • Declarative query language "TextQL" (under development)
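
To make the idea of a plan "consisting of operators" concrete, here is a minimal sketch of a pull-based operator pipeline. It is not TextDB's actual API: the `Operator` interface and the `ScanSource`, `KeywordFilter`, and `RegexExtractor` classes are hypothetical names chosen for illustration, and an in-memory list stands in for the disk-based, indexed document store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: all names below are assumptions, not TextDB classes.
final class OperatorPipelineSketch {

    /** Hypothetical pull-based operator: returns the next result, or null when exhausted. */
    interface Operator {
        String next();
    }

    /** Source operator scanning an in-memory list (stand-in for the document store). */
    static final class ScanSource implements Operator {
        private final Iterator<String> it;
        ScanSource(List<String> docs) { this.it = docs.iterator(); }
        public String next() { return it.hasNext() ? it.next() : null; }
    }

    /** Passes through only the documents that contain the keyword (case-insensitive). */
    static final class KeywordFilter implements Operator {
        private final Operator input;
        private final String keyword;
        KeywordFilter(Operator input, String keyword) {
            this.input = input;
            this.keyword = keyword.toLowerCase(Locale.ROOT);
        }
        public String next() {
            String doc;
            while ((doc = input.next()) != null) {
                if (doc.toLowerCase(Locale.ROOT).contains(keyword)) return doc;
            }
            return null;
        }
    }

    /** Emits every regex match found in the documents produced by its input. */
    static final class RegexExtractor implements Operator {
        private final Operator input;
        private final Pattern pattern;
        private Matcher current; // matcher over the document currently being consumed
        RegexExtractor(Operator input, String regex) {
            this.input = input;
            this.pattern = Pattern.compile(regex);
        }
        public String next() {
            while (true) {
                if (current != null) {
                    if (current.find()) return current.group();
                    current = null; // document exhausted; move on to the next one
                }
                String doc = input.next();
                if (doc == null) return null;
                current = pattern.matcher(doc);
            }
        }
    }

    /** Drains the root operator of a plan and collects all results. */
    static List<String> run(Operator root) {
        List<String> out = new ArrayList<>();
        for (String r = root.next(); r != null; r = root.next()) out.add(r);
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Zika outbreak reported on 2016-02-01",
            "Weather report for 2016-03-05",
            "Another zika case, dated 2016-04-10 and 2016-04-12");
        // Plan: scan -> keyword filter -> regex extraction of dates.
        Operator plan = new RegexExtractor(
            new KeywordFilter(new ScanSource(docs), "zika"),
            "\\d{4}-\\d{2}-\\d{2}");
        System.out.println(run(plan)); // prints [2016-02-01, 2016-04-10, 2016-04-12]
    }
}
```

In this sketch each operator pulls from its input only when asked for the next result, so a plan is just a chain of operators and the engine drains the root. An index-based engine would typically replace the scan-plus-filter pair with an index lookup that returns only candidate documents.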

Architecture:
[Figure: TextDB architecture]

Query Execution Flow:
[Figure: query execution flow]

Development Tools:

  • Eclipse
  • Lucene
  • Maven
  • YourKit