Please note that due to the Python 2 EOL, KonText 0.13.x is the last version that runs on Python 2. The next release (planned for Q3/Q4 2020) will run only on Python 3. For master branch users: the last commit supporting Python 2 is tagged py2_last_version and the first one supporting Python 3 is tagged py3_initial_version. To upgrade, please refer to doc/py2to3.md. For new installations, please follow doc/INSTALL.md.
- Build and installation
- Customization and contribution
- Notable users
KonText is an advanced corpus query interface and corpus data integration middleware built around the corpus search engine Manatee-open. Development is maintained by the Institute of the Czech National Corpus.
Notable end-user features
- fully editable query chain
- any operation from a user defined sequence (e.g. query -> filter -> sample -> sorting) can be changed and the whole sequence is then re-executed.
- advanced CQL editor with syntax highlighting and attribute recognition
- interactive PoS tag tool - for positional PoS tag formats, an interactive tool can be used to write tag queries
- support for spoken corpora
- defined concordance segments can be played back as audio
- KWIC detail provides a custom rendering with easily distinguishable speeches
- support for user-defined line groups
- users can define custom numeric tags attached to concordance lines, filter out other lines, and review group ratios
- rich subcorpus-related functionality
- users can easily examine corpus structure by selecting some text types and seeing how the availability of other text type attributes changes ("which publishers are there in case only fiction is selected?")
- a custom text types ratio can be defined ("give me 20% fiction and 80% journalism")
- a sub-corpus can be created by a custom CQL expression
- a sub-corpus can be published so other users can access it
- subcorpora are backed up as CQL queries which makes further modification/restoring possible
- frequency distribution
- 2-dimensional frequency distribution for both positional and structural attributes
- result caching decreases time required to navigate between pages
- on the multilevel frequency distribution page, the starting word can be specified for multi-word KWICs
- persistent URL for any query - you can send a link to someone even if the query string was megabytes long
- access to previous queries, named queries
- access to favorite corpora (subcorpora, aligned corpora)
- a concordance/frequency/collocation listing can be saved in Excel format (xlsx)
- concordance tokens and KWICs can be connected to external data services (e.g. dictionaries, encyclopedias)
- a correct i.p.m. (i.e. one calculated only from the selected text types) can be computed on demand for ad-hoc subcorpora
- integrability with external data resources (e.g. dictionaries, media libraries)
- modern client-side application (written in TypeScript, event stream architecture, React components, extensible)
- server-side written as a WSGI application with fully decoupled background concordance/frequency/collocation calculation (using an integrated worker server)
- modular code design with dynamically loadable plug-ins providing custom functionality implementation (e.g. custom database adapters, authentication method, corpus listing widgets, HTTP session management)
- Reverse proxy server
- Python 3.6 (or newer) and:
- corpus search engine Manatee
- versions 2.167.8 and newer are supported by KonText 0.15 and newer
- versions from 2.83.3 to 2.158.8 are supported by KonText 0.13 and older
- a key-value storage
- a task queue for asynchronous/demanding background calculations and maintenance tasks
Note: KonText versions up to 0.13.x (incl.) run on Python 2. To use Python 3, 0.15.x and newer versions of KonText must be used.
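The version constraints above can be summarized as a small lookup. The following is only an illustrative sketch, not part of KonText: the names `Requirements`, `parse_version`, `requirements_for` and `manatee_is_supported` are hypothetical, and KonText 0.14 is rejected because this document does not state its requirements.

```python
from typing import NamedTuple, Optional, Tuple

Version = Tuple[int, ...]


class Requirements(NamedTuple):
    """Hypothetical container for the constraints listed in this README."""
    python_major: int
    manatee_min: Version
    manatee_max: Optional[Version]  # None means "no upper bound stated"


def parse_version(text: str) -> Version:
    """Turn a dotted version string such as '2.167.8' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def requirements_for(kontext_version: str) -> Requirements:
    """Map a KonText version to the Python/Manatee constraints stated above."""
    major_minor = parse_version(kontext_version)[:2]
    if major_minor <= (0, 13):
        # KonText 0.13 and older: Python 2, Manatee 2.83.3 to 2.158.8
        return Requirements(2, (2, 83, 3), (2, 158, 8))
    if major_minor >= (0, 15):
        # KonText 0.15 and newer: Python 3 (3.6+), Manatee 2.167.8 and newer
        return Requirements(3, (2, 167, 8), None)
    # Versions in between are not covered by the README, so we do not guess.
    raise ValueError(f"no documented requirements for KonText {kontext_version}")


def manatee_is_supported(kontext_version: str, manatee_version: str) -> bool:
    """Check whether a Manatee version falls into the documented range."""
    req = requirements_for(kontext_version)
    manatee = parse_version(manatee_version)
    if manatee < req.manatee_min:
        return False
    return req.manatee_max is None or manatee <= req.manatee_max
```

For example, `manatee_is_supported("0.15.0", "2.158.8")` evaluates to `False`, because that Manatee release belongs to the range documented for KonText 0.13 and older.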
Build and installation
KonText provides a script for automatic installation on an existing Ubuntu system. The easiest way to install KonText is to create an LXC/LXD container, clone the repository there and run the script. On a decently fast network, the whole process takes only a few minutes. Please refer to the doc/INSTALL.md file for details.
Customization and contribution
Please refer to our Wiki.