An advanced web front-end for the Manatee-open corpus search engine

Introduction

KonText is an advanced corpus query interface for the Manatee-open corpus search engine. It builds on the core server-side libraries of NoSketchEngine, and the two applications are data-compatible. KonText is developed and maintained by the Institute of the Czech National Corpus.

Features

new features

  • fully editable query chain
    • any operation in a user-defined sequence (e.g. query -> filter -> sample -> sorting) can be changed, after which the whole sequence is re-executed
  • support for spoken corpora
    • defined concordance segments can be played back as audio
    • KWIC detail provides a custom rendering in which individual speeches are easy to distinguish
  • support for user-defined line groups
    • users can attach custom numeric tags to concordance lines, filter out the remaining lines, and review the ratios between groups
  • improved subcorpus creation
    • users can examine the corpus structure by selecting some text types and seeing how the availability of other text type attributes changes ("which publishers are available when only fiction is selected?")
    • a custom text types ratio can be defined ("give me 20% fiction and 80% journalism")
    • a subcorpus can be created from a custom CQL expression
    • subcorpora are backed up as CQL queries, which makes later modification and restoration possible
  • frequency distribution
    • 2-dimensional frequency distribution for both positional and structural attributes
    • result caching decreases the time required to navigate between pages
    • on the multilevel frequency distribution page, the starting word can be specified for multi-word KWICs
  • persistent URLs for large queries - a link can be shared even if the query is megabytes in size
  • access to previous queries, named queries
  • access to favorite corpora (subcorpora, aligned corpora)
  • interactive PoS tag tool - for positional PoS tag formats, tag queries can be composed interactively
  • a concordance/frequency/collocation listing can be saved in Excel format (xlsx)
  • a correct i.p.m. (i.e. one computed only from the selected text types) can be calculated on demand for ad hoc subcorpora
  • result shuffling can be pre-set
  • fewer full-page reloads
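
The editable query chain listed above can be illustrated with a minimal sketch. It is plain Python with no KonText imports, and every name in it (`query`, `filter_out`, `sample`, `sort_lines`, `run_chain`, the toy `CORPUS`) is hypothetical and chosen only for illustration. It does not reflect how KonText or Manatee actually represent operations.

```python
# Hypothetical sketch: none of these names come from KonText itself.

def query(word):
    """Return an operation keeping lines that contain `word` as a token."""
    def op(lines):
        return [ln for ln in lines if word in ln.split()]
    return op

def filter_out(word):
    """Return an operation dropping lines that contain `word` as a token."""
    def op(lines):
        return [ln for ln in lines if word not in ln.split()]
    return op

def sample(n):
    """Return an operation keeping the first n lines (stand-in for random sampling)."""
    def op(lines):
        return lines[:n]
    return op

def sort_lines():
    """Return an operation sorting lines alphabetically."""
    def op(lines):
        return sorted(lines)
    return op

def run_chain(corpus, chain):
    """Apply every operation in order, feeding each result to the next one."""
    result = list(corpus)
    for op in chain:
        result = op(result)
    return result

CORPUS = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "the dog slept",
    "birds sing at dawn",
]

# query -> filter -> sample -> sorting
chain = [query("the"), filter_out("dog"), sample(2), sort_lines()]
first = run_chain(CORPUS, chain)

# Change only the second step, then re-execute the whole sequence.
chain[1] = filter_out("cat")
second = run_chain(CORPUS, chain)
```

Keeping the chain as an ordered list of operations means an edit to any single step only requires replaying the list from the start, which is the behaviour the feature description suggests.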

internal features

  • server-side written as a WSGI application
  • modern client-side application (event stream architecture, React components, extensible, written in TypeScript)
  • modular code design with dynamically loadable plug-ins that provide custom implementations of specific functionality (e.g. custom database adapters, authentication method, corpus listing widgets, HTTP session management)
  • fully decoupled background concordance/frequency/collocation calculation based on the Celery task queue (alternatively, Python's multiprocessing package can be used)
  • improved logging, error processing and debugging support
  • improved code documentation
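
As a rough illustration of what "dynamically loadable plug-ins" can mean in Python, the sketch below imports a module by name at run time and asks it for an implementation object. The factory name `create_instance`, the module name `demo_auth_plugin`, and the `DummyAuth` class are assumptions made for this example; the actual plug-in interfaces are described in the project's own documentation.

```python
import importlib
import sys
import types

def load_plugin(module_name, *args):
    """Import `module_name` at run time and build a plug-in via its factory."""
    module = importlib.import_module(module_name)
    # 'create_instance' is an assumed factory name used only in this sketch.
    factory = getattr(module, 'create_instance', None)
    if factory is None:
        raise ImportError('%s has no create_instance() factory' % module_name)
    return factory(*args)

class DummyAuth(object):
    """Hypothetical authentication plug-in."""

    def __init__(self, anonymous_id):
        self.anonymous_id = anonymous_id

    def is_anonymous(self, user_id):
        return user_id == self.anonymous_id

# Stand-in for a plug-in file on disk: the module is registered in
# sys.modules so that the example stays self-contained.
demo = types.ModuleType('demo_auth_plugin')
demo.create_instance = lambda anonymous_id: DummyAuth(anonymous_id)
sys.modules['demo_auth_plugin'] = demo

# In a real deployment the module name would come from configuration.
auth = load_plugin('demo_auth_plugin', 0)
```

Because the application only depends on the module name and the factory function, swapping one implementation for another becomes a configuration change rather than a code change.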

Requirements

  • a WSGI-compatible server
    • recommended setup: Gunicorn + a reverse proxy (e.g. Nginx or Apache2)
    • supported setup: Apache2 with mod_wsgi
  • Python 2.7 and:
    • Cheetah Template Engine
    • lxml library
    • werkzeug library (provides WSGI middleware)
    • PyICU library (optional but preferred)
    • markdown library (optional, for formatted corpora references)
    • openpyxl library (optional, for XLSX export)
  • corpus search engine Manatee
    • versions from 2.83.3 to 2.158.8 are supported (the latest one is highly recommended); unless there is an incompatible change in Manatee, newer versions should work too
  • a key-value storage
    • any custom implementation (Redis and SQLite backends are available by default)
  • (optional) Celery task queue for (asynchronous) background calculations and maintenance tasks
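
To give an idea of what a custom key-value storage backend might involve, here is a minimal sketch built on Python's standard `sqlite3` module. The class name, its `set`/`get`/`remove` methods, the table layout, and the sample key and value are all assumptions for illustration. This is not the interface KonText expects from a storage plug-in.

```python
import json
import sqlite3

class SqliteKeyValueStorage(object):
    """A tiny key-value store; values are serialized as JSON text."""

    def __init__(self, path=':memory:'):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            'CREATE TABLE IF NOT EXISTS data (key TEXT PRIMARY KEY, value TEXT)')

    def set(self, key, value):
        # INSERT OR REPLACE overwrites an existing row with the same key.
        self._conn.execute(
            'INSERT OR REPLACE INTO data (key, value) VALUES (?, ?)',
            (key, json.dumps(value)))
        self._conn.commit()

    def get(self, key, default=None):
        row = self._conn.execute(
            'SELECT value FROM data WHERE key = ?', (key,)).fetchone()
        return json.loads(row[0]) if row else default

    def remove(self, key):
        self._conn.execute('DELETE FROM data WHERE key = ?', (key,))
        self._conn.commit()

# Hypothetical usage: store a query description under an arbitrary key.
db = SqliteKeyValueStorage()
db.set('query:1', {'corpus': 'demo', 'q': '[word="cat"]'})
stored = db.get('query:1')
db.remove('query:1')
missing = db.get('query:1', default='not found')
```

A Redis-backed variant could expose the same three methods, which is what would let the rest of an application stay unaware of the storage engine in use.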

Build and installation

Please refer to the doc/INSTALL.md file for details.

Customization and contribution

Please refer to our Wiki.

Notable installations