A very simple concordancer with XML support.

PyXMLConc

PyXMLConc is a very simple concordancer intended for exploratory analysis of XML-annotated corpora. Its primary feature is the automatic detection of XML tags and attributes. The search (concordancing) function supports regular expressions.
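As an illustration of what automatic tag and attribute detection involves, here is a minimal sketch using the standard library's xml.etree.ElementTree. It is not PyXMLConc's own implementation; the function name detect_tags_and_attributes and the sample corpus snippet are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def detect_tags_and_attributes(xml_text: str) -> dict[str, set[str]]:
    """Map every element tag in the document to the set of attribute names it uses."""
    root = ET.fromstring(xml_text)
    found: dict[str, set[str]] = defaultdict(set)
    # iter() visits the root element and all of its descendants in document order
    for element in root.iter():
        # registers the tag even when the element carries no attributes
        found[element.tag].update(element.attrib.keys())
    return dict(found)


# hypothetical annotated snippet: sentences with part-of-speech and lemma attributes
sample = """<text>
  <s n="1"><w pos="DT">The</w> <w pos="NN" lemma="cat">cat</w></s>
</text>"""

print(detect_tags_and_attributes(sample))
```

A concordancer could use such a mapping to offer the discovered tags and attributes as search filters without the user declaring them in advance.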

Usage

After cloning the repository, simply run python pyxmlconc/pyxmlconc.py. Alternatively, you can install PyXMLConc by running pip install . from the repository root, which makes PyXMLConc available as a shell command.

The concordancer supports two working modes. The default mode (Tokenizer) tokenizes the text and builds the concordances from the individual tokens. The second mode (re.findall) searches the raw text with regular expressions, without prior tokenization. This mode is somewhat more flexible, but re.findall returns only non-overlapping matches: once a match has consumed part of the text, that part cannot start another match, so some concordances may appear to be 'missing'. The user has to account for this when writing search patterns.
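The difference between the two modes can be sketched in plain Python. This is not PyXMLConc's code: the naive whitespace tokenizer, the function names tokenizer_mode and findall_mode, and the sample sentence are all hypothetical. The last call shows one common workaround for the overlap problem (a zero-width lookahead with a capturing group), assuming your patterns are passed to re.findall unchanged.

```python
import re

TEXT = "the black cat sat on the black mat"


def tokenizer_mode(text: str, pattern: str, width: int = 2) -> list[tuple[str, str, str]]:
    """Tokenize first, then test each token against the pattern (KWIC lines)."""
    tokens = text.split()  # naive whitespace tokenizer, for illustration only
    regex = re.compile(pattern)
    hits = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


def findall_mode(text: str, pattern: str) -> list[str]:
    """Search the raw text without tokenizing; re.findall never returns overlapping matches."""
    return re.findall(pattern, text)


print(tokenizer_mode(TEXT, r"bl\w+"))
# → [('the', 'black', 'cat sat'), ('on the', 'black', 'mat')]

# two-word pattern: each match consumes its second word, so "black cat" is never reported
print(findall_mode(TEXT, r"\w+ \w+"))
# → ['the black', 'cat sat', 'on the', 'black mat']

# zero-width lookahead at each word boundary recovers all seven overlapping pairs
print(findall_mode(TEXT, r"\b(?=(\w+ \w+))"))
```

Because the lookahead matches an empty string, the regex engine advances one character at a time instead of jumping past the captured pair, which is why the overlapping bigrams reappear.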

I also provide compiled binaries for Windows:

  • PyXMLConc-0.1 (SHA-256: e64391aabeaa42a94c4baf1c1d0dd9854f85178683e5f2ffa94d5f24b25c1536), 40 MB
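To check a downloaded binary against the published SHA-256 hash, a few lines of standard-library Python suffice. This is a minimal sketch; the helper names sha256_of and matches_published_hash are hypothetical, and the path you pass is wherever you saved the download.

```python
import hashlib

# hash published above for PyXMLConc-0.1
EXPECTED = "e64391aabeaa42a94c4baf1c1d0dd9854f85178683e5f2ffa94d5f24b25c1536"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large download need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # read until fh.read() returns the empty bytes object at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # lowercase hex, same form as EXPECTED


def matches_published_hash(path: str) -> bool:
    """True if the file at `path` has the published SHA-256 digest."""
    return sha256_of(path) == EXPECTED
```

Call matches_published_hash with the path of the downloaded file; a False result means the file is corrupted or not the published build.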

Todo

  • Simple frequency table
  • Unit tests
  • Automatically center the scrollbar
  • Frequency table as an actual table
  • Allow searching from the frequency table
  • Highlight the search term in color

Screenshot

