IngoKl/PyXMLConc

PyXMLConc

PyXMLConc is a very simple concordancer intended for exploratory analysis of XML-annotated corpora. Its main feature is the automatic detection of XML tags and attributes. The search/concordancing function supports regular expressions.
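To illustrate what automatic detection of tags and attributes can look like, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not PyXMLConc's actual implementation; the function name collect_tags_and_attributes and the sample markup are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_tags_and_attributes(xml_text):
    """Map each tag name to the sorted attribute names seen on it.

    Hypothetical helper for illustration; not part of PyXMLConc.
    """
    inventory = defaultdict(set)
    root = ET.fromstring(xml_text)
    # iter() walks the root element and all of its descendants.
    for element in root.iter():
        inventory[element.tag].update(element.attrib.keys())
    return {tag: sorted(attrs) for tag, attrs in inventory.items()}


# Made-up sample corpus snippet.
sample = (
    '<text><s id="1"><w pos="DT">The</w> '
    '<w pos="NN" lemma="cat">cat</w></s></text>'
)
inventory = collect_tags_and_attributes(sample)
print(inventory)
```

An inventory like this is what a concordancer could offer as search filters, for example restricting a query to w elements with a given pos value.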

Note: Please be aware that this is not production software. I primarily use this tool to teach working with XML-annotated corpora. There are numerous bugs and idiosyncrasies.

Usage

After cloning the repository, run python -m pyxmlconc.pyxmlconc from the repository root. Alternatively, you can install PyXMLConc by running pip install . in the same directory, which makes PyXMLConc available as a shell command.

The concordancer supports two working modes. The default mode (Tokenizer) tokenizes the text and builds the concordances from the individual tokens. The second mode, re.findall, searches the text with regular expressions without prior tokenization. This mode is somewhat more flexible, but because re.findall returns only non-overlapping matches, the user has to account for overlaps that can result in 'missing' concordances.
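The difference between the two modes can be sketched in a few lines of Python. This is an illustrative approximation, not PyXMLConc's own code: the function kwic_tokens, the simple tokenizer pattern, and the window size are assumptions. The second half shows why overlapping hits go 'missing' with re.findall and how a lookahead can recover them.

```python
import re


def kwic_tokens(text, pattern, window=2):
    """Token-based concordance: match whole tokens, return (left, hit, right).

    Hypothetical helper for illustration; not part of PyXMLConc.
    """
    # Very simple tokenizer: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    query = re.compile(pattern)
    lines = []
    for i, token in enumerate(tokens):
        if query.fullmatch(token):
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


token_hits = kwic_tokens("the cat sat on the mat", r"the")

# re.findall scans left to right and never reuses consumed characters,
# so the second, overlapping "very very" is not reported.
raw = "very very very good"
plain_hits = re.findall(r"very very", raw)

# A zero-width lookahead with a capture group reports overlapping hits too.
overlapping_hits = re.findall(r"(?=(very very))", raw)

print(token_hits)
print(plain_hits)
print(overlapping_hits)
```

In the token-based mode every matching token yields its own concordance line, so overlaps cannot occur; in the raw mode the pattern itself has to be written with overlaps in mind.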

Todo

  • Add additional tests
  • Automatically center the scrollbar
  • Display the frequency table as an actual table
  • Allow search from the frequency table
  • Color the actual search term; split up the concordance into columns
  • Select search terms from the frequency list
  • Fix issues when there are multiple attributes

Screenshot

Updates

  • (2020-12-01) PyXMLConc 0.2 - Upgrade to Qt 5 and PySide2; Ensure Python 3.x compatibility; Add simple frequency tables
