executables for building pipelines

Unix-executables for Quantitative Language Comparison (QLC)

Michael Cysouw cysouw@mac.com

Currently, many different projects use Python, R, or even Java or plain C++ to analyse linguistic data. The basic idea of this project is to write simple wrappers around useful algorithms so that they can be piped together on the command line. This should make it easier to combine methods from different implementations. For an example of linking the executables to the CLDF data export, see here

As a proof-of-concept, we will be collecting various executables here. However, in the future these executables should become part of the published code of any of the projects.

There is no packaging of all dependencies (yet), so you will have to install all dependencies yourself for now. This is the current list of dependencies:

The input and output of these executables should be as simple as possible, and their functionality should be as low-level as possible. The idea is to build higher-level functionality for processing complex data files by piping together the executables collected here.

Two kinds of I/O structure are currently assumed:

  • Line-based lists, i.e. UTF-8 data with LF line breaks in which each item is put on a new line
  • Square matrices: pairwise similarities/distances with items separated by tabs, lines separated by LF line breaks
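As an illustration of these two structures, here is a minimal Python sketch that turns a line-based list into a square distance matrix. It is not one of the executables in this repository: the function names are hypothetical, and plain edit distance is used only as a stand-in for whatever measure a real executable would wrap. The sample data is read from an in-memory stream; a real filter would pass `sys.stdin` to `read_items` and write the result to `sys.stdout`.

```python
import io


def read_items(stream):
    """Read a line-based list: one item per LF-terminated line, blank lines skipped."""
    return [line.rstrip("\n") for line in stream if line.strip()]


def levenshtein(a, b):
    """Plain edit distance between two strings (illustrative measure only)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def distance_matrix(items):
    """Pairwise distances between all items, as a list of rows."""
    return [[levenshtein(a, b) for b in items] for a in items]


def format_matrix(matrix):
    """Serialise a square matrix: cells separated by tabs, rows by LF."""
    return "".join("\t".join(str(cell) for cell in row) + "\n" for row in matrix)


# Made-up sample input standing in for data arriving on standard input.
sample = io.StringIO("hand\nhant\nmain\n")
items = read_items(sample)
matrix = distance_matrix(items)
print(format_matrix(matrix), end="")
```

Because both the input and the output are plain text streams, a script of this shape can sit anywhere in a pipe, with its matrix output feeding directly into another executable that expects a square matrix.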

For now, I propose not using CSV-type multi-column files. Instead, when multiple columns are needed, an executable should take more than one argument, with each argument supplying one column.
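The multi-argument approach could look roughly like the following Python sketch, in which each column lives in its own line-based list file and is passed as a separate argument. The argument names, the `combine` function, and the `language:word` output format are all assumptions made for illustration; they do not describe any executable in this repository.

```python
import argparse
import tempfile
from pathlib import Path


def read_list(path):
    """Read one column stored as a line-based UTF-8 list."""
    with open(path, encoding="utf-8", newline="\n") as handle:
        return [line.rstrip("\n") for line in handle]


def combine(languages, words):
    """Pair the two columns line by line (hypothetical output format)."""
    if len(languages) != len(words):
        raise ValueError("both columns must have the same number of lines")
    return [f"{language}:{word}" for language, word in zip(languages, words)]


def main(argv):
    """Parse two positional arguments, one file per column."""
    parser = argparse.ArgumentParser(description="Hypothetical two-column executable")
    parser.add_argument("languages", help="file with one language name per line")
    parser.add_argument("words", help="file with one word per line")
    args = parser.parse_args(argv)
    return combine(read_list(args.languages), read_list(args.words))


# Demonstration with made-up data written to temporary files.
with tempfile.TemporaryDirectory() as directory:
    tmp = Path(directory)
    (tmp / "languages.txt").write_bytes("German\nDutch\n".encode("utf-8"))
    (tmp / "words.txt").write_bytes("Hand\nhand\n".encode("utf-8"))
    result = main([str(tmp / "languages.txt"), str(tmp / "words.txt")])

print("\n".join(result))
```

Keeping each column in its own file means every input is still a plain line-based list, so the same files can be fed unchanged to single-argument executables elsewhere in a pipeline.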

Just for convenience, the executables are organised by underlying code language in this repository.