Bitextor 6.0.0-rc.1

@lpla lpla released this 14 Jun 08:25
· 3785 commits to master since this release

Hi there! Here is v6.0.0-rc.1 of Bitextor. This release corresponds to the code release for the Paracrawl project. There have been many changes since Bitextor v5.0, and this is the first release since we moved to GitHub.

How do I install Bitextor?

How do I run Bitextor?

Any example to check if it is working?

6.0.0-rc.1 Changelog

  • Updated documentation with new dependencies, commands and troubleshooting
  • Added original repositories for most of the compiled dependencies (mgiza, clustercat, bicleaner...)
  • Fixed encoding errors in tika input/output management
  • Added option to use nltk as sentence splitter
  • Added many parameters and options to bitextor for controlling most parts of the pipeline, along with long-named versions of them (see --help)
  • Replaced mkcls with clustercat and giza-pp with mgiza
  • Added option for a config file in bitextor. See
  • Added ELRC metrics and filters
  • Added bicleaner and zipporah classifiers and thresholds for filtering
  • Added httrack as alternative crawler
  • Added a JHU processing script for processing crawler content (option --jhu-lett)
  • Added an alternative, translation-based document aligner (Paracrawl) (option --jhu-aligner-command TRANSLATIONCOMMAND)
  • Minor changes and bugfixes

Note: the tarball does not include the submodules' code. If you compile the project from this tarball, you first need to run git submodule update --init --recursive. This command cannot be run on the source code .tar.gz and .zip packages, so we recommend using the tarball or cloning the repo.
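
As a rough sketch of the submodule step described in the note, the command can be wrapped in a small guard that refuses to run outside a git checkout. The helper name init_submodules is hypothetical and not part of Bitextor; it only assumes a git version that supports the -C option.

```shell
# Hypothetical helper (not shipped with Bitextor): fetch submodules only
# when the given directory is actually a git checkout.
init_submodules() {
    dir="${1:-.}"
    # Archives without git metadata cannot fetch submodules, so fail early.
    if ! git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "error: $dir is not a git checkout; clone the repo instead" >&2
        return 1
    fi
    # Same command as in the release note, run inside the checkout.
    git -C "$dir" submodule update --init --recursive
}
```

Calling init_submodules path/to/bitextor from the parent directory of a checkout should fetch the submodules, while running it on an unpacked source archive should print the error and return a non-zero status.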