Bitextor 6.0.0-rc.1
Hi there! Here is v6.0.0-rc.1 of Bitextor. This release corresponds to the code release for the Paracrawl project. There are lots of changes since Bitextor v5.0, and it is the first release since we moved to GitHub.
How do I install Bitextor?
How do I run Bitextor?
Any example to check if it is working?
6.0.0-rc.1 Changelog
- Updated documentation and `README.md` with new dependencies, commands and troubleshooting
- Added original repositories for most of the compiled dependencies (mgiza, clustercat, bicleaner...)
- Fixed encoding errors in `tika` input/output management
- Added option to use `nltk` as sentence splitter
- Added lots of parameters and options for `bitextor` to control most parts of the pipeline, along with long-named versions of them (see `--help`)
- Replaced `mkcls` with `clustercat` and `giza-pp` with `mgiza`
- Added option for a config file in `bitextor`. See README.md.
- Added ELRC metrics and filters
- Added `bicleaner` and `zipporah` classifiers and thresholds for filtering
- Added `httrack` as alternative crawler
- Added a JHU processing script for processing crawler content (option `--jhu-lett`)
- Added an alternative, translation-based document aligner (Paracrawl) (option `--jhu-aligner-command TRANSLATIONCOMMAND`)
- Minor changes and bugfixes
Note: the bitextor-v6.0.0-rc.1.zip archive does not include the submodules' code. If you build the project from this archive, you first need to run `git submodule update --init --recursive`. This command cannot be run on the source code .tar.gz and .zip packages, so we recommend using the bitextor-v6.0.0-rc.1.zip archive or cloning the repo.
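As a rough sketch of the note above, the submodule step can be guarded so that it only runs inside a real git working tree. The helper name `init_submodules` and the directory argument are hypothetical and not part of Bitextor; only standard `git` commands are used.

```shell
# Hypothetical helper (not shipped with Bitextor): fetch submodules only
# when the given directory is a git working tree.
init_submodules() {
  dir="$1"
  if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # Same command as in the note above, run from inside "$dir"
    git -C "$dir" submodule update --init --recursive
  else
    # Plain source packages carry no git metadata, so the command cannot work
    echo "not a git repository: $dir" >&2
    return 1
  fi
}
```

Calling `init_submodules path/to/bitextor` (the path is a placeholder) before compiling should fail early on an unpacked source package instead of partway through the build.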