Bitextor 6.0.0-rc.1
Hi there! Here is v6.0.0-rc.1 of Bitextor. This release accompanies the code release of the Paracrawl project. There are lots of changes since Bitextor v5.0, and it is the first release since we moved to GitHub.
How do I install Bitextor?
How do I run Bitextor?
Any example to check if it is working?
6.0.0-rc.1 Changelog
- Updated documentation and `README.md` with new dependencies, commands and troubleshooting
- Added original repositories for most of the compiled dependencies (mgiza, clustercat, bicleaner...)
- Fixed encoding errors in `tika` input/output management
- Added option to use `nltk` as sentence splitter
- Added lots of parameters and options for `bitextor` to control most parts of the pipeline, with long-named versions of them (see `--help`)
- Replaced `mkcls` with `clustercat` and `giza-pp` with `mgiza`
- Added option for a config file in `bitextor`. See `README.md`.
- Added ELRC metrics and filters
- Added `bicleaner` and `zipporah` classifiers and thresholds for filtering
- Added `httrack` as alternative crawler
- Added a JHU processing script for processing crawler content (option `--jhu-lett`)
- Added an alternative, translation-based document aligner (Paracrawl) (option `--jhu-aligner-command TRANSLATIONCOMMAND`)
- Minor changes and bugfixes
Note: the `bitextor-v6.0.0-rc.1.zip` archive does not include the submodules' code. If you compile the project from this archive, you first need to run `git submodule update --init --recursive`. This command cannot be run on the source code `.tar.gz` and `.zip` packages, so we recommend using the `bitextor-v6.0.0-rc.1.zip` archive or cloning the repo.
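As a rough illustration of the note above, the following sketch guards the submodule step so that it only runs inside a git checkout. The helper name `init_submodules` is hypothetical and not part of Bitextor. The `.git` check is an assumption about why the command fails on the source code packages: they are plain snapshots that carry no git metadata.

```shell
# Hypothetical helper (not part of Bitextor): fetch submodules only when
# the given directory is a git checkout.
init_submodules() {
    dir="${1:-.}"
    # Assumption: source code packages have no .git entry, so git has
    # nothing to resolve submodules against.
    if [ ! -e "$dir/.git" ]; then
        echo "not a git checkout: $dir (use the recommended archive or clone the repo)" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dir" && git submodule update --init --recursive)
}
```

Calling `init_submodules path/to/bitextor` from a clone would run the command from the note. When cloning from scratch, `git clone --recurse-submodules` followed by the repository URL should fetch the submodules in the same step.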