Hi there! Here is v6.0.0-rc.1 of Bitextor. This release is related to the code release of the Paracrawl project. There are lots of changes since Bitextor v5.0, and it is the first release since we moved to GitHub.
- Updated documentation and `README.md` with new dependencies, commands and troubleshooting
- Added original repositories for most of the compiled dependencies (mgiza, clustercat, bicleaner...)
- Fixed encoding errors in
- Added option to use `nltk` as sentence splitter
- Added many parameters and options for `bitextor` to control most parts of the pipeline, along with long-named versions of them (see
- Added option for a config file in `bitextor`. See README.md.
- Added ELRC metrics and filters
- Added `zipporah` classifiers and thresholds for filtering
- Added `httrack` as alternative crawler
- Added a JHU processing script for processing crawler content (option
- Added an alternative translation-based document aligner (Paracrawl) (option
- Minor changes and bugfixes

The `bitextor-v6.0.0-rc.1.zip` tarball does not include the submodules' code. If you start compiling the project from this tarball, you first need to run `git submodule update --init --recursive`. Also, you can't run this command on the source code `.zip` packages, so we recommend the `bitextor-v6.0.0-rc.1.zip` tarball or cloning the repo.
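
As a rough sketch of the submodule step, the snippet below fetches submodules only when the target directory is a Git working tree. The function name `init_submodules` is hypothetical and not part of Bitextor; the only command taken from these notes is `git submodule update --init --recursive`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bitextor): fetch submodules for a checkout.
init_submodules() {
    dir="${1:-.}"
    # Submodules can only be fetched from inside a Git working tree;
    # a directory without Git metadata fails this check.
    if ! git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "error: $dir is not a Git working tree; clone the repo instead" >&2
        return 1
    fi
    # The command given in these release notes.
    git -C "$dir" submodule update --init --recursive
}
```

With an existing clone, `init_submodules .` would run the update in place. For a fresh checkout, `git clone --recursive <repository-url>` fetches the submodules in the same step.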