Parallel corpus classifier/cleaner
Bitextor generates translation memories from multilingual websites.
Tool to fix bitexts and tag near-duplicates for removal
PDF parser and converter to HTML
Python interface to pdf-extract for extracting HTML from PDF files
Repository for data models, dictionaries and more resources for Bitextor
Anonymizer module for Bicleaner's pipeline (WIP)
The hunalign sentence aligner. Forked from version 1.2