Cleaning of Parallel Texts for Machine Translation
Code for my BSc thesis. More info at http://www.adliska.com/publications/#theses.
Abstract: The aim of the thesis is to design, implement and manually evaluate filters for cleaning parallel data, with a focus on statistical machine translation. The work also produces annotated sets of parallel texts that can be used to develop new filters in the future, together with several tools that make these sets easier to work with and allow filter outputs to be evaluated automatically.