[core] CPD: Added file encoding detection to CPD. #31

1 commit merged on Jan 21, 2016



tiobe commented Jan 11, 2016

At some of our customer sites, not all source files in the archive use the same file encoding. Most files are encoded in ISO-8859-1, while others use UTF-8. This causes errors during the CPD analysis, because some files cannot be read or tokenized properly. This pull request adds a simple form of file encoding detection to CPD. When a source file starts with a byte order mark (BOM), the encoding indicated by the BOM is used to read the file. Otherwise, the encoding specified on the CPD command line is used.
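To illustrate the idea, here is a minimal sketch of BOM-based detection with a fallback encoding. It is not the code from this pull request: the class name `BomSniffer` and its methods are hypothetical, it only recognizes the UTF-8 and UTF-16 BOMs, and it reads the whole file into memory for simplicity.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only; not part of PMD/CPD.
final class BomSniffer {

    /** Returns the charset indicated by a leading BOM, or the fallback if there is none. */
    static Charset detect(byte[] data, Charset fallback) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        // Assumption: UTF-32 is ignored, so FF FE is always treated as UTF-16LE.
        if (startsWith(data, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    /** Decodes the bytes with the detected charset and drops the BOM character, if any. */
    static String decode(byte[] data, Charset fallback) {
        String text = new String(data, detect(data, fallback));
        // These Java decoders keep the BOM as U+FEFF at the start of the text.
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    /** Reads a source file, preferring the BOM over the fallback (command-line) encoding. */
    static String readSource(Path file, Charset fallback) throws IOException {
        return decode(Files.readAllBytes(file), fallback);
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, the `fallback` argument plays the role of the encoding given on the command line, for example `BomSniffer.readSource(path, StandardCharsets.ISO_8859_1)`. Files without a BOM are still read with that encoding, so a UTF-8 file saved without a BOM would not be detected.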





adangel commented Jan 21, 2016


@adangel adangel merged commit 6076065 into adangel:pmd/5.3.x Jan 21, 2016

@tiobe tiobe deleted the tiobe:detect_file_encoding_using_UTF_BOM branch Feb 18, 2016

@adangel adangel changed the title from Added file encoding detection to CPD. to [core] CPD: Added file encoding detection to CPD. Jun 25, 2016
