[core] CPD: Added file encoding detection to CPD. #31

merged 1 commit into from Jan 21, 2016




tiobe commented Jan 11, 2016

At some of our customer sites, not all source files in the archive use the same file encoding. Most files are encoded in ISO-8859-1, while others use UTF-8. This causes errors during CPD analysis because some files cannot be read or tokenized properly. This pull request adds a simple form of file encoding detection to CPD: when a source file starts with a byte order mark (BOM), the encoding indicated by the BOM is used to read the file; otherwise, the encoding specified on the CPD command line is used.
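For readers who want to see the idea in isolation, here is a minimal sketch of BOM-based detection with a fallback encoding. It is not the code from this pull request: the class `BomEncodingDetector` and its methods `detect`, `bomLength`, and `decode` are hypothetical names chosen for illustration, and the sketch only recognizes the UTF-8 and UTF-16 BOMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PMD/CPD: picks a charset from a leading BOM,
// falling back to a caller-supplied encoding (e.g. the one given on the command line).
final class BomEncodingDetector {

    private BomEncodingDetector() {
    }

    // Returns true if 'data' begins with the given byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the charset indicated by a leading BOM, or 'fallback' if there is none.
    // Assumption: only UTF-8 and UTF-16 BOMs are handled; UTF-32 is ignored here.
    static Charset detect(byte[] data, Charset fallback) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    // Number of leading bytes occupied by a recognized BOM (0 if none).
    static int bomLength(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return 3;
        }
        if (startsWith(data, 0xFE, 0xFF) || startsWith(data, 0xFF, 0xFE)) {
            return 2;
        }
        return 0;
    }

    // Decodes the file content with the detected charset, dropping the BOM itself.
    static String decode(byte[] data, Charset fallback) {
        Charset charset = detect(data, fallback);
        int skip = bomLength(data);
        return new String(data, skip, data.length - skip, charset);
    }
}
```

In this sketch, a caller would read the file bytes and call `BomEncodingDetector.decode(bytes, Charset.forName("ISO-8859-1"))`, so that files with a BOM are decoded according to the BOM and all other files use the configured encoding.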

adangel commented Jan 21, 2016


@adangel adangel merged commit 6076065 into adangel:pmd/5.3.x Jan 21, 2016
@tiobe tiobe deleted the tiobe:detect_file_encoding_using_UTF_BOM branch Feb 18, 2016
@adangel adangel changed the title from Added file encoding detection to CPD. to [core] CPD: Added file encoding detection to CPD. Jun 25, 2016