[core] CPD: Added file encoding detection to CPD. #31

Merged
merged 1 commit into from Jan 21, 2016

2 participants

@tiobe
Contributor
tiobe commented Jan 11, 2016

At some of our customer sites, not all source files in the archive use the same file encoding. Most files are encoded in ISO-8859-1, while others use UTF-8. This causes errors during the CPD analysis, because some files cannot be read or tokenized properly. This pull request adds a simple form of file encoding detection to CPD: when a source file starts with a BOM (byte order mark), the encoding indicated by the BOM is used to read the file. Otherwise, the encoding specified on the CPD command line is used.
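For readers who want a concrete picture of the rule described above, here is a minimal sketch. It is not the code from this pull request. The class name `BomEncodingDetector` and the method `detect` are hypothetical, and the set of recognized BOMs (UTF-8, UTF-16BE, UTF-16LE) is an assumption. The `fallback` parameter stands in for the encoding given on the CPD command line.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PMD: picks a charset from a leading BOM.
final class BomEncodingDetector {

    private BomEncodingDetector() {
    }

    /**
     * Returns the charset implied by a BOM at the start of {@code head},
     * or {@code fallback} when no known BOM is present.
     *
     * Assumption: only UTF-8 and UTF-16 BOMs are recognized. A UTF-32LE BOM
     * (FF FE 00 00) would be reported as UTF-16LE by this sketch.
     */
    static Charset detect(byte[] head, Charset fallback) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    // Compares the first bytes of data (as unsigned values) against prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would read the first few bytes of each source file, pass them to `detect` together with the configured encoding, and then open the file with the returned charset. Note that a decoder for the detected charset may or may not strip the BOM itself, so the caller may still need to skip it before tokenizing.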

@adangel
Owner
adangel commented Jan 21, 2016

Thanks!

@adangel adangel merged commit 6076065 into adangel:pmd/5.3.x Jan 21, 2016
@tiobe tiobe deleted the tiobe:detect_file_encoding_using_UTF_BOM branch Feb 18, 2016
@adangel adangel changed the title from Added file encoding detection to CPD. to [core] CPD: Added file encoding detection to CPD. Jun 25, 2016