[core] CPD: Added file encoding detection to CPD. #31
At some of our customer sites, not all source files in the archive use the same file encoding. Most files are encoded in ISO-8859-1, while others use UTF-8. This causes errors during the CPD analysis, because some files cannot be read or tokenized properly. This pull request adds a simple form of file encoding detection to CPD. When a source file starts with a BOM (byte order mark), the encoding indicated by the BOM is used to read the file. Otherwise, the encoding specified on the CPD command line is used.
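The detection rule described above can be sketched roughly as follows. This is only an illustration, not the code in this pull request: the class name `BomSniffer` and the method `detect` are hypothetical, and the sketch assumes that only the UTF-8 and UTF-16 BOMs need to be recognized (UTF-32 is ignored).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of CPD: picks a charset from a leading BOM.
final class BomSniffer {

    // Returns the charset implied by a BOM at the start of `head`,
    // or `fallback` (e.g. the encoding given on the command line) if none is found.
    static Charset detect(byte[] head, Charset fallback) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            // Assumption: UTF-32LE (FF FE 00 00) is not expected, so this is treated as UTF-16LE.
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    // True if `data` begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would read the first few bytes of each source file, pass them to `detect` together with the command-line encoding, and open the file with the returned charset. Since Java's UTF-8 decoder does not strip the BOM, the reader would probably also need to skip those leading bytes before tokenizing.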