
[core] CPD: Added file encoding detection to CPD. #31

Merged: adangel merged 1 commit into adangel:pmd/5.3.x from tiobe:detect_file_encoding_using_UTF_BOM on Jan 21, 2016


tiobe commented Jan 11, 2016

At some of our customer sites, not all source files in the archive use the same file encoding: most files are encoded in ISO-8859-1, while others use UTF-8. This causes errors during the CPD analysis, because some files cannot be read or tokenized properly. This pull request adds a simple form of file encoding detection to CPD: when a source file starts with a byte order mark (BOM), the encoding indicated by the BOM is used to read the file; otherwise, the encoding specified on the CPD command line is used.
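A minimal sketch of the detection rule described above, assuming the whole file is read into memory first. The class `BomEncodingDetector` and its methods are hypothetical illustrations, not PMD's actual API, and the merged change may be implemented differently. Only the UTF-8 and UTF-16 BOMs are handled here (UTF-32 is left out for brevity). The BOM bytes are skipped explicitly because Java's UTF-8 decoder does not strip a leading BOM on its own, so it would otherwise show up as a stray `U+FEFF` token.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of PMD): picks a charset from a byte order
 * mark if one is present, otherwise falls back to the user-supplied encoding
 * (for CPD, the one given on the command line).
 */
public final class BomEncodingDetector {

    /** Detection result: the charset to decode with and how many BOM bytes to skip. */
    public static final class Detection {
        public final Charset charset;
        public final int bomLength;

        Detection(Charset charset, int bomLength) {
            this.charset = charset;
            this.bomLength = bomLength;
        }
    }

    private BomEncodingDetector() {}

    /** Inspects the leading bytes for a UTF-8 or UTF-16 BOM. */
    public static Detection detect(byte[] bytes, Charset fallback) {
        if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) {
            return new Detection(StandardCharsets.UTF_8, 3);
        }
        if (startsWith(bytes, 0xFE, 0xFF)) {
            return new Detection(StandardCharsets.UTF_16BE, 2);
        }
        if (startsWith(bytes, 0xFF, 0xFE)) {
            return new Detection(StandardCharsets.UTF_16LE, 2);
        }
        // No BOM: trust the encoding supplied by the user.
        return new Detection(fallback, 0);
    }

    /** Decodes the bytes with the detected charset, dropping the BOM itself. */
    public static String decode(byte[] bytes, Charset fallback) {
        Detection d = detect(bytes, fallback);
        return new String(bytes, d.bomLength, bytes.length - d.bomLength, d.charset);
    }

    /** Reads a source file, e.g. before handing its text to a tokenizer. */
    public static String readSource(Path file, Charset fallback) throws IOException {
        return decode(Files.readAllBytes(file), fallback);
    }

    /** True if the array begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] bytes, int... prefix) {
        if (bytes.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((bytes[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a fallback of ISO-8859-1, a file starting with `EF BB BF` is decoded as UTF-8 while a BOM-less file is still read as ISO-8859-1, which matches the mixed archives described above. Files that are UTF-8 but lack a BOM still fall back to the command-line encoding, so this approach only helps when the editor that saved the file wrote a BOM.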

adangel (Owner) commented Jan 21, 2016

Thanks!

adangel merged commit 6076065 into adangel:pmd/5.3.x on Jan 21, 2016

tiobe deleted the tiobe:detect_file_encoding_using_UTF_BOM branch on Feb 18, 2016

adangel changed the title from "Added file encoding detection to CPD." to "[core] CPD: Added file encoding detection to CPD." on Jun 25, 2016
