Treat files containing invalid byte sequences as non-matches #4310

mclark · 2017-04-24T23:57:55Z

If a file path contains a byte sequence that is not valid UTF-8 and we
match it against a regular expression, an ArgumentError is raised,
failing the entire operation. Since the path is not even a valid byte
sequence for its encoding, we can assume it is not a match and
return false instead.

With this change, if there are paths on the file system that are not
valid UTF-8 we will just treat them as not matching the
regular expression and continue processing instead of failing
with an exception.
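
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: the method name `regexp_matches_path?` is hypothetical, and the sketch rescues every ArgumentError from the match, which is broader than a real implementation might want.

```ruby
# Hypothetical stand-in for the regexp branch of a path matcher.
# It only illustrates "treat invalid byte sequences as a non-match".
def regexp_matches_path?(pattern, path)
  # String#=~ returns a match index or nil; normalise to a boolean.
  !(path =~ pattern).nil?
rescue ArgumentError
  # Per the description above, matching a path that holds an invalid
  # byte sequence raises ArgumentError; report a non-match instead of
  # letting the exception abort the whole run.
  false
end

regexp_matches_path?(/^d.*e$/, 'dir/file')     # => true
regexp_matches_path?(/^d.*e$/, "dir/file\xBF") # => false
```

A stricter variant could inspect the exception message and re-raise anything unrelated to encoding, so that genuine argument errors are not silently swallowed.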

Before submitting the PR make sure the following are checked:

Wrote good commit messages.
Used the same coding conventions as the rest of the project.
Feature branch is up-to-date with master (if not - rebase it).
All tests are passing.
The new code doesn't generate RuboCop offenses.
The PR relates to only one subject with a clear title
and description in grammatically correct, complete sentences.

bbatsov · 2017-04-25T04:14:58Z

spec/rubocop/path_util_spec.rb

@@ -78,7 +78,7 @@

    it 'matches regexps' do
      expect(described_class.match_path?(/^d.*e$/, 'dir/file')).to be_truthy
-      expect(described_class.match_path?(/^d.*e$/, 'dir/filez')).to be_falsey
+      expect(described_class.match_path?(/^d.*e$/, "dir/file\xBF")).to be_falsey


I'd probably have this as a separate example for clarity's sake.

bbatsov · 2017-04-25T04:15:48Z

Your change should be accompanied by a changelog entry.


mclark · 2017-04-25T13:49:45Z

@bbatsov thanks for the 👀 ! I've addressed your feedback, feel free to take another look when you get a chance.

bbatsov · 2017-04-25T15:40:44Z

👍

mclark force-pushed the fix-regexp-path-matching branch 2 times, most recently from 102eaef to 8cd52a6 Compare April 25, 2017 00:18

bbatsov reviewed Apr 25, 2017


mclark force-pushed the fix-regexp-path-matching branch 2 times, most recently from 9cb1248 to d6b2638 Compare April 25, 2017 13:12

mclark force-pushed the fix-regexp-path-matching branch from d6b2638 to 6469ea0 Compare April 25, 2017 13:13

bbatsov merged commit 2b369fd into rubocop:master Apr 25, 2017
