New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NonPrintableCheck sniff #103
Conversation
…sniff for php, js and css files Reference WPTT#94
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @selul Thank you ever so much for this! Looks great!
I've left some feedback and questions inline.
One additional question: this is quite a generic sniff and very re-usable by other (WP) standards.
I'm wondering if placing it in the Theme
subset is correct or if it should be better placed in the Files
subset.
I'd also like to suggest you may want to ask upstream (PHPCS) if they'd be interested in this sniff as I got a feeling it would be well placed in the Generic
ruleset.
* @return array | ||
*/ | ||
public function register() { | ||
return ( array_merge( PHP_CodeSniffer_Tokens::$commentTokens, PHP_CodeSniffer_Tokens::$emptyTokens, PHP_CodeSniffer_Tokens::$stringTokens, array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be very interested to hear the rationale of how you decided which tokens to look at for these type of characters.
Will bad characters always be in these tokens or is there - even a remote chance - that they could be tokenized differently ?
Some additional suggestions:
PHP_CodeSniffer_Tokens::$heredocTokens
T_BAD_CHARACTER
- [New sniff?] Check for non-printable characters #94 (comment)
Two side-notes:
PHP_CodeSniffer_Tokens::$commentTokens
is a sub-set ofPHP_CodeSniffer_Tokens::$emptyTokens
, so you only need the last one.- Layout-wise I'd make this a multi-line function call with each parameter on a new line for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrfnl Yes, my assumption was that the non-printable chars can be sneaked in those kind of tokens without breaking the linting, so my best guess was the selected categories.
I have also added the T_BAD_CHARCTER but i saw that is breaking the build on PHP7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just wondering if there are still more tokens in which they could be sneaked in...
Regarding the build break for PHP7, if you change the code within the register()
method to the below, that should fix it:
$targets = array_merge(
PHP_CodeSniffer_Tokens::$emptyTokens,
PHP_CodeSniffer_Tokens::$heredocTokens,
PHP_CodeSniffer_Tokens::$stringTokens
);
$targets[T_STRING] = T_STRING;
$targets[T_INLINE_HTML] = T_INLINE_HTML;
if ( defined( 'T_BAD_CHARACTER' ) ) {
$targets[T_BAD_CHARACTER] = T_BAD_CHARACTER;
}
return $targets;
Note: as we're defining a variable anyway, I've moved the extra tokens out of the array_merge()
. array_merge()
is expensive, while straight array assignments are not.
Also - for the extra ones, I've followed the pattern which the PHP_CodeSniffer_Tokens
properties already use of having the token as both the key as well as the value in the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrfnl thanks for help and feedback, i have pushed the changes for this
$token = $tokens[ $stackPtr ]; | ||
$filename = $phpcsFile->getFilename(); | ||
if ( preg_match( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x80-\xFF]/', $token['content'], $matches ) ) { | ||
$warning = 'Non-printable characters were found in the ' . $filename . ' file. You may want to check this file for errors.'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PHPCS natively supports sprintf()
type of functionality in the addWarning/Errror()
methods by passing a $data
array as the fourth parameter. Please consider using this instead, i.e. use a placeholder for the variable and pass the variable to the addWarning()
method as the fourth parameter array( $filename )
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand there is not much need to mention in which file the error is as the line number and file is mention in the PHPCs output.
$filename = $phpcsFile->getFilename(); | ||
if ( preg_match( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x80-\xFF]/', $token['content'], $matches ) ) { | ||
$warning = 'Non-printable characters were found in the ' . $filename . ' file. You may want to check this file for errors.'; | ||
$phpcsFile->addWarning( $warning, $stackPtr, 'NonPrintableCheck' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a warning or an error ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an error.
$filename = $phpcsFile->getFilename(); | ||
if ( preg_match( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x80-\xFF]/', $token['content'], $matches ) ) { | ||
$warning = 'Non-printable characters were found in the ' . $filename . ' file. You may want to check this file for errors.'; | ||
$phpcsFile->addWarning( $warning, $stackPtr, 'NonPrintableCheck' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also wondering if - when PHPCS finds a bad character - we should even bother checking the tokens in the rest of the file or if we should just skip to the end of the file using return $phpcsFile->numTokens;
?
This might give issues with the unit tests, but we can work around that with a public $skip_to_end = true;
class property combined with a // @codingStandardsChangeSetting WordPress.Theme.NonPrintableCheck skip_to_end false
annotation if it would be decided this is the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be for checking the whole file as it otherwise frustrating finding new issues after fixing the first issue.
*/ | ||
|
||
/** | ||
* Restricts the use of non-printable characters in themes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in themes
can be removed. The sniff can be used on any file, not just WP theme files.
@@ -0,0 +1,8 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ease of understanding what you're testing for on each line, it would be great if you would add some indication of the character "hidden" in each line. Something like Bad - ASCII 07
.
This goes for all three test files.
Also, please mind punctuation in the test files. The Good/Bad comments should generally start with a capital and end with a .
period.
6 => 1, | ||
7 => 1, | ||
); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to break
if you're return
ing in the line above as that effectively breaks you out of the switch (and the whole function) already. (same below)
Changed testfiles to match the comments standards and added the ascii codes Changed report from warning to error Improved register tokens
@selul Thanks for the update. I'll try and do another review run through tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I was testing this sniffs on a few existing themes I found this example. In this case °
, ′
and ″
are non-printable characters. I am not sure if it is valid code but seems to make sense.
$str = abs($lat['deg']) . '°' . $lat['min'] . '′' . round($lat['sec']) . '″';
* @return array | ||
*/ | ||
public function register() { | ||
$targets = array_merge( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would only have one space between the variable and the equal sign but wait till we get @jrfnl's feedback on it.
$tokens = $phpcsFile->getTokens(); | ||
$token = $tokens[ $stackPtr ]; | ||
if ( preg_match( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x80-\xFF]/', $token['content'], $matches ) ) { | ||
$phpcsFile->addError( 'Non-printable characters were found in the file. You may want to check this file for errors.', $stackPtr, 'NonPrintableCheck' ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I was testing the sniff, I realized there is no way to work out in which position the non-printable character is. So what about changing the message to:
$phpcsFile->addError(
'Non-printable character was found in column %s.',
$stackPtr,
'NonPrintableCheck',
array(
$token['column']
)
);
@jrfnl I checked with the Theme Check plugin and I am getting the same result which makes sense as the regex is the same but the in the theme check this is only a info, so it is not required. It would be good to do some research on the subject. @selul would you be willing to help out. Do symbols like |
); | ||
default: | ||
return array(); | ||
} // End switch(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also multiple spaces between the closing bracket and the comment.
I spoke to Otto and the reason this was added in the Theme Check was to check for Byte Order Mark. Explanation. This is now being checked with The check was made a bit looser to try to catch other issues too. I did a bit of research, non printable characters are things like line breaks. In order for this sniff to work we would need to think of places where it does not make sense to have non printable characters. I am not sure how many places there are. More of the ASCII codes can be found in the links 1-31 and 127-159 are the invisible characters. |
Closing this PR in favour of continuing the discussion in the original issue #94. Once that discussion has run its course, we can always reopen. |
Non printable chars sniffs for css,js and php files, implementing the themecheck plugin rule.
Reference #94