Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NonPrintableCheck sniff #103

Closed

Conversation

selul
Copy link

@selul selul commented Jan 6, 2017

Non printable chars sniffs for css,js and php files, implementing the themecheck plugin rule.

Reference #94

Copy link

@jrfnl jrfnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @selul Thank you ever so much for this! Looks great!

I've left some feedback and questions inline.

One additional question: this is quite a generic sniff and very re-usable by other (WP) standards.
I'm wondering if placing it in the Theme subset is correct or if it should be better placed in the Files subset.

I'd also like to suggest you may want to ask upstream (PHPCS) if they'd be interested in this sniff as I got a feeling it would be well placed in the Generic ruleset.

* @return array
*/
public function register() {
return ( array_merge( PHP_CodeSniffer_Tokens::$commentTokens, PHP_CodeSniffer_Tokens::$emptyTokens, PHP_CodeSniffer_Tokens::$stringTokens, array(
Copy link

@jrfnl jrfnl Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be very interested to hear the rationale of how you decided which tokens to look at for these type of characters.
Will bad characters always be in these tokens or is there - even a remote chance - that they could be tokenized differently ?

Some additional suggestions:

Two side-notes:

  • PHP_CodeSniffer_Tokens::$commentTokens is a sub-set of PHP_CodeSniffer_Tokens::$emptyTokens, so you only need the last one.
  • Layout-wise I'd make this a multi-line function call with each parameter on a new line for readability.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrfnl Yes, my assumption was that the non-printable chars can be sneaked in those kind of tokens without breaking the linting, so my best guess was the selected categories.
I have also added the T_BAD_CHARCTER but i saw that is breaking the build on PHP7.

Copy link

@jrfnl jrfnl Jan 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering if there are still more tokens in which they could be sneaked in...

Regarding the build break for PHP7, if you change the code within the register() method to the below, that should fix it:

	$targets = array_merge(
		PHP_CodeSniffer_Tokens::$emptyTokens,
		PHP_CodeSniffer_Tokens::$heredocTokens,
		PHP_CodeSniffer_Tokens::$stringTokens
	);
	$targets[T_STRING]      = T_STRING;
	$targets[T_INLINE_HTML] = T_INLINE_HTML;

	if ( defined( 'T_BAD_CHARACTER' ) ) {
		$targets[T_BAD_CHARACTER] = T_BAD_CHARACTER;
	}

	return $targets;

Note: as we're defining a variable anyway, I've moved the extra tokens out of the array_merge(). array_merge() is expensive, while straight array assignments are not.
Also - for the extra ones, I've followed the pattern which the PHP_CodeSniffer_Tokens properties already use of having the token as both the key as well as the value in the array.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrfnl thanks for help and feedback, i have pushed the changes for this

$token = $tokens[ $stackPtr ];
$filename = $phpcsFile->getFilename();
if ( preg_match( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x80-\xFF]/', $token['content'], $matches ) ) {
$warning = 'Non-printable characters were found in the ' . $filename . ' file. You may want to check this file for errors.';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PHPCS natively supports sprintf() type of functionality in the addWarning/Errror() methods by passing a $data array as the fourth parameter. Please consider using this instead, i.e. use a placeholder for the variable and pass the variable to the addWarning() method as the fourth parameter array( $filename ).

Copy link
Member

@grappler grappler Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand there is not much need to mention in which file the error is as the line number and file is mention in the PHPCs output.

$filename = $phpcsFile->getFilename();
if ( preg_match( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x80-\xFF]/', $token['content'], $matches ) ) {
$warning = 'Non-printable characters were found in the ' . $filename . ' file. You may want to check this file for errors.';
$phpcsFile->addWarning( $warning, $stackPtr, 'NonPrintableCheck' );
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a warning or an error ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an error.

$filename = $phpcsFile->getFilename();
if ( preg_match( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x80-\xFF]/', $token['content'], $matches ) ) {
$warning = 'Non-printable characters were found in the ' . $filename . ' file. You may want to check this file for errors.';
$phpcsFile->addWarning( $warning, $stackPtr, 'NonPrintableCheck' );
Copy link

@jrfnl jrfnl Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering if - when PHPCS finds a bad character - we should even bother checking the tokens in the rest of the file or if we should just skip to the end of the file using return $phpcsFile->numTokens; ?

This might give issues with the unit tests, but we can work around that with a public $skip_to_end = true; class property combined with a // @codingStandardsChangeSetting WordPress.Theme.NonPrintableCheck skip_to_end false annotation if it would be decided this is the way to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be for checking the whole file as it otherwise frustrating finding new issues after fixing the first issue.

*/

/**
* Restricts the use of non-printable characters in themes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in themes can be removed. The sniff can be used on any file, not just WP theme files.

@@ -0,0 +1,8 @@
/*
Copy link

@jrfnl jrfnl Jan 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ease of understanding what you're testing for on each line, it would be great if you would add some indication of the character "hidden" in each line. Something like Bad - ASCII 07.
This goes for all three test files.

Also, please mind punctuation in the test files. The Good/Bad comments should generally start with a capital and end with a . period.

6 => 1,
7 => 1,
);
break;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to break if you're returning in the line above as that effectively breaks you out of the switch (and the whole function) already. (same below)

Changed testfiles to match the comments standards and added the ascii codes
Changed report from warning to error
Improved register tokens
@selul
Copy link
Author

selul commented Jan 7, 2017

@grappler @jrfnl I have added the requested changes. The build fails on PHP 7 as the T_BAD_CHARACTER is no longer available.

@jrfnl
Copy link

jrfnl commented Jan 7, 2017

@selul Thanks for the update. I'll try and do another review run through tomorrow.

Copy link
Member

@grappler grappler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I was testing this sniffs on a few existing themes I found this example. In this case °, and are non-printable characters. I am not sure if it is valid code but seems to make sense.

$str = abs($lat['deg']) . '°' . $lat['min'] . '′' . round($lat['sec']) . '″';

* @return array
*/
public function register() {
$targets = array_merge(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would only have one space between the variable and the equal sign but wait till we get @jrfnl's feedback on it.

$tokens = $phpcsFile->getTokens();
$token = $tokens[ $stackPtr ];
if ( preg_match( '/[\x00-\x08\x0B-\x0C\x0E-\x1F\x80-\xFF]/', $token['content'], $matches ) ) {
$phpcsFile->addError( 'Non-printable characters were found in the file. You may want to check this file for errors.', $stackPtr, 'NonPrintableCheck' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I was testing the sniff, I realized there is no way to work out in which position the non-printable character is. So what about changing the message to:

			$phpcsFile->addError(
				'Non-printable character was found in column %s.',
				$stackPtr,
				'NonPrintableCheck',
				array(
					$token['column']
				)
			);

@grappler
Copy link
Member

grappler commented Jan 19, 2017

@jrfnl I checked with the Theme Check plugin and I am getting the same result which makes sense as the regex is the same but the in the theme check this is only a info, so it is not required.

It would be good to do some research on the subject. @selul would you be willing to help out.

Do symbols like °, and count as non-printable characters and what negative affect can they have?

);
default:
return array();
} // End switch().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also multiple spaces between the closing bracket and the comment.

@grappler
Copy link
Member

I spoke to Otto and the reason this was added in the Theme Check was to check for Byte Order Mark. Explanation. This is now being checked with Generic.Files.ByteOrderMark

The check was made a bit looser to try to catch other issues too.

I did a bit of research, non printable characters are things like line breaks. In order for this sniff to work we would need to think of places where it does not make sense to have non printable characters. I am not sure how many places there are.

More of the ASCII codes can be found in the links 1-31 and 127-159 are the invisible characters.
http://ascii.cl/htmlcodes.htm
http://www.ascii-code.com/

@jrfnl
Copy link

jrfnl commented May 31, 2018

Closing this PR in favour of continuing the discussion in the original issue #94.

Once that discussion has run its course, we can always reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants