Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve validating sanitizer with context for why element/attribute is invalid #3780

Merged
merged 41 commits into from
Dec 12, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
bf28199
Eliminate premature/redundant attribute checks from validate_attr_spe…
westonruter Oct 31, 2019
656ebb0
Remove redundant is_missing_mandatory_attribute check
westonruter Nov 16, 2019
a7f2750
Consolidate removal of elements without mandatory attributes
westonruter Nov 16, 2019
6475b70
Introduce illegal_cdata error code
westonruter Nov 16, 2019
ad440fa
Fix return value docs for validate_attr_spec_list_for_node
westonruter Nov 22, 2019
3f2f50e
Reuse check_attr_spec_rule_mandatory in is_missing_mandatory_attribute
westonruter Nov 23, 2019
3bb3d8e
Introduce specific error codes for attribute value violations
westonruter Nov 23, 2019
ad66b8f
Include list of missing mandatory attributes in validation error
westonruter Nov 23, 2019
35638c5
Extract AMP validator error codes and messages from spec
westonruter Nov 23, 2019
a2e20d6
Use constant in switch statement
westonruter Nov 23, 2019
ef80051
Fix removing emptyable-attributes and fix is_missing_mandatory_attribute
westonruter Nov 23, 2019
2d6ec3d
Add constants for error codes; improve CDATA error reporting
westonruter Nov 24, 2019
00eb55c
Add fine-grained error codes for CDATA
westonruter Nov 24, 2019
a6d0fc9
Revert "Extract AMP validator error codes and messages from spec"
westonruter Nov 24, 2019
15fb6dd
Fix validation of __amp_source_origin URL value
westonruter Nov 25, 2019
c55c58c
Eliminate duplicated testing; add code checking
westonruter Nov 25, 2019
7f4bf4e
Add DISALLOWED_DESCENDANT_TAG error code
westonruter Nov 25, 2019
df96667
Fix checking of empty URL before relative URL
westonruter Nov 25, 2019
7b32198
Add assertions for specific error codes
westonruter Nov 25, 2019
9808761
Move erroneous sanitiation inside of validate_tag_spec_for_node method
westonruter Nov 25, 2019
374e0d3
Add fine-grained error codes for elements that have bad ancestors or …
westonruter Nov 25, 2019
934bbd1
Add constants for normalized error codes used in style sanitizer
westonruter Nov 25, 2019
5da07dc
Ensure body present instead of raising error
westonruter Nov 25, 2019
b0dfd17
Improve error codes used in media converters; add error context data
westonruter Nov 25, 2019
2c42a07
Include spec_name in validation error
westonruter Nov 25, 2019
10b0c29
Include spec_name in validation errors raised by style sanitizer
westonruter Nov 25, 2019
6975bd0
Verify unique tag spec names when generating spec
westonruter Nov 26, 2019
e2db927
Remove redundant info from validation error now that spec_name provided
westonruter Nov 26, 2019
5e56cce
Fix up PHP comments
westonruter Nov 26, 2019
9067d6c
Remove obsolete DISALLOWED_DOMAIN checks
westonruter Nov 28, 2019
5d9c205
Add tests for INVALID_CDATA_CONTENTS and DISALLOWED_RELATIVE_URL
westonruter Nov 28, 2019
7470a24
Add test for MANDATORY_CDATA_MISSING_OR_INCORRECT
westonruter Nov 28, 2019
038caa7
Add tests for MANDATORY_TAG_ANCESTOR, DISALLOWED_TAG_ANCESTOR, and (n…
westonruter Nov 29, 2019
998c45c
Add test for INCORRECT_NUM_CHILD_TAGS
westonruter Nov 29, 2019
22b1869
Remove redundant validation error data; test for non-redundant data
westonruter Nov 29, 2019
8550fd7
Merge branch 'develop' of github.com:ampproject/amp-wp into add/inval…
westonruter Nov 30, 2019
322035b
Bring sanity to the code
schlessera Dec 3, 2019
dc76b9f
Merge branch 'develop' of github.com:ampproject/amp-wp into add/inval…
westonruter Dec 6, 2019
b96e4e7
Fix typos in comments and code style
westonruter Dec 6, 2019
827659a
Harmonize logic for getting stylesheet by URL
westonruter Dec 6, 2019
f7d00f9
Use SORT_REGULAR flag for array_unique() instead of serializing array…
westonruter Dec 6, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ const ValidationErrorMessage = ( { message, code, node_name: nodeName, parent_na
return message;
}

if ( 'invalid_element' === code && nodeName ) {
if ( 'DISALLOWED_TAG' === code && nodeName ) { // @todo Needs to be fleshed out.
return (
<>
{ __( 'Invalid element: ', 'amp' ) }
Expand All @@ -34,7 +34,7 @@ const ValidationErrorMessage = ( { message, code, node_name: nodeName, parent_na
</code>
</>
);
} else if ( 'invalid_attribute' === code && nodeName ) {
} else if ( 'DISALLOWED_ATTR' === code && nodeName ) { // @todo Needs to be fleshed out.
return (
<>
{ __( 'Invalid attribute: ', 'amp' ) }
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,17 +15,17 @@ describe( 'ValidationErrorMessage', () => {
} );

it( 'renders an error for an invalid element', () => {
const errorMessage = render( <ValidationErrorMessage code="invalid_element" node_name="foo" /> );
const errorMessage = render( <ValidationErrorMessage code="DISALLOWED_TAG" node_name="foo" /> );
expect( errorMessage ).toMatchSnapshot();
} );

it( 'renders an error for an invalid attribute', () => {
const errorMessage = render( <ValidationErrorMessage code="invalid_attribute" node_name="bar" /> );
const errorMessage = render( <ValidationErrorMessage code="DISALLOWED_ATTR" node_name="bar" /> );
expect( errorMessage ).toMatchSnapshot();
} );

it( 'renders an error for an invalid attribute with parent node', () => {
const errorMessage = render( <ValidationErrorMessage code="invalid_attribute" node_name="bar" parent_name="baz" /> );
const errorMessage = render( <ValidationErrorMessage code="DISALLOWED_ATTR" node_name="bar" parent_name="baz" /> );
expect( errorMessage ).toMatchSnapshot();
} );

Expand Down
30 changes: 30 additions & 0 deletions bin/amphtml-update.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,8 @@
from collections import defaultdict
import imp

seen_spec_names = set()

def Die(msg):
print >> sys.stderr, msg
sys.exit(1)
Expand Down Expand Up @@ -96,6 +98,14 @@ def GeneratePHP(out_dir):

allowed_tags, attr_lists, descendant_lists, reference_points, versions = ParseRules(out_dir)

expected_spec_names = (
'style amp-custom',
'style[amp-keyframes]',
)
for expected_spec_name in expected_spec_names:
if expected_spec_name not in seen_spec_names:
raise Exception( 'Missing spec: %s' % expected_spec_name )

#Generate the output
out = []
GenerateHeaderPHP(out)
Expand Down Expand Up @@ -470,8 +480,28 @@ def GetTagSpec(tag_spec, attr_lists):

cdata_dict['css_spec'] = css_spec
if len( cdata_dict ) > 0:
if 'blacklisted_cdata_regex' in cdata_dict:
if 'error_message' not in cdata_dict['blacklisted_cdata_regex']:
raise Exception( 'Missing error_message for blacklisted_cdata_regex.' );
if cdata_dict['blacklisted_cdata_regex']['error_message'] not in ( 'CSS !important', 'contents', 'html comments' ):
raise Exception( 'Unexpected error_message "%s" for blacklisted_cdata_regex.' % cdata_dict['blacklisted_cdata_regex']['error_message'] );
tag_spec_dict['cdata'] = cdata_dict

if 'spec_name' not in tag_spec_dict['tag_spec']:
if 'extension_spec' in tag_spec_dict['tag_spec']:
# CUSTOM_ELEMENT=1 (default), CUSTOM_TEMPLATE=2
extension_type = tag_spec_dict['tag_spec']['extension_spec'].get('extension_type', 1)
spec_name = 'script [%s=%s]' % ( 'custom-element' if 1 == extension_type else 'custom-template', tag_spec_dict['tag_spec']['extension_spec']['name'].lower() )
else:
spec_name = tag_spec.tag_name.lower()
else:
spec_name = tag_spec_dict['tag_spec']['spec_name']

if '$reference_point' != spec_name:
if spec_name in seen_spec_names:
raise Exception( 'Already seen spec_name: %s' % spec_name )
seen_spec_names.add( spec_name )

return tag_spec_dict


Expand Down
9 changes: 8 additions & 1 deletion includes/sanitizers/class-amp-audio-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,14 @@ public function sanitize() {
* See: https://github.com/ampproject/amphtml/issues/2261
*/
if ( empty( $sources ) ) {
$this->remove_invalid_child( $node );
$this->remove_invalid_child(
$node,
[
'code' => AMP_Tag_And_Attribute_Sanitizer::ATTR_REQUIRED_BUT_MISSING,
'attributes' => [ 'src' ],
'spec_name' => 'amp-audio',
]
);
} else {
$node->parentNode->replaceChild( $new_node, $node );

Expand Down
30 changes: 23 additions & 7 deletions includes/sanitizers/class-amp-base-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ abstract class AMP_Base_Sanitizer {
*
* @var array
*/
private $should_not_removed_nodes = [];
private $nodes_to_keep = [];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😜

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe I should mention I searched for usage first:

Image 2019-12-03 at 4 43 35 PM


/**
* AMP_Base_Sanitizer constructor.
Expand Down Expand Up @@ -439,15 +439,15 @@ public function remove_invalid_child( $node, $validation_error = [] ) {
}

// Prevent double-reporting nodes that are rejected for sanitization.
if ( isset( $this->should_not_removed_nodes[ $node->nodeName ] ) && in_array( $node, $this->should_not_removed_nodes[ $node->nodeName ], true ) ) {
if ( isset( $this->nodes_to_keep[ $node->nodeName ] ) && in_array( $node, $this->nodes_to_keep[ $node->nodeName ], true ) ) {
return false;
}

$should_remove = $this->should_sanitize_validation_error( $validation_error, compact( 'node' ) );
if ( $should_remove ) {
$node->parentNode->removeChild( $node );
} else {
$this->should_not_removed_nodes[ $node->nodeName ][] = $node;
$this->nodes_to_keep[ $node->nodeName ][] = $node;
}
return $should_remove;
}
Expand All @@ -463,9 +463,10 @@ public function remove_invalid_child( $node, $validation_error = [] ) {
* @param DOMElement $element The node for which to remove the attribute.
* @param DOMAttr|string $attribute The attribute to remove from the element.
* @param array $validation_error Validation error details.
* @param array $attr_spec Attribute spec.
* @return bool Whether the node should have been removed, that is, that the node was sanitized for validity.
*/
public function remove_invalid_attribute( $element, $attribute, $validation_error = [] ) {
public function remove_invalid_attribute( $element, $attribute, $validation_error = [], $attr_spec = [] ) {
if ( $this->is_exempt_from_validation( $element ) ) {
return false;
}
Expand All @@ -475,10 +476,23 @@ public function remove_invalid_attribute( $element, $attribute, $validation_erro
} else {
$node = $attribute;
}

// Catch edge condition (no known possible way to reach).
if ( ! ( $node instanceof DOMAttr ) || $element !== $node->parentNode ) {
return false;
}

$should_remove = $this->should_sanitize_validation_error( $validation_error, compact( 'node' ) );
if ( $should_remove ) {
$element->removeAttributeNode( $node );
$allow_empty = ! empty( $attr_spec[ AMP_Rule_Spec::VALUE_URL ][ AMP_Rule_Spec::ALLOW_EMPTY ] );
$is_href_attr = ( isset( $attr_spec[ AMP_Rule_Spec::VALUE_URL ] ) && 'href' === $node->nodeName );
if ( $allow_empty && ! $is_href_attr ) {
$node->nodeValue = '';
} else {
$element->removeAttributeNode( $node );
}
}

return $should_remove;
}

Expand Down Expand Up @@ -528,13 +542,15 @@ public function prepare_validation_error( array $error = [], array $data = [] )

if ( $node instanceof DOMElement ) {
if ( ! isset( $error['code'] ) ) {
$error['code'] = AMP_Validation_Error_Taxonomy::INVALID_ELEMENT_CODE;
$error['code'] = AMP_Tag_And_Attribute_Sanitizer::DISALLOWED_TAG;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base class retrieves internals from one of its extending classes.

Can we extract all of the validation error constants into a separate class? I tend to create an interface with only constants in such a case. An interface with constants better communicates what is happening: two different pieces of code have a contract about how to refer to a specific problem. Contracts should be encoded in interface, not use internal logic of one another.

namespace Amp\AmpWP;

interface ValidationError {
	const DISALLOWED_TAG             = 'DISALLOWED_TAG';
	const DISALLOWED_CHILD_TAG       = 'DISALLOWED_CHILD_TAG';
	const DISALLOWED_FIRST_CHILD_TAG = 'DISALLOWED_FIRST_CHILD_TAG';
	// [...]
}

Thi is separate of any inheritance chain, and provides one central place where you retrieve constants from. The interface name also makes the syntax nicer to read:

$error['code'] = ValidationError::DISALLOWED_TAG;

If we need to have more grouping, we can have multiple interfaces, like CssValidationError, HtmlValidationError, ...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. On one hand, I think this should just be removed entirely, and let the code be UNKNOWN_ERROR. Each sanitizer should be responsible for indicating the code, specifically.

Otherwise, I think this should be scoped with what you discussed in #3780 (comment). It's bound up with generating PHP objects from the spec.

So I think we should do it, but probably not in the scope of this PR.

}

if ( ! isset( $error['type'] ) ) {
// @todo Also include javascript: protocol for URL errors.
$error['type'] = 'script' === $node->nodeName ? AMP_Validation_Error_Taxonomy::JS_ERROR_TYPE : AMP_Validation_Error_Taxonomy::HTML_ELEMENT_ERROR_TYPE;
}

// @todo Change from node_attributes to element_attributes to harmonize the two.
if ( ! isset( $error['node_attributes'] ) ) {
$error['node_attributes'] = [];
foreach ( $node->attributes as $attribute ) {
Expand All @@ -555,7 +571,7 @@ public function prepare_validation_error( array $error = [], array $data = [] )
}
} elseif ( $node instanceof DOMAttr ) {
if ( ! isset( $error['code'] ) ) {
$error['code'] = AMP_Validation_Error_Taxonomy::INVALID_ATTRIBUTE_CODE;
$error['code'] = AMP_Tag_And_Attribute_Sanitizer::DISALLOWED_ATTR;
}
if ( ! isset( $error['type'] ) ) {
// If this is an attribute that begins with on, like onclick, it should be a js_error.
Expand Down
2 changes: 1 addition & 1 deletion includes/sanitizers/class-amp-core-theme-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,7 @@ public static function get_supported_themes() {
public static function get_acceptable_errors( $template ) {
if ( isset( self::$theme_features[ $template ] ) ) {
return [
'illegal_css_at_rule' => [
AMP_Style_Sanitizer::CSS_SYNTAX_INVALID_AT_RULE => [
[
'at_rule' => 'viewport',
],
Expand Down
9 changes: 8 additions & 1 deletion includes/sanitizers/class-amp-iframe-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,14 @@ public function sanitize() {
* @see: https://github.com/ampproject/amphtml/issues/2261
*/
if ( empty( $normalized_attributes['src'] ) ) {
$this->remove_invalid_child( $node );
$this->remove_invalid_child(
$node,
[
'code' => AMP_Tag_And_Attribute_Sanitizer::ATTR_REQUIRED_BUT_MISSING,
'attributes' => [ 'src' ],
'spec_name' => 'amp-iframe',
]
);
continue;
}

Expand Down
9 changes: 8 additions & 1 deletion includes/sanitizers/class-amp-img-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,14 @@ public function sanitize() {
}

if ( ! $node->hasAttribute( 'src' ) || '' === trim( $node->getAttribute( 'src' ) ) ) {
$this->remove_invalid_child( $node );
$this->remove_invalid_child(
$node,
[
'code' => AMP_Tag_And_Attribute_Sanitizer::ATTR_REQUIRED_BUT_MISSING,
'attributes' => [ 'src' ],
'spec_name' => 'amp-img',
]
);
continue;
}

Expand Down
1 change: 0 additions & 1 deletion includes/sanitizers/class-amp-rule-spec.php
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,6 @@ abstract class AMP_Rule_Spec {
const ALLOWED_PROTOCOL = 'protocol';
const ALTERNATIVE_NAMES = 'alternative_names';
const BLACKLISTED_VALUE_REGEX = 'blacklisted_value_regex';
const DISALLOWED_DOMAIN = 'disallowed_domain';
const MANDATORY = 'mandatory';
const VALUE = 'value';
const VALUE_CASEI = 'value_casei';
Expand Down
Loading