ANY23-247 FIX Attribute name itemscope associated with an element type html must be followed by the ' = ' character.#17
How is this class recognised or instantiated? META-INF/services/ or another method?
I looked for it being registered during a single document extraction. My understanding was that validations and fixes are registered and activated as part of the extraction parameters? If a vanilla SingleDocumentExtraction is invoked, as in the Any23Test, then the Fixes and Validations are activated by default.
It may be done using a classpath scan. I will look into it further.
Ack
On Monday, March 30, 2015, Peter Ansell notifications@github.com wrote:
In core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java #17 (comment):
+/**
+ * This fixes missing attribute values for the 'itemscope' attribute,
+ * which was be associated with nodes.
+ * Typically when such a snippet of XHTML is fed through the
+ * {@link org.apache.any23.extractor.rdfa.RDFa11Extractor}, and
+ * subsequently to Sesame's {@link org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser},
+ * it will result in the following behavior.
+ *
+ * {@code
+ * [Fatal Error] :23:15: Attribute name "itemscope" associated with an element type "div" must be followed by the ' = ' character.
+ * }
+ *
+ * This Fix is an effort to mitigate against that happening.
+ */
+public class MissingItemscopeAttributeValueRule implements Fix {
It may be done using a classpath scan. I will look into it further.
Lewis
Everything I've uploaded to the patch is what I have coded. There is no
other black magic on my end to get this invoked.
Lewis
There is a hardcoded set in DefaultValidator.loadDefaultRules, but I can't find any place that is doing classpath scanning there.
I also do not understand the relationship between Rule and Fix. In the DefaultValidator, each entry is either a Rule on its own or a Rule+Fix pair, never just a Fix like you have here.
I will look into it further when I get a chance.
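For readers following the Rule versus Fix question, here is a minimal sketch of a validator that holds a hardcoded set of rules, each optionally paired with a fix. It assumes nothing about the real Any23 interfaces: `SketchRule`, `SketchFix`, and `SketchValidator` are hypothetical stand-ins with simplified signatures, used only to show why a Fix registered without a Rule would never be triggered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; these are NOT the actual Any23 interfaces.
interface SketchRule {
    boolean applies(String html);
}

interface SketchFix {
    String repair(String html);
}

final class SketchValidator {
    // Each rule maps to the fixes registered under it; a rule may have no fix at all.
    private final Map<SketchRule, List<SketchFix>> rules = new LinkedHashMap<>();

    // Register a rule that only reports a problem.
    void addRule(SketchRule rule) {
        rules.computeIfAbsent(rule, r -> new ArrayList<>());
    }

    // Register a rule together with the fix that repairs what the rule detects.
    void addRule(SketchRule rule, SketchFix fix) {
        rules.computeIfAbsent(rule, r -> new ArrayList<>()).add(fix);
    }

    // A fix runs only when fixes are enabled and its owning rule matches the input.
    String validate(String html, boolean applyFixes) {
        String current = html;
        for (Map.Entry<SketchRule, List<SketchFix>> entry : rules.entrySet()) {
            if (!applyFixes || !entry.getKey().applies(current)) {
                continue;
            }
            for (SketchFix fix : entry.getValue()) {
                current = fix.repair(current);
            }
        }
        return current;
    }
}
```

In this shape there is no way to register a fix by itself, which matches the observation above that the default set contains only Rule or Rule+Fix entries.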
|
Could you rebase your branch onto upstream master and try again? The place where the error started to become visible as a test failure (when I started rethrowing an exception that was being swallowed incorrectly) is on the current master, but your master branch is 4 commits behind that so the test will still silently succeed on your branch. |
|
@ansell done, the branch is now 2 ahead of master. Yes you are correct, my results are as follows |
|
By the way @ansell, one observation: whenever we attempt to infer the document language, we never succeed. It always returns null; on every single occasion I get back null. |
|
When I debug this, a good place to set a breakpoint is at line |
|
@ansell the line I am getting the error on is away down in semargl here |
…e html must be followed by the ' = ' character
|
hi @ansell OK I've added in the correct rule and fix as well as a test to verify that empty itemscope values are identified and fixed.
What do you think? |
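To make the failure and the intended repair concrete, here is a small sketch using only the JDK's XML parser. It is not the code in this PR: `ItemscopeSketch`, `fixBareItemscope`, and `isWellFormedXml` are hypothetical names, the repair works on the raw markup string with a regular expression rather than on a DOM, and the choice of `itemscope="itemscope"` as the replacement value is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not part of Any23.
final class ItemscopeSketch {
    // Matches a tag prefix ending in a bare "itemscope" that is not followed by '='.
    private static final Pattern BARE_ITEMSCOPE =
            Pattern.compile("(<[^>]*?\\sitemscope)(?=[\\s/>])(?!\\s*=)");

    // Gives every bare itemscope attribute an explicit value so an XML parser accepts it.
    // Limitation: a quoted attribute value containing " itemscope " would also be rewritten.
    static String fixBareItemscope(String markup) {
        return BARE_ITEMSCOPE.matcher(markup).replaceAll("$1=\"itemscope\"");
    }

    // Returns false when the JDK XML parser rejects the markup.
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Feeding `<div itemscope itemtype="...">` through `isWellFormedXml` is rejected by the parser, while the output of `fixBareItemscope` on the same input parses cleanly. Attributes that already carry a value, such as `itemscope=""`, are left untouched.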
|
The system does seem a little too complex for our purposes and isn't usable because of that. Removing generics would be the first step IMO as there are too many rawtypes definitions which indicate generics are being used badly. ContentExtractor may be able to be completely removed instead of being refitted into the process after that and the parser should always be set to parse as far as practical for our purposes. It is a little strange that there isn't a buffered, markable, InputStream provided for all of the steps to reuse as necessary rather than pushing a raw InputStream or other source into different extractors. |
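As an illustration of the buffered, markable InputStream idea, the sketch below wraps the raw stream once and rewinds it between steps using `BufferedInputStream.mark` and `reset` from the JDK. `ReusableInputSketch`, `StreamConsumer`, and `runAll` are hypothetical names, and the `maxBytes` read limit is an assumed parameter; none of this reflects how Any23 currently passes input to its extractors.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not an Any23 class.
final class ReusableInputSketch {
    // One processing step (for example an extractor) that reads from the shared stream.
    interface StreamConsumer<T> {
        T consume(InputStream in) throws IOException;
    }

    // Runs every step against the same underlying bytes by rewinding after each step.
    // A step must not read more than maxBytes or close the stream, otherwise reset() fails.
    static <T> List<T> runAll(InputStream raw, int maxBytes, List<StreamConsumer<T>> steps) {
        BufferedInputStream in = new BufferedInputStream(raw);
        List<T> results = new ArrayList<>();
        try {
            for (StreamConsumer<T> step : steps) {
                in.mark(maxBytes);   // remember the current position
                results.add(step.consume(in));
                in.reset();          // rewind so the next step starts from the same position
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }
}
```

The trade-off is memory: the stream has to buffer up to `maxBytes` so that it can rewind, which is why the limit is explicit here.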
|
I agree. Stepping through this in the debugger made me think the same. I would like to propose that this PR is committed to master as is, we then Any thoughts Peter? Thanks for the quick response.
Lewis |
|
I tested this pull request and it has a few failing tests for me. I know that the Any23 master hasn't been perfect for its test record (mostly due to unreliable remote queries), but I haven't been watching recently to know which tests are expected to fail. |
|
ACK @ansell , master branch is unstable with the following test failures https://builds.apache.org/view/A-D/view/Any23/job/Any23-trunk/1466/#showFailuresLink If you can reproduce this locally (or up until your test build fails within core with 3 failing tests) then that is the 'expected' behaviour right now. The Microdata test is directly related to the issue we are now discussing here. This issue is the most pressing for Any23 right now, IMHO it is a complete blocker to us releasing Any23 1.2. Thanks for the review. |
|
@ansell any further comments here? I will try to get to work on the larger issue this week. |
…pe html must be followed by the ' = ' character. this closes #17
Hi Folks,
This PR fixes the issue locally. I am getting clean builds again after introducing the new MissingItemscopeAttributeValueRule class.