Skip to content
This repository was archived by the owner on Jul 3, 2023. It is now read-only.

ANY23-226 Extract JSON-LD embedded in HTML#16

Merged
asfgit merged 1 commit intoapache:masterfrom
lewismc:ANY23-226
Mar 21, 2015
Merged

ANY23-226 Extract JSON-LD embedded in HTML#16
asfgit merged 1 commit intoapache:masterfrom
lewismc:ANY23-226

Conversation

@lewismc
Copy link
Member

@lewismc lewismc commented Mar 20, 2015

Initial patch for this support.
It is not working correctly @ansell can you have a look into the parsing of JSONLD textual content?
I've provided a '//' comment to where I can see the correct parser being selected. It seems to not parse and extract the JSONLD so I know I am doing something wrong.
Thank you very much @ansell if you can have a wee look.

@lewismc
Copy link
Member Author

lewismc commented Mar 20, 2015

Important to state, this is largely based off of our existing META extractor. We are merely looking for /HTML/HEAD/SCRIPT/ presence.
Therefore, this initial effort needs to be augmented by a fully functional implementation which can catch presence of JSONLD in body as well.

@asfgit asfgit merged commit 1e3eb9c into apache:master Mar 21, 2015
@ansell
Copy link
Member

ansell commented Mar 21, 2015

The main bug was that the entire script node was being sent to JSONLD-Java, and not just its content.

However, I also made a few other changes while doing that testing.

It turned out that the jsonld was invalid, but somehow the exception when parses fail was changed to be silently swallowed, so the only indication was that the count was 0. I turned on the exception propagation again (no reason it should be swallowed outside of temporary testing).

However, in addition to the 4 tests currently failing on the core tests, there are now other tests failing due to an inability to parse "<div itemscope>"

@lewismc
Copy link
Member Author

lewismc commented Mar 21, 2015

Ok Peter thank you for looking. This is great. I have not seen the test
failures. Can you please tell me if it is in Any23 or in jsonld-Java?
We could upgrade the Jsonld-Java implementation as well. To the 0.5.1
release

On Saturday, March 21, 2015, Peter Ansell notifications@github.com wrote:

The main bug was that the entire script node was being sent to
JSONLD-Java, and not just its content.

However, I also made a few other changes while doing that testing.

It turned out that the jsonld was invalid, but somehow the exception when
parses fail was changed to be silently swallowed, so the only indication
was that the count was 0. I turned on the exception propagation again (no
reason it should be swallowed outside of temporary testing).

However, in addition to the 4 tests currently failing on the core tests,
there are now other tests failing due to an inability to parse "
"


Reply to this email directly or view it on GitHub
#16 (comment).

Lewis

@ansell
Copy link
Member

ansell commented Mar 22, 2015

The test failures are in the Microdata parsing code, not JSONLD-Java, so I thought it was fine to push this even though it was going to break the Jenkins build (it was already silently broken before due to the swallowed exception). The JSONLD parsing now works, the key fix on what you had done was to send the first child of the script element, which is the actual JSON code.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants