JENA-2031: Refine IRI settings #940

Merged
merged 2 commits into from
Mar 1, 2021

Conversation

@afs afs commented Mar 1, 2021

This work was inspired by parsing Wikidata with Jena3, finding all the types of warnings and errors that arise, then making the new code agree with Jena3, except where newer RFCs have changed the situation (e.g. percent encoding in DNS host names is now legal). The code also reflects Jena's behaviour of warning about bad IRIs but not signalling a parse error: very large datasets have some less-than-perfect IRIs in them, and aborting a load because of this is annoying.
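
The warn-but-continue behaviour described above can be sketched roughly as follows. This is not Jena code: `LenientIriCheck` and `check` are hypothetical names, and `java.net.URI` is used only as a stand-in syntax checker (it follows older URI rules, not the IRI RFCs the PR targets). The point is only the control flow: a bad IRI produces a warning, not an exception that aborts the load.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Jena code: java.net.URI stands in for the real IRI checker.
final class LenientIriCheck {

    /**
     * Checks each string. A string that fails the syntax check adds a message
     * to 'warnings', but the string is still kept, so processing continues.
     */
    static List<String> check(List<String> iris, List<String> warnings) {
        List<String> accepted = new ArrayList<>();
        for (String s : iris) {
            try {
                new URI(s); // syntax check only; the result is discarded
            } catch (URISyntaxException e) {
                // Warn, but do not signal a parse error.
                warnings.add("Bad IRI <" + s + ">: " + e.getReason());
            }
            accepted.add(s); // keep the term either way
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        // The second string contains a space, which java.net.URI rejects.
        List<String> kept = check(List.of("http://example/ok", "http://example/a b"), warnings);
        System.out.println("kept " + kept.size() + " terms, " + warnings.size() + " warning(s)");
        warnings.forEach(System.out::println);
    }
}
```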

Tests added to pin down expectations.

There is also a lot of clearing up and refactoring.

Performance checked: the Jena3 and Jena4 times to parse (BSBM data) are the same.

Hopefully, the last large IRI-related PR!

@@ -291,6 +293,14 @@ public RDFParserBuilder httpClient(HttpClient httpClient) {
*/
public RDFParserBuilder resolveURIs(boolean flag) { this.resolveURIs = flag ; return this; }

/**
* Provide a specific {@link IRIxResolver} to check and resolve URIs. It's

s/It's/Its


// Whitespace.
// XSD allows whitespace before and after the lexical forms of a literal but not
// insiode.

s/insiode/inside
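
For context, the whitespace rule in the comment under review can be illustrated with a minimal sketch, assuming `xsd:integer` as the datatype. The class and method names are hypothetical and this is not the code from the PR; it only shows that leading and trailing XSD whitespace (space, tab, LF, CR) is tolerated while whitespace inside the lexical form is not.

```java
// Hypothetical sketch, not the PR's code: XSD whitespace handling for xsd:integer.
final class XsdWhitespaceSketch {

    /** XSD whitespace is exactly these four characters. */
    static boolean isXsdSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /** Removes XSD whitespace from both ends only; inner characters are untouched. */
    static String trimXsd(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isXsdSpace(s.charAt(start))) start++;
        while (end > start && isXsdSpace(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    /** True if, after trimming the ends, the string is an optional sign followed by digits. */
    static boolean isValidInteger(String lexical) {
        return trimXsd(lexical).matches("[+-]?[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidInteger(" 42 ")); // whitespace before and after
        System.out.println(isValidInteger("4 2"));  // whitespace inside
    }
}
```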

private static IRIx setSystemBase(String baseStr) {
/**
* Create an {@link IRIx} suitable for a system base.
* This oepration always returns an {@link IRIx}

s/oepration/operation

@afs afs merged commit 66bf616 into apache:main Mar 1, 2021
@afs afs deleted the irix-more-more branch March 1, 2021 21:16