Add support for parsing large SPARQL XML Results #357
You could create a big file during a test, parse it and remove it afterwards. But as far as I understand the documentation about
... it changes parameters, not behavior. So a test might not be required here. Why another PR besides #352?
#352 is a fix for parsing large RDF/XML files. This PR is for parsing SPARQL Results and XML Literals.
I wrote a quick test to check if adding ... It just created a result set with a large number of ...

public function testSelectHugeXml()
{
    $huge = "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">\n";
    $huge .= "<head><variable name=\"s\"/><variable name=\"p\"/><variable name=\"o\"/></head>\n";
    $huge .= "<results>\n";
    // Generate exactly 50000 results, matching the count asserted below
    for ($i = 1; $i <= 50000; $i++) {
        $huge .= "<result>\n";
        $huge .= "<binding name=\"s\"><uri>http://www.example.com/person/$i</uri></binding>\n";
        $huge .= "<binding name=\"p\"><uri>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</uri></binding>\n";
        $huge .= "<binding name=\"o\"><uri>http://xmlns.com/foaf/0.1/Person</uri></binding>\n";
        $huge .= "</result>\n";
    }
    $huge .= "</results>\n</sparql>\n";
    // Check it is more than 10 MB (10485760 bytes)
    $this->assertGreaterThan(10485760, strlen($huge));
    $result = new Result($huge, 'application/sparql-results+xml');
    $this->assertCount(50000, $result);
    $this->assertSame(50000, $result->numRows());
}

However, it is so slow to run that it takes longer than 60 seconds and the test times out. I don't think it is the parsing of the XML that is slow; it is the stepping through the results that is slow 😞
I have rewritten the SPARQL XML parser to use ... It now parses results more than 10 MB in size, and does so much faster. It would be good to profile the old and new parsers to see how much faster it is.
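The API the parser was rewritten around is not named above. As a rough illustration only, a streaming reader built on PHP's `XMLReader` (an assumption, not necessarily what this PR uses) could walk the results one node at a time instead of building a full tree. The function name `readSparqlRows` is hypothetical, and the sketch matches on element names alone.

```php
<?php
// Hypothetical helper, not EasyRdf code: stream SPARQL XML results with XMLReader.
function readSparqlRows(string $xml): Generator
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $row = null;   // bindings collected for the current <result>
    $name = null;  // variable name of the current <binding>

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            if ($reader->localName === 'result') {
                $row = [];
            } elseif ($reader->localName === 'binding') {
                $name = $reader->getAttribute('name');
            } elseif ($row !== null && $name !== null
                && in_array($reader->localName, ['uri', 'literal', 'bnode'], true)) {
                // readString() returns the text content without moving the cursor.
                $row[$name] = $reader->readString();
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            if ($reader->localName === 'result' && $row !== null) {
                yield $row; // hand over one row at a time
                $row = null;
            } elseif ($reader->localName === 'binding') {
                $name = null;
            }
        }
    }
    $reader->close();
}
```

Yielding rows from a generator means the caller can iterate without holding every row in memory; it keeps only plain strings, so datatypes and language tags are ignored here.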
What is the state of this PR?
The code currently in the PR is quite hacky: it is based only on XML element names, rather than checking that the path is correct. This could cause it to do weird things. I started refactoring it so that there is a new ... I am in the process of creating a ... The new ... Hoping to get some time this weekend to finish this PR.
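To illustrate the difference between matching on element names and checking the path, here is a minimal sketch that keeps a stack of open elements and only reads a value when the full path matches. It assumes PHP's `XMLReader`; the function name `readValuesAtPath` and the slash-separated path format are made up for this example and are not the classes being written for this PR.

```php
<?php
// Hypothetical sketch: only accept values found at an exact element path.
function readValuesAtPath(string $xml, string $expectedPath): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $stack = [];   // local names of the currently open elements
    $values = [];

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->localName;
            if (implode('/', $stack) === $expectedPath) {
                $values[] = $reader->readString();
            }
            // Self-closing elements produce no END_ELEMENT node, so pop them now.
            if ($reader->isEmptyElement) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }
    $reader->close();

    return $values;
}
```

With the path `sparql/results/result/binding/uri`, a stray `<uri>` element elsewhere in the document (for example inside `<head>`) is skipped, whereas a name-only check would pick it up.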
I have finished creating a ... The performance improvement is significant. I did a quick parse-time check on my laptop, running PHP 7.2.31. The units are seconds(!).
I had to use a logarithmic scale to compare the two on a chart:
The tests are all passing, including the new test that checks it can parse an XML document larger than 10 megabytes. @k00ni please can you review and merge if you are happy?
Fantastic job!
The code looks good. I would change a few comments for styling reasons, but it's fine. I will keep this open for a few days, if that's alright. This way our community has a chance to comment.
@zozlak if you have the time, it would be great if you could also have a look here.
Replaces #294
Does it need a test for the two places loadXML() is called?
I would rather not add a large file to the Git repo.
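One way to avoid committing a large fixture is to generate it when the test runs and delete it afterwards, as suggested in the discussion. The sketch below is only an illustration of that idea: the helper name `withGeneratedResultsFile` and its callback signature are hypothetical, and the rows are simplified to a single binding.

```php
<?php
// Hypothetical helper: write a generated results document to a temp file,
// pass its path to a callback, and always remove the file afterwards.
function withGeneratedResultsFile(int $rowCount, callable $callback)
{
    $path = tempnam(sys_get_temp_dir(), 'sparql');
    try {
        $handle = fopen($path, 'w');
        fwrite($handle, "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">\n");
        fwrite($handle, "<head><variable name=\"s\"/></head>\n<results>\n");
        for ($i = 1; $i <= $rowCount; $i++) {
            fwrite($handle, "<result><binding name=\"s\"><uri>http://www.example.com/person/$i</uri></binding></result>\n");
        }
        fwrite($handle, "</results>\n</sparql>\n");
        fclose($handle);

        return $callback($path);
    } finally {
        // Runs even if the callback throws, so no large file is left behind.
        unlink($path);
    }
}
```

A test could call it with a row count large enough to exceed 10 MB and run its parsing assertions inside the callback.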