Microdata parser: updated the parser to the latest version of the microdata->rdf note (published in December 2014) #443

joernhees · 2014-12-15T18:15:18Z

re-based microdata-to-rdf-second-edition from six_2to3 branch as described here: b082c48#commitcomment-8975378

pull request to run the tests...

…rodata->rdf note (published in December 2014)

joernhees · 2014-12-15T18:49:00Z

rdflib/plugins/parsers/pyMicrodata/__init__.py

-				self.base = 'file://'+name
-				return open(name, 'rb')
+				self.base = name
+				return file(name)
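The lines removed by this change gave the parser a `file://` base URI and a file opened in binary mode. The added lines use Python 2's `file()` builtin, which no longer exists in Python 3, and pass the bare path as the base. Below is a minimal sketch of the removed behaviour as a standalone function, assuming a local filesystem path. `open_local_source` is a hypothetical name, not part of pyMicrodata, and `pathlib`'s `as_uri()` stands in for the original string concatenation because it also makes the path absolute and percent-encodes it:

```python
import pathlib
from typing import BinaryIO, Tuple


def open_local_source(name: str) -> Tuple[str, BinaryIO]:
    """Hypothetical helper: return a file:// base URI and a binary handle.

    Path.as_uri() only accepts absolute paths, so the path is resolved
    first; as_uri() also percent-encodes characters such as spaces.
    """
    path = pathlib.Path(name).resolve()
    base = path.as_uri()  # e.g. 'file:///home/me/page.html' on POSIX
    # 'rb' returns raw bytes and leaves character decoding to the consumer;
    # open() exists on Python 3, unlike the Python 2-only file() builtin.
    return base, open(str(path), "rb")
```

The caller is responsible for closing the returned handle.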


@iherman from the test runs it seems you actually re-introduced bug #375 here (already checked, wasn't because of the rebase)

joernhees · 2014-12-16T10:00:27Z

@iherman sooo... the problem is that some tests fail both with your original changes on top of the six_2to3 branch (see #444) and after rebasing them onto master (as here). In #445 I tried whether it's just the two lines i commented on above, but sadly reverting them didn't do the trick.

joernhees · 2014-12-16T10:08:56Z

so i guess something in this change (no matter which branch it is based on) isn't compatible with the tests introduced for #375. I still think that the two lines that i restored in #445 are part of the problem, but there seems to be something else as well.

I'd suggest just continuing development on this branch (after hard-resetting your local copy to it). If you push to it, this Pull Request will update and automatically re-run the tests.

In case you want to try it yourself, you can still make your own Pull Request from your original change based on the six_2to3 branch, but it should be identical to #444.

iherman · 2014-12-16T11:08:25Z

Jörn,

before I do anything... I wonder whether it is not cleaner if we do a fresh start. Here is what I have in mind.

For historical and practical reasons, I do maintain the 'original' repo for pyMicrodata:

https://github.com/RDFLib/pymicrodata

I need this, because I have to maintain a microdata service on the W3C site without installing the full updated RDFLib (you do not want to know the details:-). That was the repo that I originally used, that I then moved to RDFLib when I pushed the microdata (and the RDFa 1.1) parsers into the main branch. In both cases I simply made a copy of the whole package into RDFLib and created a separate interface to the plugin structure of RDFLib.

The way I updated the implementation last week was to update the separate version first and test it with all the microdata tests. The changes are on my machine, not on the repo yet; I want to push them when the new microdata->RDF document is published.

To make the update on RDFLib, what I intended to do is to, essentially, copy the content of that original package into RDFLib (under pyMicrodata) and make the update of the interface. Essentially what I did in the past, thus. I made two mistakes, however:

I messed up and created a branch from the six... branch instead of master. Huge stupidity on my part, that is where the mess started.
I did not realize or, more exactly, I did not remember that you (or somebody) made some changes to pyMicrodata/__init__.py, ie, I blindly changed __init__.py, re-introducing some old issues.

So... isn't it simpler if

you, somehow, using some github wizardry, roll back. Meaning removing (and really removing from the surface of the Earth!) everything I did, to get back to a stable version. Before I was even born (o.k., not that far, but before I did anything about a week or two ago). I would then remove my local RDFLib repo and download again from the server to create a 100% clean situation.
I would do the update again, but looking first at __init__.py to find out where the changes are and use a branch off the master branch. Essentially redo the work that I did a few weeks ago.

This requires more work but that is fine. I made the mess, so I believe it is my job to clean this up somehow. It will take some more time, but there is no rush, after all. Note that I cannot really check the python3 version and the issues around that, but I hope this would work out nevertheless.

WDYT? Is this a possibility?

And thanks for all your help

Ivan


Ivan Herman
Bankrashof 108
1183NW Amstelveen, The Netherlands
http://www.ivan-herman.net

joernhees · 2014-12-16T16:31:17Z

sorry for the late reply, got some urgent stuff coming in :-/

First of all: you didn't do anything wrong, i was just surprised that you based something on six_2to3 which is pretty much unstable.

The thing is: only master is meant to be stable. So everything else in branches is meant to be "in development" and unstable. This means we can just try out things and even push them online for others to see and collaborate, but the underlying agreement is that none of that is stable until merged into master (or some other stable branch(es) in other projects). Hence, what you and i did here was just that.
Another thing is that git has a garbage collection which will take care of commits that aren't referenced by anything anymore (so to say if you lose the pointers it forgets them after some time). Since we discussed them on github, i'm actually quite sure it will keep them in the discussions for consistency reasons, but from a development perspective later fetches won't fetch them anymore.
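The garbage-collection point comes down to reachability: a commit survives as long as some ref (a branch or tag) can reach it through parent links. A toy model, assuming a plain dict of commit ids to parent ids; real `git gc` additionally keeps objects still recorded in a reflog and waits out a grace period before pruning anything unreachable:

```python
from typing import Dict, Iterable, List, Set


def reachable(parents: Dict[str, List[str]], refs: Iterable[str]) -> Set[str]:
    """All commits reachable from the given ref tips via parent links."""
    seen: Set[str] = set()
    stack = list(refs)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def collectable(parents: Dict[str, List[str]], refs: Iterable[str]) -> Set[str]:
    """Commits that no ref can reach: pruning candidates in this toy model."""
    return set(parents) - reachable(parents, refs)


# master points at m2 (parent m1); a backup branch points at b1 (parent m1).
history = {"m1": [], "m2": ["m1"], "b1": ["m1"]}
print(collectable(history, ["m2", "b1"]))  # set(): both tips still referenced
print(collectable(history, ["m2"]))        # {'b1'}: backup branch deleted
```

Deleting a branch such as the -bak one only removes a pointer; its commits become collectable once nothing else refers to them.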

Let me answer the rest inline:

The way I updated the implementation last week was to update the separate version first and test it with all the microdata tests. The changes are on my machine, not on the repo yet; I want to push them when the new microdata->RDF document is published.

Are the microdata tests in rdflib as well? Cause they should be... you essentially create a branch, push some changes, making them available online... then you can immediately create the pull request and describe what that branch is for. This has the nice side-effect that travis will notice and run the tests for all supported environments. So if you have more tests, they should be run inside rdflib as well.

To make the update on RDFLib, what I intended to do is to, essentially, copy the content of that original package into RDFLib (under pyMicrodata) and make the update of the interface. Essentially what I did in the past, thus. I made two mistakes, however:

I messed up and created a branch from the six... branch instead of master. Huge stupidity on my part, that is where the mess started.

I did not realize or, more exactly, I did not remember that you (or somebody) made some changes to pyMicrodata/__init__.py, ie, I blindly changed __init__.py, re-introducing some old issues.

Actually that's one of the nice things about git... it can deal with this & support cleaning it up, as it's fairly common in "multiplayer development" (see below).

So... isn't it simpler if

you, somehow, using some github wizardry, roll back. Meaning removing (and really removing from the surface of the Earth!) everything I did, to get back to a stable version. Before I was even born (o.k., not that far, but before I did anything about a week or two ago). I would then remove my local RDFLib repo and download again from the server to create a 100% clean situation.

I would do the update again, but looking first at __init__.py to find out where the changes are and use a branch off the master branch. Essentially redo the work that I did a few weeks ago.

Hmm, as described above: github will probably keep the commits somewhere as we discussed them, but they won't turn up in master unless someone merges them and so from a "release" point of view we didn't do anything yet. I'm keeping the microdata-to-rdf-second-edition-bak branch around at the moment so we can still compare / see it, but it's just a pointer, and once we've dealt with this i'll just remove it and it's gone.

What i actually already did with this branch (microdata-to-rdf-second-edition) was: i rebased your changes on top of master. What this means is that i actually already did the second point you mentioned with the help of git rebase. So i took the changes you made, subtracted the ones that lead from master to six_2to3, and then put the rest on top of master. Then i updated the online branch microdata-to-rdf-second-edition to that new state (which you probably don't have locally yet). You can see the result in this Pull Request when you click on "Files changed" on top of this page ( https://github.com/RDFLib/rdflib/pull/443/files ).
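The rebase described here corresponds, as far as the description goes, to `git rebase --onto master six_2to3 <branch>`: take the commits reachable from the branch but not from six_2to3, and replay them as new commits on top of master. A toy model over a linear history; the commit ids and the `'` suffix marking replayed copies are illustrative only:

```python
from typing import Dict, List, Optional

History = Dict[str, Optional[str]]  # commit id -> single parent (None for root)


def ancestors(history: History, tip: str) -> List[str]:
    """Commits reachable from tip, newest first (linear history only)."""
    chain: List[str] = []
    node: Optional[str] = tip
    while node is not None:
        chain.append(node)
        node = history[node]
    return chain


def rebase_onto(history: History, new_base: str, upstream: str, tip: str) -> List[str]:
    """Replay commits in tip's history but not in upstream's onto new_base.

    A replayed commit has a new parent and therefore a new id; here the new
    id is the old one plus a trailing quote. Returns the new ids, oldest
    first, and records them in history.
    """
    excluded = set(ancestors(history, upstream))
    to_replay = [c for c in reversed(ancestors(history, tip)) if c not in excluded]
    parent = new_base
    replayed: List[str] = []
    for commit in to_replay:
        copy = commit + "'"
        history[copy] = parent
        replayed.append(copy)
        parent = copy
    return replayed


# master: m1 <- m2 <- m3; six_2to3 forked at m2: s1; work on top: i1 <- i2
h: History = {"m1": None, "m2": "m1", "m3": "m2", "s1": "m2", "i1": "s1", "i2": "i1"}
print(rebase_onto(h, new_base="m3", upstream="s1", tip="i2"))  # ["i1'", "i2'"]
print(ancestors(h, "i2'"))  # ["i2'", "i1'", 'm3', 'm2', 'm1']: s1 is gone
```

The original commits i1 and i2 still exist afterwards; only the branch pointer moves to the copies, which is why the -bak branch can keep the old state visible.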

I'd suggest looking at this yourself: do a git fetch --all (this will just get you all information from all remotes, but won't change anything locally).
Then view the current state like this:
git log --graph --oneline --all --decorate
You should see some tree structure. You're currently at HEAD. There is probably an origin/microdata-to-rdf-second-edition-bak (your original state based on origin/six_2to3). There should also be an origin/master and above that an origin/microdata-to-rdf-second-edition (that's this pull request).

Now if your local microdata-to-rdf-second-edition isn't where origin/microdata-to-rdf-second-edition is, then you need to run this to get it where it belongs:
git checkout microdata-to-rdf-second-edition && git reset --hard origin/microdata-to-rdf-second-edition

Thing is: this is now as if you developed from master, but it seems it fails some of the tests in rdflib, so i wouldn't merge this back into master, as it breaks the one stable branch we have. I already tried a simple way to fix this in #445 (by just reverting the two changes i commented on before here: joernhees@47e0416 ), but as you can see in #445 the error remains.

So i guess what you could do is investigate more thoroughly why the tests fail, starting from the current state of microdata-to-rdf-second-edition aka this page aka #443 ... they have something to do with #375 and #406 / #403, which is why i tried reverting the two lines in question in #445, but it seems your change to pyMicrodata did something else that the tests for #375 don't like. On all environments by the way, not only on py3.
If you think you found the error you can just test locally, then commit to the microdata-to-rdf-second-edition branch and wait for travis to check it again in #443 (you'll need to go to the github page though, as you won't see the build status via email).

This requires more work but that is fine. I made the mess, so I believe it is my job to clean this up somehow. It will take some more time, but there is no rush, after all. Note that I cannot really check the python3 version and the issues around that, but I hope this would work out nevertheless.

Actually it wasn't too much work, just some "git magic" which was understandably a bit confusing. I hope this answer clears it up a bit ;)

j

iherman · 2014-12-17T08:01:56Z

Jörn,

I will look at this, although I cannot tell you when that will happen.

One answer to a side-issue:

Are the microdata tests in rdflib as well? Cause they should be... you essentially create a branch, push some changes, making them available online... then you can immediately create the pull request and describe what that branch is for. This has the nice side-effect that travis will notice and run the tests for all supported environments. So if you have more tests, they should be run inside rdflib as well.

They are not, but... what I referred to are the tests that are part of the microdata-rdf repository:

https://github.com/w3c/microdata-rdf/tree/gh-pages/tests

I am not familiar with the test harness approach in RDFLib (never used that) but, in any case, that repo contains the 'official' set of tests that accompany the conversion specification. I do not think it is possible, or even a good idea, to move them to the RDFLib repo.

Cheers

Ivan


…ct (master) base. The previous attempt went wrong because I started with a wrong branch:-( Jörn rebased it, and I re-did the __init__.py file from scratch.

iherman · 2014-12-18T15:15:23Z

So...

I did what I planned, with your help; I took the __init__.py file from the master branch, redid the necessary changes, and pushed them back to the microdata-to-rdf-second-edition branch. The version runs on my machine with all the official microdata-rdf tests.

Let us hope that will work out now...

Ivan


joernhees · 2014-12-19T10:04:12Z

cool, but the tests still fail wrt. #375 ... see https://github.com/RDFLib/rdflib/blob/master/test/test_issue375.py

i checked this locally and found out that the expected string differs from the result string only in the preamble.

the test expects:
@prefix dcat: <http://www.w3.org/ns/dcat#> .

but the new version seems to contain:
@prefix cat: <http://www.w3.org/ns/dcat#> .
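One way to reproduce the comparison described above is to pull the @prefix declarations out of both Turtle strings and diff them separately from the rest of the output. A rough sketch using regular expressions, not a Turtle parser; it assumes each declaration sits on its own line in the form shown above, and `split_preamble` / `diff_prefixes` are hypothetical helpers, not part of the RDFLib test suite:

```python
import re
from typing import Dict, Tuple

# Matches whole lines like: @prefix dcat: <http://www.w3.org/ns/dcat#> .
PREFIX_LINE = re.compile(
    r"^@prefix[ \t]+([\w-]*):[ \t]*<([^>]*)>[ \t]*\.[ \t]*$", re.MULTILINE
)


def split_preamble(turtle: str) -> Tuple[Dict[str, str], str]:
    """Return ({prefix: namespace IRI}, remaining text without @prefix lines)."""
    prefixes = {m.group(1): m.group(2) for m in PREFIX_LINE.finditer(turtle)}
    body = PREFIX_LINE.sub("", turtle).strip()
    return prefixes, body


def diff_prefixes(expected: str, actual: str) -> Dict[str, Tuple[str, str]]:
    """Prefixes whose binding differs: {prefix: (expected IRI, actual IRI)}."""
    exp, _ = split_preamble(expected)
    act, _ = split_preamble(actual)
    return {
        p: (exp.get(p, "<unbound>"), act.get(p, "<unbound>"))
        for p in exp.keys() | act.keys()
        if exp.get(p) != act.get(p)
    }
```

If the bodies returned by `split_preamble` are equal and `diff_prefixes` reports only dcat/cat, the outputs differ only in the preamble.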

This is another thing that seems to have been fixed in rdflib but not in the pymicrodata repo... see commits 3182efb , 9321c66 and 45cbc63 .

So is the test right or the new pymicrodata version?

What this seems to show is that we're actually dealing with 2 versions of pymicrodata which got out of sync a while ago. People fixed some bugs in rdflib, but they weren't incorporated back in the standalone pymicrodata... Maybe if you really need to keep the standalone pymicrodata, we should consider removing it from rdflib core, make it a python package and have rdflib depend on it so that pip would automagically install it when rdflib is installed? That way we could maybe defuse this in the future...

Another very weird thing i found when testing this on my machine: even on master this fails for me now!
Is it possible there is some cache or some online dependency here which is out of sync now?

iherman · 2014-12-19T10:44:11Z

Jörn,

thanks. But...

On 19 Dec 2014, at 11:04 , Jörn Hees notifications@github.com wrote:

cool, but the tests still fail wrt. #375 ... see https://github.com/RDFLib/rdflib/blob/master/test/test_issue375.py

i checked this locally and found out that the expected string differs from the result string only in the preamble.

the test expects:
@prefix dcat: <http://www.w3.org/ns/dcat#> .

but the new version seems to contain:
@prefix cat: <http://www.w3.org/ns/dcat#> .

I am not even sure where it comes from.

First of all, from an RDF point of view, the difference is meaningless. If you look at the relevant part of test_issue375, it has loads of @prefix statements that are not used in the required RDF code, ie, whether it is cat or dcat is unimportant; actually, neither is used. (I do not know how the test harness works, but it should not be a textual comparison anyway...)
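Whether a declared prefix is actually used can be checked mechanically: list the declared prefixes and look for `prefix:` occurrences in the remaining text. A naive sketch, assuming one declaration per line as in the test output; it blanks out `<...>` IRIs and double-quoted literals first so that `http:` inside an IRI is not mistaken for a prefix, but it ignores comments, triple-quoted literals, and the empty prefix. `unused_prefixes` is a hypothetical helper:

```python
import re
from typing import Set

DECL = re.compile(r"^@prefix[ \t]+([\w-]+):[ \t]*<[^>]*>[ \t]*\.[ \t]*$", re.MULTILINE)
IRI_REF = re.compile(r"<[^>]*>")
STRING = re.compile(r'"(?:[^"\\\n]|\\.)*"')
PNAME = re.compile(r"(?<![\w:])([A-Za-z][\w-]*):")


def unused_prefixes(turtle: str) -> Set[str]:
    """Declared prefixes that never appear as 'prefix:' in the triples."""
    declared = set(DECL.findall(turtle))
    body = DECL.sub("", turtle)
    body = IRI_REF.sub("<>", body)  # hide 'http:' etc. inside IRIs
    body = STRING.sub('""', body)   # hide colons inside string literals
    used = set(PNAME.findall(body))
    return declared - used
```

A prefix that shows up in this set can be renamed (dcat to cat, say) without changing a single triple.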

I also do not know where the current 'cat' version comes from. When I run the test on my machine, I get a much smaller set of @prefix declarations without that one.

What has changed, though, is as follows. In the mdata generation I do add a number of namespaces with prefixes, so that, if there is a serialization, the result would look better. In the previous version I used all the prefixes from the RDFa initial context, which is encoded in the pyRdfa part of the distribution (pyRdfa/initialcontext.py); the old mdata code simply reused that list. However, this is not in line with the current microdata practice, which uses only a few namespaces, if any, beyond the core schema.org. I have therefore removed the generation of those namespaces, and reduced the other namespaces as well.

With all that: I have no idea where that prefix setting comes from. There is no prefix setting in the microdata code for dcat; there is one in the pyRdfa part using 'dcat' (though this is not used in the mdata code).

This is another thing that seems to have been fixed in rdflib but not in the pymicrodata repo... see commits 3182efb , 9321c66 and 45cbc63 .

These are all around the same issue, and are all referring to pyRdfa. That one uses dcat; I checked both the master branch and this microdata-to-rdf-second-edition one. I must admit I have no idea where this 'cat' comes from (I did not author those tests, and I do not even know how they exactly work).

So is the test right or the new pymicrodata version?

I believe my code is correct. There seems to be some testing issue.

What this seems to show is that we're actually dealing with 2 versions of pymicrodata which got out of sync a while ago. People fixed some bugs in rdflib, but they weren't incorporated back in the standalone pymicrodata... Maybe if you really need to keep the standalone pymicrodata, we should consider removing it from rdflib core, make it a python package and have rdflib depend on it so that pip would automagically install it when rdflib is installed? That way we could maybe defuse this in the future...

This is the first time this happened... (and it is clearly my fault). And I would prefer to keep it as it is now. Not everybody uses pip; that would create problems for those who are already relying on this. What I hope for is that, soon, I can get things changed on the W3C site and such things would not happen again. (Do you know how often things change? Ie, if there is a change in the RDFLib repo, how much time does it take to get into the latest release?)

Another very weird thing i found when testing this on my machine: even on master this fails for me now!
Is it possible there is some cache or some online dependency here which is out of sync now?

I really do not know. The initial context, ie, the usage of 'dcat' as a prefix, is in sync in all the versions I have and, in fact, with the 'official' set

http://www.w3.org/2011/rdfa-context/rdfa-1.1

I do add new entries to the pyRdfa code when the official list is extended, and I did that in the past. I must admit I was a little bit careless on one front: I simply updated that on the main branch, because it is easy to see if it is correct or not (I should probably do it via pull requests in future). But the last change occurred quite a while ago (almost a year ago, if I see the CVS log on the w3c page).

:-(

Thanks!

Ivan


dbs · 2014-12-19T12:33:04Z

We have at least three separate issues here.

git submodules would be another option for keeping pyRdfa in a separate repository without running into synchronisation issues, if we really need to keep them separate. The last time I tried to use the standalone pyRdfa module, however, it was effectively useless without RDFLib. I'm skeptical that keeping it separate serves a real practical purpose.

As for the test for #375, it uses a comparison of the text output because it's trying to ensure that rdfpipe is, in fact, working, and as far as I can tell checking the text output is the only way to do that. The output should be consistent between runs, right? If the expected output changes because we decided to change the default contexts, then the expected output should be changed.

That said, it's possible that there is a problem with the test in how it invokes rdfpipe as a subprocess. Perhaps it is picking up on an installed version of RDFLib rather than the libraries in the repo.
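One way to rule this out is to make sure the child process imports the checkout rather than an installed copy. The sketch below assumes the test launches rdfpipe through the current Python interpreter; it prepends the repository root to `PYTHONPATH`, whose entries Python places ahead of site-packages on `sys.path`. The helper name `run_with_repo_on_path` is hypothetical and not part of rdflib's test suite.

```python
import os
import subprocess
import sys


def run_with_repo_on_path(args, repo_root):
    """Run `python <args>` with repo_root first on PYTHONPATH (hypothetical helper).

    PYTHONPATH entries come before site-packages on sys.path, so the child
    imports packages from the checkout rather than an installed copy.
    """
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = repo_root if not existing else repo_root + os.pathsep + existing
    # Use the interpreter running the tests, not whatever `rdfpipe`
    # script happens to be first on the shell's PATH.
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
```

It could be called as `run_with_repo_on_path(["-m", "rdflib.tools.rdfpipe", ...], repo_root)`, assuming rdfpipe can be run as the module `rdflib.tools.rdfpipe`; the remaining arguments would be whatever the test currently passes to rdfpipe.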

iherman · 2014-12-19T13:26:17Z

On 19 Dec 2014, at 13:33, Dan Scott notifications@github.com wrote:

> We have at least three separate issues here.

> git submodules would be another option for keeping pyRdfa in a separate repository without running into synchronisation issues, if we really need to keep them separate. The last time I tried to use the standalone pyRdfa module, however, it was effectively useless without RDFLib. I'm skeptical that keeping it separate serves a real practical purpose.

Indeed. Both modules (RDFa and microdata) were originally developed on top of RDFLib; then, at some point in the past, I was convinced (I must admit I do not remember by whom) that it would be better to add these to the core distribution, which I did. Note that this was quite some time ago (about two years). Going backwards now would really be an issue. Let us try to do without this.

> As for test 375, it uses a comparison of the text output because it's trying to ensure that rdfpipe is, in fact, working, and as far as I can tell checking the text output is the only way to do that. The output should be consistent between runs, right?

Between runs: yes. But if something changes in one of the modules, then it should be updated. That is where the problem comes in; I did not contribute those tests, and I am not even sure how exactly they work:-(

> If the expected output changes because we decided to change the default contexts, then the expected output should be changed.

> That said, it's possible that there is a problem with the test in how it invokes rdfpipe as a subprocess. Perhaps it is picking up on an installed version of RDFLib rather than the libraries in the repo.

That I do not know:-(

Thanks

Ivan


joernhees · 2016-01-25T10:17:47Z

Going through this again, the remaining problem is that rdflib and pymicrodata diverged when some issue was fixed in rdflib but not in pymicrodata. Then you made a large update to pymicrodata and that update now seems to break one of the rdflib tests (which was introduced after the fixes).

I think we should definitely do two things:

1. backport the fix from rdflib to pymicrodata if necessary, and get both versions in sync
2. inspect what's wrong in the failing test (and maybe change it)

Wrt. 2, you already pointed out that the test might not be the best, which I agree with... it is just a syntactic test for semantic data. The benefit is that it notified us of a problem. Part of this problem seems to be that some default prefixes changed, which may or may not be desired. If you say it's OK, it's not much work to fix the test.
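To illustrate the gap between a syntactic and a semantic check, here is a rough sketch that compares two N-Triples serializations as sets of statements, so line order no longer matters. It is only an approximation, assuming both outputs are N-Triples with one statement per line: blank node labels may legitimately differ between runs, and a full check needs graph isomorphism (rdflib provides `rdflib.compare.isomorphic` for that). The function names are hypothetical.

```python
def ntriples_statements(text):
    """Return the set of non-empty, non-comment lines of an N-Triples document."""
    statements = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and N-Triples comments.
        if line and not line.startswith("#"):
            statements.add(line)
    return statements


def same_statements(expected, actual):
    """Order-insensitive comparison; does NOT account for blank node relabeling."""
    return ntriples_statements(expected) == ntriples_statements(actual)
```

Comparing sets rather than raw text makes the test immune to serialization order, but it would still flag a changed default prefix if that changes the generated IRIs, which is arguably what the test should catch.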

Another thing (and where we definitely started to talk about too many different things at once) is how to make sure we won't have such a problem again. Let's put that decision into #582.

iherman · 2016-01-25T11:03:40Z

Jörn,

as I said in my original reply: I really have no time to deal with this issue now. Is it something that you can take care of?

Ivan


joernhees · 2016-01-25T18:33:13Z

Short on time myself, but I'll look into this.

joernhees · 2016-01-27T20:45:28Z

OK, having looked into this for a while yesterday and today, this is where I'm at:

- pyMicrodata contains @iherman's latest version
- the following commits touched plugins/parsers/pyMicrodata/* after the period with only @iherman's commits, and I've checked that they are in pyMicrodata already:
  - fade112 - Fix local microdata parsing in Python3 as well (also see Avoid bytes vs. str error in Python3 #377) @dbs
  - d77e3a4 - add file:// to base if it's a filename (also see microdata fix: add file:// to base if it's a filename #406) @gromgull
  - 0529f0d - Microdata parser: The handling of the and elements were missing @iherman
  - 227b41f - cleaned up trailing whitespace @gromgull
    - this is a whitespace-only commit, which I'll do anyhow (see "backports from rdflib", pymicrodata#4)
  - 4b44585 - Avoid class reference to imported function @dbs
    - backported in "backports from rdflib" (pymicrodata#4)
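A check like this could be partly automated. The sketch below assumes both copies are available as local directories (for example a pymicrodata checkout and rdflib's `rdflib/plugins/parsers/pyMicrodata`); it walks both trees and reports files that exist on only one side or whose contents differ. `diverged_files` is a hypothetical helper, not an rdflib or pymicrodata function, and it compares file contents, not commit history.

```python
import filecmp
import os


def diverged_files(left_root, right_root, suffix=".py"):
    """Compare two copies of a package tree file by file (hypothetical helper).

    Returns (left_only, right_only, differing) as sorted lists of paths
    relative to the roots. Only files ending in `suffix` are considered.
    """
    def collect(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(suffix):
                    found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    left, right = collect(left_root), collect(right_root)
    differing = [
        rel
        for rel in sorted(left & right)
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting matching size and modification time.
        if not filecmp.cmp(os.path.join(left_root, rel),
                           os.path.join(right_root, rel), shallow=False)
    ]
    return sorted(left - right), sorted(right - left), differing
```

An empty result on all three lists would mean the two copies are byte-identical for the selected files; any entry in `differing` points at a file that still needs a manual diff.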

As the changes seem to be backported already, I'll next copy the whole new version back into rdflib and make a PR to see how that goes with our tests.

Finally, when that's figured out, I'll close this, and we'll think about how to resolve this massive timesink once and for all in #582.

…587, #443, #444, #445
* microdata-to-rdf-third-edition:
  - some whitespace cleanup
  - updated microdata test for #375 wrt. #587 (microdata-rdf 2014)
  - modified microdata test for #375 wrt. isomorphism and better output
  - updated MicrodataParser to reflect that pyMicrodata no longer has vocab expansion and cache args
  - syncing changes from pyMicrodata rev c760db0e77c13c4e80fdef675f3c65497f8d08bf

gromgull · 2018-10-27T20:35:43Z

See #828

Microdata parser: updated the parser to the latest version of the microdata->rdf note (published in December 2014) (2b36180)

joernhees referenced this pull request Dec 15, 2014

Microdata parser: updated the parser to the latest version of the microdata->rdf note (published in December 2014) (b082c48)

joernhees mentioned this pull request Dec 15, 2014

Microdata to rdf second edition bak #444

Closed

joernhees reviewed Dec 15, 2014

joernhees mentioned this pull request Dec 16, 2014

fixup of #443 #445

Closed

Second attempt to update the new microdata parser, now with the correct (master) base. The previous attempt went wrong because I started with a wrong branch:-( Jörn rebased it, and I re-did the __init__.py file from scratch. (d1ab445)

iherman mentioned this pull request Jan 25, 2016

Turn off the rdf List generation? RDFLib/pymicrodata#3

Open

joernhees mentioned this pull request Jan 25, 2016

code duplication issue between rdflib and pymicrodata #582

Closed


joernhees added the bug, enhancement, parsing, testing, meta and discussion labels Jan 25, 2016

joernhees added this to the rdflib 4.2.2 milestone Jan 25, 2016

joernhees self-assigned this Jan 25, 2016

joernhees mentioned this pull request Jan 27, 2016

syncing changes from pyMicrodata #587

Merged

joernhees modified the milestones: rdflib 5.0.0, rdflib 4.2.2 Jan 28, 2016

joernhees added the fix-in-progress label Jan 28, 2016

gromgull mentioned this pull request May 29, 2017

Added the changes in the initial context #745

Closed

gromgull closed this Oct 27, 2018

gromgull deleted the microdata-to-rdf-second-edition branch October 30, 2018 09:21
