

How to handle HTTPS for schema.org? #118

Closed
tobiasschweizer opened this issue Jan 31, 2022 · 3 comments


@tobiasschweizer

Hi there

Using pyshacl for my schema.org resources, I recently noticed a change in the reported errors: instead of schema, I now see schema1 as a prefix. Because I saw the same behaviour when building the docs from my SHACL shapes with https://github.com/lambdamusic/Ontospy, I suspected that this could be related to https://github.com/RDFLib. This turned out to be the reason since they recently switched from http://schema.org/ to https://schema.org/ (TLS). schema now refers to https://schema.org/, and schema1 to the non-TLS version.

I am aware that this is not a pyshacl issue. However, I wonder whether http://schema.org/Thing is different from https://schema.org/Thing. Technically they are two different IRIs, but I think everyone would agree that they refer to the same concept, as stated here.

Are there any best practices to handle this? Should one switch to HTTPS for schema.org asap?

@tobiasschweizer tobiasschweizer changed the title from "How to handle HTTPS fro schema.org?" to "How to handle HTTPS for schema.org?" Jan 31, 2022
@ashleysommer
Collaborator

ashleysommer commented Jan 31, 2022

> This turned out to be the reason since they recently switched from http://schema.org/ to https://schema.org/

Correct. It is a change in the latest RDFLib release, v6.1.1, that causes this.

Interesting backstory: RDFLib switched to the HTTPS version of Schema.org two years ago, during the 5.0.0 development cycle, but somehow it was changed back to HTTP in the codebase before release. This was recently fixed (in v6.1), but it didn't affect many people until a different change in 6.1.1 took effect: v6.1.1 pre-registers all known common built-in namespace prefixes in the graph's Namespaces dict when a new graph is parsed. That is why the "schema" prefix is now taken, and your non-HTTPS namespace is bound as "schema1".
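To make the collision concrete, here is a toy model in plain Python. The function `bind_prefix` is hypothetical and deliberately simplified; it is not RDFLib's code, and RDFLib's real logic differs in details. It only shows why a pre-registered "schema" prefix pushes a second schema.org namespace onto a generated name like "schema1".

```python
# Toy model of prefix binding; NOT RDFLib's implementation.


def bind_prefix(bindings: dict[str, str], prefix: str, namespace: str) -> str:
    """Bind namespace under prefix, or prefix1, prefix2, ... if prefix is taken.

    Returns the prefix that ends up mapped to the namespace.
    """
    # If the namespace is already bound, reuse its existing prefix.
    for existing_prefix, existing_ns in bindings.items():
        if existing_ns == namespace:
            return existing_prefix
    candidate = prefix
    counter = 1
    # Preferred prefix belongs to a different namespace: try numbered variants.
    while candidate in bindings:
        candidate = f"{prefix}{counter}"
        counter += 1
    bindings[candidate] = namespace
    return candidate


# "schema" is pre-registered for the HTTPS namespace, as described above.
bindings = {"schema": "https://schema.org/"}
used = bind_prefix(bindings, "schema", "http://schema.org/")
print(used)  # schema1
```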

One thing you could try: after creating the rdflib.Graph() and before running .parse(), do:

graph.namespace_manager.bind("schema", "http://schema.org/", override=True, replace=True)

> Are there any best practices to handle this?

No, and they're still trying to work through that issue over in RDFLib. I don't know how the rdflib.js JavaScript library handles it; it would be good to touch base with its maintainers.

> Should one switch to HTTPS for schema.org asap?

I would.

@tobiasschweizer
Author

Hi @ashleysommer

Thanks a lot for your response and the hint to override the default namespace.

All clear now :-)

@tobiasschweizer
Author

Since updating pyshacl to v0.20.0 and rdflib to 6.2.0, schema1 is gone and everything behaves as it did before.
