Turtle Parse Error: illegal subject type: literal #306

Open
GeraldGrootRoessink opened this issue Dec 19, 2018 · 12 comments

@GeraldGrootRoessink

GeraldGrootRoessink commented Dec 19, 2018

I have this set of triples:

cdm:example
    a sh:PropertyShape ;
    sh:name "heeft reden uitschrijving"@nl ;
    sh:nodeKind sh:IRIOrLiteral ;
    sh:path cdm:redenUitschrijvingHR-v01 ;
    sh:class cdm:RedenUitschrijvingHR-v01.1 ;
    sh:datatype xsd:token ;
    sh:minLength 1 ;
    sh:maxLength 70 ;
    sh:maxCount 1 ;
    sh:pattern "[A-Z0-9_]*" ;
.

EasyRdf unexpectedly reports this error on the sh:class line.
Or am I missing something?

@njh
Collaborator

njh commented Jun 4, 2020

I have just tested using this more complete Turtle document:

@prefix cdm: <http://publications.europa.eu/ontology/cdm#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

cdm:example
    a sh:PropertyShape ;
    sh:name "heeft reden uitschrijving"@nl ;
    sh:nodeKind sh:IRIOrLiteral ;
    sh:path cdm:redenUitschrijvingHR-v01 ;
    sh:class cdm:RedenUitschrijvingHR-v01.1 ;
    sh:datatype xsd:token ;
    sh:minLength 1 ;
    sh:maxLength 70 ;
    sh:maxCount 1 ;
    sh:pattern "[A-Z0-9_]*" ;
.

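It looks like the parser doesn't like the . in cdm:RedenUitschrijvingHR-v01.1: it seems to treat the dot as the end of the statement, so the trailing 1 is then read as the subject of a new triple, which would explain the "illegal subject type: literal" error.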
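A minimal sketch of what seems to happen, assuming the scanner treats a dot as the end of a name. naiveScanPname() below is hypothetical (not EasyRdf's code) and only accepts an ASCII subset of name characters. It stops at v01 and leaves .1 in the input, so the dot would then be read as the statement terminator and 1 as a numeric literal where a subject is expected.

```php
<?php
// Hypothetical illustration: a prefixed-name scanner that, like the
// behaviour reported above, does not accept '.' inside a name.
function naiveScanPname(string $input, int $pos): array
{
    $start = $pos;
    $len = strlen($input);
    while ($pos < $len) {
        $c = $input[$pos];
        // ASCII-only subset of name characters: letters, digits, '_', ':', '-'
        if (preg_match('/^[A-Za-z0-9_:-]$/', $c) === 1) {
            $pos++;
        } else {
            break; // '.' (and anything else) ends the name here
        }
    }
    return [substr($input, $start, $pos - $start), $pos];
}

$line = 'cdm:RedenUitschrijvingHR-v01.1 ;';
[$name, $pos] = naiveScanPname($line, 0);
echo $name, PHP_EOL;               // cdm:RedenUitschrijvingHR-v01
echo substr($line, $pos), PHP_EOL; // .1 ;
```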

I will need to check the Turtle grammar and see why it isn't working:
https://www.w3.org/TR/turtle/#sec-grammar-grammar

Thanks for reporting.

njh added the bug label Jun 4, 2020
@billyk18278

billyk18278 commented Jun 15, 2022

Hi, any news on this?
Prefixed names (prefix:local) whose local part contains a . cause a parse error.
In the example above, using the full IRI instead,
sh:class <http://publications.europa.eu/ontology/cdm#RedenUitschrijvingHR-v01.1> ;
is parsed without issue.

@zozlak
Copy link
Collaborator

zozlak commented Jun 17, 2022

It comes down to https://www.w3.org/TR/turtle/#grammar-production-PN_LOCAL, where we can see that a dot can be neither the first nor the last character of the local part of a prefixed name (the part after the colon), but is allowed in the middle. This isn't honored by Turtle::isNameChar(), which always treats a dot as the end of a prefixed name.
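To make the rule concrete, here is a rough approximation of PN_LOCAL as a PHP regular expression. It is a simplified sketch: isValidLocalName() is a hypothetical helper, and it only covers ASCII characters, leaving out the non-ASCII PN_CHARS_BASE ranges, U+00B7, combining marks and the PLX escapes (%xx and backslash escapes).

```php
<?php
// Sketch: ASCII-only approximation of the Turtle PN_LOCAL production.
// Not covered (simplification): non-ASCII PN_CHARS_BASE ranges, U+00B7,
// combining marks, and PLX (%xx escapes and backslash escapes).
function isValidLocalName(string $local): bool
{
    // first char: PN_CHARS_U | ':' | [0-9]
    // middle:     (PN_CHARS | '.' | ':')*
    // last char:  PN_CHARS | ':'   (so a dot can never be last)
    $pattern = '/^[A-Za-z0-9_:](?:[A-Za-z0-9_.:-]*[A-Za-z0-9_:-])?$/';
    return preg_match($pattern, $local) === 1;
}

var_dump(isValidLocalName('RedenUitschrijvingHR-v01.1')); // bool(true)
var_dump(isValidLocalName('foo.'));                       // bool(false): trailing dot
var_dump(isValidLocalName('.foo'));                       // bool(false): leading dot
```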

@billyk18278
Copy link

I have posted a specific example in #396
The syntax is valid: Protégé, TopBraid and Jena all work with prefixed classes/properties that have a dot in the middle of the name.

It is not clear to me how this can be solved in https://github.com/easyrdf/easyrdf/blob/main/lib/Parser/Turtle.php#L1305, which parses character by character. Is it enough to add . (0x2E) to the list of characters accepted in names?
Do you have any proposal on how to tackle this?

@zozlak
Copy link
Collaborator

zozlak commented Jun 18, 2022

I would say: add it to the list of accepted characters and, at the end of the pname, check whether the last character is a dot. If so, remove it from the pname and push it back onto the input character queue.
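A sketch of that suggestion, assuming a scanner that works on a string plus a position. scanPname() is hypothetical and not EasyRdf's API, and it only accepts an ASCII subset of name characters. It consumes dots greedily, then strips any trailing dots and moves the position back so they are read again as the next token. Since the grammar forbids a trailing dot, every trailing dot (not just one) is given back.

```php
<?php
// Sketch of the approach suggested above: accept '.' while scanning a
// prefixed name, then give any trailing dots back to the input.
// scanPname() and its [name, newPos] return shape are hypothetical;
// name characters are limited to an ASCII subset.
function scanPname(string $input, int $pos): array
{
    $start = $pos;
    $len = strlen($input);
    while ($pos < $len && preg_match('/^[A-Za-z0-9_.:-]$/', $input[$pos]) === 1) {
        $pos++;
    }
    $name = substr($input, $start, $pos - $start);

    // A local name may not end with '.', so strip trailing dots and move the
    // position back; they are then re-read as the next token(s), e.g. the
    // statement terminator.
    $trimmed = rtrim($name, '.');
    $pos -= strlen($name) - strlen($trimmed);

    return [$trimmed, $pos];
}

// A dot in the middle is kept:
[$name, $pos] = scanPname('cdm:RedenUitschrijvingHR-v01.1 ;', 0);
echo $name, PHP_EOL; // cdm:RedenUitschrijvingHR-v01.1

// A dot at the end is returned to the input:
$input = 'ex:o.';
[$name, $pos] = scanPname($input, 0);
echo $name, ' | next: ', $input[$pos], PHP_EOL; // ex:o | next: .
```

Compared with a lookahead check, this keeps the per-character test simple and also handles runs such as a..b, at the cost of a small rewind once the name has been read.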

@billyk18278
Copy link

I am not that familiar with the parsing flow and the functions used,
but I was thinking of doing this inside isNameChar():
$next = $this->peek(); // not sure whether this gives the next char
$onext = ord($next);
and then adding a condition that returns true if this char is a dot and the next one is text. I don't care about prefix:prop.5 cases at all.

|| ($c == '.' && ($onext >= 0x0300 && $onext <= 0x036F ||
    $onext >= 0x203F && $onext <= 0x2040))

Any help appreciated, thanks.

@billyk18278
Copy link

I got it working by also passing the next character, obtained with $this->peek(), to isNameChar (which is static) in each call.
When $c == '.', I check whether the next character is a Latin letter; if so, I return true.

Here are my changes. They seem to work, but I am not sure this is a proper fix, since I did not go through the spec.

    public static function isNameChar($c, $cn)
    {
        $o = ord($c);
        $on = ord($cn);     # $cn: next input character, from $this->peek()
        return
            self::isNameStartChar($c) ||
            $o >= 0x30 && $o <= 0x39 ||     # 0-9
            $c == '-' ||
            $o == 0x00B7 ||
            $o >= 0x0300 && $o <= 0x036F ||
            $o >= 0x203F && $o <= 0x2040 ||
            # a dot is a name char only if an ASCII letter follows it
            ($c == '.' && ($on >= 0x41 && $on <= 0x5A ||    # A-Z
                           $on >= 0x61 && $on <= 0x7A))     # a-z
            ;
    }
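For comparison, a sketch of the same lookahead idea that accepts any ASCII name character after the dot, not only letters, so the digit in v01.1 from the original report is also covered. isLocalNameChar() is a hypothetical standalone function limited to an ASCII subset. With only one character of lookahead it rejects a..b, which the grammar allows.

```php
<?php
// Hypothetical variant of the lookahead idea: a '.' is accepted when the
// next character could continue the local name (ASCII subset of PN_CHARS
// plus ':'). Limitation: "a..b" is rejected, because the first dot is
// followed by another dot and one character of lookahead can't see further.
function isLocalNameChar(string $c, string $next): bool
{
    $isPnChar = static function (string $ch): bool {
        return $ch !== '' && preg_match('/^[A-Za-z0-9_:-]$/', $ch) === 1;
    };
    if ($c === '.') {
        return $isPnChar($next);
    }
    return $isPnChar($c);
}

// Scan the local part of the name from the original report, one char at a time.
$local = 'RedenUitschrijvingHR-v01.1 ;';
$end = 0;
while ($end < strlen($local) && isLocalNameChar($local[$end], $local[$end + 1] ?? '')) {
    $end++;
}
echo substr($local, 0, $end), PHP_EOL; // RedenUitschrijvingHR-v01.1
```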

@billyk18278
Copy link

In order for RdfNamespace::expand to work, I also had to add . to the character class in its regular expression.

L432:
} elseif (preg_match('/^(\w+?):([\w.-]+)$/', $shortUri, $matches)) { // '-' last so PCRE2 (PHP >= 7.3) reads it as a literal hyphen

shorten works as it is.
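A sketch of the expansion side, assuming a plain prefix-to-namespace array. expandPname() and $namespaces are hypothetical, not EasyRdf's RdfNamespace API. The pattern is a slightly stricter variant of the one above: dots are only accepted between other local-name characters, so cdm:foo. is rejected rather than expanded with a trailing dot.

```php
<?php
// Sketch: expanding a prefixed name whose local part contains dots.
// expandPname() and the $namespaces map are hypothetical.
function expandPname(string $shortUri, array $namespaces): ?string
{
    // prefix: word chars; local part: word chars and '-', with dots
    // allowed only between them (never first or last).
    if (preg_match('/^(\w+?):([\w-]+(?:\.+[\w-]+)*)$/', $shortUri, $matches) !== 1) {
        return null;
    }
    [, $prefix, $local] = $matches;
    if (!isset($namespaces[$prefix])) {
        return null; // unknown prefix
    }
    return $namespaces[$prefix] . $local;
}

$namespaces = ['cdm' => 'http://publications.europa.eu/ontology/cdm#'];
echo expandPname('cdm:RedenUitschrijvingHR-v01.1', $namespaces), PHP_EOL;
// http://publications.europa.eu/ontology/cdm#RedenUitschrijvingHR-v01.1
var_dump(expandPname('cdm:foo.', $namespaces)); // NULL
```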

@k00ni
Copy link
Contributor

k00ni commented Jul 4, 2022

If you think that is helpful to others, you should open a pull request.

@morindamanik
Copy link

@billyk18278 why does your code look like that? Is it intentional, so we should add it to our work, or is it a bug?

@billyk18278
Copy link

I do not remember exactly what I did; it was a workaround (maybe even a wrong one).
The issue is what @njh described, and since, as @zozlak wrote, the spec allows a dot in the middle of the name, not being able to parse such triples is a bug (or a missing feature).

@zozlak
Copy link
Collaborator

zozlak commented Mar 1, 2023

@billyk18278 as far as I can tell, EasyRdf has been abandoned by @njh. Still, he is the only person with the rights to merge code into the main branch and issue new releases. Summing up, there is no chance of a fix, even one as simple as you proposed.

This leaves you with two options:
