Implement indexes discovery for skills and cities #30

Open
lecoqlibre opened this issue Dec 15, 2023 · 6 comments

@lecoqlibre
Collaborator

Implement new strategies that use an entry point index to discover the indexes.

Ex:

<> a ex:Index.

<#1> a ex:SkillIndexRegistration;
  ex:forSkill ex:skill12;
  ex:instance </path/to/index>.

<#2> a ex:CityIndexRegistration;
  ex:forCity ex:paris;
  ex:instance <path/to/index>.

The strategies will query this kind of index to discover the link to specific indexes (like skill and city indexes).

lecoqlibre self-assigned this Dec 15, 2023
@pchampin
Collaborator

pchampin commented Dec 19, 2023

NB: this could be made even more generic:

<> a ex:Index.

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:hasSkill ;
  ex:forValue ex:skill12;
  ex:instance </path/to/index>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:location ;
  ex:forValue ex:paris;
  ex:instance <path/to/index>.
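
A minimal sketch of how a strategy might look up a registration in such a generic index, assuming the entries have already been parsed into plain Python tuples. The `Registration` type and the `find_instances` helper are hypothetical names used for illustration only; they are not part of any existing package.

```python
from typing import NamedTuple


class Registration(NamedTuple):
    """One ex:PropertyIndexRegistration entry (hypothetical in-memory form)."""
    for_property: str  # value of ex:forProperty
    for_value: str     # value of ex:forValue
    instance: str      # value of ex:instance


def find_instances(entries, for_property, for_value):
    """Return the ex:instance targets registered for (for_property, for_value)."""
    return [
        e.instance
        for e in entries
        if e.for_property == for_property and e.for_value == for_value
    ]


# The two registrations from the example, written as tuples.
entries = [
    Registration("ex:hasSkill", "ex:skill12", "/path/to/index"),
    Registration("ex:location", "ex:paris", "path/to/index"),
]

skill_indexes = find_instances(entries, "ex:hasSkill", "ex:skill12")
```

With a single registration type, supporting a new indexed property only means adding entries; the lookup code stays the same.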

@pchampin
Collaborator

pchampin commented Dec 19, 2023

A few more ideas:

  1. In addition to ex:instance, index entries could use an ex:instancesIn property, which would point not to the instance IRI directly but to an RDF resource containing one (or several) matching instances.

  2. rdfs:seeAlso could be used (as it already is in WebID profiles) to suggest that more information about the subject can be found in an additional resource. This is useful for moving a long list of instances into a separate resource.

  3. You could then use the same vocabulary for indexing indexes, and indexing users:

First level

rootIndex.ttl:

<> a ex:Index. # Indexing indexes by their property

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:forProperty;
  ex:forValue ex:hasSkill;
  ex:instancesIn <skillIndex.ttl>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:forProperty;
  ex:forValue ex:location;
  ex:instancesIn <cityIndex.ttl>.

This way, you don't need to load the index entries related to cities when you are only interested in skill indexes.

Second level

skillIndex.ttl:

<> a ex:Index. # Indexing users by their skills

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:hasSkill;
  ex:forValue ex:skill1;
  rdfs:seeAlso <skill1.ttl>. # because the list of instances may be big

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:hasSkill;
  ex:forValue ex:skill2;
  rdfs:seeAlso <skill2.ttl>. # because the list of instances may be big
# ...

cityIndex.ttl:

<> a ex:Index. # Indexing users by their city

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:location;
  ex:forValue ex:toulouse;
  rdfs:seeAlso <toulouse.ttl>. # because the list of instances may be big

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:location;
  ex:forValue ex:lyon;
  rdfs:seeAlso <lyon.ttl>. # because the list of instances may be big
# ...

Third level

skill1.ttl:

# additional triples about the <skillIndex.ttl#1> entry defined in skillIndex.ttl
<skillIndex.ttl#1> ex:instance
    <https://localhost:8001/users/user1#me>,
    <https://localhost:8002/users/user2#me>,
    #...

skill2.ttl:

# additional triples about the <skillIndex.ttl#2> entry defined in skillIndex.ttl
<skillIndex.ttl#2> ex:instance
    <https://localhost:8002/users/user2#me>,
    <https://localhost:8003/users/user3#me>,
    #...

lyon.ttl:

# additional triples about the <cityIndex.ttl#2> entry defined in cityIndex.ttl
<cityIndex.ttl#2> ex:instance
    <https://localhost:8001/users/user1#me>,
    <https://localhost:8003/users/user3#me>,
    #...

etc...

NB: the third level is not strictly required. ex:instance properties could be included directly in the 2nd level (especially in entries that have only a few values).
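
To make the traversal concrete, here is a rough Python sketch of how a client could walk the three levels. It assumes the documents have already been parsed into plain dictionaries: `DOCS`, `load` and `find_users` are hypothetical names, and a real client would fetch each document over HTTP and parse the Turtle instead. The `loaded` list records which documents were actually dereferenced.

```python
# In-memory stand-ins for the example documents (assumption: pre-parsed).
DOCS = {
    "rootIndex.ttl": [
        {"forProperty": "ex:forProperty", "forValue": "ex:hasSkill",
         "instancesIn": "skillIndex.ttl"},
        {"forProperty": "ex:forProperty", "forValue": "ex:location",
         "instancesIn": "cityIndex.ttl"},
    ],
    "skillIndex.ttl": [
        {"id": "skillIndex.ttl#1", "forProperty": "ex:hasSkill",
         "forValue": "ex:skill1", "seeAlso": "skill1.ttl"},
        {"id": "skillIndex.ttl#2", "forProperty": "ex:hasSkill",
         "forValue": "ex:skill2", "seeAlso": "skill2.ttl"},
    ],
    "skill1.ttl": [
        {"id": "skillIndex.ttl#1", "instance": [
            "https://localhost:8001/users/user1#me",
            "https://localhost:8002/users/user2#me",
        ]},
    ],
    "skill2.ttl": [
        {"id": "skillIndex.ttl#2", "instance": [
            "https://localhost:8002/users/user2#me",
            "https://localhost:8003/users/user3#me",
        ]},
    ],
}

loaded = []  # names of the documents actually dereferenced


def load(name):
    """Hypothetical loader: records the access and returns the parsed entries."""
    loaded.append(name)
    return DOCS[name]


def find_users(root, prop, value):
    """Follow root index -> property index -> optional third-level document."""
    users = []
    for reg in load(root):  # level 1: which index covers `prop`?
        if (reg["forProperty"], reg["forValue"]) != ("ex:forProperty", prop):
            continue
        for entry in load(reg["instancesIn"]):  # level 2: entry for `value`
            if (entry["forProperty"], entry["forValue"]) != (prop, value):
                continue
            # ex:instance may be listed inline; the third level is optional.
            users.extend(entry.get("instance", []))
            if "seeAlso" in entry:  # level 3: extra triples about the entry
                for extra in load(entry["seeAlso"]):
                    if extra["id"] == entry["id"]:
                        users.extend(extra["instance"])
    return users


users = find_users("rootIndex.ttl", "ex:hasSkill", "ex:skill1")
```

After this call, `loaded` contains only the root index, the skill index and `skill1.ttl`; the city index is never dereferenced.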

@lecoqlibre
Collaborator Author

From PA: the SPARQL query, using named graphs, that follows the proposal:

PREFIX ex: <http://example.org#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?user ?givenName ?familyName ?city ?skill WHERE {
  [] ex:forProperty ex:forProperty ;
      ex:forValue ex:hasSkill ;
      rdfs:seeAlso ?skillIndex.

  GRAPH ?skillIndex {
    ?entry ex:forProperty ex:hasSkill ;
      ex:forValue "${skill}" ;
      rdfs:seeAlso ?skillSubIndex.
  }

  GRAPH ?skillSubIndex {
    ?entry ex:instance ?user.
  }

  BIND ( ... ?user ... AS ?userProfile) # remove the trailing fragment

  GRAPH ?userProfile {
    ?user foaf:givenName ?givenName ;
        foaf:familyName ?familyName ;
        ex:city ?city ;
        ex:skill ?skill.
  }
}
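
The BIND step in the query is left elided. As a rough illustration of what it needs to compute, here is how a client could derive the profile document IRI from a user IRI in Python, using only the standard library. The `profile_document` helper is a hypothetical name. In SPARQL 1.1 itself, an expression along the lines of `IRI(STRBEFORE(STR(?user), "#"))` might serve, assuming every user IRI carries a fragment.

```python
from urllib.parse import urldefrag


def profile_document(web_id: str) -> str:
    """Return the IRI of the document that holds `web_id`, without its fragment."""
    document, _fragment = urldefrag(web_id)
    return document


user_profile = profile_document("https://localhost:8001/users/user1#me")
```

An IRI without a fragment is returned unchanged, so the helper is safe to apply to every user IRI found in an index.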

@balessan
Collaborator

balessan commented Feb 25, 2024

So I forked our Skill package to generate an index whenever a skill is saved. The content of the generated index looks as follows:

@prefix ns1: <http://cdn.startinblox.com/owl/ttl/vocab.ttl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://localhost:8000/skills/1/> a <hd:skill> ;
    rdfs:label "" ;
    ns1:hasSkill <http://localhost:8000/users/balessan/>,
        <http://localhost:8000/users/balessanvir/>,
        <http://localhost:8000/users/benito/> .

<http://localhost:8000/skills/2/> a <hd:skill> ;
    rdfs:label "" ;
    ns1:hasSkill <http://localhost:8000/users/balessan/>,
        <http://localhost:8000/users/benito/> .

What should I change @lecoqlibre ?

For this first version, this means that every instance on which the Skill package is installed will provide both an indexes/skills.ttl and an indexes/skills.jsonld file, each updated every time a Skill is saved.

Work branch is here: https://git.startinblox.com/djangoldp-packages/djangoldp-skill/-/merge_requests/19

Thinking about it, using TTL specifically for index management costs nothing if it performs better. RDFLib makes the parsing easy.

@balessan
Collaborator

balessan commented Feb 25, 2024

I did the same for our user profile package, so I now have a draft TTL index generated on every profile save, which looks as follows:

@prefix ns1: <https://cdn.startinblox.com/owl/ttl/vocab.ttl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ns1:berlin a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/benito/> .

ns1:chambéry a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/balessanvir/> .

ns1:evian a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/admin/> .

<https://cdn.startinblox.com/owl/ttl/vocab.ttl#indexes/cities> a ns1:Index ;
    rdfs:comment "Indexing users by their city" .

ns1:paris a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/balessan/>,
        <http://localhost:8000/users/balessanter/> .

ns1:venise a ns1:Place ;
    ns1:hasMember <http://localhost:8000/users/benito2/> .

I arbitrarily decided to lowercase the city name, as it is only a literal in our user profile. I am unsure whether that index file makes sense, though.
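
A small sketch of what such a normalization could look like if it needs to be a bit more robust than plain lowercasing. This is an assumption about a possible approach, not what the package does: `city_key` is a hypothetical helper that trims whitespace, applies Unicode NFC normalization so accented names such as "Chambéry" compare equal however they were typed, and then case-folds.

```python
import unicodedata


def city_key(name: str) -> str:
    """Normalize a free-text city name into a stable index key (hypothetical)."""
    # NFC merges a base letter and a combining accent into one code point.
    composed = unicodedata.normalize("NFC", name.strip())
    # casefold() is a more aggressive, locale-independent lower().
    return composed.casefold()


# "e" followed by a combining acute accent, plus stray whitespace.
key = city_key("  Chambe\u0301ry ")
```

Two profiles that spell the same city with different casing or accent encoding would then end up under the same index entry.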

Draft MR available here: https://git.startinblox.com/djangoldp-packages/djangoldp-profile/-/merge_requests/22

Ping @lecoqlibre

@lecoqlibre
Collaborator Author

lecoqlibre commented Feb 26, 2024

Good news @balessan.

So you generated distributed indexes. In their current state they would give mediocre to poor performance. One case where they could be used efficiently is a search restricted to one particular instance (this is not currently offered in Hubl). Example: I search for users with skills 1, 2 and 3 on instance 1 only.

To be able to respond in less than a second for the existing use cases, we should use federated indexes: indexes that live on the federation instance. We can generate these federated indexes from the notifications sent by distributed instances whenever a user is modified. Another option is to periodically fetch the distributed indexes and create or update the federated index according to what has been fetched.
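
As a rough sketch of the periodic-fetch option, assuming each distributed index has already been fetched and reduced to a plain mapping from skill to a set of user IRIs. The function names and sample data are hypothetical; the real indexes would be Turtle documents.

```python
from collections import defaultdict


def build_federated_index(distributed_indexes):
    """Merge per-instance {skill: users} indexes into one federated index."""
    federated = defaultdict(set)
    for index in distributed_indexes:
        for skill, users in index.items():
            federated[skill].update(users)
    return dict(federated)


def users_with_all(index, skills):
    """Users registered under every skill in `skills` (set intersection)."""
    sets = [index.get(skill, set()) for skill in skills]
    return set.intersection(*sets) if sets else set()


# Hypothetical indexes as fetched from two distributed instances.
instance_1 = {
    "skill1": {"https://localhost:8001/users/user1#me"},
    "skill2": {"https://localhost:8001/users/user1#me"},
}
instance_2 = {
    "skill1": {"https://localhost:8002/users/user2#me"},
    "skill2": set(),
}

federated = build_federated_index([instance_1, instance_2])
matches = users_with_all(federated, ["skill1", "skill2"])
```

Once merged, a multi-skill search becomes a set intersection on the federation instance, with no request sent to the distributed instances at query time.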

Another remark: the skill and city indexes you generated are too tightly bound to your business domain, because you are using your domain vocabulary as index predicates. It would be great to make them more generic, following the previous comment by PA #30 (comment). We should also split the indexes into smaller indexes, depending on the size of the data.

Ideas:

  • Add a predicate to say if the index is federated or distributed?
  • Add a predicate to say what the index is indexing?

So, on the federation instance you would have one "meta" index:

@prefix ex: <https://example.org#>.
@prefix ns1: <https://cdn.startinblox.com/owl/ttl/vocab.ttl#>.

<> a ex:Index. # Indexing indexes by their property

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ex:forProperty;
  ex:forValue ns1:hasSkill;
  ex:instancesIn <skillIndex.ttl>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ex:forProperty;
  ex:forValue ns1:hasPlace; # replace hasPlace by the existing predicate in your ontology
  ex:instancesIn <cityIndex.ttl>.

On this same federation instance you would have a "meta" skill index:

@prefix ex: <https://example.org#>.
@prefix ns1: <https://cdn.startinblox.com/owl/ttl/vocab.ttl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

<> a ex:Index. # Indexing users by their skills

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ns1:hasSkill;
  ex:forValue ns1:skill1;
  rdfs:seeAlso <skill1.ttl>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ns1:hasSkill;
  ex:forValue ns1:skill2;
  rdfs:seeAlso <skill2.ttl>.
# ...

And also a "meta" city index:

@prefix ex: <https://example.org#>.
@prefix ns1: <https://cdn.startinblox.com/owl/ttl/vocab.ttl#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.

<> a ex:Index. # Indexing users by their city

<#1> a ex:PropertyIndexRegistration;
  ex:forProperty ns1:hasPlace; # replace hasPlace by the existing predicate in your ontology
  ex:forValue "toulouse";
  rdfs:seeAlso <toulouse.ttl>.

<#2> a ex:PropertyIndexRegistration;
  ex:forProperty ns1:hasPlace; # replace hasPlace by the existing predicate in your ontology
  ex:forValue "lyon";
  rdfs:seeAlso <lyon.ttl>.
# ...

Then on the same federation instance you would have skill indexes like:

@prefix ex: <https://example.org#>.

<skillIndex.ttl> ex:instance
    <https://localhost:8001/users/user1#me>,
    <https://localhost:8002/users/user2#me>,
    #...

And city indexes like:

@prefix ex: <https://example.org#>.

<lyon.ttl> ex:instance
    <https://localhost:8001/users/user1#me>,
    <https://localhost:8003/users/user3#me>,
    #...
