
review internal table naming #19

Closed
chile12 opened this Issue May 17, 2017 · 5 comments


chile12 commented May 17, 2017

The code here:
https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-sparqlify/src/main/scala/net/sansa_stack/rdf/partition/sparqlify/SparqlifyUtils2.scala#L32

will trigger an exception when dealing with prefix IRIs that do not end with a slash.
I changed it to:

```scala
val lastSeparatorIndex = Math.max(pred.lastIndexOf("/"), pred.lastIndexOf("#"))
val tableName = pred.substring(lastSeparatorIndex + 1)
```

But it would probably be even better to settle on a less error-prone approach altogether.
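For illustration, a minimal sketch of a less error-prone variant (the helper name and the replacement policy are my own, not part of the SANSA code base): it falls back to the full IRI when neither separator occurs, and strips characters that are typically invalid in SQL identifiers.

```scala
// Hypothetical helper: derive a table-name candidate from a predicate IRI.
// Falls back to the whole IRI when neither '/' nor '#' occurs, and replaces
// characters that are usually invalid in SQL identifiers with '_'.
def tableNameFor(pred: String): String = {
  val lastSeparatorIndex = math.max(pred.lastIndexOf("/"), pred.lastIndexOf("#"))
  // lastIndexOf returns -1 when absent, so -1 + 1 = 0 yields the whole string
  val localName = pred.substring(lastSeparatorIndex + 1)
  // IRIs ending in '/' or '#' produce an empty local name; fall back to the IRI
  val base = if (localName.isEmpty) pred else localName
  base.replaceAll("[^A-Za-z0-9_]", "_")
}
```

Note this still does not prevent clashes between equal local names in different namespaces; it only avoids the exception and the empty-name case.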

Aklakan commented Jul 12, 2017

The two more general questions are:

  • What is a nice way to shorten the URIs? For a start, we could use all prefixes from prefix.cc.
  • How can we safely encode URIs for Spark and Flink? (The encoding function itself could be passed as a lambda to make the partitioning code independent of Flink/Spark.) For this point I do not know the answer yet - any input would greatly ease our (my :) ) lives.
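The first point can be sketched with a small prefix map; the map below is purely illustrative (a real implementation would bundle the prefix.cc export):

```scala
// Illustrative prefix map; a real one would be loaded from a prefix.cc dump.
val prefixes = Map(
  "foaf" -> "http://xmlns.com/foaf/0.1/",
  "rdf"  -> "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
)

// Shorten an IRI to "<prefix>_<localName>" when a known namespace matches.
def shorten(iri: String): Option[String] =
  prefixes.collectFirst {
    case (p, ns) if iri.startsWith(ns) => p + "_" + iri.stripPrefix(ns)
  }
```

Because distinct prefixes map to distinct namespaces, shortened names stay collision-free for IRIs covered by the map; unmatched IRIs return None and need a fallback strategy.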
LorenzBuehmann commented Jul 13, 2017

  1. Maybe use a URL shortener, or rather its algorithm?
  2. The most common way should be dictionary encoding, shouldn't it? The most efficient datatype would be Int, or Array[Int] for triples. Simply using the state of the art should work. For example, we could use the Jena API, in particular its TDB classes. I have some code somewhere in my workspace that we could reuse.
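The dictionary-encoding idea can be sketched in a few lines (class and method names are mine, not Jena/TDB or SANSA APIs): each distinct term gets a dense Int id, so a triple becomes an Array[Int].

```scala
// Sketch of dictionary encoding: assign each distinct RDF term a dense Int id.
class Dictionary {
  private val ids = scala.collection.mutable.LinkedHashMap[String, Int]()

  // Return the existing id, or assign the next free one.
  def encode(term: String): Int = ids.getOrElseUpdate(term, ids.size)

  // A triple encodes to a fixed-size Array[Int] of (subject, predicate, object) ids.
  def encodeTriple(s: String, p: String, o: String): Array[Int] =
    Array(encode(s), encode(p), encode(o))
}
```

A production version (as in Jena TDB) would also maintain the inverse id-to-term table for decoding and persist both.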
Aklakan commented Jul 28, 2017

We should bundle the table-naming algorithm with a copy of the known prefixes from http://prefix.cc
This should be a superset of the standard prefixes defined by the initial context of RDFa: https://www.w3.org/2011/rdfa-context/rdfa-1.1

Aklakan commented Nov 17, 2017

The current table-naming strategy will probably still result in clashes.

The easiest working approach for now could be to just use the full predicates themselves as table names - under the premise that Spark and Flink support this with proper escaping.

We should add test cases with predicates that (a) contain special characters, (b) end in # or /, and (c) have the same local name in different namespaces.
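To make cases (b) and (c) concrete, a tiny sketch (the predicate IRIs are made up) showing how the naive local-name strategy fails on exactly those inputs:

```scala
// Naive strategy: take everything after the last '/' or '#'.
def naiveLocalName(pred: String): String =
  pred.substring(math.max(pred.lastIndexOf("/"), pred.lastIndexOf("#")) + 1)

// (c) same local name in different namespaces -> identical table names (a clash)
val a = "http://example.org/a/name"
val b = "http://example.org/b/name"

// (b) a predicate ending in '#' (or '/') -> empty table name
val hashEnd = "http://example.org/ns#"
```

Any candidate naming scheme should pass tests built from inputs like these.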

JensLehmann added this to the 0.3 milestone Nov 17, 2017

Aklakan commented Dec 14, 2017

It seems that Spark now properly handles escaped table names.
This means we can now just use the whole URI as a table name, so table naming should no longer be a headache; therefore I'm closing this issue.

We could nonetheless add support for a prefix map as a future feature.
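For reference, Spark SQL quotes identifiers with backticks, and a backtick inside the name is escaped by doubling it, which is what allows a full predicate IRI to serve as a table name. A small sketch of such an escaper (a hypothetical helper, not SANSA code):

```scala
// Wrap an arbitrary name in backticks for use as a Spark SQL identifier.
// An embedded backtick is escaped by doubling it, per Spark SQL's quoting rule.
def escapeSparkIdentifier(name: String): String =
  "`" + name.replace("`", "``") + "`"
```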

Aklakan closed this Dec 14, 2017
