
review internal table naming #19

Closed
chile12 opened this issue May 17, 2017 · 5 comments

@chile12 chile12 commented May 17, 2017

Here:
https://github.com/SANSA-Stack/SANSA-RDF/blob/develop/sansa-rdf-partition-parent/sansa-rdf-partition-sparqlify/src/main/scala/net/sansa_stack/rdf/partition/sparqlify/SparqlifyUtils2.scala#L32

Will trigger an exception when dealing with prefix IRIs not ending with a slash.
I changed it to:

val lastSeparatorIndex = Math.max(pred.lastIndexOf("/"), pred.lastIndexOf("#"))
val tableName = pred.substring(lastSeparatorIndex + 1)

But it would probably be even better to settle on a less error-prone approach altogether.
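A minimal sketch of such a more defensive variant (the object and method names here are hypothetical, not from the SANSA code base): it falls back to the full IRI when no `/` or `#` separator is found, or when the separator is the last character, so prefix IRIs not ending with a slash no longer trigger an exception and trailing separators no longer yield empty names.

```scala
// Hypothetical helper, not the actual SparqlifyUtils2 code.
object TableNaming {
  // Extract the part after the last '/' or '#'; fall back to the
  // full IRI when there is no separator or it would yield an empty name.
  def localName(pred: String): String = {
    val lastSeparatorIndex = math.max(pred.lastIndexOf("/"), pred.lastIndexOf("#"))
    if (lastSeparatorIndex < 0 || lastSeparatorIndex == pred.length - 1) pred
    else pred.substring(lastSeparatorIndex + 1)
  }
}
```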


@Aklakan Aklakan commented Jul 12, 2017

The two more general questions are:

  • What is a nice way to shorten the URIs? For that we could use all prefixes from prefix.cc for starters.
  • How can we safely encode URIs for Spark and Flink? (The encoding function itself could be passed as a lambda to keep the partitioning code independent of Flink/Spark.) For this point I do not know the answer yet - any input would greatly ease our (my :) ) lives.
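The lambda idea in the second bullet can be sketched as follows; all names here are hypothetical illustrations, not part of the SANSA API. The partitioning code only sees a `String => String` function, so the Spark- or Flink-specific identifier rules live entirely in the encoder passed in.

```scala
// Hypothetical sketch: the encoding function is a parameter, keeping the
// partitioning code independent of the execution engine.
object PartitionNaming {
  def tableNameFor(pred: String, encode: String => String): String =
    encode(pred)

  // Example encoder: keep [A-Za-z0-9_], replace everything else with
  // an underscore followed by the character's hex code.
  val alnumEncoder: String => String =
    iri => iri.iterator.map { c =>
      if (c.isLetterOrDigit || c == '_') c.toString
      else "_%02X".format(c.toInt)
    }.mkString
}
```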


@LorenzBuehmann LorenzBuehmann commented Jul 13, 2017

  1. Use a URL shortener, or rather its algorithm, maybe?
  2. The most common way would be dictionary encoding, wouldn't it? The most efficient datatype should be Int, or an Array[Int] for triples. Simply using the state of the art should work. For example, we could use the Jena API, in particular its TDB classes. I have some code somewhere in my workspace that we could reuse.
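For readers unfamiliar with the term, dictionary encoding as suggested above can be sketched like this (a toy illustration, not Jena TDB's actual node-table implementation): each distinct RDF term is assigned a sequential Int id, and a triple becomes an `Array[Int]` of three ids.

```scala
import scala.collection.mutable

// Toy dictionary encoder: terms get sequential Int ids in insertion order.
class Dictionary {
  private val toId = mutable.LinkedHashMap.empty[String, Int]

  // Assign a fresh id on first sight, reuse it afterwards.
  def encode(term: String): Int = toId.getOrElseUpdate(term, toId.size)

  // Ids are sequential, so the id-th inserted key is the original term.
  def decode(id: Int): String = toId.keysIterator.drop(id).next()

  def encodeTriple(s: String, p: String, o: String): Array[Int] =
    Array(encode(s), encode(p), encode(o))
}
```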


@Aklakan Aklakan commented Jul 28, 2017

We should bundle the table-naming algorithm with a copy of the known prefixes from http://prefix.cc
This should be a superset of the standard prefixes defined by the initial context of RDFa 1.1: https://www.w3.org/2011/rdfa-context/rdfa-1.1
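A sketch of how such a bundled prefix map could be used for shortening; the two entries below stand in for the full prefix.cc/RDFa bundle and the names are illustrative only. The longest matching namespace wins, and IRIs without a known prefix fall back to the full IRI.

```scala
// Hypothetical shortener over a bundled prefix map (two sample entries
// standing in for the full prefix.cc + RDFa initial-context snapshot).
object PrefixShortener {
  val prefixes: Map[String, String] = Map(
    "foaf" -> "http://xmlns.com/foaf/0.1/",
    "dct"  -> "http://purl.org/dc/terms/"
  )

  // Longest matching namespace wins; unmatched IRIs pass through unchanged.
  def shorten(iri: String): String =
    prefixes.toSeq
      .filter { case (_, ns) => iri.startsWith(ns) }
      .sortBy { case (_, ns) => -ns.length }
      .headOption
      .map { case (p, ns) => s"${p}_${iri.stripPrefix(ns)}" }
      .getOrElse(iri)
}
```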


@Aklakan Aklakan commented Nov 17, 2017

The current table naming strategy will probably still result in clashes.

The easiest working approach for now could be to just use the full predicates themselves as table names, under the premise that Spark and Flink support this with proper escaping.

We should add test cases with predicates that (a) contain special characters, (b) end in # or /, and (c) have the same local name in different namespaces.
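The three cases above can be phrased as plain assertions against whichever naming function is under test; the `name` stand-in below is hypothetical (identity, i.e. the full-predicate strategy just proposed), not the current SANSA implementation.

```scala
// Sketch of the proposed test cases against a stand-in naming strategy.
object NamingTests {
  // Stand-in for the strategy under test: full predicate as table name.
  def name(pred: String): String = pred

  def run(): Unit = {
    // (a) special characters must be handled without throwing
    val special = name("http://ex.org/p%20q?x=1")
    assert(special.nonEmpty)
    // (b) predicates ending in '#' or '/' must not collide with each other
    assert(name("http://ex.org/ns#") != name("http://ex.org/ns/"))
    // (c) same local name in different namespaces must yield distinct tables
    assert(name("http://a.org/name") != name("http://b.org/name"))
  }
}
```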

@JensLehmann JensLehmann added this to the 0.3 milestone Nov 17, 2017

@Aklakan Aklakan commented Dec 14, 2017

It seems that Spark now properly handles escaped table names.
This means we can now just use the whole URI as a table name, so table naming should no longer be a headache; therefore I am closing this issue.

We could nonetheless add support for a prefix map as a future feature.
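The full-URI approach relies on quoting the identifier; a minimal sketch, assuming Spark SQL's backtick-quoted identifiers (where a literal backtick inside the name is doubled). The object and method names are hypothetical.

```scala
// Hypothetical sketch: quote an arbitrary IRI as a Spark SQL identifier.
object EscapedNames {
  // Backtick-quote the name; escape embedded backticks by doubling them.
  def quoteIdentifier(name: String): String =
    "`" + name.replace("`", "``") + "`"

  // Build a query against a table named after the full predicate IRI.
  def selectFrom(pred: String): String =
    s"SELECT s, o FROM ${quoteIdentifier(pred)}"
}
```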

@Aklakan Aklakan closed this Dec 14, 2017
Aklakan pushed a commit that referenced this issue Oct 6, 2020
Aklakan pushed a commit that referenced this issue Oct 7, 2020
Two issues: Scala class names differ in casing, and a DBSCAN hard-coded value