Issues with Max/Avg Per Property measure #65

Closed
LorenzBuehmann opened this Issue Jul 3, 2018 · 0 comments

LorenzBuehmann commented Jul 3, 2018

def MaxPerProperty(triples: RDD[Triple]): (Triple, Int) = {
  val max_per_property_def = triples.filter(triple => (triple.getObject.toString().equals(XSD.xint)
    | triple.getObject.toString().equals(XSD.xfloat) | triple.getObject.toString().equals(XSD.dateTime)))
  val properties_fr = max_per_property_def.map(f => (f, 1)).reduceByKey(_ + _)
  val ordered = properties_fr.takeOrdered(1)(Ordering[Int].reverse.on(_._2))
  ordered.maxBy(_._2)
}

This function does not do what it should:

  1. The filter keeps triples whose object is the URI xsd:int, etc. This is clearly wrong: it has to filter on the datatype of objects that are literals.
  2. Right now it looks more like computing a histogram, and even then not per property but per whole triple. Since an RDF graph is a set of triples, that count is always 1 for each triple in a single graph.
  3. The result of the ordering is retrieved to the driver, so it is no longer an RDD?
  4. takeOrdered(1) returns exactly one element of the RDD w.r.t. the ordering, so you get only the single pair with the highest second element among all pairs, regardless of the property.
  5. The ordering is hard-coded for Int, so only xsd:int would be covered. What about xsd:float and xsd:dateTime values?
  6. (Triple, Int) is returned, but it should be (Node, Scalatype_of_Literal). How do you want to make this generic? I guess we should return RDD[(Node, Node)].
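To make the points above concrete, here is a minimal sketch of what a per-property maximum could look like. It is not the project's implementation: `Node`, `Uri`, `Lit` and `Triple` are simplified, hypothetical stand-ins for the Jena classes, plain Scala collections (2.13+) replace the RDD so the snippet runs on its own, and xsd:dateTime values are assumed to carry a timezone offset. Values are grouped by property and datatype so that literals of different types are never compared with each other.

```scala
// Hypothetical, simplified stand-ins for Jena's Node and Triple.
sealed trait Node
final case class Uri(value: String) extends Node
final case class Lit(lexical: String, datatype: String) extends Node
final case class Triple(s: Node, p: Node, o: Node)

object MaxPerPropertySketch {
  val XsdInt      = "http://www.w3.org/2001/XMLSchema#int"
  val XsdFloat    = "http://www.w3.org/2001/XMLSchema#float"
  val XsdDateTime = "http://www.w3.org/2001/XMLSchema#dateTime"
  private val supported = Set(XsdInt, XsdFloat, XsdDateTime)

  // Compare two literals of the same datatype by value, not by lexical form.
  private def compare(a: Lit, b: Lit): Int = a.datatype match {
    case XsdInt   => java.lang.Integer.compare(a.lexical.toInt, b.lexical.toInt)
    case XsdFloat => java.lang.Float.compare(a.lexical.toFloat, b.lexical.toFloat)
    case XsdDateTime =>
      // Assumption: the lexical form includes an offset such as "Z" or "+02:00".
      java.time.OffsetDateTime.parse(a.lexical)
        .compareTo(java.time.OffsetDateTime.parse(b.lexical))
    case other => throw new IllegalArgumentException(s"unsupported datatype: $other")
  }

  // Maximum literal per (property, datatype).
  def maxPerProperty(triples: Seq[Triple]): Map[(Node, String), Lit] =
    triples
      // Keep only literal objects with a supported datatype (point 1).
      .collect { case Triple(_, p, l: Lit) if supported(l.datatype) => ((p, l.datatype), l) }
      // Key by property, not by the whole triple (points 2 and 4).
      .groupMapReduce(_._1)(_._2)((a, b) => if (compare(a, b) >= 0) a else b)
}
```

On an RDD, the same pairwise "keep the larger literal" function could be passed to `reduceByKey` on the keyed pairs, which would keep the result distributed instead of pulling it to the driver.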

The same holds for the Avg Per Property measure.

In addition, what would the average of some xsd:dateTime values be?
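One possible reading, offered only as an assumption and not as a decision for the measure, is the mean instant on the timeline: convert each value to epoch milliseconds, average those, and convert back. The sketch below uses only `java.time`; the object and method names are hypothetical, and it assumes every lexical value carries a timezone offset.

```scala
import java.time.{Instant, OffsetDateTime}

object AvgDateTimeSketch {
  // Mean instant of a collection of xsd:dateTime lexical values.
  // Returns None for an empty input, since the average is undefined there.
  def meanInstant(lexicalValues: Seq[String]): Option[Instant] =
    if (lexicalValues.isEmpty) None
    else {
      // BigInt avoids overflow when summing many epoch-millisecond values.
      val millis = lexicalValues.map(v => BigInt(OffsetDateTime.parse(v).toInstant.toEpochMilli))
      Some(Instant.ofEpochMilli((millis.sum / millis.size).toLong))
    }
}
```

Whether such a value is meaningful for a given property is a separate question; for many properties a min/max range may be more informative than a mean.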
