New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JENA-1313: compare using a Collator when both literals are tagged with same language #237

Merged
merged 5 commits into from Jun 13, 2017

Conversation

Projects
None yet
6 participants
@kinow
Member

kinow commented Apr 12, 2017

Mimics the behaviour of Dydra described here.

When there are strings with the same language, then instead of simply comparing the text, it uses java.text.Collator and the language locale to compare strings.

This does not create a collate:collate function as described in JENA-1313 as a possible solution, but could be still useful for users that expect the sort results to follow the values' language collation.

Needs further tests and discussion.

Show outdated Hide outdated jena-arq/src/main/java/org/apache/jena/sparql/expr/NodeValue.java
// same lang tag, handle collation
// TBD: cache locales? cache collators? pre define both/any? a simple in-memory lru-map-cache?
Locale desiredLocale = Locale.forLanguageTag(node1.getLiteralLanguage());
Collator collator = Collator.getInstance(desiredLocale);

This comment has been minimized.

@rvesse

rvesse Apr 12, 2017

Member

This needs to be much more defensively coded. Language tags are not constrained to be valid languages so they could be no locale or collation associated with a given tag so we need to make sure that we fall back to the existing behaviour for unknown languages

@rvesse

rvesse Apr 12, 2017

Member

This needs to be much more defensively coded. Language tags are not constrained to be valid languages so they could be no locale or collation associated with a given tag so we need to make sure that we fall back to the existing behaviour for unknown languages

This comment has been minimized.

@ajs6f

ajs6f Apr 12, 2017

Member

Or would we want to collate all literals with the same lang tag together and for those with a lang-specific collator available, subsort according thereto?

@ajs6f

ajs6f Apr 12, 2017

Member

Or would we want to collate all literals with the same lang tag together and for those with a lang-specific collator available, subsort according thereto?

This comment has been minimized.

@afs

afs Apr 12, 2017

Member

Yes.

@afs

afs Apr 12, 2017

Member

Yes.

@afs

This comment has been minimized.

Show comment
Hide comment
@afs

afs Apr 12, 2017

Member

The collator is applied multiple times so if there is a mixed set of lang-tagged-literals this is inconsistent. It only applies to a same-lang,same-lang comparison while it may also be called with same first argument but different, different-lang argument at some other point in the sort.

Earlier email

This runs into the problem of unstable sorts or Java sort crashing out.

We need to switch to sorting by an ordering of (lang, lex), not (lex, lang), not just reorder within same-lang.

Member

afs commented Apr 12, 2017

The collator is applied multiple times so if there is a mixed set of lang-tagged-literals this is inconsistent. It only applies to a same-lang,same-lang comparison while it may also be called with same first argument but different, different-lang argument at some other point in the sort.

Earlier email

This runs into the problem of unstable sorts or Java sort crashing out.

We need to switch to sorting by an ordering of (lang, lex), not (lex, lang), not just reorder within same-lang.

@osma

This comment has been minimized.

Show comment
Hide comment
@osma

osma Apr 13, 2017

Contributor

I agree with the other commenters, the general order should be (lang, lex) to avoid potentially inconsistent ordering. Also the language tag may not match any Locale. We also need to have unit tests that verify that the code works in corner cases like this.

But what about subtags like en-US and en-GB? If the language tag is the primary sort key, then all en-GB values would sort before "a"@en-US, which I think would be confusing for most users.
The sort order and collation locale could be based on just the main tag (en in this case) ignoring the subtags, but I'm quite sure there is some language subtag out there in the world that requires a different collation order from that of the main language...

Contributor

osma commented Apr 13, 2017

I agree with the other commenters, the general order should be (lang, lex) to avoid potentially inconsistent ordering. Also the language tag may not match any Locale. We also need to have unit tests that verify that the code works in corner cases like this.

But what about subtags like en-US and en-GB? If the language tag is the primary sort key, then all en-GB values would sort before "a"@en-US, which I think would be confusing for most users.
The sort order and collation locale could be based on just the main tag (en in this case) ignoring the subtags, but I'm quite sure there is some language subtag out there in the world that requires a different collation order from that of the main language...

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Apr 13, 2017

Member

I agree with the other commenters, the general order should be (lang, lex) to avoid potentially inconsistent ordering.

Ack, that makes sense +1

Also the language tag may not match any Locale. We also need to have unit tests that verify that the code works in corner cases like this.

Sure, tests and more defensive programming will come later. Right now looking more for comments on how to sort, where to sort, etc.

Besides typos/mispellings, there are also valid tags such as i-klingon (I believe this is mentioned in some specification linked in the SPARQL spec page). For cases like this I think we would simply try to match against the JVM's available locales, and if not existing, then just use normal string comparison.

But what about subtags like en-US and en-GB? If the language tag is the primary sort key, then all en-GB values would sort before "a"@en-us, which I think would be confusing for most users.
The sort order and collation locale could be based on just the main tag (en in this case) ignoring the subtags, but I'm quite sure there is some language subtag out there in the world that requires a different collation order from that of the main language...

The sort order of accented letters is different for fr-CA and fr-FR.

fr-FR:

  • cote
  • coté
  • côte
  • côté

fr-CA:

  • cote
  • côte
  • coté
  • côté

the general order should be (lang, lex)

So for the examples above, I think the general order would be:

  • cote@fr-CA
  • côte@fr-CA
  • coté@fr-CA
  • côté@fr-CA
  • cote@fr-FR
  • coté@fr-FR
  • côte@fr-FR
  • côté@fr-FR

Right? (lang, lex)...

Member

kinow commented Apr 13, 2017

I agree with the other commenters, the general order should be (lang, lex) to avoid potentially inconsistent ordering.

Ack, that makes sense +1

Also the language tag may not match any Locale. We also need to have unit tests that verify that the code works in corner cases like this.

Sure, tests and more defensive programming will come later. Right now looking more for comments on how to sort, where to sort, etc.

Besides typos/mispellings, there are also valid tags such as i-klingon (I believe this is mentioned in some specification linked in the SPARQL spec page). For cases like this I think we would simply try to match against the JVM's available locales, and if not existing, then just use normal string comparison.

But what about subtags like en-US and en-GB? If the language tag is the primary sort key, then all en-GB values would sort before "a"@en-us, which I think would be confusing for most users.
The sort order and collation locale could be based on just the main tag (en in this case) ignoring the subtags, but I'm quite sure there is some language subtag out there in the world that requires a different collation order from that of the main language...

The sort order of accented letters is different for fr-CA and fr-FR.

fr-FR:

  • cote
  • coté
  • côte
  • côté

fr-CA:

  • cote
  • côte
  • coté
  • côté

the general order should be (lang, lex)

So for the examples above, I think the general order would be:

  • cote@fr-CA
  • côte@fr-CA
  • coté@fr-CA
  • côté@fr-CA
  • cote@fr-FR
  • coté@fr-FR
  • côte@fr-FR
  • côté@fr-FR

Right? (lang, lex)...

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Apr 13, 2017

Member

The collator is applied multiple times so if there is a mixed set of lang-tagged-literals this is inconsistent. It only applies to a same-lang,same-lang comparison while it may also be called with same first argument but different, different-lang argument at some other point in the sort.
(...)
We need to switch to sorting by an ordering of (lang, lex), not (lex, lang), not just reorder within same-lang.

Thanks @afs ! Will try to update the pull request following the comments here so far.

Member

kinow commented Apr 13, 2017

The collator is applied multiple times so if there is a mixed set of lang-tagged-literals this is inconsistent. It only applies to a same-lang,same-lang comparison while it may also be called with same first argument but different, different-lang argument at some other point in the sort.
(...)
We need to switch to sorting by an ordering of (lang, lex), not (lex, lang), not just reorder within same-lang.

Thanks @afs ! Will try to update the pull request following the comments here so far.

@osma

This comment has been minimized.

Show comment
Hide comment
@osma

osma Apr 13, 2017

Contributor

I've been thinking about this, and I can't see how this could produce a usable order when there are several language tags (even subtags) involved. For example, in a multilingual SKOS thesaurus, it's quite likely that there are en-US and en-GB labels mixed together (I know at least a couple of thesauri that do this in their SKOS files). Think about e.g.

ex:zea_mays a skos:Concept ;
  skos:prefLabel "corn"@en-US, "maize"@en-GB .

ex:coffee a skos:Concept ;
  skos:prefLabel "coffee"@en-US, "coffee"@en-GB .

Now an ORDER BY that sorts primarily by language tag, then by language-specific collation rules, would order these skos:prefLabels as:

  1. "coffee"@en-GB
  2. "maize"@en-GB
  3. "coffee"@en-US
  4. "corn"@en-US

I have a hard time seeing how this order would be useful to anyone. These are all English language words; as a user I don't care whether they are sorted by GB or US collation rules (even if they differed, as in fr-CA vs. fr-FR), but this is clearly worse than the current behavior which sorts first by lexical value, then by language tag.

My conclusion of this thought experiment is that there should be a way to specify the collation order in the ORDER BY statement independent of the language tags of the literals being sorted.

Contributor

osma commented Apr 13, 2017

I've been thinking about this, and I can't see how this could produce a usable order when there are several language tags (even subtags) involved. For example, in a multilingual SKOS thesaurus, it's quite likely that there are en-US and en-GB labels mixed together (I know at least a couple of thesauri that do this in their SKOS files). Think about e.g.

ex:zea_mays a skos:Concept ;
  skos:prefLabel "corn"@en-US, "maize"@en-GB .

ex:coffee a skos:Concept ;
  skos:prefLabel "coffee"@en-US, "coffee"@en-GB .

Now an ORDER BY that sorts primarily by language tag, then by language-specific collation rules, would order these skos:prefLabels as:

  1. "coffee"@en-GB
  2. "maize"@en-GB
  3. "coffee"@en-US
  4. "corn"@en-US

I have a hard time seeing how this order would be useful to anyone. These are all English language words; as a user I don't care whether they are sorted by GB or US collation rules (even if they differed, as in fr-CA vs. fr-FR), but this is clearly worse than the current behavior which sorts first by lexical value, then by language tag.

My conclusion of this thought experiment is that there should be a way to specify the collation order in the ORDER BY statement independent of the language tags of the literals being sorted.

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Apr 13, 2017

Member

My conclusion of this thought experiment is that there should be a way to specify the collation order in the ORDER BY statement independent of the language tags of the literals being sorted.

Good points. I believe we want to give users the option to specify the collation and override the language tag then. I think we could, however, still offer this as fall back, in case no collation is specified.

Member

kinow commented Apr 13, 2017

My conclusion of this thought experiment is that there should be a way to specify the collation order in the ORDER BY statement independent of the language tags of the literals being sorted.

Good points. I believe we want to give users the option to specify the collation and override the language tag then. I think we could, however, still offer this as fall back, in case no collation is specified.

@osma

This comment has been minimized.

Show comment
Hide comment
@osma

osma Apr 14, 2017

Contributor

On a second thought, I guess implementing collation like this would still allow forcing a particular collation order at query time using e.g. ORDER BY STRLANG(?label, 'en')
So maybe this is worthwhile after all...

Contributor

osma commented Apr 14, 2017

On a second thought, I guess implementing collation like this would still allow forcing a particular collation order at query time using e.g. ORDER BY STRLANG(?label, 'en')
So maybe this is worthwhile after all...

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Apr 30, 2017

Member

Hmm, messed up the pull request. Let me try to update it correctly...

Member

kinow commented Apr 30, 2017

Hmm, messed up the pull request. Let me try to update it correctly...

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Apr 30, 2017

Member

Sorry about the mess. I reverted the previous changes, and wanted to keep everything in the branch history in case we decided to go back that way, but messed up with a git rebase. Cherry picked a few commits, now it's looking OK.

So now this updated pull request is following a different direction. Instead of changing the default behaviour, based on language tags, it contains a 2-parameters "collation" function. All changes in ARQ.

Please, ignore comments/unit tests/code readability/etc, as what this pull request is right now is a mere suggestion of an alternative for JENA-1313, and may be again discarded in case there are too many problems with this implementation.

The FN_Collation.java contains the code for the new function. The first argument is a locale, used for finding the collator. The second argument to the function is the NodeValue (Expr). What the function does, is quite simple - and possibly naïve?. It extracts the string literal from the Expr part, then creates a new NodeValue that contains both String + locale.

Further down, the NodeValueString was modified as well to keep track of the string locale. Alternatively, we could create a new NodeValue subtype, instead of adding an optional locale (backward binary compatible change, as we add, but not change existing methods).

Then, when the SortCondition in the Query is evaluated, and then the NodeValueString#compare method is called, it checks if it was given a desired locale. If so, it sorts using that locale.

Notice that this function will be applied always in the String Value Space in ARQ, as even when we have a Language Tag, it is discarded and we use only the string. Basically, any node with a literal string will become a NodeValueString, when this function is applied to the node.

With this, users are able to choose a Collation, overriding any language tags. This way, if your data contains @en and @en-GB, you can decide to use any Collation you desire on your query.

Thoughts?

Cheers
Bruno

Member

kinow commented Apr 30, 2017

Sorry about the mess. I reverted the previous changes, and wanted to keep everything in the branch history in case we decided to go back that way, but messed up with a git rebase. Cherry picked a few commits, now it's looking OK.

So now this updated pull request is following a different direction. Instead of changing the default behaviour, based on language tags, it contains a 2-parameters "collation" function. All changes in ARQ.

Please, ignore comments/unit tests/code readability/etc, as what this pull request is right now is a mere suggestion of an alternative for JENA-1313, and may be again discarded in case there are too many problems with this implementation.

The FN_Collation.java contains the code for the new function. The first argument is a locale, used for finding the collator. The second argument to the function is the NodeValue (Expr). What the function does, is quite simple - and possibly naïve?. It extracts the string literal from the Expr part, then creates a new NodeValue that contains both String + locale.

Further down, the NodeValueString was modified as well to keep track of the string locale. Alternatively, we could create a new NodeValue subtype, instead of adding an optional locale (backward binary compatible change, as we add, but not change existing methods).

Then, when the SortCondition in the Query is evaluated, and then the NodeValueString#compare method is called, it checks if it was given a desired locale. If so, it sorts using that locale.

Notice that this function will be applied always in the String Value Space in ARQ, as even when we have a Language Tag, it is discarded and we use only the string. Basically, any node with a literal string will become a NodeValueString, when this function is applied to the node.

With this, users are able to choose a Collation, overriding any language tags. This way, if your data contains @en and @en-GB, you can decide to use any Collation you desire on your query.

Thoughts?

Cheers
Bruno

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Apr 30, 2017

Member

I have a sandbox project here https://github.com/kinow/jena-arq-filter, not really unit tests, but some renamed main methods that I use for experimenting. You can try checking out this pull request, opening in the same workspace both projects, then trying something like this:

        // Query String
        final String queryString = "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n" + 
                "PREFIX arq: <http://jena.apache.org/ARQ/function#>\n"  +
                "SELECT ?label WHERE {\n" + 
                "   VALUES ?label { \"tsahurin kieli\"@fi \"tšekin kieli\"@fi \"tulun kieli\"@fi \"töyhtöhyyppä\"@fi }\n" + 
                "}\n" + 
                "ORDER BY arq:collation(\"fi\", ?label)";
        // --- Model
        Model model = ModelFactory.createDefaultModel();
        // Query object
        Query query = QueryFactory.create(queryString);
        // Execute query
        try (QueryExecution qExec = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qExec.execSelect();
            while (results.hasNext()) {
                QuerySolution solution = results.nextSolution();
                System.out.println(solution);
            }
        }

The result will be:

( ?label = "tsahurin kieli"@fi )
( ?label = "tšekin kieli"@fi )
( ?label = "tulun kieli"@fi )
( ?label = "töyhtöhyyppä"@fi )

If you change the locale for "en", then it will be:

( ?label = "töyhtöhyyppä"@fi )
( ?label = "tsahurin kieli"@fi )
( ?label = "tšekin kieli"@fi )
( ?label = "tulun kieli"@fi )
Member

kinow commented Apr 30, 2017

I have a sandbox project here https://github.com/kinow/jena-arq-filter, not really unit tests, but some renamed main methods that I use for experimenting. You can try checking out this pull request, opening in the same workspace both projects, then trying something like this:

        // Query String
        final String queryString = "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n" + 
                "PREFIX arq: <http://jena.apache.org/ARQ/function#>\n"  +
                "SELECT ?label WHERE {\n" + 
                "   VALUES ?label { \"tsahurin kieli\"@fi \"tšekin kieli\"@fi \"tulun kieli\"@fi \"töyhtöhyyppä\"@fi }\n" + 
                "}\n" + 
                "ORDER BY arq:collation(\"fi\", ?label)";
        // --- Model
        Model model = ModelFactory.createDefaultModel();
        // Query object
        Query query = QueryFactory.create(queryString);
        // Execute query
        try (QueryExecution qExec = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qExec.execSelect();
            while (results.hasNext()) {
                QuerySolution solution = results.nextSolution();
                System.out.println(solution);
            }
        }

The result will be:

( ?label = "tsahurin kieli"@fi )
( ?label = "tšekin kieli"@fi )
( ?label = "tulun kieli"@fi )
( ?label = "töyhtöhyyppä"@fi )

If you change the locale for "en", then it will be:

( ?label = "töyhtöhyyppä"@fi )
( ?label = "tsahurin kieli"@fi )
( ?label = "tšekin kieli"@fi )
( ?label = "tulun kieli"@fi )
@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Apr 30, 2017

Member

Also, note that I left comments that are more questions for this implementation. I wonder if it would be OK to change NodeValueString, or if it would be better to create a new NodeValue. Also the prefix used for the function, etc.

Member

kinow commented Apr 30, 2017

Also, note that I left comments that are more questions for this implementation. I wonder if it would be OK to change NodeValueString, or if it would be better to create a new NodeValue. Also the prefix used for the function, etc.

@osma

This comment has been minimized.

Show comment
Hide comment
@osma

osma May 2, 2017

Contributor

@kinow I think this looks promising! I don't have much comments about the implementation code, but just being able to use ORDER BY arq:collation("fi", ?label) seems that it would do a good job of solving my original problem. I do like the way you have smuggled in the collation information so that it's accessible to the NodeValue.compare function and it can thus rely on Collator.compare, which should be pretty fast. Maybe there's a more elegant way of doing that smuggling, but at least it seems to get the job done based on your example results.

Did you by any chance test this with the performance test case that I wrote up earlier? I'd like to know how it compares to a plain ORDER BY in terms of performance. I can test that myself too when I have a suitable slot of time, but that might take a while since many deadlines are coming up in the next few days...

Contributor

osma commented May 2, 2017

@kinow I think this looks promising! I don't have much comments about the implementation code, but just being able to use ORDER BY arq:collation("fi", ?label) seems that it would do a good job of solving my original problem. I do like the way you have smuggled in the collation information so that it's accessible to the NodeValue.compare function and it can thus rely on Collator.compare, which should be pretty fast. Maybe there's a more elegant way of doing that smuggling, but at least it seems to get the job done based on your example results.

Did you by any chance test this with the performance test case that I wrote up earlier? I'd like to know how it compares to a plain ORDER BY in terms of performance. I can test that myself too when I have a suitable slot of time, but that might take a while since many deadlines are coming up in the next few days...

Show outdated Hide outdated jena-arq/src/main/java/org/apache/jena/sparql/expr/NodeValue.java
@@ -954,7 +971,8 @@ public boolean isDayTimeDuration()
public boolean getBoolean() { raise(new ExprEvalTypeException("Not a boolean: "+this)) ; return false ; }
public String getString() { raise(new ExprEvalTypeException("Not a string: "+this)) ; return null ; }
public String getLang() { raise(new ExprEvalTypeException("Not a string: "+this)) ; return null ; }
public String getCollation() { raise(new ExprEvalTypeException("Not a collation: "+this)) ; return null ; }

This comment has been minimized.

@ajs6f

ajs6f May 2, 2017

Member

Isn't this "Doesn't have a collation: "?

@ajs6f

ajs6f May 2, 2017

Member

Isn't this "Doesn't have a collation: "?

This comment has been minimized.

@kinow

kinow May 2, 2017

Member

Nice catch! I'm updating the code, experimenting with the idea of having a NodeValueSortKey type instead. If that works well, will update the message to "Not a sort key"...

@kinow

kinow May 2, 2017

Member

Nice catch! I'm updating the code, experimenting with the idea of having a NodeValueSortKey type instead. If that works well, will update the message to "Not a sort key"...

@afs

This comment has been minimized.

Show comment
Hide comment
@afs

afs May 2, 2017

Member

What is the outcome of ordering mixed language tags and also plain xsd:string?

Changing NodeValueString looks dubious - NodeValueString is an xsd:string and can be used in expressions involving strings. makeNode() is lossy so round tripping Node<->NodeValue loses the collation.

A NodeValueLangString is much safer (NodeValue.makeNode returns "foo"@fi).

I still think there is merit in ORDER BY arq:collation("fi", ?label) approach. Have the app say what collation it wants. I like this because it means xsd:strings can be sorted by locale collation,even differently for different users.

A possibility is a NodeValueSortKey which does not have a Node form.

Member

afs commented May 2, 2017

What is the outcome of ordering mixed language tags and also plain xsd:string?

Changing NodeValueString looks dubious - NodeValueString is an xsd:string and can be used in expressions involving strings. makeNode() is lossy so round tripping Node<->NodeValue loses the collation.

A NodeValueLangString is much safer (NodeValue.makeNode returns "foo"@fi).

I still think there is merit in ORDER BY arq:collation("fi", ?label) approach. Have the app say what collation it wants. I like this because it means xsd:strings can be sorted by locale collation,even differently for different users.

A possibility is a NodeValueSortKey which does not have a Node form.

@ajs6f

This comment has been minimized.

Show comment
Hide comment
@ajs6f

ajs6f May 2, 2017

Member

@afs I may be (as usual) confused, but isn't ORDER BY arq:collation("fi", ?label) just what @kinow is trying to do?

Member

ajs6f commented May 2, 2017

@afs I may be (as usual) confused, but isn't ORDER BY arq:collation("fi", ?label) just what @kinow is trying to do?

@afs

This comment has been minimized.

Show comment
Hide comment
@afs

afs May 2, 2017

Member

Yes - I was agreeing with the approach of a collation function and it being app-decided not fixed by the nature of the data.

If it can be done without to be hardwired in to NodeValue.compare and overloading themeaning of NodeValueString I think we will be better placed long term.

NodeValueString says "// A plain string, with no language tag, or an xsd:string." and it seems likely this assumption is used elsewhere. I think sneaking in something to NodeValueString may have effects elsewhere as it violates the assumption of NodeValueString (e.g. equality).

NodeValueSortKey, and a new value space VSPACE_SORTKEY, which only is useful in ORDER BY (and makeNode is an exception) means we can say "only for sorting". arq:collation then stops being usable as a general function.

Member

afs commented May 2, 2017

Yes - I was agreeing with the approach of a collation function and it being app-decided not fixed by the nature of the data.

If it can be done without to be hardwired in to NodeValue.compare and overloading themeaning of NodeValueString I think we will be better placed long term.

NodeValueString says "// A plain string, with no language tag, or an xsd:string." and it seems likely this assumption is used elsewhere. I think sneaking in something to NodeValueString may have effects elsewhere as it violates the assumption of NodeValueString (e.g. equality).

NodeValueSortKey, and a new value space VSPACE_SORTKEY, which only is useful in ORDER BY (and makeNode is an exception) means we can say "only for sorting". arq:collation then stops being usable as a general function.

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow May 3, 2017

Member

@osma,

Did you by any chance test this with the performance test case that I wrote up earlier? I'd like to know how it compares to a plain ORDER BY in terms of performance. I can test that myself too when I have a suitable slot of time, but that might take a while since many deadlines are coming up in the next few days...

Well remembered. Updated my sandbox to include a JMH test.

Initial version was using the average time. Here are the results.

Result "br.eti.kinoshita.jena.ArqOrderByTest.testOrderByCollation":
  3058822481.830 ±(99.9%) 51737778.554 ns/op [Average]
  (min, avg, max) = (2669383311.000, 3058822481.830, 3841515554.000), stdev = 219060994.044
  CI (99.9%): [3007084703.276, 3110560260.384] (assumes normal distribution)

Result "br.eti.kinoshita.jena.ArqOrderByTest.testOrderByLang":
  3017545546.455 ±(99.9%) 47500397.951 ns/op [Average]
  (min, avg, max) = (2646169688.000, 3017545546.455, 3499258012.000), stdev = 201119659.239
  CI (99.9%): [2970045148.504, 3065045944.406] (assumes normal distribution)


# Run complete. Total time: 00:41:33

Benchmark                            Mode  Cnt           Score          Error  Units
ArqOrderByTest.testOrderByCollation  avgt  200  3058822481.830 ± 51737778.554  ns/op
ArqOrderByTest.testOrderByLang       avgt  200  3017545546.455 ± 47500397.951  ns/op

Then updated it to actually benchmark the throughput.

Result "br.eti.kinoshita.jena.ArqOrderByTest.testOrderByCollation":
  ≈ 10⁻⁹ ops/ns

Result "br.eti.kinoshita.jena.ArqOrderByTest.testOrderByLang":
  ≈ 10⁻⁹ ops/ns


Benchmark                             Mode  Cnt   Score     Error   Units
ArqOrderByTest.testOrderByCollation  thrpt  200  ≈ 10⁻⁹            ops/ns
ArqOrderByTest.testOrderByLang       thrpt  200  ≈ 10⁻⁹            ops/ns

Throughput displays no difference. Average time was about the same for minimum, but average and max displayed a slight increase when using collation. But I think the overhead won't be really noticeable for end users.

Member

kinow commented May 3, 2017

@osma,

Did you by any chance test this with the performance test case that I wrote up earlier? I'd like to know how it compares to a plain ORDER BY in terms of performance. I can test that myself too when I have a suitable slot of time, but that might take a while since many deadlines are coming up in the next few days...

Well remembered. Updated my sandbox to include a JMH test.

Initial version was using the average time. Here are the results.

Result "br.eti.kinoshita.jena.ArqOrderByTest.testOrderByCollation":
  3058822481.830 ±(99.9%) 51737778.554 ns/op [Average]
  (min, avg, max) = (2669383311.000, 3058822481.830, 3841515554.000), stdev = 219060994.044
  CI (99.9%): [3007084703.276, 3110560260.384] (assumes normal distribution)

Result "br.eti.kinoshita.jena.ArqOrderByTest.testOrderByLang":
  3017545546.455 ±(99.9%) 47500397.951 ns/op [Average]
  (min, avg, max) = (2646169688.000, 3017545546.455, 3499258012.000), stdev = 201119659.239
  CI (99.9%): [2970045148.504, 3065045944.406] (assumes normal distribution)


# Run complete. Total time: 00:41:33

Benchmark                            Mode  Cnt           Score          Error  Units
ArqOrderByTest.testOrderByCollation  avgt  200  3058822481.830 ± 51737778.554  ns/op
ArqOrderByTest.testOrderByLang       avgt  200  3017545546.455 ± 47500397.951  ns/op

Then updated it to actually benchmark the throughput.

Result "br.eti.kinoshita.jena.ArqOrderByTest.testOrderByCollation":
  ≈ 10⁻⁹ ops/ns

Result "br.eti.kinoshita.jena.ArqOrderByTest.testOrderByLang":
  ≈ 10⁻⁹ ops/ns


Benchmark                             Mode  Cnt   Score     Error   Units
ArqOrderByTest.testOrderByCollation  thrpt  200  ≈ 10⁻⁹            ops/ns
ArqOrderByTest.testOrderByLang       thrpt  200  ≈ 10⁻⁹            ops/ns

Throughput displays no difference. Average time was about the same for minimum, but average and max displayed a slight increase when using collation. But I think the overhead won't be really noticeable for end users.

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow May 3, 2017

Member

@afs,

Changing NodeValueString looks dubious

Agreed. I thought there should be another way Having a new NodeValue type like NodeValueSortKey sounds like a good plan, as well as a new value space type. I'm updating the pull request (99% done in my local working copy), but will spend some time reviewing the code, and writing initial unit tests now :-)

Will update this pull request in the next hours.
Bruno

Member

kinow commented May 3, 2017

@afs,

Changing NodeValueString looks dubious

Agreed. I thought there should be another way Having a new NodeValue type like NodeValueSortKey sounds like a good plan, as well as a new value space type. I'm updating the pull request (99% done in my local working copy), but will spend some time reviewing the code, and writing initial unit tests now :-)

Will update this pull request in the next hours.
Bruno

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow May 3, 2017

Member

Pull request updated. Now we have a new NodeValueSortKey type, and a value space for it (VSPACE_SORTKEY). Unit tests added.

When the query contains text annotated with multiple languages, and you use the arq:collation function, everything gets overwritten as a sort key < collation, string >. So say that you have values such as "Casa"@es, "Casa"@pt, "Haus"@de, and "House" in your values.

The function will get the string part (i.e. Casa, Casa, Haus, House), discard anything else, and will sort everything according to the collation that the user provided.

All unit tests passing. Might be interesting testing now in Fuseki, with more elaborated queries. I believe the desired behaviour will work with this change, but would be nice to have others playing a little bit with the function and checking if there are no undesired changes.

Thanks!
Bruno

Member

kinow commented May 3, 2017

Pull request updated. Now we have a new NodeValueSortKey type, and a value space for it (VSPACE_SORTKEY). Unit tests added.

When the query contains text annotated with multiple languages, and you use the arq:collation function, everything gets overwritten as a sort key < collation, string >. So say that you have values such as "Casa"@es, "Casa"@pt, "Haus"@de, and "House" in your values.

The function will get the string part (i.e. Casa, Casa, Haus, House), discard anything else, and will sort everything according to the collation that the user provided.

All unit tests passing. Might be interesting testing now in Fuseki, with more elaborated queries. I believe the desired behaviour will work with this change, but would be nice to have others playing a little bit with the function and checking if there are no undesired changes.

Thanks!
Bruno

@ajs6f

This comment has been minimized.

Show comment
Hide comment
@ajs6f

ajs6f May 3, 2017

Member

It's hard for me to reason about what will happen if the first param is allowed to vary along the bindings... do we care about weirdness like that or is that just such a strange thing to do that we mark it as unsupported?

Member

ajs6f commented May 3, 2017

It's hard for me to reason about what will happen if the first param is allowed to vary along the bindings... do we care about weirdness like that or is that just such a strange thing to do that we mark it as unsupported?

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow May 4, 2017

Member

@ajs6f,

The first param, you mean the collation? If we get two NodeValueSortKey with different collation language tags, it will sort the text as normal strings (i.e. ignoring locale specific collation orders, using JVM default behaviour of String.compareTo I think).

Not sure if we could take another action here...

we mark it as unsupported?

I think so, and add to the function documentation (probably somewhere here ?) about the implementation details, risks, considerations, and so on.

Member

kinow commented May 4, 2017

@ajs6f,

The first param, you mean the collation? If we get two NodeValueSortKey with different collation language tags, it will sort the text as normal strings (i.e. ignoring locale specific collation orders, using JVM default behaviour of String.compareTo I think).

Not sure if we could take another action here...

we mark it as unsupported?

I think so, and add to the function documentation (probably somewhere here ?) about the implementation details, risks, considerations, and so on.

@osma

This comment has been minimized.

Show comment
Hide comment
@osma

osma May 4, 2017

Contributor

If we get two NodeValueSortKey with different collation language tags, it will sort the text as normal strings

Doesn't this run into the unstable sort issue that @afs cautioned against?

I think it could be avoided by the following logic: If two NodeValueSortKeys have different collation languages, sort them by the collation languages instead of even looking at the text.

This is the (lang, lex) approach discussed earlier, just applied in a slightly different context.

Contributor

osma commented May 4, 2017

If we get two NodeValueSortKey with different collation language tags, it will sort the text as normal strings

Doesn't this run into the unstable sort issue that @afs cautioned against?

I think it could be avoided by the following logic: If two NodeValueSortKeys have different collation languages, sort them by the collation languages instead of even looking at the text.

This is the (lang, lex) approach discussed earlier, just applied in a slightly different context.

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow May 4, 2017

Member

Doesn't this run into the unstable sort issue that @afs cautioned against?

Not sure. I think not because of this approach, but I tried to find if sort could be unstable, and think I found one case.

I think it could be avoided by the following logic: If two NodeValueSortKeys have different collation languages, sort them by the collation languages instead of even looking at the text.

This is the (lang, lex) approach discussed earlier, just applied in a slightly different context.

Sounds like a plan. Let's wait and see what others think.

Now, on stability...

I tried finding ways that the sort would be unstable, but for two values A and B, with same collation, the result would be stable. For two values C and D with different collations, or missing collations, the result would be the sort by the string literal. The node produced would be a Node_Literal (function rewrites any node given to it as a Node_Literal).

Now here is the interesting part. #equals(Object) and #hashcode() use the node value, i.e. the Node_Literal string to compare values. Using the approach suggested by @osma NodeValue#compare(NodeValue, NodeValue) for NodeValueSortKey("Casa", "es") and NodeValueSortKey("Casa", "pt") would return that NodeValueSortKey("Casa", "es") < NodeValueSortKey("Casa", "pt"). i.e. since both values have different collation language tags, we would compare "es" and "pt".

However, #equals(Object) and #hashcode() would report true based only on the Node_Literal node. So NodeValueSortKey("Casa", "es").equals( NodeValueSortKey("Casa", "pt") ) would return true.

I believe this could cause problems, where the merge-sort sort would be stable (I think), but using the elements (sorted or not) in a map/set could result in weird behaviours...

Some code to illustrate the above stated:

NodeValueSortKey nvsk1 = new NodeValueSortKey("Casa", "es");
NodeValueSortKey nvsk2 = new NodeValueSortKey("Casa", "pt");
System.out.println(nvsk1.equals(nvsk2));
// true

NodeValueLang nvl1 = new NodeValueLang("Casa", "es");
NodeValueLang nvl2 = new NodeValueLang("Casa", "pt");
System.out.println(nvl1.equals(nvl2));
// false

For NodeValueLangs, when a Node_Literal is created, it is given a LiteralLabel object that it wraps. Then, when you call NodeValueLang#equals(Object), NodeValueLang uses the LiteralLabel#equals(Object) to compare the other NodeValueLang. LiteralLabel is checking the language tag.

I wonder if we should create a new Node_Concrete implementation in org.apache.jena.graph (Node_SortKey?), or if we should modify Node_Literal... I feel like the latter would be less elegant than the former.

By using the current implementation, plus @osma's suggestion of comparing the collation language tag, and finally by making sure equals/hashcode agree with what our comparable says; then I believe we would have a stable sort. Thoughts?

Member

kinow commented May 4, 2017

Doesn't this run into the unstable sort issue that @afs cautioned against?

Not sure. I think not because of this approach, but I tried to find if sort could be unstable, and think I found one case.

I think it could be avoided by the following logic: If two NodeValueSortKeys have different collation languages, sort them by the collation languages instead of even looking at the text.

This is the (lang, lex) approach discussed earlier, just applied in a slightly different context.

Sounds like a plan. Let's wait and see what others think.

Now, on stability...

I tried finding ways that the sort would be unstable, but for two values A and B, with same collation, the result would be stable. For two values C and D with different collations, or missing collations, the result would be the sort by the string literal. The node produced would be a Node_Literal (function rewrites any node given to it as a Node_Literal).

Now here is the interesting part. #equals(Object) and #hashcode() use the node value, i.e. the Node_Literal string to compare values. Using the approach suggested by @osma NodeValue#compare(NodeValue, NodeValue) for NodeValueSortKey("Casa", "es") and NodeValueSortKey("Casa", "pt") would return that NodeValueSortKey("Casa", "es") < NodeValueSortKey("Casa", "pt"). i.e. since both values have different collation language tags, we would compare "es" and "pt".

However, #equals(Object) and #hashcode() would report true based only on the Node_Literal node. So NodeValueSortKey("Casa", "es").equals( NodeValueSortKey("Casa", "pt") ) would return true.

I believe this could cause problems, where the merge-sort sort would be stable (I think), but using the elements (sorted or not) in a map/set could result in weird behaviours...

Some code to illustrate the above stated:

NodeValueSortKey nvsk1 = new NodeValueSortKey("Casa", "es");
NodeValueSortKey nvsk2 = new NodeValueSortKey("Casa", "pt");
System.out.println(nvsk1.equals(nvsk2));
// true

NodeValueLang nvl1 = new NodeValueLang("Casa", "es");
NodeValueLang nvl2 = new NodeValueLang("Casa", "pt");
System.out.println(nvl1.equals(nvl2));
// false

For NodeValueLangs, when a Node_Literal is created, it is given a LiteralLabel object that it wraps. Then, when you call NodeValueLang#equals(Object), NodeValueLang uses the LiteralLabel#equals(Object) to compare the other NodeValueLang. LiteralLabel is checking the language tag.

I wonder if we should create a new Node_Concrete implementation in org.apache.jena.graph (Node_SortKey?), or if we should modify Node_Literal... I feel like the latter would be less elegant than the former.

By using the current implementation, plus @osma's suggestion of comparing the collation language tag, and finally by making sure equals/hashcode agree with what our comparable says; then I believe we would have a stable sort. Thoughts?

@afs

afs approved these changes May 12, 2017

I'm not so worried about unstable sorts when using this collation function approach. It is possible to write bad functions anyway (example ORDER BY 1)

I would worry if it was built-in to ORDER BY. By having the query writer ask for a certain collation, the responsibility is passed to the query writer to use it when valid. Basically, don't misuse on on mixed data. Use something like ORDER BY lang(?x) arq:collate(lang(?x), ?x) if necessary.

Show outdated Hide outdated jena-arq/src/main/java/org/apache/jena/sparql/expr/NodeValue.java
// collators are already stored in a concurrent map by the JVM, with <locale, softref<collator>>
Collator collator = Collator.getInstance(desiredLocale);
cmp = collator.compare(nv1.getString(), nv2.getString());
} else {

This comment has been minimized.

@afs

afs May 12, 2017

Member

Would it be better to have NodeSortKey being comparable rather than putting the comparison here?

The then removes the need for getCollation.

And, as a general mechanism, NodeSortKey are not restricted to language-based collection. Maybe an extension is for NodeSortKey based on an enum-like interpretation of a value (e.g. a persons title or rank or even "one", "two", "three").

@afs

afs May 12, 2017

Member

Would it be better to have NodeSortKey being comparable rather than putting the comparison here?

The then removes the need for getCollation.

And, as a general mechanism, NodeSortKey are not restricted to language-based collection. Maybe an extension is for NodeSortKey based on an enum-like interpretation of a value (e.g. a persons title or rank or even "one", "two", "three").

This comment has been minimized.

@kinow

kinow May 12, 2017

Member

What a great idea being comparable! Done, it reduced changes in NodeValue, and made writing tests for the comparison (core part of this change) easier. Thanks Andy!

As for the general mechanism, I'd be +1, but maybe later I think. The class is final for now and has a few comments. If I understand it correctly, someone in the future may remove the final mark, remove the collation and move it to subtype/enums. This way we could re-use NodeSortKey later for comparing other things. One could even write a simple function that would compare values by some string edit distance.

@kinow

kinow May 12, 2017

Member

What a great idea being comparable! Done, it reduced changes in NodeValue, and made writing tests for the comparison (core part of this change) easier. Thanks Andy!

As for the general mechanism, I'd be +1, but maybe later I think. The class is final for now and has a few comments. If I understand it correctly, someone in the future may remove the final mark, remove the collation and move it to subtype/enums. This way we could re-use NodeSortKey later for comparing other things. One could even write a simple function that would compare values by some string edit distance.

protected Node makeNode() {
return NodeFactory.createLiteral(string);
}

This comment has been minimized.

@afs

afs May 12, 2017

Member

Add comments that makeNode are fake (they don't round trip).

This could be one of the XSD binary datatypes (base64binary, hexBinary) but really we have to acknowledge to ourselves that NodeSortKey is "internal" and appearing in output or in expressions is not going to fully work.

@afs

afs May 12, 2017

Member

Add comments that makeNode are fake (they don't round trip).

This could be one of the XSD binary datatypes (base64binary, hexBinary) but really we have to acknowledge to ourselves that NodeSortKey is "internal" and appearing in output or in expressions is not going to fully work.

This comment has been minimized.

@kinow

kinow May 12, 2017

Member

Added a simple comment, not sure if good enough. Also marked NodeSortKey as final for now to show users it is not supposed to be extended yet, and must be treated as for internal use only.

@kinow

kinow May 12, 2017

Member

Added a simple comment, not sure if good enough. Also marked NodeSortKey as final for now to show users it is not supposed to be extended yet, and must be treated as for internal use only.

This comment has been minimized.

@afs

afs May 13, 2017

Member

Marking final makes sense. There are some details in the way NodeValueSortKey does not follow the pattern of other NodeValues e.g. the "getXYZ" operations return its value (so getString isfor xsd:string to return their string value, getInteger on NodeValueInteger etc.

@afs

afs May 13, 2017

Member

Marking final makes sense. There are some details in the way NodeValueSortKey does not follow the pattern of other NodeValues e.g. the "getXYZ" operations return its value (so getString isfor xsd:string to return their string value, getInteger on NodeValueInteger etc.

This comment has been minimized.

@kinow

kinow May 14, 2017

Member

I see, so a NodeValueXyx would have the equivalent method getXyz. In our case, we have NodeValueSortKey and we had (before I just deleted) the getCollation.

Should I simply add back the getCollation method, and return a NodeValueSortKey?

Would it make sense in the future try to add a generic type <T> to NodeValueSortKey, making a single method public T getNodeValue()? I think this way we wouldn't have to add one method for each new NodeValue, and the design contract would be easier to be followed?

@kinow

kinow May 14, 2017

Member

I see, so a NodeValueXyx would have the equivalent method getXyz. In our case, we have NodeValueSortKey and we had (before I just deleted) the getCollation.

Should I simply add back the getCollation method, and return a NodeValueSortKey?

Would it make sense in the future try to add a generic type <T> to NodeValueSortKey, making a single method public T getNodeValue()? I think this way we wouldn't have to add one method for each new NodeValue, and the design contract would be easier to be followed?

This comment has been minimized.

@ajs6f

ajs6f May 14, 2017

Member

@kinow, I like the idea of making that sort of contract more specifically visible in the type system. You are distinguishing the two case @afs mentioned by putting getString, getInteger, etc. into the generic and leaving "control" functions like getCollation out of it?

Wouldn't this be just as much a good pattern for NodeValue itself, except we would have to be careful about migrating it?

@ajs6f

ajs6f May 14, 2017

Member

@kinow, I like the idea of making that sort of contract more specifically visible in the type system. You are distinguishing the two case @afs mentioned by putting getString, getInteger, etc. into the generic and leaving "control" functions like getCollation out of it?

Wouldn't this be just as much a good pattern for NodeValue itself, except we would have to be careful about migrating it?

This comment has been minimized.

@kinow

kinow May 14, 2017

Member

I thought that in this case, getCollation was not just a control function. I thought it was the equivalent of getInteger for an int node, and getString for a string.

So in this case getCollation would return the NodeSortKey itself, as we may need string+collation. And with that, we wouldn't have to cast the node value to NodeSortKey, we could simply call getCollation().compareTo(other.getCollation()).

Or with the generics alternative, it would call nv1.getNodeValue().compareTo(nv2.getNodeValue()), where getNodeValue() for a NodeSortKey extends NodeValue<NodeSortKey> would return a NodeSortKey.

Hope that makes sense

@kinow

kinow May 14, 2017

Member

I thought that in this case, getCollation was not just a control function. I thought it was the equivalent of getInteger for an int node, and getString for a string.

So in this case getCollation would return the NodeSortKey itself, as we may need string+collation. And with that, we wouldn't have to cast the node value to NodeSortKey, we could simply call getCollation().compareTo(other.getCollation()).

Or with the generics alternative, it would call nv1.getNodeValue().compareTo(nv2.getNodeValue()), where getNodeValue() for a NodeSortKey extends NodeValue<NodeSortKey> would return a NodeSortKey.

Hope that makes sense

This comment has been minimized.

@ajs6f

ajs6f May 14, 2017

Member

I guess my confusion is coming in as follows: my (very possibly wrong) understanding is that for (e.g.) NodeValueInteger, etc, the whole state of the thing is in the (e.g.) BigInteger, which would come entirely from the underlying binding, from the RDF. But the collation is coming from the query. It seems different. But like I say, maybe I've got the wrong end of the stick on this and I hope I'm not confusing things further!

@ajs6f

ajs6f May 14, 2017

Member

I guess my confusion is coming in as follows: my (very possibly wrong) understanding is that for (e.g.) NodeValueInteger, etc, the whole state of the thing is in the (e.g.) BigInteger, which would come entirely from the underlying binding, from the RDF. But the collation is coming from the query. It seems different. But like I say, maybe I've got the wrong end of the stick on this and I hope I'm not confusing things further!

This comment has been minimized.

@kinow

kinow May 14, 2017

Member

I'm still learning how/where NodeValue is used, and its relation to a Node. So far, NodeSortKey indeed does not completely match other NodeValues. But I think if generics worked here (and it may not work due to how NodeValue is designed) maybe it would make more sense. NodeSortKey would be a bit weird, but would be a weird type, not interfering with the NodeValue definition per se.

@kinow

kinow May 14, 2017

Member

I'm still learning how/where NodeValue is used, and its relation to a Node. So far, NodeSortKey indeed does not completely match other NodeValues. But I think if generics worked here (and it may not work due to how NodeValue is designed) maybe it would make more sense. NodeSortKey would be a bit weird, but would be a weird type, not interfering with the NodeValue definition per se.

} else {
cmp = XSDFuncOp.compareString(nv1, nv2) ;
if (!(nv1 instanceof NodeValueSortKey) || !(nv2 instanceof NodeValueSortKey)) {
raise(new ExprNotComparableException("Can't compare (not node value sort keys) "+nv1+" and "+nv2)) ;

This comment has been minimized.

@afs

afs May 13, 2017

Member

That shouldn't happen if the comparison is VALUE_SORTKEY

@afs

afs May 13, 2017

Member

That shouldn't happen if the comparison is VALUE_SORTKEY

This comment has been minimized.

@kinow

kinow May 14, 2017

Member

Agree, right now there is no way of that happening, as the value space for sort keys is returned only when both node values are NodeSortKeys. Added more to prevent bugs in case anyone ever changed the function returning the value space.

Should we remove it?

@kinow

kinow May 14, 2017

Member

Agree, right now there is no way of that happening, as the value space for sort keys is returned only when both node values are NodeSortKeys. Added more to prevent bugs in case anyone ever changed the function returning the value space.

Should we remove it?

@@ -978,7 +967,6 @@ public boolean isDayTimeDuration()
public boolean getBoolean() { raise(new ExprEvalTypeException("Not a boolean: "+this)) ; return false ; }
public String getString() { raise(new ExprEvalTypeException("Not a string: "+this)) ; return null ; }
public String getLang() { raise(new ExprEvalTypeException("Not a string: "+this)) ; return null ; }
public String getCollation() { raise(new ExprEvalTypeException("Not a sort key: "+this)) ; return null ; }

This comment has been minimized.

@afs

afs May 13, 2017

Member

This would be the "value" of a NodeValueSortKey ... which is itself so NodeValueSortKey then the casts aren't needed in the comparision.

@afs

afs May 13, 2017

Member

This would be the "value" of a NodeValueSortKey ... which is itself so NodeValueSortKey then the casts aren't needed in the comparision.

This comment has been minimized.

@kinow

kinow May 14, 2017

Member

Yup, that makes sense. See comment above. I'd be fine re-adding this method, and making it return a NodeValueSortKey, though wouldn't mind postponing it to try some generics-fu first :-)

@kinow

kinow May 14, 2017

Member

Yup, that makes sense. See comment above. I'd be fine re-adding this method, and making it return a NodeValueSortKey, though wouldn't mind postponing it to try some generics-fu first :-)

@afs

This comment has been minimized.

Show comment
Hide comment
@afs

afs Jun 2, 2017

Member

How about merging this PR to establish a baseline in the codebase and refine it, if needed, from this point onwards?

Member

afs commented Jun 2, 2017

How about merging this PR to establish a baseline in the codebase and refine it, if needed, from this point onwards?

@ajs6f

This comment has been minimized.

Show comment
Hide comment
@ajs6f

ajs6f Jun 2, 2017

Member

+1

Member

ajs6f commented Jun 2, 2017

+1

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Jun 2, 2017

Member

+1 once merged, I can start working on documentation in the website :+1

Member

kinow commented Jun 2, 2017

+1 once merged, I can start working on documentation in the website :+1

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Jun 12, 2017

Member

If there are no objections I will merge this PR tomorrow or over the weekend, and will start working on the documentation.

Member

kinow commented Jun 12, 2017

If there are no objections I will merge this PR tomorrow or over the weekend, and will start working on the documentation.

@osma

This comment has been minimized.

Show comment
Hide comment
@osma

osma Jun 13, 2017

Contributor

+1 for merging.

Contributor

osma commented Jun 13, 2017

+1 for merging.

@afs

This comment has been minimized.

Show comment
Hide comment
@afs

afs Jun 13, 2017

Member

+1

Member

afs commented Jun 13, 2017

+1

@asfgit asfgit merged commit 3b12f83 into apache:master Jun 13, 2017

asfgit pushed a commit that referenced this pull request Jun 13, 2017

@kinow

This comment has been minimized.

Show comment
Hide comment
@kinow

kinow Jun 13, 2017

Member

Merged. Will start working on the documentation, probably this next weekend. Thanks everyone!!!

Member

kinow commented Jun 13, 2017

Merged. Will start working on the documentation, probably this next weekend. Thanks everyone!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment