
[BUG] range:index-keys-for-field broken since 5.3.0 when called from XQueryServlet #4074

Open
olvidalo opened this issue Oct 27, 2021 · 1 comment


olvidalo commented Oct 27, 2021

tl;dr: Since eXist 5.3.0, range:index-keys-for-field can no longer be used properly when called from the XQueryServlet. When called by other means, e.g. through eXide (or through the XQuery testing suite, as in the existing range index tests), it works as expected. The issue persists in the current develop branch. It seems to be related to which statically known documents are available when the query is executed.

This affects not only my app, which uses this function to implement faceted search, but also Monex: as of eXist 5.3.0, listing the keys of a Lucene range index field does not work there. The list of keys for fields in Monex is always empty, even if the index is populated. A mailing list post describes what is probably the same problem: https://sourceforge.net/p/exist/mailman/message/37333142/.

Steps to reproduce

  1. Create collection.xconf in /db/system/config/db/fieldtest
<collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <range>
                <create qname="tei:test">
                    <field name="elem-field" match="tei:elem" type="xs:string" case="no"/>
                </create>
            </range>
        </index>
</collection>
  2. Create test.xml in /db/fieldtest
<test xmlns="http://www.tei-c.org/ns/1.0">
        <elem>a</elem>
        <elem>b</elem>
        <elem>c</elem>
</test>
  3. Create test2.xml in /db/fieldtest/test2
    <test xmlns="http://www.tei-c.org/ns/1.0">
        <elem>a</elem>
        <elem>b</elem>
        <elem>c</elem>
        <elem>b</elem>
        <elem>y</elem>
    </test>
  4. Make the following function call once using the XQueryServlet and once using other means (eXide etc.):
  range:index-keys-for-field("elem-field", function($key, $nums) { $key }, 3)

Expected results

A list of the distinct values of the elem elements:

   ("a", "b", "c", "y")

Actual results

When calling index-keys-for-field through eXide, the result is as expected. When calling it through the XQueryServlet, the result is empty.

Self-contained test

I have attached a self-contained test (test.xql) that sets up the testing environment and calls index-keys-for-field in different ways. It does not use the XQuery Testing Suite because I have found no way to make these tests use the XQueryServlet.

How to run:

  1. Save test.xql under /db/apps/test/ or similar.
  2. Set the setuid bit on the script, because it needs admin permissions to create collections and documents.
  3. Run the script through the XQueryServlet (e.g. open http://localhost:8080/exist/apps/test/test.xql)
  4. Open the script in eXide and run it using the "eval" button
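
For step 2, the setuid bit can also be set from XQuery instead of through a GUI. The following is a minimal sketch, assuming the script was saved at /db/apps/test/test.xql as in step 1, that the admin account is the intended owner, and that the snippet itself is run by a dba user (e.g. from eXide while logged in as admin). The path and owner name are assumptions to adjust as needed.

```xquery
xquery version "3.1";

(: Assumed location from step 1; change it if test.xql was saved elsewhere. :)
let $script := xs:anyURI("/db/apps/test/test.xql")
return (
    (: Make a dba account the owner first, since setuid runs the query as the owner. :)
    sm:chown($script, "admin"),
    (: "s" in the owner triple sets the setuid bit in addition to execute. :)
    sm:chmod($script, "rwsr-xr-x")
)
```

The owner is changed before the mode on the assumption that changing ownership afterwards could reset the setuid bit.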

The results from steps 3 and 4 will differ in interesting ways, depending not only on whether the script has been run through the XQueryServlet but also on how the function is called. The script calls index-keys-for-field in four different ways:

  1. Without explicit context:
    range:index-keys-for-field("elem-field", function($key, $nums) { $key }, 3)

  2. With explicit context:
    collection("/db/fieldtest")/range:index-keys-for-field("elem-field", function($key, $nums) { $key }, 3)

  3. Wrapped in a function, with context:

declare function local:func($a, $b, $c) {
    range:index-keys-for-field($a, $b, $c)
};

collection("/db/fieldtest")/local:func("elem-field", function($key, $nums) { $key }, 3)
  4. Wrapped in an inline function, with context:
let $inlineFunc := function($a, $b, $c) {
    range:index-keys-for-field($a, $b, $c)
}
return
    collection("/db/fieldtest")/$inlineFunc("elem-field", function($key, $nums) { $key }, 3)

These are the results from current develop using eXide:

<result>
    <without-context>a b c y</without-context> <!-- expected result! -->
    <with-context>a b c a b c y</with-context>
    <func-with-context>a b c y a b c y</func-with-context>
    <inline-func-with-context>a b c y</inline-func-with-context>
</result>

And these are the results from current develop using XQueryServlet:

<result>
    <without-context/>
    <with-context>a b c a b c y</with-context>
    <func-with-context/>
    <inline-func-with-context/>
</result>

This shows that with the XQueryServlet there is currently no way to get the expected results. When the function is called with an explicit context, it seems to be evaluated once for each document or subcollection, so the results contain duplicates.
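
Since the explicit-context call is the only variant that returns keys under the XQueryServlet, one possible stopgap is to de-duplicate its output with distinct-values. This is only a sketch: it assumes the with-context call keeps returning "a b c a b c y" as in the results above, uses the limit of 3000 from the attached test.xql, and has not been verified beyond the outputs reported here.

```xquery
xquery version "3.1";

(: Stopgap sketch, not a fix: the call appears to run once per document in the
   context, so the same key can show up several times. distinct-values()
   collapses those repeats into a single list of keys. :)
distinct-values(
    collection("/db/fieldtest")/range:index-keys-for-field(
        "elem-field",
        function($key, $nums) { $key },
        3000
    )
)
```

Note that the callback's $nums argument (the per-key counts) would still be reported per document, so this only helps when the keys alone are needed.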

More information

Faulty commit

A git bisect session pointed to commit 0f76cf2 as the one that introduced this behavior. Here are the results at commit 35249a7 which is one commit before (same results as with eXist 5.2.0):

Last "good" commit (35249a7) using eXide

<result>
    <without-context>a b c y</without-context> <!-- expected result! -->
    <with-context>a b c a b c y</with-context>
    <func-with-context>a b c a b c y</func-with-context>
    <inline-func-with-context>a b c y</inline-func-with-context>
</result>

Last "good" commit (35249a7) using XQueryServlet

<result>
    <without-context/> 
    <with-context>a b c a b c y</with-context>
    <func-with-context>a b c a b c y</func-with-context>
    <inline-func-with-context>a b c y</inline-func-with-context> <!-- expected result! -->
</result>

The results using eXide are the same, but with XQueryServlet they differ: previously, it was possible to achieve the expected result also with XQueryServlet by wrapping the call to range:index-keys-for-field in an inline function. In issue #1226 I described this previously and also mentioned the workaround – this issue has since been marked as fixed. But it turns out it still persists when using the XQueryServlet (and is thus not caught by the corresponding test).

When calling range:index-keys-for-field with explicit context, the results seem unintuitive to me for both versions.

Related code

In case it helps with the investigation: the behavior that differs when the function is called through the XQueryServlet seems to come from this line:

Occurrences[] occur = worker.scanIndexByField(field, contextSequence == null ? context.getStaticallyKnownDocuments() : contextSequence.getDocumentSet(), start, max);

When there is no context, context.getStaticallyKnownDocuments() is used. It seems that when using the XQueryServlet there are no statically known documents, while when calling the function through other means the statically known documents are all documents in the database. I have no idea, however, where the strange results seen when calling it with an explicit context come from.

@olvidalo (Contributor, Author) commented:

Sorry, couldn't attach the test file due to file extension limitations... I'll paste it here:

test.xql

xquery version "3.1";

declare function local:func($a, $b, $c) {
    range:index-keys-for-field($a, $b, $c)
};

let $dataTest1 :=
    <test xmlns="http://www.tei-c.org/ns/1.0">
        <elem>a</elem>
        <elem>b</elem>
        <elem>c</elem>
    </test>

let $dataTest2 :=
    <test xmlns="http://www.tei-c.org/ns/1.0">
        <elem>a</elem>
        <elem>b</elem>
        <elem>c</elem>
        <elem>b</elem>
        <elem>y</elem>
    </test>

let $collectionConfig := 
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index xmlns:tei="http://www.tei-c.org/ns/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <range>
                <create qname="tei:test">
                    <field name="elem-field" match="tei:elem" type="xs:string" case="no"/>
                </create>
            </range>
        </index>
    </collection>

let $collectionName := "index-keys-test"
let $collection := "/db/" || $collectionName

let $setUp := (
    xmldb:create-collection("/db/system/config/db", $collectionName),
    xmldb:store("/db/system/config/db/" || $collectionName, "collection.xconf", $collectionConfig),
    xmldb:create-collection("/db", $collectionName),
    xmldb:store($collection, "test.xml", $dataTest1),
    xmldb:create-collection($collection, "test2"),
    xmldb:store($collection || "/test2", "test2.xml", $dataTest2),
    xmldb:reindex($collection)
)

let $inlineFunc := function($a, $b, $c) {
    range:index-keys-for-field($a, $b, $c)
}

return
   <result>
    <without-context>{range:index-keys-for-field("elem-field", function($key, $nums) { $key }, 3000)}</without-context>
    <with-context>{collection($collection)/range:index-keys-for-field("elem-field", function($key, $nums) { $key }, 3000)}</with-context>
    <func-with-context>{collection($collection)/local:func("elem-field", function($key, $nums) { $key }, 3000)}</func-with-context>
    <inline-func-with-context>{collection($collection)/$inlineFunc("elem-field", function($key, $nums) { $key }, 3000)}</inline-func-with-context>
   </result>
