Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Regression SQL module do not close unused JNDI connections and can starve pool #3555

Closed
gmella opened this issue Oct 7, 2020 · 25 comments
Assignees
Labels
discuss ask for feedback

Comments

@gmella
Copy link

gmella commented Oct 7, 2020

Describe the bug
Was not present in 4.7.1 but if eXistDB 5.2 opens some JNDI connections and do not have statement in the same context then connection close are missing. Any further sibling queries may draw to db pool starvation and block any new connection requests if pool already is full.

Expected behavior
Any query that open for a new connection must close it before being ended even if no statement has been executed.

To Reproduce

Because it is related to the context, multiples call are required to highlight the connection leak.

xquery version "3.1";
let $requests-nb := 5 (: choose a value lower than your maxTotal pool limit  and repeat multiple query calls :)
let $jndi-name := "my-jndi-name"
for $l in 1 to $requests-nb
  return sql:get-jndi-connection($jndi-name)

On my org.apache.commons.dbcp2.BasicDataSource config, any query with request-nb higher than maxTotal pool blocks on first call. Then any new additional sql database is blocked, but eXistDB still respond properly. To retrieve sql connection back I do need to restart eXistDB instance (even if I kill my JDBC connections from postgresql).

Context (please always complete the following information):

  • OS: docker
  • eXist-db version: 5.2
  • Java Version Java8u121

Additional context

  • docker with modified jetty.conf
<!-- BEGIN OIDB Data source -->
<New id="DSOiDB" class="org.eclipse.jetty.plus.jndi.Resource">
   <Arg></Arg>
   <Arg>jdbc/oidb</Arg>
   <Arg>
       <New class="org.apache.commons.dbcp2.BasicDataSource">
          <Set name="driverClassName">org.postgresql.Driver</Set>
          <Set name="url">jdbc:postgresql://db:5432/oidb?loggerLevel=DEBUG</Set>
          <Set name="username">XXX</Set>
          <Set name="password">XXX</Set>
<!-- pool settings -->
          <Set name="initialSize">5</Set>
          <Set name="maxIdle">10</Set>
          <Set name="maxTotal">20</Set>
          <Set name="poolPreparedStatements">true</Set>
<!-- validation query -->
          <Set name="validationQuery">SELECT 1</Set>
<!-- abandonned checks -->
          <Set name="logAbandoned">true</Set>
       </New>
   </Arg>
 </New>

A more sophisticated data source configuration may also improve reliability and I do recognize that this is not a bug but related to my bad usage (some handles where created from bad context).

both 4.7 and 5.2 docker version have the same datasource for jetty and using the same jar files.

CLASSPATH=/exist/lib/exist.uber.jar:/exist/lib/user/commons-logging-1.1.3.jar:/exist/lib/user/commons-dbcp2-2.8.0.jar:/exist/lib/user/commons-pool2-2.8.1.jar:/exist/lib/user/postgresql-42.2.16.jar

I also noticed that connections are no more properly closed (was ok on 4.7.2) if performed in another library module (context issue?).
Please find the reproducer files:

xquery version "3.0";
module namespace mymodule="http://exist-db.org/apps/myapp/mymodule";
declare function mymodule:get-handle(){
    let $handle := sql:get-jndi-connection("jdbc/oidb")
    return 
        sql:execute($handle, "SELECT 1", false())
};
xquery version "3.1";
import module namespace mymodule="http://exist-db.org/apps/myapp/mymodule" at "/db/mymodule.xql";

(: This code leaks sql connections on 5.2.0 but not on 4.7.2 :)
mymodule:get-handle()
xquery version "3.1";

(: This code does not leak sql connection on 5.2.0 and 4.7.2 :)
let $handle := sql:get-jndi-connection("jdbc/oidb")
    return 
        sql:execute($handle, "SELECT 1", false())

Should possible improvement save time for whom would use sql module.

I may open a PR with some more log that could helped me to get a faster diagnostic.

Thx!

@gmella gmella changed the title [BUG] SQL module do not close unused JNDI connections and can starve pool [BUG] Regression SQL module do not close unused JNDI connections and can starve pool Oct 7, 2020
@bourgesl
Copy link

bourgesl commented Oct 8, 2020

Maybe related to this change in 5.0.0:
#2053

SQLModule is an InternalModule and stores connections in the XQueryContext map but only closed when the context.reset() is called. However if the context is reused (xquery.getContext().prepareForReuse()), there is no call to reset().

Maybe it is related to query caching or context reuse ?

@gmella
Copy link
Author

gmella commented Oct 8, 2020

No idea. I just noticed that those idle remaining connections get closed and free the pool just after running a Garbage Collection (through monex).

@dizzzz
Copy link
Member

dizzzz commented Oct 17, 2020

Hmmm if running the GC solves the it, then something scary is happening; running GC means that the objects are not referenced and hence not used anymore. @adamretter ?

@adamretter
Copy link
Member

adamretter commented Oct 20, 2020

@gmella I wrote some tests that follow your examples to try and reproduce your issue. I was unable to reproduce your issue, you can find the tests here - #3574

I am testing against 5.3.0-SNAPSHOT.

@adamretter adamretter added the discuss ask for feedback label Oct 20, 2020
@gmella
Copy link
Author

gmella commented Oct 21, 2020

Thx a lot @adamretter for the new IT tests! I just repeated the simple scenario (no jdbc) interacting with fresh snapshots of docker and standalone jar installs. Both still have the issue. May the tests be executed in the same context so connections are properly closed ? Is that related to postgres incompatibility with existdb 5 ? ...

xquery version "3.0";
module namespace mymodule="http://exist-db.org/apps/myapp/mymodule";
declare function mymodule:get-handle(){
    (:    let $handle := sql:get-jndi-connection("jdbc/oidb"):)
    (: 172.17.0.1 is fine for both modes if docker is runnning , else use localhost or any server name:)
    let $handle := sql:get-connection("org.postgresql.Driver", "jdbc:postgresql://172.17.0.1/oidb", "oidb", "oidb")
    return 
        sql:execute($handle, "SELECT 1", false())
};

$ CLASSPATH_PREFIX=$PWD/postgresql-42.2.16.jar bin/startup.sh
$ # count remaining connection after next calls
$ curl http://localhost:8080/exist/rest/db/mytest.xql

21 Oct 2020 09:33:41,327 [main] INFO  (JettyStart.java [run]:186) - [eXist Version : 5.3.0-SNAPSHOT] 
21 Oct 2020 09:33:41,328 [main] INFO  (JettyStart.java [run]:187) - [eXist Build : 20200928145551] 
21 Oct 2020 09:33:41,328 [main] INFO  (JettyStart.java [run]:188) - [Git commit : 22ab41e95a7a19a73b0b281b867e313c74c703b2] 
21 Oct 2020 09:33:41,328 [main] INFO  (JettyStart.java [run]:190) - [Operating System : Linux 4.9.0-13-amd64 amd64] 
21 Oct 2020 09:33:41,328 [main] INFO  (JettyStart.java [run]:191) - [log4j.configurationFile : /exist/etc/log4j2.xml] 
21 Oct 2020 09:33:41,357 [main] INFO  (JettyStart.java [run]:192) - [jetty Version: 9.4.31.v20200723] 

Is there any other application using sql extension on top of eXistDB 5.2+ ? my install may have some side effect....
Thx again for your valuable help.

@adamretter
Copy link
Member

adamretter commented Oct 21, 2020 via email

@gmella
Copy link
Author

gmella commented Oct 21, 2020

I think there are plenty of other people using SQL extension with eXist-db.

Good news !

As you are using JNDI, are you also using a Connection Pool? I think Tomcat provides that by default. If so, it could likely be the pool which is keeping the connections open (and rightly so)

Yes I use a connection pool as mentioned in the issue description and everything was fine on 4.7. I do add some timeout config (jetty) to improve and avoid infinite connections. I may have better to open a more simple issue since this first address simple connection mode using sql:get-connection without dealing with more complex points related to connection pool...
This is why I give my updated test procedure in my last comment.

I could try to provide another feedback on top of another sql engine & verbatim eXistDB install.

@adamretter
Copy link
Member

adamretter commented Oct 21, 2020 via email

@gmella
Copy link
Author

gmella commented Oct 21, 2020

If you still believe this is an issue, could you contribute a test case?

Yes I thought about it and believe that this must be done with multiple calls on a query to be sure to use multiples context calls...
My knowledge is not very good, but I will try to help starting to search for other tests on the same schema.

@adamretter
Copy link
Member

Yes I thought about it and believe that this must be done with multiple calls on a query to be sure to use multiples context calls...

I am not sure how you mean... do you mean that I should execute the query that takes the connection more than once? And if so, should that be done sequentially and/or in parallel?

@gmella
Copy link
Author

gmella commented Oct 26, 2020

I experienced issue sequentially.

I still need for time (:;or help ;) to set up a working java test. I take the direction of setting next kind of test:

xquery version "3.1";

let $module := 'xquery version "3.0";
module namespace mymodule="http://exist-db.org/apps/myapp/mymodule";
declare function mymodule:get-handle(){
    (: let $handle := sql:get-jndi-connection("jdbc/oidb"):)
    let $handle := sql:get-connection("org.postgresql.Driver", "jdbc:postgresql://172.17.0.1/oidb", "oidb", "oidb")    
    (: 172.17.0.1 is fine as dbhost name for both mode if docker is running , else use localhost or any other server name:)
    return 
        sql:execute($handle, "SELECT 1", false())
};'

let $script := 'xquery version "3.1";
import module namespace mymodule="http://exist-db.org/apps/myapp/mymodule" at "/db/mymodule.xql";
<a>{ mymodule:get-handle() }</a>'

let $script-name := "mytest.xql"
let $store := xmldb:store("/db", "mymodule.xql", $module)
let $store := xmldb:store("/db", $script-name, $script)

(: may fail if value is bigger than max connections allowed by sql db engine (100 on a default postgres) :)
let $nb-loops := 200
for $e in 1 to $nb-loops
return 
    doc("http://localhost:8080/exist/rest/db/"||$script-name)

@gmella
Copy link
Author

gmella commented Oct 30, 2020

@bourgesl told me to try again disabling cache.

I then started to edit query-pool in conf.xml I noticed that -1 is not allowed for size to disable caching mecanism, but 0 is ok to highlight the link between the query-pool activity and sql connection close issues. Heavy load (>1000 calls) can then be performed with minimal failures getting a valid sql connection. But all of that works quite nicely (on both standalone and docker configs) if and only if there are some other queries (other imports) interleaved. I don't know the details but this seems to point to the query caching mecanism that could be related to the reset of the cached context.

Sorry for not giving test material, but please find my script which helped me to 'measure' the impact of changing config or adjusting elements that help the flushing phase (without GC I guess).

xquery version "3.1";

let $module := 'xquery version "3.0";
module namespace mymodule="http://exist-db.org/apps/myapp/mymodule";
declare function mymodule:get-handle(){
(: let $handle := sql:get-jndi-connection("jdbc/oidb"):)
    let $handle := sql:get-connection("org.postgresql.Driver", "jdbc:postgresql://172.17.0.1/oidb", "oidb", "oidb")    
    (: 172.17.0.1 is fine as dbhost name for both mode if docker is running , else use localhost or any other server name:)
    return 
        sql:execute($handle, "SELECT 1", false())
};'

let $script := 'xquery version "3.1";
import module namespace mymodule="http://exist-db.org/apps/myapp/mymodule" at "/db/mymodule.xql";
mymodule:get-handle()'

let $script-name := "mytest.xql"
let $store := try { xmldb:store("/db", "mymodule.xql", $module) } catch * { () }
let $store := try { xmldb:store("/db", $script-name, $script) } catch * { () }

(: may fail if value is bigger than max connections allowed by sql db engine (100 on a default postgres) :)
let $nb-loops := 2000

let $fails :=
    for $e at $i in 1 to $nb-loops
    return
        try {
            let $op := doc("http://localhost:8080/exist/rest/db/" || $script-name)
                      
            (: uncommenting next line fixes the issue but always add an additional step :)
            (: let $op := try { doc("http://localhost:8080/exist/apps/eXide/modules/deployment.xql?info=__test__") } catch * { () }:)
            let $log := util:log("info", "valid iteration " || $i)

            return ()
        } catch * {
            let $log := util:log("error", "invalid iteration " || $i)
            let $multiple-calls := 10 (: 50 seems to be ok on default size query-pool so that next sql connection works (does not work using jndi+pool) :)
            
            (: Next call helps to flush query-pool if called enough times :) 
            (: let $op := for $do in 1 to $multiple-calls return doc("http://localhost:8080/exist/apps/eXide/repo.xml") :)
            
            (: Next call fails (non xml response) but helps efficiently to flush the query-pool :)
            let $op := for $do in 1 to $multiple-calls
                return try { doc("http://localhost:8080/exist/apps/eXide/modules/deployment.xql?info=__test__") } catch * { () }
                
            (: additional delay does not seem to help for connection closing :) 
            (: let $op := util:wait(1000) :)
            
            return 
                $i
        }
return
    (
        count($fails) || " failed queries : " || string-join($fails, ", ")
        ,doc("file:///exist/etc/conf.xml")//query-pool
        ,doc("file:///exist/etc/jetty/jetty.xml")//New[Arg='jdbc/oidb']
    )

Additionally jndi is fine without connection pool, but using connection pool prevents the same recovery mecanism :(

Anyway, this interleaved calls may be an intermediate workarround to wait a bug fix on my side. Could man think about a new sql:close-handle($handle) function ?

Hope this does not add too much noise...

@PieterLamers
Copy link

This problem seems to be related to the issues we reported on Slack on Sep 25, 2020:https://exist-db.slack.com/archives/CG2MRUZ35/p1601034141060200
For reproducing this, Erik Siegel wrote the attached files.
sql-connections-problem.zip

@adamretter
Copy link
Member

@gmella There seem to be a lot of things mixed up in your query with notes of different things going on. Is the issue in your mind still related to JNDI and connections not being released once the query finishes? or is it now something else?

@gmella
Copy link
Author

gmella commented May 5, 2021

@adamretter I guess that connections are not properly released when connection is opened in another context (/ another xql file). The test code provided by Erik Siegel is on the same schema. I agree, my issue lacks focus. I propose to forget the JNDI and make basic calls with postgres or mysql reliable and repeatable back again. Does it sound better ?

@adamretter
Copy link
Member

@gmella So I just want to be make sure we all understand how the SQL Module should work:

  1. If you use sql:get-connection from an XQuery then a connection to the SQL database is opened (there is no pooling used). That connection is assigned a numeric identifier which is what is returned to you. That connection is then kept for the entire-lifetime of the calling XQuery, and is available via its numeric identifier. When the XQuery caller finishes executing, the SQL connection(s) should be automatically closed. eXist-db imposes no limits on the number of connections you can open via sql:get-connection however your SQL database may have limits configured.

  2. If you use sql:get-jndi-connection, the story is similar to (1). The difference is that eXist-db requests a connection from the JNDI provider rather than opening it itself (the connection it is provided may already be open if JNDI is offering pooling). When the XQuery finishes, eXist-db tells the JNDI provider to close the connection. The JNDI provider may transparently implement connection pooling, but this is invisible to eXist-db. If the JDNI provider does provide connection pooling, then it may opt to keep the connection hot (open) is outside of eXist-db's responsibility.

  3. With the new function that was just added, sql:get-connection-from-pool, eXist-db requests a connection from the connection pool that was configured in it's conf.xml file. The connection may already be open if the pool has a hot but free connection. Like (1) and (2) the connection is kept and owned by the XQuery for the entire duration of its lifetime, when the query finishes executing, eXist-db returns the query to the pool. Whether the connection pool actually closes the connection or keeps it hot (open) is again outside of eXist-db's responsibility.

Hopefully that behaviour was as everyone had in their mind.

@gmella Do you consider one of the above to not work how I described, and therefore contains bugs?

@PieterLamers
Copy link

PieterLamers commented May 5, 2021

@adamretter re (1): as Erik's code shows, as well as the conversation in Slack, the connection is not closed when the xquery finishes executing. Therefore we are force-closing all connections now after 30 seconds (via the mysql wait_timeout parameter), to avoid exceeding the maximum number of connections that mysql has for us (which is currently at 1.000).

@adamretter
Copy link
Member

adamretter commented May 5, 2021

@PieterLamers Okay, it's good to reconfirm... but so far I have been unable to reproduce that, behaviour even with the code that was provided :-(

@bourgesl
Copy link

bourgesl commented May 5, 2021

Maybe related to this change in 5.0.0:
#2053

SQLModule is an InternalModule and stores connections in the XQueryContext map but only closed when the context.reset() is called. However if the context is reused (xquery.getContext().prepareForReuse()), there is no call to reset().

Maybe it is related to query caching or context reuse ?

Is there any case where xquery context is cached and then leaks connection even if the xquery is finished ?

@adamretter
Copy link
Member

@bourgesl XQuery Context is reused for the next execution of the same query, but reset() is called on it before that happens. The reset method should close down the connections

@bourgesl
Copy link

bourgesl commented May 6, 2021

XQueryContexts can be inherited (global) or how to you manage scopes & life cycle ?

Are you sure to call reset() on all the context hierachy ?

If not, it may be our leak source, depending in which context map (scope) the sql connection is stored.

@gmella
Copy link
Author

gmella commented Sep 8, 2021

After some other tries and discussions with @bourgesl I guess that getting a new close-connection function is the best option to give a way to provide reliable services at least for the ones which could have an heavy use of the sql extension.

I didn't manage to explain why, however even if I use sql extension inside a single xquery module I can get a longer uptime but I still have a starved connection pool. (and this prevent existdb to restart quickly)...

@adamretter , how sounds this option to add such explicitly callable function to close the connection ?

@PieterLamers
Copy link

@gmella Do you know that there is an sql:close-connection in 5.3.0? Not showing up in fundocs on the site because that is not up to date, but it is showing on my localhost, see image. We no longer have any connection problems now that we are using the connection pooling and this close function (sometimes redundantly).
image

@gmella
Copy link
Author

gmella commented Sep 8, 2021

Thanks a lots @PieterLamers for this information. I was not aware of and this is a very good opportunity to update my 5.2 versions and close this issue.
Thank you to all people and associated work behind all of these wonderful tools.

@gmella gmella closed this as completed Sep 8, 2021
@adamretter
Copy link
Member

@gmella We did also fix a Connection leak when xquery library modules are involved. So it may be that 5.3.0 fixes it without you needing to make any changes at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss ask for feedback
Projects
None yet
Development

No branches or pull requests

5 participants