ENH: improved multiValueBind #1328

Closed
wants to merge 27 commits into from


rPraml
Contributor

@rPraml rPraml commented Mar 2, 2018

This fixes a bug: even if multi-value bind is supported by the platform, we cannot assume in general that it can be used on ID columns.

If the ID column is of type string or integer it works, but if the ID column is of a type that multi-value bind does not support, you'll get an error.

It also re-adds my existing implementation for SQL Server and Oracle.
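
The guard described above could look roughly like the following. This is a minimal sketch and not the code from this PR: `MultiValueSupport`, `arrayTypeFor` and `canUseMultiValueBind` are hypothetical names, and the list of supported types is an assumption chosen only for illustration.

```java
import java.sql.Types;

/** Hypothetical sketch: decide per JDBC type whether multi-value bind is usable. */
final class MultiValueSupport {

  /**
   * Returns an array element type name for the given JDBC type,
   * or null when the type is not supported (assumed mapping).
   */
  static String arrayTypeFor(int jdbcType) {
    switch (jdbcType) {
      case Types.INTEGER:
        return "integer";
      case Types.BIGINT:
        return "bigint";
      case Types.VARCHAR:
        return "varchar";
      default:
        return null; // unsupported: caller falls back to "in (?, ?, ...)"
    }
  }

  /** Platform support alone is not enough; the ID column type must be supported too. */
  static boolean canUseMultiValueBind(boolean platformSupported, int idJdbcType) {
    return platformSupported && arrayTypeFor(idJdbcType) != null;
  }
}
```

With this shape, an ID column of an unsupported type simply takes the normal `in (?, ?, ...)` path instead of failing at bind time.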

@rbygrave
Member

rbygrave commented Mar 2, 2018

Have you checked the query plans that Oracle and SQL Server produce? Can you post them up?

@rPraml
Contributor Author

rPraml commented Mar 2, 2018

Example queries for Oracle:

16:28:51.864 [main] DEBUG io.ebean.SQL - select t0.id c0, t0.status c1, t0.order_date c2, t0.ship_date c3, t1.name c4, t0.cretime c5, t0.updtime c6, t0.kcustomer_id c7 from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id  where t0.cretime in (SELECT * FROM TABLE (SELECT ? FROM DUAL))  and t0.id <= ? ; --bind(Array[1100]={2018-03-02 16:28:51.555,1970-01-01 01:00:01.234,...},4)
16:28:51.865 [main] DEBUG io.ebean.SUM - FindMany type[Order] origin[BmouL7.A.A] exeMicros[22078] rows[1] predicates[t0.cretime in (SELECT * FROM TABLE (SELECT ? FROM DUAL))  and t0.id <= ? ] bind[Array[1100]={2018-03-02 16:28:51.555,1970-01-01 01:00:01.234,...},4]
16:28:51.881 [main] DEBUG io.ebean.SQL - select t0.id c0, t0.status c1, t0.order_date c2, t0.ship_date c3, t1.name c4, t0.cretime c5, t0.updtime c6, t0.kcustomer_id c7 from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id  where t0.id in (SELECT * FROM TABLE (SELECT ? FROM DUAL))  and t0.id <= ? ; --bind(Array[1100]={1,2,3,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,...},4)
16:28:51.882 [main] DEBUG io.ebean.SUM - FindMany type[Order] origin[BmouL7.A.A] exeMicros[12871] rows[3] predicates[t0.id in (SELECT * FROM TABLE (SELECT ? FROM DUAL))  and t0.id <= ? ] bind[Array[1100]={1,2,3,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,...},4]
16:28:51.887 [main] DEBUG io.ebean.SQL - select t0.id c0, t0.status c1, t0.order_date c2, t0.ship_date c3, t1.name c4, t0.cretime c5, t0.updtime c6, t0.kcustomer_id c7 from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id  where 1=1 and t0.id > ? ; --bind(0)
16:28:51.888 [main] DEBUG io.ebean.SUM - FindMany type[Order] origin[BmouL7.A.A] exeMicros[5370] rows[5] predicates[1=1 and t0.id > ? ] bind[0]
16:28:51.907 [main] DEBUG io.ebean.SQL - select t0.id c0, t0.status c1, t0.name c2, t0.smallnote c3, t0.anniversary c4, t0.cretime c5, t0.updtime c6, t0.version c7, t0.billing_address_id c8, t0.shipping_address_id c9 from o_customer t0 where t0.name in (SELECT * FROM TABLE (SELECT ? FROM DUAL))  and t0.id <= ? ; --bind(Array[1100]={Rob,Fiona,FooBar2,FooBar3,FooBar4,FooBar5,...},4)
16:28:51.908 [main] DEBUG io.ebean.SUM - FindMany type[Customer] origin[RwAkj.A.A] exeMicros[17590] rows[2] predicates[t0.name in (SELECT * FROM TABLE (SELECT ? FROM DUAL))  and t0.id <= ? ] bind[Array[1100]={Rob,Fiona,FooBar2,FooBar3,FooBar4,FooBar5,...},4]

and for SQL Server:

16:32:33.554 [main] DEBUG io.ebean.SQL - select t0.id c0, t0.status c1, t0.order_date c2, t0.ship_date c3, t1.name c4, t0.cretime c5, t0.updtime c6, t0.kcustomer_id c7 from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id  where t0.cretime in (SELECT * FROM ?)  and t0.id <= ? ; --bind(Array[2200]={2018-03-02 16:32:33.199,1970-01-01 01:00:01.234,...},4)
16:32:33.554 [main] DEBUG io.ebean.SUM - FindMany type[Order] origin[BmouL7.A.A] exeMicros[29471] rows[1] predicates[t0.cretime in (SELECT * FROM ?)  and t0.id <= ? ] bind[Array[2200]={2018-03-02 16:32:33.199,1970-01-01 01:00:01.234,...},4]
16:32:33.564 [main] DEBUG io.ebean.SQL - select t0.id c0, t0.status c1, t0.order_date c2, t0.ship_date c3, t1.name c4, t0.cretime c5, t0.updtime c6, t0.kcustomer_id c7 from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id  where t0.id in (SELECT * FROM ?)  and t0.id <= ? ; --bind(Array[2200]={1,2,3,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,...},4)
16:32:33.565 [main] DEBUG io.ebean.SUM - FindMany type[Order] origin[BmouL7.A.A] exeMicros[6361] rows[3] predicates[t0.id in (SELECT * FROM ?)  and t0.id <= ? ] bind[Array[2200]={1,2,3,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,...},4]
16:32:33.570 [main] DEBUG io.ebean.SQL - select t0.id c0, t0.status c1, t0.order_date c2, t0.ship_date c3, t1.name c4, t0.cretime c5, t0.updtime c6, t0.kcustomer_id c7 from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id  where 1=1 and t0.id > ? ; --bind(0)
16:32:33.571 [main] DEBUG io.ebean.SUM - FindMany type[Order] origin[BmouL7.A.A] exeMicros[5719] rows[5] predicates[1=1 and t0.id > ? ] bind[0]
16:32:33.596 [main] DEBUG io.ebean.SQL - select t0.id c0, t0.status c1, t0.name c2, t0.smallnote c3, t0.anniversary c4, t0.cretime c5, t0.updtime c6, t0.version c7, t0.billing_address_id c8, t0.shipping_address_id c9 from o_customer t0 where t0.name in (SELECT * FROM ?)  and t0.id <= ? ; --bind(Array[2200]={Rob,Fiona,FooBar2,FooBar3,FooBar4,FooBar5,...},4)
16:32:33.597 [main] DEBUG io.ebean.SUM - FindMany type[Customer] origin[RwAkj.A.A] exeMicros[21726] rows[2] predicates[t0.name in (SELECT * FROM ?)  and t0.id <= ? ] bind[Array[2200]={Rob,Fiona,FooBar2,FooBar3,FooBar4,FooBar5,...},4]

import java.sql.Connection;
import java.sql.SQLException;

import oracle.jdbc.OracleConnection;
Contributor Author


NOTE: this is a text file, as oracle.jdbc.OracleConnection is not publicly available, so I included the class in binary form:
https://github.com/ebean-orm/ebean/pull/1328/files#diff-1d0fda0b6639c184544f2e59c5ae5036

Do you think this hack is acceptable?

@rbygrave
Member

rbygrave commented Mar 9, 2018

Sorry, I actually meant the explain plan for the queries ... that the explain plan shows them hitting the index. We need to check and confirm ...

@rPraml
Contributor Author

rPraml commented Mar 9, 2018

Did I get it right, that I should execute "explain select * from xxx where ..." manually and check which indices are hit? Or is there a feature in ebean that dumps the "explain plan" for each query?

@rbygrave
Member

Did I get it right, that I should execute "explain select * from xxx where ..." manually and check which indices are hit?

Yes. Specifically we want to compare the 2 explain plans ... to confirm that they are effectively the same. That we still hit the indexes etc and yes we need to do this manually at the moment.

(Yes, there is a desire and plan to automate the collection of explain plans as part of a performance monitoring tool).
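
For the manual check, the statement to run differs per database. The sketch below only builds the SQL strings; `ExplainSql` and its members are hypothetical names and not part of ebean. It assumes the standard Postgres `explain` prefix and Oracle's `explain plan for` followed by a `dbms_xplan.display` query.

```java
/** Hypothetical helper (not part of ebean): wraps a query for a manual plan check. */
final class ExplainSql {

  enum Db { POSTGRES, ORACLE }

  /** Oracle stores the plan; read it back afterwards with this query. */
  static final String ORACLE_SHOW_PLAN = "select * from table(dbms_xplan.display)";

  /** Returns the statement that asks the database for the plan of the given SQL. */
  static String explain(Db db, String sql) {
    switch (db) {
      case POSTGRES:
        return "explain " + sql;
      case ORACLE:
        return "explain plan for " + sql;
      default:
        throw new IllegalArgumentException("unsupported: " + db);
    }
  }
}
```

For example, `ExplainSql.explain(ExplainSql.Db.ORACLE, sql)` would be executed first and `ORACLE_SHOW_PLAN` queried afterwards to read the plan; both plans (plain `in` list and multi-value bind) can then be compared side by side.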

# Conflicts:
#	pom.xml
#	src/main/java/io/ebeaninternal/server/deploy/BeanDescriptorManager.java
Contributor Author

@rPraml rPraml left a comment


@rbygrave I analyzed the queries for postgres and came to the conclusion that the internal query optimizer will produce the same query plans (at least for postgres).

select t0.id, t0.status, t0.order_date, t0.ship_date, t1.name, t0.cretime, t0.updtime, 
  t0.kcustomer_id from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id  
where order_date in (?, ?, <REMOVED~1000> ?, ? )  and t0.id <= ? 

and

select t0.id, t0.status, t0.order_date, t0.ship_date, t1.name, t0.cretime, t0.updtime, 
  t0.kcustomer_id from o_order t0 join o_customer t1 on t1.id = t0.kcustomer_id  
where order_date = any(?) and t0.id <= ? 

result in the SAME query plan:

Hash Join  (cost=22.16..624.38 rows=422 width=134)
  Hash Cond: (t0.kcustomer_id = t1.id)
  ->  Bitmap Heap Scan on o_order t0  (cost=7.43..604.34 rows=422 width=36)
        Recheck Cond: (id <= 4)
        Filter: (order_date = ANY ('{2018-03-27,2018-03-28,2018-03-29 ... 31,2018-04-01,2018-04-02}'::date[]))
        ->  Bitmap Index Scan on pk_o_order  (cost=0.00..7.32 rows=423 width=0)
              Index Cond: (id <= 4)
  ->  Hash  (cost=12.10..12.10 rows=210 width=102)
        ->  Seq Scan on o_customer t1  (cost=0.00..12.10 rows=210 width=102)

I also spot-checked the plans for other DBMSs and concluded that they are OK.
I added a quick & dirty mechanism to log the plans in ebean. See here:
FOCONIS@172bc44
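
The two Postgres predicate forms compared above can be sketched as follows. `InClauseSql` and its methods are hypothetical names used only for illustration; the binding helper relies on the standard JDBC calls `Connection.createArrayOf` and `PreparedStatement.setArray`, and the element type name passed to it (for example `"date"`) is an assumption that depends on the column.

```java
import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;

/** Hypothetical sketch of the two predicate styles compared in this thread. */
final class InClauseSql {

  /** Builds "col in (?, ?, ?)" with one placeholder per value. */
  static String inList(String column, int count) {
    if (count < 1) {
      throw new IllegalArgumentException("count must be >= 1");
    }
    return column + " in (" + String.join(", ", Collections.nCopies(count, "?")) + ")";
  }

  /** Builds "col = any(?)" which takes a single array bind value. */
  static String anyArray(String column) {
    return column + " = any(?)";
  }

  /** Binds all values as one SQL array at the given parameter index. */
  static void bindArray(Connection con, PreparedStatement ps, int index,
                        String elementType, Object[] values) throws SQLException {
    Array array = con.createArrayOf(elementType, values);
    ps.setArray(index, array);
  }
}
```

The practical difference is that the `any(?)` form keeps the SQL text identical regardless of how many values are bound, while the `in` list produces a different statement for every value count.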

case SQLSERVER16:
case SQLSERVER17:
case SQLSERVER:
return new SqlServerMultiValueBind();
Contributor Author


@@ -73,7 +90,8 @@ protected String getArrayType(int dbType) {
case TIMESTAMP:
case TIME_WITH_TIMEZONE:
case TIMESTAMP_WITH_TIMEZONE:
return "timestamp";
return null; // NO: Does not work reliable due time zone issues! - Fall back to normal query
Contributor Author


Timestamps don't work reliably due to time zone issues. It would require performing a time zone conversion before putting the timestamps into the multi-value data structure.
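
One possible shape of such a conversion is sketched below. It is an assumption about what the conversion step might look like, not something this PR implements: `TimestampArrayPrep` and `toZone` are hypothetical names, and the target zone would have to match whatever zone the database column values are stored in.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: normalise timestamps to one zone before building the array. */
final class TimestampArrayPrep {

  /** Converts each timestamp (an instant) to the wall-clock time in the given zone. */
  static List<LocalDateTime> toZone(List<Timestamp> values, ZoneId dbZone) {
    List<LocalDateTime> out = new ArrayList<>(values.size());
    for (Timestamp ts : values) {
      out.add(LocalDateTime.ofInstant(ts.toInstant(), dbZone));
    }
    return out;
  }
}
```

The converted values no longer depend on the JVM default time zone, which is the kind of ambiguity that makes the plain timestamp array unreliable.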

@rbygrave
Member

Right sorry, I already knew that Postgres ANY was good - I checked that before allowing that in and we have been using it to good effect in Postgres for a while now.

What I really want to do is absolutely confirm that the query plans for Oracle and SQL Server (which are the 2 platforms this change wants to add this support for right) are good. So I need to see actual query plans - we need to be sure.

@rPraml
Contributor Author

rPraml commented Mar 28, 2018

So I need to see actual query plans - we need to be sure.

good that you are so persistent ;)

I checked the query plans, and plans like where id in (?,?,?) are more efficient than the multi-value ones like where id in (SELECT * .... ) in Oracle / SQL Server.

I found an interesting article here: https://www.spiderstrategies.com/blog/2014-11-03-sql-server-query-type-performance.html

It has an interesting summary:

You should definitely stop using batched parameterized queries for selecting rows by ID. They were the bottom performer in every test. They should be replaced with temporary tables if you're willing to do a little work to make sure you're not hitting the create temporary table delay. If, for whatever reason, the temp table approach is not chosen, you should use the constructed query approach within a framework that prevents SQL injection attacks.

But here are the query plans for Oracle and SQL Server:

Oracle normal:

select t0.id c0, t0.name c1 from tuuid_entity t0 where t0.id in (?, ?, ?, ?, ?, ?, ?, ?, ?, ? ) 
 
------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                 |     1 |   139 |     1   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |                 |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TUUID_ENTITY    |     1 |   139 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | PK_TUUID_ENTITY |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - access("T0"."ID"=:1 OR "T0"."ID"=:2 OR "T0"."ID"=:3 OR "T0"."ID"=:4 OR 
              "T0"."ID"=:5 OR "T0"."ID"=:6 OR "T0"."ID"=:7 OR "T0"."ID"=:8 OR "T0"."ID"=:9 OR 
              "T0"."ID"=:10)
 
Note
-----
   - dynamic sampling used for this statement (level=2)

Oracle with multi value bind

select t0.id c0, t0.name c1 from tuuid_entity t0 where t0.id in (SELECT * FROM TABLE (SELECT ? FROM DUAL)) 
 
----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |     1 | 16524 |    32   (4)| 00:00:01 |
|*  1 |  HASH JOIN SEMI                     |              |     1 | 16524 |    32   (4)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                 | TUUID_ENTITY |    27 |  3753 |     2   (0)| 00:00:01 |
|   3 |   VIEW                              | VW_NSO_1     |  8168 |   127M|    29   (0)| 00:00:01 |
|   4 |    COLLECTION ITERATOR PICKLER FETCH|              |  8168 | 16336 |    29   (0)| 00:00:01 |
|   5 |     FAST DUAL                       |              |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - access("T0"."ID"="COLUMN_VALUE")
 
Note
-----
   - dynamic sampling used for this statement (level=2)

SQL Server normal

[screenshot: execution plan]

SQL Server Multi value bind

[screenshot: execution plan]

@rPraml
Contributor Author

rPraml commented Mar 29, 2018

What do you think about using multi-value bind for SQL Server and Oracle only for higher parameter counts (currently hard-coded to > 100)?

# Conflicts:
#	src/main/java/io/ebeaninternal/server/core/OrmQueryRequest.java
#	src/main/java/io/ebeaninternal/server/persist/Binder.java
#	src/test/resources/dbmigration/migrationtest/sqlserver17/1.2__dropsFor_1.1.sql
#	src/test/resources/dbmigration/migrationtest/sqlserver17/1.4__dropsFor_1.3.sql
@rPraml
Contributor Author

rPraml commented Sep 24, 2018

I updated this PR - I have some new information: Using TVPs is not always optimal
https://stackoverflow.com/questions/23120360/table-valued-parameters-with-estimated-number-of-rows-1
So I think it is a good strategy to switch to TVPs only if the number of parameters exceeds a limit (currently 100 for SQL Server and Oracle).
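
The proposed switch could be expressed roughly like this. It is a minimal sketch, not the code in this PR: `BindStrategy`, its nested `Platform` enum and `useMultiValueBind` are hypothetical names, and only the limit of 100 comes from the discussion above.

```java
/** Hypothetical sketch of the threshold strategy discussed in this thread. */
final class BindStrategy {

  enum Platform { POSTGRES, SQLSERVER, ORACLE, OTHER }

  /** Limit mentioned in the discussion for SQL Server and Oracle. */
  static final int MIN_VALUES = 100;

  /** Returns true when a single array/TVP bind should replace the "in (?, ?, ...)" list. */
  static boolean useMultiValueBind(Platform platform, int valueCount) {
    switch (platform) {
      case POSTGRES:
        return true; // "= any(?)" yields the same plan as the in list
      case SQLSERVER:
      case ORACLE:
        return valueCount > MIN_VALUES; // worse plans, so only for large lists
      default:
        return false;
    }
  }
}
```

Keeping the decision in one place would make it easy to drop the SQL Server and Oracle branches again if the plans turn out to be unacceptable.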

# Conflicts:
#	src/main/resources/io/ebeaninternal/dbmigration/builtin-extra-ddl.xml
@rPraml
Contributor Author

rPraml commented Oct 25, 2018

@rbygrave updated and resolved merge conflicts, maybe you have time to review

btw: where are the travis builds?

# Conflicts:
#	src/main/java/io/ebeaninternal/server/deploy/id/IdBinder.java
#	src/main/java/io/ebeaninternal/server/expression/InExpression.java
# Conflicts:
#	pom.xml
#	src/main/java/io/ebeaninternal/server/expression/InExpression.java
#	src/main/java/io/ebeaninternal/server/persist/platform/PostgresMultiValueBind.java
#	src/main/java/io/ebeaninternal/server/query/CQueryBindCapture.java
#	src/test/java/org/tests/model/basic/xtra/TestInsertBatchThenUpdate.java
@rbygrave
Member

it is a good strategy to switch to TVPs only, if the number of parameters exceed a limit (currently 100 for SqlServer and Oracle)

In OLTP applications, are we going to be binding more than 100 IDs frequently? I don't think so, and what that suggests (given that TVPs have worse query plans for SQL Server and Oracle) is that we maybe should not use them at all for SQL Server and Oracle.

Do you still want to push for this?

@rbygrave
Member

I think we are going to let this change go. Multi-value binding with Oracle and SQL Server has worse query execution plans, and I don't think it is worth having this for > 100 bind values.

So unless you are going to push hard for this we should close this PR and only use MVB with Postgres ANY (until such time as other DBs make the query plans as good as IN).

@rPraml
Contributor Author

rPraml commented Aug 30, 2019

Hello Rob, sorry for the late response, I was on vacation the last few weeks. I am back in the office on Monday and will try to discuss the further plans with my team.
Background: We have a reporting filter in our application, where the user can select/deselect certain values (like AutoFilter in Excel).
The values that the end user can select are often out of our control, and the resulting query may hit the parameter limit of 2100 in SQL Server.

@rbygrave
Member

Closing.

@rbygrave rbygrave closed this Sep 12, 2019
@rPraml rPraml deleted the multivaluebind branch August 3, 2022 12:38