Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Bulk Copy API for batch insert operation #686

Merged
merged 43 commits into from
Jun 27, 2018

Conversation

peterbae
Copy link
Contributor

Improves the batch insert operation performace against Azure DW.

Fixes issue #331.

@codecov-io
Copy link

codecov-io commented Apr 26, 2018

Codecov Report

Merging #686 into dev will increase coverage by 0.35%.
The diff coverage is 59.33%.

Impacted file tree graph

@@             Coverage Diff              @@
##                dev     #686      +/-   ##
============================================
+ Coverage     48.03%   48.39%   +0.35%     
- Complexity     2631     2741     +110     
============================================
  Files           118      120       +2     
  Lines         26753    27210     +457     
  Branches       4493     4589      +96     
============================================
+ Hits          12852    13167     +315     
- Misses        11773    11873     +100     
- Partials       2128     2170      +42
Flag Coverage Δ Complexity Δ
#JDBC42 47.89% <59.33%> (+0.43%) 2693 <85> (+111) ⬆️
#JDBC43 48.28% <59.33%> (+0.31%) 2736 <85> (+109) ⬆️
Impacted Files Coverage Δ Complexity Δ
...om/microsoft/sqlserver/jdbc/SQLServerResource.java 100% <ø> (ø) 4 <0> (ø) ⬇️
.../microsoft/sqlserver/jdbc/SQLServerDataSource.java 44.73% <0%> (-0.47%) 66 <0> (ø)
.../com/microsoft/sqlserver/jdbc/SQLServerDriver.java 77.11% <100%> (+0.12%) 25 <0> (ø) ⬇️
...n/java/com/microsoft/sqlserver/jdbc/Parameter.java 61.92% <100%> (-0.76%) 64 <1> (+1)
...om/microsoft/sqlserver/jdbc/SQLServerBulkCopy.java 54.61% <100%> (+2.06%) 261 <2> (+22) ⬆️
...oft/sqlserver/jdbc/SQLServerBulkCSVFileRecord.java 46.34% <25%> (+0.41%) 28 <4> (-2) ⬇️
...sqlserver/jdbc/SQLServerBulkBatchInsertRecord.java 42.28% <42.28%> (ø) 27 <27> (?)
.../microsoft/sqlserver/jdbc/SQLServerBulkCommon.java 47.22% <47.22%> (ø) 5 <5> (?)
...m/microsoft/sqlserver/jdbc/SQLServerStatement.java 59.42% <70%> (-0.08%) 136 <3> (+1)
...oft/sqlserver/jdbc/SQLServerPreparedStatement.java 54.42% <72.26%> (+3.81%) 210 <39> (+50) ⬆️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09d7967...096d78e. Read the comment docs.

@cheenamalhotra cheenamalhotra added this to Under Peer Review in MSSQL JDBC May 1, 2018
case java.sql.Types.TIMESTAMP:
case microsoft.sql.Types.DATETIMEOFFSET:
// The precision is just a number long enough to hold all types of temporal data, doesn't need to be exact precision.
columnMetadata.put(positionInTable, new ColumnMetadata(colName, jdbcType, 50, scale, dateTimeFormatter));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the "50", which I'm assuming is the precision, a const so changing it is easier when necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 217 explains why this is unnecessary.

Copy link
Member

@rene-ye rene-ye May 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporal data being < 50 precision might not be true forever. If it's an arbitrarily assigned number than it can be just as arbitrarily changed. And also because it's arbitrary, we might not know what number to look for if we do have a need to update it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Precision here should be the one passed by user. I'm assuming 50 is used as precision based on the BulkCopy CSV implementation, it makes sense in case of CSV, because temporal types can be stored in CSV using any of the supported string literal format (https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-2017#supported-string-literal-formats-for-datetime) and we pass it as varchar/nvarchar to Server.


boolean isAzureDW() throws SQLServerException, SQLException {
if (null == isAzureDW) {
try (Statement stmt = this.createStatement(); ResultSet rs = stmt.executeQuery("SELECT CAST(SERVERPROPERTY('EngineEdition') as INT)");)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to catch/handle any exceptions from this try statement, is there a reason why we're using it here.

Copy link
Contributor Author

@peterbae peterbae May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally, when we create a new Statement or ResultSet object, it's good practice to close it after we're done. However, the try-with-resources syntax (which i've used here) allows those objects to get closed automatically. You can read more on this page: https://blogs.oracle.com/weblogicserver/using-try-with-resources-with-jdbc-objects

// Base data type: int
final int ENGINE_EDITION_FOR_SQL_AZURE_DW = 6;
rs.next();
int engineEdition = rs.getInt(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change block to
isAzureDW = rs.getInt(1) == ENGINE_EDITION_FOR_SQL_AZURE_DW
or even more concise but possibly a little confusing
return (rs.getInt(1) == ENGINE_EDITION_FOR_SQL_AZURE_DW)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think right now is fine, i merely adopted the same part of code from our fx framework anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least remove
if (boolean condition) {
bool = true
} else {
bool = false
}


String destinationTableName = tableName;
// Get destination metadata
try (SQLServerResultSet rs = ((SQLServerStatement) connection.createStatement())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try with resources, but we don't handle any exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above


// ignore all comments
if (localUserSQL.substring(0, 2).equalsIgnoreCase("/*")) {
int temp = localUserSQL.indexOf("*/") + 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is done quite a lot. Can we maybe make a private function that takes in a string and returns what we need. So the final result for all these if blocks would be something like
if(localUserSQL.substring(0, 2).equalsIgnoreCase("/*")) { return newPrivateFunction("*/"); }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

return isInsert(temp.substring(index));
}
char c = temp.charAt(0);
if (c != 'i' && c != 'I')
Copy link
Member

@rene-ye rene-ye May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were burned before using this kind of logic in an attempt to parse SQL (in pstmt metadata caching), is this not breakable? What if I put a use statement before my insert, will it not flag my SQL as an insert statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user decides to put a use statement, it won't be flagged as an insert statement, and the old implementation for handling batch statements will get executed instead. This isInsert statement is currently only being used for queries that come through as an execute batch statement - we don't expect the user to run thousands of the same batch query with a use statement in front of it.

Copy link
Contributor

@ulvii ulvii Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the following lines here to eliminate non-insert queries faster? Ideally, the SQL passed to this method is always an insert, so these lines can be removed?

        char c = temp.charAt(0);
        if (c != 'i' && c != 'I')
            return false;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert isn't the only way to use batch - there's many ways we would be calling isInsert on a non-insert query. I don't really think this fast elimination is worth the space, so i'm going to remove it anyways though.

Copy link
Member

@v-mabarw v-mabarw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only gave this a quick skim through and have a couple of extra thoughts:

  • have we tested the performance impact on the normal execution path?
  • is this scenario also applicable to SQL Server Parallel Data Warehouse (PDW)?


boolean isAzureDW() throws SQLServerException, SQLException {
if (null == isAzureDW) {
try (Statement stmt = this.createStatement(); ResultSet rs = stmt.executeQuery("SELECT CAST(SERVERPROPERTY('EngineEdition') as INT)");)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to avoid an extra round trip to the server to get this info? For example, can we get any useful info from the TDS pre-login or login packets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can avoid making a call to get the table metadata, but as it turns out, we were making an exact same FMTONLY call in SQLServerBulkCopy::getDestinationMetadata method (which is called as part of the bulk copy process shortly down the line). So instead of making the FMTONLY call to the database twice in the same fashion, the driver will now store the resultset from the first FMTONLY so it can be re-used in getDestinationMetadata. This means we won't be making any additional trips to the database compared to the way it was working before.

}

// It shouldn't come here. If we did, something is wrong.
throw new IllegalArgumentException("localUserSQL");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you pick throwing an IllegalArgumentException (for this & the other parse routines)? It's probably better to have a descriptive error message instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to throw an IllegalArgumentException because I thought it best described the symptom. It would be easy to throw an exception detailing at which index the parsing failed if the parsing actually failed at some point, but for these instances where the parsing fails due to getting to the end of the parsing method (which it shouldn't - this means the user's SQL query was incorrect somehow), all we know is that the user's argument was incorrect. I do agree that I could give a better description, though - I'll add that in.

@@ -167,6 +230,61 @@ public void testExecuteBatch1() {
fail(TestResource.getResource("R_executeBatchFailed") + ": " + e.getMessage());
}
}

public void testExecuteBatch1UseBulkCopyAPI() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to all the tests, don't you need to set useBulkCopyForBatchInsert property to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*/
@Test
@Tag("slow")
public void testStatementPoolingUseBulkCopyAPI() throws SQLException, NoSuchFieldException, SecurityException,
Copy link
Contributor

@ulvii ulvii Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the new tests are just duplicates of the existing tests with minor changes. Can't we refactor this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -167,6 +230,61 @@ public void testExecuteBatch1() {
fail(TestResource.getResource("R_executeBatchFailed") + ": " + e.getMessage());
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is almost the exact copy of testExecuteBatch1(). Please refactor.

int retValue[] = {0, 0, 0};
int updCountLength = 0;
try {
String sPrepStmt = "update ctstable2 set PRICE=PRICE*20 where TYPE_ID=?";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sPrepStmt is never closed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's closed.

retValue[i++] = rs.getInt(1);
}

pstmt1.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better move this into a try-with-resources block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's handled at the end of the test anyhoo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The connection itself is available in the class, they're all static objects (don't know why), but why create a new one anywhere else in the class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the modifyConnectionForBulkCopyAPI method modifies the connection object. If we don't create a new connection object (with the try block), the tests that come after this one will be affected by the change I made to the connection (since the only other connection object available is static). Therefore, to prevent that happening, I need to make a new connection object just to test my PR.

Also, they connection object is static because originally the test could just re-use the same connection object that doesn't need to change, but now we need to change our connection object accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's handled at the end of the test anyhoo.

  • Not if executeQuery fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it'll be closed in the terminateVariation() (this is what I meant by end of the test)

}
}
catch (BatchUpdateException b) {
fail("BatchUpdateException : Call to executeBatch is Failed!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call to executeBatch Failed!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same below

fail("BatchUpdateException : Call to executeBatch is Failed!");
}
catch (SQLException sqle) {
fail("Call to executeBatch is Failed!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These catch blocks make no sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that they could handle exception better here, but this isn't my code. I'm just gonna fix it anyways though.

@@ -111,6 +115,65 @@ public void testAddBatch1() {
fail(TestResource.getResource("R_addBatchFailed") + ": " + e.getMessage());
}
}

Copy link
Contributor

@ulvii ulvii Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments for testExecuteBatch1UseBulkCopyAPI() apply to this test too.

*
* @throws Exception
*/
@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of Repro47239large. Please refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@DisplayName("Regression test for using 'large' methods")
public void Repro47239largeUseBulkCopyAPI() throws Exception {

assumeTrue("JDBC42".equals(Utils.getConfiguredProperty("JDBC_Version")), TestResource.getResource("R_incompatJDBC"));
Copy link
Contributor

@ulvii ulvii Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed. assumeTrue("JDBC42".equals(Utils.getConfiguredProperty("JDBC_Version")), TestResource.getResource("R_incompatJDBC"));

error = "RAISERROR ('raiserror level 11',11,1) WITH LOG";
severe = "RAISERROR ('raiserror level 20',20,1) WITH LOG";
}
con.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use try-with-resources

}
catch (Exception ignored) {
}
stmt.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try-with-resources.

}

try {
stmt.executeLargeUpdate("drop table " + tableName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Utils.dropTableIfExists()

@@ -267,7 +269,227 @@ public void Repro47239() throws SQLException {
stmt.close();
conn.close();
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to Repro47239largeUseBulkCopyAPI(). Please apply the comments to this test too.

@@ -313,7 +317,116 @@ public void batchWithLargeStringTest() throws SQLException {
}

}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate of batchWithLargeStringTest(). Please refactor.


@Test
public void batchWithLargeStringTestUseBulkCopyAPI() throws SQLException {
Connection con = DriverManager.getConnection(connectionString + ";useBulkCopyForBatchInsert=true;");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connection is never closed.

// create a table with two columns
boolean createPrimaryKey = false;
try {
stmt.execute("if object_id('" + testTable + "', 'U') is not null\ndrop table " + testTable + ";");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Utils.dropTableIfExists()

}

try {
stmt.executeUpdate("drop table " + tableName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the method from Utils class.

@BeforeEach
public void testSetup() throws TestAbortedException, Exception {
connection = DriverManager.getConnection(connectionString + ";useBulkCopyForBatchInsert=true;");
stmt = (SQLServerStatement) connection.createStatement();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better move this into a try-with-resources block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the stmt gets cleaned up during the cleanup stage - i think that's okay?

// SERVERPROPERTY('EngineEdition') can be used to determine whether the db server is SQL Azure.
// It should return 6 for SQL Azure DW. This is more reliable than @@version or serverproperty('edition').
// Reference: http://msdn.microsoft.com/en-us/library/ee336261.aspx
//
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This topic is no longer available
We're sorry—the topic that you requested is no longer available. Use the search box to find related information.

@peterbae peterbae merged commit 6ddd9d0 into microsoft:dev Jun 27, 2018
MSSQL JDBC automation moved this from Under Peer Review to Closed/Merged PRs Jun 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
MSSQL JDBC
  
Closed/Merged PRs
Development

Successfully merging this pull request may close these issues.

None yet

9 participants