Change Sha1HashKey to CityHash128Key for generating PreparedStatement handle and metadata cache keys #717

cheenamalhotra · 2018-06-06T17:05:50Z

Fixes issue #716

codecov-io · 2018-06-06T17:26:38Z

Codecov Report

Merging #717 into dev will increase coverage by 0.09%.
The diff coverage is 69.77%.

@@             Coverage Diff              @@
##                dev     #717      +/-   ##
============================================
+ Coverage     48.52%   48.61%   +0.09%     
- Complexity     2747     2775      +28     
============================================
  Files           115      116       +1     
  Lines         27156    27342     +186     
  Branches       4547     4562      +15     
============================================
+ Hits          13177    13293     +116     
- Misses        11809    11879      +70     
  Partials       2170     2170

Flag	Coverage Δ	Complexity Δ
#JDBC42	`48.13% <66.66%> (+0.13%)`	`2732 <34> (+28)`	⬆️
#JDBC43	`48.59% <69.77%> (+0.3%)`	`2775 <35> (+38)`	⬆️

Impacted Files	Coverage Δ	Complexity Δ
...oft/sqlserver/jdbc/SQLServerParameterMetaData.java	`24.14% <ø> (ø)`	`31 <0> (ø)`	⬇️
...oft/sqlserver/jdbc/SQLServerPreparedStatement.java	`54.64% <100%> (+0.03%)`	`211 <0> (ø)`	⬇️
...m/microsoft/sqlserver/jdbc/SQLServerStatement.java	`60.1% <100%> (ø)`	`136 <0> (ø)`	⬇️
...om/microsoft/sqlserver/jdbc/ParsedSQLMetadata.java	`100% <100%> (ø)`	`0 <0> (ø)`	⬇️
...in/java/com/microsoft/sqlserver/jdbc/CityHash.java	`66.66% <66.66%> (ø)`	`26 <26> (?)`
...c/main/java/com/microsoft/sqlserver/jdbc/Util.java	`62% <75%> (+0.66%)`	`89 <0> (+1)`	⬆️
.../microsoft/sqlserver/jdbc/SQLServerConnection.java	`48.6% <78.12%> (-0.1%)`	`333 <9> (ø)`
...om/microsoft/sqlserver/jdbc/SQLServerBulkCopy.java	`53.29% <0%> (-0.25%)`	`262% <0%> (-1%)`
...in/java/com/microsoft/sqlserver/jdbc/IOBuffer.java	`53.7% <0%> (-0.2%)`	`0% <0%> (ø)`
... and 7 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 49f95c9...b442832.

ulvii · 2018-06-13T23:18:45Z

src/main/java/com/microsoft/sqlserver/jdbc/Util.java

@@ -1096,3 +1096,364 @@ else if (databaseName.length() > 0)
        return fullName.toString();
    }
 }
+


Maybe a separate file for this?

brettwooldridge · 2018-06-14T19:13:02Z

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java

-        Sha1HashKey(String s) {
-            bytes = getSha1Digest().digest(s.getBytes());
+        CityHash128Key(String s) {
+            segments = CityHash.cityHash128(s.getBytes(), 0, s.length());


Performance of this can be improved, especially in the case of large SQL strings. s.getBytes() ends up calling StringCoding.encode(...), which converts the internal String chars to bytes using the platform default encoding.

I recommend using s.getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin), which, while deprecated (but will never be removed), has substantially higher performance.

    CityHash128Key(String s) {
        byte[] bytes = new byte[s.length()];
        s.getBytes(0, s.length(), bytes, 0);
        segments = CityHash.cityHash128(bytes, 0, s.length());
    }

s.getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) truncates the upper 8 bits of Java's internal UTF-16 representation. Since this is SQL before inserting parameter values, it should be mostly 7-bit ASCII -- the only exception I can think of would be database names/table names/column names, at least some of which can be full Unicode. For European or other alphabetic languages, their Unicode blocks tend to be at most 256 characters wide, so the odds of this truncation causing a spurious collision seem very small. I'd be less confident if the language in question was, say, Chinese: you could have two SQL strings that differ only by one ideogram in a table/column name, where the two ideograms unfortunately happen to have the same lower 8 bits. It's not very likely, but it's significantly less astronomically unlikely than a 128-bit hash collision.
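To make the truncation concern concrete, here is a minimal sketch. The class name, the helper `lowBytes`, and the two table names are made up for illustration; the only real API used is the deprecated `String.getBytes(int, int, byte[], int)` overload, which keeps just the low 8 bits of each char.

```java
import java.util.Arrays;

final class LowByteTruncationDemo {

    /** Copies only the low 8 bits of each char, as the deprecated getBytes overload does. */
    @SuppressWarnings("deprecation")
    static byte[] lowBytes(String s) {
        byte[] dst = new byte[s.length()];
        s.getBytes(0, s.length(), dst, 0);
        return dst;
    }

    public static void main(String[] args) {
        // Hypothetical table names: U+4E2D and U+4F2D share the low byte 0x2D.
        String a = "SELECT * FROM \u4E2D WHERE id = ?";
        String b = "SELECT * FROM \u4F2D WHERE id = ?";

        System.out.println(a.equals(b));                             // false
        System.out.println(Arrays.equals(lowBytes(a), lowBytes(b))); // true
    }
}
```

Any hash computed over these byte arrays sees identical input for the two statements, regardless of how strong the hash function is.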

So yes, we need to avoid character reencoding, but I think we also need to avoid throwing away hash entropy while we do so. I was wondering about something like getBytes(StandardCharsets.UTF_16BE) -- if it matches the internal representation used by String, will it skip reencoding? So maybe something like:

    byte[] bytes = s.getBytes(StandardCharsets.UTF_16BE); // avoid character reencoding
    segments = CityHash.cityHash128(bytes, 0, bytes.length); // bytes.length = 2*s.length() + 2, due to Byte Order Mark?
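On the Byte Order Mark question, a quick check is possible with the standard charsets. This is only a sketch with made-up class and method names; it shows the output length of each encoding and says nothing about whether the JDK skips the encoder internally.

```java
import java.nio.charset.StandardCharsets;

final class Utf16LengthDemo {

    /** Encoded length with an explicit byte order: no Byte Order Mark is written. */
    static int utf16beLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    /** Encoded length with the generic UTF-16 charset: a 2-byte Byte Order Mark is prepended. */
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16).length;
    }

    public static void main(String[] args) {
        String sql = "SELECT ?"; // 8 chars
        System.out.println(utf16beLength(sql)); // 16
        System.out.println(utf16Length(sql));   // 18
    }
}
```

So with UTF_16BE the array length is 2 * s.length(); the extra two bytes only appear with the generic UTF_16 charset.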

@RDearnaley Looking at the code of StringCoding.encode() and the various encoder subclasses, I don't find any optimization for UTF_16BE. 😞

@RDearnaley It appears the only option would be a custom Charset and CharsetEncoder. It is possible, as there is a CharsetProvider SPI available. The "raw encoder" would simply "encode" chars as pairs of bytes (upper 8 bits, lower 8 bits). The cost is a 2x sized byte array due to the naive approach, but possibly worth it as the byte array is quickly garbage collected, or could potentially be cached as a WeakReference in a ThreadLocal inside of the CityHash128Key class.

How strange -- yes, taking a look in the Java code for them, it appears that none of UTF_16BE, UTF_16 etc. make use of the obvious optimization that one of them has to be a no-op. I'm wondering if someone might have already created a raw double-byte encoding to speed this up -- it seems a fairly obvious performance optimization for cases like this where you care about the entropy content rather than the specific values. If not, it might make a good addition to the Java version of CityHash.

There is another possible approach whose speed would be worth testing:

    private static byte[] toBytes(final String str) {
        final int len = str.length();
        final char[] chars = str.toCharArray();
        final byte[] buff = new byte[2 * len];
        for (int i = 0, j = 0; i < len; i++) {
            buff[j++] = (byte) chars[i]; // Or str.charAt(i) and skipping str.toCharArray() might be faster?
            buff[j++] = (byte) (chars[i] >>> 8); // Or >> might be faster?
        }
        return buff;
    }

or just inline this since we only use it once. Old-school, I know.

Of course it would work, but toCharArray() is going to create an additional copy, incurring a CPU hit as well as increasing GC pressure. Regarding inlining, probably unnecessary as the JIT will inline it when it gets hot.

I think a custom encoder would offer better performance, as the internal char array is passed without copy.

Would the str.charAt(i) approach also be making a copy, or would that just go on the stack/in a register?

str.charAt(i) is just a simple array access, with the value passed on the stack. If the compiler inlines the call, it could be a viable option. You'd have to find it in the output of -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintCompilation.
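For reference, the charAt variant discussed here could look like the sketch below. The class name is hypothetical and this is not the code that was merged; it only illustrates keeping both bytes of every UTF-16 code unit without an intermediate char[] copy. Whether it is actually faster would have to be measured.

```java
final class RawCharBytes {

    /** Writes each UTF-16 code unit as two bytes, low byte first, reading chars via charAt. */
    static byte[] toBytes(String str) {
        final int len = str.length();
        final byte[] buff = new byte[2 * len];
        for (int i = 0, j = 0; i < len; i++) {
            final char c = str.charAt(i);
            buff[j++] = (byte) c;         // low 8 bits
            buff[j++] = (byte) (c >>> 8); // high 8 bits
        }
        return buff;
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes("A\u4E2D");
        // 'A' is 0x0041 and U+4E2D is 0x4E2D, so this prints: 41 00 2d 4e
        for (byte b : bytes) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```

Unlike the low-byte-only copy, two strings that differ in any char always produce different byte arrays here.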

brettwooldridge · 2018-06-14T19:24:57Z

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java


-        Sha1HashKey(String sql,
+        CityHash128Key(String sql,
                String parametersDefinition) {
            this(String.format("%s%s", sql, parametersDefinition));


String.format() is a terrible way to concatenate two Strings. If I wrote this in my original commit, I must have been high. Just change to this(sql + parametersDefinition).

Having taken another look, yes, this showed up in my YourKit runs -- in fact I saw it roughly as much as the character encoding issue we've been discussing above.
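A small sketch of why the swap is safe: both forms build the same key string, and only the formatting machinery goes away. The class and method names are illustrative, and the sample SQL and parameter definition are invented.

```java
final class KeyConcatDemo {

    /** The original approach: parses the format string on every call. */
    static String viaFormat(String sql, String parametersDefinition) {
        return String.format("%s%s", sql, parametersDefinition);
    }

    /** The suggested replacement: plain concatenation. */
    static String viaConcat(String sql, String parametersDefinition) {
        return sql + parametersDefinition;
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM t WHERE name LIKE '%x%' AND id = ?";
        String defs = "@P0 int";
        // Percent signs inside the arguments are not interpreted by format, so the results match.
        System.out.println(viaFormat(sql, defs).equals(viaConcat(sql, defs))); // true
    }
}
```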

RDearnaley · 2018-06-26T20:39:39Z

@cheenamalhotra I have sent you a pull request cheenamalhotra#11 with many of the additional speedups we've discussed above

rene-ye · 2018-07-05T23:00:08Z

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java


-        ParsedSQLCacheItem cacheItem = new ParsedSQLCacheItem(parsedSql, paramCount, procName, returnValueSyntax);
+        ParsedSQLCacheItem  cacheItem = new ParsedSQLCacheItem (parsedSql, parameterPositions, procName, returnValueSyntax);


Seems to have an extra space? Might need to apply formatter. Same line 267.

rene-ye · 2018-07-05T23:07:56Z

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java

+      *          SQL text to parse for positions of parameters to intialize.
+      */
+    private static int[] locateParams(String sql) {
+        List<Integer> parameterPositions = new ArrayList<Integer>(0);


The generic type argument is inferred from the LHS, so it isn't necessary to declare it a second time.

Proposed change:

LinkedList<Integer> parameterPositions = new LinkedList<>();

List<Integer> parameterPositions = new LinkedList<>();

would be better imo

rene-ye · 2018-07-05T23:10:03Z

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java

+        int i = 0;
+        for (Integer parameterPosition : parameterPositions) {
+            result[i++] = parameterPosition;
+        }


Use a LinkedList to make adding/iterating faster OR use a parallel stream with ArrayList.

Edit: The stream option isn't very viable, and since order matters, wouldn't offer much performance gain. LinkedList is probably the way to go.

Edit2:

int[] result = parameterPositions.stream().mapToInt(Integer::intValue).toArray();

is a good streaming alternative
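The two conversions under discussion can be compared side by side in a small sketch. Note the assumptions: this `locateParams` is a simplified stand-in that treats every '?' as a parameter marker and ignores string literals and comments, which the real driver method would need to handle, and `toArrayWithStream` is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.List;

final class ParamPositions {

    /** Simplified stand-in: records the index of every '?' in the SQL text. */
    static int[] locateParams(String sql) {
        List<Integer> parameterPositions = new ArrayList<>();
        for (int pos = sql.indexOf('?'); pos >= 0; pos = sql.indexOf('?', pos + 1)) {
            parameterPositions.add(pos);
        }
        // Explicit loop conversion, as in the diff under review.
        int[] result = new int[parameterPositions.size()];
        int i = 0;
        for (Integer parameterPosition : parameterPositions) {
            result[i++] = parameterPosition;
        }
        return result;
    }

    /** Stream-based conversion; order is preserved for a sequential stream over a List. */
    static int[] toArrayWithStream(List<Integer> parameterPositions) {
        return parameterPositions.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] positions = locateParams("SELECT * FROM t WHERE a = ? AND b = ?");
        System.out.println(java.util.Arrays.toString(positions)); // [26, 36]
    }
}
```

Either list type works with both conversions; which one is faster for typical statement sizes would need a benchmark.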

rene-ye · 2018-07-05T23:12:09Z

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java

+            }
+            else {
+                srcEnd = paramPositions[paramIndex];
+            }


5388 - 5393 can be replaced with

srcEnd = (paramIndex >= paramPositions.length) ? sqlSrc.length() : paramPositions[paramIndex];
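In context, the ternary picks the end of the current segment: either the next cached '?' position or the end of the SQL text. The sketch below shows that loop shape under assumptions; `replaceParams` and the `@P<n>` naming are illustrative and not copied from the driver.

```java
final class ParamSubstitution {

    /** Replaces each '?' at a cached position with a numbered marker such as @P0. */
    static String replaceParams(String sqlSrc, int[] paramPositions) {
        StringBuilder dst = new StringBuilder(sqlSrc.length() + 4 * paramPositions.length);
        int srcBegin = 0;
        for (int paramIndex = 0; paramIndex <= paramPositions.length; paramIndex++) {
            // End of this segment: the next '?' if there is one, otherwise the end of the text.
            int srcEnd = (paramIndex >= paramPositions.length) ? sqlSrc.length() : paramPositions[paramIndex];
            dst.append(sqlSrc, srcBegin, srcEnd);
            if (paramIndex < paramPositions.length) {
                dst.append("@P").append(paramIndex);
                srcBegin = srcEnd + 1; // skip the '?' itself
            }
        }
        return dst.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM t WHERE a = ? AND b = ?";
        System.out.println(replaceParams(sql, new int[] { 26, 36 }));
        // SELECT * FROM t WHERE a = @P0 AND b = @P1
    }
}
```

Because the positions are cached alongside the parsed SQL, the text does not have to be rescanned for '?' on every execution.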

RDearnaley

I need to test this and see how the performance is with the new string comparison added (IMO unnecessary for many customers, including us). If necessary we might ship with a version patched to remove the string comparison. But this is definitely an improvement, so I'm marking it 'Approve'.

peterbae · 2018-07-09T18:40:58Z

src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java

+        CityHash128Key(String s) {
+            unhashedString = s;
+            byte[] bytes = new byte[s.length()];
+            s.getBytes(0, s.length(), bytes, 0);


getBytes is a deprecated method - suppress warning or replace?

Suppress the warning -- there is an excellent performance reason why we're using it. And as Brett has pointed out, it's never going away, because it's sometimes the right thing to do.
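Putting the pieces of this thread together, a key class along these lines would suppress the deprecation warning and fall back to a string comparison in equals. This is a sketch under loud assumptions: the class name is invented, and a seeded FNV-1a hash stands in for CityHash128, which is not part of the JDK. It is not the merged implementation.

```java
import java.util.Arrays;

final class HashKeyWithStringCheck {

    private final long[] segments;       // stand-in for the two 64-bit halves of a 128-bit hash
    private final String unhashedString; // kept so equals can rule out collisions

    @SuppressWarnings("deprecation") // low-byte copy is intentional: it avoids charset encoding
    HashKeyWithStringCheck(String s) {
        unhashedString = s;
        byte[] bytes = new byte[s.length()];
        s.getBytes(0, s.length(), bytes, 0);
        // Assumption: FNV-1a with two seeds replaces CityHash.cityHash128 for this sketch.
        segments = new long[] { fnv1a64(bytes, 0xcbf29ce484222325L), fnv1a64(bytes, 0x84222325cbf29ce4L) };
    }

    private static long fnv1a64(byte[] data, long seed) {
        long h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    /** True when the hash segments match, even if the underlying strings differ. */
    boolean sameSegments(HashKeyWithStringCheck other) {
        return Arrays.equals(segments, other.segments);
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof HashKeyWithStringCheck)) {
            return false;
        }
        HashKeyWithStringCheck other = (HashKeyWithStringCheck) obj;
        // Cheap segment check first; the string comparison only runs when the hashes agree.
        return Arrays.equals(segments, other.segments) && unhashedString.equals(other.unhashedString);
    }

    @Override
    public int hashCode() {
        return (int) segments[0];
    }
}
```

With this shape, two statements whose chars differ only in their upper 8 bits still hash to the same segments, but they no longer compare equal as cache keys.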

* Update Snapshot for upcoming RTW release v7.0.0
* Change order of logic for checking the condition for using Bulk Copy API (#736) Fix | Change order of logic for checking the condition for using Bulk Copy API (#736)
* Update CHANGELOG.md
* Merge pull request #732 from cheenamalhotra/module (Export driver in automatic module) Introduce Automatic Module Name in POM.
* Update CHANGELOG.md
* Change Sha1HashKey to CityHash128Key for generating PreparedStatement handle and metadata cache keys (#717)
* Change Sha1HashKey to CityHash128Key
* Formatted code
* Prepared statement performance fixes 1) Further speedups to prepared statement hashing 2) Caching of '?' character positions in prepared statements to speed parameter substitution
* String compare for hash keys added missing line for bulkcopy tests.
* comment change
* Move CityHash class to a separate file
* spacings fixes cleaner code & logic
* Add | Adding useBulkCopyForBatchInsert property to Request Boundary methods (#739)
* Apply the collation name change to UTF8SupportTest
* Package changes for CityHash with license information (#740)
* Reformatted Code + Updated formatter (#742)
* Reformatted Code + Updated formatter
* Fix policheck issue with 'Country' keyword (#745)
* Adding a new test for beginRequest()/endRequest() (#746)
* Add | Adding a new test to notify the developers to consider beginRequest()/endRequest() when adding a new Connection API
* Fix | Fixes for issues reported by static analysis tools (SonarQube + Fortify) (#747)
* handle buffer reading for invalid buffer input by user
* Revert "handle buffer reading" This reverts commit 11e2bf4.
* updated javadocs (#754)
* fixed some typos in javadocs (#760)
* API and JavaDoc changes for Spatial Datatypes (#752) Add | API and JavaDoc changes for Spatial Datatypes (#752)
* Disallow non-parameterized queries for Bulk Copy API for batch insert (#756) fix | Disallow non-parameterized queries for Bulk Copy API for batch insert (#756)
* Formatting | Change scope of unwanted Public APIs + Code Format (#757)
* Fix unwanted Public APIs.
* Updated formatter to add new line to EOF + formatted project
* Release | Release 7.0 changelog and version update (#748)
* Updated Changelog + release version changes
* Changelog entry updated as per review.
* Updated changelog for new changes
* Terminology update: request boundary declaration APIs
* Trigger Appveyor
* Update Samples and add new samples for new features (#761)
* Update Samples and add new Samples for new features
* Update samples from Peter
* Updated JavaDocs
* Switch to block comment
* Update License copyright (#767)

Change Sha1HashKey to CityHash128Key

ad1fd3e

cheenamalhotra mentioned this pull request Jun 6, 2018

Performance problems with statement caching due to SHA1 hash #716

Closed

Formatted code

cc30eff

ulvii reviewed Jun 13, 2018

brettwooldridge reviewed Jun 14, 2018

cheenamalhotra added this to In progress in MSSQL JDBC Jun 26, 2018

Prepared statement performance fixes

24cf6c5

1) Further speedups to prepared statement hashing 2) Caching of '?' character positions in prepared statements to speed parameter substitution

cheenamalhotra and others added 7 commits June 30, 2018 17:02

Prepared statement performance fixes from RDearnaley

85ef29c

Prepared statement performance fixes

Merge pull request #12 from Microsoft/dev

16649a3

Pull latest changes from Microsoft:dev branch

Merge branch 'dev' into hashKeyChanges

b4759e3

# Conflicts: # src/main/java/com/microsoft/sqlserver/jdbc/SQLServerConnection.java # src/main/java/com/microsoft/sqlserver/jdbc/SQLServerPreparedStatement.java # src/main/java/com/microsoft/sqlserver/jdbc/Util.java

String compare for hash keys

ee347d8

added missing line for bulkcopy tests.

comment change

d20e5d5

Add String comparison with CityHashKey and fix test failures (PR #13 …

8f8067e

…by Rene) Add String comparison with CityHashKey and fix test failures

Move CityHash class to a separate file

27da9e2

cheenamalhotra changed the title ~~Change Sha1HashKey to CityHash128Key for Prepared Statement Caching~~ Change Sha1HashKey to CityHash128Key for generating PreparedStatement handle and metadata cache keys Jul 4, 2018

cheenamalhotra added this to the 7.0.0 milestone Jul 4, 2018

rene-ye reviewed Jul 5, 2018

RDearnaley previously approved these changes Jul 7, 2018

spacings fixes

d349ff5

cleaner code & logic

peterbae reviewed Jul 9, 2018

Merge pull request #14 from rene-ye/hashKeyChanges2

b442832

Fix for review comments

cheenamalhotra dismissed RDearnaley’s stale review via b442832 July 9, 2018 18:55

peterbae approved these changes Jul 9, 2018

rene-ye approved these changes Jul 9, 2018

ulvii approved these changes Jul 9, 2018

cheenamalhotra merged commit e2cf217 into microsoft:dev Jul 9, 2018

MSSQL JDBC automation moved this from In progress to Closed/Merged PRs Jul 9, 2018

cheenamalhotra commented Jun 6, 2018

codecov-io commented Jun 6, 2018

ulvii Jun 13, 2018

brettwooldridge Jun 14, 2018

RDearnaley Jun 14, 2018

RDearnaley Jun 14, 2018

brettwooldridge Jun 14, 2018

brettwooldridge Jun 14, 2018

RDearnaley Jun 14, 2018

RDearnaley Jun 14, 2018

brettwooldridge Jun 15, 2018

RDearnaley Jun 15, 2018

brettwooldridge Jun 16, 2018

brettwooldridge Jun 14, 2018

RDearnaley Jun 15, 2018

RDearnaley commented Jun 26, 2018

rene-ye Jul 5, 2018

rene-ye Jul 5, 2018

peterbae Jul 9, 2018

rene-ye Jul 5, 2018

rene-ye Jul 5, 2018

RDearnaley left a comment

peterbae Jul 9, 2018

RDearnaley Jul 9, 2018

