Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Charset throughout #26

Merged
merged 1 commit into from
Nov 29, 2016
Merged

Use Charset throughout #26

merged 1 commit into from
Nov 29, 2016

Conversation

marschall
Copy link
Contributor

The driver spends a lot of time converting from Charset to String and
from String to Charset. Every time checked exceptions have to be
caught. In addition the Charset lookup code is quite involved. It would
be simpler to use Charset everywhere.

@@ -587,9 +590,9 @@ final Encoding checkSupported() throws UnsupportedEncodingException
// This works for all of the code pages above in SE 5 and later.
try
{
" ".getBytes(charsetName);
charset = Charset.forName(charsetName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of assigning the charset object to charset variable, is it possible to do the assignment outside the checkSupported() method? e.g. we could do the assignment after checkSupported() is called in charset()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do this.

The driver spends a lot of time converting from Charset to String and
from String to Charset. Every time checked exceptions have to be
caught. In addition the Charset lookup code is quite involved. It would
be simpler to use Charset everywhere.
checkSupported();
if (charset == null)
{
charset = Charset.forName(charsetName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marschall thank you very much for your great idea and fix. I just have one more question and hope you could consider. Is it possible to delete the local variable 'private Charset charset'? I am not sure why we need to check if the variable is null here. instead of assigning it to charset and returning charset., could we just do "return Charset.forName(charsetName);" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I considered this. It would be possible but:
'private Charset charset' ist not a local variable but an instance/member variable. Encoding#charset() is called quite frequently and by many users both directly and indirectly. If you look at the implementation of Charset#forName it is quite involved and the worst case is quite bad, eg. some of the charsets are in the Extended Encoding Set. I would be more comfortable with looking them up just once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marschall thank you for your feedback.
I totally agree to what you said, we definitely need to look up once.

However, what I meant was something like this.:

        try
    	{
    		checkSupported();
    	} 
    	catch (UnsupportedEncodingException e) 
              ...
    	return Charset.forName(charsetName);

the look up is already called within checkSupported (by Charset.isSupported())
I am not sure why we need the member variable, or why we need to check if the variable is null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the member variable in order to cache the look up. Otherwise every time somebody calls Encoding#charset() directly or indirectly a Charset lookup is performed. Examples include:

  • TypeInfo.Builder.Strategy#apply(TypeInfo, TDSReader)
  • Util#readUnicodeString(byte[], int, int, SQLServerConnection)
  • SQLServerSQLXML#getString()
  • SQLServerSQLXML#getValue()
  • TDSReader#readGUID(int, JDBCType, StreamType)
  • SQLServerBulkCopy#writeColumnToTdsWriter(TDSWriter, int, int, int, boolean, int, int, boolean, Object)

Without the member variable every time one of this methods is called directly or indirectly a lookup would be performed anew. The null check avoids the look up the second and following times the method is called.
It is for the same reason that jvmSupportConfirmed is cached in a member variable instead of computed every time the method is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marschall thanks for the very detailed explanation, now I understand why you have introduced a new member variable here. I wanted the code to be consistent with the current implementation, but your cache implementation is better than the current. I am going to accept the PR once the testing is done. thanks again.

@ajlam ajlam added the Under Review Used for pull requests under review label Nov 28, 2016
@xiangyushawn xiangyushawn merged commit 66bb7ae into microsoft:dev Nov 29, 2016
@xiangyushawn
Copy link
Contributor

@marschall thank you very much

@lilgreenbird lilgreenbird added this to Closed/Merged PRs in MSSQL JDBC Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Under Review Used for pull requests under review
Projects
MSSQL JDBC
  
Closed/Merged PRs
Development

Successfully merging this pull request may close these issues.

None yet

3 participants