-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use VARCHAR( n char) instead of VARCHAR(n) so semantics are used and not... #222
Conversation
…not number of bytes. This is better when using multibyte charsets like UTF8
ping? |
Why don't you comment on this or pick it up? |
JOB_NAME VARCHAR2(100) NOT NULL, | ||
JOB_KEY VARCHAR2(32) NOT NULL, | ||
JOB_NAME VARCHAR2(100 char) NOT NULL, | ||
JOB_KEY VARCHAR2(32 char) NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a string generated by batch so I'm not sure we would need this feature here.
I've made a couple notes. Add to those, I'm currently looking to see if any of the other databases we support have a similar function. I'd rather make a change like this all at once. |
@davidkarlsen Please sign the Contributor License Agreement! Click here to manually synchronize the status of this Pull Request. See the FAQ for frequently asked questions. |
@pivotal-issuemaster done! |
@davidkarlsen Thank you for signing the Contributor License Agreement! |
There is an open issue for this: https://jira.spring.io/browse/BATCH-2750 We will consider merging this PR in the upcoming v4.1. |
I checked the data types for each database provider we support (except the embedded ones) and most of them use characters by default:
However, some providers use bytes by default:
Moreover, this default can vary from one version to another for each database provider. For example, the default has changed for MySQL between version 4 and 5 as per: https://stackoverflow.com/questions/1997540/mysql-varchar-lengths-and-utf-8. I think it is almost unmanageable to test the DDL against each version of each database provider. So in hindsight, I'm not sure if this kind of issues should be fixed in the DDL or left as an adjustment to the user, as said in Appendix A.9:
This appendix clearly mentions that some adjustments (column lengths, additional indexes, etc) might be necessary if the default DDL is not enough. For some providers like DB2, it is not possible to specify if the size of the BTW, even the fix in this PR may not be sufficient: Passing the @davidkarlsen @mminella Do you agree? |
Spring Batch also has code in it that truncates values before they are inserted in specific columns. I agree that this kind of thing is better off left to users to customize on their own. We could add a note in the docs explicitly about the idea that they may need to increase field sizes when using multibyte characters. |
Thank you Michael. This section of the docs is explicit about multi-byte characters and gives some examples reported by users like doubling the column size or setting the
Ok, we are on the same page. @davidkarlsen What do you think? |
I think it makes sense to use characters where possible - people reason about characters - not bytes. We had a case where the status rolled back because it could fit to the database - and the truncating in code will reason around characters rather than bytes. Make it the easiest for people to use - if the 1st thing you need to do when you pick up a framework is to alter some low-level DDLs to make it work it is not a good UX. |
My report: https://jira.spring.io/browse/BATCH-2731 shows that truncating is done by Java DB string storage space requirement varies depending on version & locale settings. Doubling storage resolve issue in 99% cases unless you get exception with all characters in U+0800 - U+FFFF range for which each character has 3 bytes in UTF-8. What is about tripling DB column space? )) I my case I am not interested in storing of exception trace in BATCH table. I have ElasticSearch logging solution for that. Instead of fixing character length I'd like Batch framework ignores this kind of error and continue processing instead of breaking batch job in the middle. Otherwise it is unreliable to pass exception handling to Spring Batch framework and it is necessary to implement custom wrappers around Batch Tasklet interface to catch all Exceptions and to handle them safely. |
I agree on that, but if you choose to use oracle, it is oracle that makes you think about bytes or characters and make the decision, not Spring Batch. Other providers have a clear decision about that.
This argument made me re-think my position about this PR. But note that for a db provider like DB2 where you can specify column size only in bytes, the truncating being based on characters would still be an issue. On the other hand, if we change the truncating code to be based on bytes, the issue could still happen with db providers where the column size is specified in chars..
Well, I don't agree, the oracle DDL works out-of-the-box and tweaking it is not the first thing you need to do. You might need to adjust it only if your requirements are different from the defaults (column size, indexes, sequences, etc). Thank you for the details. Indeed, truncating is based on characters, not bytes.
As said in my previous comment, the DDL provides a default value for the column size but this size can be adjusted if necessary. The precise issue with Oracle is that we truncate the message at 2500 characters while the column size is defined in bytes. The change in this PR should fix the issue of BATCH-2091 (and the one you reported in BATCH-2731 too).
This is another discussion. I would be happy to discuss it in another issue if needed. To keep it simple, since the problem is about oracle and only oracle provides the choice to use |
Hi, After further investigations, I was able to reproduce the issue (See here) and the changes in this PR do fix the issue. However, we need to add a migration script for people already using the old DDL in production. I added this script in a separate PR #650 on top of these changes so all the credits go to @davidkarlsen for the fix ! Thank you @davidkarlsen for your contribution. Br, |
... number of bytes. This is better when using multibyte charsets like UTF8
https://jira.springsource.org/browse/BATCH-2091
This change is