[ICU] Problem with get a CHAR UNICODE_FSS in CP943C connection charset [CORE2123] #2554
Comments
Modified by: @dyemanov assignee: Adriano dos Santos Fernandes [ asfernandes ] |
Commented by: @asfernandes > I think, the problem at implementation unicode_to_icu/icu_to_unicode. These functions do not return a CS_TRUNCATION_ERROR. There is code to return CS_TRUNCATION_ERROR on these functions. So please send a test case. |
Commented by: @ibprovider Run the "charsets.*" tests from CORE2122 |
Commented by: @ibprovider Oh, I'm sorry tests: |
Commented by: @ibprovider BUG FIX. This patch also corrects CORE2122. All my tests (old and new) work fine now. |
Modified by: @ibprovider Attachment: cv_icu__1_4.txt [ 11120 ] |
Commented by: @asfernandes If this patch fixes CORE2122, I'm sure it's incorrect, causing transliteration of invalid characters. The reason for CORE2122 is also not the one I described. I'm now running your tests with a release build and hope they finish before I need to go, or I will need to pass fix for your test. With a debug build, they didn't finish in 4 hours. Anyway, could you describe what this patch does? I haven't verified anything on CORE2123 yet. |
Commented by: @ibprovider Hmmm... Open your eyes and try using a debugger. Can you work with a debugger? I'm afraid not, because: 1. In CORE2122, you did not find that the server handles only the first 16K bytes of a BLOB with a length of ~60K. I am talking about blob.002.unicode.TBL_CS__TIS620.COL_BLOB.ins_UNICODE_FSS.sel_TIS620.len_32767.chars_TIS620.bind__wstr. 2. You did not find that your icu_to_unicode and unicode_to_icu can't return the correct err_position. They always return err_position==ZERO. Do you agree? My patch is very simple, and if you spend less time on "architect" problems, you can understand it without any trouble. Of course, only if you want to understand. Regards. |
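To illustrate point 2 of the comment above: a conversion routine is easier to diagnose when, on failure, it reports the offset of the first source byte it could not convert instead of always reporting zero. The Python sketch below is not Firebird code. `to_unicode` and its return convention are hypothetical, and the position comes from Python's own `UnicodeDecodeError.start` attribute.

```python
def to_unicode(data: bytes, encoding: str):
    """Decode `data` and return (text, err_position).

    err_position is None on success. On failure it is the offset of the
    first source byte that could not be converted (hypothetical convention,
    not the Firebird API), and text holds everything converted before it.
    """
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first offending byte in `data`.
        return data[:exc.start].decode(encoding), exc.start
```

For example, `to_unicode(b"ab\xffcd", "utf-8")` returns `("ab", 2)`, so the caller knows exactly where the conversion stopped.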
Commented by: @asfernandes You are a child, an idiot child! I'll not ask you anymore for collaboration. Thank you. (your so good test is still running, without any error) |
Commented by: @hvlad Adriano, try this (database is in WIN1251) : recreate table t_blb (id int, blb blob sub_type text character set win1251); execute block returns (i int) i = 0; while (i < 80000) do
end |
Commented by: @ibprovider Adriano, next time, please replace your "I'm sure" with "I think". And everyone will be happy. Regards. |
Commented by: @hvlad Hmm... i used slightly old build (21198). Currently i see no errors with 21217. Guys, Adriano and Dmitry ! Please, be patient and honour each other. |
Commented by: @hvlad Reading blobs back i see that character 0x98 (152) is zero (0x00) in blobs : with recursive nnn (n) as ( vals as ( |
Commented by: @asfernandes Vlad, Your recursive query was not ending for me. So I tried to replace it with: create or alter procedure rrr returns (n integer) with Note I also introduced char_length. With this query no rows are returned. Did I misunderstand it? |
Commented by: @hvlad Adriano, I just updated my source tree and rebuild. Build num is 21232 My recursive query runs in 5 sec (release build) and still returns 313 rows. |
Commented by: @asfernandes Vlad, After creating an index on t_blb (id), it runs faster. :-) But your query, with data generated by your exec. block, returns 255 rows for me, the same rows as mine without char_length. This is your query with char_length: with recursive nnn (n) as ( vals as ( It doesn't return rows for me. If I replace char_length by octet_length (they should return the same value in win1251) nothing changes, so it is not a problem of char_length. So it seems to me that zeros are returned because of ascii_val(substring of non-existent character). |
Commented by: @hvlad > After create an index on t_blb (id), it runs faster. :-) Sorry ! I made two examples and gave you the wrong one :) Here is the correct variant : recreate table t_blb (id int not null primary key, blb blob sub_type text character set win1251); execute block returns (i int) i = 0; while (i < 80000) do
end with recursive nnn (n) as ( vals as ( |
Commented by: @asfernandes Vlad, This is not a problem. In cs_win1251.h, there is a mapping table from WIN1251 to Unicode: When this byte is converted to UTF8, ICU replaces it with \0. When you convert it back to WIN1251, it becomes \0. |
Modified by: @asfernandes status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 2.5 Beta 1 [ 10251 ] |
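The round trip described in the comment above can be reproduced outside the server. The Python sketch below is an illustration only: it uses Python's built-in `cp1251` codec as a stand-in for the table in cs_win1251.h (an assumption, the two tables are not guaranteed to be identical), and it simulates the substitution with \0 explicitly instead of calling ICU. The function names are hypothetical.

```python
def unmapped_bytes(encoding: str) -> list[int]:
    """Return the single-byte values that have no Unicode mapping."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing


def roundtrip_with_substitution(data: bytes, encoding: str,
                                substitute: str = "\x00") -> bytes:
    """Decode byte by byte, replacing unmapped bytes, then encode back.

    This mimics the behaviour described in the thread: an unmapped byte
    becomes `substitute` on the way to Unicode and stays that way on
    the way back.
    """
    chars = []
    for value in data:
        try:
            chars.append(bytes([value]).decode(encoding))
        except UnicodeDecodeError:
            chars.append(substitute)
    return "".join(chars).encode(encoding)
```

With Python's codec, `0x98` appears in `unmapped_bytes("cp1251")`, and `roundtrip_with_substitution(b"A\x98B", "cp1251")` yields `b"A\x00B"`, which matches the zero byte observed when reading the blobs back.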
Commented by: @ibprovider Fixes a problem with a small output buffer (truncation error). Data could be lost in this situation. See the use of pSource_Done_Prev. Sorry for this stupid bug. |
Modified by: @ibprovider Attachment: 2008_12_16__cv_icu__1_6_diff.txt [ 11242 ] |
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ] |
Modified by: @pavel-zotov QA Status: No test |
Submitted by: @ibprovider
Attachments:
cv_icu__1_4.txt
2008_12_16__cv_icu__1_6_diff.txt
Hi
I made this test for CHAR-ARRAY columns, but I think a similar problem will also occur for a simple CHAR column.
Meta: CHAR(8) [0:2] character set UNICODE_FSS
Insert: connection ctype CP943C. All is OK
Select: connection ctype CP943C: generates a translation error
For VARCHAR-ARRAY, insert/select works fine.
For non-ICU multibyte charsets (for example, BIG_5), CHAR-ARRAY does not produce any errors.
---
I think the problem is in the implementation of unicode_to_icu/icu_to_unicode. These functions do not return CS_TRUNCATION_ERROR.
As a result, CsConvert::convert can't ignore trailing spaces.
Banzay
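The reasoning above can be sketched as follows: a low-level converter fills a fixed-size destination buffer and reports how far it got, and the caller treats the truncation as harmless only when everything left unconverted is padding spaces. This Python sketch is not the Firebird implementation. `convert`, `convert_ignoring_trailing_spaces`, and `TruncationError` are hypothetical names, and Python's `cp932` codec is used as a rough stand-in for CP943C.

```python
class TruncationError(Exception):
    """Raised when non-space data does not fit in the destination buffer."""


def convert(text: str, dst_encoding: str, dst_size: int) -> tuple[bytes, int]:
    """Encode characters until the destination buffer is full.

    Returns (encoded_bytes, chars_consumed). When chars_consumed is less
    than len(text), the conversion was truncated.
    """
    out = bytearray()
    consumed = 0
    for ch in text:
        encoded = ch.encode(dst_encoding)
        if len(out) + len(encoded) > dst_size:
            break  # the next character does not fit: report truncation
        out += encoded
        consumed += 1
    return bytes(out), consumed


def convert_ignoring_trailing_spaces(text: str, dst_encoding: str,
                                     dst_size: int) -> bytes:
    """Accept a truncated conversion only if the lost tail is all spaces."""
    out, consumed = convert(text, dst_encoding, dst_size)
    rest = text[consumed:]
    if rest.strip(" "):
        raise TruncationError(
            f"{len(rest)} unconverted characters contain non-space data")
    return out
```

A space-padded value such as `"abc" + " " * 21` fits a 16-byte buffer once the padding is allowed to drop, while `"a" * 24` raises `TruncationError`. If the low-level converter never signals truncation, the caller has no point at which to make this distinction.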
Commits: 92b8eff ea9226f