[CARBONDATA-3174] Fix trailing space issue with varchar column for SDK #2988

Closed
wants to merge 1 commit into from

Conversation

Shubh18s
Contributor

What was the issue?
After an SDK write, select * failed when a 'long_string_columns' column name had a trailing space.

What has been changed?
Trimmed the trailing space from the column name.

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Added a test case.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Can one of the admins verify this patch?

@brijoobopanna
Contributor

add to whitelist

@brijoobopanna
Contributor

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1761/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1972/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10020/

s"""CREATE TABLE test using carbon options('long_string_columns'='subject,messagebody')
|LOCATION '$writerPath'"""
.stripMargin)
sql("select * from test").show()
Contributor

Why doesn't this test case check the correctness of the results, for example with checkAnswer or assert?

Contributor Author

Done

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1786/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10046/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1998/

@Shubh18s
Contributor Author

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1793/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10053/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2005/

@@ -55,7 +55,7 @@
* @param type datatype of field, specified in strings.
*/
public Field(String name, String type) {
-    this.name = name;
+    this.name = name.toLowerCase().trim();
Member

CarbonWriterBuilder.updateSchemaFields() already converts the names to lowercase, so just add trim in that method. There is no need to handle it for each field here.
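
A minimal sketch of the suggestion above: normalize every field name in one pass over the schema instead of in each constructor. The `Field` class, `normalizeName()`, and `normalizeFields()` here are hypothetical stand-ins, not the real CarbonData SDK types, and `Locale.ROOT` is an assumption (the reviewed code calls `toLowerCase()` without a locale).

```java
import java.util.Locale;

public class SchemaNormalizationSketch {
    // Hypothetical stand-in for the SDK's Field class; not the real CarbonData type.
    static final class Field {
        private String name;

        Field(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        // Lowercase and trim together so every caller gets the same normalization.
        void normalizeName() {
            this.name = name.toLowerCase(Locale.ROOT).trim();
        }
    }

    // Same shape as the loop in the reviewed hunk: skip nulls, normalize the rest in place.
    static Field[] normalizeFields(Field[] fields) {
        for (Field field : fields) {
            if (field != null) {
                field.normalizeName();
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Field[] fields = { new Field("Subject "), null, new Field(" MessageBody") };
        normalizeFields(fields);
        System.out.println("[" + fields[0].getName() + "]"); // [subject]
        System.out.println("[" + fields[2].getName() + "]"); // [messagebody]
    }
}
```

Keeping the normalization in one method means a later change (for example, rejecting empty names) would only need to be made in one place.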

@@ -2490,6 +2490,54 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll {
FileUtils.deleteDirectory(new File(writerPath))
}

test("check varchar with trailing space") {
Member

No need to duplicate test cases. In the existing varchar columns test case, add a trailing space to one of the columns.

Contributor

Besides, this is for varchar columns, so why not update the code there?

@ajantha-bhat
Member

@Shubh18s: Why only for varchar columns? How was this handled for other columns? I guess this problem exists for other columns as well.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1801/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2013/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10062/

@ajantha-bhat
Member

@Shubh18s: I have checked the code.
In the SDK, all column names are stored without trimming, but the long_string_columns table property is trimmed. The column name is untrimmed while the property value is trimmed, hence the schema mismatch.

After this change, sort_columns and invertedIndexFor are affected.
CarbonWriterBuilder.sortBy() is exposed to the user and, as in the previous code, does not trim either, so add trim() there as well. Make a similar change in CarbonWriterBuilder.invertedIndexFor().
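
The mismatch described above can be illustrated with a small standalone sketch. It does not use the CarbonData SDK; `normalize`, `parseColumnList`, and `normalizeAll` are hypothetical helpers, and the sample column names are made up for illustration. It shows why a trimmed property value fails to match an untrimmed field name, and how a `sortBy`-style method could normalize user-supplied names the same way.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class ColumnNameMismatchSketch {
    // Hypothetical helper: the single normalization applied to every column name.
    static String normalize(String columnName) {
        return columnName.toLowerCase(Locale.ROOT).trim();
    }

    // Parses a comma-separated property value into a set of normalized names.
    static Set<String> parseColumnList(String propertyValue) {
        Set<String> names = new HashSet<>();
        for (String part : propertyValue.split(",")) {
            names.add(normalize(part));
        }
        return names;
    }

    // Normalizes user-supplied names, as a sortBy-style method would need to.
    static String[] normalizeAll(String[] columnNames) {
        return Arrays.stream(columnNames)
            .map(ColumnNameMismatchSketch::normalize)
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        Set<String> longStringColumns = parseColumnList("subject, messagebody");
        String rawFieldName = "subject "; // trailing space, as supplied by the user

        // The untrimmed field name does not match the trimmed property entry.
        System.out.println(longStringColumns.contains(rawFieldName)); // false
        // After normalization, both sides agree.
        System.out.println(longStringColumns.contains(normalize(rawFieldName))); // true
        System.out.println(Arrays.toString(normalizeAll(new String[] {"Name ", " AGE"}))); // [name, age]
    }
}
```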

@@ -747,7 +747,7 @@ private Schema updateSchemaFields(Schema schema, Set<String> longStringColumns)
Field[] fields = schema.getFields();
for (int i = 0; i < fields.length; i++) {
if (fields[i] != null) {
fields[i].updateNameToLowerCase();
//fields[i].updateName();
Member

Remove this commented-out line.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1814/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1815/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2025/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10074/

@ajantha-bhat
Member

LGTM

@kunal642
Contributor

LGTM

@asfgit asfgit closed this in f822540 Dec 18, 2018
asfgit pushed a commit that referenced this pull request Jan 21, 2019
What was the issue?
After doing SDK Write, Select * was failing for 'long_string_columns' with trailing space.

What has been changed?
Removed the trailing space in ColumnName.

This closes #2988
qiuchenjian pushed a commit to qiuchenjian/carbondata that referenced this pull request Jun 14, 2019
What was the issue?
After doing SDK Write, Select * was failing for 'long_string_columns' with trailing space.

What has been changed?
Removed the trailing space in ColumnName.

This closes apache#2988
7 participants