[CARBONDATA-2491] Fix the error when reader read twice with SDK carbonReader #2318
Conversation
Force-pushed from 19043b6 to 3bcd4a0 (Compare)
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4986/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4802/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5959/
retest this please
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4805/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5963/
retest this please
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5975/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4818/
@@ -267,6 +267,8 @@ public TableDataMap getDataMap(CarbonTable table, DataMapSchema dataMapSchema) {
        }
      }
    }
  } else {
    dataMap.clear();
Why is it required? Please add a comment.
I think it defeats the purpose of the cache if you clear it on every retrieval. clear should be called only to flush the cache; why do you need to flush the cache on every call?
I moved it into the CarbonReader close method.
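For context, a minimal sketch of what the moved call could look like; queryModel is assumed here to be the existing field of CarbonRecordReader, and the clear() call itself is the one quoted in the PR description below:

```java
// Sketch only, not the exact patch: queryModel is an assumed field of
// CarbonRecordReader, and the clear() call is quoted from the PR description.
@Override
public void close() throws IOException {
  // Flush the default datamap cache so the next reader on this table
  // does not reuse stale (unsafe) datamap rows.
  DataMapStoreManager.getInstance()
      .getDefaultDataMap(queryModel.getTable())
      .clear();
  // ... then release the underlying reader resources as before.
}
```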
public class AvroCarbonWriterTest {
  private String path = "./AvroCarbonWriterSuiteWriteFiles";

  @Before
  public void cleanFile() {
    assert (TestUtil.cleanMdtFile());
Is there another PR to remove the creation of the system folder when the user writes data with the SDK? When writing carbondata with the SDK, we should not generate the system folder.
@ravipesala please check this
I think PR #2246 can solve it.
PR #2246 fixes a different problem; it can't solve this one. @ravipesala
@jackylk This PR depends on PR #2246; the first commit is cherry-picked from PR #2246. After PR #2246 is merged, this PR needs a rebase.
Force-pushed from 3bcd4a0 to 53f9522 (Compare)
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5985/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4827/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5007/
retest this please
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5987/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4829/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6031/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4872/
Force-pushed from 5c4e40c to 2bcb108 (Compare)
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6046/
retest this please
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4887/
@@ -77,6 +85,24 @@ public void testWriteAndReadFiles() throws IOException, InterruptedException {
  Assert.assertEquals(i, 100);

  reader.close();
This test case covers sequential reads: one reader gets closed and then the second one starts. What exactly happens when two readers read in parallel? Can we have a test case for that?
OK, I will add one. Also, search mode already uses CarbonRecordReader, and there are some concurrent test cases in org.apache.carbondata.examples.SearchModeExample.
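A rough sketch of what such a parallel-read test could look like, inside a test method that declares throws Exception; the builder and projection calls mirror the snippets in this PR, while the thread handling and the table name "_temp" are illustrative:

```java
// Illustrative only: two readers consume the same files from separate threads.
Runnable readTask = () -> {
  try {
    CarbonReader reader = CarbonReader
        .builder(path, "_temp")
        .projection(new String[]{"name", "age"})
        .build();
    while (reader.hasNext()) {
      reader.readNextRow();
    }
    reader.close();
  } catch (Exception e) {
    throw new RuntimeException(e);
  }
};
Thread t1 = new Thread(readTask);
Thread t2 = new Thread(readTask);
t1.start();
t2.start();
t1.join();  // InterruptedException propagates via the enclosing test method
t2.join();
```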
// Read again
CarbonReader reader2 = CarbonReader
    .builder(path, "_temp")
    .projection(new String[]{"name", "age"})
Add a test case of two sequential reads where the 2nd reader starts without the 1st reader being closed.
ok, done
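The requested case, sketched with the same builder calls as the surrounding test code; this is a sketch of the scenario, not the exact test added in the PR:

```java
// Sketch: the second reader starts before the first one is closed.
CarbonReader reader1 = CarbonReader
    .builder(path, "_temp")
    .projection(new String[]{"name", "age"})
    .build();

CarbonReader reader2 = CarbonReader
    .builder(path, "_temp")
    .projection(new String[]{"name", "age"})
    .build();

while (reader2.hasNext()) {
  reader2.readNextRow();
}
reader2.close();
reader1.close();
```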
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6051/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4892/
@xubo245 It would be better to allow "*" as input in the reader projection. This will help the user specify all columns, just like SQL's SELECT *.
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4902/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6062/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5066/
@sounakr OK, done
LGTM
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6069/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4909/
@jackylk @ravipesala Hello, sounakr gave LGTM and CI passed. Can you help check and merge it if there is no problem, please?
@@ -504,7 +505,24 @@ public QueryModel createQueryModel(InputSplit inputSplit, TaskAttemptContext tas
    String projectionString = getColumnProjection(configuration);
    String[] projectColumns;
    if (projectionString != null) {
      projectColumns = projectionString.split(",");
      if (projectionString.equalsIgnoreCase("*")) {
Instead of passing "*", I think it is better to add another function to project all columns. You can add projectAllColumns().
OK, removed this. I raised a new PR for it: https://github.com/apache/carbondata/pull/2338/files. Please review.
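For reference, a hypothetical sketch of the suggested builder method; the field name projectionColumns and the "null means all columns" convention are assumptions here, not the actual code of PR #2338:

```java
// Hypothetical sketch of the suggested API; see PR #2338 for the real change.
public CarbonReaderBuilder projectAllColumns() {
  // Assumption: a null projection is later expanded to every table column.
  this.projectionColumns = null;
  return this;
}

// Possible usage:
CarbonReader reader = CarbonReader
    .builder(path, "_temp")
    .projectAllColumns()
    .build();
```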
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5071/
Force-pushed from 315d669 to bdc3757 (Compare)
retest this please
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5082/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6084/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6081/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4921/
retest this please
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4925/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4930/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6091/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6093/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4932/
@jackylk CI passed, please check.
LGTM
[CARBONDATA-2491] Fix the error when reader read twice with SDK carbonReader

This PR includes:
1. Fix the out-of-bound error when the reader reads twice with the SDK CarbonReader
2. Fix the java.lang.NegativeArraySizeException
3. Add timestamp and bad record test cases
4. Support parallel read with two readers

This closes #2318
This PR includes:
1. Fix the out-of-bound error when the reader reads twice with the SDK CarbonReader
2. Fix the java.lang.NegativeArraySizeException
3. Add timestamp and bad record test cases
4. Support parallel read with two readers
How to fix the error:

Carbon throws the first exception because the fileName is "" in org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil#getTaskNo. I found that the filePath is "" in the prunedBlocklets of org.apache.carbondata.hadoop.api.CarbonInputFormat#getDataBlocksOfSegment, so it should be a data map issue.

The second exception is an UNSAFE issue: Carbon throws it because the length is less than 0 in org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow#convertToSafeRow.

So this PR clears the datamap cache by calling

DataMapStoreManager.getInstance().getDefaultDataMap(queryModel.getTable()).clear();

in org.apache.carbondata.hadoop.CarbonRecordReader#close.
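To make the scenario concrete, here is a sketch of the read-twice usage this PR fixes, reusing the builder calls from the test diff above (the table name "_temp" and the projection are taken from that snippet; run inside a method that declares throws Exception):

```java
// First read.
CarbonReader reader = CarbonReader
    .builder(path, "_temp")
    .projection(new String[]{"name", "age"})
    .build();
while (reader.hasNext()) {
  reader.readNextRow();
}
// close() now flushes the default datamap cache, as described above.
reader.close();

// Second read: previously this failed with the out-of-bound and
// java.lang.NegativeArraySizeException errors described in this PR.
CarbonReader reader2 = CarbonReader
    .builder(path, "_temp")
    .projection(new String[]{"name", "age"})
    .build();
while (reader2.hasNext()) {
  reader2.readNextRow();
}
reader2.close();
```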