
[CARBONDATA-2491] Fix the error when reader read twice with SDK carbonReader #2318

Closed


xubo245
Contributor

@xubo245 xubo245 commented May 18, 2018

This PR includes:

  1. Fix the out-of-bounds error when the reader reads twice with the SDK CarbonReader
  2. Fix the java.lang.NegativeArraySizeException
  3. Add timestamp and bad-record test cases
  4. Support parallel read with two readers

How the error was fixed:
Carbon throws the first exception because fileName is "" in org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil#getTaskNo. I found that filePath is "" in prunedBlocklets of org.apache.carbondata.hadoop.api.CarbonInputFormat#getDataBlocksOfSegment, so it is a data map issue; specifically, it is an UNSAFE (unsafe-memory) issue.

Carbon throws the second exception because the length is less than 0 in org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow#convertToSafeRow.

So this PR clears the datamap cache by calling
DataMapStoreManager.getInstance().getDefaultDataMap(queryModel.getTable()).clear();
in org.apache.carbondata.hadoop.CarbonRecordReader#close.
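The cache-flush pattern behind this fix can be illustrated with a minimal, self-contained sketch. The class and method names below (DataMapCache, RecordReader) are illustrative stand-ins, not the actual CarbonData API; the real call is the DataMapStoreManager line quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the fix's pattern; in CarbonData the real classes
// are DataMapStoreManager, TableDataMap and CarbonRecordReader.
class DataMapCache {
    private final Map<String, List<String>> cache = new HashMap<>();

    List<String> getDataMap(String table) {
        // Stale entries here are what caused the out-of-bounds and
        // NegativeArraySizeException on the second read: the unsafe memory
        // backing the cached rows had already been freed.
        return cache.computeIfAbsent(table, t -> new ArrayList<>());
    }

    void clear(String table) {
        cache.remove(table);
    }
}

class RecordReader implements AutoCloseable {
    private final DataMapCache cache;
    private final String table;

    RecordReader(DataMapCache cache, String table) {
        this.cache = cache;
        this.table = table;
        // Populate the cache on open, standing in for datamap loading.
        cache.getDataMap(table).add("blocklet-of-" + table);
    }

    int readCount() {
        return cache.getDataMap(table).size();
    }

    @Override
    public void close() {
        // The PR's fix: flush the cached datamap when the reader closes,
        // so a second reader rebuilds it instead of reading freed memory.
        cache.clear(table);
    }
}
```

With this shape, a second reader opened after the first one closes rebuilds its own cache entry rather than reusing a stale one.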

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    No
  • Any backward compatibility impacted?
    No
  • Document update required?
    No
  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.
    No
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    No

@xubo245 xubo245 force-pushed the CARBONDATA-2491-OutOfBoundAndBadRecord branch from 19043b6 to 3bcd4a0 on May 18, 2018 07:47
@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4986/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4802/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5959/

@xubo245
Contributor Author

xubo245 commented May 18, 2018

retest this please

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4805/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5963/

@xubo245
Contributor Author

xubo245 commented May 18, 2018

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5975/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4818/

@@ -267,6 +267,8 @@ public TableDataMap getDataMap(CarbonTable table, DataMapSchema dataMapSchema) {
}
}
}
} else {
dataMap.clear();
Contributor

Why is it required? Please add a comment.

Contributor

I think it defeats the purpose of the cache if you clear it on every retrieval. clear should be called only to flush the cache. Why do you need to flush the cache on every call?

Contributor Author

I moved it into the CarbonReader close method.

public class AvroCarbonWriterTest {
private String path = "./AvroCarbonWriterSuiteWriteFiles";

@Before
public void cleanFile() {
assert (TestUtil.cleanMdtFile());
Contributor

@jackylk jackylk May 19, 2018

Is there another PR to remove the creation of the system folder when the user uses the SDK to write data?
When writing carbondata with the SDK, we should not generate the system folder.
@ravipesala check this please

Contributor

I think PR 2246 can solve it.

Contributor Author

PR 2246 fixes a different problem; it can't solve this one. @ravipesala

Contributor Author

@jackylk This PR depends on PR 2246; the first commit is cherry-picked from PR 2246.
After PR 2246 is merged, this PR needs a rebase.

@xubo245 xubo245 force-pushed the CARBONDATA-2491-OutOfBoundAndBadRecord branch from 3bcd4a0 to 53f9522 on May 21, 2018 01:57
@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5985/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4827/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5007/

@xubo245
Contributor Author

xubo245 commented May 21, 2018

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5987/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4829/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6031/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4872/

@xubo245 xubo245 force-pushed the CARBONDATA-2491-OutOfBoundAndBadRecord branch from 5c4e40c to 2bcb108 on May 22, 2018 06:47
@sraghunandan
Contributor

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6046/

@xubo245
Contributor Author

xubo245 commented May 22, 2018

retest this please

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4887/

@@ -77,6 +85,24 @@ public void testWriteAndReadFiles() throws IOException, InterruptedException {
Assert.assertEquals(i, 100);

reader.close();

Contributor

This test case covers sequential reads: one reader gets closed and then the second one starts. What exactly happens when two readers read in parallel? Can we have a test case for that?

Contributor Author

OK, I will add one. Also, search mode already uses CarbonRecordReader, and there are some concurrent test cases in org.apache.carbondata.examples.SearchModeExample.
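What "parallel read with two readers" requires can be sketched in a minimal, self-contained way: each reader must hold its own iteration state, so two threads can consume the same data concurrently without interfering. Plain iterators stand in for CarbonReader here; all names are illustrative, not the SDK API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: two independent readers over the same rows,
// consumed concurrently. Each thread creates its own iterator (its own
// read position), mirroring two SDK readers opened on one path.
class ParallelReadSketch {
    static int readAll(Iterator<String> reader, AtomicInteger total) {
        int n = 0;
        while (reader.hasNext()) {
            reader.next();
            n++;
        }
        total.addAndGet(n);
        return n;
    }

    static int runTwoReaders(List<String> rows) {
        AtomicInteger total = new AtomicInteger();
        Thread t1 = new Thread(() -> readAll(rows.iterator(), total));
        Thread t2 = new Thread(() -> readAll(rows.iterator(), total));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Each reader sees every row, so the combined count is 2x the data.
        return total.get();
    }
}
```

A shared mutable cache cleared by one reader mid-flight would break this invariant, which is why the fix flushes the cache only on close.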

// Read again
CarbonReader reader2 = CarbonReader
.builder(path, "_temp")
.projection(new String[]{"name", "age"})
Contributor

Add a test case of two sequential reads in which the 2nd reader starts without the 1st reader being closed.

Contributor Author

ok, done
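The test the reviewer asks for can be sketched abstractly: the second reader starts before the first one is closed, and both must still see every row. Plain iterators stand in for CarbonReader; all names below are illustrative.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Abstract sketch of the requested test. The point is that each reader
// keeps independent state, so an unclosed first reader cannot corrupt
// the second reader's view of the data.
class OverlappingReadersSketch {
    static int[] overlappingCounts(List<String> rows) {
        Iterator<String> reader1 = rows.iterator();
        int n1 = 0;
        while (reader1.hasNext()) {
            reader1.next();
            n1++;
        }
        // reader1 is intentionally NOT closed/discarded before reader2 starts.
        Iterator<String> reader2 = rows.iterator();
        int n2 = 0;
        while (reader2.hasNext()) {
            reader2.next();
            n2++;
        }
        return new int[]{n1, n2};
    }
}
```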

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6051/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4892/

@sounakr
Contributor

sounakr commented May 23, 2018

@xubo245 Better to allow "*" as input in the reader projection. This will help the user specify all columns, just like SQL's SELECT *.

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4902/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6062/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5066/

@xubo245
Contributor Author

xubo245 commented May 23, 2018

@sounakr Ok, done

@sounakr
Contributor

sounakr commented May 23, 2018

LGTM

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6069/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4909/

@xubo245
Contributor Author

xubo245 commented May 23, 2018

@jackylk @ravipesala Hello, sounakr gave an LGTM and CI passed. Could you please review and merge it if there is no problem?

@@ -504,7 +505,24 @@ public QueryModel createQueryModel(InputSplit inputSplit, TaskAttemptContext tas
String projectionString = getColumnProjection(configuration);
String[] projectColumns;
if (projectionString != null) {
projectColumns = projectionString.split(",");
if (projectionString.equalsIgnoreCase("*")) {
Contributor

Instead of passing *, I think it is better to add another function to project all columns. You can add projectAllColumns().

Contributor Author

@xubo245 xubo245 May 24, 2018

OK, removed this. I raised a new PR for it: https://github.com/apache/carbondata/pull/2338/files. Please review.
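For illustration, the "*" projection resolution discussed in this thread could look like the following self-contained sketch. The class and method names are hypothetical; in CarbonData this logic sits around CarbonInputFormat#createQueryModel and the SDK builder, and the feature was ultimately moved to a separate PR.

```java
import java.util.Arrays;

// Hypothetical sketch of resolving a projection string against a table's
// full column list, with "*" behaving like SQL's SELECT *.
class ProjectionResolver {
    static String[] resolve(String projectionString, String[] allColumns) {
        if (projectionString == null || projectionString.trim().isEmpty()) {
            // No projection configured: fall back to all columns.
            return allColumns.clone();
        }
        if (projectionString.trim().equals("*")) {
            // "*" projects every column, like SELECT *.
            return allColumns.clone();
        }
        // Otherwise the projection is a comma-separated column list.
        return Arrays.stream(projectionString.split(","))
                     .map(String::trim)
                     .toArray(String[]::new);
    }
}
```

A dedicated projectAllColumns() builder method, as suggested above, would make the same behavior explicit in the API instead of overloading the projection string.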

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5071/

@xubo245 xubo245 force-pushed the CARBONDATA-2491-OutOfBoundAndBadRecord branch from 315d669 to bdc3757 on May 24, 2018 02:04
@xubo245
Contributor Author

xubo245 commented May 24, 2018

retest this please

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5082/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6084/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6081/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4921/

@xubo245
Contributor Author

xubo245 commented May 24, 2018

retest this please

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4925/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4930/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6091/

@xubo245
Contributor Author

xubo245 commented May 24, 2018

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6093/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4932/

@xubo245
Contributor Author

xubo245 commented May 24, 2018

@jackylk CI passed, please check.

@kunal642
Contributor

LGTM

@asfgit asfgit closed this in a7ac656 May 24, 2018
asfgit pushed a commit that referenced this pull request Jun 5, 2018
…nReader

This PR includes:
1. Fix the error out of bound when reader read twice with SDK carbonReader
2. Fix the java.lang.NegativeArraySizeException
3. Add timestamp and bad record test case
4. support parallel read of two readers

This closes #2318
sv71294 pushed a commit to sv71294/carbondata that referenced this pull request Jun 22, 2018

This closes apache#2318
anubhav100 pushed a commit to anubhav100/incubator-carbondata that referenced this pull request Jun 22, 2018

This closes apache#2318
7 participants