
[CARBONDATA-2491] Fix the error when reader read twice with SDK carbonReader #2318

Closed


xubo245
Contributor

@xubo245 xubo245 commented May 18, 2018

This PR includes:

  1. Fix the out-of-bounds error when the reader reads twice with the SDK CarbonReader
  2. Fix the java.lang.NegativeArraySizeException
  3. Add timestamp and bad-record test cases
  4. Support parallel read with two readers

How the error was fixed:
Carbon throws the first exception because fileName is "" in org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil#getTaskNo. I found that filePath is "" in prunedBlocklets of org.apache.carbondata.hadoop.api.CarbonInputFormat#getDataBlocksOfSegment, so it is a data map issue; specifically, it is an UNSAFE (unsafe-memory) issue.

Carbon throws the second exception because the length is less than 0 in org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow#convertToSafeRow.

So this PR clears the datamap cache by calling
DataMapStoreManager.getInstance().getDefaultDataMap(queryModel.getTable()).clear();
in org.apache.carbondata.hadoop.CarbonRecordReader#close.
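The cache-flush pattern behind this fix can be illustrated with a minimal, self-contained sketch. The class and method names below (DataMapCache, RecordReader) are illustrative stand-ins, not the actual CarbonData API; the real call is the DataMapStoreManager line quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the fix's pattern; in CarbonData the real classes
// are DataMapStoreManager, TableDataMap and CarbonRecordReader.
class DataMapCache {
    private final Map<String, List<String>> cache = new HashMap<>();

    List<String> getDataMap(String table) {
        // Stale entries here are what caused the out-of-bounds and
        // NegativeArraySizeException on the second read: the unsafe memory
        // backing the cached rows had already been freed.
        return cache.computeIfAbsent(table, t -> new ArrayList<>());
    }

    void clear(String table) {
        cache.remove(table);
    }
}

class RecordReader implements AutoCloseable {
    private final DataMapCache cache;
    private final String table;

    RecordReader(DataMapCache cache, String table) {
        this.cache = cache;
        this.table = table;
        // Populate the cache on open, standing in for datamap loading.
        cache.getDataMap(table).add("blocklet-of-" + table);
    }

    int readCount() {
        return cache.getDataMap(table).size();
    }

    @Override
    public void close() {
        // The PR's fix: flush the cached datamap when the reader closes,
        // so a second reader rebuilds it instead of reading freed memory.
        cache.clear(table);
    }
}
```

With this shape, a second reader opened after the first one closes rebuilds its own cache entry rather than reusing a stale one.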

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    No
  • Any backward compatibility impacted?
    No
  • Document update required?
    No
  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.
    No
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    No

@xubo245 xubo245 force-pushed the CARBONDATA-2491-OutOfBoundAndBadRecord branch from 19043b6 to 3bcd4a0 on May 18, 2018 07:47
@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4986/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4802/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5959/

@xubo245
Contributor Author

xubo245 commented May 18, 2018

retest this please

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4805/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5963/

@xubo245
Contributor Author

xubo245 commented May 18, 2018

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5975/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4818/

@@ -267,6 +267,8 @@ public TableDataMap getDataMap(CarbonTable table, DataMapSchema dataMapSchema) {
}
}
}
} else {
dataMap.clear();
Contributor

Why is it required? Please add a comment.

Contributor

I think it defeats the purpose of the cache if you clear it on every retrieval. clear should be called only to flush the cache. Why do you need to flush the cache on every call?

Contributor Author

I moved it into the CarbonReader close method.

public class AvroCarbonWriterTest {
private String path = "./AvroCarbonWriterSuiteWriteFiles";

@Before
public void cleanFile() {
assert (TestUtil.cleanMdtFile());
Contributor

@jackylk jackylk May 19, 2018

Is there another PR to remove the creation of the system folder when the user uses the SDK to write data?
When writing carbondata with the SDK, we should not generate the system folder.
@ravipesala check this please

Contributor

I think PR 2246 can solve it.

Contributor Author

PR 2246 fixes a different problem; it can't solve this one. @ravipesala

Contributor Author

@jackylk This PR depends on PR 2246; the first commit is cherry-picked from PR 2246.
After PR 2246 is merged, this PR needs a rebase.

@xubo245 xubo245 force-pushed the CARBONDATA-2491-OutOfBoundAndBadRecord branch from 3bcd4a0 to 53f9522 on May 21, 2018 01:57
@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5985/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4827/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5007/

@xubo245
Contributor Author

xubo245 commented May 21, 2018

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5987/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4829/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6031/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4872/

@xubo245 xubo245 force-pushed the CARBONDATA-2491-OutOfBoundAndBadRecord branch from 5c4e40c to 2bcb108 on May 22, 2018 06:47
@sraghunandan
Contributor

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6046/

@xubo245
Contributor Author

xubo245 commented May 22, 2018

retest this please

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4887/

@@ -77,6 +85,24 @@ public void testWriteAndReadFiles() throws IOException, InterruptedException {
Assert.assertEquals(i, 100);

reader.close();

Contributor

This test case covers sequential reads: one reader gets closed and then the second one starts. What exactly happens when two readers read in parallel? Can we have a test case for that?

Contributor Author

OK, I will add one. Also, search mode already uses CarbonRecordReader, and there are some concurrent test cases in org.apache.carbondata.examples.SearchModeExample.
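What "parallel read with two readers" requires can be sketched in a minimal, self-contained way: each reader must hold its own iteration state, so two threads can consume the same data concurrently without interfering. Plain iterators stand in for CarbonReader here; all names are illustrative, not the SDK API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: two independent readers over the same rows,
// consumed concurrently. Each thread creates its own iterator (its own
// read position), mirroring two SDK readers opened on one path.
class ParallelReadSketch {
    static int readAll(Iterator<String> reader, AtomicInteger total) {
        int n = 0;
        while (reader.hasNext()) {
            reader.next();
            n++;
        }
        total.addAndGet(n);
        return n;
    }

    static int runTwoReaders(List<String> rows) {
        AtomicInteger total = new AtomicInteger();
        Thread t1 = new Thread(() -> readAll(rows.iterator(), total));
        Thread t2 = new Thread(() -> readAll(rows.iterator(), total));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Each reader sees every row, so the combined count is 2x the data.
        return total.get();
    }
}
```

A shared mutable cache cleared by one reader mid-flight would break this invariant, which is why the fix flushes the cache only on close.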

// Read again
CarbonReader reader2 = CarbonReader
.builder(path, "_temp")
.projection(new String[]{"name", "age"})
Contributor

Add a test case of two sequential reads in which the 2nd reader starts without the 1st reader being closed.

Contributor Author

ok, done
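The test the reviewer asks for can be sketched abstractly: the second reader starts before the first one is closed, and both must still see every row. Plain iterators stand in for CarbonReader; all names below are illustrative.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Abstract sketch of the requested test. The point is that each reader
// keeps independent state, so an unclosed first reader cannot corrupt
// the second reader's view of the data.
class OverlappingReadersSketch {
    static int[] overlappingCounts(List<String> rows) {
        Iterator<String> reader1 = rows.iterator();
        int n1 = 0;
        while (reader1.hasNext()) {
            reader1.next();
            n1++;
        }
        // reader1 is intentionally NOT closed/discarded before reader2 starts.
        Iterator<String> reader2 = rows.iterator();
        int n2 = 0;
        while (reader2.hasNext()) {
            reader2.next();
            n2++;
        }
        return new int[]{n1, n2};
    }
}
```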

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6051/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4892/

@sounakr
Contributor

sounakr commented May 23, 2018

@xubo245 Better to allow "*" as input in the reader projection. This will help the user specify all columns, just like SQL's SELECT *.

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4902/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6062/

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5066/

@xubo245
Contributor Author

xubo245 commented May 23, 2018

@sounakr Ok, done

@sounakr
Contributor

sounakr commented May 23, 2018

LGTM

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6069/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4909/

@xubo245
Contributor Author

xubo245 commented May 23, 2018

@jackylk @ravipesala Hello, sounakr gave an LGTM and CI passed. Could you please review and merge it if there is no problem?

@@ -504,7 +505,24 @@ public QueryModel createQueryModel(InputSplit inputSplit, TaskAttemptContext tas
String projectionString = getColumnProjection(configuration);
String[] projectColumns;
if (projectionString != null) {
projectColumns = projectionString.split(",");
if (projectionString.equalsIgnoreCase("*")) {
Contributor

Instead of passing *, I think it is better to add another function to project all columns. You can add projectAllColumns().

Contributor Author

@xubo245 xubo245 May 24, 2018

OK, removed this. I raised a new PR for it: https://github.com/apache/carbondata/pull/2338/files. Please review.
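For illustration, the "*" projection resolution discussed in this thread could look like the following self-contained sketch. The class and method names are hypothetical; in CarbonData this logic sits around CarbonInputFormat#createQueryModel and the SDK builder, and the feature was ultimately moved to a separate PR.

```java
import java.util.Arrays;

// Hypothetical sketch of resolving a projection string against a table's
// full column list, with "*" behaving like SQL's SELECT *.
class ProjectionResolver {
    static String[] resolve(String projectionString, String[] allColumns) {
        if (projectionString == null || projectionString.trim().isEmpty()) {
            // No projection configured: fall back to all columns.
            return allColumns.clone();
        }
        if (projectionString.trim().equals("*")) {
            // "*" projects every column, like SELECT *.
            return allColumns.clone();
        }
        // Otherwise the projection is a comma-separated column list.
        return Arrays.stream(projectionString.split(","))
                     .map(String::trim)
                     .toArray(String[]::new);
    }
}
```

A dedicated projectAllColumns() builder method, as suggested above, would make the same behavior explicit in the API instead of overloading the projection string.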

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5071/

@xubo245 xubo245 force-pushed the CARBONDATA-2491-OutOfBoundAndBadRecord branch from 315d669 to bdc3757 on May 24, 2018 02:04
@xubo245
Contributor Author

xubo245 commented May 24, 2018

retest this please

@ravipesala
Contributor

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5082/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6084/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6081/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4921/

@xubo245
Contributor Author

xubo245 commented May 24, 2018

retest this please

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4925/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4930/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6091/

@xubo245
Contributor Author

xubo245 commented May 24, 2018

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6093/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4932/

@xubo245
Contributor Author

xubo245 commented May 24, 2018

@jackylk CI passed, please check.

@kunal642
Contributor

LGTM

@asfgit asfgit closed this in a7ac656 May 24, 2018
asfgit pushed a commit that referenced this pull request Jun 5, 2018
…nReader

This PR includes:
1. Fix the error out of bound when reader read twice with SDK carbonReader
2. Fix the java.lang.NegativeArraySizeException
3. Add timestamp and bad record test case
4. support parallel read of two readers

This closes #2318
sv71294 pushed a commit to sv71294/carbondata that referenced this pull request Jun 22, 2018

This closes apache#2318
anubhav100 pushed a commit to anubhav100/incubator-carbondata that referenced this pull request Jun 22, 2018

This closes apache#2318
7 participants