[CARBONDATA-2484][LUCENE] Refactor distributable code and launch job to clear the datamap from executor (clears segmentMap and removes datamap from cache) #2310
Conversation
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5898/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4744/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4932/
retest this please
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5907/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4754/
public DataMapExprWrapper getAllDataMapsForClear(CarbonTable carbonTable)
    throws IOException {
  List<TableDataMap> allDataMapFG =
      DataMapStoreManager.getInstance().getAllVisibleDataMap(carbonTable);
Not just the visible ones; you should get all datamaps.
public CarbonTable getCarbonTable(AbsoluteTableIdentifier identifier) {
  CarbonTable carbonTable = null;
  try {
    carbonTable = CarbonTable
First try getting the table from the cache using CarbonMetadata.getInstance().getCarbonTable(dbName, tableName); if it cannot be found there, read it from disk.
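The cache-first lookup the reviewer suggests could be sketched as follows; the class and method names below are illustrative stand-ins (a map for the metadata cache, a dummy disk read), not CarbonData's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the suggested cache-first lookup: consult the in-memory
// metadata cache first, and only fall back to reading the schema from disk
// when the table is absent. Names here are illustrative, not CarbonData's.
class TableLookup {
    private final Map<String, String> cache = new HashMap<>();

    void put(String key, String table) { cache.put(key, table); }

    String getCarbonTable(String dbName, String tableName) {
        String key = dbName + "_" + tableName;
        String table = cache.get(key);      // 1) try the cache first
        if (table == null) {
            table = readFromDisk(key);      // 2) fall back to disk
            cache.put(key, table);          //    and remember the result
        }
        return table;
    }

    private String readFromDisk(String key) {
        return "table:" + key;              // stand-in for a schema-file read
    }
}
```

The point of the comment is simply ordering: the disk read only happens on a cache miss.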
}
DistributableDataMapFormat dataMapFormat =
    createDataMapJob(carbonTable, dataMapExprWrapper, validSegments, null, className, true);
dataMapJob.execute((DistributableDataMapFormat) dataMapFormat, null);
No need to typecast
} else {
  return;
}
DistributableDataMapFormat dataMapFormat =
What if dataMapExprWrapper is null?
dataMapExprWrapper will be null if the table does not have datamaps; that check is already there.
So won't it fail in line 100 if you don't do a null check?
No, if there are no datamaps present, it will return.
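The early-return guard being discussed can be modeled roughly like this; the class and method names are hypothetical, not the actual method under review:

```java
// Sketch of the guard: when the table has no datamaps the expression
// wrapper is null, and the method returns before building the
// DistributableDataMapFormat or launching the clear job.
// Names are illustrative, not the exact CarbonData signatures.
class ClearJobGuard {
    static boolean executeClearJob(Object dataMapExprWrapper) {
        if (dataMapExprWrapper == null) {
            return false;   // no datamaps: nothing to clear, bail out early
        }
        // ... build the distributable format and launch the clear job ...
        return true;
    }
}
```

With this shape, the later code that dereferences the wrapper is only reached when it is non-null, which is the point both reviewers converge on.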
  }
}

public static void setDataMapJob(Configuration configuration, Object dataMapJob)
Add comments to all public methods
dataMapJob.execute((DistributableDataMapFormat) dataMapFormat, null);
}

public static DistributableDataMapFormat createDataMapJob(CarbonTable carbonTable,
Make it private
 * @param className
 * @return
 */
public static Object createDataMapJob(String className) {
Make it private
This will be called from CarbonInputFormatUtil in setDataMapJobIfConfigured, so we can keep it public.
DistributableDataMapFormat(CarbonTable table,
    DataMapExprWrapper dataMapExprWrapper, List<Segment> validSegments,
    List<PartitionSpec> partitions, String className) {
Remove className; it seems unused.
TableDataMap dataMap = DataMapStoreManager.getInstance()
distributable = (DataMapDistributableWrapper) inputSplit;
// clear the segmentMap and cache in executor when there are invalid segments
SegmentStatusManager ssm = new SegmentStatusManager(table.getAbsoluteTableIdentifier());
Don't read the table status file in the executor; pass the invalid segments from the driver.
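The driver-side approach being suggested might look roughly like this: compute the invalid segments once on the driver (which has already read the table status) and ship the list to the executors, so each executor never opens the status file itself. All names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the reviewer's suggestion: the driver derives the invalid
// segment ids from the status it has already read, and that list travels
// to executors inside the serialized input format. Executors only consume
// the list; they never touch the table status file.
class InvalidSegmentsExample {
    // Computed once, on the driver: everything not valid is invalid.
    static List<String> invalidSegments(List<String> allSegments, List<String> validSegments) {
        List<String> invalid = new ArrayList<>(allSegments);
        invalid.removeAll(validSegments);
        return invalid;
    }
}
```

Besides saving redundant file reads, this avoids a race where executors see a status file version different from the one the driver planned against.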
@@ -162,14 +162,16 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
 * Get all distributable objects of a segmentid
 */
@Override
public List<DataMapDistributable> toDistributable(Segment segment) {
public List<DataMapDistributable> toDistributable(Segment segment, boolean isJobToClearDataMaps) {
Don't change the datamap interface; if segment.getFilteredIndexShardNames() is null, then get all.
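The reviewer's alternative, keeping the single-argument toDistributable signature and treating a null shard-name filter as "all shards", could be sketched like this (class and method names are hypothetical):

```java
import java.util.List;

// Sketch of the null-means-all convention: instead of adding a
// boolean isJobToClearDataMaps parameter to the interface, the
// implementation interprets a null filter as "distribute every shard".
// Names are illustrative, not CarbonData's actual API.
class ShardFilter {
    static List<String> shardsToScan(List<String> allShards, List<String> filteredShards) {
        // null filter means no restriction: return every shard
        return (filteredShards == null) ? allShards : filteredShards;
    }
}
```

The benefit is that every existing implementer of the interface compiles unchanged; only the one implementation that cares about the clear job needs the convention.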
@@ -123,7 +123,8 @@ public DataMapBuilder createBuilder(Segment segment, String shardName)
 * @param segment
 * @return
 */
@Override public List<DataMapDistributable> toDistributable(Segment segment) {
@Override public List<DataMapDistributable> toDistributable(Segment segment,
Don't change interface
 * @param conf
 * @throws IOException
 */
public static void setDataMapJobIfConfigured(Configuration conf) throws IOException {
I don't think this method is required here; let it stay in the input format.
51e80e7 to 0618255
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5932/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4777/
460b503 to dcd61f3
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4962/
dcd61f3 to 3875025
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5935/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4780/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5937/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4782/
3875025 to cd004d1
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4965/
cd004d1 to e598736
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5941/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4967/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4786/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4968/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4789/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5945/
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4972/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4974/
… cache from executor
e598736 to 59fb6b1
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4984/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4798/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5954/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5955/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4799/
LGTM
Problem:
During query, BlockletDataMapFactory maintains a segmentMap that maps segmentId -> list of index files; this is used while getting the extended blocklet, by checking whether the blocklet is present in the index or not.
In the Lucene case, the datamap job is launched and, during pruning, entries are added to the segmentMap on the executor. This map is cleared on the driver when drop table is called, but it is not cleared on the executor, so a Lucene query fired after the table or datamap is dropped fails.
Solution:
When drop table or drop datamap is called, a job is launched that clears the datamaps from the segmentMap and the cache on the executors, and then clears them on the driver.
This PR also refactors the datamap job classes and other common classes.
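The fix described above can be modeled with a small sketch: an executor-side holder of the segmentMap and datamap cache that the clear job empties on drop table / drop datamap. Field and method names here are illustrative, not CarbonData's actual ones:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Rough model of the executor state this PR targets: a segmentId ->
// index-file mapping populated during pruning, plus a datamap cache.
// The launched clear job removes both on each executor before the
// driver clears its own copies. Names are illustrative.
class ExecutorDataMapState {
    private final Map<String, List<String>> segmentMap = new HashMap<>();
    private final Map<String, Object> dataMapCache = new HashMap<>();

    // called during pruning, when a segment's index files are loaded
    void register(String segmentId, List<String> indexFiles, Object dataMap) {
        segmentMap.put(segmentId, indexFiles);
        dataMapCache.put(segmentId, dataMap);
    }

    // invoked by the clear job on each executor for every segment
    void clear(String segmentId) {
        segmentMap.remove(segmentId);
        dataMapCache.remove(segmentId);
    }

    boolean isEmpty() {
        return segmentMap.isEmpty() && dataMapCache.isEmpty();
    }
}
```

Without the executor-side clear, a query after drop would still consult stale segmentMap entries, which is the failure the PR describes.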
Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:
Any interfaces changed?
YES
Any backward compatibility impacted?
NA
Document update required?
NA
Testing done
tested in cluster
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA