
[CARBONDATA-2484][LUCENE]Refactor distributable code and launch job to clear the datamap from executor(clears segmentMap and remove datamap from cache) #2310

Closed
wants to merge 1 commit

Conversation

@akashrn5 (Contributor) commented May 15, 2018

Problem:
During a query, BlockletDataMapFactory maintains a segmentMap holding a mapping of
segmentId -> list of index files, which is used while getting the extended blocklet,
to check whether the blocklet is present in the index or not.
In the case of Lucene, the datamap job is launched and, during pruning, the segmentMap is populated
in the executor. This map is cleared in the driver when drop table is called, but it is not cleared
in the executor, so a Lucene query fired after the table or datamap is dropped fails.

Solution:
When drop table or drop datamap is called, a job is launched that clears the datamaps from the
segmentMap and the cache on the executors, and then clears them in the driver.

This PR also refactors the datamap job classes and other common classes.
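
At a high level, the drop flow looks like the sketch below (a rough sketch only: the DataMapJob and DistributableDataMapFormat names come from the diff excerpts reviewed further down, while clearDataMaps and the method shape are assumptions, not the final API; carbondata imports are elided because this PR moves these classes between packages):

```java
import java.io.IOException;

public final class ClearDataMapFlowSketch {

  /**
   * Invoked on drop table / drop datamap: clear executor-side state first
   * via a distributed job, then clear the driver-side state.
   */
  public static void onDrop(CarbonTable carbonTable, DataMapJob dataMapJob,
      DistributableDataMapFormat dataMapFormat) throws IOException {
    // 1. Every executor drops its segmentMap entries for this table and
    //    removes the table's datamaps from its cache.
    dataMapJob.execute(dataMapFormat, null);
    // 2. Only then clear the driver, so a racing query cannot observe
    //    executors that still hold stale datamap state.
    DataMapStoreManager.getInstance()
        .clearDataMaps(carbonTable.getAbsoluteTableIdentifier());
  }
}
```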

Be sure to complete all items in the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    YES

  • Any backward compatibility impacted?
    NA

  • Document update required?
    NA

  • Testing done
Tested in cluster.
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA

@akashrn5 akashrn5 changed the title [WIP]refactor distributable code and launch job to clear the segmentmap and cache from executor [WIP]Refactor distributable code and launch job to clear the datamap from executor(clears segmentMap and remove datamap from cache) May 15, 2018
@akashrn5 akashrn5 changed the title [WIP]Refactor distributable code and launch job to clear the datamap from executor(clears segmentMap and remove datamap from cache) [CARBONDATA-2484][LUCENE]Refactor distributable code and launch job to clear the datamap from executor(clears segmentMap and remove datamap from cache) May 15, 2018
@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5898/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4744/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4932/

@akashrn5 (Author)

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5907/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4754/

public DataMapExprWrapper getAllDataMapsForClear(CarbonTable carbonTable)
    throws IOException {
  List<TableDataMap> allDataMapFG =
      DataMapStoreManager.getInstance().getAllVisibleDataMap(carbonTable);
Contributor:

Not just visible ones; you should get all datamaps.
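
A minimal sketch of the suggested change, assuming DataMapStoreManager exposes getAllDataMap(CarbonTable) that also returns invisible datamaps (the wrapping helper is hypothetical):

```java
public DataMapExprWrapper getAllDataMapsForClear(CarbonTable carbonTable)
    throws IOException {
  // Fetch every datamap on the table, not only the visible ones, so that
  // invisible datamaps are also cleared from the executor caches.
  List<TableDataMap> allDataMaps =
      DataMapStoreManager.getInstance().getAllDataMap(carbonTable);
  return wrapForClearJob(allDataMaps);  // hypothetical helper
}
```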

public CarbonTable getCarbonTable(AbsoluteTableIdentifier identifier) {
  CarbonTable carbonTable = null;
  try {
    carbonTable = CarbonTable
Contributor:

First try getting the table from the cache using CarbonMetadata.getInstance().getCarbonTable(dbName, tableName); if it cannot be found there, then read from disk.
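
A sketch of the cache-first lookup (method shape assumed; readTableFromDisk is a hypothetical stand-in for the existing schema-file read path):

```java
public CarbonTable getCarbonTable(AbsoluteTableIdentifier identifier) {
  // 1. Try the in-memory metadata cache first.
  CarbonTable carbonTable = CarbonMetadata.getInstance().getCarbonTable(
      identifier.getCarbonTableIdentifier().getDatabaseName(),
      identifier.getCarbonTableIdentifier().getTableName());
  if (carbonTable != null) {
    return carbonTable;
  }
  // 2. Cache miss: fall back to reading the table schema from disk.
  return readTableFromDisk(identifier);  // hypothetical helper
}
```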

}
DistributableDataMapFormat dataMapFormat =
    createDataMapJob(carbonTable, dataMapExprWrapper, validSegments, null, className, true);
dataMapJob.execute((DistributableDataMapFormat) dataMapFormat, null);
Contributor:

No need to typecast
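
Since dataMapFormat is already declared as DistributableDataMapFormat, the redundant cast can simply be dropped:

```java
dataMapJob.execute(dataMapFormat, null);
```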

} else {
  return;
}
DistributableDataMapFormat dataMapFormat =
Contributor:

What if dataMapExprWrapper is null?

Contributor Author:

dataMapExprWrapper will be null if the table does not have datamaps; that check is already there.

Contributor:

So won't it fail at line 100 if you don't do a null check?

Contributor Author:

No, if there are no datamaps present, it returns early.
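
The early return being referred to, sketched from the excerpts above (shape assumed):

```java
DataMapExprWrapper dataMapExprWrapper = getAllDataMapsForClear(carbonTable);
if (dataMapExprWrapper == null) {
  // Table has no datamaps: nothing to clear, so the later
  // dereference of the wrapper is never reached.
  return;
}
```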

}
}

public static void setDataMapJob(Configuration configuration, Object dataMapJob)
Contributor:

Add comments to all public methods
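
For instance, a doc comment on the method in the excerpt could look like this (wording illustrative; body elided):

```java
/**
 * Sets the datamap job on the given hadoop configuration so that it can be
 * retrieved later to launch distributed datamap operations such as pruning,
 * or clearing datamaps on drop table / drop datamap.
 *
 * @param configuration hadoop configuration to carry the job
 * @param dataMapJob    the datamap job instance to set
 */
public static void setDataMapJob(Configuration configuration, Object dataMapJob) {
  // ... body as in the PR diff ...
}
```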

dataMapJob.execute((DistributableDataMapFormat) dataMapFormat, null);
}

public static DistributableDataMapFormat createDataMapJob(CarbonTable carbonTable,
Contributor:

Make it private

* @param className
* @return
*/
public static Object createDataMapJob(String className) {
Contributor:

Make it private

Contributor Author:

This will be called from CarbonInputFormatUtil in setDataMapJobIfConfigured, so we can keep it public.

DistributableDataMapFormat(CarbonTable table,
    DataMapExprWrapper dataMapExprWrapper, List<Segment> validSegments,
    List<PartitionSpec> partitions, String className) {
Contributor:

Remove the className parameter; it seems unused.

TableDataMap dataMap = DataMapStoreManager.getInstance()
distributable = (DataMapDistributableWrapper) inputSplit;
// clear the segmentMap and the cache in the executor when there are invalid segments
SegmentStatusManager ssm = new SegmentStatusManager(table.getAbsoluteTableIdentifier());
Contributor:

Don't read the table status file in the executor; pass the invalid segments from the driver.
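
A sketch of the suggestion: resolve the invalid segments once on the driver and ship them inside the serialized input format, so executors never touch the table status file (the constructor shape is assumed, not the final signature):

```java
// Driver side: resolve invalid segments from the table status file once.
SegmentStatusManager ssm = new SegmentStatusManager(table.getAbsoluteTableIdentifier());
List<Segment> invalidSegments =
    ssm.getValidAndInvalidSegments().getInvalidSegments();
// Carry them to the executors inside the input format instead of re-reading there.
DistributableDataMapFormat dataMapFormat = new DistributableDataMapFormat(
    table, dataMapExprWrapper, validSegments, invalidSegments, partitions);
```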

@@ -162,14 +162,16 @@ public DataMapBuilder createBuilder(Segment segment, String shardName) {
  * Get all distributable objects of a segmentid
  */
 @Override
- public List<DataMapDistributable> toDistributable(Segment segment) {
+ public List<DataMapDistributable> toDistributable(Segment segment, boolean isJobToClearDataMaps) {
Contributor:

Don't change the datamap interface; if segment.getFilteredIndexShardNames() is null, then get all.
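
A sketch of the reviewer's alternative, keeping the single-argument signature and deriving "distribute everything" from the segment itself (the two helpers are hypothetical stand-ins for the existing shard iteration):

```java
@Override
public List<DataMapDistributable> toDistributable(Segment segment) {
  Set<String> shardNames = segment.getFilteredIndexShardNames();
  if (shardNames == null || shardNames.isEmpty()) {
    // No shard filter recorded for this segment: distribute all index
    // shards, which also covers the clear-datamaps job.
    return getAllShardDistributables(segment);            // hypothetical helper
  }
  return getShardDistributables(segment, shardNames);     // hypothetical helper
}
```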

@@ -123,7 +123,8 @@ public DataMapBuilder createBuilder(Segment segment, String shardName)
  * @param segment
  * @return
  */
- @Override public List<DataMapDistributable> toDistributable(Segment segment) {
+ @Override public List<DataMapDistributable> toDistributable(Segment segment,
Contributor:

Don't change interface

* @param conf
* @throws IOException
*/
public static void setDataMapJobIfConfigured(Configuration conf) throws IOException {
Contributor:

I don't think this method is required here; let it stay in the input format.

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5932/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4777/

@akashrn5 akashrn5 force-pushed the refactor_clear_datamaps branch 2 times, most recently from 460b503 to dcd61f3 on May 17, 2018 09:31
@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4962/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5935/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4780/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5937/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4782/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4965/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5941/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4967/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4786/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4968/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4789/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5945/

@ravipesala (Contributor)

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4972/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4974/

@ravipesala (Contributor)

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4984/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4798/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5954/

@akashrn5 (Author)

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5955/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4799/

@ravipesala (Contributor)

LGTM

@asfgit closed this in 2018048 on May 18, 2018
anubhav100 pushed a commit to anubhav100/incubator-carbondata that referenced this pull request Jun 22, 2018
… clear the datamap from executor(clears segmentMap and remove datamap from cache)

This closes apache#2310