
[CARBONDATA-2809][DataMap] Block rebuilding for bloom/lucene and preagg datamap #2594

Closed (2 commits)

Conversation

@xuchuanyin (Contributor) commented Aug 1, 2018:

Problems & Root Cause:

For a non-lazy index datamap (i.e. not deferred rebuild, which is the default), the datamap data is generated immediately after:

  1. the datamap is created, or
  2. the main table is loaded.

So there is no need to rebuild such a datamap. In fact, triggering a rebuild for these datamaps fails because the old datamap data already exists.

The situation is the same for the pre-aggregate datamap.

Solution:

In CarbonData 1.4.1 we block rebuilding for the bloom/lucene and pre-aggregate datamaps, and we also block creating them with 'deferred rebuild'.
Later we will optimize full rebuilding and incremental rebuilding for these datamaps.
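
For context, here is a minimal, self-contained sketch of the kind of guard described above. The trait, case classes, object and method below are placeholders invented for this example, not the actual CarbonData classes; only the guard condition mirrors this PR.

// Illustrative sketch: an explicit rebuild is rejected for datamaps that are
// populated automatically when the datamap is created or the main table is loaded.
trait DataMapProvider
case class IndexDataMapProvider() extends DataMapProvider        // stands in for bloomfilter/lucene
case class PreAggregateDataMapProvider() extends DataMapProvider
case class DataMapSchema(dataMapName: String, isLazy: Boolean)

object RebuildGuard {
  def checkRebuildAllowed(schema: DataMapSchema, provider: DataMapProvider): Unit = {
    val autoRefreshed = (!schema.isLazy && provider.isInstanceOf[IndexDataMapProvider]) ||
      provider.isInstanceOf[PreAggregateDataMapProvider]
    if (autoRefreshed) {
      // fail fast with a clear message instead of hitting errors caused by old datamap data
      throw new UnsupportedOperationException(
        s"Non-lazy datamap ${schema.dataMapName} does not support manual rebuild")
    }
  }
}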

Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
    NO
  • Any backward compatibility impacted?
    NO
  • Document update required?
    NO
  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    NO
    - How it is tested? Please attach test report.
    Tested locally
    - Is it a performance related change? Please attach the performance test report.
    NO
    - Any additional information to help reviewers in testing this change.
    NA
  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA

@CarbonDataQA:

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7706/

@CarbonDataQA:

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6432/

@ravipesala (Contributor):

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6098/

@xuchuanyin xuchuanyin changed the title [CARBONDATA-2809][DataMap] Skip rebuilding for non-lazy datamap [CARBONDATA-2809][DataMap] Skip rebuilding for non-lazy index datamap Aug 2, 2018
@ravipesala (Contributor):

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6108/

@CarbonDataQA:

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7715/

@CarbonDataQA:

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6441/

Review thread on the following change:

// for non-lazy index datamap, the data of datamap will be generated immediately after
// the datamap is created or the main table is loaded, so there is no need to
// rebuild this datamap.
if (!schema.isLazy && provider.isInstanceOf[IndexDataMapProvider]) {
Contributor: Even if it is another datamap, like the pre-aggregate datamap, we should not rebuild it, right?

Contributor (Author): For MV, the current implementation requires a rebuild. For preagg, I'm not sure about its implementation, so I left it as it is.

Contributor: Right now a rebuild call on a pre-aggregate datamap throws "NoSuchDataMapException". Please handle it to give a correct message, since rebuild is not required for pre-aggregate datamaps either.

Contributor (Author): OK.

@xuchuanyin xuchuanyin changed the title [CARBONDATA-2809][DataMap] Skip rebuilding for non-lazy index datamap [CARBONDATA-2809][DataMap] Block rebuilding for bloom/lucene and preagg datamap Aug 3, 2018
@CarbonDataQA:

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6489/

@CarbonDataQA:

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7765/

@ravipesala (Contributor):

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6147/

@ravipesala (Contributor):

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6148/

@ravipesala (Contributor):

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6177/

As manual refresh currently only works correctly for MV and has bugs with
other types of datamap such as preaggregate, timeseries, lucene and
bloomfilter, we block 'deferred rebuild' for those types and also block
the rebuild command for them.
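
For illustration, a small self-contained sketch of the create-time check described in this commit; the provider-name strings, object and method names below are assumptions for this example, not the actual CarbonData API.

// Illustrative sketch: reject CREATE DATAMAP ... WITH DEFERRED REBUILD for datamap
// types that do not support deferred rebuild yet.
object DeferredRebuildGuard {
  private val unsupportedForDeferredRebuild =
    Set("preaggregate", "timeseries", "lucene", "bloomfilter")

  def validateDeferredRebuild(providerName: String, deferredRebuild: Boolean): Unit = {
    if (deferredRebuild && unsupportedForDeferredRebuild.contains(providerName.toLowerCase)) {
      throw new UnsupportedOperationException(
        s"DEFERRED REBUILD is not supported on $providerName datamap")
    }
  }
}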
@ravipesala (Contributor):

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6178/

@CarbonDataQA:

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7799/

@CarbonDataQA:

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6523/

@ravipesala (Contributor):

@xuchuanyin Please check MVTests; it is failing.

The MV datamap will be rebuilt in deferred mode no matter whether the deferred
flag is set or not.
@xuchuanyin (Contributor, Author):

@ravipesala
Fixed.
The root cause is that MV is actually 'deferred rebuild', but we did not specify that while creating the datamap.
To keep the behavior consistent, we now enable 'deferred rebuild' for the MV datamap no matter whether the user sets the flag or not.
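
For clarity, a tiny self-contained sketch of the fix described above (the provider name "mv", object and method names are placeholders, not the actual patch): the deferred-rebuild flag for MV is forced to true regardless of the user's input.

// Illustrative sketch: MV is always treated as deferred rebuild, other datamap
// types keep whatever the user specified.
object MvLazyFlagResolver {
  def resolveDeferredFlag(providerName: String, userAskedDeferred: Boolean): Boolean =
    if (providerName.equalsIgnoreCase("mv")) true else userAskedDeferred
}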

@ravipesala (Contributor):

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6182/

@CarbonDataQA:

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7803/

@CarbonDataQA:

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6527/

@brijoobopanna (Contributor):

retest this please

@CarbonDataQA:

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7806/

@CarbonDataQA:

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6530/

@jackylk (Contributor) commented Aug 7, 2018:

LGTM

@asfgit closed this in abcd4f6 on Aug 7, 2018
asfgit pushed a commit that referenced this pull request Aug 9, 2018
[CARBONDATA-2809][DataMap] Block rebuilding for bloom/lucene and preagg datamap

As manual refresh currently only works correctly for MV and has bugs with
other types of datamap such as preaggregate, timeseries, lucene and
bloomfilter, we block 'deferred rebuild' for those types and also block
the rebuild command for them.

Fix bugs in deferred rebuild for MV

The MV datamap will be rebuilt in deferred mode no matter whether the deferred
flag is set or not.

This closes #2594