[WIP][CARBONDATA-3935]Support partition table transactional write in presto #3916

Open
wants to merge 1 commit into
base: master
from

akashrn5
Contributor

@akashrn5 akashrn5 commented Sep 8, 2020

Why is this PR needed?

Currently, Presto can only read tables that were created in Spark. This is a bottleneck; transactional writes are required in Presto so that data can be both written and read through Presto.

What changes were proposed in this PR?

This PR is on top of #3875.
This PR supports writing transactional data to partitioned tables in Presto, including tables with multiple partition columns.

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2270/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4010/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2404/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4145/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2454/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4197/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2564/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4314/

@QiangCai
Contributor

please do rebase

@ajantha-bhat
Member

@akashrn5: please rebase, as the Presto write PR is merged.

@akashrn5
Contributor Author

akashrn5 commented Oct 27, 2020

@ajantha-bhat @QiangCai I have rebased, please have a look.

@@ -369,7 +369,8 @@ private CarbonCommonConstants() {
public static final String CARBON_MERGE_INDEX_IN_SEGMENT =
"carbon.merge.index.in.segment";

public static final String CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT = "true";
// TODO: revert this after proper fix in this PR
Member

please revert this

String tableFactLocation =
carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().getTablePath();
List<CarbonFile> carbonFiles =
FileFactory.getCarbonFile(tableFactLocation).listFiles(true, new CarbonFileFilter() {
Member

Listing the files of the whole table can be very slow when multiple segments are present; we also list the segments of previous loads here. I think we need to keep a list of the index/merge index files created for the current load in memory and write it to the segment file here.
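
To illustrate the suggestion above, here is a minimal sketch of tracking index files in memory per load, so that the commit step does not need to list the whole table directory. `LoadIndexFileTracker`, `recordIndexFile`, and `drainIndexFiles` are hypothetical names made up for this example and are not CarbonData APIs; how the writers would report their index files, and how the result is written into the segment file, is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not part of CarbonData): remembers the index files
// produced by each load so the commit step can skip a full table listing.
final class LoadIndexFileTracker {
  // key: load (segment) id, value: index file paths written by that load
  private final Map<String, List<String>> indexFilesByLoad = new ConcurrentHashMap<>();

  // Called by a writer when it finishes an index or merge index file.
  void recordIndexFile(String loadId, String indexFilePath) {
    indexFilesByLoad
        .computeIfAbsent(loadId, k -> Collections.synchronizedList(new ArrayList<String>()))
        .add(indexFilePath);
  }

  // Called once at commit time: returns the files of this load only and
  // forgets them, so a later load never sees files from an earlier one.
  List<String> drainIndexFiles(String loadId) {
    List<String> files = indexFilesByLoad.remove(loadId);
    if (files == null) {
      return new ArrayList<String>();
    }
    return new ArrayList<String>(files);
  }
}
```

With something like this, the cost of the commit step depends on the number of files written by the current load rather than on the number of segments already in the table.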

context.getConfiguration().set("carbon.outputformat.writepath", finalOutPath.toString());
String[] outputPathSplits = finalOutPath.toString().split("/");
StringBuilder partitionDirs = new StringBuilder();
for (int i = partitionColumn; i > 0; i--) {
Member

So, how is the carbondata-hive partition write working?
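
For readers following the fragment under review: the loop body is not shown in the diff, so the following is only a sketch of what such a loop might do, assuming the partition directories are the trailing `key=value` segments of the output path (Hive-style layout). `PartitionPathSketch` and `trailingPartitionDirs` are hypothetical names; only the split on `/` and the countdown from the partition column count are taken from the fragment.

```java
// Hypothetical sketch (not the PR's code): recover the relative partition
// directory from the last N segments of the writer's output path.
final class PartitionPathSketch {
  static String trailingPartitionDirs(String outputPath, int partitionColumnCount) {
    String[] parts = outputPath.split("/");
    if (partitionColumnCount < 0 || partitionColumnCount > parts.length) {
      throw new IllegalArgumentException("invalid partition column count: " + partitionColumnCount);
    }
    StringBuilder partitionDirs = new StringBuilder();
    // Walk the last N segments in path order, e.g. country=IN then year=2020.
    for (int i = partitionColumnCount; i > 0; i--) {
      if (partitionDirs.length() > 0) {
        partitionDirs.append('/');
      }
      partitionDirs.append(parts[parts.length - i]);
    }
    return partitionDirs.toString();
  }
}
```

For example, `trailingPartitionDirs("/warehouse/db/tbl/country=IN/year=2020", 2)` yields `country=IN/year=2020`. This approach depends on the path layout; it would break if anything other than partition directories followed the table path.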

private val prestoServer = new PrestoServer

override def beforeAll: Unit = {
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
Member

Remove the "written by" setting; it has to be taken care of internally.

@ajantha-bhat
Member

@akashrn5
a) Please update the PR description to explain which problems existed and which changes were made to support this. It is not clear at the moment.
b) Please confirm here whether the cluster test has passed.

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4700/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2943/

@akashrn5 akashrn5 changed the title [CARBONDATA-3935]Support partition table transactional write in presto [WIP][CARBONDATA-3935]Support partition table transactional write in presto Oct 29, 2020
4 participants