[WIP][CARBONDATA-3935]Support partition table transactional write in presto #3916
base: master
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2270/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4010/
d9563d8 to ee27100
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2404/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4145/
ee27100 to 812aa22
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2454/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4197/
812aa22 to 6f42b16
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2564/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4314/
Please rebase.
@akashrn5: please rebase, as the Presto write PR is merged.
6f42b16 to 19bc2b1
@ajantha-bhat @QiangCai I have rebased, please have a look.
@@ -369,7 +369,8 @@ private CarbonCommonConstants() {
public static final String CARBON_MERGE_INDEX_IN_SEGMENT =
    "carbon.merge.index.in.segment";

public static final String CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT = "true";
// TODO: revert this after proper fix in this PR
Please revert this.
String tableFactLocation =
    carbonLoadModel.getCarbonDataLoadSchema().getCarbonTable().getTablePath();
List<CarbonFile> carbonFiles =
    FileFactory.getCarbonFile(tableFactLocation).listFiles(true, new CarbonFileFilter() {
Listing the files of the whole table can be very slow when multiple segments are present, and it also lists the segments of previous loads. I think we need to keep a list of the index/merge index files created for the current load in memory and write them to the segment file here.
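One way to read the suggestion above is a small in-memory registry that writers update as they finish each index file, so the commit step never has to list the table directory. The following is only a sketch under that assumption: `IndexFileTracker`, `recordIndexFile`, and `snapshot` are hypothetical names and not part of CarbonData, and how writers would obtain the tracker is left open.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a CarbonData class): remembers the index and
// merge index files written by the current load, keyed by partition path.
public class IndexFileTracker {
  private final Map<String, List<String>> filesByPartition = new ConcurrentHashMap<>();

  // Intended to be called by a writer right after it closes an index file.
  public void recordIndexFile(String partitionPath, String indexFileName) {
    filesByPartition
        .computeIfAbsent(partitionPath, k -> Collections.synchronizedList(new ArrayList<>()))
        .add(indexFileName);
  }

  // Intended to be called once at commit time. Returns a sorted copy so the
  // segment file can be written without listing the table directory.
  public Map<String, List<String>> snapshot() {
    Map<String, List<String>> copy = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : filesByPartition.entrySet()) {
      copy.put(entry.getKey(), new ArrayList<>(entry.getValue()));
    }
    return copy;
  }
}
```

With such a tracker, the cost of the commit step depends only on the files written by the current load rather than on the number of existing segments.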
context.getConfiguration().set("carbon.outputformat.writepath", finalOutPath.toString());
String[] outputPathSplits = finalOutPath.toString().split("/");
StringBuilder partitionDirs = new StringBuilder();
for (int i = partitionColumn; i > 0; i--) {
So, how does the carbondata-hive partition write work?
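For context on the hunk under discussion, the loop appears to collect the trailing components of the output path, one per partition column. The body of the loop is not shown in the diff, so the following is a self-contained sketch of that assumed behaviour; `PartitionPathSketch` and `partitionDirs` are hypothetical names, and the example path is made up.

```java
// Hypothetical sketch, not the PR's code: extracts the last N components of
// an output path, assuming each one is a partition directory ("col=value").
public class PartitionPathSketch {

  static String partitionDirs(String outputPath, int partitionColumnCount) {
    String[] splits = outputPath.split("/");
    if (partitionColumnCount < 0 || partitionColumnCount > splits.length) {
      throw new IllegalArgumentException("invalid partition column count");
    }
    StringBuilder dirs = new StringBuilder();
    // Walk from the outermost partition directory to the innermost one.
    for (int i = partitionColumnCount; i > 0; i--) {
      dirs.append(splits[splits.length - i]);
      if (i > 1) {
        dirs.append("/");
      }
    }
    return dirs.toString();
  }

  public static void main(String[] args) {
    // Made-up path with two partition columns.
    System.out.println(partitionDirs("/warehouse/db/tbl/country=IN/state=KA", 2));
  }
}
```

This approach relies on the partition directories always being the last components of the path, which is the point the question above is probing.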
private val prestoServer = new PrestoServer

override def beforeAll: Unit = {
  CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_WRITTEN_BY_APPNAME,
Remove the written-by property; it has to be taken care of internally.
@akashrn5
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4700/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/2943/
Why is this PR needed?
Currently, Presto can only read tables that were created in Spark. This is a bottleneck: transactional write support is required in Presto so that tables can be both written and read via Presto.
What changes were proposed in this PR?
This PR is on top of #3875.
This PR supports writing partitioned transactional data in Presto, including tables with multiple partition columns.
Does this PR introduce any user interface change?
Is any new testcase added?