[CARBONDATA-4091] support prestosql 333 integration with carbon #4034

Closed
wants to merge 1 commit into from

@ajantha-bhat ajantha-bhat commented Dec 2, 2020

Why is this PR needed?

Currently, CarbonData is integrated with presto-sql 316, which is about 1.5 years old.
Since then, Presto has gained many useful features and optimizations, such as dynamic filtering, the Rubix data cache, and several performance improvements.

It is always good to use the latest version, which is currently presto-sql 348.
However, jumping from 316 to 348 would involve too many changes.
So, to make use of these new features and based on customer demand, I suggest upgrading presto-sql to version 333.
It can be upgraded again to a more recent version in a few months.

Note:
This is a plain integration that supports all existing features of the presto316 integration. Deeper integration to support new features such as dynamic filtering and the Rubix cache will be handled in another PR.

What changes were proposed in this PR?

  • Adapt to the new Hive adapter changes, such as constructor changes, and add a carbonDataConnector to support CarbonDataHandleResolver.
  • Java 11 no longer exposes the ConstructorAccessor class, so the Unsafe class is used for reflection instead (presto333 depends on Java 11 at runtime).
  • POM changes to support presto333.

Note: A Java 11 environment is needed to run presto333 with carbon, and the following JVM property must also be added: "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED"
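
For readers unfamiliar with the Unsafe-based approach mentioned in the list above, here is a minimal sketch of how an enum instance can be created without a ConstructorAccessor. It is an illustration under stated assumptions, not the code in this PR: the class UnsafeEnumSketch, the enum Format, and the method newConstant are hypothetical names, and only sun.misc.Unsafe calls that exist in the JDK are used.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch; none of these names come from the CarbonData code base.
final class UnsafeEnumSketch {

  // Stand-in for a library enum that cannot be edited directly.
  enum Format { ORC, PARQUET }

  // Reads the Unsafe singleton from its private static field via reflection.
  static Unsafe unsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Unsafe is not available", e);
    }
  }

  // Allocates a Format instance without running any constructor, then fills in
  // the private name and ordinal fields declared on java.lang.Enum.
  static Format newConstant(String name, int ordinal) {
    try {
      Unsafe u = unsafe();
      Format value = (Format) u.allocateInstance(Format.class);
      u.putObject(value, u.objectFieldOffset(Enum.class.getDeclaredField("name")), name);
      u.putInt(value, u.objectFieldOffset(Enum.class.getDeclaredField("ordinal")), ordinal);
      return value;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not create enum constant", e);
    }
  }

  public static void main(String[] args) {
    Format carbon = newConstant("CARBON", Format.values().length);
    System.out.println(carbon.name() + " " + carbon.ordinal());
  }
}
```

Note that in this sketch Format.values() and Format.valueOf(...) still know nothing about the new instance; a real integration would have to handle that separately, and how the PR does so is not shown here.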

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • No

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5017/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3261/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3349/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5111/

@ajantha-bhat ajantha-bhat force-pushed the presto333 branch 2 times, most recently from 8fc6686 to 5be2097 on December 8, 2020 17:12
@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5119/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3357/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5123/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3361/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5201/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3441/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3300/

@ajantha-bhat ajantha-bhat changed the title from "[WIP] support prestosql 333" to "[CARBONDATA-4091] support prestosql 333 integration with carbon" on Dec 18, 2020
@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3450/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5210/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5264/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3503/

@@ -294,7 +294,7 @@
 <dependency>
 <groupId>io.airlift</groupId>
 <artifactId>json</artifactId>
-<version>0.144</version>
+<version>0.193</version>
Contributor

Just a suggestion: can you put these versions (both airlift and jackson) in POM variables? That would make future version changes easier and reduce the chance of missing one.

Member Author

If different Maven profiles needed different versions, it would make sense to add a variable; here it is not needed. Also, each is a different artifact.

HiveStatisticsProvider hiveStatisticsProvider,
AccessControlMetadata accessControlMetadata) {
super(catalogName, metastore, hdfsEnvironment, partitionManager, timeZone,
allowCorruptWritesForTesting, writesToNonManagedTablesEnabled,
Contributor

Set writesToNonManagedTablesEnabled to true as before; insert scenarios were failing without it when the write feature was implemented. Or has the default value changed now?

Member Author

Yes, I am facing some issue with insert on the cluster. Still debugging.

Member Author

Insert has passed now.

presto:aj333> select * from t1;
name

aj
ab
(2 rows)

Query 20210113_015933_00005_6sfgc, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:04 [2 rows, 22B] [0 rows/s, 5B/s]

presto:aj333> insert into t1 values('junk');
INSERT: 1 row

Query 20210113_015940_00006_6sfgc, FINISHED, 1 node
Splits: 35 total, 35 done (100.00%)
0:04 [0 rows, 0B] [0 rows/s, 0B/s]

presto:aj333> select * from t1;
name

junk
aj
ab
(3 rows)

Query 20210113_015951_00007_6sfgc, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 35B] [11 rows/s, 137B/s]

Member Author

I reverted to using writesToNonManagedTablesEnabled instead of the hardcoded true value, as the cluster test also passes after reverting.

handle.getPageSinkMetadata(),
new HiveMetastoreClosure(
memoizeMetastore(metastore, perTransactionMetastoreCacheMaximumSize)),
new HiveIdentity(session)),
Contributor

Please reformat the code from line 151 to 155.

Member Author

done

public class CarbonMetadataFactory extends HiveMetadataFactory {

private static final Logger log = Logger.get(HiveMetadataFactory.class);
private final boolean allowCorruptWritesForTesting;
private final boolean skipDeletionForAlter;
private final boolean skipTargetCleanupOnRollback;
-private final boolean writesToNonManagedTablesEnabled = true;
+private final boolean writesToNonManagedTablesEnabled;
Contributor

Could you please confirm that insert works fine in the cluster test after this change?

Member Author

Yes, it works fine on the cluster without hardcoding writesToNonManagedTablesEnabled = true.

-public CarbondataConnectorFactory(String connectorName, ClassLoader classLoader) {
-super(connectorName, classLoader, Optional.empty());
-this.classLoader = requireNonNull(classLoader, "classLoader is null");
+public CarbondataConnectorFactory(String name) {
Contributor

Suggested change
-public CarbondataConnectorFactory(String name) {
+public CarbondataConnectorFactory(String connectorName) {

Member Author

done

throw new RuntimeException(e);
}
}

/**
* Set the Carbon format enum to HiveStorageFormat, its a hack but for time being it is best
* choice to avoid lot of code change.
*
* @throws Exception
Contributor

This can be reverted, as it carries no specific information.

Member Author

It is good to have proper Javadoc. Also, the method has no params but still throws an exception, so that is recorded in the Javadoc. You can see the same pattern in the old code.

}

public static final class TypeDeserializer
Contributor

Do we need this? It is already present in the HiveModule class and accessible, right?

Member Author

Yes, we can use it.

}
// TODO: check and use dynamicFilter in CarbondataPageSource
Contributor

If possible, can you create a JIRA for this and reference it in the TODO?

Member Author

import static io.airlift.configuration.ConfigBinder.configBinder;
import static java.util.Objects.requireNonNull;

public final class InternalCarbonDataConnectorFactory {
Contributor

Please add a class-level comment.

Member Author

done

@@ -780,9 +780,9 @@
 <activeByDefault>true</activeByDefault>
 </activation>
 <properties>
-<presto.version>316</presto.version>
+<presto.version>333</presto.version>
Contributor

Similar to one of the above comments, can we define this as a variable and use it everywhere?

Member Author

<presto.version> is a variable itself.

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5291/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3531/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5304/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3544/

@ajantha-bhat
Member Author

retest this please

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5126/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3375/

@ydvpankaj99
Contributor

retest this please

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5132/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3381/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3605/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5350/

@jackylk
Contributor

jackylk commented May 18, 2021

LGTM

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5815/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4071/

@CarbonDataQA2

Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/218/

@ajantha-bhat
Member Author

retest this please

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4075/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5820/

@CarbonDataQA2

Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/223/

@brijoobopanna
Contributor

retest this please

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5836/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4092/

@CarbonDataQA2

Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/240/

@akashrn5
Contributor

LGTM

@Indhumathi27
Contributor

retest this please

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4097/

@CarbonDataQA2

Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/245/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5841/

@asfgit asfgit closed this in 1ccf295 Aug 11, 2021