[CARBONDATA-4091] Support PrestoSQL 333 integration with Carbon #4034
Conversation
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5017/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3261/
Force-pushed from 7f665e5 to 5be2097
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3349/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5111/
Force-pushed from 8fc6686 to 5be2097
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5119/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3357/
Force-pushed from 9624fb3 to c294078
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5123/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3361/
Force-pushed from c294078 to 2fc5e1a
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5201/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3441/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3300/
Force-pushed from 2fc5e1a to 7c3505d
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3450/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5210/
Force-pushed from 7c3505d to cf9605b
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5264/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3503/
integration/presto/pom.xml (Outdated)
@@ -294,7 +294,7 @@
 <dependency>
   <groupId>io.airlift</groupId>
   <artifactId>json</artifactId>
-  <version>0.144</version>
+  <version>0.193</version>
Just a suggestion: can you put these versions in variables in the POM, both airlift and jackson, so that changing the version in future will be easier and less likely to be missed?
If a different Maven profile needed a different version, it would make sense to add a variable; here it is not needed. Also, each is a different artifact.
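For reference, the version-as-property idea suggested above could look like the following POM fragment. This is a hypothetical sketch (the property name airlift.json.version is invented for illustration), and per the reply it was judged unnecessary for this PR:

```xml
<!-- Hypothetical sketch: the property name is invented, not taken from the PR. -->
<properties>
  <airlift.json.version>0.193</airlift.json.version>
</properties>

<!-- The dependency then references the property, so a future upgrade touches one place. -->
<dependency>
  <groupId>io.airlift</groupId>
  <artifactId>json</artifactId>
  <version>${airlift.json.version}</version>
</dependency>
```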
    HiveStatisticsProvider hiveStatisticsProvider,
    AccessControlMetadata accessControlMetadata) {
  super(catalogName, metastore, hdfsEnvironment, partitionManager, timeZone,
      allowCorruptWritesForTesting, writesToNonManagedTablesEnabled,
writesToNonManagedTablesEnabled
Make this true like before, as it was failing for insert scenarios when the write feature was implemented. Or has the default value changed now?
Yes, on the cluster I am facing some issue with insert. Still debugging.
Insert has passed now.
presto:aj333> select * from t1;
name
aj
ab
(2 rows)
Query 20210113_015933_00005_6sfgc, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:04 [2 rows, 22B] [0 rows/s, 5B/s]
presto:aj333> insert into t1 values('junk');
INSERT: 1 row
Query 20210113_015940_00006_6sfgc, FINISHED, 1 node
Splits: 35 total, 35 done (100.00%)
0:04 [0 rows, 0B] [0 rows/s, 0B/s]
presto:aj333> select * from t1;
name
junk
aj
ab
(3 rows)
Query 20210113_015951_00007_6sfgc, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 35B] [11 rows/s, 137B/s]
I reverted to writesToNonManagedTablesEnabled instead of the hardcoded true value, as the cluster test also passes after reverting.
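The fix discussed above (honoring the configured flag instead of a hardcoded true) can be illustrated with a small standalone sketch. The class and field names only mirror the diff; this is not the actual Presto or CarbonData code:

```java
// Standalone sketch, not the real CarbonMetadataFactory: it only illustrates
// passing writesToNonManagedTablesEnabled through the constructor instead of
// initializing the field with a hardcoded true.
public class MetadataFactorySketch {

    private final boolean writesToNonManagedTablesEnabled;

    // A field initializer like `= true` would silently override the configured
    // value; taking the flag as a constructor argument keeps it config-driven.
    public MetadataFactorySketch(boolean writesToNonManagedTablesEnabled) {
        this.writesToNonManagedTablesEnabled = writesToNonManagedTablesEnabled;
    }

    public boolean writesToNonManagedTablesEnabled() {
        return writesToNonManagedTablesEnabled;
    }

    public static void main(String[] args) {
        // The flag now reflects whatever the configuration supplied.
        System.out.println(new MetadataFactorySketch(false).writesToNonManagedTablesEnabled()); // prints false
        System.out.println(new MetadataFactorySketch(true).writesToNonManagedTablesEnabled());  // prints true
    }
}
```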
        handle.getPageSinkMetadata(),
        new HiveMetastoreClosure(
            memoizeMetastore(metastore, perTransactionMetastoreCacheMaximumSize)),
        new HiveIdentity(session)),
from line 151 to 155, please reformat the code
done
public class CarbonMetadataFactory extends HiveMetadataFactory {

  private static final Logger log = Logger.get(HiveMetadataFactory.class);
  private final boolean allowCorruptWritesForTesting;
  private final boolean skipDeletionForAlter;
  private final boolean skipTargetCleanupOnRollback;
- private final boolean writesToNonManagedTablesEnabled = true;
+ private final boolean writesToNonManagedTablesEnabled;
Could you please confirm that after this change, insert works fine in the cluster test?
Yes, on the cluster it works fine without hardcoding writesToNonManagedTablesEnabled = true.
- public CarbondataConnectorFactory(String connectorName, ClassLoader classLoader) {
-   super(connectorName, classLoader, Optional.empty());
-   this.classLoader = requireNonNull(classLoader, "classLoader is null");
+ public CarbondataConnectorFactory(String name) {
Suggested change:
- public CarbondataConnectorFactory(String name) {
+ public CarbondataConnectorFactory(String connectorName) {
done
    throw new RuntimeException(e);
  }
}

/**
 * Set the Carbon format enum to HiveStorageFormat, its a hack but for time being it is best
 * choice to avoid lot of code change.
 *
 * @throws Exception
Can revert this, as it carries no specific info.
It is good to have a proper javadoc. Also, it doesn't have any params but still throws an exception, so that is recorded in the javadoc. You can observe something similar in the old code.
}

public static final class TypeDeserializer
Do we need this? It is already present in the HiveModule class and accessible, right?
yes, can use it.
}
// TODO: check and use dynamicFilter in CarbondataPageSource
If possible, can you create a JIRA for this and add the JIRA id to the TODO?
import static io.airlift.configuration.ConfigBinder.configBinder;
import static java.util.Objects.requireNonNull;

public final class InternalCarbonDataConnectorFactory {
please add a class level comment
done
@@ -780,9 +780,9 @@
 <activeByDefault>true</activeByDefault>
 </activation>
 <properties>
-  <presto.version>316</presto.version>
+  <presto.version>333</presto.version>
Similar to one of the above comments, can we define this as a variable and use it everywhere?
<presto.version> is a variable itself.
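As the reply above says, <presto.version> is already a Maven property, so dependency versions can reference it in the usual way. A hedged illustration only; the artifact shown is one plausible consumer and is not copied from this PR:

```xml
<!-- Sketch: any io.prestosql dependency in the profile can reference the property. -->
<dependency>
  <groupId>io.prestosql</groupId>
  <artifactId>presto-hive</artifactId>
  <version>${presto.version}</version>
</dependency>
```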
Force-pushed from cf9605b to 2609395
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5291/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3531/
Force-pushed from 2609395 to 8a820e5
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5304/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3544/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5126/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3375/
retest this please
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5132/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3381/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/3605/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5350/
LGTM
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5815/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4071/
Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/218/
retest this please
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4075/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5820/
Build Failed with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/223/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5836/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4092/
Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/240/
LGTM
retest this please
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_2.4.5/4097/
Build Success with Spark 3.1, Please check CI http://121.244.95.60:12602/job/ApacheCarbon_PR_Builder_3.1/245/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12602/job/ApacheCarbonPRBuilder2.3/5841/
Why is this PR needed?
Currently, CarbonData is integrated with presto-sql 316, which is about 1.5 years old.
Many good features and optimizations have come into Presto since then, such as dynamic filtering, the Rubix data cache, and other performance improvements.
It is always good to use the latest version; the latest is presto-sql 348, but jumping from 316 to 348 would be too many changes at once.
So, to utilize these new features and based on customer demand, I suggest upgrading presto-sql to version 333.
It can be upgraded again to a more recent version in a few months.
Note:
This is a plain integration supporting all existing features of presto 316; deep integration to support new features like dynamic filtering and the Rubix cache will be handled in another PR.
What changes were proposed in this PR?
Note: a Java 11 environment is needed to run presto 333 with carbon, and the JVM property "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED" must also be added.
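One common way to apply the JVM flag from the note above is the Presto server's etc/jvm.config file, which lists one JVM option per line. A minimal, hypothetical fragment (real deployments carry additional options such as heap settings):

```
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
```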
Does this PR introduce any user interface change?
Is any new testcase added?