
[feat] upgrade hadoop/spark/hive default version to 3.x #4263

Merged
merged 7 commits into apache:dev-1.4.0 on Mar 6, 2023

Conversation

GuoPhilipse
Member

What is the purpose of the change

We now support different Hadoop, Hive, and Spark versions, so we can upgrade the default Hadoop/Spark/Hive versions to 3.x to address possible security issues.

Related issues/PRs

Related issues: #4262
Related PR: #4263

Brief change log

  • upgrade hadoop from 2.7.2 -> 3.3.4
  • upgrade hive from 2.3.3 -> 3.1.3
  • upgrade spark from 2.4.3 -> 3.2.1
  • remove profile spark-3.2, since spark will be 3.x by default
  • remove profile hadoop-3.3, since hadoop will be 3.x by default
  • rename profile spark-2.4-hadoop-3.3 to spark-2.4, since the hadoop version will be 3.x by default
  • upgrade default curator/json4s/scala/hadoop-hdfs-client artifact properties to fit hadoop3/spark3
  • update doc introduction
  • update known dependencies
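Version bumps like these are typically centralized as Maven properties in the root pom.xml so every module picks them up. A minimal sketch of that pattern (the property names here are assumptions for illustration, not copied from the actual Linkis pom):

```xml
<!-- Hypothetical root-pom fragment; actual Linkis property names may differ -->
<properties>
  <hadoop.version>3.3.4</hadoop.version>
  <hive.version>3.1.3</hive.version>
  <spark.version>3.2.1</spark.version>
</properties>
```

Profiles such as spark-2.4 can then override these properties (e.g. a different json4s/scala version) without touching individual module poms.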

Checklist

  • I have read the Contributing Guidelines on pull requests.
  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the Linkis mailing list first)
  • If this is a code change: I have written unit tests to fully verify the new behavior.

Contributor

@jackxu2011 jackxu2011 left a comment


the spark-2.4 profile should change some properties globally, such as json4s

pom.xml
<scope>provided</scope>
<groupId>org.apache.linkis</groupId>
<artifactId>linkis-hadoop-hdfs-client-shade</artifactId>
<version>${project.version}</version>
Contributor


this dependency is not needed by spark 3.2; it's just for spark 2.4 working with hadoop 3

Member Author


yes, will try a variable to make it apply only to spark 2.4, and just keep hadoop-common and hadoop-hdfs provided when not in the spark-2.4 profile

Contributor

@jackxu2011 jackxu2011 left a comment


LGTM

@@ -41,6 +41,6 @@ public class UJESConstants {

public static final int DEFAULT_PAGE_SIZE = 500;

- public static final String DEFAULT_SPARK_ENGINE = "spark-2.4.3";
public static final String DEFAULT_HIVE_ENGINE = "hive-1.2.1";
+ public static final String DEFAULT_SPARK_ENGINE = "spark-3.2.1";
Contributor


Can remove

@@ -23,9 +23,9 @@ object GovernanceCommonConf {

val CONF_FILTER_RM = "wds.linkis.rm"

- val SPARK_ENGINE_VERSION = CommonVars("wds.linkis.spark.engine.version", "2.4.3")
+ val SPARK_ENGINE_VERSION = CommonVars("wds.linkis.spark.engine.version", "3.2.1")
Contributor


Should use LabelCommonConfig#HIVE_ENGINE_VERSION and LabelCommonConfig#SPARK_ENGINE_VERSION

@@ -29,7 +29,7 @@ public static void main(String[] args) {
SubmittableInteractiveJob job =
LinkisJobClient.interactive()
.builder()
- .setEngineType("hive-2.3.3")
+ .setEngineType("hive-3.1.3")
Contributor


Should use org.apache.linkis.manager.label.conf.LabelCommonConfig#HIVE_ENGINE_VERSION
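The same suggestion recurs in several comments below: read the version from the shared constants class instead of scattering hardcoded literals. A self-contained sketch of that pattern (the nested class here is only a stand-in for org.apache.linkis.manager.label.conf.LabelCommonConfig; the real field names and types in Linkis may differ):

```java
public class EngineTypeExample {
    // Stand-in mirroring LabelCommonConfig; illustrative values only
    static final class LabelCommonConfig {
        static final String HIVE_ENGINE_VERSION = "3.1.3";
        static final String SPARK_ENGINE_VERSION = "3.2.1";
    }

    // One place to bump the version instead of many scattered literals
    static String hiveEngineType() {
        return "hive-" + LabelCommonConfig.HIVE_ENGINE_VERSION;
    }

    public static void main(String[] args) {
        System.out.println(hiveEngineType()); // prints "hive-3.1.3"
    }
}
```

With this pattern, a future default-version upgrade touches only the constants class rather than every call site.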

@@ -134,7 +134,7 @@ class CommonEntranceParser(val persistenceManager: PersistenceManager)
private def checkEngineTypeLabel(labels: util.Map[String, Label[_]]): Unit = {
val engineTypeLabel = labels.getOrDefault(LabelKeyConstant.ENGINE_TYPE_KEY, null)
if (null == engineTypeLabel) {
- val msg = s"You need to specify engineTypeLabel in labels, such as spark-2.4.3"
+ val msg = s"You need to specify engineTypeLabel in labels, such as spark-3.2.1"
Contributor


Should use org.apache.linkis.manager.label.conf.LabelCommonConfig#SPARK_ENGINE_VERSION

@@ -27,7 +27,7 @@ public class TestLabelBuilder {

public static void main(String[] args) throws LabelErrorException {
LabelBuilderFactory labelBuilderFactory = LabelBuilderFactoryContext.getLabelBuilderFactory();
- Label<?> engineType = labelBuilderFactory.createLabel("engineType", "hive-1.2.1");
+ Label<?> engineType = labelBuilderFactory.createLabel("engineType", "hive-3.1.3");
Contributor


Should use org.apache.linkis.manager.label.conf.LabelCommonConfig#HIVE_ENGINE_VERSION

@@ -23,7 +23,7 @@ object ManagerCommonConf {

val DEFAULT_ENGINE_TYPE = CommonVars("wds.linkis.default.engine.type", "spark")

- val DEFAULT_ENGINE_VERSION = CommonVars("wds.linkis.default.engine.version", "2.4.3")
+ val DEFAULT_ENGINE_VERSION = CommonVars("wds.linkis.default.engine.version", "3.2.1")
Contributor


Can remove

@@ -1183,12 +1183,12 @@ data:
(select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = '*-*,*-*');

- -- spark2.4.3 default configuration
+ -- spark3.2.1 default configuration
Contributor


can remove 3.2.1

insert into `linkis_ps_configuration_config_value` (`config_key_id`, `config_value`, `config_label_id`)
(select `relation`.`config_key_id` AS `config_key_id`, '' AS `config_value`, `relation`.`engine_type_label_id` AS `config_label_id` FROM linkis_ps_configuration_key_engine_relation relation
INNER JOIN linkis_cg_manager_label label ON relation.engine_type_label_id = label.id AND label.label_value = @SPARK_ALL);

- -- hive1.2.1 default configuration
+ -- hive3.1.3 default configuration
Contributor


can remove 3.1.3

@@ -141,7 +141,7 @@ data:
spark.sql.autoBroadcastJoinThreshold 26214400
spark.sql.hive.convertMetastoreOrc true
spark.sql.hive.metastore.jars /opt/ldh/current/spark/jars/*
- spark.sql.hive.metastore.version 2.3.3
+ spark.sql.hive.metastore.version 3.1.3
Contributor


should use 2.3.3?

Member Author


spark 3.2.1 uses hive metastore 2.3.9 by default; maybe we can change it to 2.3.9?

-- Global Settings
insert into `linkis_ps_configuration_key_engine_relation` (`config_key_id`, `engine_type_label_id`)
(select config.id as `config_key_id`, label.id AS `engine_type_label_id` FROM linkis_ps_configuration_config_key config
INNER JOIN linkis_cg_manager_label label ON config.engine_conn_type is null and label.label_value = "*-*,*-*");

- -- spark-2.4.3(Here choose to associate all spark type Key values with spark2.4.3)
+ -- spark-3.2.1(Here choose to associate all spark type Key values with spark3.2.1)
Contributor


can remove 3.2.1 and 3.1.3

Contributor

@peacewong peacewong left a comment


LGTM.

@peacewong peacewong merged commit eb55412 into apache:dev-1.4.0 Mar 6, 2023
@GuoPhilipse GuoPhilipse deleted the upgradeversion branch March 7, 2023 15:59