
[Bug] [Module Name] Hive Source read_partitions option: an exception is thrown and the job exits early when a configured partition is not in the table's file directory #5811

Closed
viverlxl opened this issue Nov 8, 2023 · 6 comments

viverlxl commented Nov 8, 2023

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

With the configuration below, when partitions are specified, an exception is thrown immediately if any partition path of the table is not included in the specified partitions.

SeaTunnel Version

2.3.3

SeaTunnel Config

env {
  spark.app.name = "hive_to_ck_file"
  spark.executor.instances = 4
  spark.executor.cores = 1
  spark.executor.memory = "3g"
  // This configuration is required
  spark.sql.catalogImplementation = "hive"
  spark.executor.extraJavaOptions = "-Dfile.encoding=UTF-8"
  spark.driver.extraJavaOptions = "-Dfile.encoding=UTF-8"
  spark.hadoop.hive.exec.dynamic.partition = "true"
  spark.hadoop.hive.exec.dynamic.partition.mode = "nonstrict"
  spark.debug.maxToStringFields = 100000
}
source {
    hive {
        table_name = "xxxx.xxxx"
        metastore_uri = "thrift://xxxx:9083"
        result_table_name = "xxx_d_test_01"
        parallelism = 4
        read_partitions = ["dt=2023-10-30"]
    }
}
transform {}
sink {
    ClickhouseFile {
        host = "xxxx:8123"
        server_time_zone = "Asia/Shanghai"
        database = "dms_ddcx_xxx"
        parallelism = 4
        table = "xx_d_test_test"
        sharding_key = "diversion_id"
        username = "xxx"
        password = "xxxx"
        node_free_password = true
        clickhouse_local_path = "/opt/software/clickhouse local"
        node_pass = []
    }
}

Running Command

./bin/start-seatunnel-spark-3-connector-v2.sh --master yarn --deploy-mode cluster --config config/hive_to_ck_online.config

Error Exception

throw new FileConnectorException(
                    FileConnectorErrorCode.FILE_LIST_EMPTY,
                    "The target file list is empty,"
                            + "SeaTunnel will not be able to sync empty table, "
                            + "please check the configuration parameters such as: [file_filter_pattern]");

Zeta or Flink or Spark Version

spark: 3.2.4

Java or Scala Version

java 1.8

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@viverlxl viverlxl added the bug label Nov 8, 2023
@viverlxl
Contributor Author

viverlxl commented Nov 8, 2023

The check for whether fileNames is empty should be done outside the recursive function.
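The suggestion above can be sketched as follows. This is a toy, self-contained model (hypothetical class and method names, an in-memory "directory tree" instead of the real SeaTunnel file source classes): the recursive walk only collects files that match the requested partitions, and the FILE_LIST_EMPTY-style failure is decided once, on the final list, after the whole tree has been walked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartitionScanSketch {

    // Toy directory tree: a path that appears as a key is a directory;
    // anything else is a leaf file. (Hypothetical data, for illustration only.)
    static final Map<String, List<String>> TREE = Map.of(
            "/tbl", List.of("/tbl/dt=2023-10-30", "/tbl/dt=2023-10-31"),
            "/tbl/dt=2023-10-30", List.of("/tbl/dt=2023-10-30/part-0000"),
            "/tbl/dt=2023-10-31", List.of("/tbl/dt=2023-10-31/part-0000"));

    // Recursive listing: it only collects matching files and never throws.
    // That is the point of the fix -- a subtree with no matches is not an error.
    static void collect(String dir, List<String> partitions, List<String> out) {
        for (String entry : TREE.getOrDefault(dir, List.of())) {
            if (TREE.containsKey(entry)) {
                // Descend only into partitions the user asked for.
                if (partitions.stream().anyMatch(entry::contains)) {
                    collect(entry, partitions, out);
                }
            } else {
                out.add(entry);
            }
        }
    }

    static List<String> scan(String root, List<String> partitions) {
        List<String> files = new ArrayList<>();
        collect(root, partitions, files);
        // The empty-list check happens exactly once, outside the recursion.
        if (files.isEmpty()) {
            throw new IllegalStateException("The target file list is empty");
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(scan("/tbl", List.of("dt=2023-10-30")));
    }
}
```

With this shape, asking for `dt=2023-10-30` succeeds even though `dt=2023-10-31` also exists under the table directory; the exception fires only when no partition at all matches.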

@Carl-Zhou-CN
Member

@Hisoka-X I'd like to try it. Please assign it to me

@hailin0
Member

hailin0 commented Nov 8, 2023

This was fixed by PR #5591.

@Carl-Zhou-CN
Member

@viverlxl hi, can you help verify this problem?

@viverlxl
Contributor Author

> @viverlxl hi, can you help verify this problem?

yes, LGTM

@Carl-Zhou-CN
Member

> > @viverlxl hi, can you help verify this problem?
>
> yes, LGTM

Let me close this issue
