
[Bug] [Module Name] Hive Source read_partitions option: an exception is thrown and the job exits early when a configured partition is not in the table's file directory #5811

Closed
viverlxl opened this issue Nov 8, 2023 · 6 comments

viverlxl commented Nov 8, 2023

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

With the configuration below, when partitions are specified, an exception is thrown immediately if any partition path of the table is not included in the specified partitions.

SeaTunnel Version

2.3.3

SeaTunnel Config

env {
  spark.app.name = "hive_to_ck_file"
  spark.executor.instances = 4
  spark.executor.cores = 1
  spark.executor.memory = "3g"
  // This configuration is required
  spark.sql.catalogImplementation = "hive"
  spark.executor.extraJavaOptions = "-Dfile.encoding=UTF-8"
  spark.driver.extraJavaOptions = "-Dfile.encoding=UTF-8"
  spark.hadoop.hive.exec.dynamic.partition = "true"
  spark.hadoop.hive.exec.dynamic.partition.mode = "nonstrict"
  spark.debug.maxToStringFields = 100000
}
source {
    hive {
        table_name = "xxxx.xxxx"
        metastore_uri = "thrift://xxxx:9083"
        result_table_name = "xxx_d_test_01"
        parallelism = 4
        read_partitions = ["dt=2023-10-30"]
    }
}
transform {}
sink {
    ClickhouseFile {
        host = "xxxx:8123"
        server_time_zone = "Asia/Shanghai"
        database = "dms_ddcx_xxx"
        parallelism = 4
        table = "xx_d_test_test"
        sharding_key = "diversion_id"
        username = "xxx"
        password = "xxxx"
        node_free_password = true
        clickhouse_local_path = "/opt/software/clickhouse local"
        node_pass = []
    }
}

Running Command

./bin/start-seatunnel-spark-3-connector-v2.sh --master yarn --deploy-mode cluster --config config/hive_to_ck_online.config

Error Exception

throw new FileConnectorException(
                    FileConnectorErrorCode.FILE_LIST_EMPTY,
                    "The target file list is empty,"
                            + "SeaTunnel will not be able to sync empty table, "
                            + "please check the configuration parameters such as: [file_filter_pattern]");

Zeta or Flink or Spark Version

spark: 3.2.4

Java or Scala Version

java 1.8

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@viverlxl viverlxl added the bug label Nov 8, 2023
@viverlxl
Contributor Author

viverlxl commented Nov 8, 2023

The check for whether fileNames is empty should be done outside the recursive function.
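The suggestion above can be sketched as follows. This is a toy, self-contained model (hypothetical class and method names, an in-memory "directory tree" instead of the real SeaTunnel file source classes): the recursive walk only collects files that match the requested partitions, and the FILE_LIST_EMPTY-style failure is decided once, on the final list, after the whole tree has been walked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartitionScanSketch {

    // Toy directory tree: a path that appears as a key is a directory;
    // anything else is a leaf file. (Hypothetical data, for illustration only.)
    static final Map<String, List<String>> TREE = Map.of(
            "/tbl", List.of("/tbl/dt=2023-10-30", "/tbl/dt=2023-10-31"),
            "/tbl/dt=2023-10-30", List.of("/tbl/dt=2023-10-30/part-0000"),
            "/tbl/dt=2023-10-31", List.of("/tbl/dt=2023-10-31/part-0000"));

    // Recursive listing: it only collects matching files and never throws.
    // That is the point of the fix -- a subtree with no matches is not an error.
    static void collect(String dir, List<String> partitions, List<String> out) {
        for (String entry : TREE.getOrDefault(dir, List.of())) {
            if (TREE.containsKey(entry)) {
                // Descend only into partitions the user asked for.
                if (partitions.stream().anyMatch(entry::contains)) {
                    collect(entry, partitions, out);
                }
            } else {
                out.add(entry);
            }
        }
    }

    static List<String> scan(String root, List<String> partitions) {
        List<String> files = new ArrayList<>();
        collect(root, partitions, files);
        // The empty-list check happens exactly once, outside the recursion.
        if (files.isEmpty()) {
            throw new IllegalStateException("The target file list is empty");
        }
        return files;
    }

    public static void main(String[] args) {
        System.out.println(scan("/tbl", List.of("dt=2023-10-30")));
    }
}
```

With this shape, asking for `dt=2023-10-30` succeeds even though `dt=2023-10-31` also exists under the table directory; the exception fires only when no partition at all matches.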

@Carl-Zhou-CN
Member

@Hisoka-X I'd like to try it. Please assign it to me

@hailin0
Member

hailin0 commented Nov 8, 2023

This was fixed by PR #5591.

@Carl-Zhou-CN
Member

@viverlxl hi, can you help verify this problem?

@viverlxl
Contributor Author

> @viverlxl hi, can you help verify this problem?

yes, LGTM

@Carl-Zhou-CN
Member

> > @viverlxl hi, can you help verify this problem?
>
> yes, LGTM

Let me close this issue
