[Bug] [connector-hive] File cannot be found when hive_site_path is an HDFS file #7624

PorterXie · 2024-09-10T07:58:10Z

Search before asking

I have searched the issues and found no similar issues.

What happened

Symptom:

When running Flink on YARN in Application mode with the Hive sink, I specified metastoreUris and set hive_site_path to hdfs://xxxx/xxxx/hive-site.xml. AbstractStorage treats hive_site_path as a subpath of the current container's working directory, for example /xxxxx/application-xxxxx/container-xxxxx/hdfs://xxxx/xxxx/hive-site.xml, so hive-site.xml cannot be found.

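Analysis:

1. Judging from the exception stack, the failure occurs where the HDFSStorage class reads the file. From its name, I assume this class was originally designed to support HDFS paths, but I could not find a corresponding description in the official documentation, so I cannot confirm this.
2. If reading from HDFS paths is supported, then hive_site_path should be handled as a Hadoop Path rather than a Java File.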
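
The behavior described above can be reproduced with the JDK alone. The following is a minimal sketch, not SeaTunnel's actual code: the class name `HiveSitePathDemo` and the helper `isRemotePath` are hypothetical and exist only for illustration. It shows that `java.io.File` does not interpret the `hdfs:` scheme and treats the configured value as a relative path under the working directory, and how a scheme check could tell the two cases apart. It assumes the configured value is either a well-formed URI or a Unix-style local path.

```java
import java.io.File;
import java.net.URI;

class HiveSitePathDemo {

    // Hypothetical helper (not part of SeaTunnel): true when the value carries a
    // URI scheme other than "file", i.e. it should not be opened with java.io.File.
    static boolean isRemotePath(String hiveSitePath) {
        String scheme = URI.create(hiveSitePath).getScheme();
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        String hiveSitePath = "hdfs:///hive/config/hive-site.xml";

        // java.io.File has no notion of URI schemes, so the string is not an
        // absolute path and gets resolved against the current working directory.
        File asLocalFile = new File(hiveSitePath);
        System.out.println("absolute?      " + asLocalFile.isAbsolute());
        System.out.println("resolved as:   " + asLocalFile.getAbsolutePath());

        // A scheme check distinguishes the HDFS value from a plain local path.
        System.out.println("remote (hdfs)? " + isRemotePath(hiveSitePath));
        System.out.println("remote (local)? " + isRemotePath("/etc/hive/conf/hive-site.xml"));
    }
}
```

With a check like this, the remote case could be read through Hadoop's `FileSystem` and `Path` API instead of `java.io.File`, which is the direction point 2 suggests. Whether the connector should take exactly this approach is an open question for the fix.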

SeaTunnel Version

2.3.7

SeaTunnel Config

{
    "env":
    {
        "job.mode": "BATCH",
        "parallelism": 1,
        "job.name": "Mysql2Hive_instance_1725952817332_13102"
    },
    "source":
    [
        {
            "_type": "mysql_source",
            "url": "jdbc:mysql://172.16.19.183:3306/xieyue_full_2?CatalogMeansCurrent=true&characterEncoding=UTF-8",
            "user": "root",
            "password": "123456",
            "query": "select `alert_group_id`,`alert_group_name` from `xieyue_full_2`.`alert_group`",
            "result_table_name": "alert_group_1526751128",
            "fetch_size": 5000,
            "table_path": "xieyue_full_2.alert_group"
        }
    ],
    "sink":
    [
        {
            "_type": "hive_sink",
            "source_table_name": "alert_group_1526751128_t",
            "table_name": "zgl.ods_input_data",
            "metastore_uri": "thrift://u01:9083",
            "hive_site_path": "hdfs:///hive/config/hive-site.xml"
        }
    ],
    "transform":
    [
        {
            "_type": "sql_trans",
            "query": "SELECT TRY_CAST(alert_group_id AS INTEGER) as id,TRY_CAST(alert_group_name AS VARCHAR) as name FROM alert_group_1526751128",
            "source_table_name": "alert_group_1526751128",
            "result_table_name": "alert_group_1526751128_t"
        }
    ]
}

Running Command

FlinkExecution flinkExecution = new FlinkExecution();
flinkExecution.execute();

Error Exception

No such file or directory

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

PorterXie added the bug label Sep 10, 2024
