[Bug] [connector-hive] File cannot be found when hive_site_path is an HDFS file #7624

Open
2 of 3 tasks
PorterXie opened this issue Sep 10, 2024 · 0 comments
Search before asking

  • I had searched in the issues and found no similar issues.

What happened

Symptom:

When running in Flink on YARN Application mode with the Hive sink, with metastoreUris specified and hive_site_path set to hdfs://xxxx/xxxx/hive-site.xml, AbstractStorage treats hive_site_path as a subdirectory of the current container's working directory, for example /xxxxx/application-xxxxx/container-xxxxx/hdfs://xxxx/xxxx/hive-site.xml. As a result, hive-site.xml cannot be found.
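The path concatenation described above is consistent with how `java.io.File` treats a string that does not start with a local root. The following is a minimal sketch of that behavior using only the JDK; it is not SeaTunnel's actual code, and the class and method names (`FilePathResolutionDemo`, `resolveAsLocalFile`) are hypothetical.

```java
import java.io.File;

public class FilePathResolutionDemo {
    // Hypothetical helper: returns what java.io.File considers the
    // absolute form of the configured value.
    static String resolveAsLocalFile(String configuredPath) {
        return new File(configuredPath).getAbsolutePath();
    }

    public static void main(String[] args) {
        // Value taken from the sink config in this issue.
        String hiveSitePath = "hdfs:///hive/config/hive-site.xml";
        File file = new File(hiveSitePath);

        // "hdfs:" is not a local filesystem root, so File treats the whole
        // string as a relative path and resolves it against the JVM working
        // directory (the container directory when running on YARN).
        System.out.println("isAbsolute: " + file.isAbsolute());
        System.out.println("resolved:   " + resolveAsLocalFile(hiveSitePath));
        System.out.println("exists:     " + file.exists());
    }
}
```

On a typical Linux JVM the resolved value is a path under the process working directory, which matches the shape of the path reported in this issue.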

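Analysis:

1. Judging from the exception stack, the failure involves file reading in the HDFSStorage class. From the name, I guess this class was originally designed to support HDFS paths, but I could not find a corresponding description in the official documentation, so I cannot confirm this.
2. If reading from HDFS paths is supported, then hive_site_path should be handled as a Hadoop Path rather than a Java File.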
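One possible direction for point 2 is to inspect the URI scheme of hive_site_path before deciding how to open it. The sketch below uses only the JDK so that it runs standalone; the class and method names (`HiveSitePathScheme`, `hasRemoteScheme`) are hypothetical and do not come from SeaTunnel. In a real fix, the remote branch would presumably go through Hadoop's `org.apache.hadoop.fs.Path` and `FileSystem` rather than `java.io.File`, but that part is an assumption and is not shown here.

```java
import java.net.URI;

public class HiveSitePathScheme {
    // Hypothetical helper: true when the configured path carries a
    // non-local URI scheme such as hdfs://, false for plain local paths.
    static boolean hasRemoteScheme(String configuredPath) {
        try {
            String scheme = URI.create(configuredPath).getScheme();
            return scheme != null && !"file".equalsIgnoreCase(scheme);
        } catch (IllegalArgumentException e) {
            // Strings that are not valid URIs (for example paths containing
            // backslashes or spaces) are treated as local paths in this sketch.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "hdfs:///hive/config/hive-site.xml", // value from this issue
            "/etc/hive/conf/hive-site.xml",      // illustrative local path
            "file:///etc/hive/conf/hive-site.xml"
        };
        for (String candidate : candidates) {
            // A remote scheme would be opened via the Hadoop FileSystem API
            // (assumption); anything else can stay on java.io.File.
            System.out.println(candidate + " -> remote=" + hasRemoteScheme(candidate));
        }
    }
}
```

Branching on the scheme keeps existing local-path configurations working while letting hdfs:// values bypass the working-directory resolution.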

SeaTunnel Version

2.3.7

SeaTunnel Config

{
    "env":
    {
        "job.mode": "BATCH",
        "parallelism": 1,
        "job.name": "Mysql2Hive_instance_1725952817332_13102"
    },
    "source":
    [
        {
            "_type": "mysql_source",
            "url": "jdbc:mysql://172.16.19.183:3306/xieyue_full_2?CatalogMeansCurrent=true&characterEncoding=UTF-8",
            "user": "root",
            "password": "123456",
            "query": "select `alert_group_id`,`alert_group_name` from `xieyue_full_2`.`alert_group`",
            "result_table_name": "alert_group_1526751128",
            "fetch_size": 5000,
            "table_path": "xieyue_full_2.alert_group"
        }
    ],
    "sink":
    [
        {
            "_type": "hive_sink",
            "source_table_name": "alert_group_1526751128_t",
            "table_name": "zgl.ods_input_data",
            "metastore_uri": "thrift://u01:9083",
            "hive_site_path": "hdfs:///hive/config/hive-site.xml"
        }
    ],
    "transform":
    [
        {
            "_type": "sql_trans",
            "query": "SELECT TRY_CAST(alert_group_id AS INTEGER) as id,TRY_CAST(alert_group_name AS VARCHAR) as name FROM alert_group_1526751128",
            "source_table_name": "alert_group_1526751128",
            "result_table_name": "alert_group_1526751128_t"
        }
    ]
}

Running Command

FlinkExecution flinkExecution = new FlinkExecution();
flinkExecution.execute();

Error Exception

No such file or directory

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@PorterXie PorterXie added the bug label Sep 10, 2024