Add parameter substitution for batch mode #153

Closed
duowan1520 opened this issue Sep 11, 2018 · 5 comments

@duowan1520

duowan1520 commented Sep 11, 2018

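When running jobs in batch mode, the processing logic usually has to run on a schedule, and each run needs different parameters (passed in by the scheduler). Could a parameter substitution feature be added, to make it easier to read data from different directories or partitions in HDFS/Hive?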

@garyelephant
Contributor

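Hi, we are already considering adding parameter substitution. Please give a concrete example of the substitution you need, and we will support it first.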

@duowan1520
Author

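    Hi. When we do data development on data warehouse models, many models do not need real-time computation, so we usually store them in Hive or HDFS by partition/directory. The computation period may be daily, weekly, or monthly.
    For example, for daily statistics: in the early morning of the 13th, we need to pull data from yesterday's partition and aggregate it: select id from t where dt = 20180912 and col1=xxx reads from the dt=20180912 partition of model t. Once the job is live, it has to process the previous day's data every day. So we would like the date value 20180912 in the condition dt=20180912 to be replaceable, with a different value on each scheduled run. Also, could you consider supporting multiple parameter substitutions within a single SQL statement?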

@garyelephant
Contributor

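@duoduo5566 Hi, your description is very detailed, and we do intend to add a feature like this. My WeChat ID is garyelephant; please add me on WeChat so we can keep discussing and following up.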

@duowan1520
Author

OK, it's an honor! Thank you.

@garyelephant
Contributor

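How do I specify a variable in a Waterdrop configuration and then set its value dynamically at runtime?

Since v1.2.4, Waterdrop supports variables in the configuration. This is commonly used in scheduled or ad hoc offline processing to substitute values such as times and dates. Usage is as follows:

Declare the variable names in the configuration, for example: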

...

filter {
  sql {
    table_name = "user_view"
    sql = "select * from user_view where city ='"${city}"' and dt = '"${date}"'"
  }
}

...

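The sql filter is only an example here. In fact, the value of any key = value pair anywhere in the configuration file can use variable substitution.

For a detailed configuration example, see variable substitution.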
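
As an illustration of what the sql value above resolves to, the following shell sketch emulates the substitution with ordinary shell variables. It assumes the job is started with -i city=shanghai -i date=20190319 and that the adjacent quoted strings and ${...} references are concatenated into one string, as in standard HOCON. It only mimics the result and is not how Waterdrop performs the substitution internally.

```shell
#!/bin/sh
# Illustration only: shell variables stand in for the values passed with -i.
# These two assignments are assumptions matching the start commands in this thread.
city=shanghai
date=20190319

# The config value '"${city}"' becomes 'shanghai' once the pieces are joined.
sql="select * from user_view where city ='${city}' and dt = '${date}'"
printf '%s\n' "$sql"
```

Both variables are replaced in the same statement, which covers the request for several substitutions within one SQL.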

The start commands are as follows:

# local mode
./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m local[2] -i city=shanghai -i date=20190319

# yarn client mode
./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m yarn -i city=shanghai -i date=20190319

# yarn cluster mode
./bin/start-waterdrop.sh -c ./config/your_app.conf -e cluster -m yarn -i city=shanghai -i date=20190319

# mesos and spark standalone are started the same way.

Use the -i or --variable option followed by key=value to set a variable's value; the key must match the variable name in the configuration.
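
For the daily scenario described earlier in this thread (each run processes the previous day's partition), a scheduler wrapper might look like the sketch below. The function names partition_date and build_cmd are hypothetical helpers, not part of Waterdrop, and the date call assumes GNU date (BSD/macOS would need date -v-1d instead). The script only prints the command, so it can be checked before being wired into cron or another scheduler.

```shell
#!/bin/sh
# Hypothetical wrapper for a daily scheduled run; not shipped with Waterdrop.

# Print yesterday's date as yyyymmdd (assumes GNU date).
partition_date() {
  date -d "yesterday" +%Y%m%d
}

# Print the yarn client start command with one -i flag per key=value argument.
# $1 is the config path; every remaining argument is a key=value pair.
build_cmd() {
  conf="$1"
  shift
  cmd="./bin/start-waterdrop.sh -c ${conf} -e client -m yarn"
  for kv in "$@"; do
    cmd="${cmd} -i ${kv}"
  done
  printf '%s\n' "$cmd"
}

# Example: show the command for yesterday's partition.
build_cmd ./config/your_app.conf city=shanghai "date=$(partition_date)"
```

Because the command is assembled as a plain string, values containing spaces or quotes would need extra care before executing it; simple values such as dates and city names are unaffected.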

EricJoy2048 added a commit to EricJoy2048/incubator-seatunnel that referenced this issue Jul 26, 2023
* [Feature][S3-Redshift] Support write cdc changelog

* [bugfix][kafka] Fixed null catalogName not supported when KafkaCatalog was created

* job client never retry

* [bugfix] Binary update to varbinary

* Fix wait for job complete

* [Bugfix] Fix oracle11g could not list the system space table

* fix wait job complete NPE

* [bugfix] Prefix and suffix table path is not used, revert to the original code (apache#129)

* Fix XA Transaction bug

* Code format

* [bugfix][oracleCDC] Add the `log.mining.batch.size.max` parameter Settings, prevent can't read data

* [bugfix][zeta] Fixed multi-table job data loss and latency issues (apache#149)

* [Hotfix][CDC] Fix oracle restore start-scn expired

* [Feature][CDC][Zeta] Support schema evolution(DDL)

* [Feature][Zeta] Add checkpointEnd event

* [Feature][CDC][Zeta] Fix schema change checkpoint restore

* [Feature][RedShift] S3-Redshift support DDL event

* [Feature][RedShift] Remove temporary table

* [bugfix][zeta] Fix job restore running checkpoint does not continue

* [Feature][CDC][Zeta] Filter unsupport schema change event

* [Feature][RedShift] Improve get rowtype from committer

* [Feature][CDC][Zeta] Fix schema checkpoint NPE

* [Feature][RedShift] Fix get error rowtype from committer

* [Feature][RedShift] Improve get rowtype logic from committer

* [Fix][RedShift] Fix cast class Exception in committer

* [Hotfix][Redshift] Fix multiple primary keys

* [Fix][RedShift] Add Drop table before create temporary table

* [Hotfix][CDC] Fix parse mysql column rename ddl after error

* [Improve][S3-Redshift] Support auto-create table and super datatype

* Format codestyle

---------

Co-authored-by: liuli <m_liuli@163.com>
Co-authored-by: gaojun <gaojun2048@gmail.com>
Co-authored-by: XiaoJiang521 <j18686831232@163.com>
Co-authored-by: XiaoJiang521 <131635688+XiaoJiang521@users.noreply.github.com>
Co-authored-by: ic4y <83933160+ic4y@users.noreply.github.com>
Co-authored-by: Jia Fan <fanjiaeminem@qq.com>