Closed
Description
Search before asking
- I have searched the issues and found no similar issues.
What happened
When I use Spark local mode to load a local CSV file into a Hive table, every row is written 3 times. This does not happen in Spark YARN cluster mode. I previously used SeaTunnel 1.5, so my migrated jobs run in local mode, and when I tested version 2.3.5 the data came out duplicated.
Summary:
--master local --deploy-mode client: data written 3 times
--master yarn --deploy-mode client: data written 3 times
--master yarn --deploy-mode cluster: correct row count
My CSV file has 2076 rows, but select count(1) from xx returns 3 * 2076 = 6228.
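The row-count comparison above can be checked with a small script. This is a minimal sketch, not part of the original report: `csv_data_rows` and `dup_factor` are hypothetical helper names, and it assumes one record per line (no quoted fields containing newlines) and that the table count is obtained separately, for example from the `select count(1)` query.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the CSV row count with the Hive table count.

# Print the number of data rows in a CSV file, skipping the header row(s).
# Assumption: one record per line (no embedded newlines inside quoted fields).
csv_data_rows() {
    file="$1"
    header_rows="${2:-1}"
    awk -v skip="$header_rows" 'NR > skip { n++ } END { print n + 0 }' "$file"
}

# Print how many times the CSV appears to have been written to the table.
# Prints an integer when table_rows is an exact multiple of csv_rows,
# otherwise prints "inexact".
dup_factor() {
    table_rows="$1"
    csv_rows="$2"
    if [ "$csv_rows" -gt 0 ] && [ $((table_rows % csv_rows)) -eq 0 ]; then
        echo $((table_rows / csv_rows))
    else
        echo "inexact"
    fi
}
```

With the numbers from this report, `dup_factor 6228 2076` prints 3. A result of "inexact" would suggest partial duplication rather than whole-file re-reads.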
SeaTunnel Version
2.3.5
SeaTunnel Config
env {
  # seatunnel defined streaming batch duration in seconds
  execution.parallelism = 4
  job.mode = "BATCH"
  spark.executor.instances = 4
  spark.executor.cores = 4
  spark.executor.memory = "4g"
  spark.sql.catalogImplementation = "hive"
  spark.hadoop.hive.exec.dynamic.partition = "true"
  spark.hadoop.hive.exec.dynamic.partition.mode = "nonstrict"
}
source {
  LocalFile {
    schema {
      fields {
        sku = string
        sku_group = string
        pb = string
        series = string
        pn = string
        mater_n = string
      }
    }
    path = "/data/ghyworkbase/uploadfile/h019-ods_file_pjp_old_new_sku_yy.csv"
    file_format_type = "csv"
    skip_header_row_number = 1
    result_table_name = "ods_file_pjp_old_new_sku_yy_source"
  }
}
transform {
  Sql {
    source_table_name = "ods_file_pjp_old_new_sku_yy_source"
    query = "select sku,sku_group,pb,series,pn,mater_n,TO_CHAR(CURRENT_DATE(),'yyyy') as dt_year from ods_file_pjp_old_new_sku_yy_source "
    result_table_name = "ods_file_pjp_old_new_sku_yy"
  }
}
sink {
  # Console {
  #   source_table_name = "ods_file_pjp_old_new_sku_yy"
  # }
  Hive {
    source_table_name = "ods_file_pjp_old_new_sku_yy"
    table_name = "ghydata.ods_file_pjp_old_new_sku_yy"
    metastore_uri = "thrift://"
  }
}
Running Command
sh /data/seatunnel/seatunnel-2.3.4/bin/start-seatunnel-spark-3-connector-v2.sh \
  --master local \
  --deploy-mode client \
  --queue ghydl \
  --executor-instances 4 \
  --executor-cores 4 \
  --executor-memory 4g \
  --name "h019-ods_file_pjp_old_new_sku_yy" \
  --config /2.3.5/h019-ods_file_pjp_old_new_sku_yy.conf
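To reproduce the three cases from the summary side by side, the launch command can be generated once per master and deploy-mode pair. The sketch below only prints the commands and does not run them. `build_cmd` and `print_matrix` are hypothetical helper names; the script and config paths are copied from the command above, and the resource flags (`--queue`, `--executor-*`, `--name`) are left out for brevity.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the launch command for each mode, runs nothing.
# Paths are copied from the report; adjust them for your installation.
SEATUNNEL_SH="/data/seatunnel/seatunnel-2.3.4/bin/start-seatunnel-spark-3-connector-v2.sh"
CONFIG="/2.3.5/h019-ods_file_pjp_old_new_sku_yy.conf"

# Print the command line for one master / deploy-mode combination.
build_cmd() {
    master="$1"
    deploy_mode="$2"
    echo "sh $SEATUNNEL_SH --master $master --deploy-mode $deploy_mode --config $CONFIG"
}

# Print one command per combination listed in the summary.
print_matrix() {
    for pair in "local client" "yarn client" "yarn cluster"; do
        # Intentional word splitting: each pair is "<master> <deploy-mode>".
        set -- $pair
        build_cmd "$1" "$2"
    done
}

print_matrix
```

Running each printed command against an empty target table and then comparing the table count with the 2076 CSV rows should show which modes duplicate the data.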
Error Exception
No exception is thrown; the only symptom is that the data is written 3 times.
Zeta or Flink or Spark Version
No response
Java or Scala Version
/usr/local/jdk/jdk1.8.0_341
Screenshots
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct