
[Bug][connector-hive] filter '_SUCCESS' file in file list (#2235)#2236

Merged
CalvinKirs merged 1 commit into apache:dev from TyrantLucifer:bug-fix-hive-connector-v2-filter-file
Jul 22, 2022

Conversation

@TyrantLucifer (Member) commented Jul 21, 2022

Purpose of this pull request

#2235

Check list

@CalvinKirs CalvinKirs merged commit db04651 into apache:dev Jul 22, 2022
@CalvinKirs CalvinKirs linked an issue Jul 22, 2022 that may be closed by this pull request
CalvinKirs added a commit that referenced this pull request Jul 22, 2022
* Delete a repeated dependency library. (#2180)

Signed-off-by: root <l-shen@localhost.localdomain>

Co-authored-by: root <l-shen@localhost.localdomain>

* update flinkCommand to sparkCommand in spark example (#2184)

* update doc about module desc to keep consistent with the real module name (#2185)

* [Connector-V2] Add Hive sink connector v2 (#2158)

* tmp commit

* add hadoop2 and hadoop3 shade jar

* add hadoop2 and hadoop3 shade jar

* add license head

* change known dependencies

* tmp commit

* tmp commit

* change hadoop dependency scope to provided

* back pom

* fix checkstyle

* add example

* fix example bug

* remove file connector from example and e2e because hadoop2 can not compile with jdk11

* no need jdk8 and jdk11 profile because we don't use hadoop shade jar

* change hadoop jar dependency scope to provided

* back

* file connector can not build in jdk11

* drop hadoop shade

* add gitignore item

* add hadoop and local file sink

* fix pom error

* fix pom error

* fix pom error

* implement new interface

* fix UT error

* fix e2e error

* update build timeout from 30min to 40min

* fix e2e error

* remove auto service

* fix e2e error

* fix e2e error

* fix e2e error

* found e2e error

* fix e2e error

* fix e2e error

* fix e2e error

* merge from upstream

* merge from upstream

* merge from upstream

* merge from upstream

* merge from upstream

* add mvn jvm option

* add mvn jvm option

* add license

* add license

* add license

* fix dependency

* fix build jvm oom

* fix build jvm oom

* fix build jvm oom

* fix dependency

* fix dependency

* fix e2e error

* increase codeql check timeout from 30min to 60min

* merge from dev

* merge from dev

* fix ci error

* fix checkstyle

* fix ci

* fix ci

* aa

* aa

* aa

* add .idea

* del .idea

* del .idea

* del .idea

* del .idea

* remove unused license

* remove unused before and after methods in test

* fix license; remove dependency

* fix review

* fix build order

* fix license

* fix license

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* add code-analysis timeout to 120

* retry ci

* update license and remove unused jar from LICENSE file

* retry ci

* add hive sink

* add hive sink connector doc

* add hive sink connector doc

* fix checkstyle error.

* fix bug

* tmp

* fix hive shade error

* fix hive shade error

* fix commit bug

* optimize doc

* optimize doc

* optimize doc

* optimize code

* [Feat][UI] Add login page. (#2183)

* [bug]fix commandArgs -t(--check) conflict with flink deployment target (#2174)

* [bug]fix commandArgs -t(--check) conflict with flink deployment target

* [bug]fix commandArgs -t(--check) conflict with flink deployment target

* [Bug][spark-connector-v2-example] fix the bug of no class found. (#2191) (#2192)

* [Bug][spark-connector-v2-example] fix the bug of no class found. (#2191)

* add the janino dependency in pom

* [Bug][spark-connector-v2-example] remove janino dependency in main pom and add it to connector[v2]-hive (#2191)

* [Bug][spark-connector-v2-example] add janino-3.0.9.jar in known-dependencies.txt to fix dependency license error (#2191)

* update the condition to 1 = 0 about get table operation (#2186)

* [Docs] Add connectors-v2 to docs item (#2187)

* [Feat][UI] Add dashboard layout. (#2198)

* [checkstyle] Improved validation scope of MagicNumber (#2194)

* [Bug][Connector]Hudi Source loads the data twice

* add unknown exception message (#2204)

* [Bug] [seatunnel-api-flink] Connectors dependencies repeat additions (#2207)

* [Bug] [connector-v2] When outputting data to clickhouse, a ClassCastException was encountered

* [Bug] [seatunnel-api-flink] Connectors dependencies repeat additions

* [Bug][Script]Fix the problem that the help command is invalid

* [Fix][CI] Add remove jar from /tmp/seatunnel-dependencies before run

* [Feat][UI] Add dashboard default router. (#2216)

* [Feat][UI] Add the header component in the dashboard layout. (#2218)

* [Core][Starter] Change jar connector load logic (#2193)

* [Docs]Fix Flink engine version requirements (#2220)

Flink 1.13.6 is compatible with Flink 1.12, but not with versions below 1.12.

* [Feat][UI] Add the setting dropdown in the dashboard layout. (#2225)

* [Feat][UI] Add the user dropdown in the dashboard layout. (#2228)

* [Bug][hive-connector-v2] Resolve the schema inconsistency bug (#2229) (#2230)

* [doc] Correct v2 connector avoid duplicate slug (#2231)

Currently, the url https://seatunnel.apache.org/docs/category/source
will expand two parent sidebars, both source and source-v2.
This is because we're using the same slug in our sidebars.js.

* [Build]Optimize license check (#2232)

* [Core][Starter] Fix connector v2 can't deserialize on spark (#2221)

* [Core][Starter] Fix connector v2 can't deserialize on spark

* [Core][Starter] Add SerializationUtils Unit Test

* [Core][Starter] Add SerializationUtils Unit Test

* [Core][Flink] Fixed FlinkEnvironment registerPlugin logic both old and new api

* [Bug][connector-hive] filter '_SUCCESS' file in file list (#2235) (#2236)

* StateT of SeaTunnelSource should extend `Serializable` (#2214)

* [Improvement][core] StateT of SeaTunnelSource should extend `Serializable`, so that `org.apache.seatunnel.api.source.SeaTunnelSource.getEnumeratorStateSerializer` can provide a default implementation (see the sketch after this commit list). This will be useful to every SeaTunnelSource subclass implementation.

* repetitive dependency

* [Improvement][connector-v2] the postgres jar should be contained in the container like mysql-java, so its scope should be provided, not compile

* [Improvement][connector-v2] remove the code block in the implementation class to keep code clean.

* [Improvement][connector-v2] remove unused import

* [Improvement][connector-v2] modify import order

Co-authored-by: bjyflihongyu <lihongyuinfo@jd.com>

* [Feat][UI] Add the table in the user manage. (#2234)

* Merge dev to st-engine branch

Co-authored-by: l-shen <lijieliang@cmss.chinamobile.com>
Co-authored-by: root <l-shen@localhost.localdomain>
Co-authored-by: Xiao Zhao <49054376+zhaomin1423@users.noreply.github.com>
Co-authored-by: Eric <gaojun2048@gmail.com>
Co-authored-by: songjianet <1778651752@qq.com>
Co-authored-by: sandyfog <154525105@qq.com>
Co-authored-by: TyrantLucifer <TyrantLucifer@gmail.com>
Co-authored-by: Zongwen Li <zongwen.li.tech@gmail.com>
Co-authored-by: superzhang0929 <45145852+superzhang0929@users.noreply.github.com>
Co-authored-by: Kerwin <37063904+zhuangchong@users.noreply.github.com>
Co-authored-by: gaara <85996062+gaaraG@users.noreply.github.com>
Co-authored-by: lvlv <40759793+lvlv-feifei@users.noreply.github.com>
Co-authored-by: Hisoka <fanjiaeminem@qq.com>
Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
Co-authored-by: Jared Li <lhyundeadsoul@gmail.com>
Co-authored-by: bjyflihongyu <lihongyuinfo@jd.com>
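
As noted in the StateT commit entry above, here is a minimal sketch of the `Serializable`-bounded state pattern that commit describes. The `Serializer` interface and `SeaTunnelSourceSketch` type below are simplified assumptions for illustration, not the actual SeaTunnel API; the real `getEnumeratorStateSerializer` signature differs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical, simplified stand-in for SeaTunnel's serializer abstraction.
interface Serializer<T> {
    byte[] serialize(T obj) throws IOException;
}

// Because StateT is bounded by Serializable, the interface can provide a
// working default serializer based on plain Java serialization, and every
// subclass inherits it instead of writing its own.
interface SeaTunnelSourceSketch<StateT extends Serializable> {
    default Serializer<StateT> getEnumeratorStateSerializer() {
        return obj -> {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj); // safe: StateT extends Serializable
            }
            return bytes.toByteArray();
        };
    }
}
```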
TyrantLucifer added a commit to TyrantLucifer/incubator-seatunnel that referenced this pull request Sep 18, 2022
@xujiongda

Hi @TyrantLucifer,
does this issue also happen in the hive connector (not v2)?
If so, is there any way we can filter '_SUCCESS' on the older version (2.1.x)?

Any advice would be appreciated. Thank you in advance!

@TyrantLucifer (Member, Author)

Hi @TyrantLucifer, does this issue also happen in the hive connector (not v2)? If so, is there any way we can filter '_SUCCESS' on the older version (2.1.x)?

Any advice would be appreciated. Thank you in advance!

I remember the v1 issue has been fixed.

@xujiongda commented Dec 16, 2023

Hi @TyrantLucifer
Thank you for replying!

1. Do you remember any keywords for the v1 issue? In "Issues" I only found this bug related to _SUCCESS. (The version I am using is 2.1.3.)

2. About the issue description, you said:

When the hive connector scans HDFS dirs and puts files into the file list, it does not filter out the '_SUCCESS' file that is usually generated by Spark, which causes the task to fail.

Is the issue you found related to a job writing many "_SUCCESS" files in the same dir, leading to task failure? Normally Hive doesn't have a _SUCCESS file in its dirs, so if the v1 version is fixed, the "_SUCCESS" file should not exist in the hive dir, is that correct? Just want to confirm whether it is the same issue.

Thanks!
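
For reference, a minimal sketch of the kind of '_SUCCESS' filtering this PR's title describes, written against Hadoop's FileSystem API. The class and method names below are illustrative assumptions, not the actual SeaTunnel connector code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessFileFilterSketch {
    // Spark/MapReduce jobs leave a zero-byte '_SUCCESS' marker in the output
    // dir on successful completion; it must be skipped when collecting the
    // data files to read, or the reader will fail on a non-data file.
    public static List<Path> listDataFiles(FileSystem fs, Path dir) throws IOException {
        List<Path> dataFiles = new ArrayList<>();
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isFile() && !"_SUCCESS".equals(status.getPath().getName())) {
                dataFiles.add(status.getPath());
            }
        }
        return dataFiles;
    }
}
```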


Development

Successfully merging this pull request may close these issues.

[Connector-V2] [Hive connector] file list should not contain '_SUCCESS' file
