
[Bug][connector-hive] filter '_SUCCESS' file in file list (#2235)#2236

Merged
CalvinKirs merged 1 commit into apache:dev from TyrantLucifer:bug-fix-hive-connector-v2-filter-file
Jul 22, 2022

Conversation

@TyrantLucifer (Member) commented Jul 21, 2022

Purpose of this pull request

#2235

Check list

@CalvinKirs CalvinKirs merged commit db04651 into apache:dev Jul 22, 2022
@CalvinKirs CalvinKirs linked an issue Jul 22, 2022 that may be closed by this pull request
CalvinKirs added a commit that referenced this pull request Jul 22, 2022
* Delete a repeated dependency library. (#2180)

Signed-off-by: root <l-shen@localhost.localdomain>

Co-authored-by: root <l-shen@localhost.localdomain>

* update flinkCommand to sparkCommand in spark example (#2184)

* update doc about module desc to keep consistent with the real module name (#2185)

* [Connector-V2] Add Hive sink connector v2 (#2158)

* tmp commit

* add hadoop2 and hadoop3 shade jar

* add hadoop2 and hadoop3 shade jar

* add license head

* change known dependencies

* tmp commit

* tmp commit

* change hadoop dependency scope to provided

* back pom

* fix checkstyle

* add example

* fix example bug

* remove file connector from example and e2e because hadoop2 can not compile with jdk11

* no need jdk8 and jdk11 profile because we don't use hadoop shade jar

* change hadoop jar dependency scope to provided

* back

* file connector can not build in jdk11

* drop hadoop shade

* add gitignore item

* add hadoop and local file sink

* fix pom error

* fix pom error

* fix pom error

* implement new interface

* fix UT error

* fix e2e error

* update build timeout from 30min to 40min

* fix e2e error

* remove auto service

* fix e2e error

* fix e2e error

* fix e2e error

* found e2e error

* fix e2e error

* fix e2e error

* fix e2e error

* merge from upstream

* merge from upstream

* merge from upstream

* merge from upstream

* merge from upstream

* add mvn jvm option

* add mvn jvm option

* add license

* add license

* add license

* fix dependency

* fix build jvm oom

* fix build jvm oom

* fix build jvm oom

* fix dependency

* fix dependency

* fix e2e error

* increase codeql check timeout from 30min to 60min

* merge from dev

* merge from dev

* fix ci error

* fix checkstyle

* fix ci

* fix ci

* aa

* aa

* aa

* add .idea

* del .idea

* del .idea

* del .idea

* del .idea

* remove unused license

* remove unused before and after methods in test

* fix license; remove dependency

* fix review

* fix build order

* fix license

* fix license

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* fix review

* add code-analysis timeout to 120

* retry ci

* update license and remove unused jar from LICENSE file

* retry ci

* add hive sink

* add hive sink connector doc

* add hive sink connector doc

* fix checkstyle error.

* fix bug

* tmp

* fix hive shade error

* fix hive shade error

* fix commit bug

* optimize doc

* optimize doc

* optimize doc

* optimize code

* [Feat][UI] Add login page. (#2183)

* [bug]fix commandArgs -t(--check) conflict with flink deployment target (#2174)

* [bug]fix commandArgs -t(--check) conflict with flink deployment target

* [bug]fix commandArgs -t(--check) conflict with flink deployment target

* [Bug][spark-connector-v2-example] fix the bug of no class found. (#2191) (#2192)

* [Bug][spark-connector-v2-example] fix the bug of no class found. (#2191)

* add the janino dependency in pom

* [Bug][spark-connector-v2-example] remove janino dependency in main pom and add it to connector[v2]-hive (#2191)

* [Bug][spark-connector-v2-example] add janino-3.0.9.jar in known-dependencies.txt to fix dependency license error (#2191)

* update the condition to 1 = 0 about get table operation (#2186)

* [Docs] Add connectors-v2 to docs item (#2187)

* [Feat][UI] Add dashboard layout. (#2198)

* [checkstyle] Improved validation scope of MagicNumber (#2194)

* [Bug][Connector]Hudi Source loads the data twice

* add unknown exception message (#2204)

* [Bug] [seatunnel-api-flink] Connectors dependencies repeat additions (#2207)

* [Bug] [connector-v2] When outputting data to clickhouse, a ClassCastException was encountered

* [Bug] [seatunnel-api-flink] Connectors dependencies repeat additions

* [Bug][Script]Fix the problem that the help command is invalid

* [Fix][CI] Add remove jar from /tmp/seatunnel-dependencies before run

* [Feat][UI] Add dashboard default router. (#2216)

* [Feat][UI] Add the header component in the dashboard layout. (#2218)

* [Core][Starter] Change jar connector load logic (#2193)

* [Docs]Fix Flink engine version requirements (#2220)

Flink 1.13.6 is compatible with Flink 1.12, but not with versions below 1.12.

* [Feat][UI] Add the setting dropdown in the dashboard layout. (#2225)

* [Feat][UI] Add the user dropdown in the dashboard layout. (#2228)

* [Bug][hive-connector-v2] Resolve the schema inconsistency bug (#2229) (#2230)

* [doc] Correct v2 connector avoid duplicate slug (#2231)

Currently, the url https://seatunnel.apache.org/docs/category/source
will expand two parent sidebars, both source and source-v2.
This is because we're using the same slug in our sidebars.js.

* [Build]Optimize license check (#2232)

* [Core][Starter] Fix connector v2 can't deserialize on spark (#2221)

* [Core][Starter] Fix connector v2 can't deserialize on spark

* [Core][Starter] Add SerializationUtils Unit Test

* [Core][Starter] Add SerializationUtils Unit Test

* [Core][Flink] Fixed FlinkEnvironment registerPlugin logic both old and new api

* [Bug][connector-hive] filter '_SUCCESS' file in file list (#2235) (#2236)

* StateT of SeaTunnelSource should extend `Serializable` (#2214)

* [Improvement][core] StateT of SeaTunnelSource should extend `Serializable`, so that `org.apache.seatunnel.api.source.SeaTunnelSource.getEnumeratorStateSerializer` can provide a default implementation (see the sketch after this commit list). This will be useful to every SeaTunnelSource subclass implementation.

* repetitive dependency

* [Improvement][connector-v2] the postgres jar should be contained in the container like mysql-java, so its scope should be provided, not compile

* [Improvement][connector-v2] remove the code block in the implementation class to keep code clean.

* [Improvement][connector-v2] remove unused import

* [Improvement][connector-v2] modify import order

Co-authored-by: bjyflihongyu <lihongyuinfo@jd.com>

* [Feat][UI] Add the table in the user manage. (#2234)

* Merge dev to st-engine branch

Co-authored-by: l-shen <lijieliang@cmss.chinamobile.com>
Co-authored-by: root <l-shen@localhost.localdomain>
Co-authored-by: Xiao Zhao <49054376+zhaomin1423@users.noreply.github.com>
Co-authored-by: Eric <gaojun2048@gmail.com>
Co-authored-by: songjianet <1778651752@qq.com>
Co-authored-by: sandyfog <154525105@qq.com>
Co-authored-by: TyrantLucifer <TyrantLucifer@gmail.com>
Co-authored-by: Zongwen Li <zongwen.li.tech@gmail.com>
Co-authored-by: superzhang0929 <45145852+superzhang0929@users.noreply.github.com>
Co-authored-by: Kerwin <37063904+zhuangchong@users.noreply.github.com>
Co-authored-by: gaara <85996062+gaaraG@users.noreply.github.com>
Co-authored-by: lvlv <40759793+lvlv-feifei@users.noreply.github.com>
Co-authored-by: Hisoka <fanjiaeminem@qq.com>
Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
Co-authored-by: Jared Li <lhyundeadsoul@gmail.com>
Co-authored-by: bjyflihongyu <lihongyuinfo@jd.com>
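
As noted in the StateT commit entry above, here is a minimal sketch of the `Serializable`-bounded state pattern that commit describes. The `Serializer` interface and `SeaTunnelSourceSketch` type below are simplified assumptions for illustration, not the actual SeaTunnel API; the real `getEnumeratorStateSerializer` signature differs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical, simplified stand-in for SeaTunnel's serializer abstraction.
interface Serializer<T> {
    byte[] serialize(T obj) throws IOException;
}

// Because StateT is bounded by Serializable, the interface can provide a
// working default serializer based on plain Java serialization, and every
// subclass inherits it instead of writing its own.
interface SeaTunnelSourceSketch<StateT extends Serializable> {
    default Serializer<StateT> getEnumeratorStateSerializer() {
        return obj -> {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj); // safe: StateT extends Serializable
            }
            return bytes.toByteArray();
        };
    }
}
```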
TyrantLucifer added a commit to TyrantLucifer/incubator-seatunnel that referenced this pull request Sep 18, 2022
@xujiongda

Hi @TyrantLucifer,
does this issue also happen in the hive connector (not v2)?
If so, is there any way we can filter '_SUCCESS' on the older version (2.1.x)?

Any advice would be appreciated. Thank you in advance!

@TyrantLucifer (Member, Author)

Hi @TyrantLucifer, does this issue also happen in the hive connector (not v2)? If so, is there any way we can filter '_SUCCESS' on the older version (2.1.x)?

Any advice would be appreciated. Thank you in advance!

I remember the v1 issue has been fixed.

@xujiongda commented Dec 16, 2023

Hi @TyrantLucifer
Thank you for replying!

1. Do you remember any keywords for the v1 issue? In "Issues" I only found this bug related to _SUCCESS. (The version I am using is 2.1.3.)

2. About the issue description, you said:

When the hive connector scans HDFS dirs and puts files into the file list, it does not filter out the '_SUCCESS' file that is usually generated by Spark, which causes the task to fail.

Is the issue you found related to a job writing many "_SUCCESS" files in the same dir, leading to task failure? Normally Hive doesn't have a _SUCCESS file in its dirs, so if the v1 version is fixed, the "_SUCCESS" file should not exist in the hive dir, is that correct? Just want to confirm whether it is the same issue.

Thanks!
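
For reference, a minimal sketch of the kind of '_SUCCESS' filtering this PR's title describes, written against Hadoop's FileSystem API. The class and method names below are illustrative assumptions, not the actual SeaTunnel connector code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessFileFilterSketch {
    // Spark/MapReduce jobs leave a zero-byte '_SUCCESS' marker in the output
    // dir on successful completion; it must be skipped when collecting the
    // data files to read, or the reader will fail on a non-data file.
    public static List<Path> listDataFiles(FileSystem fs, Path dir) throws IOException {
        List<Path> dataFiles = new ArrayList<>();
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isFile() && !"_SUCCESS".equals(status.getPath().getName())) {
                dataFiles.add(status.getPath());
            }
        }
        return dataFiles;
    }
}
```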


Development

Successfully merging this pull request may close these issues.

[Connector-V2] [Hive connector] file list should not contain '_SUCCESS' file
