Commit 25db5da
Change-Id: I423f052ca48915502f182cb4f1c67cdf04838a99
steveloughran committed Aug 17, 2022
1 parent 82372d0 commit 25db5da
None of the S3A committers support this. Condition (1) is not met by
the staging committers, while (2) is not met by S3 itself.

To use the manifest committer with dynamic partition overwrites, the
Spark version must contain
[SPARK-40034](https://issues.apache.org/jira/browse/SPARK-40034)
_PathOutputCommitters to work with dynamic partition overwrite_.
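As an illustrative sketch only, a Spark build containing SPARK-40034 would combine dynamic partition overwrite with a `PathOutputCommitter`-based commit protocol via settings along these lines (class names taken from the Spark cloud-integration module; verify them against your Spark and Hadoop releases):

```
spark.sql.sources.partitionOverwriteMode dynamic
spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```

Without SPARK-40034, attempting dynamic partition overwrite through a `PathOutputCommitProtocol` is rejected rather than silently misbehaving.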

for SQL queries/Spark DataSet operations where many thousands of files are created.
The fact that these will suffer from performance problems before
throttling scale issues surface should be considered a warning.

# <a name="SUCCESS"></a> Job Summaries in `_SUCCESS` files

The original Hadoop committer creates a zero-byte `_SUCCESS` file in the root of the output directory
For this to work, a number of conditions must be met:
* All jobs/tasks must create files with unique filenames.
* All jobs must create output with the same directory partition structure.
* The job/queries MUST NOT be using Spark Dynamic Partitioning "INSERT OVERWRITE TABLE"; data may be lost.
  This holds for *all* committers, not just the manifest committer.
* Remember to delete the `_temporary` directory later!

This has *NOT BEEN TESTED*
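As a minimal sketch of the first condition above (unique filenames across all jobs/tasks), names can carry a random UUID so that concurrent writers into the same directory tree never collide. The function name and naming scheme here are hypothetical, for illustration only:

```python
import uuid

def task_filename(prefix: str = "part", ext: str = "parquet") -> str:
    """Generate a filename that is unique per job/task attempt.

    Embedding a UUID is one common way to satisfy the
    unique-filename condition; real committers typically also
    encode job and task attempt IDs in the name.
    """
    return f"{prefix}-{uuid.uuid4()}.{ext}"
```

Two tasks calling this concurrently produce distinct names, so neither overwrites the other's output file.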
