PHOENIX-7267 CsvBulkLoadTool fails job due to a bad record with "(sta…#2399

Open
xavifeds8 wants to merge 5 commits into apache:master from xavifeds8:PHOENIX-7267
@xavifeds8
Contributor

…rtline 1) EOF reached before encapsulated token finished"

@xavifeds8 xavifeds8 force-pushed the PHOENIX-7267 branch 2 times, most recently from 2404a9d to 41a8c42 Compare April 27, 2026 10:43
@xavifeds8
Contributor Author

With commons-csv 1.0, CsvBulkLoadTool failed the entire MapReduce job when it encountered a malformed CSV record.
After the upgrade of commons-csv to 1.14.1:
With --ignore-errors: bad records are skipped, good records are loaded, and errors are counted in MR counters.
Without --ignore-errors: the job fails gracefully with a clear error message instead of crashing.

Sanity test for the upgrade: https://gist.github.com/xavifeds8/bd6015a1733ddbf630cbbdb453bdbc0d

@xavifeds8
Contributor Author

xavifeds8 commented Apr 27, 2026

Changes made:

  1. Upgraded commons-csv from 1.0 to 1.14.1
  2. Migrated deprecated CSVFormat.withXxx() calls to CSVFormat.Builder API
  3. Migrated deprecated new CSVParser(reader, format) to CSVParser.builder().setFormat(format).setReader(reader).get()
  4. Caught UncheckedIOException (thrown by commons-csv 1.14.1 during iteration) in UpsertExecutor and CsvToKeyValueMapper, so parse errors are now routed through the normal error-handling path
  5. Updated Pherf's CSVFileResultHandler and GoogleChartGenerator for the same API migration
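
For readers who want to see the shape of item 4, here is a minimal, stdlib-only sketch of routing an unchecked parse error through a per-record error path. It is not the Phoenix code: `BadRecordSketch`, `parseLine`, and `load` are hypothetical names, the toy parser only stands in for commons-csv, and the returned pair of ints only stands in for MR counters. It assumes each input line is parsed on its own, as the reviewed snippet with `new StringReader(input)` suggests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class BadRecordSketch {

    /** Hypothetical stand-in for a CSV parser: splits on commas and honours double quotes. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            // Assumption: mimic the wrapping discussed in this PR, where a
            // checked IOException surfaces as an UncheckedIOException.
            throw new UncheckedIOException(
                new IOException("EOF reached before encapsulated token finished"));
        }
        fields.add(current.toString());
        return fields;
    }

    /** Returns {loaded, skipped}; fails on the first bad record unless ignoreErrors is set. */
    static int[] load(List<String> lines, boolean ignoreErrors) {
        int loaded = 0;
        int skipped = 0;
        for (String line : lines) {
            try {
                parseLine(line);
                loaded++;
            } catch (UncheckedIOException e) {
                if (!ignoreErrors) {
                    // Fail with a readable message that names the underlying cause.
                    throw new IllegalStateException(
                        "Bad CSV record: " + e.getCause().getMessage(), e);
                }
                skipped++; // counted here; a real job would bump an error counter
            }
        }
        return new int[] {loaded, skipped};
    }
}
```

With `ignoreErrors` set, a record with an unclosed quote is counted and skipped while its neighbours still load. Without it, the first bad record stops the run with a message that carries the parser's own explanation.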

Tests:

  1. Updated existing CSVCommonsLoaderIT tests for the new API
  2. Fixed testCSVCommonsUpsertBadEncapsulatedControlChars assertion to match the new exception wrapping
  3. Added testCSVCommonsUpsertEOFInEncapsulatedToken — directly tests the reported scenario (unclosed quote at EOF)

@xavifeds8 xavifeds8 force-pushed the PHOENIX-7267 branch 2 times, most recently from 40f08c1 to ff599da Compare May 12, 2026 20:42
@xavifeds8
Contributor Author

Hi @virajjasani, could you please trigger a CI run for this PR? Also, let me know if the changes look good to you.

@virajjasani
Contributor

@xavifeds8 the CI builds are temporarily broken. Could you run all the CSV bulk load related tests, and a few more, locally to confirm nothing is broken?

Contributor

@virajjasani virajjasani left a comment

+1, need manual verification of build and some test results

try (CSVParser csvParser =
    CSVParser.builder().setFormat(csvFormat).setReader(new StringReader(input)).get()) {
  return Iterables.getFirst(csvParser, null);
} catch (UncheckedIOException e) {
Contributor

Please ensure that the newer csvParser actually throws UncheckedIOException for a bad record. We should not catch anything more unless needed.

Contributor Author

@xavifeds8 xavifeds8 May 17, 2026

Hey @NihalJain
https://github.com/apache/commons-csv/blob/6f93c7edfa0f758f757227b1d30588411fdbf669/src/main/java/org/apache/commons/csv/CSVParser.java#L234

In commons-csv 1.14.1, CSVParser wraps the IOException in an UncheckedIOException (see the link above).

I have also added a UT to verify this behaviour: 25be349

[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.phoenix.mapreduce.CsvToKeyValueMapperTest
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.064 s -- in org.apache.phoenix.mapreduce.CsvToKeyValueMapperTest
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 23.525 s
[INFO] Finished at: 2026-05-18T12:02:52+05:30
[INFO] ------------------------------------------------------------------------

% cat phoenix-core/target/surefire-reports/org.apache.phoenix.mapreduce.CsvToKeyValueMapperTest-output.txt

Exception type: java.io.UncheckedIOException
Message: org.apache.commons.csv.CSVException: (startline 1) EOF reached before encapsulated token finished
Cause type: org.apache.commons.csv.CSVException
Cause message: (startline 1) EOF reached before encapsulated token finished
phoenix %
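
On the review point about not catching more than needed, the following stdlib-only sketch shows one way to keep the catch narrow and still surface the underlying cause, as the output above does. `NarrowCatchSketch` and `parseErrorOf` are hypothetical names for illustration, not part of Phoenix or commons-csv.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

class NarrowCatchSketch {

    /**
     * Runs a parse step. Returns null on success, or a description of the
     * wrapped IOException if the step threw an UncheckedIOException.
     */
    static String parseErrorOf(Supplier<?> parseStep) {
        try {
            parseStep.get();
            return null;
        } catch (UncheckedIOException e) {
            // UncheckedIOException.getCause() is typed as IOException.
            IOException cause = e.getCause();
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        }
        // Deliberately no broader catch: any other RuntimeException propagates.
    }
}
```

Because only `UncheckedIOException` is caught, an unrelated failure such as an `IllegalArgumentException` still reaches the caller unchanged.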

@xavifeds8 xavifeds8 reopened this May 17, 2026
3 participants