Parser unable to parse CSV file when a row has fewer fields than the header #111

Closed
EugeneLDT opened this issue Dec 4, 2022 · 6 comments
Labels
bug Something isn't working

EugeneLDT commented Dec 4, 2022

Describe the bug
Some CSV editors write only the fields that were actually filled in when a new record is added. For example, some CSV editors on macOS do not append the trailing empty fields.

Suppose we have this CSV file:
name,age,gender,hobby
Steve,19,male,sport

I open this file and add one more user with an empty gender and hobby, which our business logic allows:
name,age,gender,hobby
Steve,19,male,sport
Emma,20

The editor saves the last record without the trailing empty fields, and the parser throws an error like:
Fields num seems to be 4 on each row, but on 3th csv row, fields num is 2.

Through configuration I can either skip such a row or throw an error, but this is a valid situation and I need to keep this record.

To Reproduce
Steps to reproduce the behavior.

val record = mutableListOf<List<String>>()
csvReader {
    charset = "UTF-8"
    quoteChar = '"'
    delimiter = separator
}.open(file) {
    readAllAsSequence().forEach { record.add(it) }
}

  1. export file:
    name,age,gender,hobby
    Steve,19,male,sport

  2. add one more user, and try to read this file
    name,age,gender,hobby
    Steve,19,male,sport
    Emma,20

Actual behavior
Fields num seems to be 4 on each row, but on 3th csv row, fields num is 2.

Expected behavior
The file is read successfully, and missing fields are treated as empty ones.

P.S. It would be good to add an IGNORE_MISSING_FIELDS value to the InsufficientFieldsRowBehaviour enum and handle it so that, if some fields are missing, they are simply treated as empty.
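
To make the requested behaviour concrete, here is a minimal sketch in plain Kotlin, independent of kotlin-csv. The helper name `padToHeader` is hypothetical and is not part of the library. It assumes the first row is the header and that its size is the expected field count. It only illustrates the semantics being asked for (short rows padded with empty strings), not how the parser would implement it.

```kotlin
// Hypothetical helper (not a kotlin-csv API): pads rows that are shorter
// than the header with empty strings, leaving all other rows untouched.
fun padToHeader(rows: List<List<String>>): List<List<String>> {
    if (rows.isEmpty()) return rows
    val width = rows.first().size // assumption: the header row defines the field count
    return rows.map { row ->
        if (row.size >= width) row
        else row + List(width - row.size) { "" }
    }
}

fun main() {
    val rows = listOf(
        listOf("name", "age", "gender", "hobby"),
        listOf("Steve", "19", "male", "sport"),
        listOf("Emma", "20") // short row from the example above
    )
    // The "Emma" row comes back with four fields, the last two being empty strings.
    println(padToHeader(rows))
}
```

Rows that are already as long as the header, or longer, pass through unchanged, so excess fields would still be handled by whatever excessFieldsRowBehaviour is configured.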

Environment

  • kotlin-csv version [e.g. 0.10.0]
  • java version [e.g. java8]
  • OS: [e.g. MacOS]

EugeneLDT added the bug label Dec 4, 2022
@doyaaaaaken
Owner

Hi, @EugeneLDT. Thank you for using this library.
You can solve the problem by using the excessFieldsRowBehaviour and insufficientFieldsRowBehaviour options.
See https://github.com/doyaaaaaken/kotlin-csv#customize.

@EugeneLDT
Author

EugeneLDT commented Dec 6, 2022

Hi @doyaaaaaken, thank you for the response.
ExcessFieldsRowBehaviour only allows trimming excess fields, which I already use.
InsufficientFieldsRowBehaviour only allows skipping the row or throwing an error; it cannot process the fields the row already contains.
My point is that it would be good for InsufficientFieldsRowBehaviour to also support processing rows that have fewer fields.

I have already found a workaround; maybe it will be helpful for someone.
I just append extra empty fields to every data row.

If you try to parse demo.csv, you get an error.
The second file, new.csv, is processed without error.

import com.github.doyaaaaaken.kotlincsv.dsl.context.ExcessFieldsRowBehaviour
import com.github.doyaaaaaken.kotlincsv.dsl.csvReader
import java.io.File

fun main() {
    // Start from a clean state
    File("demo.csv").delete()
    File("new.csv").delete()

    val tempFile = File("demo.csv")
    val newFile = File("new.csv")

    tempFile.appendText("name,surname,age\n")
    tempFile.appendText("Smith,Johnson,18\n")
    tempFile.appendText("Emily") // short row: 1 field instead of 3

    val separator = ','

    // Over-pad every data row; TRIM below cuts the rows back to the header width
    val extraSeparators = separator.toString().repeat(20)
    tempFile.readLines().forEachIndexed { index, line ->
        val newLine = when (index) {
            0 -> "$line\n" // don't add extra empty fields to the header
            else -> "$line$extraSeparators\n" // original line + extra separators + newline
        }
        newFile.appendText(newLine)
    }

    val record = mutableListOf<List<String>>()
    csvReader {
        charset = "UTF-8"
        quoteChar = '"'
        delimiter = separator
        excessFieldsRowBehaviour = ExcessFieldsRowBehaviour.TRIM
    }.open(newFile) {
        readAllAsSequence().forEach { record.add(it) }
    }

    println(record)

    tempFile.delete()
    newFile.delete()
}

@doyaaaaaken
Owner

@EugeneLDT
Thank you, I understand!
So, would implementing the new option InsufficientFieldsRowBehaviour.EMPTY_STRING be a solution?
If this option is set, missing fields are treated as empty strings ("").

@EugeneLDT
Author

Yes, that would be great.

@doyaaaaaken
Owner

Thanks!
As this problem is not a bug (this behavior is not defined in the CSV specification), I have created feature request issue #113 and am closing this issue.

@doyaaaaaken
Owner

Resolved in PR #117.
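
For readers landing here later, a hedged sketch of how the option might be used. It assumes the option was released under the name proposed earlier in this thread, InsufficientFieldsRowBehaviour.EMPTY_STRING, and that the library is imported under the com.github.doyaaaaaken.kotlincsv package; check the README of the version you use, since neither is confirmed in this thread.

```kotlin
import com.github.doyaaaaaken.kotlincsv.dsl.context.InsufficientFieldsRowBehaviour
import com.github.doyaaaaaken.kotlincsv.dsl.csvReader

fun main() {
    // The file contents from the original report, with a short last row.
    val csv = """
        name,age,gender,hobby
        Steve,19,male,sport
        Emma,20
    """.trimIndent()

    val rows: List<List<String>> = csvReader {
        // Assumption: option name as proposed in this thread.
        insufficientFieldsRowBehaviour = InsufficientFieldsRowBehaviour.EMPTY_STRING
    }.readAll(csv)

    // With the option set, the "Emma" row is expected to be kept,
    // with its missing fields filled in as empty strings.
    println(rows)
}
```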
