Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not quoted empty strings are now null, implement a trim-blanks options. #12

Closed
wants to merge 2 commits into from

Conversation

dimitri
Copy link

@dimitri dimitri commented Oct 12, 2013

When loading data from CSV file into a database (PostgreSQL in my case), it's important to be able to distinguish in between empty string and NULL values. This patch considers non-quoted empty strings as NIL.

Also, it's quite important to be able to retain blanks found around non-quoted values, so I also implemented a new option trim-blanks that works as shown here:

CL-USER> (cl-csv:read-csv "'a', 'b',,'',   ,  plop  ,'c'" :quote #\' :trim-blanks nil)
(("a" "b" NIL "" "   " "  plop  " "c"))
CL-USER> (cl-csv:read-csv "'a', 'b',,'',   ,  plop  ,'c'" :quote #\' :trim-blanks t)
(("a" "b" NIL "" NIL "plop" "c"))

I've made the previous behaviour the default, that is trim-blanks defaults to t. I don' t think the unquoted empty string to NIL behavior needs an option, though.

@bobbysmith007
Copy link
Member

I am sorry to have missed this... I will look at this post-haste

@bobbysmith007
Copy link
Member

Looking at the whitespace tests, this seems to be a bug you have discovered rather than a feature that needs to be added. We already do not trim quoted values, but unfortunately this was missed on white-space only strings.

I think I will leave it returning nil for empty quoted values, but otherwise we should be returning the spaces always. (It is relatively easy to trim after the fact and the semantics are easier to understand if "everything in quotes is returned").

@bobbysmith007
Copy link
Member

Never mind I miss read, apologies, applying now

@bobbysmith007
Copy link
Member

This should be fixed in 982931d Note that I changed the var/arg to empty-string-is-nil because trimming usually involves removing characters while this arg controls the disposition of the empty string.

@bobbysmith007
Copy link
Member

Apologies again for missing this, github didnt notify me and It was not on my github dashboard (though I am unsure why).

@dimitri
Copy link
Author

dimitri commented Jan 6, 2014

In fact, if I'm reading your commit correctly, what you did is not helping my use case at all. I need to be able to distinguish in between those two cases, which have different representations in databases:

  • unknown data (NULL)
  • data is known to be an empty string (empty string)

So in my pull request what I had done is to provide a way to distinguish between no value between separators (raises nil) and an explicit empty string (raises "").

To be able to do that, you also have to consider whether a, ,b and a,,b are the same thing or not, and I added an option to let the user choose about that. I could use your implementation of it provided that I'm still able to distinguish in between a,,b and a,"",b too, which doesn't seem to be the case yet.

@bobbysmith007
Copy link
Member

18dadc8

I just committed a patch that I think covers all the circumstances discretely. Please let me know if this now meets your needs

@bobbysmith007
Copy link
Member

As a note there was a bug in the tests related to lisp-unit2 that I have fixed.

@dimitri
Copy link
Author

dimitri commented Jan 6, 2014

Looks pretty good, thanks! will integrate tomorrow, I guess.

dimitri added a commit to dimitri/pgloader that referenced this pull request Jan 11, 2014
Thanks to the work at AccelerationNet/cl-csv#12 we
can now use the main branch of cl-csv again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants