
Convenience: up to 20 parameters, Builders for ByteStrings #3

Closed
wants to merge 2 commits into from

3 participants

@mgajda
mgajda commented Jul 12, 2012

Hi,

I have added Params instances for up to 20 parameters (useful when generating old FORTRAN-style column data formats), and added Buildable instances for both strict and lazy ByteString. The latter may be useful when dealing with legacy code.

Do you have any hints for future efficiency improvements? (My parsing code is actually much faster than formatting with text-format.)
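For readers unfamiliar with text-format: its Params class turns a tuple of arguments into a list of Builders, with one instance per tuple size, so supporting more arguments means writing more instances. The sketch below is modelled on that design but defines its own hypothetical stand-ins for Buildable and Params, so it compiles with only the text package. It shows what an instance for an 11-tuple looks like; it is not the code from this PR.

```haskell
import Data.List (intersperse)
import qualified Data.Text.Lazy as TL
import Data.Text.Lazy.Builder (Builder, fromString, toLazyText)

-- Hypothetical local stand-ins for text-format's Buildable and Params
-- classes, so this sketch does not depend on the text-format package.
class Buildable p where
  build :: p -> Builder

instance Buildable Int where
  build = fromString . show

class Params ps where
  buildParams :: ps -> [Builder]

-- Each tuple arity needs its own instance; an 11-tuple is shown here.
instance ( Buildable a, Buildable b, Buildable c, Buildable d
         , Buildable e, Buildable f, Buildable g, Buildable h
         , Buildable i, Buildable j, Buildable k )
      => Params (a, b, c, d, e, f, g, h, i, j, k) where
  buildParams (a, b, c, d, e, f, g, h, i, j, k) =
    [ build a, build b, build c, build d, build e, build f
    , build g, build h, build i, build j, build k ]

-- Render every parameter, separated by single spaces, as one output row.
renderRow :: Params ps => ps -> TL.Text
renderRow = toLazyText . mconcat . intersperse (fromString " ") . buildParams
```

Because Buildable is not a standard class, numeric literals passed to renderRow need explicit type annotations (for example 1 :: Int); otherwise the type is ambiguous.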

mgajda added some commits Jul 11, 2012
mgajda: Added Params for argument counts from 11 up to 20. (02c15bf)
mgajda: Convenience Builders for strict&lazy ByteString.
In case of interacting with old 8-bit code, one might want to print
ByteString values just as easily as other string types.
(676f1aa)
@singpolyma

Char8 is bad. Use a UTF-8 decoding routine (perhaps the one from utf8-string) at the very least, but even that is not safe unless you happen to know the bytes are UTF-8-encoded text. This probably should not be an instance, but an explicit function you have to call (such functions already exist in Data.Text.Encoding).
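The explicit functions mentioned here live in Data.Text.Encoding in the text package. A minimal sketch of the two usual choices: decodeUtf8' reports invalid input as a Left value, while decodeUtf8With lenientDecode substitutes U+FFFD for bad bytes. The wrapper names decodeStrict and decodeLenient are illustrative only, not part of any library.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (UnicodeException, lenientDecode)

-- Strict decoding: invalid UTF-8 becomes a Left value the caller must
-- handle, instead of an exception or silently reinterpreted bytes.
decodeStrict :: B.ByteString -> Either UnicodeException T.Text
decodeStrict = decodeUtf8'

-- Lenient decoding: invalid bytes are replaced by U+FFFD, so any loss
-- is visible in the output rather than hidden.
decodeLenient :: B.ByteString -> T.Text
decodeLenient = decodeUtf8With lenientDecode
```

Either way, the caller states which encoding is expected at the point where bytes become text, which is the property an implicit instance cannot provide.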

@mgajda
mgajda commented Aug 25, 2012

I believe it is correct when you are 100% certain that you are dealing with pure ASCII, such as when reading or writing legacy databases.

I added the instances because I noticed that many people still pass ASCII strings around for efficiency reasons.
They also sidestep the unpleasant situation where the UTF-8 codec crashes the entire program, or "forgets" characters because a user entered them in the wrong encoding.

Disclosure: two of the three languages I use every day are much more readable with a non-ASCII encoding, and I have been badly bitten by mixed encodings in many data files that I processed with the default UTF-8 encoding.
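If the data is supposed to be pure ASCII, that certainty can be checked at run time rather than assumed. A small sketch, assuming strict ByteStrings; asciiToText is a hypothetical helper name, and decodeLatin1 is safe here only because ASCII is a subset of Latin-1.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1)

-- Accept the bytes only if every one of them is 7-bit ASCII (below 0x80);
-- anything else is rejected instead of being reinterpreted.
asciiToText :: B.ByteString -> Maybe T.Text
asciiToText bs
  | B.all (< 0x80) bs = Just (decodeLatin1 bs)  -- ASCII is a subset of Latin-1
  | otherwise         = Nothing
```

The extra pass over the bytes costs a little time, but it turns an unstated assumption about legacy data into an explicit failure case.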

@singpolyma

Char8 is actually Latin-1. Yes, if you know your encoding it is safe, but in that case call a function specific to that encoding. An instance is dangerous because if someone uses it on bytes that are not Latin-1, it will corrupt the data.
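Both halves of this point can be checked directly. In the sketch below (helper names are illustrative), packing a String through Data.ByteString.Char8 keeps only the low 8 bits of each Char, so the euro sign U+20AC comes back as U+00AC. Unpacking UTF-8 bytes through Char8 yields one Char per byte, turning "café" into five characters. decodeLatin1 from Data.Text.Encoding is the encoding-specific alternative.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, encodeUtf8)

-- Round-trip a String through Char8: code points above U+00FF are
-- truncated to their low 8 bits and do not survive.
char8RoundTrip :: String -> String
char8RoundTrip = BC.unpack . BC.pack

-- Read UTF-8 bytes through Char8: each byte becomes its own Char (mojibake).
utf8ViaChar8 :: String -> String
utf8ViaChar8 = BC.unpack . encodeUtf8 . T.pack

-- The explicit alternative: name the encoding the bytes are actually in.
fromLatin1 :: B.ByteString -> T.Text
fromLatin1 = decodeLatin1
```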

@mgajda
mgajda commented Aug 25, 2012

Yes, using the wrong tool may lead to data corruption.
I believe one should always use Data.ByteString instead of Data.ByteString.Char8 if the bytes inside are not 8-bit characters.

That is why I suggest instances only for the *.Char8 types, not for those in the Data.ByteString module itself.
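One detail bears on this distinction: Data.ByteString.Char8 re-exports the same ByteString type as Data.ByteString and only adds Char-based functions over it (the lazy modules work the same way). An instance declared for the Char8 type is therefore an instance for every strict ByteString, as this small type-checking sketch illustrates; sameType is an illustrative name.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- This identity function type-checks because both modules export one and
-- the same ByteString type; there is no separate "Char8 ByteString" type.
sameType :: B.ByteString -> BC.ByteString
sameType = id
```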

@mgajda
mgajda commented Aug 27, 2012

If other people share this reservation, I could factor the *.ByteString.Char8 instances out into a separate module that must be imported explicitly. Still, it would be beneficial to share this code in text-format, where it belongs.

@bos
Owner
bos commented Aug 27, 2012

I've applied 02c15bf, thanks.

I won't take the other patch, as it is not safe for the reasons that @singpolyma mentions.

bos closed this Aug 27, 2012