Is encoding Double or Float supported? #20

leisurehound · 2020-05-30T06:46:58Z

When I change the floatStrategy as .convert I get an fatal error in this code:

            case .throw: throw CSVEncoder.Error._invalidFloatingPoint(value, codingPath: self.codingPath)
            case .convert(let positiveInfinity, let negativeInfinity, let nan):
                if value.isNaN {
                    return nan
                } else if value.isInfinite {
                    return (value < 0) ? negativeInfinity : positiveInfinity
                } else { fatalError() }
            }

So with either strategy I either get the thrown error or a fatal error if a valid Double is attempted to be encoded. Is this expected or is there another configuration I'm missing?

Similarly, when I try to decode a double, I get an error thrown, but when I decode it as a string and convert the string to a double in my structs init(from: Decoder) I process the field correctly.

The text was updated successfully, but these errors were encountered:

dehesa · 2020-05-30T08:06:52Z

Hi @leisurehound

It seems I introduced a bug encoding/decoding Doubles and Floats in one of the last releases and my tests were only covering integer numbers (and not floating-point ones).

It is an easy fix. I will have a patch ready in a couple of hours. Thank you for reporting.

dehesa · 2020-05-30T10:32:50Z

I have fixed the floating-point problem and added some tests to check correct behavior. The fixes can be found in the develop branch.

Please, give it a try and if it solves your problems, I will promote the fix to master and release a new patch version.

dehesa · 2020-05-30T10:49:57Z

Oh, and I forgot to mention that I have renamed floatStrategy to nonConformingFloatStrategy. This way, it makes it more clear that the encoding/decoding strategy only gives the option to modify non conforming floating-point values (i.e. infinity, NaN).

leisurehound · 2020-05-30T17:10:42Z

Thanks @dehesa.

The following statement now works:

duration = try container.decode(Double.self)

but the following does not:

duration = try container.decodeIfPresent(Double.self)

dehesa · 2020-05-30T17:19:11Z

Uhm, ok. That is interesting. Could you provide a small example with the behavior you are expecting, such as the one I have in the tests. It will help me out debugging.

leisurehound · 2020-05-30T18:08:15Z

Sorry, am not sure if my Codable experience allows me to replicate without the container that I'm using within the init(from: Decoder).

I'm trying to decode an enum property in the parent struct (see below) by decoding each field individually. I'd really like to be able to decode the Codable enum like I can with JSON, but the new container index seems to get mixed up and always fails.

I'm essentially serializing an enum which has one case with an associated value and the other does not (its essentially a simplified Result type). Here's how I'd expect it to work (but doesn't):

struct myStruct : Codable {
  let dateSent: Date
  let response: Response
  init(from decoder: Decoder) throws {
    var container = try decoder.container.unkeycontainer()
    // ... decode the date, which works fine...
    response = try container.decode(Response.self)
  }
}

enum Response : Codable {
  case time(duration: Double)
  case failure
  init(from decoder: Decoder) {
    var container = try decoder.container.unkeyedcontainer()
    let duration = try decoder.decodeIfPresent(Double.self)
    let failure = try decoder.decodeIfPresent(String.self)
    switch (duration, failure) {
      case (.some(let duration), .none): self = .time(duration: duration)
      case (.none, .some) : self = .failure
      case (.some, .some) : throw DecodingError.dataCorrupted(DecodingError.Context(codingPath: [], debugDescription: "Both Duration & Failure populated"))
      case (.none, .none) : throw DecodingError.dataCorrupted(DecodingError.Context(codingPath: [], debugDescription: "Both Duration & Failure empty"))
    }
}

Here's what I've had to resort to because of the container index becomes problematic when moving to the embedded enum, but it also doesn't work because of the decodeIfPresent(Double.self) always throws.

struct myStruct : Codable {
  let dateSent: Date
  let response: Response
  init(from decoder: Decoder) throws {
    var container = try decoder.container.unkeycontainer()
    // ... decode the date, which works fine...
    // response = try container.decode(Response.self)
    let duration = try decoder.decodeIfPresent(Double.self)
    let failure = try decoder.decodeIfPresent(String.self)
    switch (duration, failure) {
      case (.some(let duration), .none): response = .time(duration: duration)
      case (.none, .some) : response = .failure
      case (.some, .some) : throw DecodingError.dataCorrupted(DecodingError.Context(codingPath: [], debugDescription: "Both Duration & Failure populated"))
      case (.none, .none) : throw DecodingError.dataCorrupted(DecodingError.Context(codingPath: [], debugDescription: "Both Duration & Failure empty"))
    }
  }
}

enum Response {
  case time(duration: Double)
  case failure
}

I'm expecting (and encoding to) the following string records
<datestring>,3.23423,
or
<datestring>,,failure
but am struggling to decode the same

Sorry if this is fundamentally the wrong approach here, but its what I'm coming from the experience I have coding/decoding JSON with Codable.

dehesa · 2020-05-30T19:40:26Z

Alright. Let me take a look at it. I have a further question, though. Is the CSV schema closed? I mean, is that row definition final? <datestring>,<float>, and <datestring>,<empty>,failure or would you be open to define it in another way? e.g. <datestring>,<optionalFloat>

leisurehound · 2020-05-30T20:21:27Z

For this simple problem then yes, that's doable. I'm basically working on a dummy project to learn some aspects of SwiftUI plotting, Combine and Coding CSV values.

The real project I plan to use all this on, however, is managing an assortment time series that are sampled at various frequencies. So column positions are important to determine if there is a datum at any particular sample time. Time series CSV is generally columnar data and in the exports I'll have to process I won't have control of the collapsing of columns in particular rows, they'll simply be empty.

So, for the dummy project yes its easy to adapt. But the dummy project has a pattern I've chosen specifically as its the construct I'll have to support when I get past learning/prototyping (I'm actually a believer in a different second system effect, the first time I don't know what I'm doing and it ends up ugly, unmaintainable, and often inefficient. When I do it for the second time I have the knowledge to do it right and am not left with refactoring artifacts I wish weren't there...so thats why I'm constraining this work a bit more than necessary...)

dehesa · 2020-05-30T20:39:27Z

I fixed a small behavior on decodeNil, which is specified in the documentation, but I didn't account for; i.e. if nil is encountered, then +1, must be added to the index; but if not, it should not. With this fix, now decodeIfPresent should work as intended.

For your specific problem, you actually have several solutions (be sure to download the latest from develop branch):

Changing row definition to <date>,<floatOrNil>. Similar to this:

let data = """
    <dateA>,1.23
    <dateB>,
    <dateC>,4.56
    """.data(using: .utf8)!
let decoder = CSVDecoder { $0.headerStrategy = .none }
let structs = try decoder.decode([CustomStruct].self, from: data)

Easiest.

struct CustomStruct: Decodable {
    let date: Date
    let response: Double?
    
    private enum CodingKeys: Int, CodingKey {
        case date = 0
        case response = 1
    }
}

Keeping the enum.

struct CustomStruct: Decodable {
    let date: Date
    let response: Response

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        self.num = try container.decode(Date.self)
        self.response = try container.decode(Response.self)
    }
}

enum Response: Decodable {
    case time(duration: Double)
    case failure

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        self = container.decodeNil() ? .failure
                                     : .time(duration: try container.decode(Double.self))
    }
}

Keeping row definition (<datestring>,<float>, or <datestring>,<empty>,failure)

struct CustomStruct: Decodable {
    let date: Date
    let response: Response

    init(from decoder: Decoder) throws {
        var container = try decoder.unkeyedContainer()
        self.num = try container.decode(Date.self)
        if let duration = try container.decodeIfPresent(Double.self) {
            self.response = .time(duration: duration)
        } else {
            self.response = .failure
        }
    }
}

dehesa · 2020-05-30T20:41:47Z

Let me know if that works for you and then I will upstream the patch to master. Again, thank you for reporting.

leisurehound · 2020-05-31T00:28:43Z

Yes, I can decodeIfPresent() successfully now.

One last question, I encodeRow() one result at a time (because they're a live time series) thus am using lazy. After each encoding I endEncoding() and then write the data with outputStream.writeBytes(). I've learned that I have to re-create a new encoder for the next datum after I've endEncoding() on the last live time series datum. Is there a manner in which I can reuse an encoder or is creating a new encoder cheap enough on every live datum that is collected?

dehesa · 2020-05-31T06:51:56Z

That is not ideal. Creating a new encoder is not "very expensive", but you are creating and deallocating OutputStreams for each row (which is not the intended usage).

Why don't you just store the lazy encoder somewhere (it is a reference type) and reuse when a new row arrives?

// 1. Create lazy encoder and store it somewhere.
self.cacheEncoder = CSVEncoder {
    $0.bufferingStrategy = .sequential
}.lazy(into: /* Whatever you need */)

// 2. Then, every time a new row arrives.
try self.cacheEncoder.encodeRow(customStructure)

// 3. Once you stop the live subscription, call
try self.cacheEncoder.endEncoding()
// there is also a defined deinit, which will `endEncoding()` for you (if you haven't called it) when the reference is deallocated.

leisurehound · 2020-05-31T07:21:54Z

I did try to store the encoder as a class property, but endEncoding on each datum so the file is tail-able, for example. This caused the next datum encoding to fail. For a long running process with a live feed, the data persistence is paramount, so I keep my output stream tied to a file open and continuously append to it, and then close the output stream on deinit, which could be never (or at server reboot, or when files roll). Its somewhat akin to using CSV encoding for a logger, I guess, which may be a bit corrupt. But data loggers do that by nature, and CSV is a natural export format for columnar data of time series data.

Would an option to flush the encoder instead of end it be viable here? The other option is for the client code do so on a Timer, letting the buffer grow and end encoding then create a new encoder again.

I think this may simply be a case where encoding is not worth the effort beyond:

let data = "\(dateStr),\(duratitonStr),\(failureStr)".data(using: .utf8)

dehesa · 2020-05-31T08:49:45Z

If I understood everything correctly, I believe you have three options to satisfy your requirements.

Create a lazy encoder with .sequential buffering strategy.

Store the lazy encoder and call encodeRow(customType). The sequential buffering strategy only keeps in memory the last row to be written. Therefore, when a new row arrives, it writes the previous row. That wouldn't do what you want, but if the type being encoded uses a unkeyedContainer, the fields are written sequential and the encoder is actually writing in the file the fields as they arrive (and doesn't keep it in memory).

So to sum up, you would use a lazy encoder, call encodeRow every time a new row arrives (do not call endEncoding afterwards), and ensures that you are using unkeyedContainers and you are encoding all fields in the row (including empty fields). Then it will have the behavior you want.
Use the imperative CSVWriter.

This writer do exactly what you tell it to do. Thus, if you call in the writer's write(row:) it actually writes that row in the file including the row delimiter.
Just append data to the file as you were suggesting.

The upside of using an encoder or writer, is that they have an easier-to-understand API, they will handle any edge case (such as CSV escaping, etc.) and they free you from muddling with file handles. Still, sometimes using low-level APIs might be the solution if your data format never changes.

leisurehound · 2020-05-31T16:49:02Z

ok, thanks the details and the help here. Those details were not gleamed from your readme page when I first attempted to integrate the library. I may send you a PR for the README to expand on usage so its easier for first time users who may be new to CSV coding; what's there now seems to have some unstated assumptions/knowledge that I certainly didn't have. I'll admit, I had to trap errors to figure out what I needed to do or was doing wrong, which is not an efficient means and even still wasn't/am not really using it correctly without what you thus shared here.

dehesa · 2020-05-31T20:22:03Z

I agree there might be some assumptions in the README about how to use the library. I have spent so much time on it that some knowledge is second nature to me. I would be delighted if you send PRs my way; having a different view on how to express things is always valuable.

I will close this issue now, but feel free to open new ones if you have any more questions or you would like to improve something. If you decide to get involved in the project, the following things help:

Check the features board (some of those TODO tickets reference README enhancements).
Star the project.
Report more bug reports or confusing behavior.

leisurehound added the question Further information is requested label May 30, 2020

leisurehound assigned dehesa May 30, 2020

dehesa added bug Something isn't working and removed question Further information is requested labels May 30, 2020

dehesa added this to Needs triage in Bugs via automation May 30, 2020

dehesa moved this from Needs triage to High priority in Bugs May 30, 2020

dehesa added a commit that referenced this issue May 30, 2020

Fix floating-point number handling #20

25b0aa1

dehesa added a commit that referenced this issue May 30, 2020

Fix floating-point number handling #20

1d64a52

dehesa added a commit that referenced this issue May 30, 2020

Fix floating-point number handling #20

24f57c2

dehesa added a commit that referenced this issue May 30, 2020

Fix decodeNil behavior #20

192461d

dehesa closed this as completed May 31, 2020

Bugs automation moved this from High priority to Closed May 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is encoding Double or Float supported? #20

Is encoding Double or Float supported? #20

leisurehound commented May 30, 2020

dehesa commented May 30, 2020

dehesa commented May 30, 2020

dehesa commented May 30, 2020

leisurehound commented May 30, 2020

dehesa commented May 30, 2020

leisurehound commented May 30, 2020 •

edited

Loading

dehesa commented May 30, 2020 •

edited

Loading

leisurehound commented May 30, 2020 •

edited

Loading

dehesa commented May 30, 2020 •

edited

Loading

dehesa commented May 30, 2020

leisurehound commented May 31, 2020 •

edited

Loading

dehesa commented May 31, 2020

leisurehound commented May 31, 2020 •

edited

Loading

dehesa commented May 31, 2020

leisurehound commented May 31, 2020

dehesa commented May 31, 2020

Is encoding Double or Float supported? #20

Is encoding Double or Float supported? #20

Comments

leisurehound commented May 30, 2020

dehesa commented May 30, 2020

dehesa commented May 30, 2020

dehesa commented May 30, 2020

leisurehound commented May 30, 2020

dehesa commented May 30, 2020

leisurehound commented May 30, 2020 • edited Loading

dehesa commented May 30, 2020 • edited Loading

leisurehound commented May 30, 2020 • edited Loading

dehesa commented May 30, 2020 • edited Loading

dehesa commented May 30, 2020

leisurehound commented May 31, 2020 • edited Loading

dehesa commented May 31, 2020

leisurehound commented May 31, 2020 • edited Loading

dehesa commented May 31, 2020

leisurehound commented May 31, 2020

dehesa commented May 31, 2020

leisurehound commented May 30, 2020 •

edited

Loading

dehesa commented May 30, 2020 •

edited

Loading

leisurehound commented May 30, 2020 •

edited

Loading

dehesa commented May 30, 2020 •

edited

Loading

leisurehound commented May 31, 2020 •

edited

Loading

leisurehound commented May 31, 2020 •

edited

Loading