Massive changes to improve encode/decode speed #19

Sega-Zero · 2016-05-09T00:58:37Z

Related to #16. This pull request breaks backward compatibility, so a v2.0.0 tag should be created.

What has been done:

Refactor the encoder to use an indirect enum with associated values as the backing storage for data being encoded
Remove all string-related code from both CerealEncoder and CerealDecoder and refactor all the code to use the new enum
To improve speed, internal dictionaries are replaced with arrays of tuples. This adds a bit more data to the resulting NSData if a value is replaced during encoding, but the decoder is guaranteed to decode only the last value.
Introduced a new logic layer, CerealSerialization. Its function is to serialize/deserialize the CoderTreeValue enum to/from an NSData object. Right now, for better speed, it works with raw-byte TLV-structured data, but it may be extended to any kind of data in the future: XML, JSON, or maybe even the old string format to provide backward compatibility with 1.x versions.
All the tests are rewritten to use the new TLV byte arrays
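To make the first and fourth items above more concrete, here is a minimal sketch of an indirect enum serialized as type-length-value bytes. It is written in current Swift syntax rather than the Swift 2 of this PR. `TreeValue`, the tag numbers, and the 4-byte big-endian length are all assumptions made for illustration; they are not Cereal's actual `CoderTreeValue` cases or wire format.

```swift
// Hypothetical, simplified stand-in for the PR's CoderTreeValue.
indirect enum TreeValue {
    case string(String)
    case bool(Bool)
    case array([TreeValue])
    case pair(key: TreeValue, value: TreeValue)
}

// Made-up one-byte type tags, used by this sketch only.
func tag(of value: TreeValue) -> UInt8 {
    switch value {
    case .string: return 1
    case .bool: return 2
    case .array: return 3
    case .pair: return 4
    }
}

// Serializes a value as Type (1 byte), Length (4 bytes, big-endian), Value.
func serialize(_ value: TreeValue) -> [UInt8] {
    let payload: [UInt8]
    switch value {
    case .string(let s):
        payload = Array(s.utf8)
    case .bool(let b):
        payload = [b ? 1 : 0]
    case .array(let items):
        // Nested values are serialized recursively and concatenated.
        payload = items.flatMap(serialize)
    case .pair(let k, let v):
        payload = serialize(k) + serialize(v)
    }
    let n = UInt32(payload.count)
    let lengthBytes: [UInt8] = [
        UInt8(truncatingIfNeeded: n >> 24),
        UInt8(truncatingIfNeeded: n >> 16),
        UInt8(truncatingIfNeeded: n >> 8),
        UInt8(truncatingIfNeeded: n),
    ]
    return [tag(of: value)] + lengthBytes + payload
}
```

Because every entry carries its own length, a reader can skip over entries it does not need without parsing their contents.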

Known issues:
Since the size of Int, Float and Double differs between 32-bit (Int == Int32) and 64-bit (Int == Int64) platforms, the result is incompatible between platforms.
IMHO, there is no need to support this: all new Apple devices are 64-bit, and 32-bit devices will no longer be supported in a couple of years. Those who need complete compatibility should use the 1.x version. The tests are written for 64-bit devices.
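For readers who want to see the incompatibility, the sketch below focuses on Int, the case the comment spells out. It first prints the native Int size, then shows one possible way to make the byte count independent of the platform. This is not what the PR does; `portableBytes(of:)` and `portableInt(from:)` are hypothetical helpers, and the 8-byte little-endian layout is an assumption of this example.

```swift
// 8 on 64-bit platforms, 4 on 32-bit platforms.
print(MemoryLayout<Int>.size)

// Hypothetical helper: always emit 8 little-endian bytes,
// whatever the native width of Int is.
func portableBytes(of value: Int) -> [UInt8] {
    let fixed = Int64(value).littleEndian
    return withUnsafeBytes(of: fixed) { Array($0) }
}

// Hypothetical inverse: returns nil if the input is not 8 bytes long
// or if the stored value does not fit into the native Int.
func portableInt(from bytes: [UInt8]) -> Int? {
    guard bytes.count == 8 else { return nil }
    var raw: UInt64 = 0
    for (i, byte) in bytes.enumerated() {
        raw |= UInt64(byte) << UInt64(8 * i)
    }
    return Int(exactly: Int64(bitPattern: raw))
}
```

Writing the raw memory of an Int instead would produce 4 bytes on one platform and 8 on the other, which is the mismatch described above.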

ketzusaka · 2016-05-13T19:54:44Z

Cereal/CerealDecoder.swift

@@ -52,11 +87,12 @@ public struct CerealDecoder {
    - returns:      The instantiated object, or nil if no object was at the specified key.
    */
    public func decode<DecodedType: CerealRepresentable>(key: String) throws -> DecodedType? {
-        guard let data = items[key] else {
+        guard let data = self.itemForKey(key) else {


Is self needed here?

(or in any of the other newly altered decode methods? I prefer not to have self if it's not necessary)

Just an old habit :) Will change it in a moment.
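The hunk above does not show how `itemForKey` is implemented. Given the PR description (arrays of tuples instead of dictionaries, with only the last value decoded), a lookup of that kind could look roughly like the sketch below. `TupleStore` and its members are hypothetical names in current Swift syntax, not code from this PR.

```swift
// Hypothetical storage: an append-only array of (key, value) tuples
// used in place of a dictionary.
struct TupleStore<Value> {
    private(set) var items: [(key: String, value: Value)] = []

    // Writing never searches or hashes; a repeated key simply adds another entry.
    mutating func set(_ value: Value, forKey key: String) {
        items.append((key: key, value: value))
    }

    // Reading searches from the back, so the most recent write for a key wins.
    func item(forKey key: String) -> Value? {
        return items.last(where: { $0.key == key })?.value
    }
}
```

The trade-off matches the description: a replaced value leaves a stale entry behind (slightly more data), while writes stay cheap.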

ketzusaka · 2016-05-13T20:50:46Z

Cereal/CerealSerialization.swift

+}
+
+private extension CoderTreeValue {
+    func numberOfStringEntries() -> Int {


I think this could just be a computed property

I don't think that would be semantically correct in this case. Computed vars are expected to be lightweight operations, which may not be true here, since we calculate this value recursively on a possibly large tree. What do you think?

When I think of properties I don't generally think of complexity. It's more about wording and interaction. I only use a function if:

It must be able to throw

It requires arguments

It has no return value

In this case we're asking how many string entries this type has, and I think that works well as a property, similar to count on an Array.

Ok, I got your point. Will change it in a moment.
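For context, this is roughly what the agreed form looks like: a computed property that still walks the whole tree. `Node` is a hypothetical simplified enum in current Swift syntax; the real `CoderTreeValue` has different cases, and which of them count as string entries is an assumption here.

```swift
// Hypothetical simplified tree type for illustration.
indirect enum Node {
    case string(String)
    case number(Int)
    case array([Node])
    case pair(Node, Node)
}

extension Node {
    // Reads like Array.count at the call site, but costs O(n) in the size
    // of the tree because it recurses into every child.
    var numberOfStringEntries: Int {
        switch self {
        case .string:
            return 1
        case .number:
            return 0
        case .array(let items):
            return items.reduce(0) { $0 + $1.numberOfStringEntries }
        case .pair(let k, let v):
            return k.numberOfStringEntries + v.numberOfStringEntries
        }
    }
}
```

Both positions in the thread are visible here: the call site reads naturally as a property, while the cost concern remains and is worth a doc comment.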

ketzusaka · 2016-05-13T21:09:42Z

The test changes are really great. How did you go about getting the values for those?

Sega-Zero · 2016-05-13T21:41:33Z

That was very challenging. I set a breakpoint in each test and then used one of these helper functions:

func printBytes(array: [UInt8]) {
    let str = array.reduce("[") { $0.0 + String($0.1) + "," }
    print(str.substringToIndex(str.endIndex.predecessor()) + "]")
}

func encode(value: (inout CerealEncoder) throws -> ()) {
    var encoder = CerealEncoder()
    let _ = try? value(&encoder)
    printBytes(encoder.toBytes())
}

Then I printed the resulting bytes, copied them, and pasted them into the test body.
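The helpers above use Swift 2 APIs (`substringToIndex`, `predecessor()`, tuple closure arguments) that no longer compile in current Swift. A possible modern equivalent of `printBytes` is sketched below; `bytesLiteral` is a hypothetical name, and it returns the string instead of printing it so that it can also be checked in a test.

```swift
// Formats bytes as an array literal that can be pasted into a test body.
func bytesLiteral(_ bytes: [UInt8]) -> String {
    return "[" + bytes.map { String($0) }.joined(separator: ",") + "]"
}

print(bytesLiteral([1, 0, 255])) // prints "[1,0,255]"
```

Joining with a separator avoids trimming a trailing comma afterwards, and an empty array yields "[]".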

For encoder dictionary tests I wrote one more function:

func printXCT(result: [UInt8]) {
    let prefix = result[0..<47].reduce("") { $0.0 + String($0.1) + "," }
    let divider = (result.count - 47)

    let leftFrom = 47
    let leftUntil = leftFrom + divider / 2

    let rightFrom = leftUntil
    let rightUntil = rightFrom + divider / 2

    let left = result[leftFrom..<leftUntil].reduce("") { $0.0 + String($0.1) + "," }
    let right = result[rightFrom..<rightUntil].reduce("") { $0.0 + String($0.1) + "," }
    print("XCTAssertTrue(result.hasArrayPrefix([\(prefix.substringToIndex(prefix.endIndex.predecessor()))]))\nXCTAssertTrue(result.containsSubArray([\(left.substringToIndex(left.endIndex.predecessor()))]))\nXCTAssertTrue(result.containsSubArray([\(right.substringToIndex(right.endIndex.predecessor()))]))")
}

Since all the tests there were using the same key wat, I cut off the first header bytes, then the key string bytes (that's 47 bytes), and then split the rest in half. I added a few line breaks for arrays that were too long, or to separate the dictionary subarrays for readability.
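The generated assertions call `hasArrayPrefix` and `containsSubArray`, whose implementations are not shown in this conversation. One plausible shape for them is sketched below in current Swift syntax; treat the bodies as assumptions. Checking the two halves with a containment test rather than one exact comparison presumably keeps the tests independent of the order in which dictionary entries are written.

```swift
extension Array where Element: Equatable {
    // True if the array begins with the given elements.
    func hasArrayPrefix(_ prefix: [Element]) -> Bool {
        return starts(with: prefix)
    }

    // True if the given elements occur contiguously anywhere in the array.
    func containsSubArray(_ sub: [Element]) -> Bool {
        guard !sub.isEmpty else { return true }
        guard sub.count <= count else { return false }
        for start in 0...(count - sub.count) {
            if self[start..<(start + sub.count)].elementsEqual(sub) {
                return true
            }
        }
        return false
    }
}
```

This is a simple quadratic scan, which is fine for byte arrays of the size used in unit tests.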

The hardest part was the decoding tests. Each test was prepared manually by setting a breakpoint and typing expressions like this into the console:

po self.encode { try $0.encode([MyBar(bar: "baz"):[1.0,2.0] as [Double]], forKey: "hi") }

There were a few tests where I couldn't do this (like the error ones), so I assembled the whole byte array piece by piece and then debugged it carefully %)

Sega-Zero · 2016-05-22T23:00:15Z

Is there anything else that should be fixed before merging this PR? :)

ketzusaka · 2016-05-26T23:42:46Z

Nope, I think this is good to merge. I'm going to do some testing with my projects using this tomorrow to make sure it jibes well, and if all is well I'll tag and release :)

Sega-Zero · 2016-05-26T23:47:25Z

Awesome! I'll finally switch my project's Podfile to upstream =)

Sega-Zero added 30 commits April 17, 2016 23:34

string replacement tlv containers

2d3d783

replace CerealTLV with more efficient data structures

b3ff04f

changes in project file

e3672ad

Add new method to CerealType that will help to reserve array/dictionary capacity

3db9b19

Refactor CerealEncoder. Get rid of strings in favour of enum with associated values

af99be4

minor encoding changes

2dd2eee

decoding CoderTreeValue from raw bytes

fc9aec3

refactor decoder to use CoderTreeValue

1ed6564

slightly faster way of getting a value

0b03d29

few comments, better naming

564df62

get rid of numberOfEntries method

2e563b2

fixed IdentifyingTree decoding

a7da1ac

fixed non-string decoding

8a548c8

incapsulated all serialization/deserialization code inside CerealSerialization for better control

175e250

throwing more appropriate errors

2454e10

CerealTypeIdentifier is not needed anymore

27fc944

few more comments in CerealSerialization

bd84a91

Build strings from slices directly without array allocations

378d51f

get rid of numberOfEntries in example structs

59abe0f

refactor CerealEncoderTests

78aebdf

refactor encoder dictionary tests

517402b

more friendly generic name

c8a0514

refactor encoder code to workaround compiler bug: https://bugs.swift.org/browse/SR-696

a8221e4

refactor encoder array tests

908942e

fix bools encoding

c6a939f

a few more code fo decoder to pass decoding tests

68e7673

change generic T to DecodedType

86302e8

fix boolean tests

dab7139

fix cereal decoding + zero length string

725387d

refactor simple decoder tests

86efeb7

Sega-Zero added 3 commits May 8, 2016 19:40

refactored decoder array tests

a059ba4

refactor decoder dictionary tests

f505d00

Add a test to ensure only last value is decoded

5b3ea18

Sega-Zero mentioned this pull request May 9, 2016

Extremely slow encoding/decoding on a large number of items #16

Closed

ketzusaka reviewed May 13, 2016

remove unnecesary self's

1f0e34d

ketzusaka reviewed May 13, 2016

Sega-Zero added 5 commits May 14, 2016 01:09

use much faster string constructor

ba25869

computed property instead of method

6adcc18

remove unused error types

e3146da

add TypeMismatch error

6fb0c6a

refactor error tests to a new TypeMismatch error

8469af5

ketzusaka merged commit 4655980 into Weebly:master May 26, 2016