Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Massive changes to improve encode/decode speed #19

Merged
merged 39 commits into from May 26, 2016

Conversation

Sega-Zero
Copy link
Contributor

Related to #16. This pull request breaks the backward compatibility, so there should be created a v2.0.0 tag.

What have been done:

  1. Refactor encoder, use an indirect enum with associated values as a backing storage for data being encoded
  2. Remove all string-related code from both CerealEncoder and CerealDecoder and refactor all the code to use a new enum
  3. To improve the speed, internal dictionaries are replaced with arrays of tuples. This adds a bit more data into a result NSData if there is a value replacements during an encoding, but a decoder is guaranteed to decode only the last value.
  4. Introduced a new logic layer, CerealSerialization. It's function is to serialize/deserialize CoderTreeValue enum to/from NSData object. Right now, to achieve a better speed, it works with a raw-byte TLV structured data, but may be extended to any kind of data in the future: xml, json, or maybe even old string format to introduce a backward compatibility with 1.x versions.
  5. All the tests are rewritten to use a new TLV byte-arrays

Known issues:
Since the length of Int,Float and Double differs on x32 (Int==Int32) and x64 (Int==Int64) platforms, the result is incompatible between platforms.
IMHO, there is no need in supporting this, all the new Apple devices are x64, the x32 devices will no longer be supported in a couple of years. Those who need a complete compatibility should use the 1.x version. The tests are written for an x64 devices.

@@ -52,11 +87,12 @@ public struct CerealDecoder {
- returns: The instantiated object, or nil if no object was at the specified key.
*/
public func decode<DecodedType: CerealRepresentable>(key: String) throws -> DecodedType? {
guard let data = items[key] else {
guard let data = self.itemForKey(key) else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is self needed here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or any of the other newly altered decode methods? I prefer not to have self if its not necessary)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an old habit :) Will change it in a moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

private extension CoderTreeValue {
func numberOfStringEntries() -> Int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could just be computed property

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will be semantically correct in that case. Computed vars expected to be a lightweight operation, which may not be true here, since we calculate this value recursively on a possibly large tree. How do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I think of properties I don't generally think of complexity. Its more about wording and interaction. I only use a function if:

  • It must be able to throw
  • It requires arguments
  • It has no return value

In this case we're asking how many string entries this type has, and I think that works well as a property, similar to count on an Array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I got your point. Will change it in a moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@ketzusaka
Copy link
Contributor

The test changes are really great. How did you go about getting the values for those?

@Sega-Zero
Copy link
Contributor Author

That was very challenging. I set the breakpoint on each test and then used one of this helper functions:

func printBytes(array: [UInt8]) {
    let str = array.reduce("[") { $0.0 + String($0.1) + "," }
    print(str.substringToIndex(str.endIndex.predecessor()) + "]")
}

func encode(value: (inout CerealEncoder) throws -> ()) {
    var encoder = CerealEncoder()
    let _ = try? value(&encoder)
    printBytes(encoder.toBytes())
}

And then print the result bytes, copy it and paste inside test body.

For encoder dictionary tests I wrote one more function:

func printXCT(result: [UInt8]) {
    let prefix = result[0..<47].reduce("") { $0.0 + String($0.1) + "," }
    let divider = (result.count - 47)

    let leftFrom = 47
    let leftUntil = leftFrom + divider / 2

    let rightFrom = leftUntil
    let rightUntil = rightFrom + divider / 2

    let left = result[leftFrom..<leftUntil].reduce("") { $0.0 + String($0.1) + "," }
    let right = result[rightFrom..<rightUntil].reduce("") { $0.0 + String($0.1) + "," }
    print("XCTAssertTrue(result.hasArrayPrefix([\(prefix.substringToIndex(prefix.endIndex.predecessor()))]))\nXCTAssertTrue(result.containsSubArray([\(left.substringToIndex(left.endIndex.predecessor()))]))\nXCTAssertTrue(result.containsSubArray([\(right.substringToIndex(right.endIndex.predecessor()))]))")
}

Since all the tests there was using the same key wat, I cut first header bytes, then a key string bytes (that's 47 bytes) and then split the rest by half. I added a few line breaks for an arrays that was too long or to separate dictionary subarray for more readability.

The hardest part was a decoding tests. Each test was prepared manually by setting a breakpoint and writing in console expressions like this:

po self.encode { try $0.encode([MyBar(bar: "baz"):[1.0,2.0] as [Double]], forKey: "hi") }

There was a few tests where I couldn't do this (like the error ones), so I gathered a whole byte array by pieces and then debug it accurately %)

@Sega-Zero
Copy link
Contributor Author

Are there any other stuff that should be fixed before merging this PR? :)

@ketzusaka
Copy link
Contributor

Nope, I think this is good to merge. I'm going to do some testing with my projects using this tomorrow to make sure it jives well and if all is well ill tag and release :)

@ketzusaka ketzusaka merged commit 4655980 into Weebly:master May 26, 2016
@Sega-Zero
Copy link
Contributor Author

Awesome! I'll finally switch my projects Podfile to upstream =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants