
Improve performance of date/time conversions #26

Closed

kkieffer opened this issue Jun 9, 2020 · 16 comments

Labels
bug Something isn't working

kkieffer commented Jun 9, 2020

As my database grows, I've noticed performance becoming an issue compared with other database client implementations. I'm not sure whether this is my code in particular or implementation differences.

Consider I have a table with 7 or so columns and about 6000 rows. The columns contain simple data: last name string, some integers, a timestamp, a few other strings. I want to retrieve the entire table - all rows and columns (so a simple SELECT statement). As I read each row from the query, the columns are parsed into their native types and stored in a list of objects.

I have a Java client using JDBC running on a laptop, and this operation takes about 300 milliseconds. Roughly 250 KB are retrieved.

An iPhone and an iPad on the same Wi-Fi network running PostgresClientKit take about 8 seconds.

Some thoughts:

- I don't think this is a network speed issue.
- The laptop is faster, of course, but it's hard to believe it's that much faster.
- Both connections use SSL, so this could possibly be a performance issue in BlueSSLService.
- Could the JDBC client be pre-fetching all rows before the cursor increments to them? Is there efficiency to be gained by trying to fetch all rows at once?
- Could there be an underlying socket performance issue with blocking/non-blocking reads?

Any thoughts appreciated.

pitfield added a commit that referenced this issue Jun 10, 2020
@pitfield (Member)

To compare PostgresClientKit performance between macOS and iOS, I created two repositories:

https://github.com/pitfield/PostgresClientKit-Performance-CommandLine
https://github.com/pitfield/PostgresClientKit-Performance-iOS

The first is a macOS command line executable; the second is an iOS app. In each case, they create a simple table, INSERT 1000 rows into it, and then SELECT those 1000 rows, converting the columns to native types and accumulating the rows in an array.
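
For reference, here's a rough sketch of the timed SELECT portion of such a benchmark. This is not the exact code in those repositories; the table and column names, and the use of Date() for timing, are illustrative.

    import Foundation
    import PostgresClientKit

    // Rough sketch only: times a SELECT of every row, converting each column
    // to a native Swift type and accumulating the results in an array.
    func timeSelect(connection: Connection) throws -> TimeInterval {
        let start = Date()

        let statement = try connection.prepareStatement(
            text: "SELECT id, name, score, created_at FROM perf_test;")
        defer { statement.close() }

        let cursor = try statement.execute()
        defer { cursor.close() }

        var records: [(Int, String, Int, Date)] = []

        for row in cursor {
            let columns = try row.get().columns
            records.append((try columns[0].int(),
                            try columns[1].string(),
                            try columns[2].int(),
                            try columns[3].timestamp().date(in: TimeZone.current)))
        }

        print("Selected \(records.count) rows")
        return Date().timeIntervalSince(start)
    }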

I'd be interested to see results from your environment. You'll need to tweak the ConnectionConfiguration properties in main.swift and ViewController.swift. The apps log the timing info to stdout, and you can see it in the Xcode console.

In my environment, here are the times to SELECT and process the 1000 rows (3 runs of each configuration):

macOS           ssl=false   164ms 162ms 163ms  (2017 iMac)
                ssl=true    164ms 166ms 165ms
  
iOS simulator   ssl=false   155ms 153ms 155ms  (iPad 7th gen simulator)
                ssl=true    164ms 161ms 162ms
          
iOS device      ssl=false   348ms 347ms 358ms  (iPhone 6s)
                ssl=true    309ms 340ms 333ms

So it takes about twice as long on the iOS device as compared to the macOS command-line executable or iOS simulator. That seems about right to me. The iOS device needs to shove bytes over the wire; the other two do not. In each case, turning SSL on has little performance impact.

(When you run the apps, you'll also see how long it takes to INSERT the 1000 rows. For that measurement, the iOS device is much slower than the macOS command-line and iOS simulator. That too makes sense -- there are a lot more network round trips to INSERT 1000 rows, one at a time, than to SELECT 1000 rows at once.)

If, in your environment, you see a big difference with SSL turned on, there's one more thing you can try. There is a known issue in BlueSSLService that required a workaround in PostgresClientKit. The workaround involves a spin-wait when the BlueSSLService problem is detected. I've only seen this problem arise infrequently, and even then, only briefly...but there is the possibility it is causing a delay in your environment. I added some diagnostic logging to the pitfield/issue-26-diagnostics branch. You could try picking up the branch. If the BlueSSLService problem occurs, you'll see log entries containing "Issue #26: slept for 10 ms after socket read returned 0". Each such entry represents a 10ms delay.

I don't know if the JDBC client driver retrieves all rows before returning the ResultSet to the caller. Even if it did, the driver would see exactly the same bytes at the socket level. The Postgres network protocol doesn't provide any optimizations for clients that want all the rows at once; it always just writes one row after another. Still, it would be interesting to compare the macOS performance of JDBC vs PostgresClientKit (both with the server on the same host, and with the server on the network).

Hope this helps. Please let me know what you find.

@kkieffer (Author)

I think I've found the cause, and it's not socket related. Almost all of the time consumed is parsing a timestamp into a Date object:

    response = try request.query(connection: connection)
    for row in response {
        let columns = try row.get().columns
        // Other decodes (strings, booleans)
        mydate = try columns[7].timestamp().date(in: TimeZone.current)
    }

The other columns decode quickly, but the date does not. Perhaps try putting dates in your example and see if you get a similar performance drop.

I'll have a look at the implementation. If it can't be made faster perhaps the decoding could be pushed onto dispatch queues.

helje5 commented Jun 10, 2020

It's probably using a DateFormatter; they can be extremely slow, and it often makes sense to drop down to C-level APIs.

@kkieffer (Author)

It appears to be using that, yes.

pitfield (Member) commented Jun 11, 2020

Yes, I can certainly believe that date/time conversions are slow. This is one of the reasons that PostgresClientKit doesn't immediately return column values as native types. The caller needs to ask for them.

In PostgresClientKit, these conversions are handled by DateFormatter, DateComponents, and Calendar. I believe that those types are ultimately implemented in C.

In your scenario, what is the TIMESTAMP column modeling? If it's a "point in time" (for example, created or lastModified), you might want to consider using TIMESTAMP WITH TIME ZONE instead (and PostgresTimestampWithTimeZone in Swift). TIMESTAMP records a year, month, day, hours, minutes, and seconds, but not the time zone in which to interpret those components. Consequently, everywhere a TIMESTAMP value is used, the time zone needs to be specified. In my experience, this makes date/time handling susceptible to bugs. For example, one time zone is specified in a database stored procedure, but a different one is selected in a client app.

TIMESTAMP WITH TIME ZONE values, on the other hand, explicitly specify the time zone. So there's no guessing. The Postgres server automatically normalizes these values to UTC/GMT for storage, which is handy. TIMESTAMP WITH TIME ZONE values represent "points in time", just like Date instances do in Swift, so converting back and forth is more natural. It's also quite a bit faster!
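
To make the difference concrete, here's a small sketch of the two call sites (the helper names are illustrative; the API calls are the ones used elsewhere in this thread):

    import Foundation
    import PostgresClientKit

    // If the column stays TIMESTAMP, the caller has to say which time zone the
    // stored year/month/day/hour/minute/second components should be read in:
    func decodeTimestamp(_ value: PostgresValue) throws -> Date {
        try value.timestamp().date(in: TimeZone.current)
    }

    // If the column becomes TIMESTAMP WITH TIME ZONE, the value already names a
    // point in time, so no time zone argument is needed (and, per the numbers
    // below, the conversion is faster):
    func decodeTimestampWithTimeZone(_ value: PostgresValue) throws -> Date {
        try value.timestampWithTimeZone().date
    }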

I did some benchmarking on my iPhone 6s, looking at how long it takes to convert from PostgresValue (which just wraps a string) to PostgresTimestamp/PostgresTimestampWithTimeZone, and then on to Date.

------------------------------------------------------  ------
Conversion from --> to                                  Time
------------------------------------------------------  ------
PostgresValue -> PostgresTimestamp                      205 us
PostgresValue -> PostgresTimestamp -> Date              294 us
PostgresValue -> PostgresTimestampWithTimeZone          109 us
PostgresValue -> PostgresTimestampWithTimeZone -> Date  112 us
------------------------------------------------------  ------

So 6000 PostgresValue -> PostgresTimestamp -> Date conversions would take 1.76 seconds, whereas PostgresValue -> PostgresTimestampWithTimeZone -> Date would take 0.67 seconds.

Another thought: Do you need to immediately convert all 6000 values to Dates? Could this be done lazily? For example, if you're showing them in a UITableView, how about deferring it until the table view asks for a cell?
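
Here's a sketch of that lazy approach, assuming the rows are kept around as PostgresClientKit Row values (the view controller, cell identifier, and column index are illustrative):

    import UIKit
    import PostgresClientKit

    final class RecordsViewController: UITableViewController {

        var rows: [Row] = [] // accumulated from the SELECT, without converting dates

        override func tableView(_ tableView: UITableView,
                                numberOfRowsInSection section: Int) -> Int {
            rows.count
        }

        override func tableView(_ tableView: UITableView,
                                cellForRowAt indexPath: IndexPath) -> UITableViewCell {
            let cell = tableView.dequeueReusableCell(withIdentifier: "RecordCell",
                                                     for: indexPath)
            let columns = rows[indexPath.row].columns

            // Convert only the timestamp that is about to be displayed; rows that
            // never scroll into view are never parsed.
            if let date = try? columns[7].timestamp().date(in: TimeZone.current) {
                cell.textLabel?.text = "\(date)"
            }

            return cell
        }
    }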

[edited to correct type names in table]

helje5 commented Jun 11, 2020

So 6000 PostgresValue -> PostgresTimestamp -> Date conversions would take 1.76 seconds

Yeah, that is unacceptably slow. Avoid using DateFormatter at all costs when parsing dates in a protocol context. Yes, they also end up calling C code in ICU, but in super expensive ways (even for processing tiny amounts of data, like a few thousand values).

@kkieffer (Author)

I tried two different approaches to speeding this up. One was to fetch all the rows first, then dispatch each row for parsing on a concurrent dispatch queue. Despite multiple threads working at once, there wasn't much improvement (possibly due to the mutexes needed to protect the collection the results were added to).

However, this extension to PostgresValue was a marked improvement: from 8 seconds down to 3 seconds. Note, however, that this approach isn't thread-safe.

helje5 commented Jun 11, 2020

In your extension, as soon as you touch Calendar/Locale and the like, everything is going to slow down.
Also, all those allocs are horrible; you should be able to just & the values?
To convert to the local time zone, use mktime, localtime, and friends (you don't seem to parse a time zone name?), then feed the resulting time_t into Date(timeIntervalSince1970:).
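
In case it's useful, a minimal sketch of that C-level path, assuming a "yyyy-MM-dd HH:mm:ss" prefix, the device's local time zone, and no fractional-second handling (parseLocalTimestamp is just an illustrative name):

    import Foundation
    import Darwin

    func parseLocalTimestamp(_ value: String) -> Date? {
        var components = tm()

        // strptime fills the tm struct from the fixed-layout prefix of the string.
        let parsed = value.withCString {
            strptime($0, "%Y-%m-%d %H:%M:%S", &components) != nil
        }
        guard parsed else { return nil }

        // Let mktime work out whether daylight saving time applies.
        components.tm_isdst = -1

        // mktime interprets the components in the local time zone and returns
        // seconds since the Unix epoch, which maps directly onto Date.
        let seconds = mktime(&components)
        return seconds == -1 ? nil : Date(timeIntervalSince1970: TimeInterval(seconds))
    }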

@kkieffer (Author)

The allocs are static; they're not reallocated each time. Hence the non-reentrancy. The author of the code where I found this approach was down on mktime. I tried it, and it was slightly slower than Calendar/Locale.

I guess this is the best I can do; it's still a big improvement over the DateFormatter.

pitfield (Member) commented Jun 12, 2020

Thanks for looking into this. In this extension, I tried yet another approach. I also made a few tweaks to the vsscan approach. For an optimized build (Swift Compiler - Code Generation Optimization Level "-O"), the performance of both approaches is nearly identical, a shade under 40 us. (For reference, we started at 294 us.) I'd be interested to learn how these perform in your scenario. Thanks!

@pitfield (Member)

(Just fixed a bug in that extension to handle the "no fractional seconds" case.)

@kkieffer (Author)

@pitfield thanks for that extension. Great idea to use a static Calendar instead of allocating each time! That saved more processing time.

So I went from 8 seconds without the extension, to 3 seconds with my first extension, and adding the static Calendar gets me down to 1.7 seconds. Not bad!
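
For anyone landing here later, a rough sketch of the static-Calendar idea (illustrative only, not the exact extension used here); it shares only the immutable Calendar and builds a fresh DateComponents per call:

    import Foundation

    // Shared, immutable Calendar; creating one of these per conversion is a big
    // part of what makes the naive approach slow.
    private let gregorian = Calendar(identifier: .gregorian)

    // Parses a fixed-layout "yyyy-MM-dd HH:mm:ss" prefix (fractional seconds and
    // input validation are omitted for brevity).
    func fastDate(fromTimestamp value: String, in timeZone: TimeZone) -> Date? {
        let bytes = Array(value.utf8)
        guard bytes.count >= 19 else { return nil }

        func digits(_ range: Range<Int>) -> Int {
            bytes[range].reduce(0) { $0 * 10 + Int($1 - UInt8(ascii: "0")) }
        }

        var components = DateComponents()
        components.year   = digits(0..<4)
        components.month  = digits(5..<7)
        components.day    = digits(8..<10)
        components.hour   = digits(11..<13)
        components.minute = digits(14..<16)
        components.second = digits(17..<19)
        components.timeZone = timeZone // interpret the components in this zone

        return gregorian.date(from: components)
    }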

With either approach, it's either not thread-safe or you need a semaphore to protect the statics. C'est la vie.

In my case, this is working much better. Thanks @helje5 and @pitfield for your input!

If I could re-architect the database, I think I'd store the time as a double instead and avoid all these string-parsing headaches.
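
For completeness, that alternative would reduce the Swift side to something like this (epochDate is an illustrative name, assuming a DOUBLE PRECISION column holding seconds since the Unix epoch):

    import Foundation
    import PostgresClientKit

    // No string parsing at all: the column value is already a number of seconds.
    func epochDate(from value: PostgresValue) throws -> Date {
        Date(timeIntervalSince1970: try value.double())
    }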

@pitfield (Member)

OK, thanks for trying that out. I'm glad we found a more performant approach. I will incorporate this into the next release. In doing so, I want to revisit how PostgresClientKit does date/time conversions in general. I don't have time to do that right now, though, so I suggest you use your extension for the moment.

pitfield added the bug label Jun 16, 2020
pitfield changed the title from "Performance issues" to "Improve performance of date/time conversions" Jun 16, 2020
@pitfield (Member)

This thread identified that the Foundation classes for date/time conversion, such as DateFormatter, are slow. Several more performant approaches were found.

This bug, then, is to revisit how date/time conversions are implemented throughout PostgresClientKit, and incorporate more performant approaches where applicable.

@pitfield (Member)

Finally got this done. Here are the benchmark results:

-----------------------------------------------------------------------------------
String to Date conversion performed                 Before     After    Improvement
-----------------------------------------------------------------------------------
PostgresValue(_:).timestampWithTimeZone().date       67 us     11 us     6x faster
PostgresValue(_:).timestamp().date(in:)             171 us     13 us    13x faster
PostgresValue(_:).date().date(in:)                  159 us     13 us    12x faster
PostgresValue(_:).time().date(in:)                  166 us     16 us    10x faster
PostgresValue(_:).timeWithTimeZone().date           290 us     13 us    22x faster
-----------------------------------------------------------------------------------

@pitfield (Member)

Released in v1.3.2.
