This repository has been archived by the owner on May 5, 2019. It is now read-only.

Specialize row_group_slots() and findrow() on column types to improve performance #79

Merged 1 commit on Aug 7, 2017

Commits on Aug 7, 2017

  1. Specialize row_group_slots() and findrow() on column types to improve performance
    
    Looping over columns is very slow when their type is unknown at compile time.
    Specialize the method on the types of the key (grouping) columns by passing
    a tuple of columns rather than a DataTable. This forces the compiler to generate
    a specialized method for each combination of key column types, but the number of
    combinations should remain relatively low, and the one-time compilation cost is worth it.
    
    This dramatically improves performance of groupby(), but does not have a large
    effect on join() since it is very inefficient in other areas.
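
The specialization described above can be illustrated with a minimal sketch. It is not the code from this commit: `ToyTable`, `hashrows_untyped`, `hashrows_typed`, and `hashrows_cols!` are hypothetical names invented for illustration, and the sketch assumes Julia 1.x. It only shows the general idea of passing a tuple of columns so that the inner loop is compiled for concrete column types.

```julia
# Hypothetical stand-in for a table whose column types are hidden from the compiler.
struct ToyTable
    columns::Vector{Any}   # element type Any erases each column's concrete type
end

# Untyped path: `col` is inferred as Any, so every `col[i]` and `hash` call
# inside the loop is dispatched at run time.
function hashrows_untyped(t::ToyTable, nrows::Int)
    hashes = zeros(UInt, nrows)
    for col in t.columns
        for i in 1:nrows
            hashes[i] = hash(col[i], hashes[i])
        end
    end
    return hashes
end

# Typed path: the tuple type records the concrete type of every column, so
# Julia compiles one method instance per combination of column types.
function hashrows_typed(cols::Tuple{Vararg{AbstractVector}}, nrows::Int)
    hashes = zeros(UInt, nrows)
    return hashrows_cols!(hashes, cols)
end

# Recurse over the tuple so each column is processed with its concrete type.
hashrows_cols!(hashes, ::Tuple{}) = hashes
function hashrows_cols!(hashes, cols::Tuple)
    col = first(cols)
    for i in eachindex(hashes)
        hashes[i] = hash(col[i], hashes[i])
    end
    return hashrows_cols!(hashes, Base.tail(cols))
end

t = ToyTable(Any[[1, 2, 1], ["a", "b", "a"]])
keycols = Tuple(t.columns)          # e.g. Tuple{Vector{Int64}, Vector{String}}
h_untyped = hashrows_untyped(t, 3)
h_typed = hashrows_typed(keycols, 3)
```

Both functions compute the same per-row hashes; only the typed variant lets the compiler see the column types inside the loop. Building `keycols` is itself a dynamic step, but it happens once per call rather than once per element.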
    
    Also add a return type assertion for rowhash(). The fact that the types of
    the columns aren't known at compile time appears to confuse inference,
    which isn't able to detect that this function always returns UInt.
    This greatly reduces the number of allocations when calling join(),
    but doesn't really change performance.
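
As a rough illustration of a return type assertion, the sketch below declares the return type on a function whose body works on untyped columns. `toy_rowhash` is a hypothetical helper, not the actual `rowhash()` from this commit, and the sketch assumes Julia 1.x.

```julia
# Hypothetical row-hash helper. The `::UInt` after the signature is a return
# type declaration: the result is converted to UInt and type-asserted, so
# callers can rely on a UInt even when the body cannot be fully inferred.
function toy_rowhash(cols::Vector{Any}, r::Int, h::UInt = zero(UInt))::UInt
    for col in cols
        # `col` is untyped here, so the compiler cannot resolve this call statically.
        h = hash(col[r], h)
    end
    return h
end

cols = Any[[1, 2], ["a", "b"]]
h1 = toy_rowhash(cols, 1)
```

Without the declaration, a caller storing the result may have to box a value of unknown type, which is one plausible source of the extra allocations mentioned above.
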
    nalimilan committed Aug 7, 2017
    9e327e1