This repository was archived by the owner on Feb 2, 2024. It is now read-only.

@kozlov-alexey kozlov-alexey commented Nov 11, 2020

Motivation: the old implementation of df.drop() produced very large LLVM IR for DataFrames
with hundreds of columns, because it extracted every column from the original DataFrame and then
packed them back into the internal structure (a tuple of lists of arrays). The new implementation
makes a copy of the internal DataFrame structure and just pops the dropped columns from the
affected lists, which greatly reduces IR size and compilation time.
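To make the difference concrete, here is a minimal pure-Python sketch of the two strategies. It is not the SDC code: the `Data` layout, the `ColumnLoc` mapping, and all function names are hypothetical stand-ins for the internal "tuple of lists of arrays", and plain Python lists play the role of arrays. In plain Python both variants cost about the same; the savings described above come from the compiled code no longer handling every column individually.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: one list of column arrays per dtype group.
Data = Tuple[List[list], ...]
# Hypothetical index: column name -> (group index, position in that group).
ColumnLoc = Dict[str, Tuple[int, int]]


def _relocate(column_loc: ColumnLoc, dropped: set) -> ColumnLoc:
    """Recompute positions of the surviving columns inside each group."""
    next_pos: Dict[int, int] = {}
    new_loc: ColumnLoc = {}
    for name, (group, _) in sorted(column_loc.items(), key=lambda kv: kv[1]):
        if name in dropped:
            continue
        new_loc[name] = (group, next_pos.get(group, 0))
        next_pos[group] = new_loc[name][1] + 1
    return new_loc


def drop_by_repacking(data: Data, column_loc: ColumnLoc, columns: Iterable[str]):
    """Old-style idea: touch every surviving column and pack it again."""
    dropped = set(columns)
    new_loc = _relocate(column_loc, dropped)
    new_data = tuple([] for _ in data)
    for name, (group, _) in sorted(new_loc.items(), key=lambda kv: kv[1]):
        old_group, old_pos = column_loc[name]
        new_data[group].append(data[old_group][old_pos])
    return new_data, new_loc


def drop_by_popping(data: Data, column_loc: ColumnLoc, columns: Iterable[str]):
    """New-style idea: shallow-copy the lists, then pop only dropped columns."""
    dropped = set(columns)
    new_data = tuple(list(group) for group in data)
    # Pop higher positions first so earlier pops do not shift later ones.
    for group, pos in sorted((column_loc[n] for n in dropped), reverse=True):
        new_data[group].pop(pos)
    return new_data, _relocate(column_loc, dropped)


data = ([[1, 2, 3], [4, 5, 6]], [[0.5, 1.5, 2.5]])
column_loc = {"a": (0, 0), "b": (1, 0), "c": (0, 1)}
print(drop_by_popping(data, column_loc, ["a"]))
```

Both functions return the same structure and leave the input untouched; only the amount of per-column work differs.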

Some numbers:

| n_columns | 8 | 16 | 32 | 64 | 128 | 256 | 512 |
|---|---|---|---|---|---|---|---|
| LLVM IR size, MB (master) | 0.46104 | 0.79895 | 1.475782 | 2.833022 | 5.595263 | 11.13747 | 22.23876 |
| LLVM IR size, MB (with PR #945) | 0.478319 | 0.491075 | 0.516738 | 0.568007 | 0.671354 | 0.880129 | 1.297598 |
| IR size ratio (master / PR) | 0.963874 | 1.626943 | 2.855959 | 4.987649 | 8.334293 | 12.65437 | 17.13841 |
| Compilation time, s (master) | 1.009 | 1.188911 | 2.137561 | 4.167516 | 9.775089 | 23.66213 | 68.41024 |
| Compilation time, s (with PR #945) | 0.908008 | 0.751528 | 0.992976 | 1.34102 | 2.412094 | 4.490081 | 8.600799 |
| Compilation time ratio (master / PR) | 1.111223 | 1.581993 | 2.152681 | 3.107722 | 4.052533 | 5.269869 | 7.95394 |
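
The ratio rows above are the master value divided by the PR value for each column count. A short sketch that recomputes them from the reported measurements (the variable names are only illustrative):

```python
n_columns = [8, 16, 32, 64, 128, 256, 512]

# Measurements copied from the numbers above.
ir_master = [0.46104, 0.79895, 1.475782, 2.833022, 5.595263, 11.13747, 22.23876]
ir_pr = [0.478319, 0.491075, 0.516738, 0.568007, 0.671354, 0.880129, 1.297598]
time_master = [1.009, 1.188911, 2.137561, 4.167516, 9.775089, 23.66213, 68.41024]
time_pr = [0.908008, 0.751528, 0.992976, 1.34102, 2.412094, 4.490081, 8.600799]

# Ratio "without/with": values above 1 mean the PR is smaller or faster.
ir_ratio = [m / p for m, p in zip(ir_master, ir_pr)]
time_ratio = [m / p for m, p in zip(time_master, time_pr)]

for n, ir, t in zip(n_columns, ir_ratio, time_ratio):
    print(f"{n:>4} columns: IR size x{ir:.2f}, compile time x{t:.2f}")
```

The gain grows with the number of columns, while at 8 columns the IR is slightly larger with the PR (ratio below 1).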

@kozlov-alexey kozlov-alexey changed the title Refactor df.drop to improve compile times WIP: Refactor df.drop to improve compile times Nov 11, 2020
@kozlov-alexey kozlov-alexey changed the title WIP: Refactor df.drop to improve compile times Refactor df.drop to improve compile times Nov 12, 2020
commit 45e67860d0fc7c2339250764944253d45398e88f
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date:   Sat Nov 21 00:39:53 2020 +0300

    Reverting back to use of list.initial_value, updating tests

commit 5b15f127f48e836b266c7817844978e1c28e2dbb
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date:   Tue Nov 17 01:47:49 2020 +0300

    Trying LiteralList instead (works with typeinfer fix in numba)

commit b54d365e6ae4e735e9a738b9f41fcc28cad81472
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date:   Mon Nov 16 20:00:51 2020 +0300

    Non-working case for changed list of dropped columns

commit 31768d874a645824633c2adfbe8a193870486835
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date:   Wed Nov 11 21:35:26 2020 +0300

    Make df.drop aligned with pandas and take columns as list
@AlexanderKalistratov AlexanderKalistratov merged commit d5f706f into IntelPython:master Dec 7, 2020