Refactor df.drop to improve compile times #945

kozlov-alexey · 2020-11-11T13:15:13Z

Motivation: the old implementation of df.drop() produced very large LLVM IR for DataFrames
with hundreds of columns, because it extracted every column from the original DataFrame and then
packed them into the internal structure (a tuple of lists of arrays) again. The new implementation
copies the internal DataFrame structure and pops only the dropped columns from the affected lists,
which greatly reduces IR size and compilation time.
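
To make the difference between the two strategies concrete, here is a minimal pure-Python sketch. It is not the SDC implementation: the names `drop_by_repacking`, `drop_by_popping`, and the `col_loc` mapping (column name to list index and position) are hypothetical, and plain Python lists stand in for arrays. In the compiled code the per-column work is what ends up in the generated IR, so the sketch only illustrates the data flow, not the compile-time cost itself.

```python
def drop_by_repacking(data, col_loc, to_drop):
    """Old-style idea: extract every kept column, then pack them all again."""
    new_data = tuple([] for _ in data)
    new_loc = {}
    for name, (list_id, pos) in col_loc.items():   # touches every column
        if name in to_drop:
            continue
        column = data[list_id][pos]                # extract
        new_loc[name] = (list_id, len(new_data[list_id]))
        new_data[list_id].append(column)           # pack again
    return new_data, new_loc


def drop_by_popping(data, col_loc, to_drop):
    """New-style idea: copy the lists and pop only the dropped columns."""
    # Shallow copy: the lists are new, the arrays inside them are shared.
    new_data = tuple(list(arrays) for arrays in data)
    # Pop the highest positions first so the remaining positions stay valid.
    for name in sorted(to_drop, key=lambda n: col_loc[n], reverse=True):
        list_id, pos = col_loc[name]
        new_data[list_id].pop(pos)
    # Recompute the positions of the surviving columns.
    new_loc = {}
    next_pos = [0] * len(data)
    for name, (list_id, _) in sorted(col_loc.items(), key=lambda kv: kv[1]):
        if name in to_drop:
            continue
        new_loc[name] = (list_id, next_pos[list_id])
        next_pos[list_id] += 1
    return new_data, new_loc


# Hypothetical frame: columns A and C are ints (list 0), B and D are floats (list 1).
data = ([[1, 2, 3], [7, 8, 9]], [[1.5, 2.5, 3.5], [0.1, 0.2, 0.3]])
col_loc = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}

old_result = drop_by_repacking(data, col_loc, {"B", "C"})
new_result = drop_by_popping(data, col_loc, {"B", "C"})
```

Both functions return the same structure here; the second one only does per-column work for the columns being dropped, and it leaves the original lists untouched.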

Some numbers:

| n_columns | | 8 | 16 | 32 | 64 | 128 | 256 | 512 |
|---|---|---|---|---|---|---|---|---|
| LLVM IR size, MB | on master | 0.46104 | 0.79895 | 1.475782 | 2.833022 | 5.595263 | 11.13747 | 22.23876 |
| LLVM IR size, MB | with PR #945 | 0.478319 | 0.491075 | 0.516738 | 0.568007 | 0.671354 | 0.880129 | 1.297598 |
| ratio without/with | | 0.963874 | 1.626943 | 2.855959 | 4.987649 | 8.334293 | 12.65437 | 17.13841 |
| compilation time, s | on master | 1.009 | 1.188911 | 2.137561 | 4.167516 | 9.775089 | 23.66213 | 68.41024 |
| compilation time, s | with PR #945 | 0.908008 | 0.751528 | 0.992976 | 1.34102 | 2.412094 | 4.490081 | 8.600799 |
| ratio without/with | | 1.111223 | 1.581993 | 2.152681 | 3.107722 | 4.052533 | 5.269869 | 7.95394 |
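
The "ratio without/with" rows are the master value divided by the PR value for the same column count. A short sketch that recomputes them from the reported measurements (the variable and function names are only illustrative):

```python
n_columns = [8, 16, 32, 64, 128, 256, 512]

# Values copied from the measurements above.
ir_master = [0.46104, 0.79895, 1.475782, 2.833022, 5.595263, 11.13747, 22.23876]
ir_pr = [0.478319, 0.491075, 0.516738, 0.568007, 0.671354, 0.880129, 1.297598]
time_master = [1.009, 1.188911, 2.137561, 4.167516, 9.775089, 23.66213, 68.41024]
time_pr = [0.908008, 0.751528, 0.992976, 1.34102, 2.412094, 4.490081, 8.600799]


def ratios(without_pr, with_pr):
    """Element-wise ratio master / PR; values above 1 mean the PR is better."""
    return [w / p for w, p in zip(without_pr, with_pr)]


ir_ratio = ratios(ir_master, ir_pr)
time_ratio = ratios(time_master, time_pr)

for n, r_ir, r_t in zip(n_columns, ir_ratio, time_ratio):
    print(f"{n:>4} columns: IR ratio {r_ir:.3f}, compile-time ratio {r_t:.3f}")
```

On master the IR size roughly doubles each time the column count doubles, while with the PR it grows far more slowly, which is why the ratio keeps increasing with the number of columns.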

sdc/datatypes/hpat_pandas_dataframe_functions.py

sdc/tests/test_dataframe.py

commit 45e67860d0fc7c2339250764944253d45398e88f
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date: Sat Nov 21 00:39:53 2020 +0300

    Reverting back to use of list.initial_value, updating tests

commit 5b15f127f48e836b266c7817844978e1c28e2dbb
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date: Tue Nov 17 01:47:49 2020 +0300

    Trying LiteralList instead (works with typeinfer fix in numba)

commit b54d365e6ae4e735e9a738b9f41fcc28cad81472
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date: Mon Nov 16 20:00:51 2020 +0300

    Non-working case for changed list of dropped columns

commit 31768d874a645824633c2adfbe8a193870486835
Author: Kozlov, Alexey <alexey.kozlov@intel.com>
Date: Wed Nov 11 21:35:26 2020 +0300

    Make df.drop aligned with pandas and take columns as list


kozlov-alexey added 2 commits November 10, 2020 21:48

Refactor df.drop() impl to avoid quadratic compile time

7f11b0a

Remove debug traces, add test for empty df

c31f5ad

kozlov-alexey requested a review from AlexanderKalistratov November 11, 2020 13:15

kozlov-alexey changed the title ~~Refactor df.drop to improve compile times~~ WIP: Refactor df.drop to improve compile times Nov 11, 2020

kozlov-alexey added the Ready for Review label Nov 11, 2020

kozlov-alexey commented Nov 11, 2020

sdc/datatypes/hpat_pandas_dataframe_functions.py (resolved)

kozlov-alexey commented Nov 11, 2020

sdc/tests/test_dataframe.py (outdated, resolved)

kozlov-alexey commented Nov 11, 2020

sdc/tests/test_dataframe.py (outdated, resolved)

kozlov-alexey changed the title ~~WIP: Refactor df.drop to improve compile times~~ Refactor df.drop to improve compile times Nov 12, 2020

kozlov-alexey added 2 commits November 20, 2020 20:35

Merge branch 'master' into feature/new_df_drop_impl

3cd74f3

kozlov-alexey requested review from Hardcode84 and densmirn November 21, 2020 12:31

kozlov-alexey commented Nov 23, 2020

sdc/datatypes/hpat_pandas_dataframe_functions.py (outdated, resolved)

Fixing a bug in missed array lists copying

ac409f0

AlexanderKalistratov approved these changes Dec 7, 2020


AlexanderKalistratov merged commit d5f706f into IntelPython:master Dec 7, 2020
