Implements init_dataframe as multiple codegen functions #936

kozlov-alexey · 2020-10-19T00:58:03Z

Motivation: init_dataframe was implemented as a Numba intrinsic taking *args,
which seems to generate redundant extractvalue/insertvalue LLVM
instructions. The IR size then grows quadratically with the number of DF columns,
which inflates the total compilation time of functions that create large DFs. This PR
replaces the single init_dataframe with multiple functions, one per number of columns
in a DF, which are generated at compile time, thus avoiding the use of *args.
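
As a rough illustration of the idea (not the actual SDC code), the sketch below generates a fixed-arity function from source text for a given number of columns, so no *args is needed. The names `make_init_dataframe`, `data_i` and `index` are hypothetical, and the generated function returns a plain Python tuple rather than a compiled DataFrame struct.

```python
def make_init_dataframe(n_columns):
    """Build a fixed-arity constructor for exactly n_columns columns.

    Hypothetical illustration: the real SDC functions are compiled by Numba
    and build a DataFrame struct, not a plain Python tuple.
    """
    col_args = [f"data_{i}" for i in range(n_columns)]
    params = ", ".join(col_args + ["index"])
    # Trailing commas keep the tuple literal valid for 0 and 1 columns too.
    columns_tuple = "(" + "".join(f"{arg}, " for arg in col_args) + ")"
    func_name = f"init_dataframe_{n_columns}"
    func_text = (
        f"def {func_name}({params}):\n"
        f"    return {columns_tuple}, index\n"
    )
    namespace = {}
    exec(func_text, namespace)  # define the generated function
    return namespace[func_name]


ctor = make_init_dataframe(3)
result = ctor([1, 2], [3, 4], [5, 6], "idx")
```

Each distinct column count gets its own function with an explicit parameter list, which is the property the PR relies on to avoid packing and unpacking a variadic tuple.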

| Metric \ n_columns | Build | 8 | 16 | 32 | 64 | 128 | 256 | 512 |
|---|---|---|---|---|---|---|---|---|
| LLVM IR size, MB | master | 0.287622 | 0.55394 | 1.262865 | 3.383549 | 10.44003 | 35.79943 | 131.384 |
| LLVM IR size, MB | PR #936 | 0.143275 | 0.209119 | 0.341938 | 0.608992 | 1.143528 | 2.220672 | 4.406426 |
| LLVM IR size ratio | master / PR | 2.007482 | 2.648924 | 3.693257 | 5.555986 | 9.12967 | 16.12099 | 29.81645 |
| Compilation time, s | master | 0.521313 | 0.366884 | 0.67621 | 1.39326 | 4.603106 | 17.54948 | 126.7943 |
| Compilation time, s | PR #936 | 0.683099 | 0.413965 | 0.450348 | 0.715598 | 1.454044 | 3.210638 | 6.943996 |
| Compilation time ratio | master / PR | 0.763159 | 0.886268 | 1.501529 | 1.946987 | 3.165726 | 5.466041 | 18.25956 |
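
One way to check the quadratic-growth claim against the measurements above is to look at how much the IR size grows each time n_columns doubles. The sketch below uses only the IR sizes from the table; the helper name `doubling_exponents` is made up for this illustration.

```python
import math

n_columns = [8, 16, 32, 64, 128, 256, 512]
ir_master = [0.287622, 0.55394, 1.262865, 3.383549, 10.44003, 35.79943, 131.384]
ir_pr = [0.143275, 0.209119, 0.341938, 0.608992, 1.143528, 2.220672, 4.406426]


def doubling_exponents(sizes):
    """log2 of the growth factor between consecutive measurements.

    n_columns doubles at each step, so a value near 1 means linear growth
    and a value near 2 means quadratic growth.
    """
    return [math.log2(b / a) for a, b in zip(sizes, sizes[1:])]


master_exp = doubling_exponents(ir_master)
pr_exp = doubling_exponents(ir_pr)
for n, m, p in zip(n_columns[1:], master_exp, pr_exp):
    print(f"{n:>4} columns: master {m:.2f}, PR {p:.2f}")
```

At the last doubling (256 to 512 columns) the exponent is roughly 1.9 on master and roughly 1.0 with the PR, which is consistent with near-quadratic growth before the change and near-linear growth after it.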

pep8speaks · 2020-10-19T00:58:09Z

Hello @kozlov-alexey! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-11-13 15:02:36 UTC

kozlov-alexey · 2020-10-19T17:28:35Z

The read_csv tests fail with:

Failed in nopython mode pipeline (step: nopython rewrites)
module 'sdc.hiframes.pd_dataframe_ext' has no attribute 'init_dataframe'

These failures are expected because this PR requires changes from #918, which was rolled back recently. This PR is therefore blocked until #918 is restored.

sdc/rewrites/dataframe_constructor.py

AlexanderKalistratov · 2020-11-12T00:33:52Z

@kozlov-alexey @xaleryb win 3.6 build fails with svml error again:

test_series_apply_np (sdc.tests.test_series.TestSeries) ... LLVM ERROR: Symbol not found: __svml_log4_ha

kozlov-alexey · 2020-11-17T13:27:13Z

> @kozlov-alexey @xaleryb win 3.6 build fails with svml error again:
> test_series_apply_np (sdc.tests.test_series.TestSeries) ... LLVM ERROR: Symbol not found: __svml_log4_ha

I think something's wrong with the packages being used (note that mkl and many others are installed from public channels rather than built). Could this be the reason?

kozlov-alexey force-pushed the feature/reduce_df_ctor_ir_size branch from d2a6b7e to 45bbc80 Compare October 19, 2020 01:02

kozlov-alexey added the Waiting other PR This PR depends on functionality to be merged in other PR label Oct 19, 2020

Merge branch 'master' into feature/reduce_df_ctor_ir_size

527ac9b

kozlov-alexey removed the Waiting other PR This PR depends on functionality to be merged in other PR label Nov 11, 2020

Manually inline fix_df_array/index calls into ctor

1155b3e

kozlov-alexey requested review from AlexanderKalistratov, Hardcode84 and densmirn and removed request for densmirn November 11, 2020 16:31

Fixing PEP

d5863f5

kozlov-alexey force-pushed the feature/reduce_df_ctor_ir_size branch from 3aaeb48 to d5863f5 Compare November 11, 2020 16:54

kozlov-alexey added the Ready for Review label Nov 11, 2020

AlexanderKalistratov reviewed Nov 12, 2020

sdc/rewrites/dataframe_constructor.py (outdated, resolved)

AlexanderKalistratov reviewed Nov 12, 2020

sdc/rewrites/dataframe_constructor.py (outdated, resolved)

kozlov-alexey added 2 commits November 13, 2020 02:06

Merge branch 'master' into feature/reduce_df_ctor_ir_size

318dfd8

Applying remarks and minor updates to tests

473d773

kozlov-alexey force-pushed the feature/reduce_df_ctor_ir_size branch from 7448111 to 473d773 Compare November 13, 2020 15:02

AlexanderKalistratov approved these changes Nov 17, 2020

AlexanderKalistratov merged commit 70b5ae8 into IntelPython:master Nov 17, 2020

kozlov-alexey mentioned this pull request Dec 14, 2021

Why @sdc_overload_method use func_text instead of normal python func definition? #999

Closed
