[C++] Includings of function.h are expensive #39357

Closed
zanmato1984 opened this issue Dec 22, 2023 · 0 comments · Fixed by #39312


zanmato1984 commented Dec 22, 2023

Describe the enhancement requested

This is a specialized sub-task of #36246. ClangBuildAnalyzer reports function.h as the second most expensive header, not counting headers from third-party libraries.

30921 ms: arrow/array/data.h (included 387 times, avg 79 ms), included via:
29603 ms: arrow/compute/function.h (included 219 times, avg 135 ms), included via:
28142 ms: arrow/compute/kernel.h (included 226 times, avg 124 ms), included via:
...
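
To compare entries like these across runs, it can help to parse them mechanically. The following is a minimal sketch, assuming every line has exactly the shape quoted above; `HeaderCost`, `ParseHeaderCost` and `AverageMs` are hypothetical names invented for this illustration and are not part of ClangBuildAnalyzer or Arrow. For the three lines quoted here, the reported average matches the total divided by the include count with truncation, which is an observation from this sample rather than a documented property of the tool.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// One entry of the "expensive headers" listing (hypothetical helper type).
struct HeaderCost {
  long total_ms = 0;       // total time attributed to the header
  std::string path;        // header path as printed in the report
  long include_count = 0;  // how many times the header was included
  long reported_avg_ms = 0;
};

// Parses a line shaped like
//   "29603 ms: arrow/compute/function.h (included 219 times, avg 135 ms), ..."
// Returns false if the line does not match. Paths containing spaces are not
// handled by this sketch.
bool ParseHeaderCost(const std::string& line, HeaderCost* out) {
  char path[512];
  long total = 0, count = 0, avg = 0;
  const int matched =
      std::sscanf(line.c_str(), "%ld ms: %511s (included %ld times, avg %ld ms)",
                  &total, path, &count, &avg);
  if (matched != 4 || count <= 0) return false;
  out->total_ms = total;
  out->path = path;
  out->include_count = count;
  out->reported_avg_ms = avg;
  return true;
}

// Recomputes the per-include average using truncating integer division.
long AverageMs(const HeaderCost& cost) {
  assert(cost.include_count > 0);
  return cost.total_ms / cost.include_count;
}
```

Feeding the function.h line through `ParseHeaderCost` and then `AverageMs` reproduces the reported 135 ms average (29603 / 219, truncated).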

A possible solution, quoted from #36246:

In addition, the function.h include is rather heavy. It is included often because it is needed by the api_xyz.h files in the compute module. However, these files only need the function options. We should see if breaking function options into its own file helps shave down the build time.

Component(s)

C++

@zanmato1984 zanmato1984 changed the title [C++] Inclusion of function.h is expensive [C++] Includings of function.h are expensive Dec 22, 2023
felipecrv pushed a commit that referenced this issue Dec 26, 2023
### Rationale for this change

As proposed in #36246, splitting the function option structs out of `function.h` reduces how often `function.h` is included, which in turn should reduce total build time.

The total parse time could be reduced from 722.3s to 709.7s (about 1.7%). In addition, `function.h`, along with its transitive inclusion of `kernel.h`, no longer shows up among the expensive headers.

The detailed analysis results before and after this PR are attached:
[analyze-before.txt](https://github.com/apache/arrow/files/13756923/analyze-before.txt)
[analyze-after.txt](https://github.com/apache/arrow/files/13756924/analyze-after.txt)

Disclaimer (quote from #36246 (comment)):
> Note that the time diff is not absolute. The ClangBuildAnalyzer result differs from time to time. I guess it depends on the idle-ness of the building machine when doing the experiment. But the time reduction is almost certain, though sometimes more sometimes less. And the inclusion times of the questioning headers are reduced for sure, as shown in the attachments in my other comment.

### What changes are included in this PR?

Move the function option structs into their own header, `compute/options.h`, and replace includes of `function.h` with `options.h` wherever that is sufficient.

### Are these changes tested?

The build itself serves as the test.

### Are there any user-facing changes?

User code may fail to build (quoted from #36246 (comment)):
> The header function.h remains in compute/api.h, with and without this PR. The proposed PR removes function.h from api_xxx.h (then includes options.h instead), as proposed in the initial description of this issue. This results in compile failures for user code which includes only compute/api_xxx.h but not compute/api.h, and meanwhile uses CallFunction which is declared in function.h.

But I think this is acceptable, as described in #36246 (comment).
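
The split and the resulting failure mode can be sketched in a single file. This is a toy model, not Arrow code: the `demo` namespace, its types and its functions are hypothetical stand-ins, and the section comments mark which header each part would live in. Code that sees only the first two sections can still call `Add`, while a call to `CallFunction` needs the third section, which corresponds to including `function.h` or `compute/api.h`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// ---- stand-in for "options.h": only the lightweight option base type ----
namespace demo {
struct FunctionOptions {
  virtual ~FunctionOptions() = default;
};
}  // namespace demo

// ---- stand-in for an "api_xxx.h" header: needs only the options above ----
namespace demo {
struct ArithmeticOptions : FunctionOptions {
  bool check_overflow = false;
};
int Add(int lhs, int rhs, const ArithmeticOptions& options = ArithmeticOptions());
}  // namespace demo

// ---- stand-in for "function.h": the heavier, generic entry point ----
namespace demo {
int CallFunction(const std::string& name, const std::vector<int>& args,
                 const FunctionOptions* options = nullptr);
}  // namespace demo

// ---- stand-in for the library implementation (.cc files) ----
namespace demo {
int Add(int lhs, int rhs, const ArithmeticOptions& options) {
  (void)options;  // the toy implementation ignores the options
  return lhs + rhs;
}

int CallFunction(const std::string& name, const std::vector<int>& args,
                 const FunctionOptions* options) {
  if (name != "add") throw std::invalid_argument("unknown function: " + name);
  assert(args.size() == 2);
  // Use caller-supplied arithmetic options if present, defaults otherwise.
  if (const auto* arith = dynamic_cast<const ArithmeticOptions*>(options)) {
    return Add(args[0], args[1], *arith);
  }
  return Add(args[0], args[1]);
}
}  // namespace demo

// User code that compiles only when the "function.h" section is visible.
int UserCode() { return demo::CallFunction("add", {2, 3}); }
```

In this model, the fix for affected user code is to add the include that provides the `CallFunction` declaration, rather than relying on an `api_xxx.h` header to pull it in transitively.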

* Closes: #39357

Authored-by: zanmato <zanmato1984@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
@felipecrv felipecrv added this to the 15.0.0 milestone Dec 26, 2023
clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024