Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Investigate scalar.h usage and reduce cost of function.h include #36246

Open
westonpace opened this issue Jun 22, 2023 · 22 comments
Open

Comments

@westonpace
Copy link
Member

Describe the enhancement requested

Scalars are usually passed around via shared_ptr. So we can often get away with a forward declaration. However, clang build analyzer reports that the scalar.h header is included quite often. We should investigate why this is and see if we can shave a bit of time off our builds by fixing it.

In addition, the function.h include is rather heavy. It is included often because it is needed by the api_xyz.h files in the compute module. However, these files only need the function options. We should see if breaking function options into its own file helps shave down the build time.

Component(s)

C++

@pitrou
Copy link
Member

pitrou commented Aug 22, 2023

@js8544 @mapleFU Would one of you be interested in looking at this?

@zanmato1984
Copy link
Collaborator

I did some investigation on this. With the change drafted in #39312, the result shows a small reduction on build time (1350.7s VS. 1298.6s) and a noticeable reduction on the inclusion of function.h (not showing up in expensive headers section). The experiment is conducted on my local Mac M1, building in ninja-debug preset and using ninja all command.

Here attached detailed analysis results before and after #39312 :
analyze-after.txt
analyze-before.txt

I'm not sure if we should proceed with the PR though.

cc @pitrou @westonpace

@zanmato1984
Copy link
Collaborator

One thing to note, as you can see in analyze-before.txt, the scalar.h doesn't show up as an expensive header on current code base. Maybe some change in between refined it already.

@zanmato1984
Copy link
Collaborator

Update.

Found a problem in CI indicating something that we should probably pay attention to:
With function.h removed from api_aggregate.h
https://github.com/apache/arrow/pull/39312/files#diff-694a635f185cf0d22dd6eefd91acedb48ed0bf71b15fd591aa6e7c508ba4cb09L25-R25
the declarations in function.h such as CallFunction() will not be in api_aggregate.h.
Then some user code which includes api_aggregate.h only but uses CallFunction() will fail compile. This is the case in this CI failure: https://github.com/apache/arrow/actions/runs/7270431142/job/19809507298?pr=39312

/arrow/cpp/examples/arrow/compute_and_write_csv_example.cc:81:41: error: 'CallFunction' is not a member of 'arrow::compute'; did you mean 'Function'?

I'm worrying this is a breaking change. Thoughts?

@mapleFU
Copy link
Member

mapleFU commented Dec 20, 2023

the declarations in function.h such as CallFunction() will not be in api_aggregate.h.

Seems api_scalar.h or other would have the same problem here?

@zanmato1984
Copy link
Collaborator

the declarations in function.h such as CallFunction() will not be in api_aggregate.h.

Seems api_scalar.h or other would have the same problem here?

Right, they essentially have the same problem. When user code is like below:

#include <arrow/compute/api_scalar.h>
// ...
arrow::compute::CallFunction("xxx", {array_a, array_b}); // Unknown function
// ...

However note that if we include compute/api.h instead, or include compute/api.h and compute/api_xxx.h both, it will be fine:

#include <arrow/compute/api.h>
// ...
arrow::compute::CallFunction("xxx", {array_a, array_b}); // OK
// ...

@pitrou
Copy link
Member

pitrou commented Dec 20, 2023

I agree that CallFunction should remain part of compute/api.h.

I think we could also investigate other improvements:

  • avoid including kernel.h from function.h
  • avoid including scalar.h from datum.h

@zanmato1984
Copy link
Collaborator

I agree that CallFunction should remain part of compute/api.h.

Sorry but I think you meant compute/api_xxx.h, so that legacy user code in the above pattern doesn't break? CallFunction is in compute/api.h already.

I think we could also investigate other improvements:

  • avoid including kernel.h from function.h
  • avoid including scalar.h from datum.h

Of course, will do. (Assuming we need to still keep CallFunction in compute/api_xxx.h).

@pitrou
Copy link
Member

pitrou commented Dec 20, 2023

Sorry but I think you meant compute/api_xxx.h, so that legacy user code in the above pattern doesn't break? CallFunction is in compute/api.h already.

Well, that also depends if we can make function.h cheap enough :-)

@zanmato1984
Copy link
Collaborator

Sorry but I think you meant compute/api_xxx.h, so that legacy user code in the above pattern doesn't break? CallFunction is in compute/api.h already.

Well, that also depends if we can make function.h cheap enough :-)

I think what we were talking about is to NOT make legacy user code fail compiling, i.e., "unknown CallFunction". To do that, we must keep function.h within compute/api_xxx.h. You were mentioning compute/api.h in your second last comment, but compute/api.h has function.h already even in my PR. So I just wanted to make sure that you were suggesting we should keep function.h in compute/api_xxx.h rather than compute/api.h.

So how does "whether function.h is cheap enough" relate to above? Sorry again I'm not really following so I appreciate any clearance. Thanks @pitrou !

@pitrou
Copy link
Member

pitrou commented Dec 21, 2023

So I just wanted to make sure that you were suggesting we should keep function.h in compute/api_xxx.h rather than compute/api.h.

We can just keep it in compute/api.h IMHO. That said, if we make function.h cheap enough to include, then we can also keep in compute/api_xxx.h.

@zanmato1984
Copy link
Collaborator

So I just wanted to make sure that you were suggesting we should keep function.h in compute/api_xxx.h rather than compute/api.h.

We can just keep it in compute/api.h IMHO. That said, if we make function.h cheap enough to include, then we can also keep in compute/api_xxx.h.

So if we can't make function.h cheap enough, then we don't need to keep it in compute/api_xxx.h? What about the legacy user code that may fail compiling, as illustrated below.

the declarations in function.h such as CallFunction() will not be in api_aggregate.h.

Seems api_scalar.h or other would have the same problem here?

Right, they essentially have the same problem. When user code is like below:

#include <arrow/compute/api_scalar.h>
// ...
arrow::compute::CallFunction("xxx", {array_a, array_b}); // Unknown function
// ...

However note that if we include compute/api.h instead, or include compute/api.h and compute/api_xxx.h both, it will be fine:

#include <arrow/compute/api.h>
// ...
arrow::compute::CallFunction("xxx", {array_a, array_b}); // OK
// ...

Forgive my ignorance about this, I'm not sure if this is an acceptable compatibility change. I appreciate any clearance. Thanks.

@pitrou
Copy link
Member

pitrou commented Dec 21, 2023

Well, we don't really have an established policy on this. If the function still exists and there is a common spelling that works accross versions (here, #include <arrow/compute/api.h>), then I think it's ok.

@js8544 @felipecrv What do you think?

@zanmato1984
Copy link
Collaborator

Well, we don't really have an established policy on this. If the function still exists and there is a common spelling that works accross versions (here, #include <arrow/compute/api.h>), then I think it's ok.

I got your point now. Really appreciate the clearance!

@felipecrv
Copy link
Contributor

a small reduction on build time (1350.7s VS. 1298.6s)

@zanmato1984 That's a 50 seconds diff. I think it's significant. Note that improvements in this area will be beneficial as the number of Acero functions grow — the cost of parsing these headers grows according to O(# of functions). When build issues start showing up in build benchmarks it's often too late.

@felipecrv
Copy link
Contributor

Well, we don't really have an established policy on this. If the function still exists and there is a common spelling that works accross versions (here, #include <arrow/compute/api.h>), then I think it's ok.

@js8544 @felipecrv What do you think?

IIUC, the question is about removing function.h from api.h and including it in api_xxx.h where xxx is a function name. I'm OK with that yes. This kind of thing is not uncommon.

@zanmato1984
Copy link
Collaborator

a small reduction on build time (1350.7s VS. 1298.6s)

@zanmato1984 That's a 50 seconds diff. I think it's significant. Note that improvements in this area will be beneficial as the number of Acero functions grow — the cost of parsing these headers grows according to O(# of functions). When build issues start showing up in build benchmarks it's often too late.

Thanks for replying.

Note that the time diff is not absolute. The ClangBuildAnalyzer result differs from time to time. I guess it depends on the idle-ness of the building machine when doing the experiment. But the time reduction is almost certain, though sometimes more sometimes less. And the inclusion times of the questioning headers are reduced for sure, as shown in the attachments in my other comment.

@zanmato1984
Copy link
Collaborator

IIUC, the question is about removing function.h from api.h and including it in api_xxx.h where xxx is a function name. I'm OK with that yes. This kind of thing is not uncommon.

The question is actually opposite. The header function.h remains in compute/api.h, with and without this PR. The proposed PR removes function.h from api_xxx.h (then includes options.h instead), as proposed in the initial description of this issue. This results in compile failures for user code which includes only compute/api_xxx.h but not compute/api.h, and meanwhile uses CallFunction which is declared in function.h.

@felipecrv would you please help to confirm again? Thanks.

@felipecrv
Copy link
Contributor

I see. I think it's a valid "breaking" change. If you don't include what you use and succeed compilation because of transitive inclusion, there is little guarantees that things will continue building after the header changes. It's a common problem in C and C++, so I wouldn't block header hygiene improvements on that.

If we cared a lot about not breaking header inclusion expectations we could do the cleanup like Microsoft did with windows.h -- you can opt-in for a leaner windows.h by defining WIN32_LEAN_AND_MEAN before inclusion but I don't think codebases using Arrow are as ossified as codebases using windows.h to warrant this.

@zanmato1984
Copy link
Collaborator

zanmato1984 commented Dec 21, 2023

I think we could also investigate other improvements:

  • avoid including kernel.h from function.h
  • avoid including scalar.h from datum.h

Assuming we are going to proceed this work. Here lists the most recent comparison of include analysis. The result is trimmed by listing only arrow's own headers.

Without this PR:

30921 ms: arrow/array/data.h (included 387 times, avg 79 ms), included via:
29603 ms: arrow/compute/function.h (included 219 times, avg 135 ms), included via:
28142 ms: arrow/compute/kernel.h (included 226 times, avg 124 ms), included via:
25468 ms: arrow/array/array_base.h (included 385 times, avg 66 ms), included via:
23882 ms: arrow/type.h (included 519 times, avg 46 ms), included via:
22291 ms: arrow/buffer.h (included 491 times, avg 45 ms), included via:
22160 ms: arrow/status.h (included 576 times, avg 38 ms), included via:
21382 ms: arrow/compute/exec.h (included 235 times, avg 90 ms), included via:
20121 ms: arrow/result.h (included 564 times, avg 35 ms), included via:
19198 ms: arrow/util/string_builder.h (included 577 times, avg 33 ms), included via:
18382 ms: arrow/device.h (included 492 times, avg 37 ms), included via:
17972 ms: arrow/array.h (included 256 times, avg 70 ms), included via:
15382 ms: arrow/acero/options.h (included 105 times, avg 146 ms), included via:
14816 ms: arrow/compute/expression.h (included 238 times, avg 62 ms), included via:
13685 ms: arrow/datum.h (included 256 times, avg 53 ms), included via:
13645 ms: arrow/compute/api_aggregate.h (included 137 times, avg 99 ms), included via:
12846 ms: arrow/testing/gtest_util.h (included 240 times, avg 53 ms), included via:

With this PR:

29645 ms: arrow/array/data.h (included 387 times, avg 76 ms), included via:
24930 ms: arrow/array/array_base.h (included 385 times, avg 64 ms), included via:
23593 ms: arrow/type.h (included 519 times, avg 45 ms), included via:
22147 ms: arrow/status.h (included 576 times, avg 38 ms), included via:
20656 ms: arrow/result.h (included 564 times, avg 36 ms), included via:
18931 ms: arrow/util/string_builder.h (included 577 times, avg 32 ms), included via:
17606 ms: arrow/datum.h (included 255 times, avg 69 ms), included via:
17484 ms: arrow/array.h (included 256 times, avg 68 ms), included via:
17216 ms: arrow/buffer.h (included 491 times, avg 35 ms), included via:
14315 ms: arrow/acero/options.h (included 105 times, avg 136 ms), included via:
13494 ms: arrow/device.h (included 492 times, avg 27 ms), included via:
12561 ms: arrow/testing/gtest_util.h (included 240 times, avg 52 ms), included via:
11080 ms: arrow/compute/exec.h (included 230 times, avg 48 ms), included via:
10445 ms: arrow/acero/exec_plan.h (included 89 times, avg 117 ms), included via:

We can see that:

  • avoid including kernel.h from function.h

The cost of including kernel.h is reduced along with the reduction of including function.h.

  • avoid including scalar.h from datum.h

Including scalar.h in datum.h isn't really expensive, with and without this PR.

Is this result good enough already or should I keep investigating anything more? Thanks.

@felipecrv
Copy link
Contributor

Is this result good enough already or should I keep investigating anything more? Thanks.

We can merge these first and you can continue looking for more improvements if you want. Just create a more focused issue to refer in your PR (by changing the title of the PR) and keep this one open after your PR is merged.

@zanmato1984
Copy link
Collaborator

Is this result good enough already or should I keep investigating anything more? Thanks.

We can merge these first and you can continue looking for more improvements if you want. Just create a more focused issue to refer in your PR (by changing the title of the PR) and keep this one open after your PR is merged.

Thanks. I've created #39357 as you suggested to tackle function.h only and make #39312 to subject to #39357 .

felipecrv pushed a commit that referenced this issue Dec 26, 2023
### Rationale for this change

As proposed in #36246 , by splitting function option structs from `function.h`, we can reduce the including of `function.h`. So that the total build time could be reduced.

The total parser time could be reduced from 722.3s to 709.7s. And the `function.h` along with its transitive inclusion of `kernel.h` don't show up in expensive headers any more.

The detailed analysis result before and after this PR are attached: 
[analyze-before.txt](https://github.com/apache/arrow/files/13756923/analyze-before.txt)
[analyze-after.txt](https://github.com/apache/arrow/files/13756924/analyze-after.txt)

Disclaimer (quote from #36246 (comment)):
> Note that the time diff is not absolute. The ClangBuildAnalyzer result differs from time to time. I guess it depends on the idle-ness of the building machine when doing the experiment. But the time reduction is almost certain, though sometimes more sometimes less. And the inclusion times of the questioning headers are reduced for sure, as shown in the attachments in my other comment.

### What changes are included in this PR?

Move function option structs into own `compute/options.h`, and change including `function.h` to including `options.h` wherever fits.

### Are these changes tested?

Build is testing.

### Are there any user-facing changes?

There could be potential build failures for user code (quote from #36246 (comment)):
> The header function.h remains in compute/api.h, with and without this PR. The proposed PR removes function.h from api_xxx.h (then includes options.h instead), as proposed in the initial description of this issue. This results in compile failures for user code which includes only compute/api_xxx.h but not compute/api.h, and meanwhile uses CallFunction which is declared in function.h.

But I think it's OK as described in #36246 (comment).

* Closes: #39357

Authored-by: zanmato <zanmato1984@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
### Rationale for this change

As proposed in apache#36246 , by splitting function option structs from `function.h`, we can reduce the including of `function.h`. So that the total build time could be reduced.

The total parser time could be reduced from 722.3s to 709.7s. And the `function.h` along with its transitive inclusion of `kernel.h` don't show up in expensive headers any more.

The detailed analysis result before and after this PR are attached: 
[analyze-before.txt](https://github.com/apache/arrow/files/13756923/analyze-before.txt)
[analyze-after.txt](https://github.com/apache/arrow/files/13756924/analyze-after.txt)

Disclaimer (quote from apache#36246 (comment)):
> Note that the time diff is not absolute. The ClangBuildAnalyzer result differs from time to time. I guess it depends on the idle-ness of the building machine when doing the experiment. But the time reduction is almost certain, though sometimes more sometimes less. And the inclusion times of the questioning headers are reduced for sure, as shown in the attachments in my other comment.

### What changes are included in this PR?

Move function option structs into own `compute/options.h`, and change including `function.h` to including `options.h` wherever fits.

### Are these changes tested?

Build is testing.

### Are there any user-facing changes?

There could be potential build failures for user code (quote from apache#36246 (comment)):
> The header function.h remains in compute/api.h, with and without this PR. The proposed PR removes function.h from api_xxx.h (then includes options.h instead), as proposed in the initial description of this issue. This results in compile failures for user code which includes only compute/api_xxx.h but not compute/api.h, and meanwhile uses CallFunction which is declared in function.h.

But I think it's OK as described in apache#36246 (comment).

* Closes: apache#39357

Authored-by: zanmato <zanmato1984@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
### Rationale for this change

As proposed in apache#36246 , by splitting function option structs from `function.h`, we can reduce the including of `function.h`. So that the total build time could be reduced.

The total parser time could be reduced from 722.3s to 709.7s. And the `function.h` along with its transitive inclusion of `kernel.h` don't show up in expensive headers any more.

The detailed analysis result before and after this PR are attached: 
[analyze-before.txt](https://github.com/apache/arrow/files/13756923/analyze-before.txt)
[analyze-after.txt](https://github.com/apache/arrow/files/13756924/analyze-after.txt)

Disclaimer (quote from apache#36246 (comment)):
> Note that the time diff is not absolute. The ClangBuildAnalyzer result differs from time to time. I guess it depends on the idle-ness of the building machine when doing the experiment. But the time reduction is almost certain, though sometimes more sometimes less. And the inclusion times of the questioning headers are reduced for sure, as shown in the attachments in my other comment.

### What changes are included in this PR?

Move function option structs into own `compute/options.h`, and change including `function.h` to including `options.h` wherever fits.

### Are these changes tested?

Build is testing.

### Are there any user-facing changes?

There could be potential build failures for user code (quote from apache#36246 (comment)):
> The header function.h remains in compute/api.h, with and without this PR. The proposed PR removes function.h from api_xxx.h (then includes options.h instead), as proposed in the initial description of this issue. This results in compile failures for user code which includes only compute/api_xxx.h but not compute/api.h, and meanwhile uses CallFunction which is declared in function.h.

But I think it's OK as described in apache#36246 (comment).

* Closes: apache#39357

Authored-by: zanmato <zanmato1984@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants