Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-1629: [C++] Add miscellaneous DCHECKs and minor changes based on infer tool output #1151

Closed
wants to merge 6 commits into from

Conversation

wesm
Copy link
Member

@wesm wesm commented Oct 1, 2017

This was an interesting journey through some esoterica. I went through all the warnings/errors that infer (fbinfer.com) outputs and made changes if it seemed warranted. Some of the checks might be overkill.

See https://gist.github.com/wesm/fc6809e4f4aaef3ecfeb21b8123627bc for a summary of actions on each warning

Most of the errors that Infer wasn't happy about were already addressed by DCHECKs. This was useful to go through all these cases -- in nearly all cases the null references are impossible or would be the result of an error on behalf of the application programmer. For example: we do not do array boundschecking in most cases in production builds, but these boundschecks are included in debug builds to assist with catching bugs caused by improper use by application developers.

As a matter of convention, we should not use error Status to do parameter validation or asserting pre-conditions that are the responsibility of the library user. If parameter validation is required in binding code (e.g. Python), then this validation should happen in the binding layer, not in the core C++ library.

There are some other cases where we have a std::shared_ptr<T> out variable with code like:

RETURN_NOT_OK(Foo(..., &out));
out->Method(...);

Here, infer complains that out could contain a null pointer, but our contract with developers is that if Foo returns successfully that out is non-null.

Interestingly, infer doesn't like some stack variables that are bound in C++11 lambda expressions. I noted these in the gist with LAMBDA.

…elper functions internal / non-exported

Change-Id: I4b91ecce99dde4e87dcf70f026df1af17d2f067f
Change-Id: I2ef789a21cdcaf1573480841af1ac3c45e747133
Change-Id: I75c624b0ca1f0daf3940c4cd0c4ec5497a04285f
Change-Id: Ib30ff3d4784983342eca31405b8916dd4474be03
Change-Id: I4e01f340e12d9aa6d319aa77dc05b9bfe6bfa970
@pcmoritz
Copy link
Contributor

pcmoritz commented Oct 1, 2017

+1 the plasma related changes look good to me! I'm very glad it caught the memory leak.

I'd like to do some modernizations on the Plasma code base, in particular getting rid of fling and using explicit MemoryMappedFile from https://github.com/apache/arrow/blob/84e5e02fbf412c979387b0a53b0ad0c6d5c5e790/cpp/src/arrow/io/file.h instead of shipping file descriptors between processes; this will make it harder to do the cleanups properly since we cannot rely on the reference counting from the OS any more to clean the files up after they are not used by anyone any more, but it will make it possible to have a native Java client and potentially other native bindings. I wonder what your thoughts on this tradeoff are.

Change-Id: Icb0567eef9f2fac5284171c4371e3c3d22b91fc5
@wesm
Copy link
Member Author

wesm commented Oct 2, 2017

+1

@asfgit asfgit closed this in 0c8b861 Oct 2, 2017
@wesm wesm deleted the fix-infer-issues branch October 2, 2017 00:33
wesm added a commit to wesm/arrow that referenced this pull request Oct 3, 2017
…n infer tool output

This was an interesting journey through some esoterica. I went through all the warnings/errors that infer (fbinfer.com) outputs and made changes if it seemed warranted. Some of the checks might be overkill.

See https://gist.github.com/wesm/fc6809e4f4aaef3ecfeb21b8123627bc for a summary of actions on each warning

Most of the errors that Infer wasn't happy about were already addressed by DCHECKs. This was useful to go through all these cases -- in nearly all cases the null references are impossible or would be the result of an error on behalf of the application programmer. For example: we do not do array boundschecking in most cases in production builds, but these boundschecks are included in debug builds to assist with catching bugs caused by improper use by application developers.

As a matter of convention, we should not use error `Status` to do parameter validation or asserting pre-conditions that are the responsibility of the library user. If parameter validation is required in binding code (e.g. Python), then this validation should happen in the binding layer, not in the core C++ library.

There are some other cases where we have a `std::shared_ptr<T>` out variable with code like:

```
RETURN_NOT_OK(Foo(..., &out));
out->Method(...);
```

Here, infer complains that `out` could contain a null pointer, but our contract with developers is that if `Foo` returns successfully that `out` is non-null.

Interestingly, infer doesn't like some stack variables that are bound in C++11 lambda expressions. I noted these in the gist with `LAMBDA`.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1151 from wesm/fix-infer-issues and squashes the following commits:

f285be9 [Wes McKinney] Restore code paths for empty chunked arrays for backwards compat
5aa86ce [Wes McKinney] More DCHECK esoterica / tweaks based on infer report
22c5d36 [Wes McKinney] Address a couple more infer warnings
75131a6 [Wes McKinney] Some more minor infer fixes
5ff3e3a [Wes McKinney] Compilation fix
05316ce [Wes McKinney] Fix miscellaneous things that infer does not like. Make some Python helper functions internal / non-exported
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants