Skip to content

Some defensive programming #1890

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

dscho
Copy link
Member

@dscho dscho commented Mar 24, 2025

These patches implement some defensive programming to address complaints some static analyzers might have.

cc: Jeff King peff@peff.net

dscho added 10 commits March 21, 2025 11:44
On the off chance that `lookup_decoration()` cannot find anything, let
`leave_one_treesame_to_parent()` return gracefully instead of crashing.

Pointed out by CodeQL.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `lookup_commit_reference()` can return NULL
values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `parse_object()` can return NULL values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `lookup_commit()` can return NULL values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `lookup_commit()` can return NULL values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `parse_tree_indirect()` can return NULL values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `lookup_commit()` can return NULL values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `branch_get()` can return NULL values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `branch_get()` can return NULL values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
CodeQL points out that `lookup_commit_reference()` can return NULL
values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho self-assigned this Mar 24, 2025
@dscho dscho changed the title Defensive programming Some defensive programming Mar 24, 2025
dscho added 4 commits May 15, 2025 14:43
CodeQL points out that `branch_get()` can return NULL values.

Note that the error path in this instance calls `BUG()`, not `die()`,
for two reasons:

1. The code lives in `libgit.a` and calling `die()` from within those
   library functions is a bad practice that needs to be reduced, rather
   than increased.

2. The `inherit_tracking()` function really should only be called with
   the name of an existing branch, therefore a `NULL` return value would
   indeed constitute a bug in Git's code.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
As pointed out by CodeQL, it could be NULL and we usually check for
that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On the off-chance that it's NULL...

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
As pointed out by CodeQL, `lookup_commit()` can return NULL.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the defensive-programming branch from 7187789 to 17e9e9a Compare May 15, 2025 12:43
@dscho
Copy link
Member Author

dscho commented May 15, 2025

/submit

Copy link

gitgitgadget bot commented May 15, 2025

Submitted as pull.1890.git.1747313139.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1890/dscho/defensive-programming-v1

To fetch this version to local tag pr-1890/dscho/defensive-programming-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1890/dscho/defensive-programming-v1

@@ -3359,6 +3359,9 @@ static int leave_one_treesame_to_parent(struct rev_info *revs, struct commit *co
struct commit_list *p;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> On the off chance that `lookup_decoration()` cannot find anything, let
> `leave_one_treesame_to_parent()` return gracefully instead of crashing.

But wouldn't it be a BUG("") worthy event for the treesame
decoration not to exist for the commit object in question at this
point of the code?  Is it really defensive to silently pretend that
nothing bad happened and to move forward?

> Pointed out by CodeQL.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  revision.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/revision.c b/revision.c
> index c4390f0938cb..59eae4eb8ba8 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -3359,6 +3359,9 @@ static int leave_one_treesame_to_parent(struct rev_info *revs, struct commit *co
>  	struct commit_list *p;
>  	unsigned n;
>  
> +	if (!ts)
> +		return 0;
> +
>  	for (p = commit->parents, n = 0; p; p = p->next, n++) {
>  		if (ts->treesame[n]) {
>  			if (p->item->object.flags & TMP_MARK) {

@@ -1106,7 +1106,7 @@ static enum get_oid_result get_parent(struct repository *r,
if (ret)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> CodeQL points out that `lookup_commit_reference()` can return NULL
> values.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  object-name.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/object-name.c b/object-name.c
> index 76749fbfe652..ca54dad2f4c8 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -1106,7 +1106,7 @@ static enum get_oid_result get_parent(struct repository *r,
>  	if (ret)
>  		return ret;
>  	commit = lookup_commit_reference(r, &oid);
> -	if (repo_parse_commit(r, commit))
> +	if (!commit || repo_parse_commit(r, commit))
>  		return MISSING_OBJECT;

Most of the time, the check for "ret" we see in the pre-context,
which is a result of get_oid_1(), would prevent an oid that is not a
valid name for a committish to even reach this code, I would think,
but with possible repository corruption, we may fail to "lookup" the
commit, so this is a good correction, I would think.

Thanks.


>  	if (!idx) {
>  		oidcpy(result, &commit->object.oid);

@@ -155,7 +155,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
struct tag *tag = (struct tag *)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> CodeQL points out that `parse_object()` can return NULL values.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  fetch-pack.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 1ed5e11dd568..4cbcb0c14c48 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -155,7 +155,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
>  			struct tag *tag = (struct tag *)
>  				parse_object(the_repository, oid);
>  
> -			if (!tag->tagged)
> +			if (!tag || !tag->tagged)
>  				return NULL;

The "oid" can come from corruptible sources like commit graph file,
so I agree with your analysis that it may name a missing object in a
corrupt repository, leading "tag" being NULL.  Looks correct.


>  			if (mark_tags_complete_and_check_obj_db)
>  				tag->object.flags |= COMPLETE;

@@ -1106,7 +1106,7 @@ static enum get_oid_result get_parent(struct repository *r,
if (ret)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, May 15, 2025 at 12:45:27PM +0000, Johannes Schindelin via GitGitGadget wrote:

> CodeQL points out that `lookup_commit_reference()` can return NULL
> values.
> [...]
>  	commit = lookup_commit_reference(r, &oid);
> -	if (repo_parse_commit(r, commit))
> +	if (!commit || repo_parse_commit(r, commit))
>  		return MISSING_OBJECT;

Sure, but repo_parse_commit() can also handle NULL values. It returns
"-1" in that case. I think CodeQL is not smart enough to know that.

-Peff

Copy link

gitgitgadget bot commented May 15, 2025

User Jeff King <peff@peff.net> has been added to the cc: list.

Copy link

gitgitgadget bot commented May 15, 2025

This patch series was integrated into seen via git@d36a872.

@gitgitgadget gitgitgadget bot added the seen label May 15, 2025
Copy link

gitgitgadget bot commented May 16, 2025

This branch is now known as js/misc-defensive.

Copy link

gitgitgadget bot commented May 16, 2025

This patch series was integrated into seen via git@20c7768.

Copy link

gitgitgadget bot commented May 17, 2025

There was a status update in the "New Topics" section about the branch js/misc-defensive on the Git mailing list:

Assorted changes that please CodeQL.

Comments?
source: <pull.1890.git.1747313139.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented May 19, 2025

This patch series was integrated into seen via git@a9e9d2a.

Copy link

gitgitgadget bot commented May 19, 2025

This patch series was integrated into seen via git@5091f3e.

Copy link

gitgitgadget bot commented May 20, 2025

This patch series was integrated into seen via git@e4fa727.

Copy link

gitgitgadget bot commented May 21, 2025

There was a status update in the "Cooking" section about the branch js/misc-defensive on the Git mailing list:

Assorted changes that please CodeQL.

Comments?
source: <pull.1890.git.1747313139.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented May 21, 2025

This patch series was integrated into seen via git@7a2bbc5.

Copy link

gitgitgadget bot commented May 22, 2025

This patch series was integrated into seen via git@edbfd57.

Copy link

gitgitgadget bot commented May 24, 2025

This patch series was integrated into seen via git@3087553.

Copy link

gitgitgadget bot commented May 24, 2025

There was a status update in the "Cooking" section about the branch js/misc-defensive on the Git mailing list:

Assorted changes that please CodeQL.

Comments?
source: <pull.1890.git.1747313139.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented May 27, 2025

This patch series was integrated into seen via git@d5c8251.

Copy link

gitgitgadget bot commented May 28, 2025

There was a status update in the "Cooking" section about the branch js/misc-defensive on the Git mailing list:

Assorted changes that please CodeQL.

Comments?
source: <pull.1890.git.1747313139.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented May 28, 2025

This patch series was integrated into seen via git@4df18b3.

Copy link

gitgitgadget bot commented May 29, 2025

This patch series was integrated into seen via git@27f8825.

Copy link

gitgitgadget bot commented May 30, 2025

This patch series was integrated into seen via git@2e06034.

Copy link

gitgitgadget bot commented May 31, 2025

There was a status update in the "Cooking" section about the branch js/misc-defensive on the Git mailing list:

Assorted changes that please CodeQL.

Comments?
source: <pull.1890.git.1747313139.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jun 2, 2025

This patch series was integrated into seen via git@c12b2a0.

Copy link

gitgitgadget bot commented Jun 2, 2025

This patch series was integrated into seen via git@762369a.

Copy link

gitgitgadget bot commented Jun 3, 2025

This patch series was integrated into seen via git@4063091.

Copy link

gitgitgadget bot commented Jun 4, 2025

This patch series was integrated into seen via git@46b4ef2.

Copy link

gitgitgadget bot commented Jun 5, 2025

This patch series was integrated into seen via git@4b9646d.

Copy link

gitgitgadget bot commented Jun 5, 2025

There was a status update in the "Cooking" section about the branch js/misc-defensive on the Git mailing list:

Assorted changes that please CodeQL.

Will discard, as no strong support seems to be there in the thread.
source: <pull.1890.git.1747313139.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jun 6, 2025

This patch series was integrated into seen via git@4d7b6f6.

Copy link

gitgitgadget bot commented Jun 7, 2025

This patch series was integrated into seen via git@58d3894.

Copy link

gitgitgadget bot commented Jun 7, 2025

There was a status update in the "Cooking" section about the branch js/misc-defensive on the Git mailing list:

Assorted changes that please CodeQL.

Will discard, as no strong support seems to be there in the thread.
source: <pull.1890.git.1747313139.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Jun 8, 2025

This patch series was integrated into seen via git@e4c67d3.

Copy link

gitgitgadget bot commented Jun 9, 2025

There was a status update in the "Discarded" section about the branch js/misc-defensive on the Git mailing list:

Assorted changes that please CodeQL.

Will discard, as no strong support seems to be there in the thread.
source: <pull.1890.git.1747313139.gitgitgadget@gmail.com>

@dscho
Copy link
Member Author

dscho commented Jun 23, 2025

No support from upstream for these changes. I'll just suppress the CodeQL queries that pointed out these issues.

@dscho dscho closed this Jun 23, 2025
@dscho dscho deleted the defensive-programming branch June 23, 2025 10:13
dscho added a commit to microsoft/git that referenced this pull request Jul 7, 2025
This PR adds a [CodeQL](https://codeql.github.com/) workflow to this
repository. CodeQL is touted as a "semantic code analysis engine", i.e.
its intention isn't really a full industry-grade static code analyzer
like Coverity (which we [already run in our CI
builds](https://github.com/microsoft/git/actions/workflows/coverity.yml)).
This shows in the number of queries we have to suppress because they
would result in an unwieldy number of false positives.

Or, one might argue, it shows how convoluted part of Git's logic is,
which may not only confuse CodeQL but also human readers. The latter
might not be a _direct_ security issue, but as Tony Hoare
[said](https://en.wikiquote.org/wiki/C._A._R._Hoare#The_Emperor's_Old_Clothes):

> There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies, and the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult.

I tried to make the code simpler with
gitgitgadget#1888 and with
gitgitgadget#1890, but the discussion went
astray from my original purpose, instead veering off into the direction
of arguing in complicated lines of reasoning that the current code is
actually correct. Nevertheless, I do include these patches here,
structured via merge commits to make it easier to drop them should too
many merge conflicts in rebases to upstream Git versions make that
advisable.

However, for most of the alerts, the approach I took is exclude the
queries wholesale, when they caused alerts. I could have taken the
approach to suppress the alerts in a fine-grained way (via
`codeql[<query-name>]`) so that true positives aren't suppressed in
addition to false positives. However, I am loathe to take that approach
because a voice inside me fully expects backlash on the Git mailing list
along the lines of :"You're littering the code just to appease CodeQL!".

Theoretically, an alternative exists: To develop modified versions of
those CodeQL queries, e.g. to ignore paths inside the `.git/` directory
in the `cpp/toctou-race-condition` query. However, CodeQL is a language
that is not only (intentionally?) difficult to develop in by virtue of
being declarative instead of imperative, its debugging facilities are
pretty much non-existent.

Given all of the above, the obvious question begs itself: Why bother
with CodeQL at all? The truthful answer is: It is mandated by the Secure
Future Initiative. And who knows, maybe CodeQL will come in handy in the
future? After all, it is a framework more than a solution, and should in
principle be able to help with answering questions like: "Which call
paths into `libgit.a` could result in `die()` being called?".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant