Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load trees lazily when walking commit-graph #4

Closed
wants to merge 436 commits into from

Conversation

derrickstolee
Copy link
Owner

When walking many commits in the commit-graph file, we may not need all of the root trees. Delay this load until a read is requested. Use get_commit_tree() and get_commit_tree_oid() methods for all consumers.

This change gets a 20% speedup on git log --graph --oneline -1000 for the Linux repo.

gitster and others added 30 commits February 27, 2018 10:34
Doc update for a recently added feature.

* bp/fsmonitor:
  fsmonitor: update documentation to remove reference to invalid config settings
Code to unquote single-quoted string (used in the parser for
configuration files, etc.) did not diagnose bogus input correctly
and produced bogus results instead.

* jk/sq-dequote-on-bogus-input:
  sq_dequote: fix extra consumption of source string
Code clean-up.

* jk/test-hashmap-updates:
  test-hashmap: use "unsigned int" for hash storage
  test-hashmap: simplify alloc_test_entry
  test-hashmap: use strbuf_getline rather than fgets
  test-hashmap: use xsnprintf rather than snprintf
  test-hashmap: check allocation computation for overflow
  test-hashmap: use ALLOC_ARRAY rather than bare malloc
Typofix.

* rd/typofix:
  Correct mispellings of ".gitmodule" to ".gitmodules"
  t/: correct obvious typo "detahced"
Refactor the code to binary search starting from a fan-out table
(which is how the packfile is indexed with object names) into a
reusable helper.

* jt/binsearch-with-fanout:
  packfile: refactor hash search with fanout table
  packfile: remove GIT_DEBUG_LOOKUP log statements
"git status" after moving a path in the working tree (hence making
it appear "removed") and then adding with the -N option (hence
making that appear "added") detected it as a rename, but did not
report the  old and new pathnames correctly.

* nd/ita-wt-renames-in-status:
  wt-status.c: handle worktree renames
  wt-status.c: rename rename-related fields in wt_status_change_data
  wt-status.c: catch unhandled diff status codes
  wt-status.c: coding style fix
  Use DIFF_DETECT_RENAME for detect_rename assignments
  t2203: test status output with porcelain v2 format
"git commit --fixup" did not allow "-m<message>" option to be used
at the same time; allow it to annotate resulting commit with more
text.

* ab/commit-m-with-fixup:
  commit: add support for --fixup <commit> -m"<extra message>"
  commit doc: document that -c, -C, -F and --fixup with -m error
When resetting the working tree files recursively, the working tree
of submodules are now also reset to match.

* sb/submodule-update-reset-fix:
  submodule: submodule_move_head omits old argument in forced case
  unpack-trees: oneway_merge to update submodules
  t/lib-submodule-update.sh: fix test ignoring ignored files in submodules
  t/lib-submodule-update.sh: clarify test
Fix for a commented-out code to adjust it to a rather old API change.

* ys/bisect-object-id-missing-conversion-fix:
  bisect: debug: convert struct object to object_id
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running a test script with '-x' turns on 'set -x' tracing, the output
of which is normally sent to stderr.  This causes a lot of
test failures, because many tests redirect and verify the stderr
of shell functions, most frequently that of 'test_must_fail'.
These issues were worked around somewhat in d88785e (test-lib: set
BASH_XTRACEFD automatically, 2016-05-11), so at least we could
reliably run tests with '-x' tracing under a Bash version supporting
BASH_XTRACEFD, i.e. v4.1 and later.

Futhermore, redirecting the stderr of test helper functions like
'test_must_fail' or 'test_expect_code' is the cause of a different
issue as well.  If these functions detect something unexpected, they
will write their error messages intended to the user to thier stderr.
However, if their stderr is redirected in order to save and verify the
stderr of the tested git command invoked in the function, then the
function's error messages will be redirected as well.  Consequently,
those messages won't reach the user, making the test's verbose output
less useful.

This patch makes it safe to redirect and verify the stderr of those
test helper functions which are meant to run the tested command given
as argument, even when running tests with '-x' and /bin/sh.  This is
achieved through a couple of file descriptor redirections:

  - Duplicate stderr of the tested command executed in the test helper
    function from the function's fd 7 (see next point), to ensure that
    the tested command's error messages go to a different fd than the
    '-x' trace of the commands executed in the function or the
    function's error messages.

  - Duplicate the test helper function's fd 7 from the function's
    original stderr, meaning that, after taking a detour through fd 7,
    the error messages of the tested command do end up on the
    function's original stderr.

  - Duplicate stderr of the test helper function from fd 4, i.e. the
    fd connected to the test script's original stderr and the fd used
    for BASH_XTRACEFD.  This ensures that the '-x' trace of the
    commands executed in the function

      - doesn't go to the function's original stderr, so it won't mess
	with callers who want to save and verify the tested command's
	stderr.

      - does go to the same fd independently from the shell running
        the test script, be it /bin/sh, an older Bash without
        BASH_XTRACEFD, or a more recent Bash already supporting
        BASH_XTRACEFD.

    Furthermore, this also makes sure that the function's error
    messages go to this fd 4, meaning that the user will be able to
    see them even if the function's stderr is redirected in the test.

  - Specify the latter two redirections above in the test helper
    function's definition, so they are performed every time the
    function is invoked, without the need to modify the callsites of
    the function.

Perform these redirections in those test helper functions which can be
expected to have their stderr redirected, i.e. in the functions
'test_must_fail', 'test_might_fail', 'test_expect_code', 'test_env',
'nongit', 'test_terminal' and 'perl'.  Note that 'test_might_fail',
'test_env', and 'nongit' are not involved in any test failures when
running tests with '-x' and /bin/sh.

The other test helper functions are left unchanged, because they
either don't run commands specified as their arguments, or redirecting
their stderr wouldn't make sense, or both.

With this change the number of failures when running the test suite
with '-x' tracing and /bin/sh goes down from 340 failed tests in 43
test scripts to 22 failed tests in 6 scripts (or 23 in 7, if the
system (OSX) uses an older Bash version without BASH_XTRACEFD to run
't9903-bash-prompt.sh').

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous patch resolved most of the test failures caused by
running our test suite with '-x' tracing and /bin/sh, and the
following patches in this series will resolve almost all of the
remaining failures.  Unfortunately, not yet all.

Add means to disable '-x' tracing for individual test scripts by
setting the $test_untraceable variable to a non-empty value in the
test script before sourcing 'test-lib.sh'.  However, since '-x'
tracing is not an issue with recent Bash versions supporting
BASH_XTRACEFD, i.e. v4.1 and later, don't disable tracing when the
test script is run with such a Bash version even when
$test_untraceable is set.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Three tests in 't1507-rev-parse-upstream.sh' fail when the test script
is run with '-x' tracing (and using a shell other than a Bash version
supporting BASH_XTRACEFD).  The reason for those failures is that the
tests check the stderr of the function 'error_message', which includes
the trace of commands executed in that function as well, throwing off
the comparison with the expected output.

Save stderr of 'git rev-parse' only instead of the whole function, so
it remains free from tracing output.

After this change t1507 passes with '-x', even when running with
/bin/sh.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper function test_must_be_empty is meant to make sure the
given file is empty, but its implementation is:

	if test -s "$1"
	then
		... not empty, we detected a failure ...
	fi

Surely, the file having non-zero size is a sign that the condition
"the file must be empty" is violated, but it misses the case where
the file does not even exist.  It is an accident waiting to happen
with a buggy test like this:

	git frotz 2>error-message &&
	test_must_be_empty errro-message

that won't get caught until you deliberately break 'git frotz' and
notice why the test does not fail.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove erroneous space between % and < in '% <(<N>)'.

Signed-off-by: Mårten Kongstad <marten.kongstad@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During abbreviation checks, we navigate to the position within a
pack-index that an OID would be inserted and check surrounding OIDs
for the maximum matching prefix. This position may be beyond the
last position, because the given OID is lexicographically larger
than every OID in the pack. Then nth_packed_object_oid() does not
initialize "oid".

Use the return value of nth_packed_object_oid() to prevent these
errors.

Also the comment about checking near-by objects miscounts the
neighbours.  If we have a hit at "first", we check "first-1" and
"first+1" to make sure we have sufficiently long abbreviation not to
match either.  If we do not have a hit, "first" is the smallest
among the objects that are larger than what we want to name, so we
check that and "first-1" to make sure we have sufficiently long
abbreviation not to match either.  In either case, we only check up
to two near-by objects.

Reported-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After 076aa2c (tempfile: auto-allocate tempfiles on heap, 2017-09-05),
we can have lockfiles on the stack.

One of these functions fails to always roll back the lock. That will be
fixed in the next commit.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we return early, we forget to roll back the lockfile. Do so.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Git::Mail::Address file added in bd869f6 ("send-email: add and
use a local copy of Mail::Address", 2018-01-05) had the executable bit
set. That bit should not be set for *.pm files. It breaks nothing but
it is redundant and confusing as none of the other files have it and
these files are never executed as stand-alone programs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Perl the "use strict/warnings" pragmas are lexical, thus there's no
reason to do:

    package Foo;
    use strict;
    package Bar;
    use strict;
    $x = 5;

To satisfy the desire that the undeclared $x variable will be spotted
at compile-time. It's enough to include the first "use strict".

This functionally changes nothing, but makes a subsequent change where
"use warnings" will be added to Git.pm less confusing and less
verbose, since as with "strict" we'll only need to do that at the top
of the file.

Changes code initially added in a6065b5 ("Git.pm: Try to support
ActiveState output pipe", 2006-06-25).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend Git.pm to load the "warnings" pragma like the rest of the code
in perl/ in addition to the existing "strict" pragma. This is
considered the bare minimum best practice in Perl.

Ever since this code was introduced in b1edc53 ("Introduce
Git.pm (v4)", 2006-06-24) it's only been using "strict", not
"warnings".

This leaves contrib/buildsystems/Generators/{QMake,VCproj}.pm and
contrib/mw-to-git/Git/Mediawiki.pm without "use warnings". Amending
those would be a sensible follow-up change, but I don't have an easy
way to test those so I'm not changing them.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
9ba95ed (perf/run: update get_var_from_env_or_config() for
subsections) stopped setting a default value for GIT_PERF_REPEAT_COUNT
if no perf config file is present, because get_var_from_env_or_config
returns early in that case.

Fix it by setting the default value after calling this function.  Its
fifth parameter is not used for any other variable, so remove the
associated code.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In mark_parents_uninteresting(), we check for the existence of an
object file to see if we should treat a commit as parsed. The result
is to set the "parsed" bit on the commit.

Modify the condition to only check has_object_file() if the result
would change the parsed bit.

When a local branch is different from its upstream ref, "git status"
will compute ahead/behind counts. This uses paint_down_to_common()
and hits mark_parents_uninteresting(). On a copy of the Linux repo
with a local instance of "master" behind the remote branch
"origin/master" by ~60,000 commits, we find the performance of
"git status" went from 1.42 seconds to 1.32 seconds, for a relative
difference of -7.0%.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Certain information is currently shown with --summary, but when used
in combination with --stat it's a bit hard to read since info of the
same file is in two places (--stat and --summary).

On top of that, commits that add or remove files double the number of
display lines, which could be a lot if you add or remove a lot of
files.

--compact-summary embeds most of --summary back in --stat in the
little space between the file name part and the graph line, e.g. with
commit 0433d53:

   Documentation/merge-config.txt         |  4 +
   builtin/merge.c                        |  2 +
   ...-pull-verify-signatures.sh (new +x) | 81 ++++++++++++++
   t/t7612-merge-verify-signatures.sh     | 45 ++++++++
   4 files changed, 132 insertions(+)

It helps both condensing information and saving some text
space. What's new in diffstat is:

- A new 0644 file is shown as (new)
- A new 0755 file is shown as (new +x)
- A new symlink is shown as (new +l)
- A deleted file is shown as (gone)
- A mode change adding executable bit is shown as (mode +x)
- A mode change removing it is shown as (mode -x)

Note that --compact-summary does not contain all the information
--summary provides. Rewrite percentage is not shown but it could be
added later, like R50% or C20%.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 2071e05 ("t5536: new test of refspec conflicts when
fetching", 2013-10-30), introduced the verify_stderr() function
which was used to verify that certain fatal/warning messages were
issued by a given git command. In addition, verify_stderr() would
filter a specific "fatal: The remote end hung up unexpectedly"
message, which may, or may not, be present (depending on the
relative timing of the git-fetch and git-upload-pack processes).

The verify_stderr() function has seen several modifications, which
has introduced a couple of minor problems. For example, commit
1edbaac ("tests: use test_i18n* functions to suppress false
positives", 2016-06-17) introduced an inappropriate test_i18ngrep
call and commit f096e6e ("fetch: improve the error messages
emitted for conflicting refspecs", 2013-10-30) included an
ineffective invocation of sort at the end of a grep pipeline.

Instead of fixing these minor problems in verify_stderr(), we take
the simpler approach of directly searching the error file, using
test_i18ngrep, for the specific message(s) we expect. (The only
minor downside is that we would not notice any new messages).

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we return early, or if `active_cache_changed` is false, we forget to
roll back the lockfile.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function originated in builtin/merge.c. It was moved to merge.c in
commit db699a8 (Move try_merge_command and checkout_fast_forward to
libgit.a, 2012-10-26), but was used from sequencer.c even before that.

If a problem occurs, the function returns without rolling back the
lockfile. Teach it to do so.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If `commit_lock_file()` or `hold_lock_file_for_update()` fail, there is
no need to call `rollback_lock_file()` on the lockfile. It doesn't hurt
either, but it does make different callers in this file inconsistent,
which might be confusing.

While at it, remove a trailing '.' from a recurring error message.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The two test checking 'git mmerge-recursive' in an empty worktree in
't3030-merge-recursive.sh' fail when the test script is run with '-x'
tracing (and using a shell other than a Bash version supporting
BASH_XTRACEFD).  The reason for those failures is that the tests check
the emptiness of a subshell's stderr, which includes the trace of
commands executed in that subshell as well, throwing off the emptiness
check.

Note that both subshells execute four git commands each, meaning that
checking the emptiness of the whole subshell implicitly ensures that
not only 'git merge-recursive' but none of the other three commands
outputs anything to their stderr.  Note also that if one of those
commands were to output anything on its stderr, then the current
combined check would not tell us which one of those four commands the
unexpected output came from.

Save the stderr of those four commands only instead of the whole
subshell, so it remains free from tracing output, and save and check
them individually, so they will show us from which command the
unexpected output came from.

After this change t3030 passes with '-x', even when running with
/bin/sh.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Aug 10, 2021
These leaks all happen at the end of cmd_mv, hence don't matter in any
way. But we still fix the easy ones and squash the rest to get us closer
to being able to run tests without leaks.

LSAN output from t0050:

Direct leak of 384 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa8c015 in xrealloc wrapper.c:126:8
    #2 0xa0a7e1 in add_entry string-list.c:44:2
    #3 0xa0a7e1 in string_list_insert string-list.c:58:14
    #4 0x5dac03 in cmd_mv builtin/mv.c:248:4
    #5 0x4ce83e in run_builtin git.c:475:11
    #6 0x4ccafe in handle_builtin git.c:729:3
    #7 0x4cb01c in run_argv git.c:818:4
    #8 0x4cb01c in cmd_main git.c:949:19
    #9 0x6bd9ad in main common-main.c:52:11
    #10 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0xa8bd09 in do_xmalloc wrapper.c:41:8
    #2 0x5dbc34 in internal_prefix_pathspec builtin/mv.c:32:2
    #3 0x5da575 in cmd_mv builtin/mv.c:158:14
    #4 0x4ce83e in run_builtin git.c:475:11
    #5 0x4ccafe in handle_builtin git.c:729:3
    #6 0x4cb01c in run_argv git.c:818:4
    #7 0x4cb01c in cmd_main git.c:949:19
    #8 0x6bd9ad in main common-main.c:52:11
    #9 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x49a82d in malloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0xa8bd09 in do_xmalloc wrapper.c:41:8
    #2 0x5dbc34 in internal_prefix_pathspec builtin/mv.c:32:2
    #3 0x5da4e4 in cmd_mv builtin/mv.c:148:11
    #4 0x4ce83e in run_builtin git.c:475:11
    #5 0x4ccafe in handle_builtin git.c:729:3
    #6 0x4cb01c in run_argv git.c:818:4
    #7 0x4cb01c in cmd_main git.c:949:19
    #8 0x6bd9ad in main common-main.c:52:11
    #9 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x49a9a2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0xa8c119 in xcalloc wrapper.c:140:8
    #2 0x5da585 in cmd_mv builtin/mv.c:159:22
    #3 0x4ce83e in run_builtin git.c:475:11
    #4 0x4ccafe in handle_builtin git.c:729:3
    #5 0x4cb01c in run_argv git.c:818:4
    #6 0x4cb01c in cmd_main git.c:949:19
    #7 0x6bd9ad in main common-main.c:52:11
    #8 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x49a9a2 in calloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3
    #1 0xa8c119 in xcalloc wrapper.c:140:8
    #2 0x5da4f8 in cmd_mv builtin/mv.c:149:10
    #3 0x4ce83e in run_builtin git.c:475:11
    #4 0x4ccafe in handle_builtin git.c:729:3
    #5 0x4cb01c in run_argv git.c:818:4
    #6 0x4cb01c in cmd_main git.c:949:19
    #7 0x6bd9ad in main common-main.c:52:11
    #8 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 65 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa8c015 in xrealloc wrapper.c:126:8
    #2 0xa00226 in strbuf_grow strbuf.c:98:2
    #3 0xa00226 in strbuf_vaddf strbuf.c:394:3
    #4 0xa065c7 in xstrvfmt strbuf.c:981:2
    #5 0xa065c7 in xstrfmt strbuf.c:991:8
    #6 0x9e7ce7 in prefix_path_gently setup.c:115:15
    #7 0x9e7fa6 in prefix_path setup.c:128:12
    #8 0x5dbdbf in internal_prefix_pathspec builtin/mv.c:55:23
    #9 0x5da575 in cmd_mv builtin/mv.c:158:14
    #10 0x4ce83e in run_builtin git.c:475:11
    #11 0x4ccafe in handle_builtin git.c:729:3
    #12 0x4cb01c in run_argv git.c:818:4
    #13 0x4cb01c in cmd_main git.c:949:19
    #14 0x6bd9ad in main common-main.c:52:11
    #15 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 65 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa8c015 in xrealloc wrapper.c:126:8
    #2 0xa00226 in strbuf_grow strbuf.c:98:2
    #3 0xa00226 in strbuf_vaddf strbuf.c:394:3
    #4 0xa065c7 in xstrvfmt strbuf.c:981:2
    #5 0xa065c7 in xstrfmt strbuf.c:991:8
    #6 0x9e7ce7 in prefix_path_gently setup.c:115:15
    #7 0x9e7fa6 in prefix_path setup.c:128:12
    #8 0x5dbdbf in internal_prefix_pathspec builtin/mv.c:55:23
    #9 0x5da4e4 in cmd_mv builtin/mv.c:148:11
    #10 0x4ce83e in run_builtin git.c:475:11
    #11 0x4ccafe in handle_builtin git.c:729:3
    #12 0x4cb01c in run_argv git.c:818:4
    #13 0x4cb01c in cmd_main git.c:949:19
    #14 0x6bd9ad in main common-main.c:52:11
    #15 0x7fbfeffc4349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 558 byte(s) leaked in 7 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Aug 10, 2021
merge_name() calls dwim_ref(), which allocates a new string into
found_ref. Therefore add a free() to avoid leaking found_ref.

LSAN output from t0021:

Direct leak of 16 byte(s) in 1 object(s) allocated from:
    #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0xa8beb8 in xstrdup wrapper.c:29:14
    #2 0x954054 in expand_ref refs.c:671:12
    #3 0x953cb6 in repo_dwim_ref refs.c:644:22
    #4 0x5d3759 in dwim_ref refs.h:162:9
    #5 0x5d3759 in merge_name builtin/merge.c:517:6
    #6 0x5d3759 in collect_parents builtin/merge.c:1214:5
    #7 0x5cf60d in cmd_merge builtin/merge.c:1458:16
    #8 0x4ce83e in run_builtin git.c:475:11
    #9 0x4ccafe in handle_builtin git.c:729:3
    #10 0x4cb01c in run_argv git.c:818:4
    #11 0x4cb01c in cmd_main git.c:949:19
    #12 0x6bdbfd in main common-main.c:52:11
    #13 0x7f0430502349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 16 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Aug 10, 2021
- cmd_rebase populates rebase_options.strategy with newly allocated
  strings, hence we need to free those strings at the end of cmd_rebase
  to avoid a leak.
- In some cases: get_replay_opts() is called, which prepares replay_opts
  using data from rebase_options. We used to simply copy the pointer
  from rebase_options.strategy,  however that would now result in a
  double-free because sequencer_remove_state() is eventually used to
  free replay_opts.strategy. To avoid this we xstrdup() strategy when
  adding it to replay_opts.

The original leak happens because we always populate
rebase_options.strategy, but we don't always enter the path that calls
get_replay_opts() and later sequencer_remove_state() - in  other words
we'd always allocate a new string into rebase_options.strategy but
only sometimes did we free it. We now make sure that rebase_options
and replay_opts both own their own copies of strategy, and each copy
is free'd independently.

This was first seen when running t0021 with LSAN, but t2012 helped catch
the fact that we can't just free(options.strategy) at the end of
cmd_rebase (as that can cause a double-free). LSAN output from t0021:

LSAN output from t0021:

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x486804 in strdup ../projects/compiler-rt/lib/asan/asan_interceptors.cpp:452:3
    #1 0xa71eb8 in xstrdup wrapper.c:29:14
    #2 0x61b1cc in cmd_rebase builtin/rebase.c:1779:22
    #3 0x4ce83e in run_builtin git.c:475:11
    #4 0x4ccafe in handle_builtin git.c:729:3
    #5 0x4cb01c in run_argv git.c:818:4
    #6 0x4cb01c in cmd_main git.c:949:19
    #7 0x6b3fad in main common-main.c:52:11
    #8 0x7f267b512349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Aug 10, 2021
setup_unpack_trees_porcelain() populates various fields on
unpack_tree_opts, we need to call clear_unpack_trees_porcelain() to
avoid leaking them. Specifically, we used to leak
unpack_tree_opts.msgs_to_free.

We have to do this in leave_reset_head because there are multiple
scenarios where unpack_tree_opts has already been configured, followed
by a 'goto leave_reset_head'. But we can also 'goto leave_reset_head'
prior to having initialised unpack_tree_opts via memset(..., 0, ...).
Therefore we also move unpack_tree_opts initialisation to the start of
reset_head(), and convert it to use brace initialisation - which
guarantees that we can never clear an uninitialised unpack_tree_opts.
clear_unpack_tree_opts() is always safe to call as long as
unpack_tree_opts is at least zero-initialised, i.e. it does not depend
on a previous call to setup_unpack_trees_porcelain().

LSAN output from t0021:

Direct leak of 192 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa721e5 in xrealloc wrapper.c:126:8
    #2 0x9f7861 in strvec_push_nodup strvec.c:19:2
    #3 0x9f7861 in strvec_pushf strvec.c:39:2
    #4 0xa43e14 in setup_unpack_trees_porcelain unpack-trees.c:129:3
    #5 0x97e011 in reset_head reset.c:53:2
    #6 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
    #7 0x4ce83e in run_builtin git.c:475:11
    #8 0x4ccafe in handle_builtin git.c:729:3
    #9 0x4cb01c in run_argv git.c:818:4
    #10 0x4cb01c in cmd_main git.c:949:19
    #11 0x6b3f3d in main common-main.c:52:11
    #12 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 147 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa721e5 in xrealloc wrapper.c:126:8
    #2 0x9e8d54 in strbuf_grow strbuf.c:98:2
    #3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
    #4 0x9f7774 in strvec_pushf strvec.c:36:2
    #5 0xa43e14 in setup_unpack_trees_porcelain unpack-trees.c:129:3
    #6 0x97e011 in reset_head reset.c:53:2
    #7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
    #8 0x4ce83e in run_builtin git.c:475:11
    #9 0x4ccafe in handle_builtin git.c:729:3
    #10 0x4cb01c in run_argv git.c:818:4
    #11 0x4cb01c in cmd_main git.c:949:19
    #12 0x6b3f3d in main common-main.c:52:11
    #13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 134 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa721e5 in xrealloc wrapper.c:126:8
    #2 0x9e8d54 in strbuf_grow strbuf.c:98:2
    #3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
    #4 0x9f7774 in strvec_pushf strvec.c:36:2
    #5 0xa43fe4 in setup_unpack_trees_porcelain unpack-trees.c:168:3
    #6 0x97e011 in reset_head reset.c:53:2
    #7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
    #8 0x4ce83e in run_builtin git.c:475:11
    #9 0x4ccafe in handle_builtin git.c:729:3
    #10 0x4cb01c in run_argv git.c:818:4
    #11 0x4cb01c in cmd_main git.c:949:19
    #12 0x6b3f3d in main common-main.c:52:11
    #13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)

Indirect leak of 130 byte(s) in 1 object(s) allocated from:
    #0 0x49ab49 in realloc ../projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3
    #1 0xa721e5 in xrealloc wrapper.c:126:8
    #2 0x9e8d54 in strbuf_grow strbuf.c:98:2
    #3 0x9e8d54 in strbuf_vaddf strbuf.c:401:3
    #4 0x9f7774 in strvec_pushf strvec.c:36:2
    #5 0xa43f20 in setup_unpack_trees_porcelain unpack-trees.c:150:3
    #6 0x97e011 in reset_head reset.c:53:2
    #7 0x61dfa5 in cmd_rebase builtin/rebase.c:1991:9
    #8 0x4ce83e in run_builtin git.c:475:11
    #9 0x4ccafe in handle_builtin git.c:729:3
    #10 0x4cb01c in run_argv git.c:818:4
    #11 0x4cb01c in cmd_main git.c:949:19
    #12 0x6b3f3d in main common-main.c:52:11
    #13 0x7fa8addf3349 in __libc_start_main (/lib64/libc.so.6+0x24349)

SUMMARY: AddressSanitizer: 603 byte(s) leaked in 4 allocation(s).

Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Oct 7, 2021
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified.  Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derick Stolee for their help analyzing the problem.

==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
    #0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
    #1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)

0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
    #0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
    #1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
    #2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
    #6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
    #7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
    #8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

previously allocated by thread T0 here:
    #0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
    #2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
    #3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
    #4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
    #7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
    #8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee pushed a commit that referenced this pull request Oct 7, 2021
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified.  Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.

==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
    #0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
    #1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)

0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
    #0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
    #1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
    #2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
    #6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
    #7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
    #8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

previously allocated by thread T0 here:
    #0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
    #2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
    #3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
    #4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
    #7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
    #8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee pushed a commit that referenced this pull request Oct 26, 2021
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified.  Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.

==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
    #0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
    #1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)

0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
    #0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
    #1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
    #2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
    #5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
    #6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
    #7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
    #8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
    #11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
    #12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
    #13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
    #14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

previously allocated by thread T0 here:
    #0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
    #2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
    #3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
    #4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
    #6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
    #7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
    #8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
    #9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
    #10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
    #11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
    #12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
    #13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
    #14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee added a commit that referenced this pull request Nov 2, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Nov 8, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Nov 10, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee added a commit that referenced this pull request Nov 16, 2021
When I was playing around with trace2 data and creating flamegraphs, I tried a `git fetch` call to see how the `git-remote-https` command would show up. What I didn't expect was an `ensure_full_index()` region!

It turns out that `git fetch` and `git pull` need to check the index for a `.gitmodules` file to see if it should recurse into any submodules. Here is the stack trace from a debugger:

```
#0  ensure_full_index (istate=0x555555ac1c80 <the_index>) at sparse-index.c:404
#1  0x000055555571a979 in do_read_index (istate=istate@entry=0x555555ac1c80 <the_index>, path=path@entry=0x555555ad7b90 ".git/index", must_exist=must_exist@entry=0) at read-cache.c:2386
#2  0x000055555571eb7d in do_read_index (must_exist=0, path=0x555555ad7b90 ".git/index", istate=0x555555ac1c80 <the_index>) at hash.h:244
#3  read_index_from (istate=0x555555ac1c80 <the_index>, path=0x555555ad7b90 ".git/index", gitdir=0x555555ad7b30 ".git") at read-cache.c:2426
#4  0x000055555573f4c2 in repo_read_index (repo=repo@entry=0x555555ac1da0 <the_repo>) at repository.c:286
#5  0x00005555556f14d0 in get_oid_with_context_1 (repo=repo@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", flags=flags@entry=0, prefix=prefix@entry=0x0, 
    oid=oid@entry=0x7fffffffdb00, oc=oc@entry=0x7fffffffda70) at object-name.c:1850
#6  0x00005555556f1f53 in get_oid_with_context (oc=0x7fffffffda70, oid=0x7fffffffdb00, flags=0, str=0x55555582c022 ":.gitmodules", repo=0x555555ac1da0 <the_repo>) at object-name.c:1947
#7  repo_get_oid (r=r@entry=0x555555ac1da0 <the_repo>, name=name@entry=0x55555582c022 ":.gitmodules", oid=oid@entry=0x7fffffffdb00) at object-name.c:1603
#8  0x000055555577330f in config_from_gitmodules (fn=fn@entry=0x555555773460 <gitmodules_fetch_config>, repo=0x555555ac1da0 <the_repo>, data=data@entry=0x7fffffffdb60) at submodule-config.c:650
#9  0x000055555577462d in config_from_gitmodules (data=0x7fffffffdb60, repo=<optimized out>, fn=0x555555773460 <gitmodules_fetch_config>) at submodule-config.c:638
#10 fetch_config_from_gitmodules (max_children=<optimized out>, recurse_submodules=<optimized out>) at submodule-config.c:800
#11 0x00005555555b9e41 in cmd_fetch (argc=1, argv=0x7fffffffe090, prefix=0x0) at builtin/fetch.c:1999
#12 0x0000555555573ff6 in run_builtin (argv=<optimized out>, argc=<optimized out>, p=<optimized out>) at git.c:528
#13 handle_builtin (argc=<optimized out>, argv=<optimized out>) at git.c:785
#14 0x000055555557528c in run_argv (argv=0x7fffffffddf0, argcp=0x7fffffffddfc) at git.c:857
#15 cmd_main (argc=<optimized out>, argv=<optimized out>) at git.c:993
#16 0x0000555555573ac8 in main (argc=3, argv=0x7fffffffe088) at common-main.c:52
```

The operations these commands use are guarded by items such as `index_name_pos()` and others. Since the `.gitmodules` file is always at root, we would not need to expand, anyway.
derrickstolee pushed a commit that referenced this pull request Feb 23, 2022
When trying to write a MIDX, we already prevent the case where there
weren't any packs present, and thus we would have written an empty MIDX.

But there is another "empty" case, which is more interesting, and we
don't yet handle. If we try to write a MIDX which has at least one pack,
but those packs together don't contain any objects, we will encounter a
BUG() when trying to use the bitmap corresponding to that MIDX, like so:

    $ git rev-parse HEAD | git pack-objects --revs --use-bitmap-index --stdout >/dev/null
    BUG: pack-revindex.c:394: pack_pos_to_midx: out-of-bounds object at 0

(note that in the above reproduction, both `--use-bitmap-index` and
`--stdout` are important, since without the former we won't even both to
load the .bitmap, and without the latter we wont attempt pack reuse).

The problem occurs when we try to discover the identity of the
preferred pack to determine which range if any of existing packs we can
reuse verbatim. This path is: `reuse_packfile_objects()` ->
`reuse_partial_packfile_from_bitmap()` -> `midx_preferred_pack()`.

    #4  0x000055555575401f in pack_pos_to_midx (m=0x555555997160, pos=0) at pack-revindex.c:394
    #5  0x00005555557502c8 in midx_preferred_pack (bitmap_git=0x55555599c280) at pack-bitmap.c:1431
    #6  0x000055555575036c in reuse_partial_packfile_from_bitmap (bitmap_git=0x55555599c280, packfile_out=0x5555559666b0 <reuse_packfile>,
        entries=0x5555559666b8 <reuse_packfile_objects>, reuse_out=0x5555559666c0 <reuse_packfile_bitmap>) at pack-bitmap.c:1452
    #7  0x00005555556041f6 in get_object_list_from_bitmap (revs=0x7fffffffcbf0) at builtin/pack-objects.c:3658
    #8  0x000055555560465c in get_object_list (ac=2, av=0x555555997050) at builtin/pack-objects.c:3765
    #9  0x0000555555605e4e in cmd_pack_objects (argc=0, argv=0x7fffffffe920, prefix=0x0) at builtin/pack-objects.c:4154

Since neither the .bitmap or MIDX stores the identity of the
preferred pack, we infer it by trying to load the first object in
pseudo-pack order, and then asking the MIDX which pack was chosen to
represent that object.

But this fails our bounds check, since there are zero objects in the
MIDX to begin with, which results in the BUG().

We could catch this more carefully in `midx_preferred_pack()`, but
signaling the absence of a preferred pack out to all of its callers is
somewhat awkward.

Instead, let's avoid writing a MIDX .bitmap without any objects
altogether. We catch this case in `write_midx_internal()`, and emit a
warning if the caller indicated they wanted to write a bitmap before
clearing out the relevant flags. If we somehow got to
write_midx_bitmap(), then we will call BUG(), but this should now be an
unreachable path.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Mar 15, 2022
Add "fast_unwind_on_malloc=0" to LSAN_OPTIONS to get more meaningful
stack traces from LSAN. This isn't required under ASAN which will emit
traces such as this one for a leak in "t/t0006-date.sh":

    $ ASAN_OPTIONS=detect_leaks=1 ./t0006-date.sh -vixd
    [...]
    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x488b94 in strdup (t/helper/test-tool+0x488b94)
        #1 0x9444a4 in xstrdup wrapper.c:29:14
        #2 0x5995fa in parse_date_format date.c:991:24
        #3 0x4d2056 in show_dates t/helper/test-date.c:39:2
        #4 0x4d174a in cmd__date t/helper/test-date.c:116:3
        #5 0x4cce89 in cmd_main t/helper/test-tool.c:127:11
        #6 0x4cd1e3 in main common-main.c:52:11
        #7 0x7fef3c695e49 in __libc_start_main csu/../csu/libc-start.c:314:16
        #8 0x422b09 in _start (t/helper/test-tool+0x422b09)

    SUMMARY: AddressSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

Whereas LSAN would emit this instead:

    $ ./t0006-date.sh -vixd
    [...]
    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x4323b8 in malloc (t/helper/test-tool+0x4323b8)
        #1 0x7f2be1d614aa in strdup string/strdup.c:42:15

    SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

Now we'll instead git this sensible stack trace under
LSAN. I.e. almost the same one (but starting with "malloc", as is
usual for LSAN) as under ASAN:

    Direct leak of 3 byte(s) in 1 object(s) allocated from:
        #0 0x4323b8 in malloc (t/helper/test-tool+0x4323b8)
        #1 0x7f012af5c4aa in strdup string/strdup.c:42:15
        #2 0x5cb164 in xstrdup wrapper.c:29:14
        #3 0x495ee9 in parse_date_format date.c:991:24
        #4 0x453aac in show_dates t/helper/test-date.c:39:2
        #5 0x453782 in cmd__date t/helper/test-date.c:116:3
        #6 0x451d95 in cmd_main t/helper/test-tool.c:127:11
        #7 0x451f1e in main common-main.c:52:11
        #8 0x7f012aef5e49 in __libc_start_main csu/../csu/libc-start.c:314:16
        #9 0x42e0a9 in _start (t/helper/test-tool+0x42e0a9)

    SUMMARY: LeakSanitizer: 3 byte(s) leaked in 1 allocation(s).
    Aborted

As the option name suggests this does make things slower, e.g. for
t0001-init.sh we're around 10% slower:

    $ hyperfine -L v 0,1 'LSAN_OPTIONS=fast_unwind_on_malloc={v} make T=t0001-init.sh' -r 3
    Benchmark 1: LSAN_OPTIONS=fast_unwind_on_malloc=0 make T=t0001-init.sh
      Time (mean ± σ):      2.135 s ±  0.015 s    [User: 1.951 s, System: 0.554 s]
      Range (min … max):    2.122 s …  2.152 s    3 runs

    Benchmark 2: LSAN_OPTIONS=fast_unwind_on_malloc=1 make T=t0001-init.sh
      Time (mean ± σ):      1.981 s ±  0.055 s    [User: 1.769 s, System: 0.488 s]
      Range (min … max):    1.941 s …  2.044 s    3 runs

    Summary
      'LSAN_OPTIONS=fast_unwind_on_malloc=1 make T=t0001-init.sh' ran
        1.08 ± 0.03 times faster than 'LSAN_OPTIONS=fast_unwind_on_malloc=0 make T=t0001-init.sh'

I think that's more than worth it to get the more meaningful stack
traces, we can always provide LSAN_OPTIONS=fast_unwind_on_malloc=0 for
one-off "fast" runs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Mar 24, 2022
Add a failing test which demonstrates a regression in
a18d66c ("diff.c: free "buf" in diff_words_flush()", 2022-03-04),
the regression is discussed in detail in the subsequent commit. With
it running `git show --word-diff --color-moved` with SANITIZE=address
would emit:

	==31191==ERROR: AddressSanitizer: attempting double-free on 0x617000021100 in thread T0:
	    #0 0x49f0a2 in free (git+0x49f0a2)
	    #1 0x9b0e4d in diff_words_flush diff.c:2153:3
	    #2 0x9aed5d in fn_out_consume diff.c:2354:3
	    #3 0xe092ab in consume_one xdiff-interface.c:43:9
	    #4 0xe072eb in xdiff_outf xdiff-interface.c:76:10
	    #5 0xec7014 in xdl_emit_diffrec xdiff/xutils.c:53:6
	    [...]

	0x617000021100 is located 0 bytes inside of 768-byte region [0x617000021100,0x617000021400)
	freed by thread T0 here:
	    #0 0x49f0a2 in free (git+0x49f0a2)
	    [...(same stacktrace)...]

	previously allocated by thread T0 here:
	    #0 0x49f603 in __interceptor_realloc (git+0x49f603)
	    #1 0xde4da4 in xrealloc wrapper.c:126:8
	    #2 0x995dc5 in append_emitted_diff_symbol diff.c:794:2
	    #3 0x96c44a in emit_diff_symbol diff.c:1527:3
	    [...]

This was not caught by the test suite because we test `diff
--word-diff --color-moved` only so far.

Therefore, add a test for `show`, too.

Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Apr 7, 2022
Stop moving the .git/hooks directory out of the way, or creating it
during test setup. Instead assume that it will contain
harmless *.sample files.

That we can assume that is discussed in point #4 of
f0d4d39 (test-lib: split up and deprecate test_create_repo(),
2021-05-10), those parts of this could and should have been done in
that change.

Removing the "mkdir -p" here will then validate that our templates are
being used, since we'd subsequently fail to create a hook in that
directory if it didn't exist. Subsequent commits will have those hooks
created by a "test_hook" wrapper, which will then being doing that
same validation.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Apr 7, 2022
In the preceding [1] (pack-objects: move revs out of
get_object_list(), 2022-03-22) the "repo_init_revisions()" was moved
to cmd_pack_objects() so that it unconditionally took place for all
invocations of "git pack-objects".

We'd thus start leaking memory, which is easily reproduced in
e.g. git.git by feeding e83c516 (Initial revision of "git", the
information manager from hell, 2005-04-07) to "git pack-objects";

    $ echo e83c516 | ./git pack-objects initial
    [...]
	==19130==ERROR: LeakSanitizer: detected memory leaks

	Direct leak of 7120 byte(s) in 1 object(s) allocated from:
	    #0 0x455308 in __interceptor_malloc (/home/avar/g/git/git+0x455308)
	    #1 0x75b399 in do_xmalloc /home/avar/g/git/wrapper.c:41:8
	    #2 0x75b356 in xmalloc /home/avar/g/git/wrapper.c:62:9
	    #3 0x5d7609 in prep_parse_options /home/avar/g/git/diff.c:5647:2
	    #4 0x5d415a in repo_diff_setup /home/avar/g/git/diff.c:4621:2
	    #5 0x6dffbb in repo_init_revisions /home/avar/g/git/revision.c:1853:2
	    #6 0x4f599d in cmd_pack_objects /home/avar/g/git/builtin/pack-objects.c:3980:2
	    #7 0x4592ca in run_builtin /home/avar/g/git/git.c:465:11
	    #8 0x457d81 in handle_builtin /home/avar/g/git/git.c:718:3
	    #9 0x458ca5 in run_argv /home/avar/g/git/git.c:785:4
	    #10 0x457b40 in cmd_main /home/avar/g/git/git.c:916:19
	    #11 0x562259 in main /home/avar/g/git/common-main.c:56:11
	    #12 0x7fce792ac7ec in __libc_start_main csu/../csu/libc-start.c:332:16
	    #13 0x4300f9 in _start (/home/avar/g/git/git+0x4300f9)

	SUMMARY: LeakSanitizer: 7120 byte(s) leaked in 1 allocation(s).
	Aborted

Narrowly fixing that commit would have been easy, just add call
repo_init_revisions() right before get_object_list(), which is
effectively what was done before that commit.

But an unstated constraint when setting it up early is that it was
needed for the subsequent [2] (pack-objects: parse --filter directly
into revs.filter, 2022-03-22), i.e. we might have a --filter
command-line option, and need to either have the "struct rev_info"
setup when we encounter that option, or later.

Let's just change the control flow so that we'll instead set up the
"struct rev_info" only when we need it. Doing so leads to a bit more
verbosity, but it's a lot clearer what we're doing and why.

An earlier version of this commit[3] went behind
opt_parse_list_objects_filter()'s back by faking up a "struct option"
before calling it. Let's avoid that and instead create a blessed API
for this pattern.

We could furthermore combine the two get_object_list() invocations
here by having repo_init_revisions() invoked on &pfd.revs, but I think
clearly separating the two makes the flow clearer. Likewise
redundantly but explicitly (i.e. redundant v.s. a "{ 0 }") "0" to
"have_revs" early in cmd_pack_objects().

While we're at it add parentheses around the arguments to the OPT_*
macros in in list-objects-filter-options.h, as we need to change those
lines anyway. It doesn't matter in this case, but is good general
practice.

1. https://lore.kernel.org/git/619b757d98465dbc4995bdc11a5282fbfcbd3daa.1647970119.git.gitgitgadget@gmail.com
2. https://lore.kernel.org/git/97de926904988b89b5663bd4c59c011a1723a8f5.1647970119.git.gitgitgadget@gmail.com
3. https://lore.kernel.org/git/patch-1.1-193534b0f07-20220325T121715Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Sep 19, 2022
Since commit fcc07e9 (is_promisor_object(): free tree buffer after
parsing, 2021-04-13), we'll always free the buffers attached to a
"struct tree" after searching them for promisor links. But there's an
important case where we don't want to do so: if somebody else is already
using the tree!

This can happen during a "rev-list --missing=allow-promisor" traversal
in a partial clone that is missing one or more trees or blobs. The
backtrace for the free looks like this:

      #1 free_tree_buffer tree.c:147
      #2 add_promisor_object packfile.c:2250
      #3 for_each_object_in_pack packfile.c:2190
      #4 for_each_packed_object packfile.c:2215
      #5 is_promisor_object packfile.c:2272
      #6 finish_object__ma builtin/rev-list.c:245
      #7 finish_object builtin/rev-list.c:261
      #8 show_object builtin/rev-list.c:274
      #9 process_blob list-objects.c:63
      #10 process_tree_contents list-objects.c:145
      #11 process_tree list-objects.c:201
      #12 traverse_trees_and_blobs list-objects.c:344
      [...]

We're in the middle of walking through the entries of a tree object via
process_tree_contents(). We see a blob (or it could even be another tree
entry) that we don't have, so we call is_promisor_object() to check it.
That function loops over all of the objects in the promisor packfile,
including the tree we're currently walking. When we're done with it
there, we free the tree buffer. But as we return to the walk in
process_tree_contents(), it's still holding on to a pointer to that
buffer, via its tree_desc iterator, and it accesses the freed memory.

Even a trivial use of "--missing=allow-promisor" triggers this problem,
as the included test demonstrates (it's just a vanilla --blob:none
clone).

We can detect this case by only freeing the tree buffer if it was
allocated on our behalf. This is a little tricky since that happens
inside parse_object(), and it doesn't tell us whether the object was
already parsed, or whether it allocated the buffer itself. But by
checking for an already-parsed tree beforehand, we can distinguish the
two cases.

That feels a little hacky, and does incur an extra lookup in the
object-hash table. But that cost is fairly minimal compared to actually
loading objects (and since we're iterating the whole pack here, we're
likely to be loading most objects, rather than reusing cached results).

It may also be a good direction for this function in general, as there
are other possible optimizations that rely on doing some analysis before
parsing:

  - we could detect blobs and avoid reading their contents; they can't
    link to other objects, but parse_object() doesn't know that we don't
    care about checking their hashes.

  - we could avoid allocating object structs entirely for most objects
    (since we really only need them in the oidset), which would save
    some memory.

  - promisor commits could use the commit-graph rather than loading the
    object from disk

This commit doesn't do any of those optimizations, but I think it argues
that this direction is reasonable, rather than relying on parse_object()
and trying to teach it to give us more information about whether it
parsed.

The included test fails reliably under SANITIZE=address just when
running "rev-list --missing=allow-promisor". Checking the output isn't
strictly necessary to detect the bug, but it seems like a reasonable
addition given the general lack of coverage for "allow-promisor" in the
test suite.

Reported-by: Andrew Olsen <andrew.olsen@koordinates.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Sep 19, 2022
Fix a memory leak occuring in case of pathspec copy in preload_index.

Direct leak of 8 byte(s) in 8 object(s) allocated from:
    #0 0x7f0a353ead47 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/libasan.so.6+0xb5d47)
    #1 0x55750995e840 in do_xmalloc /home/anthony/src/c/git/wrapper.c:51
    #2 0x55750995e840 in xmalloc /home/anthony/src/c/git/wrapper.c:72
    #3 0x55750970f824 in copy_pathspec /home/anthony/src/c/git/pathspec.c:684
    #4 0x557509717278 in preload_index /home/anthony/src/c/git/preload-index.c:135
    #5 0x55750975f21e in refresh_index /home/anthony/src/c/git/read-cache.c:1633
    #6 0x55750915b926 in cmd_status builtin/commit.c:1547
    #7 0x5575090e1680 in run_builtin /home/anthony/src/c/git/git.c:466
    #8 0x5575090e1680 in handle_builtin /home/anthony/src/c/git/git.c:720
    #9 0x5575090e284a in run_argv /home/anthony/src/c/git/git.c:787
    #10 0x5575090e284a in cmd_main /home/anthony/src/c/git/git.c:920
    #11 0x5575090dbf82 in main /home/anthony/src/c/git/common-main.c:56
    #12 0x7f0a348230ab  (/lib64/libc.so.6+0x290ab)

Signed-off-by: Anthony Delannoy <anthony.2lannoy@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Oct 24, 2022
In the tmp-objdir api, tmp_objdir_create will create a temporary
directory but also register signal handlers responsible for removing
the directory's contents and the directory itself. However, the
function responsible for recursively removing the contents and
directory, remove_dir_recurse() calls opendir(3) and closedir(3).
This can be problematic because these functions allocate and free
memory, which are not async-signal-safe functions. This can lead to
deadlocks.

One place we call tmp_objdir_create() is in git-receive-pack, where
we create a temporary quarantine directory "incoming". Incoming
objects will be written to this directory before they get moved to
the object directory.

We have observed this code leading to a deadlock:

	Thread 1 (Thread 0x7f621ba0b200 (LWP 326305)):
	#0  __lll_lock_wait_private (futex=futex@entry=0x7f621bbf8b80
		<main_arena>) at ./lowlevellock.c:35
	#1  0x00007f621baa635b in __GI___libc_malloc
		(bytes=bytes@entry=32816) at malloc.c:3064
	#2  0x00007f621bae9f49 in __alloc_dir (statp=0x7fff2ea7ed60,
		flags=0, close_fd=true, fd=5)
		at ../sysdeps/posix/opendir.c:118
	#3  opendir_tail (fd=5) at ../sysdeps/posix/opendir.c:69
	#4  __opendir (name=<optimized out>)
		at ../sysdeps/posix/opendir.c:92
	#5  0x0000557c19c77de1 in remove_dir_recurse ()
	git#6  0x0000557c19d81a4f in remove_tmp_objdir_on_signal ()
	#7  <signal handler called>
	git#8  _int_malloc (av=av@entry=0x7f621bbf8b80 <main_arena>,
		bytes=bytes@entry=7160) at malloc.c:4116
	git#9  0x00007f621baa62c9 in __GI___libc_malloc (bytes=7160)
		at malloc.c:3066
	git#10 0x00007f621bd1e987 in inflateInit2_ ()
		from /opt/gitlab/embedded/lib/libz.so.1
	git#11 0x0000557c19dbe5f4 in git_inflate_init ()
	git#12 0x0000557c19cee02a in unpack_compressed_entry ()
	git#13 0x0000557c19cf08cb in unpack_entry ()
	git#14 0x0000557c19cf0f32 in packed_object_info ()
	git#15 0x0000557c19cd68cd in do_oid_object_info_extended ()
	git#16 0x0000557c19cd6e2b in read_object_file_extended ()
	git#17 0x0000557c19cdec2f in parse_object ()
	git#18 0x0000557c19c34977 in lookup_commit_reference_gently ()
	git#19 0x0000557c19d69309 in mark_uninteresting ()
	git#20 0x0000557c19d2d180 in do_for_each_repo_ref_iterator ()
	git#21 0x0000557c19d21678 in for_each_ref ()
	git#22 0x0000557c19d6a94f in assign_shallow_commits_to_refs ()
	git#23 0x0000557c19bc02b2 in cmd_receive_pack ()
	git#24 0x0000557c19b29fdd in handle_builtin ()
	git#25 0x0000557c19b2a526 in cmd_main ()
	git#26 0x0000557c19b28ea2 in main ()

Since we can't do the cleanup in a portable and signal-safe way, skip
the cleanup when we're handling a signal.

This means that when signal handling, the temporary directory may not
get cleaned up properly. This is mitigated by b3cecf4 (tmp-objdir: new
API for creating temporary writable databases, 2021-12-06) which changed
the default name and allows gc to clean up these temporary directories.

In the event of a normal exit, we should still be cleaning up via the
atexit() handler.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 17, 2023
There is an out-of-bounds read possible when parsing gitattributes that
have an attribute that is 2^31+1 bytes long. This is caused due to an
integer overflow when we assign the result of strlen(3P) to an `int`,
where we use the wrapped-around value in a subsequent call to
memcpy(3P). The following code reproduces the issue:

    blob=$(perl -e 'print "a" x 2147483649 . " attr"' | git hash-object -w --stdin)
    git update-index --add --cacheinfo 100644,$blob,.gitattributes
    git check-attr --all file

    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==8451==ERROR: AddressSanitizer: SEGV on unknown address 0x7f93efa00800 (pc 0x7f94f1f8f082 bp 0x7ffddb59b3a0 sp 0x7ffddb59ab28 T0)
    ==8451==The signal is caused by a READ memory access.
        #0 0x7f94f1f8f082  (/usr/lib/libc.so.6+0x176082)
        #1 0x7f94f2047d9c in __interceptor_strspn /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:752
        #2 0x560e190f7f26 in parse_attr_line attr.c:375
        #3 0x560e190f9663 in handle_attr_line attr.c:660
        #4 0x560e190f9ddd in read_attr_from_index attr.c:769
        #5 0x560e190f9f14 in read_attr attr.c:797
        #6 0x560e190fa24e in bootstrap_attr_stack attr.c:867
        #7 0x560e190fa4a5 in prepare_attr_stack attr.c:902
        #8 0x560e190fb5dc in collect_some_attrs attr.c:1097
        #9 0x560e190fb93f in git_all_attrs attr.c:1128
        #10 0x560e18e6136e in check_attr builtin/check-attr.c:67
        #11 0x560e18e61c12 in cmd_check_attr builtin/check-attr.c:183
        #12 0x560e18e15993 in run_builtin git.c:466
        #13 0x560e18e16397 in handle_builtin git.c:721
        #14 0x560e18e16b2b in run_argv git.c:788
        #15 0x560e18e17991 in cmd_main git.c:926
        #16 0x560e190ae2bd in main common-main.c:57
        #17 0x7f94f1e3c28f  (/usr/lib/libc.so.6+0x2328f)
        #18 0x7f94f1e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #19 0x560e18e110e4 in _start ../sysdeps/x86_64/start.S:115

    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (/usr/lib/libc.so.6+0x176082)
    ==8451==ABORTING

Fix this bug by converting the variable to a `size_t` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 17, 2023
It is possible to trigger an integer overflow when parsing attribute
names when there are more than 2^31 of them for a single pattern. This
can either lead to us dying due to trying to request too many bytes:

     blob=$(perl -e 'print "f" . " a=" x 2147483649' | git hash-object -w --stdin)
     git update-index --add --cacheinfo 100644,$blob,.gitattributes
     git attr-check --all file

    =================================================================
    ==1022==ERROR: AddressSanitizer: requested allocation size 0xfffffff800000032 (0xfffffff800001038 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x10000000000 (thread T0)
        #0 0x7fd3efabf411 in __interceptor_calloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
        #1 0x5563a0a1e3d3 in xcalloc wrapper.c:150
        #2 0x5563a058d005 in parse_attr_line attr.c:384
        #3 0x5563a058e661 in handle_attr_line attr.c:660
        #4 0x5563a058eddb in read_attr_from_index attr.c:769
        #5 0x5563a058ef12 in read_attr attr.c:797
        #6 0x5563a058f24c in bootstrap_attr_stack attr.c:867
        #7 0x5563a058f4a3 in prepare_attr_stack attr.c:902
        #8 0x5563a05905da in collect_some_attrs attr.c:1097
        #9 0x5563a059093d in git_all_attrs attr.c:1128
        #10 0x5563a02f636e in check_attr builtin/check-attr.c:67
        #11 0x5563a02f6c12 in cmd_check_attr builtin/check-attr.c:183
        #12 0x5563a02aa993 in run_builtin git.c:466
        #13 0x5563a02ab397 in handle_builtin git.c:721
        #14 0x5563a02abb2b in run_argv git.c:788
        #15 0x5563a02ac991 in cmd_main git.c:926
        #16 0x5563a05432bd in main common-main.c:57
        #17 0x7fd3ef82228f  (/usr/lib/libc.so.6+0x2328f)

    ==1022==HINT: if you don't care about these errors you may set allocator_may_return_null=1
    SUMMARY: AddressSanitizer: allocation-size-too-big /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77 in __interceptor_calloc
    ==1022==ABORTING

Or, much worse, it can lead to an out-of-bounds write because we
underallocate and then memcpy(3P) into an array:

    perl -e '
        print "A " . "\rh="x2000000000;
        print "\rh="x2000000000;
        print "\rh="x294967294 . "\n"
    ' >.gitattributes
    git add .gitattributes
    git commit -am "evil attributes"

    $ git clone --quiet /path/to/repo
    =================================================================
    ==15062==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000002550 at pc 0x5555559884d5 bp 0x7fffffffbc60 sp 0x7fffffffbc58
    WRITE of size 8 at 0x602000002550 thread T0
        #0 0x5555559884d4 in parse_attr_line attr.c:393
        #1 0x5555559884d4 in handle_attr_line attr.c:660
        #2 0x555555988902 in read_attr_from_index attr.c:784
        #3 0x555555988902 in read_attr_from_index attr.c:747
        #4 0x555555988a1d in read_attr attr.c:800
        #5 0x555555989b0c in bootstrap_attr_stack attr.c:882
        #6 0x555555989b0c in prepare_attr_stack attr.c:917
        #7 0x555555989b0c in collect_some_attrs attr.c:1112
        #8 0x55555598b141 in git_check_attr attr.c:1126
        #9 0x555555a13004 in convert_attrs convert.c:1311
        #10 0x555555a95e04 in checkout_entry_ca entry.c:553
        #11 0x555555d58bf6 in checkout_entry entry.h:42
        #12 0x555555d58bf6 in check_updates unpack-trees.c:480
        #13 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
        #14 0x555555785ab7 in checkout builtin/clone.c:724
        #15 0x555555785ab7 in cmd_clone builtin/clone.c:1384
        #16 0x55555572443c in run_builtin git.c:466
        #17 0x55555572443c in handle_builtin git.c:721
        #18 0x555555727872 in run_argv git.c:788
        #19 0x555555727872 in cmd_main git.c:926
        #20 0x555555721fa0 in main common-main.c:57
        #21 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308
        #22 0x555555723f39 in _start (git+0x1cff39)

    0x602000002552 is located 0 bytes to the right of 2-byte region [0x602000002550,0x602000002552) allocated by thread T0 here:
        #0 0x7ffff768c037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
        #1 0x555555d7fff7 in xcalloc wrapper.c:150
        #2 0x55555598815f in parse_attr_line attr.c:384
        #3 0x55555598815f in handle_attr_line attr.c:660
        #4 0x555555988902 in read_attr_from_index attr.c:784
        #5 0x555555988902 in read_attr_from_index attr.c:747
        #6 0x555555988a1d in read_attr attr.c:800
        #7 0x555555989b0c in bootstrap_attr_stack attr.c:882
        #8 0x555555989b0c in prepare_attr_stack attr.c:917
        #9 0x555555989b0c in collect_some_attrs attr.c:1112
        #10 0x55555598b141 in git_check_attr attr.c:1126
        #11 0x555555a13004 in convert_attrs convert.c:1311
        #12 0x555555a95e04 in checkout_entry_ca entry.c:553
        #13 0x555555d58bf6 in checkout_entry entry.h:42
        #14 0x555555d58bf6 in check_updates unpack-trees.c:480
        #15 0x555555d5eb55 in unpack_trees unpack-trees.c:2040
        #16 0x555555785ab7 in checkout builtin/clone.c:724
        #17 0x555555785ab7 in cmd_clone builtin/clone.c:1384
        #18 0x55555572443c in run_builtin git.c:466
        #19 0x55555572443c in handle_builtin git.c:721
        #20 0x555555727872 in run_argv git.c:788
        #21 0x555555727872 in cmd_main git.c:926
        #22 0x555555721fa0 in main common-main.c:57
        #23 0x7ffff73f1d09 in __libc_start_main ../csu/libc-start.c:308

    SUMMARY: AddressSanitizer: heap-buffer-overflow attr.c:393 in parse_attr_line
    Shadow bytes around the buggy address:
      0x0c047fff8450: fa fa 00 02 fa fa 00 07 fa fa fd fd fa fa 00 00
      0x0c047fff8460: fa fa 02 fa fa fa fd fd fa fa 00 06 fa fa 05 fa
      0x0c047fff8470: fa fa fd fd fa fa 00 02 fa fa 06 fa fa fa 05 fa
      0x0c047fff8480: fa fa 07 fa fa fa fd fd fa fa 00 01 fa fa 00 02
      0x0c047fff8490: fa fa 00 03 fa fa 00 fa fa fa 00 01 fa fa 00 03
    =>0x0c047fff84a0: fa fa 00 01 fa fa 00 02 fa fa[02]fa fa fa fa fa
      0x0c047fff84b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff84f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
      Shadow gap:              cc
    ==15062==ABORTING

Fix this bug by using `size_t` instead to count the number of attributes
so that this value cannot reasonably overflow without running out of
memory before already.

Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 17, 2023
When using a padding specifier in the pretty format passed to git-log(1)
we need to calculate the string length in several places. These string
lengths are stored in `int`s though, which means that these can easily
overflow when the input lengths exceeds 2GB. This can ultimately lead to
an out-of-bounds write when these are used in a call to memcpy(3P):

        ==8340==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f1ec62f97fe at pc 0x7f2127e5f427 bp 0x7ffd3bd63de0 sp 0x7ffd3bd63588
    WRITE of size 1 at 0x7f1ec62f97fe thread T0
        #0 0x7f2127e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
        #1 0x5628e96aa605 in format_and_pad_commit pretty.c:1762
        #2 0x5628e96aa7f4 in format_commit_item pretty.c:1801
        #3 0x5628e97cdb24 in strbuf_expand strbuf.c:429
        #4 0x5628e96ab060 in repo_format_commit_message pretty.c:1869
        #5 0x5628e96acd0f in pretty_print_commit pretty.c:2161
        #6 0x5628e95a44c8 in show_log log-tree.c:781
        #7 0x5628e95a76ba in log_tree_commit log-tree.c:1117
        #8 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508
        #9 0x5628e922c35b in cmd_log_walk builtin/log.c:549
        #10 0x5628e922f1a2 in cmd_log builtin/log.c:883
        #11 0x5628e9106993 in run_builtin git.c:466
        #12 0x5628e9107397 in handle_builtin git.c:721
        #13 0x5628e9107b07 in run_argv git.c:788
        #14 0x5628e91088a7 in cmd_main git.c:923
        #15 0x5628e939d682 in main common-main.c:57
        #16 0x7f2127c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #17 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #18 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115

    0x7f1ec62f97fe is located 2 bytes to the left of 4831838265-byte region [0x7f1ec62f9800,0x7f1fe62f9839)
    allocated by thread T0 here:
        #0 0x7f2127ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x5628e98774d4 in xrealloc wrapper.c:136
        #2 0x5628e97cb01c in strbuf_grow strbuf.c:99
        #3 0x5628e97ccd42 in strbuf_addchars strbuf.c:327
        #4 0x5628e96aa55c in format_and_pad_commit pretty.c:1761
        #5 0x5628e96aa7f4 in format_commit_item pretty.c:1801
        #6 0x5628e97cdb24 in strbuf_expand strbuf.c:429
        #7 0x5628e96ab060 in repo_format_commit_message pretty.c:1869
        #8 0x5628e96acd0f in pretty_print_commit pretty.c:2161
        #9 0x5628e95a44c8 in show_log log-tree.c:781
        #10 0x5628e95a76ba in log_tree_commit log-tree.c:1117
        #11 0x5628e922bed5 in cmd_log_walk_no_free builtin/log.c:508
        #12 0x5628e922c35b in cmd_log_walk builtin/log.c:549
        #13 0x5628e922f1a2 in cmd_log builtin/log.c:883
        #14 0x5628e9106993 in run_builtin git.c:466
        #15 0x5628e9107397 in handle_builtin git.c:721
        #16 0x5628e9107b07 in run_argv git.c:788
        #17 0x5628e91088a7 in cmd_main git.c:923
        #18 0x5628e939d682 in main common-main.c:57
        #19 0x7f2127c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #20 0x7f2127c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #21 0x5628e91020e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
    Shadow bytes around the buggy address:
      0x0fe458c572a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0fe458c572e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    =>0x0fe458c572f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
      0x0fe458c57300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0fe458c57340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==8340==ABORTING

The pretty format can also be used in `git archive` operations via the
`export-subst` attribute. So this is what in our opinion makes this a
critical issue in the context of Git forges which allow to download an
archive of user supplied Git repositories.

Fix this vulnerability by using `size_t` instead of `int` to track the
string lengths. Add tests which detect this vulnerability when Git is
compiled with the address sanitizer.

Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Original-patch-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Modified-by: Taylor  Blau <me@ttalorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 17, 2023
With the `%>>(<N>)` pretty formatter, you can ask git-log(1) et al to
steal spaces. To do so we need to look ahead of the next token to see
whether there are spaces there. This loop takes into account ANSI
sequences that end with an `m`, and if it finds any it will skip them
until it finds the first space. While doing so it does not take into
account the buffer's limits though and easily does an out-of-bounds
read.

Add a test that hits this behaviour. While we don't have an easy way to
verify this, the test causes the following failure when run with
`SANITIZE=address`:

    ==37941==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000000baf at pc 0x55ba6f88e0d0 bp 0x7ffc84c50d20 sp 0x7ffc84c50d10
    READ of size 1 at 0x603000000baf thread T0
        #0 0x55ba6f88e0cf in format_and_pad_commit pretty.c:1712
        #1 0x55ba6f88e7b4 in format_commit_item pretty.c:1801
        #2 0x55ba6f9b1ae4 in strbuf_expand strbuf.c:429
        #3 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869
        #4 0x55ba6f890ccf in pretty_print_commit pretty.c:2161
        #5 0x55ba6f7884c8 in show_log log-tree.c:781
        #6 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117
        #7 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508
        #8 0x55ba6f41035b in cmd_log_walk builtin/log.c:549
        #9 0x55ba6f4131a2 in cmd_log builtin/log.c:883
        #10 0x55ba6f2ea993 in run_builtin git.c:466
        #11 0x55ba6f2eb397 in handle_builtin git.c:721
        #12 0x55ba6f2ebb07 in run_argv git.c:788
        #13 0x55ba6f2ec8a7 in cmd_main git.c:923
        #14 0x55ba6f581682 in main common-main.c:57
        #15 0x7f2d08c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #16 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #17 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115

    0x603000000baf is located 1 bytes to the left of 24-byte region [0x603000000bb0,0x603000000bc8)
    allocated by thread T0 here:
        #0 0x7f2d08ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x55ba6fa5b494 in xrealloc wrapper.c:136
        #2 0x55ba6f9aefdc in strbuf_grow strbuf.c:99
        #3 0x55ba6f9b0a06 in strbuf_add strbuf.c:298
        #4 0x55ba6f9b1a25 in strbuf_expand strbuf.c:418
        #5 0x55ba6f88f020 in repo_format_commit_message pretty.c:1869
        #6 0x55ba6f890ccf in pretty_print_commit pretty.c:2161
        #7 0x55ba6f7884c8 in show_log log-tree.c:781
        #8 0x55ba6f78b6ba in log_tree_commit log-tree.c:1117
        #9 0x55ba6f40fed5 in cmd_log_walk_no_free builtin/log.c:508
        #10 0x55ba6f41035b in cmd_log_walk builtin/log.c:549
        #11 0x55ba6f4131a2 in cmd_log builtin/log.c:883
        #12 0x55ba6f2ea993 in run_builtin git.c:466
        #13 0x55ba6f2eb397 in handle_builtin git.c:721
        #14 0x55ba6f2ebb07 in run_argv git.c:788
        #15 0x55ba6f2ec8a7 in cmd_main git.c:923
        #16 0x55ba6f581682 in main common-main.c:57
        #17 0x7f2d08c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #18 0x7f2d08c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #19 0x55ba6f2e60e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow pretty.c:1712 in format_and_pad_commit
    Shadow bytes around the buggy address:
      0x0c067fff8120: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
      0x0c067fff8130: fd fd fa fa fd fd fd fd fa fa fd fd fd fa fa fa
      0x0c067fff8140: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
      0x0c067fff8150: fa fa fd fd fd fd fa fa 00 00 00 fa fa fa fd fd
      0x0c067fff8160: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
    =>0x0c067fff8170: fd fd fd fa fa[fa]00 00 00 fa fa fa 00 00 00 fa
      0x0c067fff8180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8190: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff81c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb

Luckily enough, this would only cause us to copy the out-of-bounds data
into the formatted commit in case we really had an ANSI sequence
preceding our buffer. So this bug likely has no security consequences.

Fix it regardless by not traversing past the buffer's start.

Reported-by: Patrick Steinhardt <ps@pks.im>
Reported-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 17, 2023
An out-of-bounds read can be triggered when parsing an incomplete
padding format string passed via `--pretty=format` or in Git archives
when files are marked with the `export-subst` gitattribute.

This bug exists since we have introduced support for truncating output
via the `trunc` keyword a7f01c6 (pretty: support truncating in %>, %<
and %><, 2013-04-19). Before this commit, we used to find the end of the
formatting string by using strchr(3P). This function returns a `NULL`
pointer in case the character in question wasn't found. The subsequent
check whether any character was found thus simply checked the returned
pointer. After the commit we switched to strcspn(3P) though, which only
returns the offset to the first found character or to the trailing NUL
byte. As the end pointer is now computed by adding the offset to the
start pointer it won't be `NULL` anymore, and as a consequence the check
doesn't do anything anymore.

The out-of-bounds data that is being read can in fact end up in the
formatted string. As a consequence, it is possible to leak memory
contents either by calling git-log(1) or via git-archive(1) when any of
the archived files is marked with the `export-subst` gitattribute.

    ==10888==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000398 at pc 0x7f0356047cb2 bp 0x7fff3ffb95d0 sp 0x7fff3ffb8d78
    READ of size 1 at 0x602000000398 thread T0
        #0 0x7f0356047cb1 in __interceptor_strchrnul /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725
        #1 0x563b7cec9a43 in strbuf_expand strbuf.c:417
        #2 0x563b7cda7060 in repo_format_commit_message pretty.c:1869
        #3 0x563b7cda8d0f in pretty_print_commit pretty.c:2161
        #4 0x563b7cca04c8 in show_log log-tree.c:781
        #5 0x563b7cca36ba in log_tree_commit log-tree.c:1117
        #6 0x563b7c927ed5 in cmd_log_walk_no_free builtin/log.c:508
        #7 0x563b7c92835b in cmd_log_walk builtin/log.c:549
        #8 0x563b7c92b1a2 in cmd_log builtin/log.c:883
        #9 0x563b7c802993 in run_builtin git.c:466
        #10 0x563b7c803397 in handle_builtin git.c:721
        #11 0x563b7c803b07 in run_argv git.c:788
        #12 0x563b7c8048a7 in cmd_main git.c:923
        #13 0x563b7ca99682 in main common-main.c:57
        #14 0x7f0355e3c28f  (/usr/lib/libc.so.6+0x2328f)
        #15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115

    0x602000000398 is located 0 bytes to the right of 8-byte region [0x602000000390,0x602000000398)
    allocated by thread T0 here:
        #0 0x7f0356072faa in __interceptor_strdup /usr/src/debug/gcc/libsanitizer/asan/asan_interceptors.cpp:439
        #1 0x563b7cf7317c in xstrdup wrapper.c:39
        #2 0x563b7cd9a06a in save_user_format pretty.c:40
        #3 0x563b7cd9b3e5 in get_commit_format pretty.c:173
        #4 0x563b7ce54ea0 in handle_revision_opt revision.c:2456
        #5 0x563b7ce597c9 in setup_revisions revision.c:2850
        #6 0x563b7c9269e0 in cmd_log_init_finish builtin/log.c:269
        #7 0x563b7c927362 in cmd_log_init builtin/log.c:348
        #8 0x563b7c92b193 in cmd_log builtin/log.c:882
        #9 0x563b7c802993 in run_builtin git.c:466
        #10 0x563b7c803397 in handle_builtin git.c:721
        #11 0x563b7c803b07 in run_argv git.c:788
        #12 0x563b7c8048a7 in cmd_main git.c:923
        #13 0x563b7ca99682 in main common-main.c:57
        #14 0x7f0355e3c28f  (/usr/lib/libc.so.6+0x2328f)
        #15 0x7f0355e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #16 0x563b7c7fe0e4 in _start ../sysdeps/x86_64/start.S:115

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:725 in __interceptor_strchrnul
    Shadow bytes around the buggy address:
      0x0c047fff8020: fa fa fd fd fa fa 00 06 fa fa 05 fa fa fa fd fd
      0x0c047fff8030: fa fa 00 02 fa fa 06 fa fa fa 05 fa fa fa fd fd
      0x0c047fff8040: fa fa 00 07 fa fa 03 fa fa fa fd fd fa fa 00 00
      0x0c047fff8050: fa fa 00 01 fa fa fd fd fa fa 00 00 fa fa 00 01
      0x0c047fff8060: fa fa 00 06 fa fa 00 06 fa fa 05 fa fa fa 05 fa
    =>0x0c047fff8070: fa fa 00[fa]fa fa fd fa fa fa fd fd fa fa fd fd
      0x0c047fff8080: fa fa fd fd fa fa 00 00 fa fa 00 fa fa fa fd fa
      0x0c047fff8090: fa fa fd fd fa fa 00 00 fa fa fa fa fa fa fa fa
      0x0c047fff80a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff80b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c047fff80c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==10888==ABORTING

Fix this bug by checking whether `end` points at the trailing NUL byte.
Add a test which catches this out-of-bounds read and which demonstrates
that we used to write out-of-bounds data into the formatted message.

Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Original-patch-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jan 17, 2023
The return type of both `utf8_strwidth()` and `utf8_strnwidth()` is
`int`, but we operate on string lengths which are typically of type
`size_t`. This means that when the string is longer than `INT_MAX`, we
will overflow and thus return a negative result.

This can lead to an out-of-bounds write with `--pretty=format:%<1)%B`
and a commit message that is 2^31+1 bytes long:

    =================================================================
    ==26009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000001168 at pc 0x7f95c4e5f427 bp 0x7ffd8541c900 sp 0x7ffd8541c0a8
    WRITE of size 2147483649 at 0x603000001168 thread T0
        #0 0x7f95c4e5f426 in __interceptor_memcpy /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
        #1 0x5612bbb1068c in format_and_pad_commit pretty.c:1763
        #2 0x5612bbb1087a in format_commit_item pretty.c:1801
        #3 0x5612bbc33bab in strbuf_expand strbuf.c:429
        #4 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
        #5 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
        #6 0x5612bba0a4d5 in show_log log-tree.c:781
        #7 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
        #8 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
        #9 0x5612bb69235b in cmd_log_walk builtin/log.c:549
        #10 0x5612bb6951a2 in cmd_log builtin/log.c:883
        #11 0x5612bb56c993 in run_builtin git.c:466
        #12 0x5612bb56d397 in handle_builtin git.c:721
        #13 0x5612bb56db07 in run_argv git.c:788
        #14 0x5612bb56e8a7 in cmd_main git.c:923
        #15 0x5612bb803682 in main common-main.c:57
        #16 0x7f95c4c3c28f  (/usr/lib/libc.so.6+0x2328f)
        #17 0x7f95c4c3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #18 0x5612bb5680e4 in _start ../sysdeps/x86_64/start.S:115

    0x603000001168 is located 0 bytes to the right of 24-byte region [0x603000001150,0x603000001168)
    allocated by thread T0 here:
        #0 0x7f95c4ebe7ea in __interceptor_realloc /usr/src/debug/gcc/libsanitizer/asan/asan_malloc_linux.cpp:85
        #1 0x5612bbcdd556 in xrealloc wrapper.c:136
        #2 0x5612bbc310a3 in strbuf_grow strbuf.c:99
        #3 0x5612bbc32acd in strbuf_add strbuf.c:298
        #4 0x5612bbc33aec in strbuf_expand strbuf.c:418
        #5 0x5612bbb110e7 in repo_format_commit_message pretty.c:1869
        #6 0x5612bbb12d96 in pretty_print_commit pretty.c:2161
        #7 0x5612bba0a4d5 in show_log log-tree.c:781
        #8 0x5612bba0d6c7 in log_tree_commit log-tree.c:1117
        #9 0x5612bb691ed5 in cmd_log_walk_no_free builtin/log.c:508
        #10 0x5612bb69235b in cmd_log_walk builtin/log.c:549
        #11 0x5612bb6951a2 in cmd_log builtin/log.c:883
        #12 0x5612bb56c993 in run_builtin git.c:466
        #13 0x5612bb56d397 in handle_builtin git.c:721
        #14 0x5612bb56db07 in run_argv git.c:788
        #15 0x5612bb56e8a7 in cmd_main git.c:923
        #16 0x5612bb803682 in main common-main.c:57
        #17 0x7f95c4c3c28f  (/usr/lib/libc.so.6+0x2328f)

    SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 in __interceptor_memcpy
    Shadow bytes around the buggy address:
      0x0c067fff81d0: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
      0x0c067fff81e0: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
      0x0c067fff81f0: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
      0x0c067fff8200: fd fd fd fa fa fa fd fd fd fd fa fa 00 00 00 fa
      0x0c067fff8210: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
    =>0x0c067fff8220: fd fa fa fa fd fd fd fa fa fa 00 00 00[fa]fa fa
      0x0c067fff8230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8260: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
      0x0c067fff8270: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    Shadow byte legend (one shadow byte represents 8 application bytes):
      Addressable:           00
      Partially addressable: 01 02 03 04 05 06 07
      Heap left redzone:       fa
      Freed heap region:       fd
      Stack left redzone:      f1
      Stack mid redzone:       f2
      Stack right redzone:     f3
      Stack after return:      f5
      Stack use after scope:   f8
      Global redzone:          f9
      Global init order:       f6
      Poisoned by user:        f7
      Container overflow:      fc
      Array cookie:            ac
      Intra object redzone:    bb
      ASan internal:           fe
      Left alloca redzone:     ca
      Right alloca redzone:    cb
    ==26009==ABORTING

Now the proper fix for this would be to convert both functions to return
an `size_t` instead of an `int`. But given that this commit may be part
of a security release, let's instead do the minimal viable fix and die
in case we see an overflow.

Add a test that would have previously caused us to crash.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Sep 27, 2023
When t5583-push-branches.sh was originally introduced via 425b4d7
(push: introduce '--branches' option, 2023-05-06), it was not leak-free.
In fact, the test did not even run correctly until 022fbb6 (t5583:
fix shebang line, 2023-05-12), but after applying that patch, we see a
failure at t5583.8:

    ==2529087==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 384 byte(s) in 1 object(s) allocated from:
        #0 0x7fb536330986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x55e07606cbf9 in xrealloc wrapper.c:140
        #2 0x55e075fb6cb3 in prio_queue_put prio-queue.c:42
        #3 0x55e075ec81cb in get_reachable_subset commit-reach.c:917
        #4 0x55e075fe9cce in add_missing_tags remote.c:1518
        #5 0x55e075fea1e4 in match_push_refs remote.c:1665
        #6 0x55e076050a8e in transport_push transport.c:1378
        #7 0x55e075e2eb74 in push_with_options builtin/push.c:401
        #8 0x55e075e2edb0 in do_push builtin/push.c:458
        #9 0x55e075e2ff7a in cmd_push builtin/push.c:702
        #10 0x55e075d8aaf0 in run_builtin git.c:452
        #11 0x55e075d8af08 in handle_builtin git.c:706
        #12 0x55e075d8b12c in run_argv git.c:770
        #13 0x55e075d8b6a0 in cmd_main git.c:905
        #14 0x55e075e81f07 in main common-main.c:60
        #15 0x7fb5360ab6c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
        #16 0x7fb5360ab784 in __libc_start_main_impl ../csu/libc-start.c:360
        #17 0x55e075d88f40 in _start (git+0x1ff40) (BuildId: 38ad998b85a535e786129979443630d025ec2453)

    SUMMARY: LeakSanitizer: 384 byte(s) leaked in 1 allocation(s).

This leak was addressed independently via 68b5117 (commit-reach: fix
memory leak in get_reachable_subset(), 2023-06-03), which makes t5583
leak-free.

But t5583 was not in the tree when 68b5117 was written, and the two
only met after the latter was merged back in via 693bde4 (Merge
branch 'mh/commit-reach-get-reachable-plug-leak', 2023-06-20).

At that point, t5583 was leak-free. Let's mark it as such accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Apr 4, 2024
It is tempting to think of "files and directories" of the current
directory as valid inputs to the add and set subcommands of git
sparse-checkout.  However, in non-cone mode, they often aren't and using
them as potential completions leads to *many* forms of confusion:

Issue #1. It provides the *wrong* files and directories.

For
    git sparse-checkout add
we always want to add files and directories not currently in our sparse
checkout, which means we want file and directories not currently present
in the current working tree.  Providing the files and directories
currently present is thus always wrong.

For
    git sparse-checkout set
we have a similar problem except in the subset of cases where we are
trying to narrow our checkout to a strict subset of what we already
have.  That is not a very common scenario, especially since it often
does not even happen to be true for the first use of the command; for
years we required users to create a sparse-checkout via
    git sparse-checkout init
    git sparse-checkout set <args...>
(or use a clone option that did the init step for you at clone time).
The init command creates a minimal sparse-checkout with just the
top-level directory present, meaning the set command has to be used to
expand the checkout.  Thus, only in a special and perhaps unusual cases
would any of the suggestions from normal file and directory completion
be appropriate.

Issue #2: Suggesting patterns that lead to warnings is unfriendly.

If the user specifies any regular file and omits the leading '/', then
the sparse-checkout command will warn the user that their command is
problematic and suggest they use a leading slash instead.

Issue #3: Completion gets confused by leading '/', and provides wrong paths.

Users often want to anchor their patterns to the toplevel of the
repository, especially when listing individual files.  There are a
number of reasons for this, but notably even sparse-checkout encourages
them to do so (as noted above).  However, if users do so (via adding a
leading '/' to their pattern), then bash completion will interpret the
leading slash not as a request for a path at the toplevel of the
repository, but as a request for a path at the root of the filesytem.
That means at best that completion cannot help with such paths, and if
it does find any completions, they are almost guaranteed to be wrong.

Issue #4: Suggesting invalid patterns from subdirectories is unfriendly.

There is no per-directory equivalent to .gitignore with
sparse-checkouts.  There is only a single worktree-global
$GIT_DIR/info/sparse-checkout file.  As such, paths to files must be
specified relative to the toplevel of a repository.  Providing
suggestions of paths that are relative to the current working directory,
as bash completion defaults to, is wrong when the current working
directory is not the worktree toplevel directory.

Issue #5: Paths with special characters will be interpreted incorrectly

The entries in the sparse-checkout file are patterns, not paths.  While
most paths also qualify as patterns (though even in such cases it would
be better for users to not use them directly but prefix them with a
leading '/'), there are a variety of special characters that would need
special escaping beyond the normal shell escaping: '*', '?', '\', '[',
']', and any leading '#' or '!'.  If completion suggests any such paths,
users will likely expect them to be treated as an exact path rather than
as a pattern that might match some number of files other than 1.

However, despite the first four issues, we can note that _if_ users are
using tab completion, then they are probably trying to specify a path in
the index.  As such, we transform their argument into a top-level-rooted
pattern that matches such a file.  For example, if they type:
   git sparse-checkout add Make<TAB>
we could "complete" to
   git sparse-checkout add /Makefile
or, if they ran from the Documentation/technical/ subdirectory:
   git sparse-checkout add m<TAB>
we could "complete" it to:
   git sparse-checkout add /Documentation/technical/multi-pack-index.txt
Note in both cases I use "complete" in quotes, because we actually add
characters both before and after the argument in question, so we are
kind of abusing "bash completions" to be "bash completions AND
beginnings".

The fifth issue is a bit stickier, especially when you consider that we
not only need to deal with escaping issues because of special meanings
of patterns in sparse-checkout & gitignore files, but also that we need
to consider escaping issues due to ls-files needing to sometimes quote
or escape characters, and because the shell needs to escape some
characters.  The multiple interacting forms of escaping could get ugly;
this patch makes no attempt to do so and simply documents that we
decided to not deal with those corner cases for now but at least get the
common cases right.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Apr 4, 2024
The t5309 script triggers a racy false positive with SANITIZE=leak on a
multi-core system. Running with "--stress --run=6" usually fails within
10 seconds or so for me, complaining with something like:

    + git index-pack --fix-thin --stdin
    fatal: REF_DELTA at offset 46 already resolved (duplicate base 01d7713666f4de822776c7622c10f1b07de280dc?)

    =================================================================
    ==3904583==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 32 byte(s) in 1 object(s) allocated from:
        #0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
        #1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
        #2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
        #3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
        #4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
        #5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
        #6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
        #7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

    SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).
    Aborted

What happens is this:

  0. We construct a bogus pack with a duplicate object in it and trigger
     index-pack.

  1. We spawn a bunch of worker threads to resolve deltas (on my system
     it is 16 threads).

  2. One of the threads sees the duplicate object and bails by calling
     exit(), taking down all of the threads. This is expected and is the
     point of the test.

  3. At the time exit() is called, we may still be spawning threads from
     the main process via pthread_create(). LSan hooks thread creation
     to update its book-keeping; it has to know where each thread's
     stack is (so it can find entry points for reachable memory). So it
     calls pthread_getattr_np() to get information about the new thread.
     That may allocate memory that must be freed with a matching call to
     pthread_attr_destroy(). Probably LSan does that immediately, but
     if you're unlucky enough, the exit() will happen while it's between
     those two calls, and the allocated pthread_attr_t appears as a
     leak.

This isn't a real leak. It's not even in our code, but rather in the
LSan instrumentation code. So we could just ignore it. But the false
positive can cause people to waste time tracking it down.

It's possibly something that LSan could protect against (e.g., cover the
getattr/destroy pair with a mutex, and then in the final post-exit()
check for leaks try to take the same mutex). But I don't know enough
about LSan to say if that's a reasonable approach or not (or if my
analysis is even completely correct).

In the meantime, it's pretty easy to avoid the race by making creation
of the worker threads "atomic". That is, we'll spawn all of them before
letting any of them start to work. That's easy to do because we already
have a work_lock() mutex for handing out that work. If the main process
takes it, then all of the threads will immediately block until we've
finished spawning and released it.

This shouldn't make any practical difference for non-LSan runs. The
thread spawning is quick, and could happen before any worker thread gets
scheduled anyway.

Probably other spots that use threads are subject to the same issues.
But since we have to manually insert locking (and since this really is
kind of a hack), let's not bother with them unless somebody experiences
a similar racy false-positive in practice.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jul 3, 2024
When performing multi-pack reuse, reuse_partial_packfile_from_bitmap()
is responsible for generating an array of bitmapped_pack structs from
which to perform reuse.

In the multi-pack case, we loop over the MIDXs packs and copy the result
of calling `nth_bitmapped_pack()` to construct the list of reusable
paths.

But we may also want to do pack-reuse over a single pack, either because
we only had one pack to perform reuse over (in the case of single-pack
bitmaps), or because we explicitly asked to do single pack reuse even
with a MIDX[^1].

When this is the case, the array we generate of reusable packs contains
only a single element, which is either (a) the pack attached to the
single-pack bitmap, or (b) the MIDX's preferred pack.

In 795006f (pack-bitmap: gracefully handle missing BTMP chunks,
2024-04-15), we refactored the reuse_partial_packfile_from_bitmap()
function and stopped assigning the pack_int_id field when reusing only
the MIDX's preferred pack. This results in an uninitialized read down in
try_partial_reuse() like so:

    ==7474==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55c5cd191dde in try_partial_reuse pack-bitmap.c:1887:8
    #1 0x55c5cd191dde in reuse_partial_packfile_from_bitmap_1 pack-bitmap.c:2001:8
    #2 0x55c5cd191dde in reuse_partial_packfile_from_bitmap pack-bitmap.c:2105:3
    #3 0x55c5cce0bd0e in get_object_list_from_bitmap builtin/pack-objects.c:4043:3
    #4 0x55c5cce0bd0e in get_object_list builtin/pack-objects.c:4156:27
    #5 0x55c5cce0bd0e in cmd_pack_objects builtin/pack-objects.c:4596:3
    #6 0x55c5ccc8fac8 in run_builtin git.c:474:11

which happens when try_partial_reuse() tries to call
midx_pair_to_pack_pos() when it tries to reject cross-pack deltas.

Avoid the uninitialized read by ensuring that the pack_int_id field is
set in the single-pack reuse case by setting it to either the MIDX
preferred pack's pack_int_id, or '-1', in the case of single-pack
bitmaps.  In the latter case, we never read the pack_int_id field, so
the choice of '-1' is intentional as a "garbage in, garbage out"
measure.

Guard against further regressions in this area by adding a test which
ensures that we do not throw out deltas from the preferred pack as
"cross-pack" due to an uninitialized pack_int_id.

[^1]: This can happen for a couple of reasons, either because the
  repository is configured with 'pack.allowPackReuse=(true|single)', or
  because the MIDX was generated prior to the introduction of the BTMP
  chunk, which contains information necessary to perform multi-pack
  reuse.

Reported-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
derrickstolee pushed a commit that referenced this pull request Jul 3, 2024
Memory sanitizer (msan) is detecting a use of an uninitialized variable
(`size`) in `read_attr_from_index`:

    ==2268==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5651f3416504 in read_attr_from_index git/attr.c:868:11
    #1 0x5651f3415530 in read_attr git/attr.c
    #2 0x5651f3413d74 in bootstrap_attr_stack git/attr.c:968:6
    #3 0x5651f3413d74 in prepare_attr_stack git/attr.c:1004:2
    #4 0x5651f3413d74 in collect_some_attrs git/attr.c:1199:2
    #5 0x5651f3413144 in git_check_attr git/attr.c:1345:2
    #6 0x5651f34728da in convert_attrs git/convert.c:1320:2
    #7 0x5651f3473425 in would_convert_to_git_filter_fd git/convert.c:1373:2
    #8 0x5651f357a35e in index_fd git/object-file.c:2630:34
    #9 0x5651f357aa15 in index_path git/object-file.c:2657:7
    #10 0x5651f35db9d9 in add_to_index git/read-cache.c:766:7
    #11 0x5651f35dc170 in add_file_to_index git/read-cache.c:799:9
    #12 0x5651f321f9b2 in add_files git/builtin/add.c:346:7
    #13 0x5651f321f9b2 in cmd_add git/builtin/add.c:565:18
    #14 0x5651f321d327 in run_builtin git/git.c:474:11
    #15 0x5651f321bc9e in handle_builtin git/git.c:729:3
    #16 0x5651f321a792 in run_argv git/git.c:793:4
    #17 0x5651f321a792 in cmd_main git/git.c:928:19
    #18 0x5651f33dde1f in main git/common-main.c:62:11

The issue exists because `size` is an output parameter from
`read_blob_data_from_index`, but it's only modified if
`read_blob_data_from_index` returns non-NULL. The read of `size` when
calling `read_attr_from_buf` unconditionally may read from an
uninitialized value. `read_attr_from_buf` checks that `buf` is non-NULL
before reading from `size`, but by then it's already too late: the
uninitialized read will have happened already. Furthermore, there's no
guarantee that the compiler won't reorder things so that it checks
`size` before checking `!buf`.

Make the call to `read_attr_from_buf` conditional on `buf` being
non-NULL, ensuring that `size` is not read if it's never set.

Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet