Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure robustness when killed in the middle #24

Open
joshtriplett opened this issue Sep 23, 2020 · 10 comments
Open

Ensure robustness when killed in the middle #24

joshtriplett opened this issue Sep 23, 2020 · 10 comments
Labels
acknowledged an issue is accepted as shortcoming to be fixed

Comments

@joshtriplett
Copy link
Sponsor Contributor

git is somewhat robust when killed in the middle. libgit2 seems somewhat less so. I'd love to see gitoxide be rock-solid on that front.

When doing things like clone, fetch, and similar, it's important to deal with network failure, ctrl-c, systems being powered off, and other sources of abrupt interruption. It's important that this never leave the repository in an inconsistent state (e.g. references to objects that don't exist, corrupted packs or indexes that git will choke on, and similar). This includes things like putting things in temporary locations and atomic-renaming them in place, or ensuring that objects are put in place before references to them.

@Byron
Copy link
Owner

Byron commented Sep 23, 2020

I absolutely agree, thanks for making that explicit.

How would you create multiple files atomically? This would be required to perfectly conclude a clone, which currently is done in a plumbing command by creating refs sequentially.

Would you go as far as to maintain an undo list to respond to interrupts properly?

@joshtriplett
Copy link
Sponsor Contributor Author

@Byron Catching interrupts won't handle kill -9, or a power failure, or similar.

You don't have to make entire operations atomic when that isn't possible. It's OK if an interrupted fetch leaves some refs created and others not; another fetch will clear that up. (Clones can be made atomic by doing them into a temporary directory and renaming that directory into place.) What must not happen is a ref getting created but the object it references not existing, or an incomplete pack existing where git looks for packs, or other cases where the repository is in an inconsistent state. There should always be an order you can perform operations in such that the repository remains consistent.

@Byron
Copy link
Owner

Byron commented Sep 24, 2020

I see, so rather than complicating things try to design operations to never corrupt the git repository, and ideally recover automatically next time the operation is run in case resources have been leaked, like some refs still being present after interruption.

In the same vein, operations should probably be hardened against races, too. For example, cloning into the same location will have one git process fail early as it races to 'creation of .git directory' only, instead of racing for moving the cloned repository into place. Interestingly, when kill -9ed, git probably would be unable to recover such a partial clone as it couldn't differentiate a race from a dead process. Thinking about it, since there is a special kind of temp file which is always cleaned up if a process is killed, that could possibly be used as marker to differentiate this case as well.

Generally, when seeing the .git repository as database, all best-practices should certainly be applied to prevent corruption and allow (auto) recovery.

@joshtriplett
Copy link
Sponsor Contributor Author

For example, cloning into the same location will have one git process fail early as it races to 'creation of .git directory' only, instead of racing for moving the cloned repository into place. Interestingly, when kill -9ed, git probably would be unable to recover such a partial clone as it couldn't differentiate a race from a dead process.

You could potentially handle that with file locks, which don't outlive the process holding them.

@Byron Byron added this to the 1.0 - stable release milestone Jun 13, 2021
@Byron
Copy link
Owner

Byron commented Jun 24, 2021

git-tempfile and git-lock will help for sure in keeping the repository consistent in cases that are not kill -9. It's still to be figured out how to respond to stray locks though, especially on the server side where doing so should probably be automated.

@kim
Copy link
Contributor

kim commented Jun 25, 2021

It is potentially interesting to note that git.git repacks refs when more than one is updated in a transaction (for —atomic support). Last time I checked, libgit2 doesn’t do that, making ref transactions not actually atomic.

@Byron
Copy link
Owner

Byron commented Jun 26, 2021

@kim And on top of that one has to write the reflog which isn't contained in the packed-refs file and consist of a file per ref. I will still have to see how exactly that is locked, but I am hopeful it's made in a way to not completely tank performance.

@kim
Copy link
Contributor

kim commented Jun 26, 2021

Oh my yes reflogs. I do think they’re a performance sink, which is why they are disabled by default for bare repos. I did, however, end up recently forcing creation on select refs and installing inotify watches for cache invalidation purposes (I cannot watch the refdb itself, because a maintenance pack refs would generate false remove events, and the volume of events would just be too high).

Guess you can see now why I keep crying “reftable” ;)

@Byron
Copy link
Owner

Byron commented Jun 26, 2021

Oh my yes reflogs. I do think they’re a performance sink, which is why they are disabled by default for bare repos. I did, however, end up recently forcing creation on select refs and installing inotify watches for cache invalidation purposes (I cannot watch the refdb itself, because a maintenance pack refs would generate false remove events, and the volume of events would just be too high).

Super interesting, thanks for sharing! I checked it against my current sketch for transactions and am glad this would naturally be supported. I also noted that in bare repos, no reflog is created otherwise which wasn't on my radar yet.

Guess you can see now why I keep crying “reftable” ;)

There is no way around it on the server for sure. Since it operates in blocks it's probably less likely to be overwhelmed by the amount of filesystem events that it produces. But whatever it's going to be there is no escape if there are too many events it's either the files backend or the reftable.

@kim
Copy link
Contributor

kim commented Jun 28, 2021

if there are too many events

I misspoke there, it's more related to the notify crate not allowing the set event masks for portability reasons, and also the inherent raciness of watching directories recursively.

@Byron Byron added the acknowledged an issue is accepted as shortcoming to be fixed label Nov 19, 2022
Byron added a commit that referenced this issue Oct 15, 2023
The new URL should trigger an overflow check but it only
happens when `url::Url::parse()` is called directly as our
code doesn't let it through anymore.

Here is the log from the fuzzer run as reported:
```
	[Environment] ASAN_OPTIONS=handle_abort=2
+----------------------------------------Release Build Stacktrace----------------------------------------+
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_gitoxide_9a561c2a19701ceb3cded247e9ae8f349711bbca/revisions/gix-url-parse -rss_limit_mb=2560 -timeout=60 -runs=100 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/f508abd59698de9914f2b8894cc135f55208e494873d456c6c19828509103805
Time ran: 0.12717413902282715
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1001182178
INFO: Loaded 1 modules   (90683 inline 8-bit counters): 90683 [0x5d0c0597cce0, 0x5d0c05992f1b),
INFO: Loaded 1 PC tables (90683 PCs): 90683 [0x5d0c05992f20,0x5d0c05af52d0),
/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_gitoxide_9a561c2a19701ceb3cded247e9ae8f349711bbca/revisions/gix-url-parse: Running 1 inputs 100 time(s) each.
Running: /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/f508abd59698de9914f2b8894cc135f55208e494873d456c6c19828509103805
thread '<unnamed>' panicked at /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/punycode.rs:272:17:
attempt to add with overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1200==ERROR: AddressSanitizer: ABRT on unknown address 0x0539000004b0 (pc 0x7c51742fb00b bp 0x7ffcd1eadd80 sp 0x7ffcd1eadaf0 T0)
    #0 0x7c51742fb00b in raise /build/glibc-SzIz7B/glibc-2.31/sysdeps/unix/sysv/linux/raise.c:51:1
    #1 0x7c51742da858 in abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79:7
    #2 0x5d0c0572a1e6 in std::sys::unix::abort_internal::he854d2f74b119e66 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/sys/unix/mod.rs:375:14
    #3 0x5d0c0518cda6 in std::process::abort::h68c27a968dc7c74f /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/process.rs:2271:5
    #4 0x5d0c0564d3d4 in libfuzzer_sys::initialize::_$u7b$$u7b$closure$u7d$$u7d$::h1e76e422e0c48db0 /rust/registry/src/index.crates.io-6f17d22bba15001f/libfuzzer-sys-0.4.3/src/lib.rs:57:9
    #5 0x5d0c0571dcf7 in _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..Fn$LT$Args$GT$$GT$::call::h0c028c5af3475e03 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/alloc/src/boxed.rs:2021:9
    #6 0x5d0c0571dcf7 in std::panicking::rust_panic_with_hook::hd26c5407fbf20d71 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:735:13
    #7 0x5d0c0571da05 in std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h944e23ea90982f5a /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:601:13
    #8 0x5d0c0571aee5 in std::sys_common::backtrace::__rust_end_short_backtrace::h8a3632d339dd3313 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/sys_common/backtrace.rs:170:18
    #9 0x5d0c0571d781 in rust_begin_unwind /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:597:5
    #10 0x5d0c05190634 in core::panicking::panic_fmt::h85c36fc727234039 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/core/src/panicking.rs:72:14
    #11 0x5d0c051906d2 in core::panicking::panic::h6a47ed7881a36f4d /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/core/src/panicking.rs:127:5
    #12 0x5d0c053e09c6 in idna::punycode::encode_into::hd674630fb161bf5b /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/punycode.rs:0
    #13 0x5d0c053eacbc in idna::uts46::Idna::to_ascii_inner::h69c52eb69ae48276 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:469:34
    #14 0x5d0c053eb793 in idna::uts46::Idna::to_ascii::h76237795045112f3 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:481:26
    #15 0x5d0c053eda7a in idna::uts46::Config::to_ascii::h423c722ab2fa9813 /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/uts46.rs:572:9
    #16 0x5d0c053f0070 in idna::domain_to_ascii::h93e94e995d03e9ef /rust/registry/src/index.crates.io-6f17d22bba15001f/idna-0.4.0/src/lib.rs:64:5
    #17 0x5d0c0530374a in url::host::Host::domain_to_ascii::h6cb1ae8fe42a1e42 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/host.rs:166:9
    #18 0x5d0c0530374a in url::host::Host::parse::h962d3990e0ff5091 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/host.rs:86:22
    #19 0x5d0c0532e87d in url::parser::Parser::parse_host::h89faea9182ce2512 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:1024:20
    #20 0x5d0c0532ba8d in url::parser::Parser::parse_host_and_port::heb44bd7ebd2593f6 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:944:33
    #21 0x5d0c0532896f in url::parser::Parser::after_double_slash::hbb313f562f0978a2 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:843:13
    #22 0x5d0c0531a129 in url::parser::Parser::parse_with_scheme::h54a417e4650ea024 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:453:17
    #23 0x5d0c05317824 in url::parser::Parser::parse_url::hfa6b21c53cd0ac1c /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/parser.rs:366:20
    #24 0x5d0c05350ba0 in url::ParseOptions::parse::hb8b3309b3b920457 /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/lib.rs:257:9
    #25 0x5d0c052b1ffc in url::Url::parse::h82a965c69df59bba /rust/registry/src/index.crates.io-6f17d22bba15001f/url-2.4.1/src/lib.rs:292:9
    #26 0x5d0c052b1ffc in gix_url::parse::input_to_utf8_and_url::h14b70a32a8884316 gitoxide/gix-url/src/parse.rs:252:5
    #27 0x5d0c052a9b9d in gix_url::parse::url::h6f7f7b0bddf4b8d7 gitoxide/gix-url/src/parse.rs:99:24
    #28 0x5d0c052b34cc in gix_url::parse::hfaab74909f01c9cc gitoxide/gix-url/src/lib.rs:38:46
    #29 0x5d0c05270b4a in rust_fuzzer_test_input gitoxide/gix-url/fuzz/fuzz_targets/parse.rs:5:14
    #30 0x5d0c0564d537 in __rust_try libfuzzer_sys.f28e88650cadb2d4-cgu.0:0
    #31 0x5d0c0564c79f in std::panicking::try::h90783eeef7e35925 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panicking.rs:468:19
    #32 0x5d0c0564c79f in std::panic::catch_unwind::h041b281a0e92d580 /rustc/e20cb7702117f1ad8127a16406ba9edd230c4f65/library/std/src/panic.rs:142:14
    #33 0x5d0c0564c79f in LLVMFuzzerTestOneInput /rust/registry/src/index.crates.io-6f17d22bba15001f/libfuzzer-sys-0.4.3/src/lib.rs:28:22
    #34 0x5d0c0566bb83 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #35 0x5d0c056572e2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #36 0x5d0c0565cb8c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #37 0x5d0c056860c2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #38 0x7c51742dc082 in __libc_start_main /build/glibc-SzIz7B/glibc-2.31/csu/libc-start.c:308:16
    #39 0x5d0c05191c4d in _start
```
`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed
Projects
None yet
Development

No branches or pull requests

3 participants