[gixp pack-receive] The first proper fetch to a bare repository #104

Open
4 of 7 tasks
Byron opened this issue Jun 13, 2021 · 6 comments
Labels
C-tracking-issue An issue to track the progress of multiple PRs or issues

Comments

@Byron
Owner

Byron commented Jun 13, 2021

Do what's needed to fetch as well as git does (on a bare repository, one without a working tree). This particularly includes proper ref handling as well as safety in light of concurrent repository access.

Tasks

  • fix gitoxide interrupt and signal handling
    • now it will work for CLIs and servers alike with fine-grained control and no global state (unless the application wants it)
  • git-tempfile (based on git
  • git-lock - a crate providing git-style lock files.
  • git-refs - write loose refs and handle the git reflog, temp files, lock files, packed-refs and namespaces
    • publish latest release (and everything else, too)
  • git-pack - ensure packs are written safely, that is, writing a pack won't interfere with multi-packs or with other pack writers writing the very same pack. Check how locking works.
  • gix clone
    • turn gixp pack-receive into gixp clone creating an empty repository (for lack of index handling/checkout) and cloning the first pack.
  • gix fetch
    • [git-ref] change edits after prepare() #181
    • A tool to fetch into an existing repository correctly, creating a new pack and writing refs using transactions (for now without hook execution)
    • investigate fetch negotiation and see how much work is truly needed there. If logic is involved, make it readily reusable via git-protocol.
  • git-repository
    • Is there a way to bring transport/protocol related functionality to git-repository to greatly simplify ref listings and fetches?
Archive

Research

Reflog Handling

  • entirely disabled in bare repos
  • forward iterators could be bstr::lines()
  • reverse-iterators could be bstr::SplitReverse with a VecDeque for refilling a read buffer from the end of a file with seeks.
  • line parsing is here
  • expiry is done by rewriting the entire file based on a filter, writing is literally here
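
To make the forward-iteration and line-parsing notes above concrete, here is a minimal sketch using only the standard library. It assumes the usual reflog line layout (`<old-oid> <new-oid> <signature>\t<message>`) and 40-character SHA-1 hex ids. The names `ReflogLine`, `parse_reflog_line` and `iter_reflog` are made up for illustration and are not gitoxide's API, and `str::lines()` stands in for `bstr::lines()`.

```rust
/// One reflog entry. Field names are illustrative, not gitoxide's API.
#[derive(Debug, PartialEq)]
struct ReflogLine<'a> {
    old_oid: &'a str,
    new_oid: &'a str,
    /// `Name <email> <seconds> <timezone>`, left unparsed in this sketch.
    signature: &'a str,
    message: &'a str,
}

/// Parse a single line (without its trailing newline).
/// Returns `None` if the line does not have the assumed layout.
fn parse_reflog_line(line: &str) -> Option<ReflogLine<'_>> {
    // The message is separated from the rest by a tab and may be absent.
    let (head, message) = line.split_once('\t').unwrap_or((line, ""));
    let mut fields = head.splitn(3, ' ');
    let old_oid = fields.next()?;
    let new_oid = fields.next()?;
    let signature = fields.next()?;
    // Assumption: SHA-1 object ids, i.e. 40 hex characters.
    let is_sha1_hex = |s: &str| s.len() == 40 && s.bytes().all(|b| b.is_ascii_hexdigit());
    if !is_sha1_hex(old_oid) || !is_sha1_hex(new_oid) {
        return None;
    }
    Some(ReflogLine { old_oid, new_oid, signature, message })
}

/// Forward iteration: entries are parsed on the fly, one per line.
fn iter_reflog<'a>(content: &'a str) -> impl Iterator<Item = Option<ReflogLine<'a>>> + 'a {
    content.lines().map(|line| parse_reflog_line(line))
}
```

Expiry as described above would then be a filter over `iter_reflog` whose surviving lines are written to a new file.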

Refs Writing

  • You can turn a symbolic ref into a peeled one (i.e. detach a HEAD) with transactions but you cannot turn it back into a symbolic one with that. All that happens directly and outside of transactions.
  • Writing symbolic references like HEAD splits the ref update transparently and across any amount of refs.
  • You cannot delete ref logs using REF_LOG_ONLY but they are deleted with the owning reference.
  • ref transactions
    • there is a transaction hook which gets all transaction data without flags, that is old and new oid and refname, along with the 'action' indicating what happened to the transaction.
    • probably it should be possible to introspect transactions as they are executing, but theoretically this can also happen outside of the method itself.
  • git file lock
    • it looks like they are creating a tempfile with a specified name for locks (exclusive and all using atomic FS ops) which can then potentially be written in the same moment. Definitely good for loose refs that don't exist.
  • loose refs writing intricately knows packed refs, which makes sense in order to keep them consistent.
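
As a rough illustration of the "splits the ref update" note, here is a sketch of how an edit aimed at a symbolic ref could be expanded. It is my reading of the behaviour, not code from git or gitoxide: the symbolic ref itself only receives a log-only edit, and the ref it points to receives the actual change. `Target`, `Edit`, `split_symbolic` and the depth limit are hypothetical names and values.

```rust
use std::collections::HashMap;

/// What a ref currently points to in this toy store (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Target {
    Peeled(String),
    Symbolic(String),
}

/// A single planned change (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Edit {
    name: String,
    new_oid: String,
    /// If true, only the reflog of `name` is written, the ref itself stays as is.
    log_only: bool,
}

/// Follow symbolic refs starting at `edit.name`. Each symbolic ref on the way
/// gets a log-only edit, the final non-symbolic ref gets the real update.
/// Returns `None` if the chain is longer than `MAX_DEPTH` (e.g. a cycle).
fn split_symbolic(store: &HashMap<String, Target>, edit: Edit) -> Option<Vec<Edit>> {
    const MAX_DEPTH: usize = 5; // arbitrary bound for this sketch
    let mut out = Vec::new();
    let mut current = edit;
    for _ in 0..=MAX_DEPTH {
        match store.get(&current.name) {
            Some(Target::Symbolic(referent)) => {
                let next = Edit {
                    name: referent.clone(),
                    new_oid: current.new_oid.clone(),
                    log_only: false,
                };
                current.log_only = true;
                out.push(current);
                current = next;
            }
            // Peeled or not yet existing: this is where the update lands.
            _ => {
                out.push(current);
                return Some(out);
            }
        }
    }
    None
}
```

With `HEAD` pointing at `refs/heads/main`, an edit of `HEAD` yields two edits: a log-only one for `HEAD` and the actual update for `refs/heads/main`.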

File Locking

  • investigate tempfile to conclude that it's certainly great as reference but won't be exactly what git does. Let's see if it's needed after all to do it exactly like that. Git definitely sets up signal handlers to delete tempfiles so probably these will have to be threadsafe or interned objects.
  • If directories are involved, use raceproof file creation
  • lockfile.c holds the entire blocking implementation, including backoff. Looks like that's git-lock.
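
For orientation, the git-style lock described here can be sketched with the standard library alone: create `<resource>.lock` exclusively, retry with exponential backoff while it exists, and rename it over the resource to commit. This is a simplified sketch, not what git-lock or lockfile.c actually do. `LockFile`, `acquire` and `commit` are hypothetical names, the delays are arbitrary, and randomized backoff, signal cleanup and race-proof directory handling are left out.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, Instant};

/// Holds `<resource>.lock` until committed or dropped (hypothetical type).
struct LockFile {
    file: Option<File>,
    lock_path: PathBuf,
    resource: PathBuf,
    committed: bool,
}

impl LockFile {
    /// Try to create the lock file exclusively, backing off exponentially
    /// for up to `max_wait` while another holder has it.
    fn acquire(resource: &Path, max_wait: Duration) -> io::Result<Self> {
        let mut name = resource.as_os_str().to_owned();
        name.push(".lock");
        let lock_path = PathBuf::from(name);
        let start = Instant::now();
        let mut delay = Duration::from_millis(1);
        loop {
            // `create_new` fails if the file exists, which is the atomic FS operation
            // that makes the lock exclusive.
            match OpenOptions::new().write(true).create_new(true).open(&lock_path) {
                Ok(file) => {
                    return Ok(Self {
                        file: Some(file),
                        lock_path,
                        resource: resource.to_owned(),
                        committed: false,
                    })
                }
                Err(err) if err.kind() == io::ErrorKind::AlreadyExists && start.elapsed() < max_wait => {
                    thread::sleep(delay);
                    delay = (delay * 2).min(Duration::from_millis(100));
                }
                Err(err) => return Err(err),
            }
        }
    }

    /// Write the new content into the lock file and move it over the resource.
    fn commit(mut self, content: &[u8]) -> io::Result<()> {
        let mut file = self.file.take().expect("file is present until commit");
        file.write_all(content)?;
        file.sync_all()?;
        drop(file); // close before renaming
        fs::rename(&self.lock_path, &self.resource)?;
        self.committed = true;
        Ok(())
    }
}

impl Drop for LockFile {
    fn drop(&mut self) {
        // Roll back: an uncommitted lock must not stay behind.
        if !self.committed {
            let _ = fs::remove_file(&self.lock_path);
        }
    }
}
```

The `Drop` implementation only covers normal unwinding. Cleaning up on signals is what the registered tempfiles mentioned above are for.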

Reflogs

  • The file is read line by line and entries are handled on the fly using iterators, easiest to use bstr::lines() there.
  • reverse iterators use a buffer of 1024 bytes to seek lines backwards
  • parsing is here
  • for expiry the file is rewritten based on iteration
  • for new reflogs, these are appended (only)
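
The reverse iteration noted above could look roughly like the following sketch: read fixed-size chunks from the end of the file with seeks and hand out complete lines, last line first. `ReverseLines` is a hypothetical name. A plain `Vec` is used instead of a `VecDeque` to keep it short, and the buffer size is a parameter (1024 would match the note above).

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Yields the lines of `reader` from last to first, without trailing newlines.
struct ReverseLines<R> {
    reader: R,
    /// Offset of the first byte that has not been read into `pending` yet.
    pos: u64,
    /// Bytes read so far that were not handed out yet, in file order.
    pending: Vec<u8>,
    buf_size: usize,
}

impl<R: Read + Seek> ReverseLines<R> {
    fn new(mut reader: R, buf_size: usize) -> io::Result<Self> {
        let pos = reader.seek(SeekFrom::End(0))?;
        Ok(Self { reader, pos, pending: Vec::new(), buf_size: buf_size.max(1) })
    }

    /// Read the chunk that ends at `self.pos` and put it in front of `pending`.
    fn refill(&mut self) -> io::Result<()> {
        let len = (self.buf_size as u64).min(self.pos) as usize;
        let start = self.pos - len as u64;
        self.reader.seek(SeekFrom::Start(start))?;
        let mut chunk = vec![0u8; len];
        self.reader.read_exact(&mut chunk)?;
        chunk.extend_from_slice(&self.pending);
        self.pending = chunk;
        self.pos = start;
        Ok(())
    }
}

impl<R: Read + Seek> Iterator for ReverseLines<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if self.pending.is_empty() && self.pos == 0 {
                return None;
            }
            // A newline at the very end terminates the line we are looking for.
            let end = match self.pending.last() {
                Some(b'\n') => self.pending.len() - 1,
                _ => self.pending.len(),
            };
            if let Some(i) = self.pending[..end].iter().rposition(|&b| b == b'\n') {
                let line = self.pending[i + 1..end].to_vec();
                self.pending.truncate(i + 1); // keep the previous line's terminator
                return Some(Ok(line));
            }
            if self.pos == 0 {
                // Start of file reached: what is left is the first line.
                let line = self.pending[..end].to_vec();
                self.pending.clear();
                return Some(Ok(line));
            }
            if let Err(err) = self.refill() {
                // Stop iterating after reporting the error.
                self.pending.clear();
                self.pos = 0;
                return Some(Err(err));
            }
        }
    }
}
```

It works on anything that is `Read + Seek`, so a `std::fs::File` opened on a reflog or an in-memory `std::io::Cursor` both fit.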

Refs Writing

  • git file lock
    • cargo uses flock for comparison with different semantics.
    • fslock seems a bit newer and has a few tests
    • fs2 does not compile anymore and seems unmaintained for years now. Can do more than we need, too.
    • file-lock is posix only but uses fcntl under the hood.

Signal-Hook

  • The use of mutexes is unsafe as the current thread might be interrupted while holding the mutex. When trying to obtain a lock in the handler the thread will inevitably deadlock.
  • Memory allocation and deallocation are not allowed! So inside a handler, values that would otherwise be dropped have to be leaked with std::mem::forget to stay correct.
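
Given these constraints, about the only thing a handler can safely do is flip an atomic flag that the interrupted code polls. A minimal sketch of that shape follows. `on_interrupt` and `Interruptible` are hypothetical names, and actually registering the handler (for example through the signal-hook crate) is not shown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// What a handler body would be limited to: a single atomic store.
/// No mutex is taken and nothing is allocated or freed.
fn on_interrupt(flag: &AtomicBool) {
    flag.store(true, Ordering::SeqCst);
}

/// Wraps an iterator and ends it early once the flag is set. The flag is
/// passed in by the caller, so no global state is required.
struct Interruptible<'a, I> {
    inner: I,
    should_interrupt: &'a AtomicBool,
}

impl<'a, I: Iterator> Iterator for Interruptible<'a, I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        if self.should_interrupt.load(Ordering::Relaxed) {
            return None;
        }
        self.inner.next()
    }
}
```

A long-running operation would wrap its work items in `Interruptible` and stop cleanly at the next item once the flag is set.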

Done Tasks

  • prodash
    • replace usage of ctrlc that starts yet another thread with the signal-hook iterator to process pending events from time to time as part of the ticker thread. Saves a thread and enables proper handler chaining.
  • git-features
    • Replace ctrlc usage with signal-hook (i.e. current atexit handler for interrupts)
    • don't use stdout in the interrupt handler as it uses a mutex under the hood. Instead, allow aborting after the second interrupt in case the application is not responding. It would be great to have a lock-free version of stderr though…
    • Integrate 'git-tempfile' behind feature toggle to allow interrupt handlers to be tempfile handler aware and not interfere.
    • replace existing usage of git_features::interrupt::is_interrupted() with versions of it that are local to the method or function.
    • move git-features::interrupt into git-repository as this kind of utility is for application usage only. There the git-tempfile integration makes sense, too.
  • git-tempfile
    • registered tempfile support to allow deletion on exit (and other signals). Use dashmap as storage.
    • Make sure pid is recorded to assure forking works as expected.
    • docs
    • fix windows build
    • a test validating default handlers are installed
    • release
    • race-proof creation of directories leading to the tempfile
    • a way to use the above for actual tempfiles
    • race-proof deletion of empty directories that conflict with the filename
    • a way to use the above for actual tempfiles
    • differentiate between closed and writable tempfiles in the typesystem to make choice permanent
    • a way to not install any handlers so that git-repository interrupt can run the tempfile removal itself right before aborting.
    • Make with_mut less cumbersome to use by assuming the interrupt handler will indeed abort.
  • git-lock - a crate providing git-style lock files.
    • lock file for update
    • marker for holding a lock
    • exponential backoff
    • the above with randomization
    • actual retries with blocking sleep
    • test for the above
  • git-refs
    • sketch transaction type
    • figure out whether or not to 'extend' the API to include changes from Symbolic refs to peeled ones in transactions
    • git signature parsing code is shared and moved to git-actor
    • git-object uses git-actor
    • git-object: unify nom error handling everywhere (to reuse the nom error handling machinery instead of re-inventing it)
    • git-object can use verbose errors and () - unit errors per feature toggle.
    • parse ref log line
    • reflog forward iteration
    • reflog backward iteration
    • file reflog writing
    • git-tempfile close (Handler -> Handle)
    • git-lock File close and Marker persist
    • an API to access ref logs for a reference
    • create single symbolic ref without reflog
    • split refs and reusable edit preprocessing
    • delete refs with reflog handling
    • handle parent links for 'old' oid in the log of parent refs
    • handle parent links for error messages of reference names (for lock errors at least)
    • Figure out how to deal with 'previous-value' ambiguity with create-or-update modes.
    • git-lock commit() is recoverable
    • commit()'ing onto empty directories can delete the directory in git-ref
    • internal reflog writing or appending for locked refs
    • persisting lock file onto an empty directory deletes the empty directory and tries again
    • create or update refs with reflog handling
    • research different mmap implementation but ultimately stick to fast-and-simple filebuffer
    • packed-refs iteration - important for being able to read all refs during packfile negotiation
    • iter packed refs from separately loaded buffer
    • iter loose refs with prefix
    • packed-refs lookup with binary search (full-paths)
    • packed-refs lookup with binary search (partial-paths), following lookup rules
    • re-add perf test of sorts, see script to generate big pack file
      • ~6.2 million/s in iteration and 720k/s for lookups/finds using full paths
    • use binary search to find start point for packed prefix iteration
    • iterate all refs (including packed ones)
    • the above, with prefix filtering
    • find_one uses packed-refs if available (use appropriate strategy for reading in full or mapping)
    • remove and test remaining todos
    • packed-refs writing and integration with transaction (must be) - deletions have to be propagated, updates only go to refs (I think, check)
    • packed-refs - updates, assure it doesn't care about mismatches because that's the only thing we need #138
    • packed-refs - write-through - apply creates/updates to packed-refs and optionally delete the original refs (purge) #139
    • Reference::peel_to_id() should optionally peel tags as well to obtain the final object id #140
    • [git-ref] namespaces support #152
    • Make sure broken/invalid loose refs don't break ref iteration and have a way to find them
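
The packed-refs items in the list above (lookup by full path with binary search, and using binary search to find the start of a prefix iteration) can be illustrated with a small sketch. It assumes the file is sorted by ref name, as its `sorted` header trait promises, and it parses everything into a `Vec` first, which is a simplification compared to searching the mapped buffer directly. `PackedRef`, `parse_packed_refs`, `find` and `with_prefix` are hypothetical names.

```rust
/// One record of a packed-refs file (hypothetical type).
#[derive(Debug, PartialEq)]
struct PackedRef<'a> {
    name: &'a str,
    target: &'a str,
    /// Set if the record is followed by a `^<oid>` line (a peeled tag).
    peeled: Option<&'a str>,
}

/// Parse `<oid> <name>` records, skipping the `#` header and attaching
/// `^<oid>` lines to the record before them.
fn parse_packed_refs(buf: &str) -> Vec<PackedRef<'_>> {
    let mut refs: Vec<PackedRef<'_>> = Vec::new();
    for line in buf.lines() {
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some(peeled) = line.strip_prefix('^') {
            if let Some(last) = refs.last_mut() {
                last.peeled = Some(peeled);
            }
            continue;
        }
        if let Some((target, name)) = line.split_once(' ') {
            refs.push(PackedRef { name, target, peeled: None });
        }
    }
    refs
}

/// Lookup by full ref name using binary search. Requires sorted input.
fn find<'a, 'b>(refs: &'b [PackedRef<'a>], full_name: &str) -> Option<&'b PackedRef<'a>> {
    refs.binary_search_by(|r| r.name.cmp(full_name)).ok().map(|i| &refs[i])
}

/// All records whose name starts with `prefix`. Binary search finds the
/// start, and since the input is sorted the matches are contiguous.
fn with_prefix<'a, 'b>(refs: &'b [PackedRef<'a>], prefix: &str) -> &'b [PackedRef<'a>] {
    let start = refs.partition_point(|r| r.name < prefix);
    let len = refs[start..].partition_point(|r| r.name.starts_with(prefix));
    &refs[start..start + len]
}
```

Partial-name lookups following git's lookup rules would try several candidate full names (such as `refs/heads/<name>`) through `find`. That part is not shown.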
@Byron Byron added this to To do in Collaboration Board via automation Jun 13, 2021
@Nytelife26

What's the status on gixp clone? I'm very much interested in helping out on that front.

@Byron
Owner Author

Byron commented Sep 29, 2021

gixp clone as it's seen here would only clone bare repositories. The biggest requirement for achieving work tree checkouts is to implement git-index. Doing so requires a serious investment in time and great attention to detail. There may be smaller tasks on the way but ultimately, git-index is what's needed to clone a repository with a work tree.

If this outlook isn't too frightening for you, I'd be happy to get you involved in some capacity.

@Nytelife26

I have never contributed to gitoxide so I'm not too familiar with it yet, but I learn things quickly - nothing frightens me :) so yes, I'm more than happy to try things out if you give me some pointers in the right direction.

@Byron
Owner Author

Byron commented Sep 30, 2021

Have you had a chance to check out the backlog here? https://github.com/Byron/gitoxide/projects/1

A good way to get acquainted with gitoxide would probably be to use it by further oxidizing some crates that are using git2 ATM but could already use gitoxide. This would inevitably lead to some features being implemented or improved along the way.

Speaking of features, I think what's desperately needed is commit ancestor traversal sorted by commit time.

A way forward would be for you to find something you are comfortable getting started with, then we could even kick it off in a 1:1.

Just let me know.

PS: I connected to you on keybase, a way to reach out to me in a more realtime and private fashion, as needed.

@pwnorbitals

@Nytelife26 @Byron Have you had the chance to make progress on this one? :)

@Byron
Owner Author

Byron commented Mar 21, 2022

All building blocks for a bare clone exist, they haven't been put into a cohesive package though.

A non-bare clone is in the works which will include the bare one by its very nature.

@Byron Byron added the C-tracking-issue An issue to track the progress of multiple PRs or issues label May 15, 2022