stop using libgit2 #2679

Open
StefanKarpinski opened this issue Jul 30, 2021 · 15 comments

Comments

@StefanKarpinski
Sponsor Member

Using libgit2 has not been a walk in the park. The original motivation for using it was that Pkg2 worked with git repos for all package installs and did a lot of small git operations, which were quite slow when using CLI git to spawn an external process for each operation (especially on Windows). So at the time using libgit2 seemed like a big win. However, libgit2 has a complex, sprawling API that's hard to wrap and use. We've mostly struggled through that. But there are still problems: it's very common for people to find themselves in situations where CLI git can do something fine but Pkg can't because libgit2 can't. This is typically an issue with authentication, but not always. Some example problem issues for libgit2: #1247, #911, #2329, #2485, and there are more. Each of these could be fixed, but still, it's a drag that libgit2 so often doesn't work when git just works.

Now we face a seemingly insurmountable security problem with continuing to use libgit2: it depends on libssh2 for SSH functionality, which is unmaintained and has multiple open CVEs. Moreover, libgit2 has seemingly no intention of supporting a different SSH engine like libssh.

So the time has come to stop using libgit2+libssh2 entirely and switch back to only using an external git CLI tool. This issue is to discuss the strategy for doing that.

@StefanKarpinski
Sponsor Member Author

High level, at this point we pretty much only use git for two things (that I can think of):

  1. To dev a package. For this we can use git clone instead.

  2. To install specific trees for ] add with a commit/tree hash, including in cases where we can't install via package server or github tarball download. For this we can make a bare clone of the repo and pipe the output of git archive to Tar.extract in order to install a specific tree hash the same way.
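The second item (bare clone, then git archive piped into Tar.extract) could be sketched roughly as follows. This is a minimal sketch, not how Pkg does it: the function names install_tree, bare_clone_cmd and archive_cmd are hypothetical, and it assumes a git binary is available and that the requested tree hash is reachable in the cloned repository.

```julia
using Tar

# Hypothetical helpers (not part of Pkg): build the two git invocations.
bare_clone_cmd(git, url, dir) = `$git clone --bare --quiet $url $dir`
archive_cmd(git, dir, tree) = `$git -C $dir archive --format=tar $tree`

# Install the tree `tree` from the repository at `url` into `dest`.
# `dest` must not exist yet or must be an empty directory (Tar.extract requires this).
function install_tree(url::AbstractString, tree::AbstractString,
                      dest::AbstractString; git::AbstractString="git")
    mktempdir() do tmp
        bare = joinpath(tmp, "repo.git")
        run(bare_clone_cmd(git, url, bare))
        # Stream the tarball from git's stdout straight into Tar.extract,
        # so nothing is written to disk except the extracted tree.
        open(archive_cmd(git, bare, tree)) do io
            Tar.extract(io, dest)
        end
    end
    return dest
end
```

Keeping the command construction in separate helpers makes it possible to swap in a different git binary (for example one provided by a JLL) without touching the extraction logic.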

There's also the issue that some users might not have git installed. For the dev case it would be reasonable to instruct them to just install git themselves, since without it they're not going to be able to work with the dev'ed repo. For the add case, however, we may want something more automatic. One option would be to ship Julia with git, but on Windows that's pretty heavyweight since it includes a whole UNIX emulation system. But maybe we could, instead of shipping with it, ask the user if they want to install it via artifacts in the relatively rare case that they (a) can't install using the pkg server, (b) can't install via GitHub/GitLab tarball downloads and (c) don't already have a git binary somewhere in their path. I.e. in that situation, ask the user whether they want to automatically install and use git_jll or give up, and if they say yes, install git_jll for them and use it. Or we could just install and use it.

@KristofferC
Sponsor Member

KristofferC commented Jul 30, 2021

Just swap the default in the following definition?

use_cli_git() = get(ENV, "JULIA_PKG_USE_CLI_GIT", "") == "true"

We can still use libgit2 for everything else that is just handling repos locally.

Out of curiosity, it would be interesting to see the Rust people's view on libgit2 since I know they take security pretty seriously and also use it for Cargo.

@StefanKarpinski
Sponsor Member Author

That would be an easy first cut of the change, but I'd rather not maintain two very different sets of code and I'd love to ditch LibGit2 as a dependency of Julia (although we may have to keep it around as a compatibility relic). If we can do everything we need to do with CLI git and we can either use the user's git or install our own via BinaryBuilder, then we can use one code base for everything and just decide which git to use.

@DilumAluthge
Member

DilumAluthge commented Aug 4, 2021

I'm a big fan of removing libgit2 and only having a single set of code, which would use command-line Git.

I'd suggest a user API like this:

We look at an environment variable with a name like e.g. JULIA_PKG_USE_SYSTEM_GIT, and based on the value of that environment variable:

  1. If the environment variable is set to true or 1, we use the system Git.
  2. If the environment variable is set to false or 0, we automatically download Git_jll the first time that we need to do a Git operation.
  3. If the environment variable is set to something other than the above values, or if the environment variable is not set, then we try to auto-detect whether or not the system has Git installed. If the system has Git installed (e.g. if Sys.which("git") !== nothing), then we use the system Git, otherwise we automatically download Git_jll the first time that we need to do a Git operation.
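The three cases above amount to a small decision function. The following is only a sketch of that logic: the function name git_source and the returned symbols :system and :jll are made up for illustration, and the environment variable name is the one floated above, not an existing Pkg setting. The environment and the detected git path are passed in as arguments so the logic can be exercised without touching the real ENV.

```julia
# Hypothetical decision function for the proposal above.
# Returns :system to use the system Git, or :jll to download Git_jll on first use.
function git_source(env=ENV; system_git=Sys.which("git"))
    value = get(env, "JULIA_PKG_USE_SYSTEM_GIT", "")
    if value in ("true", "1")
        return :system                      # case 1: explicitly use system Git
    elseif value in ("false", "0")
        return :jll                         # case 2: explicitly use Git_jll
    else
        # case 3: unset or unrecognized value, so auto-detect
        return system_git === nothing ? :jll : :system
    end
end
```

For example, git_source(Dict("JULIA_PKG_USE_SYSTEM_GIT" => "0")) selects Git_jll regardless of whether a system Git is present.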

I think for most users, the "automatically download Git_jll the first time that we need to do a Git operation" approach will work best, because then we don't need to ship Git_jll with Julia. In fact, we have precedent in Julia/Pkg for "automatically download X the first time we need to do a Y operation". Specifically, we don't ship the General registry with Julia. Instead, we automatically download the General registry the first time we need to do a Pkg operation.

However, it would be nice to have some option (maybe an option in Make.user) to build Julia with Git_jll already included. That way, if you want to e.g. install Julia on a machine that doesn't have internet access and doesn't have a system Git, you can do so, and then you can Pkg.Registry.add local registries on that machine and Pkg.add local Git package repos on that machine without needing internet access to download Git_jll. This is a use case that I run into a lot.

@DilumAluthge
Member

DilumAluthge commented Aug 4, 2021

Out of curiosity, it would be interesting to see the Rust people's view on libgit2 since I know they take security pretty seriously and also use it for Cargo.

Yeah, it would be good to hear what the Cargo people think about the libssh2 security issues.

@DilumAluthge
Member

But maybe we could, instead of shipping with it, ask the user if they want to install it via artifacts in the relatively rare case that they (a) can't install using the pkg server, (b) can't install via GitHub/GitLab tarball downloads and (c) don't already have a git binary somewhere in their path. I.e. in that situation prompt the user if they want to automatically install and use git_jll or give up and if they say yes, install git_jll for them and use it. Or we could just install and use it.

I would just automatically download and use Git_jll, instead of prompting the user.

@KristofferC
Sponsor Member

Just for reference, what Cargo does is described at https://doc.rust-lang.org/cargo/appendix/git-authentication.html for HTTPS + SSH authentication, and for more complicated authentication you opt in to using the git CLI (https://doc.rust-lang.org/cargo/reference/config.html#netgit-fetch-with-cli), which is also what we do right now.

Regarding completely swapping out LibGit2, that feels like a lot of extra churn for no real reason. Having to rewrite all the nice library API calls into parsing ugly text from CLI calls will be a maintenance burden and it also prevents using nice features like custom progress bars. Having the option to use the CLI git exactly at the point where libgit2 has issues (fancy authentication when downloading something) sounds good enough to me.

@StefanKarpinski
Sponsor Member Author

Note that it's libssh2 that has outstanding security issues and isn't maintained; libssh is fine but isn't what libgit2 uses.

@mkitti
Contributor

mkitti commented Jan 7, 2022

There is a PR to add libssh to libgit2: libgit2/libgit2#5253

@KristofferC
Sponsor Member

I've been against this in the past because the API is kind of nice when dealing with git repos internally and it also allows you to get a nice progress bar. But the fact of the matter is that every time someone has a problem with git, it seems that using the external git fixes it...

So I'm coming around to the idea that maybe we should try to get rid of libgit2. This would also allow moving LibGit2.jl from a stdlib to a normal package.

I think this can be done in multiple steps:

  1. Use git if it is found, otherwise fall back to libgit2.
  2. Use git if it is found, otherwise download git (e.g. Git_jll)
  3. Move over functions using LibGit2 to Git one by one.
  4. Stop depending on LibGit2.

@ericphanson
Contributor

2. Use git if it is found, otherwise download git (e.g. Git_jll)

Just as a small practical matter, my understanding is that using Git_jll directly doesn’t work well and one needs to use Git.jl, but that also doesn’t work well with private packages on macOS: JuliaVersionControl/Git.jl#40

@giordano
Contributor

giordano commented Feb 1, 2023

but that also doesn’t work well with private packages on MacOS: JuliaVersionControl/Git.jl#40

Side note, I attempted to work around the problem on macOS in JuliaPackaging/Yggdrasil#4987, but last time I tried it, it didn't seem to actually solve that problem.

@GunnarFarneback
Contributor

The Git.jl issue on macOS was resolved in JuliaVersionControl/Git.jl#45.

@StefanKarpinski
Sponsor Member Author

Since this is a fair bit of work and in the meantime we have a broken default, how about, instead of requiring JULIA_PKG_USE_CLI_GIT=true, we automatically use CLI git if it exists? That way libgit2 would only be a fallback for users who don't have CLI git. We could still allow opting out of CLI git with JULIA_PKG_USE_CLI_GIT=false.
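A possible shape for that changed default is sketched below. It is not the actual Pkg implementation: the two-argument form with an injectable environment and git path is an assumption made so the logic can be tested in isolation, and treating an explicit "true" as forcing CLI git even when no git is detected is also an assumption (it keeps the current opt-in behavior unchanged).

```julia
# Sketch of the proposed default: prefer CLI git when one is found,
# fall back to libgit2 otherwise, and honor an explicit opt-in or opt-out.
function use_cli_git(env=ENV; git_path=Sys.which("git"))
    setting = get(env, "JULIA_PKG_USE_CLI_GIT", "")
    setting == "true" && return true     # explicit opt-in (assumed to keep working as today)
    setting == "false" && return false   # explicit opt-out: always use libgit2
    return git_path !== nothing          # default: use CLI git only if it exists
end
```

With this shape, users who already set JULIA_PKG_USE_CLI_GIT=true see no change, and everyone else gets CLI git whenever Sys.which("git") finds one.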

@ethomson

👋 libgit2 maintainer here. I found this issue linked from a different issue, and I’m obviously sad to see it. I certainly won’t begrudge any decisions that you make but I’d be very pleased to work with you to overcome any challenges that you have with libgit2.

Concretely, we agree that libssh2 is … not an ideal choice for many users. We recently added support for using the command-line ssh, which may be a good choice for many CLIs that use libgit2. But I’d love feedback on what we should improve, and on whether there are ways that I can help you continue to use libgit2.
