Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yarn Berry: run git lfs pull before running yarn #5833

Merged
merged 11 commits into from
Nov 1, 2022

Conversation

jtbandes
Copy link
Contributor

@jtbandes jtbandes commented Oct 5, 2022

cc @jurre @defunctzombie
Run git lfs pull --include .yarn before using yarn for dependency updating. Yarn plugins stored with LFS can now be used.

Resolves LFS issues described in #1297, see example Dependabot PR at foxglove/studio#4528

Tested with bin/dry-run.rb npm_and_yarn foxglove/studio --clone --dep rc-tree, which now updates yarn.lock instead of failing with an error.

Ran unit test with cd npm_and_yarn && bundle exec rspec spec/dependabot/npm_and_yarn/file_fetcher_spec.rb -e "with .yarn data stored in git-lfs"

@defunctzombie
Copy link
Contributor

Besides storing pre-built plugins using LFS. Another good use case for LFS with yarn projects is the offline cache: https://yarnpkg.com/features/offline-cache

@jurre
Copy link
Member

jurre commented Oct 5, 2022

Thanks for opening a PR so we have a place to discuss the approach.

I am curious if this setup is a common one, do you know of more projects following this approach of storing yarn assets in LFS?

@jtbandes
Copy link
Contributor Author

jtbandes commented Oct 5, 2022

I am curious if this setup is a common one, do you know of more projects following this approach of storing yarn assets in LFS?

You can check out renovatebot/renovate#6842 for some other people who are interested in LFS support for bumping dependencies.

We would probably want to inspect the yarn config in order to determine what path to pull, as the cache and plugin folders are configurable I believe

This is a good point. However, it seems like it would add a lot of complexity to this otherwise simple patch. Do you think it would make more sense to add a config option to dependabot.yml that allows the user to specify which paths need to be git lfs pulled? I would guess that for most yarn users, the default of pulling only .yarn would suffice. Would it be reasonable to only pull .yarn for now and wait to see if users actually request the behavior of pulling arbitrary custom paths?

@defunctzombie
Copy link
Contributor

Not having a cache does seem to allow yarn to continue unlike not having the plugins.

FWIW there are open issues on the yarn repo around installing plugins automatically: yarnpkg/berry#4464 but no PR has been accepted yet.

@jurre
Copy link
Member

jurre commented Oct 7, 2022

Do you think it would make more sense to add a config option to dependabot.yml that allows the user to specify which paths need to be git lfs pulled?

No we're really trying to keep the config file as simple as possible and avoid any ecosystem specific configuration, especially if like in this case the setting can be inferred from the repo.

We have a bit of code that checks the config file for the cache path already, so I think we could just run some similar code in the repo content path we just fetched, parse the yaml and check what the correct path to pull is?

YAML.load_file(".yarnrc.yml").fetch("cacheFolder", "./.yarn/cache")

Would it be reasonable to only pull .yarn for now and wait to see if users actually request the behavior of pulling arbitrary custom paths?

That doesn't sound unreasonable, but it would be nice if we can make it work out of the box?

Not having a cache does seem to allow yarn to continue unlike not having the plugins.

And plugins always go in .yarn/plugins? Not pulling the yarn cache correctly would cause some broken behavior I think though, because Dependabot now wouldn't add the right entry to the yarn cache.

Overall I think the approach here seems reasonable and I can't see it leading to any problems so I'd be inclined to accept this , it would be really nice if you could take a look at inferring the yarn cache path from the config file. Would also be great if we could add a test case for this somehow, but I'd have to dig into how those are set up for cloning currently

Going to run CI to ensure this doesn't break things unexpectedly

@jtbandes jtbandes requested a review from jurre October 7, 2022 16:52
@jtbandes
Copy link
Contributor Author

jtbandes commented Oct 11, 2022

@jurre any further thoughts on this change? I've implemented some of your feedback, please let me know if there are any more changes you'd like to see.

@jurre
Copy link
Member

jurre commented Oct 11, 2022

Hey, so first of all thanks for those changes, I think this should do the trick!

One last thing I would like to request is that we add some tests around this, here is an example of the existing tests we have around cloning:

context "with repo_contents_path" do
let(:repo_contents_path) { Dir.mktmpdir }
after { FileUtils.rm_rf(repo_contents_path) }
describe "#files" do
subject(:files) { file_fetcher_instance.files }
let(:contents) { "foo=1.0.0" }
# `git clone` against a file:// URL that is filled by the test
let(:repo_path) { Dir.mktmpdir }
after { FileUtils.rm_rf(repo_path) }
let(:fill_repo) { nil }
before do
Dir.chdir(repo_path) do
`git init .`
fill_repo
`git add .`
`git commit --allow-empty -m'fake clone source'`
end
allow(source).
to receive(:url).and_return("file://#{repo_path}")
allow(file_fetcher_instance).to receive(:commit).and_return("sha")
end

Let me know if you need any help setting that up!

@jtbandes
Copy link
Contributor Author

@jurre Yeah, some guidance around this would be helpful, since I am not familiar with rspec or this repo's test organization. For instance, are there any existing tests for file fetchers of specific package ecosystems? If I were to add this one would you imagine adding it to the base_spec.rb you linked, or making a new spec file specific to npm_and_yarn? Would you like to see a fixture that actually includes data stored in LFS or just a test that verifies that the git lfs pull command is run?

@jurre
Copy link
Member

jurre commented Oct 13, 2022

For instance, are there any existing tests for file fetchers of specific package ecosystems?

Yep, they all have ecosystem specific file fetcher specs, they'll be in <ecosystem>/spec/dependabot/<ecosystem>/file_fetcher_spec.rb. The go_mod ecosystem is somewhat relevant since it also uses a cloning strategy (vs using the GitHub API which we do for most ecosystems), but we don't perform any checks around this.

If I were to add this one would you imagine adding it to the base_spec.rb you linked, or making a new spec file specific to npm_and_yarn?

I'd add it to the npm_and_yarn one: https://github.com/dependabot/dependabot-core/blob/8d60d39e19bca9ba738e497fae31fbe0323386f3/npm_and_yarn/spec/dependabot/npm_and_yarn/file_fetcher_spec.rb

Would you like to see a fixture that actually includes data stored in LFS or just a test that verifies that the git lfs pull command is run?

I think the best way to go would be to set up a fixture that includes lfs data, and then verify that it is included in the fetched files

Here's a snippet of useful context on when the file fetcher uses the cloning strategy:

# Files are typically grabbed individually via the source's API.
# repo_contents_path is an optional empty directory that will be used
# to clone the entire source repository on first read.
#
# If provided, file _data_ will be loaded from the clone.
, so that means that when passing the FileFetcher a repo_contents_path, it'll clone the repo, otherwise use the API. In the case of Yarn Berry, we already make sure to always set that when actually running Dependabot, but for these tests you'll have to pass it a value (a tmpdir works best) manually

@jtbandes
Copy link
Contributor Author

jtbandes commented Oct 13, 2022

Thanks for the pointers, I will try it out tomorrow. Is it okay for the test to require network access? Otherwise I'm not sure if it will be possible to set up a repo that uses LFS but the files are hosted fully locally (probably possible, maybe not easy).

@jurre
Copy link
Member

jurre commented Oct 13, 2022

Is it okay for the test to require network access?

If we can avoid it then that'll be preferred, but if there's no way around it we do have some other tests accessing the network. Our existing cloning specs do use a local repo I believe, so hopefully we can repurpose that!

@jtbandes
Copy link
Contributor Author

jtbandes commented Oct 13, 2022

Since the whole point of LFS is that git is supposed to fetch the large files from a file server, making it work offline may require a custom adapter like this: https://github.com/sinbad/lfs-folderstore And setting this up requires some git config commands after cloning.

@jurre
Copy link
Member

jurre commented Oct 13, 2022

Ah I see, well, if it's not feasible to do it locally then we can pull from a remote. I'll move the repo into a dedicated org we have for these sort of fixtures once you have a working setup.

Thanks for working on this by the way! 🎉 🙇

@jtbandes
Copy link
Contributor Author

@jurre I've created a fixture repo at https://github.com/jtbandes/dependabot-yarn-lfs-fixture and added the test. Please critique! 😃

@jurre
Copy link
Member

jurre commented Oct 13, 2022

Looks good at a cursory glance! It's getting a bit late on my end so I'll give this a more thorough look tomorrow 🙇

@jtbandes
Copy link
Contributor Author

I've also initiated a transfer of the fixture repo to your account so you can move it to the appropriate place. Feel free to update the PR with any changes necessary.

@jtbandes
Copy link
Contributor Author

@jurre just a quick ping on this, let me know if there is anything else I can do. Thanks for your repeated reviews!

@jurre jurre requested a review from pavera October 18, 2022 08:04
@jtbandes
Copy link
Contributor Author

@jurre The new repo does not have the files properly tracked with Git LFS, so they're just text files: https://github.com/dependabot-fixtures/yarn-berry-lfs-fixture/blob/main/.yarn/releases/yarn-3.2.4.cjs (compare with https://github.com/jtbandes/dependabot-yarn-lfs-fixture/blob/main/.yarn/releases/yarn-3.2.4.cjs)

Luckily the test caught it 😃 I'll re-initiate the transfer so you can just use my repo.

@jtbandes
Copy link
Contributor Author

Looks like the rest of the test failures are some intermittent network issues.

@jurre jurre requested a review from a team October 27, 2022 13:54
end
path
end
rescue StandardError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to return the path and carry on for any error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What other options do you have in mind? I’m not sure what meaningful recovery could be done here…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me first better understand, which errors are you anticipating?

Here are the ones I can think of:

  • Dependabot::SharedHelpers::HelperSubprocessFailed raised by run_shell_command in super or here
  • NoMethodError if .yarnrc.yml is empty, in which case YAML.load_file will return false, which does not respond to fetch
  • Psych::ParseError if the YAML in the .yarnrc.yml is invalid

If we encounter an exception outside of these, I think we'd want to let it fail? @jurre might know better 😅

Copy link
Member

@jurre jurre Oct 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's reasonable to continue without lfs pull in those cases rather than fail the update entirely?

You do raise a few good error cases (the empty or invalid .yarnrc.yml for example) that we should probably handle?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related:

It took a bit to track down why this started failing on main since the error gets swallowed... that means users who are hitting this failure scenario won't be told why...

@@ -32,6 +32,26 @@ def self.required_files_message

private

# Overridden to pull any yarn data or plugins which may be stored with Git LFS.
def _clone_repo_contents(target_directory:)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least at one time in history, methods like this one in Dependabot::FileFetchers::Base were not meant to be overridden:

#################################################
# INTERNAL METHODS (not for use by sub-classes) #
#################################################

This would be the first instance of a file fetcher subclass overriding it. Could we override public method #clone_repo_contents instead?

@jurre
Copy link
Member

jurre commented Oct 28, 2022

I'll try to find a bit of time to make the suggestions @bdragon shared today and then hopefully we can get this merged. Thanks for your work and patience on this @jtbandes!

@jurre jurre force-pushed the yarn-git-lfs-pull branch 3 times, most recently from 2aaabad to 1808542 Compare October 28, 2022 11:46
@jurre jurre requested a review from bdragon October 28, 2022 11:46
@jurre jurre merged commit 868ef7d into dependabot:main Nov 1, 2022
@jurre
Copy link
Member

jurre commented Nov 1, 2022

Thanks, this is now live

@jtbandes jtbandes deleted the yarn-git-lfs-pull branch November 1, 2022 21:03
@jtbandes
Copy link
Contributor Author

jtbandes commented Nov 1, 2022

Thanks for all your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants