Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/shard: new command #15513

Closed

Conversation

razvanazamfirei
Copy link
Member

@razvanazamfirei razvanazamfirei commented Jun 4, 2023

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

This PR adds brew shard, a command used to shard repositories and, more importantly, audit that files are in the correct place.

Still needs tests written and making sure flags work as intended, but I appreciate any feedback.

Currently I'm using git mv for moving files in order to preserve history. It's a bit slow, but this should be noticeable only on the initial sharding.

@razvanazamfirei razvanazamfirei changed the title brew shard: new command cmd/shard: new command Jun 4, 2023
@razvanazamfirei razvanazamfirei added the features New features label Jun 4, 2023
Copy link
Member

@carlocab carlocab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you describe your intended sharding scheme/strategy here more explicitly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be in dev-cmd?

Comment on lines +21 to +22
switch "--check",
description: "Audit the directory for sharding."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just be done by default?

end

def move_file(source, destination)
success = system("git mv '#{source}' '#{destination}' 2>/dev/null")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably use system_command here

Library/Homebrew/sharder.rb Outdated Show resolved Hide resolved
Library/Homebrew/sharder.rb Outdated Show resolved Hide resolved
Library/Homebrew/sharder.rb Outdated Show resolved Hide resolved
Library/Homebrew/sharder.rb Outdated Show resolved Hide resolved
Library/Homebrew/sharder.rb Outdated Show resolved Hide resolved
Library/Homebrew/sharder.rb Outdated Show resolved Hide resolved
Library/Homebrew/sharder.rb Outdated Show resolved Hide resolved
@razvanazamfirei
Copy link
Member Author

@carlocab, at least for homebrew-casks the scheme would be:

#{first-letter}/#{first-letter}#{second-letter}

For sharding purposes, all non-alphabetical characters are replaced by _. This does create a case where 0-ad.rb ends up in _/__/0-ad.rb. If there's a better character, happy to change it.

Not a fan of using

"git mv '#{source}' '#{destination}' 2>/dev/null"

The rationale behind it was to try and preserve the file history. system_command is the fallback (and much faster).

I'll integrate the rest of the comments.

Co-authored-by: Carlo Cabrera <30379873+carlocab@users.noreply.github.com>
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @razvanazamfirei, it's much appreciated and I love the forward progress on sharding but I don't think it's quite the right approach for us for a few reasons:

  • I don't think we need either a user-focused or developer-focused command for something that will be run only once per tap
  • the auditing process should be moved into brew audit (or brew style properly but we should mostly do the right thing if they run brew create for something that's going to end up in the right tap
  • similarly, there may be a need to teach other Homebrew internals how to generate the right shard name and this sharding logic will likely vary per-tap and is only needed on two taps (homebrew-core and homebrew-cask)

I'll reply to a couple of other things in the issue thread too.

Thanks again ❤️

@MikeMcQuaid
Copy link
Member

@carlocab, at least for homebrew-casks the scheme would be:

#{first-letter}/#{first-letter}#{second-letter}

Let's have more discussion on the sharding scheme, please. As discussed in Slack: I don't think this sharding scheme makes sense.

I would like to avoid:

  • having more than one additional layer of directory hierarchy
  • having repeated information in the hierarchy
  • having a sharding algorithm that's not incredibly obvious to someone with no experience about where something should live
  • any sharding discussion that doesn't show the breakdown per-directory

For example, my proposed scheme for homebrew/core would be:

hash = {}

CoreTap.instance.formula_names.each do |formula_name|
  first_letter = formula_name[0]

  key = if formula_name.start_with?("lib")
    "lib"
  else
    first_letter
  end 

  hash[key] ||= []
  hash[key] << formula_name
end

hash.each_key {|key| puts "#{key}: #{h[key].size}" }

Which, for each key, gives you (at the time of calculation), this many formulae per directory:

a: 330
b: 244
c: 509
d: 303
e: 148
f: 249
g: 523
h: 158
i: 155
j: 130
k: 126
l: 216
lib: 487
m: 386
n: 201
o: 209
p: 454
q: 59
r: 232
s: 536
t: 311
u: 83
v: 126
w: 129
x: 104
y: 51
z: 75

For sharding purposes, all non-alphabetical characters are replaced by _. This does create a case where 0-ad.rb ends up in _/__/0-ad.rb. If there's a better character, happy to change it.

Don't want to be too harsh here but: _/__/ does seem particularly silly to me and suggests a better scheme e.g. 0/

git mv '#{source}' '#{destination}' 2>/dev/null

git mv doesn't preserve history any more than mv foo bar && git add . does, FYI. Git has rename detection that kicks in whenever a file is deleted and added that is sufficiently similar.

@razvanazamfirei razvanazamfirei marked this pull request as draft June 5, 2023 14:43
@carlocab
Copy link
Member

Just leaving a note here: brew formulae seems to be broken by sharding. If you have homebrew/test-bot tapped, you see homebrew/test-bot/t/testbottest in the output.

@github-actions github-actions bot added the outdated PR was locked due to age label Aug 6, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 6, 2023
@razvanazamfirei razvanazamfirei deleted the add-shard-cmd branch September 11, 2023 13:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
features New features outdated PR was locked due to age
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants