New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keg_relocate: cache files rewritten during brew bottle #1253

Merged
merged 7 commits into from Oct 24, 2016

Conversation

Projects
None yet
7 participants
@jawshooah
Copy link
Contributor

jawshooah commented Oct 10, 2016

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew tests with your changes locally?

brew bottle replaces instances of the Homebrew prefix, cellar, and repository with placeholders in all text files. Cache these files in INSTALL_RECEIPT.json so that we don't have to check every single text file for placeholders on install.

Also add a couple of other optimizations to speed up building bottles.

Fixes #1250.

@jawshooah

This comment has been minimized.

Copy link
Contributor

jawshooah commented Oct 10, 2016

@zmwangx This cuts the install time for mysql on my machine down from ~100s to ~10s.

@@ -777,13 +777,18 @@ def pour
end

keg = Keg.new(formula.prefix)
tab_file = formula.prefix.join(Tab::FILENAME)
# Skip the cache since the receipt will be rewritten

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

Not sure I understand this comment, can you elaborate a bit?

This comment has been minimized.

@jawshooah

jawshooah Oct 10, 2016

Contributor

Sure. Tab.from_file and Tab.for_keg both cache the resulting Tab instance so that when either is called with the same path, the same instance is returned. Tab.from_file_content bypasses the cache. This is desirable here because keg.relocate_text_files will rewrite the tab file, and we want to load in the rewritten file before modifying it and writing it out again a few lines down.

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

I think it would be clearer to nuke the cache before rereading rather than use a method that's not using the cache (and should probably be private as a result).

def relocate_text_files(old_prefix, new_prefix, old_cellar, new_cellar,
old_repository, new_repository)
files = text_files | libtool_files
def relocate_text_files(old_prefix, new_prefix, old_cellar, new_cellar, # rubocop:disable Metrics/ParameterLists

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

Maybe worth passing in a hash instead of disabling this cop.

This comment has been minimized.

@jawshooah

jawshooah Oct 10, 2016

Contributor

Agreed, I'll clean this up.

This comment has been minimized.

@MikeMcQuaid
@@ -75,8 +78,8 @@ def text_files
path.find do |pn|
next if pn.symlink? || pn.directory?
next if Metafiles::EXTENSIONS.include? pn.extname
if Utils.popen_read("/usr/bin/file", "--brief", pn).include?("text") ||
pn.text_executable?
if pn.text_executable? ||

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

Good idea on flipping these 👍

class Metafiles
# https://github.com/github/markup#markups
EXTENSIONS = %w[
.adoc .asc .asciidoc .creole .html .markdown .md .mdown .mediawiki .mkdn
.org .pod .rdoc .rst .rtf .textile .txt .wiki
].freeze
].to_set.freeze

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

If this is a frozen array I'm not sure I see the advantages of making it a set?

This comment has been minimized.

@jawshooah

jawshooah Oct 10, 2016

Contributor

O(1) lookup versus O(n). If order doesn't matter and there are no duplicates, why use an array when you could use a set?

This comment has been minimized.

@jawshooah

jawshooah Oct 10, 2016

Contributor

Keep in mind that Keg#text_files queries this collection once for every file, so bringing the complexity of that search down is worthwhile if we care about brew bottle runtime.

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

The algorithmic speedup is always nice but given Sets aren't strictly interchangeable with arrays I figure it's usually worth using arrays unless profiling shows using sets produces a measurable speedup. Based on your previous comments it sounds like it does so 👍

@@ -205,6 +206,7 @@ def bottle_formula(f)
tab.poured_from_bottle = false
tab.HEAD = nil
tab.time = nil
tab.changed_files = changed_files

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

My only concern with putting it in the tab is that this may be a very long list. How many values are there for e.g. mysql?

This comment has been minimized.

@jawshooah

jawshooah Oct 10, 2016

Contributor

Here's my INSTALL_RECEIPT.json for mysql:

{
  "used_options": [],
  "unused_options": [
    "--with-test",
    "--with-embedded",
    "--with-archive-storage-engine",
    "--with-blackhole-storage-engine",
    "--with-local-infile",
    "--with-debug"
  ],
  "built_as_bottle": true,
  "poured_from_bottle": true,
  "changed_files": [
    "INSTALL_RECEIPT.json",
    "bin/mysql_config",
    "bin/mysqld_multi",
    "bin/mysqld_safe",
    "homebrew.mxcl.mysql.plist",
    "include/mysql/my_config.h",
    "lib/pkgconfig/mysqlclient.pc",
    "mysql-test/Makefile",
    "mysql-test/suite/ndb/include/have_java.inc",
    "share/doc/mysql/INFO_BIN",
    "share/man/man1/myisam_ftdump.1",
    "share/man/man1/mysql.server.1",
    "share/man/man1/mysql_config.1",
    "share/man/man1/mysql_install_db.1",
    "share/man/man1/mysqld_multi.1",
    "share/man/man1/mysqld_safe.1",
    "share/man/man1/mysqldumpslow.1",
    "support-files/mysql-log-rotate",
    "support-files/mysql.server",
    "support-files/mysqld_multi.server"
  ],
  "time": 1476101424,
  "source_modified_time": 1472126415,
  "HEAD": null,
  "stdlib": "libcxx",
  "compiler": "clang",
  "runtime_dependencies": [],
  "source": {
    "path": "/usr/local/Library/Taps/homebrew/homebrew-core/Formula/mysql.rb",
    "tap": "homebrew/core",
    "spec": "stable",
    "versions": {
      "stable": "5.7.15",
      "devel": null,
      "head": null,
      "version_scheme": 0
    }
  }
}

Not too bad, I think. Would you rather output to a separate file?

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

Nah that seems fine, thanks 👍

@MikeMcQuaid

This comment has been minimized.

Copy link
Member

MikeMcQuaid commented Oct 10, 2016

Loving the work here so far, great job @jawshooah!

if Utils.popen_read("/usr/bin/file", "--brief", pn).include?("text") ||
pn.text_executable?
if pn.text_executable? ||
Utils.popen_read("/usr/bin/file", "--brief", pn).include?("text")

This comment has been minimized.

@reitermarkus

reitermarkus Oct 10, 2016

Member

Could it be faster running this only once for all files?

files = path.find.reject { |pn| 
  pn.symlink? || pn.directory || Metafiles::EXTENSIONS.include?(pn.extname)
}

text_file_indices = Utils.popen_read("/usr/bin/file", "--no-dereference", "--brief", *files)
                         .lines
                         .map.with_index { |line, i| line.include?("text") ? i : nil }
                         .compact

text_files = files.values_at(*text_file_indices)

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 10, 2016

Member

@reitermarkus Some local testing looks like it should be, good catch. There's probably a maximum number of arguments we can specify but batching them may well produce a speedup 👍

This comment has been minimized.

@reitermarkus

reitermarkus Oct 10, 2016

Member

I tried to get rid of

once before by using getconf ARG_MAX, but somehow got sidetracked. 😄

@jawshooah jawshooah referenced this pull request Oct 10, 2016

Merged

keg_relocate: run /usr/bin/file in batches #1258

4 of 5 tasks complete

@ilovezfs ilovezfs added cask and removed cask labels Oct 11, 2016

@jawshooah

This comment has been minimized.

Copy link
Contributor

jawshooah commented Oct 14, 2016

@MikeMcQuaid are you still interested in this now that #1258 and friends have been merged? It still offers a significant boost in install time, ~4x from my rough testing.

@MikeMcQuaid

This comment has been minimized.

Copy link
Member

MikeMcQuaid commented Oct 15, 2016

It still offers a significant boost in install time, ~4x from my rough testing.

This was going to be my next question! In that case: yes, definitely! Please also check a bottle with/without this data can still be poured/relocated correctly both without/with this change (i.e. that all four combinations work fine).

Great work here!

@MikeMcQuaid

This comment has been minimized.

Copy link
Member

MikeMcQuaid commented Oct 23, 2016

Any news here @jawshooah? Shout if I can help at all.

old_repository, new_repository)
files = text_files | libtool_files
def replace_locations_with_placeholders
relocate_dynamic_linkage(HOMEBREW_PREFIX.to_s, PREFIX_PLACEHOLDER,

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 23, 2016

Member

Tiny nit: might want to make the two prefix args a key/value of a hash or something? Doesn't really matter but I always get a little 😨 about things that rely on arg ordering like this.

This comment has been minimized.

@MikeMcQuaid

MikeMcQuaid Oct 23, 2016

Member

In fact: like you did below with replacements 😆

This comment has been minimized.

@jawshooah

jawshooah Oct 23, 2016

Contributor

Will do!

This comment has been minimized.

@jawshooah

jawshooah Oct 24, 2016

Contributor

How about a Struct? f798ba4

@jawshooah

This comment has been minimized.

Copy link
Contributor

jawshooah commented Oct 23, 2016

@MikeMcQuaid commented on Oct 23, 2016, 4:34 AM EDT:

Any news here @jawshooah? Shout if I can help at all.

Just need to get around to re-verifying that everything works in the scenarios you mentioned. Nothing blocking me but time 😄

jawshooah added some commits Oct 9, 2016

keg_relocate: cache files rewritten during brew bottle
`brew bottle` replaces instances of the Homebrew prefix, cellar, and
repository with placeholders in all text files. Cache these files in
INSTALL_RECEIPT.json so that we don't have to check every single text
file for placeholders on install.
keg_relocate: refactor relocate_text_files
Replace relocate_text_files with three methods that clarify intent:
replace_locations_with_placeholders, replace_placeholders_with_locations
and replace_text_in_files, the first two calling the third.
@jawshooah

This comment has been minimized.

Copy link
Contributor

jawshooah commented Oct 24, 2016

@MikeMcQuaid I've verified that there are no problems installing with/without the data and with/without the change. Merging now as CI is 💚 and you've already approved the changes.

@jawshooah jawshooah merged commit 79e8cdd into Homebrew:master Oct 24, 2016

4 checks passed

codecov/patch 82.92% of diff hit (target 60.78%)
Details
codecov/project 60.78% (+<.01%) compared to e6bce5e
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
default Build finished.
Details

@jawshooah jawshooah deleted the jawshooah:perf/relocate-text-files branch Oct 24, 2016

@scpeters

This comment has been minimized.

Copy link
Contributor

scpeters commented Oct 25, 2016

this may have broken the mercurial bottle for me, still investigating...

@scpeters

This comment has been minimized.

Copy link
Contributor

scpeters commented Oct 25, 2016

$ head -1 /usr/local/opt/mercurial/bin/hg
#!@@HOMEBREW_PREFIX@@/opt/python/bin/python2.7
@scpeters

This comment has been minimized.

Copy link
Contributor

scpeters commented Oct 25, 2016

I reverted to e6bce5e, and the problem went away, so I think it's an issue with this PR.

Here's what I see when I install mercurial:

brew install mercurial
==> Downloading https://homebrew.bintray.com/bottles/mercurial-3.9.2.yosemite.bottle.tar.gz
Already downloaded: /Users/jenkins/Library/Caches/Homebrew/mercurial-3.9.2.yosemite.bottle.tar.gz
==> Pouring mercurial-3.9.2.yosemite.bottle.tar.gz
Error: No such file or directory - /usr/local/opt/mercurial/bin/hg

There is a file at that location though:

$ head /usr/local/opt/mercurial/bin/hg
#!@@HOMEBREW_PREFIX@@/opt/python/bin/python2.7
#
# mercurial - scalable distributed SCM
#
# Copyright 2005-2007 Matt Mackall <mpm@selenic.com>
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

import os

@scpeters scpeters referenced this pull request Oct 25, 2016

Closed

Major regression in #1253 #1372

3 of 3 tasks complete
@zmwangx

This comment has been minimized.

Copy link
Contributor

zmwangx commented Oct 25, 2016

There's a bug here with deleting pyc files in the wrong order. Causing a failure: https://bot.brew.sh/job/Homebrew%20Core/10080/version=el_capitan/testReport/junit/brew-test-bot/el_capitan/bottle_buku/. Fix in #1375.

ilovezfs added a commit that referenced this pull request Oct 25, 2016

Merge pull request #1376 from jawshooah/fix-relocate-regression
formula_installer: fix regression in #1253

aw1621107 added a commit to aw1621107/brew that referenced this pull request Oct 27, 2016

formula_installer: fix regression in Homebrew#1253
Apparently `cellar :any_skip_relocation` doesn't actually mean we
can skip relocation, at least for text files.

@Homebrew Homebrew locked and limited conversation to collaborators May 3, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.