Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #233 add append and exclude rules to assembly #309

Merged
merged 11 commits into from
Jun 1, 2018

Conversation

rockjam
Copy link
Collaborator

@rockjam rockjam commented May 2, 2018

This PR introduces ability to define rules for duplicated files in assembly. Currently, there are two rules: Append and Exclude

TODO:

  • Tests on append/exclude rules
  • Update docs about duplication rules

@rockjam
Copy link
Collaborator Author

rockjam commented May 2, 2018

Special thanks to @alexarchambault . Code mostly based on assembly code from coursier

@rockjam
Copy link
Collaborator Author

rockjam commented May 2, 2018

@mlerner Want to join the review?


object Assembly {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it right place to have this rules?
For people who want to define their own rules it will be:

import mill.modules.Assembly._
val myRules = Seq(Rule.Append("foo.conf"), Rule.Exclude("bar.txt"))

@rockjam
Copy link
Collaborator Author

rockjam commented May 2, 2018

I found on use-case that won't work in the current implementation. Since we make assembly in two steps: upstreamAssembly and then assembly itself, in assembly step we don't keep track on files that were already written in upstreamAssembly. So for instance, if reference.conf is present in module we are making assembly for, it will rewrite reference.conf written(and concatenated) in upstreamAssembly. I will later write test for this case and try to fix it

@rockjam rockjam force-pushed the assembly-rules branch 3 times, most recently from 310cf99 to 2ead219 Compare May 9, 2018 18:03
@rockjam rockjam force-pushed the assembly-rules branch 2 times, most recently from 2dfd893 to ab00a06 Compare May 13, 2018 23:19
@@ -108,6 +108,8 @@ trait JavaModule extends mill.Module with TaskModule { outer =>
}
}

def assemblyRules: T[Seq[Assembly.Rule]] = T { Assembly.defaultRules }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep it as Target, or we can go with plain def here? Cause if we can go with def, we can remove serialization code for Assembly.Rule

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any real preference. If we go with def, we could also include rules that are not serializable, e.g. containing lambdas. I leave that up to you

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make it def, instead of T, and the users change them while working on their build, will it reevaluate for sure? I tend to think that it will, because we invalidate caches on build change

@rockjam rockjam changed the title [WIP]: fix #233 add append and exclude rules to assembly fix #233 add append and exclude rules to assembly May 13, 2018
@rockjam
Copy link
Collaborator Author

rockjam commented May 13, 2018

PR is ready for review. I will add section to docs if the API for assembly rules is fine.

object core extends HelloWorldModuleWithMain
}

val akkaHttpDeps = Agg(ivy"com.typesafe.akka::akka-http:10.0.13")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really need to break up the tests in this file :/ Maybe not in this PR, but if you could do that in a subsequent one that would be great

Copy link
Collaborator Author

@rockjam rockjam May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with you

case Some(_:Assembly.Rule.Append) =>
append()
case Some(_:Assembly.Rule.Exclude) =>
ctx.log.info(s"Excluding entry: ${mapping}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can skip the logging here; I find the logging from SBT's assembly task more annoying than useful

@@ -257,20 +261,52 @@ object Jvm {
manifest.write(manifestOut)
manifestOut.close()

def isSignatureFile(mapping: String): Boolean =
Set("sf", "rsa", "dsa").exists(ext => mapping.toLowerCase.endsWith(s".$ext"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maintain this behavior in the new default assemblyRules? IIRC we added this because these files were causing problems for someone (there's an issue)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -295,25 +331,45 @@ object Jvm {
PathRef(output)
}

sealed trait AssemblyEntry {
def mapping: String
def is: InputStream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this inputStream rather than .is, for better greppability =)

}

concatEntries.foreach { case (mapping, entries) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to unify the Some(_: Assembly.Rule.Append) logic with the None logic, rather than having them go down a totally different code path? Apart from complexity, I think this would also re-arrange the order of files in the jar, which while harmless is unnecessary and makes debugging harder (e.g. when you jar tf two jars to compare contents)

Perhaps we could make the None case go through concatEntries as well, just over-writing a key rather than appending to it. If we then make concatEntries a LinkedHashMap, that should let us preserve the original ordering of keys

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is - we process concatEntries outside the for comprehension that goes through all the classpath entries. If we process both "write once" and append outside for that means we have to keep all the files in memory, before we write them. I guess we want to avoid this, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rockjam aren't we're only keeping the list of files in concatEntries, and not the file contents themselves, since each AssemblyEntry is just a thin wrapper that allows you to open an InputStream on-demand? As long as we're not keeping their byte contents, and not holding open all their file-handles at the same time, I think keeping the wrappers in memory should be acceptable. e.g. Ammonite only has 16710 file within it's assembly, and even an order of magnitude more

Or we could just leave this as is; I guess a bit of reshuffling isn't the end of the world either.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sure. I forgot that. It was before this PR - we opened input streams in a for loop, but now we don't have to. Yes, that sounds reasonable.

@lihaoyi
Copy link
Member

lihaoyi commented May 14, 2018

@rockjam left a review

case None =>
val path = zipFs.getPath(mapping)
if(!excludePatterns.exists(_(mapping))) {
if(appendPatterns.exists(_(mapping))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic is somewhat incorrect; we appear to be checking both for AssemblyEntry#mapping == Entry#mapping, and also for AssemblyEntry#mapping.asPredicate().test(Entry#mapping). We should treat AssemblyEntry#mapping properly as a regex and only do one test. Otherwise files that don't match the regex, but do match the literal path of the regex, will be unexpected matched (e.g. for a regex hello [world] which should match hello w or hello d but not the entire string hello [world])

object core extends HelloWorldModuleWithMain {
def moduleDeps = Seq(model)

def assemblyRules = T { Seq(Assembly.Rule.Exclude("reference.conf")) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have one unit test for the behavior of anchored regexes? e.g. making sure^reference.conf matches only reference.conf and not foo/reference.conf, and that reference.conf$ matches only foo/reference.conf but not foo/reference.confetti

@lihaoyi
Copy link
Member

lihaoyi commented May 14, 2018

Whatever we end up doing, it's probably worth running some basic benchmarks using System.currentTimeMillis to e.g. assemble the Mill jar and make sure we didn't regress the speed

@rockjam
Copy link
Collaborator Author

rockjam commented May 30, 2018

@lihaoyi Updated PR according to your review. Also moved all that append and exclude logic from Jvm to Assembly object - seems to simplify createAssembly code a little bit.

@lihaoyi
Copy link
Member

lihaoyi commented May 31, 2018

@rockjam looks good to me

@rockjam
Copy link
Collaborator Author

rockjam commented May 31, 2018

@lihaoyi By the way, made some primitive benchmarking. Did three runs for each version.

Before append rules:

  1. 9119 millis
  2. 6620 millis
  3. 6719 millis

With append rules PR:

  1. 9417 millis
  2. 7322 millis
  3. 7770 millis

So, there is some slowdown introduced by this PR. Do we keep it as is, or it's better to find a way to reduce this slowdown?

@rockjam
Copy link
Collaborator Author

rockjam commented May 31, 2018

Hm, travis shows build as failed, even though it's all steps are fine. Release steps are canceled, but it always happens, if I remember correctly.
image

@lihaoyi
Copy link
Member

lihaoyi commented Jun 1, 2018

@rockjam those benchmark numbers look fine; a 10%-ish slowdown is still several times faster than SBT due to our layered jar-building.

Travis is probably just being flaky; feel free to merge

@rockjam rockjam merged commit baf2295 into com-lihaoyi:master Jun 1, 2018
@rockjam rockjam deleted the assembly-rules branch June 1, 2018 08:18
@lefou lefou added this to the 0.2.3 milestone May 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants