Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement script level cache #403

Merged
merged 1 commit into from Jul 3, 2016
Merged

Implement script level cache #403

merged 1 commit into from Jul 3, 2016

Conversation

@cosmo-kramer
Copy link
Contributor

@cosmo-kramer cosmo-kramer commented Jun 16, 2016

Program flow is like:

Speed improvement is achieved by making processModule cache anything( scripts, predefs, imports including $ivy and $file,etc.) it processes at script level( at block level it already gets) and tries to retrieve from cache next time it is asked to process same code.

Process Module checks if the script is available in cacheFolder i.e. scriptCaches. If it is not found, processModule0 is called which keeps the rest of program flow same as before this diff but passes back importHooks and other imports accumulated in processCorrectScript function. This data is stored by classFilesListSave which takes list of pkgName concatenated with wrapperName and hash values of blocks along with imports and stores them.

If script is found in cache, classFilesListLoad reads the list of names and hash values of blocks and loads those blocks from their cache folders made by compileCacheLoad function. Loaded import hooks are resolved by resolveSingleImportHook. Along with retrieved imports this data is passed to eval.evalCacheClassFiles which loads each classFile and evals the main function thus executing the code.

The final imports are returned by processModule function whether evaled or loaded from cache, thus processModule is opaque from outside whether cache HIT or MISS, it behaves in exactly same way in return as well as side effects except cache save.

@cosmo-kramer cosmo-kramer force-pushed the cosmo-kramer:master branch 7 times, most recently from 14b2ed5 to 52514bf Jun 17, 2016
@@ -70,21 +80,104 @@ case class Main(predef: String = "",
res
}


def runScriptLevelCache(path: Path, args: Seq[String] = Vector.empty,

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jun 18, 2016
Author Contributor

starting point for running scripts
Tries to load from cache else calls runscript

case x: Res.Failing => x
case Res.Success(imports) =>
case Res.Success(data) =>

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jun 18, 2016
Author Contributor

data is a 2-tuple of final imports and a list of block hash values and wrapperNames
List is saved using classFilesListSave having only wrapperNames and hashes as actual classFiles were already stored by compileCacheLoad

wrapper: String,
dataList: List[(String, String)],
tag: String): Unit = {
val dir = pkg + "." + wrapper + "0"

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jun 18, 2016
Author Contributor

make a file with "0" appended to wrapperName having hashValue and wrapperName of each block

@lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Jun 19, 2016

Great that you got tests passing!

Some high level feedback:

  • Would it be possible to move all the logic into the processModule function or some helper called by it? It seems you are duplicating a bunch of logic between runScriptLevelCache and cachedModule, as called through loadModule and load.module. Given that all these code paths end up going through processModule0 anyway, this would help keep the logic in one place.
  • If we did that, we could probably get rid of runScriptLevelCache entirely, since runScript will eventuall call processModule0 which will take care of the caching etc.. After all, the "first" script you run is no different from any other script you load.module; they all go through processModule0
  • We could probably get rid of the withCompiler flag; if all the script-caching logic is included as part of processModule0, check like https://github.com/lihaoyi/Ammonite/pull/403/files#diff-bcdb6a0282e047eba770bc309743e114R126 become un-necessary since that code calls processModule0, which should do the right thing by default

That's the high-level review; your code looks great. Leaving some other feedback in-line. Now that you've got this working and tests passing, let's iterate on this diff until it's great!

@@ -44,7 +46,7 @@ object BasicTests extends TestSuite{
|cd! 'repl/'src
|@
|println(wd relativeTo x)""".stripMargin
)
)

This comment has been minimized.

@lihaoyi

lihaoyi Jun 19, 2016
Member

Try not to re-format irrelevant things as part of this diff.

Things like this, and this (adding indentation to the whole for-comprehension) may or may not be the "right" formatting, but they have no place in an already-very-large diff like this one. Given that this diff is already large and hard to review for correctness, you should aim to avoid all this sort of minor/irrelevant changes so we can focus on the script-level-caching. If we care enough, we can put these changes into a separate diff later.

This comment has been minimized.

val (pkg, wrapper) = Util.pathToPackageWrapper(path, wd)
val cacheTag = "cache" + Util.md5Hash(Iterator(code.getBytes))
.map("%02x".format(_)).mkString
if(!code.contains("load(")) {

This comment has been minimized.

@lihaoyi

lihaoyi Jun 19, 2016
Member

Why do we need to check if the code contains load(?

I don't think this is necessary; we should cache a module regardless of what it does. Even if we did want to do this, it would immediately break once someone called the function via load (...) or load.apply (...)

@@ -81,7 +81,8 @@ object Preprocessor{
indexedWrapperName: String,
imports: Imports,
printerTemplate: String => String) = for{
Preprocessor.Expanded(code, printer) <- expandStatements(stmts, resultIndex)
Preprocessor.Expanded(code, printer) <- expandStatements(stmts, resultIndex)

This comment has been minimized.

@lihaoyi

lihaoyi Jun 19, 2016
Member

Don't reformat things unnecessarily

case postSplit =>
complete(stmts.mkString(""), wrapperIndex, postSplit)

case postSplit => complete(stmts.mkString(""), wrapperIndex, postSplit)

This comment has been minimized.

@lihaoyi

lihaoyi Jun 19, 2016
Member

Don't reformat things unnecessarily

@@ -3,6 +3,6 @@ package ammonite.repl

object TestMain{
def main(args: Array[String]): Unit = {
Main.main(args ++ Array("--home", "target/tempAmmoniteHome"))
Main.main(args)

This comment has been minimized.

@lihaoyi

lihaoyi Jun 19, 2016
Member

This shouldn't need to change, should it?

val scripts = List('scriptLevelCaching/"testLoadModule.scala",
'scriptLevelCaching/"scriptTwo.scala",
'scriptLevelCaching/"scriptOne.scala",
'scriptLevelCaching/"QuickSort.scala")

This comment has been minimized.

@lihaoyi

lihaoyi Jun 19, 2016
Member

Format this (and this and this) consistently with other multi-line functions:

FUNCTION(
  arg0,
  arg1,
  arg2,
)
val outBuffer = mutable.Buffer.empty[String]
val warningBuffer = mutable.Buffer.empty[String]
val errorBuffer = mutable.Buffer.empty[String]
val infoBuffer = mutable.Buffer.empty[String]

This comment has been minimized.

@lihaoyi

lihaoyi Jun 19, 2016
Member

Not sure why we're building up these buffers and throwing them away, but it doesn't matter anyway if we're getting rid of runScriptLevelCache as described in my high-level feedback

}

def module(file: Path): Unit = {
if (!withCompiler)

This comment has been minimized.

@lihaoyi

lihaoyi Jun 19, 2016
Member

We shouldn't need this check if loadModule (or processModule0) helps us check whether a loaded script has been fully-cached every single time

pkgName: Seq[Name]
): Res[Imports] = if(scriptCaching) {
val cacheTag = "cache" + Util.md5Hash(Iterator(code.getBytes)).map("%02x".format(_)).mkString
storage.asInstanceOf[Storage.Folder].classFilesListLoad(pkgName.map(_.backticked).mkString("."), wrapperName.backticked, cacheTag) match {

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

We should need to asInstanceOf to check if something is a Storage.Folder; instead, Storage should has classfilesListLoad and classfilesListLoad as part of it's interface, and Storage.InMemory should just keep things in an in-memory Map or something when saved and read from that Map on load

val hardcodedPredef =
"import ammonite.repl.frontend.ReplBridge.repl.{pprintConfig, derefPPrint}"

// Should script as a whole be cached or not
val scriptCaching = storage match {

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

This should not be necessary, as described in the comment below. Storage.InMemory should be able to cache things just fine, in memory.

(argString, Name("ArgsPredef"))
)
var predefImports = Imports(Nil)
initPredef()

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

What is the purpose of moving this into a function?

val prompt = Ref("@ ")
val colors = Ref[Colors](Colors.Default)
val frontEnd = Ref[FrontEnd](AmmoniteFrontEnd(Filter.empty))
val printer = Printer(

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

This is probably incorrect; we don't print stuff much when running scripts, but anything we do print should be printed "normally" e.g. via https://github.com/lihaoyi/Ammonite/blob/master/repl/src/main/scala/ammonite/repl/Repl.scala#L31-L36, and not aggregated into a bunch of random buffers to be thrown away

This comment has been minimized.

@lihaoyi

lihaoyi Jun 27, 2016
Member

@CoderAbhishek bump, this is all dead code, can you clean it up?

Res.Exception(e, s)
case x => x
}
case Res.Success(_) => Res.

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

Don't reformat things unnecessarily; it makes code difficult to review

This comment has been minimized.

processModule0(source, code, wrapperName, pkgName, predefImports)
def evalCachedClassFiles(cachedData: Seq[CompileCache], pkg: String, wrapper: String): Unit = {

def evalMain(cls: Class[_]) =

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

This is already a method in the Evaluator class; perhaps we can move it out into the Evaluator companion object so this method can all it? Rather than copy-pasting

@@ -581,7 +687,7 @@ class Interpreter(prompt0: Ref[String],
Interpreter.this.compiler.search(target)
}
def compiler = Interpreter.this.compiler.compiler
def newCompiler() = init()
def newCompiler() = null

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

Not sure what you are doing here, but returning null is probably not the right thing to do

This comment has been minimized.

@lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Jun 22, 2016

The unit tests aren't testing the right thing. Our goal isn't to get the compilationCount to zero, but instead it is to ensure that the compiler is never initialized.

Realistically, what you should do is:

  • Introduce a new compilerInitialized boolean on the Interpreter that starts as false and gets set to true when/every-time the compiler is initialized
  • Move the tests from integration tests into "normal" unit tests, in the repl/ project
  • Extract the body of runScript into runScriptInternal, with runScriptInternal private[ammonite] since it's only to be used for tests and not part of the public API
  • Make runScriptInternal return a Res[(Seq[ImportData], Boolean)] with the boolean coming from the `compilerInitialized
  • Make runScript call runScriptInternal, and discard the boolean.
  • Make your unit tests instantiate the REPL and call scripts through the normal Main() call, except calling runScriptInternal rather than runScript, and validating that the second time (?) the same script is run, the compilerInitialized: Boolean that gets returned is false
// blockNumber keeps track of blockIndex
cachedData.foreach { d =>
for {
cls <- eval.loadClass(pkg + "." + wrapper + getBlockNumber, d._1)

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

What happens if loadClass or evalMain fail? You should probably use Res.map on this to propagate the Res[_] from each individual loadClass into a big Res[List[...]], and make evalCachedClassFiles return a Res[_] to represent the possibility of failure

}
}
}
// wrapperIndex starts off as 1, so that consecutive wrappers can be named
// Wrapper, Wrapper2, Wrapper3, Wrapper4, ...
try loop(blocks, startingImports, Imports(), wrapperIndex = 1 )
try loop(blocks, startingImports, Imports(Nil), wrapperIndex = 1, List[(String, String)]())

This comment has been minimized.

@lihaoyi

lihaoyi Jun 22, 2016
Member

This can be simply List(); the type is inferred

for {
cls <- eval.loadClass(pkg + "." + wrapper + getBlockNumber, d._1)
} yield {
Compiler.addToClasspath(d._1, dynamicClasspath)

This comment has been minimized.

@lihaoyi

lihaoyi Jun 28, 2016
Member

Formatting



def processExec(code: String): Res[Imports] = {
init()

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jun 28, 2016
Author Contributor

For this code should I put init inside _ <- Res.Success(init()) in for comprehension that will just add one line, instead of changing whole function?

This comment has been minimized.

@lihaoyi

lihaoyi Jun 28, 2016
Member

This is fine

@@ -124,7 +124,8 @@ case class Catching(handler: PartialFunction[Throwable, Res.Failing]) {
}

case class Evaluated(wrapper: Seq[Name],
imports: Imports)
imports: Imports,
tag: String = "")

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

Shouldn't use a default argument here; explicitly provide it in the places where it's necessary

@@ -271,7 +271,7 @@ object ScriptTests extends TestSuite{
}
'caching{

def createTestInterp(storage: Storage) = new Interpreter(
def createTestInterp(storage: Storage, predefS: String = "") = new Interpreter(

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

Get rid of this predefS argument and just use ammonite.repl.Main.defaultPredefString directly

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jun 30, 2016
Author Contributor

In some cases it creates interp with empty predef to test

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

Oh ok that's fine then, can we remove the default argument and pass in the predefS manually for those cases?

Also, let's just call it predef instead of predefS

) match {
case Res.Success(_) =>
Res.Success(imports)
case r => r.asInstanceOf[Res[Imports]]

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

When you changed the snippet 10 lines above, can you change this one too?

@@ -324,6 +328,7 @@ object Util{
object Timer{
var current = 0L
var show = false
var startTime = System.nanoTime()

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

What is this?

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jun 30, 2016
Author Contributor

This I was using for testing, if someone wants to check time elapsed between two points in program flow, one can set the startTime and print by subtracting from the System.Nanotime at the end. I will remove this if you think it won't be necessary in future otherwise I will put a comment explaining its use

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

Let's remove this for now

read(codeCacheDir/"imports.list")
}.toOption
val (loadedTag, classFilesList) =
upickle.default.read[(String, Seq[(String, String)])](metadataJson.getOrElse(""))

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

Why is there a getOrElse("") here? That will just cause read to throw an exception. What's the point?

}.flatten
val imports = upickle.default.read[Imports](impFile.getOrElse(""))
Some((res.unzip._1, imports))
}

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

This if-statement is copy-pasted from above. Can you do something about it?

}
}
}
// wrapperIndex starts off as 1, so that consecutive wrappers can be named
// Wrapper, Wrapper2, Wrapper3, Wrapper4, ...
try loop(blocks, startingImports, Imports(), wrapperIndex = 1 )
try loop(blocks, startingImports, Imports(Nil), wrapperIndex = 1, List())

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

Why the additional Nil here?

@@ -627,7 +723,7 @@ object Interpreter{
Util.md5Hash(imports.iterator.map(_.toString.getBytes)),
classpathHash
))
"cache" + bytes.map("%02x".format(_)).mkString //add prefix to make sure it begins with a letter
bytes.map("%02x".format(_)).mkString //add prefix to make sure it begins with a letter

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

This comment is out of date


val cls = Res.map(cachedData.zipWithIndex)(
(data: (ClassFiles, Int)) => {
val (clsFiles, index) = data

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

This should be

case (clsFiles, index) =>

without the curlies

@@ -117,7 +125,7 @@ class Interpreter(prompt0: Ref[String],
var predefImports = Imports()
for( (sourceCode, wrapperName) <- predefs) {
val pkgName = Seq(Name("ammonite"), Name("predef"))
processModule0(ImportHook.Source.File(wd/"<console>"), sourceCode, wrapperName, pkgName, predefImports) match{

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

Why the change from <console> to Main.scala?


def evalCachedClassFiles(cachedData: Seq[ClassFiles],
pkg: String,
wrapper: String): Res[Seq[Unit]] = {

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

This should be a Res[Unit], not Res[Seq[Unit]]

eval.loadClass(pkg + "." + wrapper + getBlockNumber(index + 1), clsFiles)
}
)
cls.map(_.map(eval.evalMain(_)))

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

You should use a for-comprehension instead of .map on the Res.

The name cls is also misleading, since this isn't a Class, it is a res.

Furthermore, this is incorrect, as an exception inside evalMain will get thrown and not caught. You need to use a _ <- Catching{userCodeExceptionHandler} inside your for-comprehension to catch that.

Lastly, this stuff which has to do with evaluating compiled class files should live in Evaluator, not in Interpreter

res.map(_._1)
case r: Res.Failing => r
}
case Some((cachedData, imports)) =>

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

You are failing to cache import hooks, and thus any necessary magic imports like import $file.foo to load secondary files or Ivy dependencies will not run on cached scripts, which will cause failures.

@@ -258,23 +274,24 @@ class Interpreter(prompt0: Ref[String],
)

}
// empty quotes to serve as a dummy cacheTag for repl commands which are not to be cached

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

Stale comment

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jun 30, 2016
Author Contributor

Removed

init()
val res = processModule0(source, code, wrapperName, pkgName, predefImports)
res match{
case Res.Success(data) =>

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

This should be

case Res.Success((imports, files)) => 

Also, why are they called files? Are they not CachedData tuples? Everywhere else you call them cachedData

@lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Jun 30, 2016

Looks like we're not done yet.

You are missing two sets of unit tests, given the potential failures I am seeing in your code:

  • Tests than runtime exceptions within cached scripts are properly caught and returned in a Res.Failing result
  • Tests that magic imports such as import $file.Foo or import $ivy.com.lihaoyi::scalatags:0.5.4` work in cached scripts.
@@ -170,25 +172,28 @@ object Evaluator{
// Exhaust the printer iterator now, before exiting the `Catching`
// block, so any exceptions thrown get properly caught and handled

val iter = evalMain(cls).asInstanceOf[Iterator[String]]
val iter = cls.getDeclaredMethod("$main").invoke(null).asInstanceOf[Iterator[String]]

This comment has been minimized.

@lihaoyi

lihaoyi Jun 30, 2016
Member

I am not convinced that this is doing anything. Can you revert it?

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jul 2, 2016
Author Contributor

I don't know why but it is necessary, without it that weird BoxedUniterror starts coming for repl cmds
Maybe some return type problem might be there, I tried to investigate it earlier but was not much fruitful and left once I got this solution, if you want I will try checking where the culprit is.

upickle.default.write((tag, dataList.reverse), indent = 4)
)
write(codeCacheDir/"imports.list", upickle.default.write(imports))
write(codeCacheDir/"importTrees.list", upickle.default.write(importTreesList))

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jul 1, 2016
Author Contributor

Store the importTree also and load it and pass each importTree to ResolveSingleImportHook, to load imports

This comment has been minimized.

@lihaoyi

lihaoyi Jul 2, 2016
Member

These should both be .json files since they contain JSON, and you should pass in indent = 4 to make the pretty-printed and human readable

assert(interp2.compiler == null &&
res.toString == "java.lang.ArithmeticException:/by zero")
}

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jul 1, 2016
Author Contributor

runTimeExceptions.scala reads a number from file num.val and prints 5/num.val, first it is 1 and then changed to zero by the script, so when next time it reads it causes a run-time exception

@@ -0,0 +1,4 @@
import ammonite.ops._
println(5/read(cwd/'repl/'src/'test/'resources/'scriptLevelCaching/"num.value").trim.toInt)

This comment has been minimized.

@lihaoyi

lihaoyi Jul 2, 2016
Member

You should make this read and write from repl/target/scriptCachingTests/, not the src/ folder which is for source code

imports: Imports,
tag: String,
importTreesList: Seq[ImportTree]): Unit = {
val dir = pkg.replace("/", "$div") + "." + wrapper.replace("/", "$div") + "0"

This comment has been minimized.

@lihaoyi

lihaoyi Jul 2, 2016
Member

As discussed, we should not randomly prefix things with 0: these should go in a separate folder from the cache/compile/, e.g. cache/scriptLevel/

importTreesList: Seq[ImportTree]): Unit = {
val dir = pkg.replace("/", "$div") + "." + wrapper.replace("/", "$div") + "0"
val codeCacheDir = compileCacheDir/dir
if(!exists(codeCacheDir)) {

This comment has been minimized.

@lihaoyi

lihaoyi Jul 2, 2016
Member

This exists check is not necessary; mkdir does it for you already

'testTwo - check('scriptLevelCaching/"scriptOne.scala")
'testThree - check('scriptLevelCaching/"QuickSort.scala")
'testLoadModule - check('scriptLevelCaching/"testLoadModule.scala")
'testFileImport - check('scriptLevelCaching/"testFileImport.scala")

This comment has been minimized.

@lihaoyi

lihaoyi Jul 2, 2016
Member

We should have another test here to use import $ivy to verify that that keeps working inside cached scripts

This comment has been minimized.

@cosmo-kramer

cosmo-kramer Jul 2, 2016
Author Contributor

Added!

@cosmo-kramer cosmo-kramer force-pushed the cosmo-kramer:master branch from 2db34d8 to eba766e Jul 2, 2016
@lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Jul 3, 2016

Looks good to me!

@lihaoyi lihaoyi merged commit 9eabbd6 into com-lihaoyi:master Jul 3, 2016
1 of 2 checks passed
1 of 2 checks passed
continuous-integration/travis-ci/pr The Travis CI build failed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants