Implement script level cache #403

Merged: 1 commit, Jul 3, 2016

Conversation

Contributor

@cosmo-kramer cosmo-kramer commented Jun 16, 2016

The program flow is as follows:

The speed improvement comes from making processModule cache everything it processes (scripts, predefs, and imports, including $ivy and $file) at script level, in addition to the caching it already does at block level, and try to retrieve it from the cache the next time it is asked to process the same code.

processModule checks whether the script is available in the cache folder, i.e. scriptCaches. If it is not found, processModule0 is called, which keeps the rest of the program flow the same as before this diff but passes back the importHooks and other imports accumulated in the processCorrectScript function. This data is stored by classFilesListSave, which takes the list of pkgName concatenated with wrapperName and the hash values of the blocks, along with the imports, and stores them.

If the script is found in the cache, classFilesListLoad reads the list of names and block hash values and loads those blocks from their cache folders using the compileCacheLoad function. Loaded import hooks are resolved by resolveSingleImportHook. Along with the retrieved imports, this data is passed to eval.evalCacheClassFiles, which loads each class file and evaluates its main function, thus executing the code.

The final imports are returned by processModule whether the code was evaluated or loaded from the cache. From the outside, processModule is therefore opaque as to whether the cache was a HIT or a MISS: it behaves in exactly the same way in both its return value and its side effects, except for the cache save.
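The hit/miss flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this diff: `Imports`, `CachedScript`, the in-memory `scriptCaches` map, and `compileCount` are hypothetical stand-ins, and the slow path only pretends to compile.

```scala
import java.security.MessageDigest
import scala.collection.mutable

object ScriptCacheSketch {
  // Hypothetical stand-ins for the real types used by the interpreter.
  final case class Imports(names: Seq[String])
  final case class CachedScript(blockTags: Seq[String], imports: Imports)

  // Counts how often the slow "compile and evaluate" path is taken.
  var compileCount = 0

  // In-memory stand-in for the scriptCaches folder, keyed by a hash of the source.
  private val scriptCaches = mutable.Map.empty[String, CachedScript]

  def md5Hex(code: String): String =
    MessageDigest.getInstance("MD5")
      .digest(code.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // Slow path: stands in for processModule0 (process every block, collect imports).
  private def processModule0(code: String): CachedScript = {
    compileCount += 1
    val blocks = code.split("\n@\n").toSeq
    CachedScript(blocks.map(md5Hex), Imports(blocks.map(b => "import:" + b.length)))
  }

  // Callers get the same Imports back whether this was a cache HIT or a MISS.
  def processModule(code: String): Imports = {
    val tag = "cache" + md5Hex(code)
    scriptCaches.get(tag) match {
      case Some(hit) => hit.imports
      case None =>
        val fresh = processModule0(code)
        scriptCaches(tag) = fresh // the only extra side effect on a MISS
        fresh.imports
    }
  }
}
```

Calling `ScriptCacheSketch.processModule` twice with the same source should return equal imports while taking the slow path only once.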

@cosmo-kramer cosmo-kramer force-pushed the master branch 7 times, most recently from 14b2ed5 to 52514bf, June 18, 2016 18:33
@@ -70,21 +80,104 @@ case class Main(predef: String = "",
res
}


def runScriptLevelCache(path: Path, args: Seq[String] = Vector.empty,
Contributor Author

Starting point for running scripts.
Tries to load from the cache; otherwise calls runScript.

Member

lihaoyi commented Jun 19, 2016

Great that you got tests passing!

Some high level feedback:

  • Would it be possible to move all the logic into the processModule function or some helper called by it? It seems you are duplicating a bunch of logic between runScriptLevelCache and cachedModule, as called through loadModule and load.module. Given that all these code paths end up going through processModule0 anyway, this would help keep the logic in one place.
  • If we did that, we could probably get rid of runScriptLevelCache entirely, since runScript will eventually call processModule0, which will take care of the caching etc. After all, the "first" script you run is no different from any other script you load.module; they all go through processModule0
  • We could probably get rid of the withCompiler flag; if all the script-caching logic is included as part of processModule0, checks like https://github.com/lihaoyi/Ammonite/pull/403/files#diff-bcdb6a0282e047eba770bc309743e114R126 become unnecessary, since that code calls processModule0, which should do the right thing by default

That's the high-level review; your code looks great. Leaving some other feedback in-line. Now that you've got this working and tests passing, let's iterate on this diff until it's great!

@@ -44,7 +46,7 @@ object BasicTests extends TestSuite{
|cd! 'repl/'src
|@
|println(wd relativeTo x)""".stripMargin
)
)
Member

@lihaoyi lihaoyi Jun 19, 2016

Try not to re-format irrelevant things as part of this diff.

Things like this, and this (adding indentation to the whole for-comprehension) may or may not be the "right" formatting, but they have no place in an already-very-large diff like this one. Given that this diff is already large and hard to review for correctness, you should aim to avoid all these sorts of minor/irrelevant changes so we can focus on the script-level caching. If we care enough, we can put these changes into a separate diff later.

pkgName: Seq[Name]
): Res[Imports] = if(scriptCaching) {
val cacheTag = "cache" + Util.md5Hash(Iterator(code.getBytes)).map("%02x".format(_)).mkString
storage.asInstanceOf[Storage.Folder].classFilesListLoad(pkgName.map(_.backticked).mkString("."), wrapperName.backticked, cacheTag) match {
Member

We shouldn't need asInstanceOf to check whether something is a Storage.Folder; instead, Storage should have classFilesListSave and classFilesListLoad as part of its interface, and Storage.InMemory should just keep things in an in-memory Map or something when saved, and read from that Map on load
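One way to read this suggestion is sketched below. The signatures are simplified assumptions rather than the real Storage API: `ScriptEntry` and the `(pkg, wrapper)` map key are hypothetical, and only the in-memory implementation is shown.

```scala
import scala.collection.mutable

object StorageSketch {
  // Hypothetical, simplified shape of what is saved for one script.
  final case class ScriptEntry(
    tag: String,                   // hash of the whole script
    blocks: Seq[(String, String)], // (class name, hash of block)
    imports: Seq[String]
  )

  // Both methods live on the interface, so callers never need asInstanceOf.
  trait Storage {
    def classFilesListSave(pkg: String, wrapper: String, entry: ScriptEntry): Unit
    def classFilesListLoad(pkg: String, wrapper: String, tag: String): Option[ScriptEntry]
  }

  // In-memory variant: a plain Map written on save and read on load.
  final class InMemory extends Storage {
    private val entries = mutable.Map.empty[(String, String), ScriptEntry]

    def classFilesListSave(pkg: String, wrapper: String, entry: ScriptEntry): Unit =
      entries((pkg, wrapper)) = entry

    // A stale entry (different tag) counts as a cache miss.
    def classFilesListLoad(pkg: String, wrapper: String, tag: String): Option[ScriptEntry] =
      entries.get((pkg, wrapper)).filter(_.tag == tag)
  }
}
```

A folder-backed implementation would implement the same two methods against the file system, so the calling code would stay identical for both.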

Member

lihaoyi commented Jun 22, 2016

The unit tests aren't testing the right thing. Our goal isn't to get the compilationCount to zero, but instead it is to ensure that the compiler is never initialized.

Realistically, what you should do is:

  • Introduce a new compilerInitialized boolean on the Interpreter that starts as false and gets set to true when/every-time the compiler is initialized
  • Move the tests from integration tests into "normal" unit tests, in the repl/ project
  • Extract the body of runScript into runScriptInternal, with runScriptInternal private[ammonite] since it's only to be used for tests and not part of the public API
  • Make runScriptInternal return a Res[(Seq[ImportData], Boolean)] with the boolean coming from the compilerInitialized flag
  • Make runScript call runScriptInternal, and discard the boolean.
  • Make your unit tests instantiate the REPL and call scripts through the normal Main() call, except calling runScriptInternal rather than runScript, and validating that the second time (?) the same script is run, the compilerInitialized: Boolean that gets returned is false
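The steps above could look roughly like this in miniature. Everything here is a hypothetical stand-in: the `Interpreter` is a toy class, the "compiler" is a lazily created function, the result is a plain tuple instead of a Res, and `coldThenWarm` plays the role of the unit test.

```scala
import scala.collection.mutable

object CompilerInitSketch {
  type Cache = mutable.Map[String, Seq[String]]

  final class Interpreter(cache: Cache) {
    // Starts false and flips to true when the compiler is first initialized.
    var compilerInitialized = false

    // Stand-in for the expensive compiler; only created when first needed.
    private lazy val compile: String => Seq[String] = {
      compilerInitialized = true
      code => Seq("imports-of-" + code.length)
    }

    // Test-only entry point: also reports whether the compiler was started.
    private[CompilerInitSketch] def runScriptInternal(code: String): (Seq[String], Boolean) = {
      val imports = cache.getOrElseUpdate(code, compile(code))
      (imports, compilerInitialized)
    }

    // Public entry point: same behavior, boolean discarded.
    def runScript(code: String): Seq[String] = runScriptInternal(code)._1
  }

  // Runs the same script in two fresh interpreters sharing one cache.
  def coldThenWarm(code: String): (Boolean, Boolean) = {
    val shared: Cache = mutable.Map.empty
    val (_, cold) = new Interpreter(shared).runScriptInternal(code)
    val (_, warm) = new Interpreter(shared).runScriptInternal(code)
    (cold, warm)
  }
}
```

The point of asserting on the flag rather than on a compilation count is that a warm run should never pay the compiler start-up cost at all.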

// blockNumber keeps track of blockIndex
cachedData.foreach { d =>
for {
cls <- eval.loadClass(pkg + "." + wrapper + getBlockNumber, d._1)
Member

@lihaoyi lihaoyi Jun 22, 2016

What happens if loadClass or evalMain fail? You should probably use Res.map on this to propagate the Res[_] from each individual loadClass into a big Res[List[...]], and make evalCachedClassFiles return a Res[_] to represent the possibility of failure
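As an illustration of propagating failures this way, here is a minimal sketch. The `Res` type below is a simplified stand-in with only `Success` and `Failure`, not the project's actual Res, and `loadClass` is a dummy that fails on an empty name.

```scala
object ResSketch {
  // Simplified stand-in for a result type that can carry a failure.
  sealed trait Res[+T] {
    def map[V](f: T => V): Res[V] = this match {
      case Success(v)    => Success(f(v))
      case fail: Failure => fail
    }
    def flatMap[V](f: T => Res[V]): Res[V] = this match {
      case Success(v)    => f(v)
      case fail: Failure => fail
    }
  }
  final case class Success[+T](value: T) extends Res[T]
  final case class Failure(msg: String) extends Res[Nothing]

  // Combines per-item results into one Res; after the first failure,
  // f is no longer called and that failure is returned.
  def sequence[T, V](items: Seq[T])(f: T => Res[V]): Res[List[V]] =
    items.foldLeft[Res[List[V]]](Success(Nil)) { (acc, item) =>
      for (done <- acc; next <- f(item)) yield next :: done
    }.map(_.reverse)

  // Dummy loader: an empty class name stands in for a load error.
  def loadClass(name: String): Res[String] =
    if (name.nonEmpty) Success("class " + name)
    else Failure("cannot load class with empty name")
}
```

With this shape, a function that loads and evaluates every cached block can return a single Res and let the caller decide how to report the failure.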


'blocks{
val cases = Seq("OneBlock.scala" -> 2, "TwoBlocks.scala" -> 3, "ThreeBlocks.scala" -> 4)
for((fileName, expected) <- cases){
val storage = Storage.InMemory()
val interp = createTestInterp(storage)
val n0 = storage.compileCache.size

Member

Formatting...

Member

lihaoyi commented Jun 30, 2016

Looks like we're not done yet.

You are missing two sets of unit tests, given the potential failures I am seeing in your code:

  • Tests that runtime exceptions within cached scripts are properly caught and returned in a Res.Failing result
  • Tests that magic imports such as import $file.Foo or import $ivy.`com.lihaoyi::scalatags:0.5.4` work in cached scripts.
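The first of these tests might check something like the behavior sketched below. This is a toy model under stated assumptions: the cache maps a script name to its main function, `Either` stands in for Res, and `run` is a hypothetical helper rather than anything in the interpreter.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

object CachedFailureSketch {
  // Hypothetical cache of already "compiled" scripts: name -> main function.
  private val cache = mutable.Map.empty[String, () => Any]

  // Counts how often a script had to be "compiled" (cache misses).
  var compiles = 0

  // Runs a script, from cache if present; runtime exceptions become a Left
  // on both the cold and the cached path.
  def run(name: String, body: () => Any): Either[String, Any] = {
    val main = cache.getOrElseUpdate(name, { compiles += 1; body })
    try Right(main())
    catch { case NonFatal(e) => Left(e.getClass.getSimpleName + ": " + e.getMessage) }
  }
}
```

A test along these lines would run a throwing script twice and expect the same failure value both times, with only one compile.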

@@ -170,25 +172,28 @@ object Evaluator{
// Exhaust the printer iterator now, before exiting the `Catching`
// block, so any exceptions thrown get properly caught and handled

val iter = evalMain(cls).asInstanceOf[Iterator[String]]
val iter = cls.getDeclaredMethod("$main").invoke(null).asInstanceOf[Iterator[String]]
Member

I am not convinced that this is doing anything. Can you revert it?

Contributor Author

I don't know why, but it is necessary; without it, that weird BoxedUnit error starts showing up for REPL commands.
Maybe there is some return type problem. I tried to investigate it earlier but didn't get far, and stopped once I found this solution. If you want, I will try to find where the culprit is.

Member

lihaoyi commented Jul 3, 2016

Looks good to me!

@lihaoyi lihaoyi merged commit 9eabbd6 into com-lihaoyi:master Jul 3, 2016
2 participants