Disk cache outside webroot #521

Merged
merged 2 commits into from Dec 14, 2016

Projects

None yet

3 participants

@Shazwazza
Contributor

The PR does quite a few things:

  • Allows for having the IP disk cache to exist outside of the web root
  • The paths configured for the IP disk cache can now be absolute paths or virtual paths, virtual paths can also be relative to go beyond the web root (i.e. "~/../OutsideOfWebRoot")
  • Adds unit tests for resolving the cache folder, absolute and virtual paths
  • Better performance for the disk cache - it now only validates and resolves the cache paths once (lazily) per application, before it did this for every file
  • Adds a new configuration option to opt-in to automatically creating file watchers for every cached image -> this is actually the main cause for FCN problems
  • Updates the _Layout.cshtml page to show more FCN details
  • Adds lots of notes regarding what has been changed and a few more TODOs (though these TODOs are outside of this PR scope)

Ok, so I'll try to explain as much as I can here :)

  • By default in ASP.NET with FcnMode="NotSet", ASP.NET will create a watcher for every sub folder in your application that is inside of the web root. It will create these watchers lazily, so once an IP cached image is fetched a watcher would be created for the folder it exists in
  • Using FcnMode="Single", ASP.NET will create a single watcher with a larger buffer but will still monitor all sub folders in your application that is inside the web root.

So why choose one over another? Well there's a couple of reasons. When running on Azure Web apps, the file server is a remote file server (i.e. UNC Share). When you have FcnMode="NotSet" then having all of these watchers will actually slow down the file server quite a lot... this is almost true... Azure Web apps runs a custom windows kernel so MS is doing something rather special to ensure that these watchers don't slow down the file servers, however other hosts like Umbraco Cloud aren't so lucky and run a normal Windows kernel. So running FcnMode="Single" on Umbraco Cloud makes the file system much happier, same would probably be true for any traditionally hosted IIS site that uses remote file system UNC file paths. However, the downfall of "Single" mode is that the buffer can overflow more easily because there's only one watcher watching a ton of folders. Since Image Processor creates quite a lot of folders and if people use Image Processor extensively then they may see this buffer overflow with errors like "Overwhelming change notifications" (among others).

Being able to store Image Processor cache files outside of the web root will reduce the amount of directories watched which this PR enables. However what I discovered is that even if we do this, there are still file watchers being created for every Image Process cache file! This is because by default when Image Process adds the result to the memory cache, it does so by adding a file dependency. This is so that the in memory cache can be invalidated if the cache files on disk changes. I don't really think it's a big requirement for people to have that functionality and it drastically increases the number of files watchers. In actual fact, I believe this to be the main root cause of FCN issues with Image Processor.

There is a new config option: useFileChangeMonitors and by default this will be false. This is sort of a breaking change (but is it?!) since it was true before. This one change could in fact reduce the number of file watchers so that FCN is no longer an issue. You can test this with the test web app by togging this value. But, even if this is false, you'll see that the "FileChangesMonitor aliases hash table" on the test website still grows anytime an Image Process cache file is accessed when it's in the web root so I'm fairly sure that the "Single" watcher is still watching those.

So when the cache is configured outside of the web root like with "~/../OutsideOfWebRoot" (make sure that folder exists and that the IIS user has access to it!), then load up and browse around the test website and you'll notice a vast reduction in the amount of image processor cache folders being watched, along with the default false setting for useFileChangeMonitors, the number of folders monitored is now very low.

Shazwazza added some commits Dec 12, 2016
@Shazwazza Shazwazza Creates new config section to enable/disable the cached file callback…
…s, updates the DiskCache so that the cache path validation logic executes once instead of for every file, updates DiskCache to support specifying an absolute location, updates DiskCache to support storing files outside of the web root.
c060311
@Shazwazza Shazwazza Allows for having both relative and absolute paths configured for the…
… image cache folder, adds comments and unit tests for resolving the paths.
322a451
+ /// </remarks>
+ private static string GetValidatedAbsolutePath(string originalPath, out string virtualPath)
+ {
+ var absoluteCachePath = LazyInitializer.EnsureInitialized(
@JimBobSquarePants
JimBobSquarePants Dec 12, 2016 Owner

I didn't know about the static LazyInitializer class. This should certainly help with performance.

+ catch (HttpException)
+ {
+ //need to user our own logic
+ return s.Replace("~/", HttpRuntime.AppDomainAppPath).Replace("/", "\\");
@JimBobSquarePants
JimBobSquarePants Dec 12, 2016 Owner

I was wondering how this would be done.

@@ -92,6 +243,10 @@ public DiskCache(string requestPath, string fullPath, string querystring)
/// </returns>
public override async Task<bool> IsNewOrUpdatedAsync()
{
+ //TODO: Before this check is performed it should be throttled. For example, only perform this check
@JimBobSquarePants
JimBobSquarePants Dec 12, 2016 Owner

What would you do to throttle this?

@Shazwazza
Shazwazza Dec 13, 2016 Contributor

I have code in a few projects that we do a very similar thing. We do this in Umbraco core and in Examine too. Using a thread timer we start the timer whenever this is called, there's 2 values that need to be checked: timer timeout (i.e. 5 seconds) and the max timeout (i.e. 30 seconds). This ends up being a sliding offset, the logic won't be executed if this method keeps getting executed until the timer reaches the max (30 seconds) or if the last request time is larger than 5 seconds.

This means that under high load the perf of this would be much better since it wouldn't have to read from the file system on every request just to check if its out of date. Of course there would be a max of 30 seconds under load if the file did change but that is OK.

I can make another PR for this when I get time - I'll create an issue for it.

+ // `ApplicationInstance.CompleteRequest();` instead of Response.End() but in this case I believe it is
+ // correct to use Response.End(), a good write-up of why this is can be found here:
+ // see: http://stackoverflow.com/a/36968241/694494
+ context.Response.End();
@JimBobSquarePants
JimBobSquarePants Dec 12, 2016 Owner

Did you notice any performance implications or excess logging due to the ThreadAbortException?

@Shazwazza
Shazwazza Dec 13, 2016 Contributor

I did read that that could happen but I didn't see anything like this in my tests. If you have seen this in the past do you know how to replicate? Maybe I should test a build of this with Umbraco and see what it does.

@Nicholas-Westby
Nicholas-Westby Dec 19, 2016

Not sure if it relates to this, but IIS will recycle the app pool after a configurable number (default of 5) uncaught exceptions in a configurable period (default 5 minutes). Pretty sure ThreadAbortException would be one example of that.

@Shazwazza
Shazwazza Dec 19, 2016 Contributor

Yes this needs to be tested to see if this actually causes this. From some reading it seems that older versions of .Net may have thrown this but more recent ones don't... I'm not quite sure, needs to be tested

@JimBobSquarePants
JimBobSquarePants Dec 19, 2016 Owner

I didn't see anything in my logs when doing initial testing but definitely wise to test it thoroughly

+ /// * ASP.Net app domain restarts if using FCNMode="Single" since the FCN buffer can overflow
+ /// </remarks>
+ [ConfigurationProperty("useFileChangeMonitors", DefaultValue = false, IsRequired = false)]
+ public bool UseFileChangeMonitors
@JimBobSquarePants
JimBobSquarePants Dec 12, 2016 Owner

False is good here, I think this would be the best for 99% installs.

+namespace ImageProcessor.UnitTests.ImageCache
+{
+ [TestFixture]
+ public class DiskCacheTests
@JimBobSquarePants
JimBobSquarePants Dec 12, 2016 Owner

Good tests... I'm terrible for this in this project.

@JimBobSquarePants

@Shazwazza This is all great!

As you spotted the only reason I have the monitoring was to keep the memcache in sync. As you say though most people won't need that level of tracking. I wonder whether the memcache is of that much benefit?

@Shazwazza
Contributor

Hrm, I'm not really sure if the mem cache is of any use beside just occupying a lot of extra mem? The file cache handling should be ultra fast

@JimBobSquarePants

Maybe for the Azure Cache since that reduces network requests... DiskCache will definitely be fast.

@JimBobSquarePants JimBobSquarePants merged commit 322a451 into JimBobSquarePants:develop Dec 14, 2016

1 check passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
@JimBobSquarePants

Merged. Much appreciated!

@Shazwazza
Contributor

I've tested the latest with Umbraco and storing outside of the wwwroot and there's no error logs output with regards to using Response.End so I'm pretty sure it's not an issue

@JimBobSquarePants

Great stuff. It occurred to me actually that it would have reset IIS during my initial testing if it was an issue.

@Shazwazza
Contributor

Just gonna test on Umbraco Cloud now to see what happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment