
Examine with Azure Directory (Blob Storage)

Steve Temple edited this page Aug 16, 2018 · 11 revisions

This is alpha stuff!

Overview

Working with Lucene on Azure means hosting your Lucene files on the local fast drive (%temp%), because Lucene doesn't work well when its files are read from a remote network drive, which is how storage works on Azure Web Apps. Examine has a few options to support working on the local fast drive, such as SyncTempEnvDirectoryFactory. When using Lucene on Azure it would be ideal to have the 'master' (write-only) index stored in Blob Storage.

Azure can move your site to a new server at any time. This is why SyncTempEnvDirectoryFactory exists: the 'master' index is stored in App_Data, while the local read index is lazily built with the files it needs from the 'master'. Without this syncing, the index wouldn't be there at all when Azure moves your site, and it would need to be rebuilt on startup. Instead of storing the master index on the remote file share in App_Data, it could be stored in Blob Storage. The same index could then be shared between multiple Web Apps, which would make load balancing much nicer to do with Lucene indexes. When scaling, the new worker would just lazily build its local index from the Blob Storage 'master' index.
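For comparison, here is a rough sketch of how the existing SyncTempEnvDirectoryFactory approach is typically switched on for an indexer. The fully qualified type name is an assumption based on common Umbraco 7 setups and is not stated on this page, so check it against your Examine version. Other attributes of the indexer are omitted.

```xml
<!-- Sketch only: the directoryFactory value is assumed from typical Umbraco 7 configs -->
<add name="InternalIndexer"
     directoryFactory="Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory, Examine" />
```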

For this to work, however, only a single worker can ever write to the indexes. In Umbraco load balancing this is achieved by designating a single, non-scaled Web App as the master CMS server, while all other Web Apps serve front-end requests only.

There is already an existing package called AzureDirectory, but it is built only for Lucene 3.x, not Lucene 2.9. I have contributed to the original project and have also found some bugs in it. Examine's version of AzureDirectory is a port of the original code, brought up to date with various bug fixes and built against Lucene 2.9. Examine's version also implements only simple file directory locking, not the native FS file lock that the original AzureDirectory attempts to provide by using Blob leases. The simple file directory locking works fine (at least with Umbraco) because it is guaranteed that only a single process is ever writing to the index at one time.

To set this up

You can get the NuGet package from here: https://ci.appveyor.com/project/Shandem/examine/build/artifacts

Or use the NuGet feed: https://ci.appveyor.com/nuget/examine-f73l6qv0oqfh (add this feed to your NuGet.config file)
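As a minimal sketch of the NuGet.config change, the feed can be registered as an extra package source. The key name "ExamineAppVeyor" is a hypothetical label chosen for illustration; only the URL comes from this page.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!-- "ExamineAppVeyor" is a hypothetical label; any unique key works -->
    <add key="ExamineAppVeyor" value="https://ci.appveyor.com/nuget/examine-f73l6qv0oqfh" />
  </packageSources>
</configuration>
```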

and then

 Install-Package Examine.AzureDirectory -version 1.0.0-beta05 -Pre

To activate it, add these settings to your web.config:

<add key="examine:AzureStorageConnString" value="YOUR-STORAGE-CONNECTION-STRING" />
<add key="examine:AzureStorageContainer" value="YOUR-CONTAINER-NAME" />
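The key/value shape of these entries suggests they belong in the appSettings section of web.config. That placement is an assumption rather than something stated here, so treat the following as a sketch of where the two lines would sit:

```xml
<configuration>
  <appSettings>
    <!-- Connection string for the storage account that holds the 'master' index -->
    <add key="examine:AzureStorageConnString" value="YOUR-STORAGE-CONNECTION-STRING" />
    <!-- Blob container used for the index files -->
    <add key="examine:AzureStorageContainer" value="YOUR-CONTAINER-NAME" />
  </appSettings>
</configuration>
```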

Then this directoryFactory attribute needs to be added to each of your indexes in the ExamineIndexProviders section:

directoryFactory="Examine.AzureDirectory.AzureDirectoryFactory, Examine.AzureDirectory"

For example:

<add name="InternalIndexer" directoryFactory="Examine.AzureDirectory.AzureDirectoryFactory, Examine.AzureDirectory"/>
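To show the attribute in context, here is a sketch of how the ExamineIndexProviders section might look with every indexer switched over. The indexer names and type values are assumed from a stock Umbraco 7 ExamineSettings.config and may differ in your site; keep whatever other attributes your indexers already have and only add directoryFactory.

```xml
<Examine>
  <ExamineIndexProviders>
    <providers>
      <!-- name/type values are assumptions from a default Umbraco 7 install -->
      <add name="InternalIndexer"
           type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           directoryFactory="Examine.AzureDirectory.AzureDirectoryFactory, Examine.AzureDirectory" />
      <add name="InternalMemberIndexer"
           type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
           directoryFactory="Examine.AzureDirectory.AzureDirectoryFactory, Examine.AzureDirectory" />
      <add name="ExternalIndexer"
           type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
           directoryFactory="Examine.AzureDirectory.AzureDirectoryFactory, Examine.AzureDirectory" />
    </providers>
  </ExamineIndexProviders>
</Examine>
```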

Warning/Disclaimer!

I have this running in production but on a small site that doesn't get a lot of updates. I have not tested this in a Load Balanced environment!

... I would love it if someone could test this out though :)