Building your first index
To get started with Lucinq, add the Lucinq NuGet package to any .NET project.
# Building Your First Index
Building an index with Lucinq is straightforward.
First, we must open a folder for indexing. In this case we use the static Open() method on the native Lucene FSDirectory class.
var indexFolder = FSDirectory.Open(new DirectoryInfo(GeneralConstants.Paths.BBCIndex));
Now that we have a directory to work with, we need to choose the analyzer we are going to use:
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
To write to an index, we need to create an IndexWriter object. IndexWriter is disposable, so we wrap it in a using block to keep things neat.
using (IndexWriter indexWriter = new IndexWriter(indexFolder, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    // our code goes here
}
We are now ready to write to our index, but what should we write? For the purposes of this tutorial, I have downloaded every RSS feed I could find on the BBC News website. This gives me some free content to index and search later.
string[] rssFiles = Directory.GetFiles(GeneralConstants.Paths.RSSFeed);
foreach (var rssFile in rssFiles)
{
    // do something with our rss feed
}
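The loop body needs some way to turn each feed file into article objects, and the next step calls a ReadFeed helper that this page does not show. The following is a minimal sketch of what such a helper might look like, assuming the files are RSS 2.0 and using System.Xml.Linq. The NewsArticle class, the FeedReader class and the ParseFeed method are hypothetical and exist only for illustration. Only the property names are taken from the tutorial code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model: property names mirror those used in this tutorial.
public class NewsArticle
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Copyright { get; set; }
    public string Link { get; set; }
    public DateTime PublishDateTime { get; set; }
}

// Hypothetical helper class: not part of Lucinq or Lucene.Net.
public static class FeedReader
{
    // Loads an RSS 2.0 file from disk and returns one NewsArticle per <item>.
    public static List<NewsArticle> ReadFeed(string rssFile)
    {
        return ParseFeed(XDocument.Load(rssFile));
    }

    // Separated from ReadFeed so the parsing can be exercised without touching disk.
    public static List<NewsArticle> ParseFeed(XDocument feed)
    {
        XElement channel = feed.Root == null ? null : feed.Root.Element("channel");
        if (channel == null)
        {
            return new List<NewsArticle>();
        }

        // In RSS 2.0 the copyright notice is declared on the channel, not on each item.
        string copyright = (string)channel.Element("copyright") ?? string.Empty;

        return channel.Elements("item").Select(item => new NewsArticle
        {
            Title = (string)item.Element("title") ?? string.Empty,
            Description = (string)item.Element("description") ?? string.Empty,
            Copyright = copyright,
            Link = (string)item.Element("link") ?? string.Empty,
            PublishDateTime = ParseDate((string)item.Element("pubDate"))
        }).ToList();
    }

    // Falls back to DateTime.MinValue when pubDate is missing or unparseable.
    private static DateTime ParseDate(string value)
    {
        DateTime parsed;
        return DateTime.TryParse(value, CultureInfo.InvariantCulture,
                                 DateTimeStyles.AdjustToUniversal, out parsed)
            ? parsed
            : DateTime.MinValue;
    }
}
```

Missing elements are mapped to empty strings in this sketch so that no null values reach the indexing code. Whether the original helper does the same is not stated here.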
Now for the important bit: we need to actually write the Lucene documents.
List<NewsArticle> newsArticles = ReadFeed(rssFile); // gets a list of news articles from the feed file
newsArticles.ForEach(
    newsArticle =>
        indexWriter.AddDocument
        (
            x => x.AddAnalysedField(BBCFields.Title, newsArticle.Title, true), // adds an analysed & stored field to the index (thanks to the overload)
            x => x.AddAnalysedField(BBCFields.Description, newsArticle.Description, true), // also analysed & stored
            x => x.AddAnalysedField(BBCFields.Copyright, newsArticle.Copyright), // adds an analysed & non-stored field to the index
            x => x.AddStoredField(BBCFields.Link, newsArticle.Link), // adds a non-analysed & stored field to the index for later retrieval
            x => x.AddNonAnalysedField(BBCFields.PublishDate, TestHelpers.GetDateString(newsArticle.PublishDateTime), true)) // adds a non-analysed field to the index for querying for exact matches
);
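The publish date is indexed as a non-analysed string, so the exact text produced by TestHelpers.GetDateString is what later queries must match. That helper is not shown on this page. As an illustration only, one common approach is a fixed-width, culture-invariant UTC format, which also makes alphabetical order agree with chronological order. The DateStrings class and ToIndexString method below are hypothetical names, and the format is an assumption rather than the one the sample project uses.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: not part of Lucinq, Lucene.Net or the sample project.
public static class DateStrings
{
    // Formats a date as a fixed-width UTC string such as "20240605123000".
    // Fixed width means ordinal string comparison matches chronological order.
    public static string ToIndexString(DateTime value)
    {
        return value.ToUniversalTime()
                    .ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
    }
}
```

Note that ToUniversalTime treats a DateTime of unspecified kind as local time, so callers of a helper like this would want to pass values whose Kind is set.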
Finally, we need to optimize and close our index:
indexWriter.Optimize();
indexWriter.Close();
Your index is now ready to query. Here is the final sample code:
var indexFolder = FSDirectory.Open(new DirectoryInfo(GeneralConstants.Paths.BBCIndex));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
using (IndexWriter indexWriter = new IndexWriter(indexFolder, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
    string[] rssFiles = Directory.GetFiles(GeneralConstants.Paths.RSSFeed);
    foreach (var rssFile in rssFiles)
    {
        var newsArticles = ReadFeed(rssFile);
        newsArticles.ForEach(
            newsArticle =>
                indexWriter.AddDocument
                (
                    x => x.AddAnalysedField(BBCFields.Title, newsArticle.Title, true),
                    x => x.AddAnalysedField(BBCFields.Description, newsArticle.Description, true),
                    x => x.AddAnalysedField(BBCFields.Copyright, newsArticle.Copyright),
                    x => x.AddStoredField(BBCFields.Link, newsArticle.Link),
                    x => x.AddNonAnalysedField(BBCFields.PublishDate, TestHelpers.GetDateString(newsArticle.PublishDateTime), true))
        );
    }
    indexWriter.Optimize();
    indexWriter.Close();
}
The sample shows a basic index being built from a folder of RSS feeds downloaded from the BBC website.