Building your first index

cardinal252 edited this page Sep 24, 2013 · 13 revisions

To get started with Lucinq, install the Lucinq NuGet package into any .NET project.

#Building Your First Index

Index building with Lucinq is a breeze.

First, we must open a folder for indexing. In this case we are going to use the static Open() method on the native Lucene FSDirectory class.

var indexFolder = FSDirectory.Open(new DirectoryInfo(GeneralConstants.Paths.BBCIndex));

Now that we have a directory to work with, we need to choose the analyzer we are going to use:

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);

To write to an index, we need to create an IndexWriter object. Note that these are disposable, so we will wrap it in a using block to keep things neat.

using (IndexWriter indexWriter = new IndexWriter(indexFolder, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
	// our code goes here
}

OK, so now we are ready to write to our index, but what should we write? For the purposes of this tutorial, I have downloaded every RSS feed from the BBC News website that I could find. This gives me some free content that I can index and later search on.

string[] rssFiles = Directory.GetFiles(GeneralConstants.Paths.RSSFeed);
foreach (var rssFile in rssFiles)
{
	// do something with our rss feed
}

OK, so now for the important bit! Inside the loop, we need to actually write a Lucene document for each news article:

List<NewsArticle> newsArticles = ReadFeed(rssFile); // gets a list of news articles from the feed file
newsArticles.ForEach(
	newsArticle =>
		indexWriter.AddDocument
		(
			x => x.AddAnalysedField(BBCFields.Title, newsArticle.Title, true), // adds an analysed & stored field to the index (thanks to the overload)
			x => x.AddAnalysedField(BBCFields.Description, newsArticle.Description, true), // also analysed & stored
			x => x.AddAnalysedField(BBCFields.Copyright, newsArticle.Copyright), // adds an analysed & non-stored field to the index
			x => x.AddStoredField(BBCFields.Link, newsArticle.Link), // adds a non-analysed & stored field to the index for later retrieval
			x => x.AddNonAnalysedField(BBCFields.PublishDate, TestHelpers.GetDateString(newsArticle.PublishDateTime), true) // adds a non-analysed field to the index for querying for exact matches
		)
);
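
The ReadFeed and TestHelpers.GetDateString helpers used above are not shown on this page. As a rough illustration only, the sketch below shows one way such helpers might look, assuming the downloaded files are plain RSS 2.0 documents. The NewsArticle shape, the FeedHelpers class, the ParseFeed method and the date format are all assumptions made for this example, not the actual Lucinq sample code.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical article type; the real NewsArticle class is not shown on this page.
public class NewsArticle
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Copyright { get; set; }
    public string Link { get; set; }
    public DateTime PublishDateTime { get; set; }
}

// Hypothetical helper class, named for this example only.
public static class FeedHelpers
{
    // Assumed stand-in for ReadFeed: loads an RSS 2.0 file from disk.
    public static List<NewsArticle> ReadFeed(string rssFile)
    {
        return ParseFeed(XDocument.Load(rssFile));
    }

    // Maps each <item> under <channel> to a NewsArticle.
    public static List<NewsArticle> ParseFeed(XDocument feed)
    {
        XElement channel = feed.Root == null ? null : feed.Root.Element("channel");
        if (channel == null)
        {
            return new List<NewsArticle>();
        }

        // In RSS 2.0 the copyright notice sits on the channel, not on each item.
        string copyright = (string)channel.Element("copyright") ?? string.Empty;

        return channel.Elements("item").Select(item => new NewsArticle
        {
            Title = (string)item.Element("title") ?? string.Empty,
            Description = (string)item.Element("description") ?? string.Empty,
            Copyright = copyright,
            Link = (string)item.Element("link") ?? string.Empty,
            PublishDateTime = ParseDate((string)item.Element("pubDate"))
        }).ToList();
    }

    // Assumed stand-in for TestHelpers.GetDateString: a fixed-width string,
    // so that lexical order of the indexed values matches chronological order.
    public static string GetDateString(DateTime value)
    {
        return value.ToString("yyyyMMddHHmmss", CultureInfo.InvariantCulture);
    }

    // Falls back to DateTime.MinValue when the date is missing or unparseable.
    private static DateTime ParseDate(string raw)
    {
        DateTimeOffset parsed;
        return DateTimeOffset.TryParse(raw, CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed)
            ? parsed.UtcDateTime
            : DateTime.MinValue;
    }
}
```

A fixed-width date string is used here because the publish date is indexed as a non-analysed field, so it is matched as a single exact term.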

Finally, once all the feeds have been processed, we need to optimize and close our index:

indexWriter.Optimize();
indexWriter.Close();

Great, your index is now ready to query! Here is the final sample code:

var indexFolder = FSDirectory.Open(new DirectoryInfo(GeneralConstants.Paths.BBCIndex));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
using (IndexWriter indexWriter = new IndexWriter(indexFolder, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
{
	string[] rssFiles = Directory.GetFiles(GeneralConstants.Paths.RSSFeed);
	foreach (var rssFile in rssFiles)
	{
		var newsArticles = ReadFeed(rssFile);
		newsArticles.ForEach(
			newsArticle =>
				indexWriter.AddDocument
				(
					x => x.AddAnalysedField(BBCFields.Title, newsArticle.Title, true),
					x => x.AddAnalysedField(BBCFields.Description, newsArticle.Description, true),
					x => x.AddAnalysedField(BBCFields.Copyright, newsArticle.Copyright),
					x => x.AddStoredField(BBCFields.Link, newsArticle.Link),
					x => x.AddNonAnalysedField(BBCFields.PublishDate, TestHelpers.GetDateString(newsArticle.PublishDateTime), true)
				)
		);
	}

	indexWriter.Optimize();
	indexWriter.Close();
}

The sample shows a basic index being built from a folder of RSS feeds downloaded from the BBC website.