dotJEM JSON Index

!! This is a work in progress. Many lessons have been learned from the first version, and the next version will build on them.

Handles indexing of arbitrary JSON objects, based on Lucene.NET.

This basically enables the following concept:

IStorageIndex index = new LuceneStorageIndex();
index
    .Write(JObject.Parse("{ $id: '...', $contentType: 'person', name: 'Peter', age: 20 }"))
    .Write(JObject.Parse("{ $id: '...', $contentType: 'person', name: 'Lars', age: 30 }"))
    .Write(JObject.Parse("{ $id: '...', $contentType: 'person', name: 'John', age: 42 }"));

Assert.That(
    index.Search("name: Peter").Select(hit => hit.Json).Single(),
    Is.EqualTo(JObject.Parse("{ $id: '...', $contentType: 'person', name: 'Peter', age: 20 }")));

Assert.That(
    index.Search("age: [40 TO 50]").Select(hit => hit.Json).Single(),
    Is.EqualTo(JObject.Parse("{ $id: '...', $contentType: 'person', name: 'John', age: 42 }")));

Assert.That(
    index.Search("$contentType: person").ToArray().Length,
    Is.EqualTo(3));

Note that $id is a GUID, but it is omitted with ... in the above for readability. Identification is a strategy that can be replaced with e.g. a simpler int/long strategy. The names of reserved fields like $id and $contentType can also be configured with other strategies, enabling a high degree of flexibility.

Configuration

It is possible to configure how the index handles document identification and similar concerns.

Document identification is important for updating documents; content type is what automated schema generation is based on.

IStorageIndex index = new LuceneStorageIndex();
var config = index.Configuration;
config
    // Set how the type of a document is identified (ContentType, Type, SchemaType or similar). This is used to categorize data
    // and update the associated schemas as data goes into the index.
    .SetTypeResolver("Type")
    // This describes a document source. It will be deprecated in the future.
    .SetAreaResolver("Area")
    // At this point we begin to target data based on its categorization (Type). We can use "ForAll" to say that this applies to
    // documents of any type, or For("Type") to target specific types.
    .ForAll()
    // Set how documents are identified. This is used to update rather than add documents when they are already in the index.
    .SetIdentity("Id");
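
The comments above mention For("Type") as the counterpart to ForAll(). A minimal sketch of that variant follows. It assumes that For takes the type name as a string and can be followed by SetIdentity in the same way as ForAll; the "person" type name and this chaining are assumptions for illustration, not confirmed API.

```csharp
IStorageIndex index = new LuceneStorageIndex();
index.Configuration
    .SetTypeResolver("Type")
    // Assumption: For("person") targets only documents whose Type is "person".
    .For("person")
    // Assumption: SetIdentity chains after For(...) as it does after ForAll().
    .SetIdentity("Id");
```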

What is next?

So far this framework has simplified a lot of things for us, but parts of it are still viewed as a prototype from our perspective. The core works well and we use it in production, but there are a lot of unfinished edges.

The plan going forward is to move up to Lucene 4.8, and with that a lot of changes will happen. I am looking into making the framework more modular in terms of packages and providing patterns that give better means to extend it with custom query parser logic, document creation, etc.

