Elasticsearch

Alex Vigdor edited this page Oct 8, 2018 · 2 revisions

groovity-elasticsearch

Groovity-elasticsearch is a data-source implementation for the groovity-data module that allows an application to leverage Elasticsearch as a datastore for reading and/or writing data.

The Elasticsearch data source requires the Elasticsearch base URL. By default it looks for Elasticsearch on localhost at the default port.

es.baseUrl - defaults to http://localhost:9200/; configure a different value to override it

e.g. -Des.baseUrl=http://somewhere.else:9200/

This data source defaults to a 60-second timeout for connecting to and reading from Elasticsearch. You can override this with the http.timeout configuration variable

e.g. -Dhttp.timeout=120

or, to apply the timeout to the Elasticsearch data source only:

-D/data/sources/elasticsearch/http.timeout=240
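
The fallback behaviour described above can be pictured in a few lines of plain Groovy. This is only a sketch: `resolveTimeout` is a hypothetical helper, not part of Groovity, and it assumes that the source-scoped key takes precedence over the global key, which in turn takes precedence over the 60-second default. The module's real lookup logic may differ.

```groovy
// Hypothetical helper: resolves a timeout (in seconds) from a set of properties.
// Assumption: the source-scoped key wins over the global key,
// and the global key wins over the built-in default.
int resolveTimeout(Properties props, int defaultSeconds = 60) {
    String scoped = props.getProperty('/data/sources/elasticsearch/http.timeout')
    String global = props.getProperty('http.timeout')
    String chosen = scoped ?: global
    return chosen ? chosen.toInteger() : defaultSeconds
}

// Usage: resolve against the JVM system properties set with -D flags.
println resolveTimeout(System.getProperties())
```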

The Elasticsearch data source recognizes 4 special configuration options on a data type:

es.index - the name of the elasticsearch index or alias to query for the type

es.type - the name of the type to search for within the elasticsearch index

es.date - the name of a date field to be used to watch for data changes

es.dateFormat - the date format associated with the date field (not needed if date is represented in raw millis)
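
To illustrate why es.dateFormat is only needed for non-numeric dates, here is a small sketch that normalizes a date field value to epoch milliseconds. `toMillis` is a hypothetical helper, and the sketch assumes the format is a Java SimpleDateFormat-style pattern; the module may interpret es.dateFormat differently.

```groovy
import java.text.SimpleDateFormat

// Hypothetical helper: converts a raw date field value to epoch millis.
// Numbers are treated as raw millis; anything else is parsed with the pattern.
long toMillis(Object value, String dateFormat = null) {
    if (value instanceof Number) {
        return ((Number) value).longValue()
    }
    if (dateFormat == null) {
        throw new IllegalArgumentException('a date format is required for non-numeric dates')
    }
    def fmt = new SimpleDateFormat(dateFormat)
    fmt.setTimeZone(TimeZone.getTimeZone('UTC'))   // assumption: dates are stored in UTC
    return fmt.parse(value.toString()).time
}

// Both calls describe the same instant, once as raw millis and once as text.
println toMillis(1539000000000L)
println toMillis('2018-10-08T12:00:00Z', "yyyy-MM-dd'T'HH:mm:ss'Z'")
```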

This module contains a single trait IsElasticDoc that can be applied to DataModels to automatically capture _index, _type, _id and _version meta fields and ingest the document _source, as well as build the pointer for a newly created model.

For example, from the unit tests:

static conf=[
	source:'elasticsearch',
	ttl:60,
	refresh:45,
	'es.index':'unit_test_shoe_inventory',
	'es.type':'shoe',
	'es.date':'modified'
]

public class Shoe implements DataModel, Stored, IsElasticDoc{
	boolean mens
	boolean womens
	boolean kids
	int eyelets
	float size
	Date modified

	def setModified(Date d){
		this.modified = d
	}

	def setModified(Number n){
		this.modified = new Date(n.toLong())
	}
}

new Shoe()

To create and store a new shoe:

factory('shoe').putAll(
  mens:true,
  size:10.5,
  eyelets:6,
  modified: System.currentTimeMillis()
).store()

This module also comes with a default elasticsearch data type that can be used to perform ad-hoc reads and writes against elasticsearch without defining a custom data type. For this to work you have to manually configure the _index and _type on a model.

load '/data/factory'

factory('elasticsearch').putAll(
    _index: 'myIndex',
    _type: 'myType',
    color: 'purple',
    size: 10
).store()

The elasticsearch module allows you to perform queries using native Elasticsearch query string syntax:

load '/data/factory'

factory('elasticsearch','myIndex/myType/_search?q=size:>8').each{
  //...
}

The results that come back from using the elasticsearch data type are plain maps. You can also use query string syntax with your custom types, and the results will be a list of DataModels of your type:

load '/data/factory'

factory('shoe','_search?q=size:>8&mens:true&sort=size:desc').each{
  // ...
}
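
When a search key is assembled from several conditions, it can help to build it programmatically. In Elasticsearch query string syntax, multiple clauses are combined inside the single q parameter with operators such as AND. The sketch below uses a hypothetical `searchKey` helper and assumes the key should be URL-encoded before it is handed to the factory; whether the data source expects an encoded or a raw key is not stated here and should be verified.

```groovy
// Hypothetical helper: builds a "_search?..." key from query clauses and an optional sort.
// Assumption: values are URL-encoded before being passed to the factory.
String searchKey(List<String> clauses, String sort = null) {
    String q = URLEncoder.encode(clauses.join(' AND '), 'UTF-8')
    String key = "_search?q=${q}"
    if (sort) {
        key += "&sort=${URLEncoder.encode(sort, 'UTF-8')}"
    }
    return key
}

println searchKey(['size:>8', 'mens:true'], 'size:desc')
```

The resulting string would be used as the second argument to factory in place of a hand-written key.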

Elasticsearch queries may return a mixture of types, and this will be reflected in the results. Let's say you have multiple custom data types configured in the same index and you want to search across all of them; you can define an additional data type that is attached to a specific index, but not to a specific type, and use it to query or watch against all types in that index.

public static conf = [
		source: 'elasticsearch',
		ttl: 30,
		refresh: 15,
		'es.index': 'content',
		'es.date': 'modified'
]

static contentWatcher

static start(){
	contentWatcher = load('/data/factory').watch('content'){ pointer ->
		//aggressively refresh cache entries for all content when modified
		load('/data/factory').refresh(pointer)
	}
}

static destroy(){
	contentWatcher?.cancel(true)
}

class Content implements DataModel, IsElasticDoc {

}

new Content()

Then to query you would call:

load '/data/factory'

factory('content','_search?q=weather').each{
  //...
}

As long as the type of each result in elasticsearch exactly corresponds to the groovity type name, the result list from the query will contain an appropriate mixture of your DataModel types based on the elasticsearch types. So it pays to make sure you use the same type names in groovity and elasticsearch. The Content class itself doesn't implement any domain fields as it is only used here as a placeholder for the other data types.
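
The importance of matching type names can be shown with a simplified stand-in for this kind of dispatch. This is not the module's implementation: the classes, the `registry` map and the `toModel` closure are all hypothetical, and falling back to the plain map for unknown types is an assumption made for this sketch.

```groovy
// Hypothetical model classes standing in for custom data types.
class Shoe { Map fields }
class Sandal { Map fields }

// Hypothetical registry keyed by type name; a hit is only mapped to a class
// when its Elasticsearch _type matches the registered name exactly.
def registry = [shoe: Shoe, sandal: Sandal]

def toModel = { Map hit ->
    Class cls = registry[hit._type]
    if (cls == null) {
        return hit._source   // assumption for this sketch: unknown types stay plain maps
    }
    def model = cls.getDeclaredConstructor().newInstance()
    model.fields = hit._source
    return model
}

def hits = [
    [_type: 'shoe',   _source: [size: 10.5]],
    [_type: 'sandal', _source: [size: 9]],
    [_type: 'Shoe',   _source: [size: 8]]   // case mismatch, so no model class is found
]
def results = hits.collect(toModel)
println results*.getClass()*.simpleName
```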

You can perform more complex elasticsearch queries, for example processing aggregations or highlights, by using the elasticsearch "source" parameter on the search query url, or by directly supplying a JSON format query in place of a URL format query string. If you use either of these mechanisms, the target data type must be prepared to ingest the raw elasticsearch response and explicitly process hits and/or aggregations.
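
As a rough sketch of what a JSON format query and the explicit processing of a raw response might look like, the example below builds a query body with groovy.json.JsonOutput and then walks a hand-written sample response with JsonSlurper. The field names (`body`, `category`), the aggregation name and the sample response are illustrative assumptions, and how the JSON is passed to the factory is not shown here.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Query body in Elasticsearch query DSL; field and aggregation names are hypothetical.
def query = [
    query    : [match: [body: 'weather']],
    aggs     : [by_category: [terms: [field: 'category']]],
    highlight: [fields: [body: [:]]]
]
String queryJson = JsonOutput.toJson(query)

// Hand-written sample in the general shape of a search response, for illustration only.
String rawResponse = '''{
  "hits": {"total": 1, "hits": [
    {"_index": "content", "_type": "article", "_id": "1",
     "_source": {"body": "weather report"},
     "highlight": {"body": ["<em>weather</em> report"]}}
  ]},
  "aggregations": {"by_category": {"buckets": [{"key": "news", "doc_count": 1}]}}
}'''

def response = new JsonSlurper().parseText(rawResponse)

// Explicitly pull out what the caller needs: highlights per hit and bucket counts.
def highlights = response.hits.hits.collectEntries { hit -> [(hit._id): hit.highlight?.body] }
def counts = response.aggregations.by_category.buckets.collectEntries { b -> [(b.key): b.doc_count] }

println queryJson
println highlights
println counts
```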
