Puppet_Elasticsearch
Elasticsearch is an open-source program for performing text queries on large datasets. It powers the search engines of many major websites.
ARCCSS has set up an Elasticsearch instance on the NCI cloud to provide analytics services to administrators and support staff, showing what types of jobs researchers are running at NCI through the Kibana tool.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_talking_to_elasticsearch.html
Elasticsearch is a web service. It has interfaces for a variety of languages, or you can send queries directly with cURL. Elasticsearch stores documents in JSON syntax, organised into 'indices'. Note that access is restricted by IP address, e.g. only accessdev can create records in the UMUI index. The Elasticsearch server name is [to be decided].
Below are some usage examples taken from the Elasticsearch manual:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_indexing_employee_documents.html
To store a JSON document in Elasticsearch, run:
$ curl -XPOST elasticdev/megacorp/employee -d '{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}'
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_search_lite.html
You can see all the records in an index using:
$ curl elasticdev/megacorp/_search
Narrow the search by specifying fields, e.g. documents with 'first_name': 'John' (quote the URL so the shell does not try to interpret the '?'):
$ curl 'elasticdev/megacorp/_search?q=first_name:John'
There is a JSON interface (the query DSL) for more complex queries, where the query is sent as a JSON document in the request body.
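As a rough sketch of the JSON interface, the snippet below sends a 'match' query in the request body instead of the URL. It assumes the same 'elasticdev' host and 'megacorp' index as the examples above. The ES_HOST variable and the search_megacorp function are names made up for illustration, and the Content-Type header is only required by newer Elasticsearch releases.

```shell
# Assumed host name, copied from the examples above; override with ES_HOST.
ES_HOST="${ES_HOST:-elasticdev}"

# Query body: match documents whose last_name field is "Smith".
QUERY='{
  "query": {
    "match": { "last_name": "Smith" }
  }
}'

# Hypothetical helper: send the query to the index's _search endpoint.
search_megacorp() {
  curl -s -H 'Content-Type: application/json' \
    -XGET "${ES_HOST}/megacorp/_search" -d "${QUERY}"
}
```

Calling search_megacorp from a host that is allowed to reach the server should return the matching documents as JSON.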
There is a decent guide to using Kibana and setting up dashboards available on its website. For the front page I've mostly used the 'Terms' panel type, which gathers a single field from all documents and plots the distribution of its values, e.g. the UM version number in a basis file.
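The kind of summary a 'Terms' panel shows can be approximated directly against Elasticsearch with a terms aggregation. This is only a sketch: the 'um_version' field name is hypothetical, the 'umui' index and 'elasticdev' host are assumed from elsewhere on this page, and the 'aggs' syntax needs Elasticsearch 1.0 or later (earlier releases used facets instead).

```shell
# Assumed host name, copied from the examples above; override with ES_HOST.
ES_HOST="${ES_HOST:-elasticdev}"

# "size": 0 leaves out the matching documents so only bucket counts come back.
# "um_version" is a hypothetical field name used for illustration.
TERMS_QUERY='{
  "size": 0,
  "aggs": {
    "versions": {
      "terms": { "field": "um_version" }
    }
  }
}'

# Hypothetical helper: count documents per distinct um_version value.
count_versions() {
  curl -s -H 'Content-Type: application/json' \
    -XGET "${ES_HOST}/umui/_search" -d "${TERMS_QUERY}"
}
```

The response lists one bucket per distinct value with its document count, which is the same distribution the panel plots.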
Apache acts as a gateway to Elasticsearch, since Elasticsearch itself doesn't have any access control mechanisms. Authentication for the Kibana web service uses the NCI LDAP account system, so you can access it from anywhere. Data submitted to Elasticsearch in an automated fashion can't use LDAP authentication, so write access is instead restricted by IP address. For instance, only accessdev can write to the 'umui' index.
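An automated submission under this scheme might look like the sketch below. No credentials are sent, since the gateway accepts or rejects the request based on the source IP. The 'job' type and the 'user' and 'um_version' fields are hypothetical, as are the function names; the 'elasticdev' host is assumed from the earlier examples.

```shell
# Assumed host name, copied from the examples above; override with ES_HOST.
ES_HOST="${ES_HOST:-elasticdev}"

# Build a JSON record from two arguments. The field names are hypothetical,
# and values are not escaped, so they must not contain quotes or backslashes.
make_record() {
  printf '{"user": "%s", "um_version": "%s"}' "$1" "$2"
}

# POST the record to the 'umui' index under a hypothetical 'job' type.
# This only succeeds from a host whose IP the gateway allows to write.
submit_record() {
  curl -s -H 'Content-Type: application/json' \
    -XPOST "${ES_HOST}/umui/job" -d "$(make_record "$1" "$2")"
}
```

For example, `submit_record "$USER" 8.5` run on accessdev would store one record, while the same call from another host should be refused by the gateway.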
The Puppet repository used for the server is available at https://github.com/coecms/analytics-server