Puppet_Elasticsearch

Paul Leopardi edited this page Feb 11, 2024 · 10 revisions

Elasticsearch

Elasticsearch is an open-source program for performing text queries on large datasets. It powers the search engines of many major websites.

ARCCSS has set up an Elasticsearch instance on the NCI cloud to provide analytics services to administrators and support staff, showing what types of jobs researchers are running at NCI through the Kibana tool.

Using Elasticsearch

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_talking_to_elasticsearch.html

Elasticsearch is a web service. It has client interfaces for a variety of languages, or you can send queries directly with cURL. Elasticsearch stores documents as JSON, organised into 'indices'. Note that access is restricted by IP address, e.g. only accessdev can create records in the UMUI index. The Elasticsearch server name is [to be decided].

Below are some usage examples taken from the Elasticsearch manual.

Sending data to Elasticsearch

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_indexing_employee_documents.html

To store a JSON document in Elasticsearch, run:

$ curl -XPOST elasticdev/megacorp/employee -d '{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}'
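
The POST above lets Elasticsearch choose the document ID. As a rough sketch, assuming the same `elasticdev` host and `megacorp/employee` path, a document can also be stored under an explicit ID with PUT and fetched back with GET. The `ES_HOST` variable and the `employee_url`, `put_employee` and `get_employee` helpers are hypothetical names introduced here for illustration. The Content-Type header is required by recent Elasticsearch versions and ignored by older ones. Writes will only succeed from a host that is allowed to write to the index.

```shell
# Host name taken from the example above; override ES_HOST if yours differs.
ES_HOST="${ES_HOST:-elasticdev}"

# URL of a single employee document (hypothetical helper for illustration).
employee_url() {
    printf '%s/megacorp/employee/%s' "$ES_HOST" "$1"
}

# Store a JSON document under an explicit ID, replacing any existing one.
put_employee() {
    curl -s -XPUT "$(employee_url "$1")" \
        -H 'Content-Type: application/json' \
        -d "$2"
}

# Fetch the stored document back by its ID.
get_employee() {
    curl -s -XGET "$(employee_url "$1")"
}
```

For example, `put_employee 1 '{"first_name": "John", "last_name": "Smith"}'` followed by `get_employee 1` should return the stored document wrapped in Elasticsearch metadata.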

Querying Elasticsearch

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_search_lite.html

You can see all the records in an index using:

$ curl elasticdev/megacorp/_search

Narrow the results down by specifying fields, e.g. documents with 'first_name': 'John' (the URL is quoted so the shell does not try to expand the '?'):

$ curl 'elasticdev/megacorp/_search?q=first_name:John'

There is a JSON interface for more complex queries.
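
As a minimal sketch of the JSON interface, the helpers below send a `match` query from the Elasticsearch query DSL to the `_search` endpoint. The `megacorp` index and `elasticdev` host follow the examples above. The `ES_HOST` variable and the `build_match_query` and `search_by_last_name` functions are hypothetical names introduced for illustration. The value is pasted into the JSON unescaped, so this only suits simple strings without quotes or backslashes.

```shell
# Host name taken from the examples above; override ES_HOST if yours differs.
ES_HOST="${ES_HOST:-elasticdev}"

# Build a query DSL body that matches documents on the last_name field.
# (hypothetical helper; the value is not JSON-escaped)
build_match_query() {
    printf '{"query": {"match": {"last_name": "%s"}}}' "$1"
}

# Send the query body to the _search endpoint of the megacorp index.
# The Content-Type header is required by recent Elasticsearch versions.
search_by_last_name() {
    curl -s -XGET "$ES_HOST/megacorp/_search" \
        -H 'Content-Type: application/json' \
        -d "$(build_match_query "$1")"
}
```

Running `search_by_last_name Smith` should return the same kind of result as the `q=` form above, but the JSON body can be extended with filters and other clauses that the query string cannot express.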

Using Kibana

There is a decent guide to using Kibana and setting up dashboards on its website. For the front page I've mostly used the 'Terms' panel type. This gathers a single variable from all documents and creates a plot showing its distribution, e.g. the UM version number in a basis file.

Server Structure

Apache acts as a gateway to Elasticsearch, since Elasticsearch itself doesn't have any access control mechanisms. Authentication for the Kibana web service uses the NCI LDAP account system, so you can access it from anywhere. Automated data submission to Elasticsearch can't use LDAP authentication; instead, write access is restricted by IP address, so for instance only accessdev can write to the 'umui' index.

The Puppet repository used for the server is available at https://github.com/coecms/analytics-server

[Figure: elasticdev.png]