# Elasticsearch
## Introduction

Elasticsearch is a distributed NoSQL JSON document database built on Apache Lucene. It provides a full-text search service and is used extensively by websites such as Quora, GitHub, Stack Exchange and many more. Its RESTful API offers a simple interface to the distributed database, which makes it easy to integrate with websites. In this dev-op we will deploy Elasticsearch on an AWS cluster and perform a simple query.
## Spin up AWS instances

We recommend using t2.micro instances with Ubuntu Server 14.04 LTS (HVM), SSD Volume Type, to take advantage of Amazon's Free Tier program. For practice, try spinning up 3 nodes for Elasticsearch. Be sure to terminate the instances when you are finished to avoid AWS charges: the Free Tier covers 750 instance-hours per month, and every running node counts toward that total.
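Because every node draws on the same monthly allowance, a 3-node cluster uses it up three times as fast as a single instance. The Python 3 sketch below estimates how long a cluster can stay up before the allowance runs out. It assumes a 750-hour monthly allowance pooled across all t2.micro instances; check the current terms for your own AWS account.

```python
# Rough Free Tier budgeting sketch.
# ASSUMPTION: 750 free instance-hours per month, pooled across all
# t2.micro instances; verify against your own AWS account terms.
FREE_TIER_HOURS_PER_MONTH = 750


def hours_until_limit(num_nodes, hours_used=0.0,
                      allowance=FREE_TIER_HOURS_PER_MONTH):
    """Wall-clock hours the cluster can stay up before the allowance runs out."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    remaining = max(allowance - hours_used, 0.0)
    # Each wall-clock hour consumes one instance-hour per running node.
    return remaining / num_nodes


if __name__ == "__main__":
    hours = hours_until_limit(3)
    print(f"3 nodes: {hours:.0f} hours (~{hours / 24:.1f} days)")
```

Under these assumptions a fresh month gives a 3-node cluster about 250 hours, or roughly ten days of continuous running.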
## Set up Elasticsearch

Elasticsearch will be installed on all nodes with the same configuration. SSH into each node and run the following:
```
node$ sudo apt-get update
```

Install the Java Development Kit

```
node$ sudo apt-get install openjdk-7-jdk
```

Install Elasticsearch

```
node$ wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.5.2.tar.gz -P ~/Downloads
node$ sudo tar -xvf ~/Downloads/elasticsearch-1.5.2.tar.gz -C /usr/local
node$ sudo mv /usr/local/elasticsearch-1.5.2 /usr/local/elasticsearch
```

Set the ELASTICSEARCH_HOME environment variable and add it to PATH in .profile

```
node$ nano ~/.profile

# Add the following
export ELASTICSEARCH_HOME=/usr/local/elasticsearch
export PATH=$PATH:$ELASTICSEARCH_HOME/bin

node$ source ~/.profile
```

Install the AWS Cloud Plugin for Elasticsearch

```
node$ sudo $ELASTICSEARCH_HOME/bin/plugin install elasticsearch/elasticsearch-cloud-aws/2.5.0
```

Configure Elasticsearch for node discovery

```
node$ sudo nano $ELASTICSEARCH_HOME/config/elasticsearch.yml
```
Set `cloud.aws.access_key` (your AWS access key ID), `cloud.aws.secret_key` (your AWS secret access key), `cloud.aws.region` (the region your cluster runs in) and `discovery.ec2.groups` (your security group name) to match your AWS settings. Also change `cluster.name` to something specific to you; otherwise Elasticsearch may treat any node it discovers on EC2 with the same cluster name as part of your cluster.

**Warning:** be careful not to commit this file to GitHub, since it contains your AWS credentials.
```
cloud.aws.access_key: YOUR_AWS_ACCESS_KEY_ID
cloud.aws.secret_key: YOUR_AWS_SECRET_ACCESS_KEY
cloud.aws.region: us-west-2
discovery.type: ec2
discovery.ec2.groups: your-security-group

################### Elasticsearch Configuration Example ###################
…
…
cluster.name: my-cluster-name
```
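One way to keep credentials out of anything you commit is to generate the discovery settings shown above from environment variables at deploy time. The Python 3 sketch below assumes hypothetical variable names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `ES_REGION`, `ES_SECURITY_GROUP`, `ES_CLUSTER_NAME`); `render_discovery_config` is an illustrative helper, not an Elasticsearch tool.

```python
import os

# Hypothetical mapping from environment variables to the elasticsearch.yml
# settings used for EC2 discovery with the cloud-aws plugin.
SETTINGS = [
    ("cloud.aws.access_key", "AWS_ACCESS_KEY_ID"),
    ("cloud.aws.secret_key", "AWS_SECRET_ACCESS_KEY"),
    ("cloud.aws.region", "ES_REGION"),
    ("discovery.ec2.groups", "ES_SECURITY_GROUP"),
    ("cluster.name", "ES_CLUSTER_NAME"),
]


def render_discovery_config(env=None):
    """Build elasticsearch.yml lines from environment variables.

    Raises KeyError listing every missing variable, so an incomplete
    environment fails loudly instead of producing a half-filled config.
    """
    env = os.environ if env is None else env
    missing = [var for _, var in SETTINGS if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    lines = [f"{key}: {env[var]}" for key, var in SETTINGS]
    lines.append("discovery.type: ec2")
    return "\n".join(lines) + "\n"
```

For example, on a node with those variables exported, `print(render_discovery_config(), end="")` prints lines you can add to `elasticsearch.yml`, while the script itself stays free of secrets.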
Start Elasticsearch:

```
node$ sudo $ELASTICSEARCH_HOME/bin/elasticsearch &
```

## Check the status of the Elasticsearch cluster

You can check that all nodes are up and running by executing the following on any of the nodes:
```
node$ curl --user elastic:changeme 'localhost:9200/_cat/health?v'
```

With a 3-node cluster, the `node.total` column in the output should read 3.
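To script this check instead of reading the table by eye, the Python 3 sketch below fetches `_cat/health?v` with the standard library and pairs each header with its value. It assumes the node answers on `localhost:9200` without authentication (a stock 1.5.2 install has none) and that the table has a `node.total` column; it does not hard-code column order, since the set of columns can differ between versions.

```python
import urllib.request


def parse_cat_table(text):
    """Turn `_cat/...?v` output (a header row plus data rows) into dicts.

    Splits on whitespace, so it assumes no cell contains a space.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def cluster_health(host="http://localhost:9200"):
    """Fetch _cat/health?v and return its single data row as a dict."""
    with urllib.request.urlopen(host + "/_cat/health?v") as resp:
        rows = parse_cat_table(resp.read().decode("utf-8"))
    return rows[0] if rows else {}


def has_expected_nodes(health, expected):
    """True if the health row reports the expected number of nodes."""
    return int(health.get("node.total", 0)) == expected
```

On a node, `has_expected_nodes(cluster_health(), 3)` should return `True` once all three nodes have joined.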
You can also list the nodes in your cluster with the following command:

```
node$ curl 'localhost:9200/_cat/nodes?v'
```

The output should contain one row per node, so a 3-node cluster shows three rows below the header.
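Default node names in Elasticsearch 1.x are picked at random and can contain spaces, so splitting the full `?v` table on whitespace is unreliable here. The Python 3 sketch below instead requests only the `name` column through the cat API's `h` parameter; without `v` there is no header row, so each line is one node name. It assumes `localhost:9200` and no authentication.

```python
import urllib.request


def parse_node_names(text):
    """One node name per non-empty line of `_cat/nodes?h=name` output."""
    # strip() drops the column padding the cat API may add
    return [line.strip() for line in text.splitlines() if line.strip()]


def list_node_names(host="http://localhost:9200"):
    """Fetch only the name column for every node in the cluster."""
    with urllib.request.urlopen(host + "/_cat/nodes?h=name") as resp:
        return parse_node_names(resp.read().decode("utf-8"))
```

`len(list_node_names())` should then equal the number of nodes you started.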
## Example Search with Python Elasticsearch Client

We will be using practice problems and datasets from exploringelasticsearch.com. That tutorial uses the JSON HTTP API directly, but we will work through the exercise with the [Python Elasticsearch client](https://elasticsearch-py.readthedocs.org/en/master/). The following can be executed on any node in the Elasticsearch cluster.

### Install Python Elasticsearch Client

```
node$ sudo apt-get install python-pip
node$ sudo pip install 'elasticsearch>=1.0.0,<2.0.0'
```

The version pin keeps the client on its 1.x line, matching the Elasticsearch 1.5.2 server installed above; the client's major version should match the server's.
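`pip show elasticsearch` reports the installed client version. To compare it with the server, the Python 3 sketch below reads `version.number` from the response to `GET /` and checks whether the major versions agree. It assumes `localhost:9200` with no authentication; `versions_compatible` is a hypothetical helper, not part of elasticsearch-py.

```python
import json
import urllib.request


def major(version):
    """Major component of a dotted version string, e.g. '1.5.2' -> 1."""
    return int(version.split(".")[0])


def server_version(host="http://localhost:9200"):
    """Read version.number from the cluster's root endpoint."""
    with urllib.request.urlopen(host + "/") as resp:
        return json.loads(resp.read().decode("utf-8"))["version"]["number"]


def versions_compatible(client_version, server_version_number):
    """True when client and server share a major version."""
    return major(client_version) == major(server_version_number)
```

For example, `versions_compatible("1.9.0", server_version())` should be `True` against the 1.5.2 server from this guide.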
### Install git and clone the repository
```
node$ sudo apt-get install git
node$ git clone https://github.com/andrewvc/ee-datasets
```

### Put the movie_db data onto the Elasticsearch cluster
```
node$ cd ee-datasets
node$ java -jar elastic-loader.jar http://localhost:9200 datasets/movie_db.eloader
```

### Perform a simple search in Python

Loading the movie_db data in the previous step created an index called `movie_db` in Elasticsearch. Here we will search that index for a movie whose `description` field contains the word CIA.
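For comparison with the JSON HTTP API that the exploringelasticsearch.com exercises use, the Python 3 sketch below builds the same request with only the standard library: a `match` query on `description`, sent as a POST to the index's `_search` endpoint. It assumes `localhost:9200` and no authentication; `titles_from_response` is a hypothetical helper that reads `hits.hits[]._source.title` from a search response.

```python
import json
import urllib.request


def build_search_request(term, index="movie_db", field="description",
                         host="http://localhost:9200"):
    """Build (but do not send) a POST /<index>/_search with a match query."""
    body = {"query": {"match": {field: term}}}
    return urllib.request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def titles_from_response(response):
    """Pull the title of every hit out of a decoded search response."""
    return [hit["_source"].get("title") for hit in response["hits"]["hits"]]


def search_titles(term, **kwargs):
    """Send the search and return the titles of the matching documents."""
    with urllib.request.urlopen(build_search_request(term, **kwargs)) as resp:
        return titles_from_response(json.loads(resp.read().decode("utf-8")))
```

On a node with the movie_db data loaded, `search_titles("CIA")` sends the query and returns the list of matching titles.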
Using the Python Elasticsearch client:

```
node$ python
>>> from elasticsearch import Elasticsearch
>>> import json
>>> es = Elasticsearch(http_auth=('elastic', 'changeme'))
>>> result = es.search(index="movie_db", body={'query': {'match': {'description': 'CIA'}}})
>>> print(json.dumps(result, indent=2))
{
  "hits": {
    "hits": [
      {
        "_score": 0.067124054,
        "_type": "movie",
        "_id": "3",
        "_source": {
          "description": "A cast of characters challenge society's commonly held view that computer experts are not the beautiful people. Somehow, the CIA is hacked in under 5 minutes.",
          "title": "Swordfish",
          "actors": [
            "John Travolta",
            "Hugh Jackman",
            "Halle Berry"
          ],
          "genre": [
            "Action",
            "Crime"
          ],
          "_id": 3,
          "release_year": 2001
        },
        "_index": "movie_db"
      }
    ],
    "total": 1,
    "max_score": 0.067124054
  },
  "_shards": {
    "successful": 5,
    "failed": 0,
    "total": 5
  },
  "took": 25,
  "timed_out": false
}
```

### Additional notes

If you are planning to shut down the Elasticsearch nodes, you can try the following:
```
# Shut down the local node
$ curl --user elastic:changeme -XPOST 'http://localhost:9200/_cluster/nodes/_local/_shutdown'

# Shut down all nodes in the cluster
$ curl --user elastic:changeme -XPOST 'http://localhost:9200/_shutdown'
```
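The same shutdown calls can be scripted. The Python 3 sketch below only builds the POST requests for the two endpoints used here, which belong to the Elasticsearch 1.x shutdown API (it was removed in Elasticsearch 2.0). The `local_only` switch is a hypothetical convenience, and nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import urllib.request


def build_shutdown_request(local_only=True, host="http://localhost:9200"):
    """Build (but do not send) a POST for the Elasticsearch 1.x shutdown API.

    local_only=True targets only the node at `host`; False targets every node.
    """
    path = "/_cluster/nodes/_local/_shutdown" if local_only else "/_shutdown"
    return urllib.request.Request(host + path, data=b"", method="POST")
```

For example, `urllib.request.urlopen(build_shutdown_request(local_only=False))` would ask the whole cluster to shut down.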
Find out more about the Insight Data Engineering Fellows Program in New York and Silicon Valley, apply today, or sign up for program updates.
You can also read our engineering blog here.