# **1. Getting Started with Elasticsearch**
Following - "All you need to know about using Elasticsearch in Python."
<br>
https://medium.com/codex/all-you-need-to-know-about-using-elasticsearch-in-python-b9ed00e0fdf0


In [1]:
# ------------------------- Create an ES Client -------------------------
# First, we need to create an Elasticsearch client:
from elasticsearch import Elasticsearch
es_client = Elasticsearch()
type(es_client)

elasticsearch.client.Elasticsearch

In [2]:
# ------------------------- Create an ES Client with Explicit Settings -------------------------
# Above, we created the client without any arguments, so all default settings were used.
# In practice, you would need to specify the host, user name and password to create
# a valid client. To simulate this in a local development environment, you can also
# create the client by spelling out the default settings explicitly:

es_client = Elasticsearch(
    "localhost:9200",
    http_auth=["elastic", "changeme"], 
) # "elastic", "changeme" and 9200 are the default user name, password and port for Elasticsearch.

# You can also create the client in this way:
    # es_client = Elasticsearch(
    #     hosts=[{"host": "localhost", "port": 9200}],
    #     http_auth=["elastic", "changeme"],
    # ) 
# Here "hosts" is a list of nodes, or a single node, that the client should connect to. A node can be given as a
# dictionary ({"host": "localhost", "port": 9200}) or, as in the first form above, as a "host:port" string.
# Most of the time we only connect to a single node. http_auth is
# a list or tuple where the first element is the user name and the second one is the password.

# If you want to have more advanced settings for authentication, you can check the official documentation.
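
Hard-coding the user name and password is fine for a local demo, but in practice they usually come from the environment. The following is a minimal sketch of that idea, not part of the original article. The environment variable names (`ES_HOST`, `ES_PORT`, `ES_USER`, `ES_PASSWORD`) and the helper name `es_connection_settings` are assumptions chosen for illustration, and the fallbacks are the local defaults used above.

```python
import os


def es_connection_settings(env=None):
    """Collect connection settings, falling back to the local defaults."""
    env = os.environ if env is None else env
    host = env.get("ES_HOST", "localhost")         # hypothetical variable name
    port = int(env.get("ES_PORT", "9200"))         # hypothetical variable name
    user = env.get("ES_USER", "elastic")           # hypothetical variable name
    password = env.get("ES_PASSWORD", "changeme")  # hypothetical variable name
    return {
        "hosts": [{"host": host, "port": port}],
        "http_auth": (user, password),
    }


settings = es_connection_settings({})  # an empty mapping yields the defaults
print(settings)
# The client could then be created with: Elasticsearch(**settings)
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to try out without touching the real environment.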

In [26]:
# ------------------------- Create an ES Index Client -------------------------
# To work with indices, we need to use IndicesClient. To create an index client, we need to pass 
# in the Elasticsearch client created above:
from elasticsearch.client import IndicesClient
es_index_client = IndicesClient(es_client)
type(es_index_client)

elasticsearch.client.indices.IndicesClient

In [27]:
# ------------------------- Define the Settings & Mappings for an Index -------------------------
# Before we create an index, we define its settings and mappings. Neither is strictly required
# to create an index. In practice, however, you should almost always define them, because explicit
# settings and mappings make your search engine more robust, more efficient and more powerful.
# In this article, we will use this demo configuration:
configurations = {
    "settings": {
        "index": {"number_of_replicas": 2},
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 15,
                },
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "ngram_filter"],
                },
            },
        },
    },
    "mappings": {
        "properties": {
            "id": {"type": "long"},
            "name": {
                "type": "text",
                "analyzer": "standard",
                "fields": {
                    "keyword": {"type": "keyword"},
                    "ngrams": {"type": "text", "analyzer": "ngram_analyzer"},
                },
            },
            "brand": {
                "type": "text",
                "fields": {
                    "keyword": {"type": "keyword"},
                },
            },
            "price": {"type": "float"},
            "attributes": {
                "type": "nested",
                "properties": {
                    "attribute_name": {"type": "text"},
                    "attribute_value": {"type": "text"},
                },
            },
        }
    },
}
# 【1】If you want to become an expert in Elasticsearch, you need to learn more about the settings and mappings
# of an index, starting with analyzers: https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html
# 【2】In this example, we set "number_of_replicas" to 2. This makes no practical difference in a single-node local
# development environment, but in production multiple replicas can improve availability and fault tolerance.
# 【3】We define the fields of our documents in the "mappings" section. Elasticsearch supports dynamic mapping,
# which means we don't need to define the field types in advance and Elasticsearch will create them automatically.
# However, we should define the mapping explicitly whenever possible. It is better to be explicit about the mapping
# than implicit. The more you know about your data, the more robust the search engine can be.
# 【4】Finally, in the "settings" section we define an edge n-gram token filter ("ngram_filter") and a custom analyzer
# ("ngram_analyzer") that uses it. Together they support searching by partial input (autocompletion), which will be
# demonstrated later.

print(configurations,'\n','The datatype of configurations is:',type(configurations))

{'settings': {'index': {'number_of_replicas': 2}, 'analysis': {'filter': {'ngram_filter': {'type': 'edge_ngram', 'min_gram': 2, 'max_gram': 15}}, 'analyzer': {'ngram_analyzer': {'type': 'custom', 'tokenizer': 'standard', 'filter': ['lowercase', 'ngram_filter']}}}}, 'mappings': {'properties': {'id': {'type': 'long'}, 'name': {'type': 'text', 'analyzer': 'standard', 'fields': {'keyword': {'type': 'keyword'}, 'ngrams': {'type': 'text', 'analyzer': 'ngram_analyzer'}}}, 'brand': {'type': 'text', 'fields': {'keyword': {'type': 'keyword'}}}, 'price': {'type': 'float'}, 'attributes': {'type': 'nested', 'properties': {'attribute_name': {'type': 'text'}, 'attribute_value': {'type': 'text'}}}}}} 
 The datatype of configurations is: <class 'dict'>
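
To get a feel for what "ngram_analyzer" does to a product name, here is a rough pure-Python approximation. It is only a sketch: the regular expression is a crude stand-in for the standard tokenizer, and the function names are made up for illustration. The `min_gram` and `max_gram` values are the ones from the configuration above.

```python
import re


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def approx_ngram_analyzer(text):
    """Approximate "ngram_analyzer": split into words, lowercase, edge n-grams."""
    words = re.findall(r"\w+", text)  # crude approximation of the standard tokenizer
    terms = []
    for word in words:
        terms.extend(edge_ngrams(word.lower()))
    return terms


print(approx_ngram_analyzer("Apple MacBook"))
# ['ap', 'app', 'appl', 'apple', 'ma', 'mac', 'macb', 'macbo', 'macboo', 'macbook']
```

Because every prefix of each word is stored as its own term, a partial input such as "appl" can later match a document whose name contains "Apple".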


In [28]:
# ------------------------- Create an ES Index -------------------------
# To create an Elasticsearch index with above settings, run:
es_index_client.create(index="laptops-demo", body=configurations)
# "index": The name of the index to create.
# "body": The configuration for the index (settings and mappings) defined in the previous cell.
# The response ('acknowledged': True) confirms that the index was created.
# You can also check the created index in Kibana: http://localhost:5601 --> choose "explore on my own" --> Dev tools (top right)
# --> Then run the following queries to check the settings and mappings of the created index:
    # GET _cat/indices             #Check existing indices
    # GET laptops-demo/_settings   #Check index settings
    # GET laptops-demo/_mapping    #Check index mapping

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'laptops-demo'}
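
Running the create cell a second time fails because the index already exists. One way to make the step repeatable is to check with the index client's `exists` method first. The sketch below assumes that approach. `ensure_index` is a hypothetical helper name, and `FakeIndexClient` is an in-memory stand-in used only so the example runs without a cluster. With a real cluster you would pass `es_index_client` instead.

```python
def ensure_index(index_client, name, body):
    """Create `name` only if it does not exist yet; return True if it was created."""
    if index_client.exists(index=name):
        return False
    index_client.create(index=name, body=body)
    return True


class FakeIndexClient:
    """Hypothetical in-memory stand-in, used only to demonstrate the flow offline."""

    def __init__(self):
        self.indices = {}

    def exists(self, index):
        return index in self.indices

    def create(self, index, body):
        self.indices[index] = body
        return {"acknowledged": True, "index": index}


fake = FakeIndexClient()
print(ensure_index(fake, "laptops-demo", {"settings": {}}))  # True: created
print(ensure_index(fake, "laptops-demo", {"settings": {}}))  # False: already there
```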

In [29]:
# ------------------------- Create & Get Alias of ES Index(es) -------------------------
# 【1】We can create an alias for our index with the following command. An alias can be used to access an index
# just like the index name itself:
es_index_client.put_alias(index="laptops-demo", name="laptops")
# There can be multiple aliases for an index and there can be multiple indices with the same alias, which can be 
# useful to group relevant indices together.

# 【2】To get the aliases of an index:
es_index_client.get_alias(index="laptops-demo")

#【3】To get all the indices with the same alias, just specify the alias name as the index name:
es_index_client.get_alias(index="laptops", allow_no_indices=True, ignore_unavailable=True)
    # "allow_no_indices=True": No error will be raised if no indices match the specified alias or wildcard expression.
    # "ignore_unavailable=True": No error will be raised if the specified index or alias does not exist.

{'laptops-demo': {'aliases': {'laptops': {}}}}
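
The response is keyed by index name, so when several indices share an alias you may want to pull out just the index names. This is a small sketch under that assumption. The helper name `indices_for_alias` is made up, and the sample response is the output shown above.

```python
def indices_for_alias(alias_response, alias):
    """List the index names whose "aliases" section contains `alias`."""
    return sorted(
        index_name
        for index_name, info in alias_response.items()
        if alias in info.get("aliases", {})
    )


response = {"laptops-demo": {"aliases": {"laptops": {}}}}  # output of get_alias above
print(indices_for_alias(response, "laptops"))
# ['laptops-demo']
```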

In [None]:
# ------------------------- DELETE ES Index(es) / Aliases -------------------------
# 【1】You can delete an index with the index client:
# es_index_client.delete(index="laptops-demo", ignore=404)
# "ignore=404": if the index to be deleted does not exist, no error will be raised.

# 【2】You can also delete an alias for an index:
# es_index_client.delete_alias(index="laptops-demo", name="laptops")


# **2. Creating ES Documents**

In [13]:
# ------------------------- Create a Single Document -------------------------
# Now that we have an index created with proper settings and mappings, we can start to add documents to it.
# To create documents in Python, we use the client (es_client) created at the beginning of this article.
# To create a single document manually, we can use the index method of the client:
doc = {
    "id": 1,   ##### An ordinary field of the document (mapped as "long"); it is not the metadata field "_id"
    "name": "HP EliteBook 820 G2",
    "brand": "HP",
    "price": 38842.00,
    "attributes": [
        {"attribute_name": "cpu", "attribute_value": "Intel Core i7-5500U"},
        {"attribute_name": "memory", "attribute_value": "8GB"},
        {"attribute_name": "storage", "attribute_value": "256GB"},
    ],
}
es_client.index(index="laptops-demo", id=1, body=doc) 
##### The "id" argument sets the document's "_id" (here "_id"=1) within the index "laptops-demo"

{'_index': 'laptops-demo',
 '_type': '_doc',
 '_id': '1',
 '_version': 3,
 'result': 'updated',
 '_shards': {'total': 3, 'successful': 1, 'failed': 0},
 '_seq_no': 2,
 '_primary_term': 1}

In [21]:
# ------------------------- Inspect a Single Document -------------------------
#【1】You can check the result in Kibana, where the index name, field names and commands are auto-completed
# and the results are nicely formatted for easy readability. In the Kibana Dev Tools Console, run:
    # GET laptops-demo/_doc/1
    
#【2】Of course you can also check the result in Python if you prefer:
es_client.get(index="laptops-demo", id=1) 

{'_index': 'laptops-demo',
 '_type': '_doc',
 '_id': '1',
 '_version': 3,
 '_seq_no': 2,
 '_primary_term': 1,
 'found': True,
 '_source': {'id': 1,
  'name': 'HP EliteBook 820 G2',
  'brand': 'HP',
  'price': 38842.0,
  'attributes': [{'attribute_name': 'cpu',
    'attribute_value': 'Intel Core i7-5500U'},
   {'attribute_name': 'memory', 'attribute_value': '8GB'},
   {'attribute_name': 'storage', 'attribute_value': '256GB'}]}}

In [None]:
# ------------------------- Create Documents in Bulk -------------------------
# Python offers little advantage if you just want to create one or two documents; Kibana is often handier for
# manual CRUD operations on a couple of documents. The real power of Python is batch processing.
# When you have a large number of documents to create, you can write a script to do it. Suppose you have a csv
# feed file for the laptops which need to be indexed. To create documents in bulk, you use the bulk method
# of the client, as shown in the cells below.
# The format to be used is the same as for the bulk API: https://elasticsearch-py.readthedocs.io/en/master/helpers.html


In [None]:
# ------------------------- Documents Processing Actions -------------------------
#【1】"index" and "create" actions: both create a new document and expect its source (the fields) on the next line.
# The difference is that "create" fails if a document with the same ID already exists in the target, while
# "index" adds or replaces a document as necessary.
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }

{ "create" : { "_index" : "test", "_id" : "2" } }
{ "field1" : "value3" }

#【2】"update" action: updates an existing document and expects the partial document ("doc") with the fields to
# be updated on the next line.
{ "update" : {"_index" : "test", "_id" : "1" } }
{ "doc" : {"field2" : "value2"} }

#【3】"delete" action: deletes a document and does not expect a source on the next line.
{ "delete" : { "_index" : "test", "_id" : "2" } }
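
The action lines above can also be assembled programmatically. The sketch below serializes (action, source) pairs into the newline-delimited form, leaving out the source line for "delete". The helper name `to_bulk_body` is an assumption for illustration; the sample operations are the ones listed above.

```python
import json


def to_bulk_body(operations):
    """Serialize (action, source) pairs into a newline-delimited bulk body."""
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action))
        if source is not None:      # "delete" carries no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


operations = [
    ({"index": {"_index": "test", "_id": "1"}}, {"field1": "value1"}),
    ({"create": {"_index": "test", "_id": "2"}}, {"field1": "value3"}),
    ({"update": {"_index": "test", "_id": "1"}}, {"doc": {"field2": "value2"}}),
    ({"delete": {"_index": "test", "_id": "2"}}, None),
]
body = to_bulk_body(operations)
print(body)
```

The resulting string has seven lines: two for each of the first three actions and one for the delete.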


In [49]:
# Check the csv file to be used
import pandas as pd
data = pd.read_csv('csv_files/laptops_demo.csv')
data.head()

Unnamed: 0,id,name,price,brand,cpu,memory,storage
0,1,HP EliteBook 820 G2,38842,HP,Intel Core i7-5500U,8GB,256GB
1,2,Lenovo IdeaPad Y700-15,9405,Lenovo,Intel Core i5-6300HQ,8GB,256GB
2,3,HP ProBook 640 G2,17589,HP,Intel Core i5-6200U,8GB,128GB
3,4,HP EliteBook 840 G3,31905,HP,Intel Core i5-6200U,8GB,256GB
4,5,HP EliteBook 820 G3,34890,HP,Intel Core i7-6500U,8GB,256GB


In [81]:
# ------------------------- Read Data and Bulk Process Docs -------------------------
# To create documents in bulk in Python, we need to read the data from the csv file and convert it into the format
# which the bulk API expects. We can use the following code to read the data, convert it and build the
# bulk actions in Python:

import csv
import json

columns = ["id", "name", "price", "brand", "cpu", "memory", "storage"]
index_name = "laptops-demo"

with open("csv_files/laptops_demo.csv", "r", newline="") as f:
    reader = csv.DictReader(f, fieldnames=columns, delimiter=",", quotechar='"')
    # Create an object (the "reader") that operates like a regular reader but maps the information in each row to
    # a "dict" whose keys are given by the optional "fieldnames" parameter, which is a sequence. If
    # "fieldnames" is omitted, the values in the first row of file "f" are used as the field names.
    
    next(reader)  # Because "fieldnames" is given, the header row is read as data, so we skip it here.

    action_list = []  #Empty list for actions to be taken for bulk processing
    
    for row in reader:
        
        action = {"index": {"_index": index_name, "_id": int(row["id"])}}  
        # We use the "index" action (the outer "index" key) to create the documents. The index action
        # adds or replaces a document as necessary, so you can run the code multiple times and get the same result.
        # Every action targets the index named by "_index", here "laptops-demo".
    
    
        doc = {
                "id": int(row["id"]), # Every row is a "dict"; use the field names given to DictReader as keys to get the values
                "name": row["name"],
                "price": float(row["price"]),
                "brand": row["brand"],
                "attributes": [
                                {"attribute_name": "cpu", "attribute_value": row["cpu"]},
                                {"attribute_name": "memory", "attribute_value": row["memory"]},
                                {"attribute_name": "storage", "attribute_value": row["storage"],},
                                ],
                }
        # For each "index" action, there should be a document immediately after it. The document should be formatted 
        # according to the mappings defined at the beginning of this article.
        
        
        action_list.append(json.dumps(action))  
        action_list.append(json.dumps(doc)) 
        # "json.dumps()" serializes a Python dictionary to a JSON string, which is what the bulk API requires.
        # You can read this article to learn more about Python dictionaries, JSON and the related caveats:
        # https://lynn-kwong.medium.com/python-json-tricks-how-to-deal-with-jsondecodeerror-2353464814bc 
    
    
    print(action_list[0],"\n","\n",
          action_list[1],"\n","\n",
          action_list[2],"\n","\n",
          action_list[3],"\n","\n", 
          type(action_list[0]), type(action_list[1]))
    # The resulting "action_list" is a list of JSON strings, where each consecutive pair of strings is an
    # "action string + document string". Each action targets the index "laptops-demo", and
    # each corresponding document contains the data to be fed into ES, as we did with the single document.


{"index": {"_index": "laptops-demo", "_id": 1}} 
 
 {"id": 1, "name": "HP EliteBook 820 G2", "price": 38842.0, "brand": "HP", "attributes": [{"attribute_name": "cpu", "attribute_value": "Intel Core i7-5500U"}, {"attribute_name": "memory", "attribute_value": "8GB"}, {"attribute_name": "storage", "attribute_value": "256GB"}]} 
 
 {"index": {"_index": "laptops-demo", "_id": 2}} 
 
 {"id": 2, "name": "Lenovo IdeaPad Y700-15", "price": 9405.0, "brand": "Lenovo", "attributes": [{"attribute_name": "cpu", "attribute_value": "Intel Core i5-6300HQ"}, {"attribute_name": "memory", "attribute_value": "8GB"}, {"attribute_name": "storage", "attribute_value": "256GB"}]} 
 
 <class 'str'> <class 'str'>


In [82]:
# ------------------------- Create JSON Action+Doc File -------------------------
# Write the "action_list" into a json file
with open("laptops_demo.json", "w") as write_file:
    write_file.write("\n".join(action_list))
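
As an alternative to assembling JSON strings by hand, the helpers module linked earlier accepts plain dictionaries. The sketch below is an illustration rather than the article's method. `generate_actions` is a hypothetical name, and the in-memory sample assumes the csv header row uses the same seven field names as the code above. The generator yields one action per row with `_index`, `_id` and `_source` keys.

```python
import csv
import io


def generate_actions(csv_file, index_name="laptops-demo"):
    """Yield one action dict per CSV row, in the shape the bulk helpers accept."""
    reader = csv.DictReader(csv_file)  # the header row supplies the field names
    for row in reader:
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_id": int(row["id"]),
            "_source": {
                "id": int(row["id"]),
                "name": row["name"],
                "price": float(row["price"]),
                "brand": row["brand"],
                "attributes": [
                    {"attribute_name": key, "attribute_value": row[key]}
                    for key in ("cpu", "memory", "storage")
                ],
            },
        }


sample = io.StringIO(  # stand-in for the real csv file
    "id,name,price,brand,cpu,memory,storage\n"
    "1,HP EliteBook 820 G2,38842,HP,Intel Core i7-5500U,8GB,256GB\n"
)
actions = list(generate_actions(sample))
print(actions[0]["_id"], actions[0]["_source"]["name"])

# With a live cluster, the generator could be handed to the helper directly:
#   from elasticsearch import helpers
#   with open("csv_files/laptops_demo.csv", newline="") as f:
#       helpers.bulk(es_client, generate_actions(f))
```

Because the actions are produced lazily, the whole file never has to be held in memory as one list.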

In [83]:
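# ------------------------- Feed the Actions to ES - Bulk Upload!!! -------------------------
# The request body is the same newline-joined string that was written to "laptops_demo.json" in the previous cell.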
es_client.bulk(body="\n".join(action_list))

{'took': 28,
 'errors': False,
 'items': [{'index': {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '1',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 3, 'successful': 1, 'failed': 0},
    '_seq_no': 0,
    '_primary_term': 1,
    'status': 201}},
  {'index': {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '2',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 3, 'successful': 1, 'failed': 0},
    '_seq_no': 1,
    '_primary_term': 1,
    'status': 201}},
  {'index': {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '3',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 3, 'successful': 1, 'failed': 0},
    '_seq_no': 2,
    '_primary_term': 1,
    'status': 201}},
  {'index': {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '4',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 3, 'successful': 1, 'failed': 0},
    '_seq_no': 3,
    '_primary_term': 1,
    'status':
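
A bulk request can succeed as a whole while individual items fail, so it is worth scanning the "errors" flag and the per-item "status" values. The sketch below shows one way to do that. `failed_items` is a hypothetical helper name, and the sample response is invented for illustration (it is shaped like the output above, with a "create" on an existing ID producing a 409 conflict).

```python
def failed_items(bulk_response):
    """Return (action, _id, status) for every item that did not succeed."""
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():  # each item holds a single action key
            if result.get("status", 0) >= 300:
                failures.append((action, result.get("_id"), result["status"]))
    return failures


response = {  # hypothetical response, shaped like the bulk output above
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_index": "laptops-demo", "_id": "1", "status": 201}},
        {"create": {"_index": "laptops-demo", "_id": "1", "status": 409}},
    ],
}
print(failed_items(response))
# [('create', '1', 409)]
```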

In [86]:
# ------------------------- Inspect Documents -------------------------
#【1】Check "Docs" using Kibana:
# GET laptops-demo/_search
# or
# GET laptops-demo/_doc/135
# In Kibana Console you can see: 
# {
#   "took" : 1, ------------------------------> time the whole search took, in milliseconds (here 1 ms)
#   "timed_out" : false,
#   "_shards" : {
#     "total" : 1,
#     "successful" : 1,
#     "skipped" : 0,
#     "failed" : 0
#   },
#   "hits" : { -------------------------------> shows the total number of "Docs" matched: 200 "Docs" have been created successfully!!
#     "total" : {
#       "value" : 200,
#       "relation" : "eq"
#     },
#     "max_score" : 1.0,
#     "hits" : [
#       {
#         "_index" : "laptops-demo",----------> "Index" level info
#         "_type" : "_doc",
#         "_id" : "1",
#         "_score" : 1.0,
#         "_source" : {
#           "id" : 1, ------------------------> "Doc" level info, this is "Doc with id=1"
#           "name" : "HP EliteBook 820 G2",
#           "price" : 38842.0,
#           "brand" : "HP",
#           "attributes" : [
#             {
#               "attribute_name" : "cpu",
#               "attribute_value" : "Intel Core i7-5500U"
#             },
#             {
#               "attribute_name" : "memory",
#               "attribute_value" : "8GB"
#             },
#             {
#               "attribute_name" : "storage",
#               "attribute_value" : "256GB"
#             }
#           ]
#         }
#       },  up to here is the first Doc; a search returns only the first 10 Docs by default

#【2】Check "Docs" in Python:
es_client.get(index="laptops-demo", id=3) ### Here "id" is the "_id" of a "Doc" in ES, which we set to the "id" of the row in the original file

{'_index': 'laptops-demo',
 '_type': '_doc',
 '_id': '3',
 '_version': 1,
 '_seq_no': 2,
 '_primary_term': 1,
 'found': True,
 '_source': {'id': 3,
  'name': 'HP ProBook 640 G2',
  'price': 17589.0,
  'brand': 'HP',
  'attributes': [{'attribute_name': 'cpu',
    'attribute_value': 'Intel Core i5-6200U'},
   {'attribute_name': 'memory', 'attribute_value': '8GB'},
   {'attribute_name': 'storage', 'attribute_value': '128GB'}]}}
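
A search returns only the first 10 hits unless the request says otherwise, so browsing all 200 documents needs the "from" and "size" parameters of the search body. The sketch below builds such a body. The helper name `paged_query` and the 1-based page numbering are assumptions made for illustration.

```python
def paged_query(query, page, page_size=10):
    """Wrap `query` in a search body that requests one page of hits."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
        "query": query,
    }


body = paged_query({"match_all": {}}, page=3, page_size=20)
print(body)
# {'from': 40, 'size': 20, 'query': {'match_all': {}}}

# With a live cluster this body would return hits 41 to 60:
#   es_client.search(index="laptops-demo", body=body)
```

This style of paging suits small result sets like the demo index; by default Elasticsearch rejects requests where "from" plus "size" exceeds 10,000.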

# **3. Using ES to SEARCH!**

In [87]:
# ------------------------- Search a field - "Name" of Documents -------------------------
# We can search for documents based on different conditions.
# For example, we can search on the "name" field of the documents. In Kibana, the query to use is:
# GET laptops-demo/_search
# {
#   "query": {
#     "match": {
#       "name": "Apple" 
#     }
#   }
# }
### The "match" query is analyzed, so "Apple" matches the "name" field regardless of letter case

# The corresponding Python code is:
search_query = {
    "query": {
        "match": {
            "name": "Apple"
        }
    }
}
es_client.search(index="laptops-demo", body=search_query)
# "hits" -> "total" -> "value" = 6 shows the total number of "Docs" that match "Apple"


{'took': 0,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 6, 'relation': 'eq'},
  'max_score': 3.8337994,
  'hits': [{'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '131',
    '_score': 3.8337994,
    '_source': {'id': 131,
     'name': 'Apple MacBook Air',
     'price': 16795.0,
     'brand': 'Apple',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-8210Y'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_value': '256GB'}]}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '132',
    '_score': 3.8337994,
    '_source': {'id': 132,
     'name': 'Apple MacBook Pro',
     'price': 18990.0,
     'brand': 'Apple',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-8279U'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': '
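
The raw search response is verbose, and often only the total and a few fields per hit are of interest. The sketch below pulls those out. `summarize_hits` is a hypothetical helper name, and the sample response is a trimmed copy of the output above.

```python
def summarize_hits(search_response):
    """Return the total hit count and (_id, _score, name) for each returned hit."""
    total = search_response["hits"]["total"]["value"]
    rows = [
        (hit["_id"], hit["_score"], hit["_source"]["name"])
        for hit in search_response["hits"]["hits"]
    ]
    return total, rows


response = {  # trimmed copy of the search output above
    "hits": {
        "total": {"value": 6, "relation": "eq"},
        "hits": [
            {"_id": "131", "_score": 3.8337994, "_source": {"name": "Apple MacBook Air"}},
            {"_id": "132", "_score": 3.8337994, "_source": {"name": "Apple MacBook Pro"}},
        ],
    }
}
total, rows = summarize_hits(response)
print(total)
for row in rows:
    print(row)
```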

In [88]:
# ------------------------- Search by Partial Text using "ngram" -------------------------
# Since the "name.ngrams" sub-field is analyzed with our edge n-gram "filter" and "analyzer", we can do search-as-you-type
# or autocompletion searches, namely we can search with queries that are only the beginning of a word in the data. For example:
search_query = {
    "query": {
        "match": {
            "name.ngrams": "Appl"
        }
    }
}
es_client.search(index="laptops-demo", body=search_query)
# With the queries "Apple" and "Appl", you get the same documents (only the scores differ). This is the power of ngram
# in Elasticsearch, which can be really helpful in many scenarios.

{'took': 8,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 6, 'relation': 'eq'},
  'max_score': 6.4380374,
  'hits': [{'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '131',
    '_score': 6.4380374,
    '_source': {'id': 131,
     'name': 'Apple MacBook Air',
     'price': 16795.0,
     'brand': 'Apple',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-8210Y'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_value': '256GB'}]}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '132',
    '_score': 6.4380374,
    '_source': {'id': 132,
     'name': 'Apple MacBook Pro',
     'price': 18990.0,
     'brand': 'Apple',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-8279U'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': '