# **2. ES Search & Analyze**
In this article, we will use the Console in Kibana's Dev Tools to build queries in the Elasticsearch Domain Specific Language (DSL). Kibana is a free and open frontend application that sits on top of the Elastic Stack, providing search and data visualization capabilities for data indexed in Elasticsearch. Kibana also acts as the user interface for monitoring and managing an Elastic Stack cluster. It is very convenient to write Elasticsearch queries in Kibana because there are hints and autocompletion for indices, fields, and commands. The queries built in Kibana can be reused directly from other languages such as Python. Therefore, it is a good idea to write and test Elasticsearch queries in Kibana first and then implement them in other languages.

If you have installed Elasticsearch and Kibana on your computer or have started the corresponding Docker containers as in the previous article, you can open your browser and navigate to http://127.0.0.1:5601 to open the UI for Kibana. On the first page opened, click Explore on my own to work with our own data. If you don’t want to follow along, you can also learn by reading the queries and explanations in this article.

Following: https://medium.com/codex/learn-elasticsearch-from-practical-examples-495f2f8db83e

## **2.1. Creating ES Client & Index**
In the remaining part of this article, we will focus on how to search an Elasticsearch index with basic and advanced queries. To get started, we need to have some data in our index. If you haven’t followed the Elasticsearch and Python article and generated the laptops-demo index with Python, you need to create the index and populate it with some data as demonstrated below.

In [None]:
# ------------------------- Delete the Existing ES Index to Rerun This Notebook -------------------------
# In the Kibana Console, run the following request to delete the previously created index:
        # DELETE laptops-demo

# ------------------------- Check the Current State of ES in the Kibana Console -------------------------
        # GET _cat/indices

In [2]:
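# ------------------------- Create an ES Client -------------------------
from elasticsearch import Elasticsearch
es_client = Elasticsearch(
    "http://localhost:9200",  # include the scheme; newer client versions require it
    http_auth=("elastic", "changeme"),  # credentials used in this demo; replace with your own
)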
# ------------------------- Create an ES Index Client -------------------------
from elasticsearch.client import IndicesClient
es_index_client = IndicesClient(es_client)
type(es_index_client)

# ------------------------- Define the Settings & Mappings (Different from "1.ES Basics"!) -------------------------
configurations = {
  "settings": {
    "index": {
      "number_of_replicas": 1},
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15}},
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"},
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"},
          "ngrams": {
            "type": "text",
            "analyzer": "ngram_analyzer"}
            }
      },
      "brand": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"}
        }
      },
      "price": {
        "type": "float"},
      "attributes": {
        "type": "nested",
        "properties": {
          "attribute_name": {
            "type": "text"},
          "attribute_value": {
            "type": "text"}
        }
      }
    }
  }
}

# ------------------------- Create an ES Index -------------------------
es_index_client.create(index="laptops-demo", body=configurations)

# In this query, we create an index called laptops-demo. The "settings" section specifies that a "filter" 
# called "ngram_filter" and an "analyzer" called "ngram_analyzer" are created. The "mappings" section specifies 
# the "schema" for the documents to be created. Specifically, the "name" field has multiple sub-fields, and the 
# "ngrams" sub-field is analyzed with the ngram_analyzer created in the settings section. More details on 
# the settings and mappings will be introduced later in this article.

# ------------------------- Read Data and Bulk Process Docs -------------------------
import csv
import json

columns = ["id", "name", "price", "brand", "cpu", "memory", "storage"]
index_name = "laptops-demo"

with open("csv_files/laptops_demo.csv", "r") as f:
    reader = csv.DictReader(f, fieldnames=columns, delimiter=",", quotechar='"')

    next(reader)  # skip the header row (fieldnames are supplied explicitly above)

    action_list = []  # alternating action lines and document lines for the bulk API

    for row in reader:
        action = {"index": {"_index": index_name, "_id": int(row["id"])}}
        doc = {
                "id": int(row["id"]), 
                "name": row["name"],
                "price": float(row["price"]),
                "brand": row["brand"],
                "attributes": [
                                {"attribute_name": "cpu", "attribute_value": row["cpu"]},
                                {"attribute_name": "memory", "attribute_value": row["memory"]},
                                {"attribute_name": "storage", "attribute_value": row["storage"],},
                                ],
                }
        action_list.append(json.dumps(action))  
        action_list.append(json.dumps(doc)) 

# ------------------------- Feed the JSON File to ES - Bulk Upload!!! -------------------------
with open("laptops_demo.json", "w") as write_file:
    write_file.write("\n".join(action_list))
    
es_client.bulk(body="\n".join(action_list))




{'took': 24,
 'errors': False,
 'items': [{'index': {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '1',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 2, 'successful': 1, 'failed': 0},
    '_seq_no': 0,
    '_primary_term': 1,
    'status': 201}},
  {'index': {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '2',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 2, 'successful': 1, 'failed': 0},
    '_seq_no': 1,
    '_primary_term': 1,
    'status': 201}},
  {'index': {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '3',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 2, 'successful': 1, 'failed': 0},
    '_seq_no': 2,
    '_primary_term': 1,
    'status': 201}},
  {'index': {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '4',
    '_version': 1,
    'result': 'created',
    '_shards': {'total': 2, 'successful': 1, 'failed': 0},
    '_seq_no': 3,
    '_primary_term': 1,
    'status':
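
The bulk body built above is newline-delimited JSON: one action line followed by one document line per laptop. The following is a minimal sketch of that pairing which runs without a cluster. The helper name `build_bulk_lines` is hypothetical, and the two sample rows are copied from outputs shown in this notebook rather than read from the real CSV file.

```python
import json


def build_bulk_lines(rows, index_name):
    """Return the NDJSON lines for a bulk request: an action line, then a document line."""
    lines = []
    for row in rows:
        action = {"index": {"_index": index_name, "_id": int(row["id"])}}
        doc = {
            "id": int(row["id"]),
            "name": row["name"],
            "price": float(row["price"]),
            "brand": row["brand"],
            "attributes": [
                {"attribute_name": key, "attribute_value": row[key]}
                for key in ("cpu", "memory", "storage")
            ],
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return lines


# Sample rows as csv.DictReader would yield them (all values are strings).
sample_rows = [
    {"id": "1", "name": "HP EliteBook 820 G2", "price": "38842", "brand": "HP",
     "cpu": "Intel Core i7-5500U", "memory": "8GB", "storage": "256GB"},
    {"id": "131", "name": "Apple MacBook Air", "price": "16795", "brand": "Apple",
     "cpu": "Intel Core i5-8210Y", "memory": "8GB", "storage": "256GB"},
]

bulk_lines = build_bulk_lines(sample_rows, "laptops-demo")
# The bulk API expects the payload to end with a newline character.
bulk_body = "\n".join(bulk_lines) + "\n"
print(bulk_body)
```

Printing the payload before sending it is a cheap way to confirm that every document line is preceded by its action line.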

## **2.2. ES Search & Analyze!!!**

In [3]:
# ------------------------- 【1】Search: all Documents - check if upload is done -------------------------
# If the bulk request above (sent from Python here rather than with curl) finishes successfully, 
# the documents are added to the laptops-demo index. 
# To check all the documents in an index, run this command:
        # GET laptops-demo/_search
        # # It is the short version of:
        # GET laptops-demo/_search
        # {
        #   "query": {
        #     "match_all": {}
        #   }
        # }
# The corresponding Python code is:
search_query = {
    "query": {
        "match_all": {}
    }
}
es_client.search(index="laptops-demo", body=search_query)




{'took': 0,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 200, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '1',
    '_score': 1.0,
    '_source': {'id': 1,
     'name': 'HP EliteBook 820 G2',
     'price': 38842.0,
     'brand': 'HP',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i7-5500U'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_value': '256GB'}]}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '2',
    '_score': 1.0,
    '_source': {'id': 2,
     'name': 'Lenovo IdeaPad Y700-15',
     'price': 9405.0,
     'brand': 'Lenovo',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-6300HQ'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribut
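
The raw response is deeply nested, so it can help to pull out just the parts you need. Below is a minimal sketch that assumes a response shaped like the output above. The helper `summarize_hits` is hypothetical, and `sample_response` is a trimmed stand-in, not a real cluster response.

```python
def summarize_hits(response):
    """Return the total hit count and a list of (_id, _source) pairs."""
    total = response["hits"]["total"]["value"]
    pairs = [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]
    return total, pairs


# Trimmed stand-in for a real response, shaped like the output above.
sample_response = {
    "took": 0,
    "timed_out": False,
    "hits": {
        "total": {"value": 200, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "laptops-demo", "_id": "1", "_score": 1.0,
             "_source": {"id": 1, "name": "HP EliteBook 820 G2",
                         "price": 38842.0, "brand": "HP"}},
            {"_index": "laptops-demo", "_id": "2", "_score": 1.0,
             "_source": {"id": 2, "name": "Lenovo IdeaPad Y700-15",
                         "price": 9405.0, "brand": "Lenovo"}},
        ],
    },
}

total, pairs = summarize_hits(sample_response)
print(f"{total} documents matched; showing {len(pairs)}")
for doc_id, source in pairs:
    print(doc_id, source["name"], source["price"])
```

Note that a search returns only the first 10 hits by default, so the total can be larger than the number of hits listed.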

In [4]:
# ------------------------- 【2】Search: Specify the "_source" Keyword -------------------------
# By default, all the fields are returned, which can be difficult to read. To show only specific fields, 
# for example, if we only want to see the "name" field and ignore the fields "id", "price", "brand", "attributes",
# we can specify the _source keyword: 
        # GET laptops-demo/_search
        # {
        #   "query": {
        #     "match_all": {}
        #   },
        #   "_source": ["name"]
        # }
# If you don’t want to see the source at all, you can set _source to false:
        # GET laptops-demo/_search
        # {
        #   "query": {
        #     "match_all": {}
        #   },
        #   "_source": false
        # }

# The corresponding Python code is:
search_query = {
    "query": {
        "match_all": {}
    },
    "_source": ["name"] 
}
es_client.search(index="laptops-demo", body=search_query)



{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 200, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '1',
    '_score': 1.0,
    '_source': {'name': 'HP EliteBook 820 G2'}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '2',
    '_score': 1.0,
    '_source': {'name': 'Lenovo IdeaPad Y700-15'}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '3',
    '_score': 1.0,
    '_source': {'name': 'HP ProBook 640 G2'}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '4',
    '_score': 1.0,
    '_source': {'name': 'HP EliteBook 840 G3'}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '5',
    '_score': 1.0,
    '_source': {'name': 'HP EliteBook 820 G3'}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '6',
    '_score': 1.0,
    '_source': {'name': 'HP EliteBook x360 1030 G2'}},

In [5]:
# ------------------------- 【3】Search: Specify Values using "match" Keyword -------------------------
# As the name implies, match_all matches everything. To search based on some conditions, we can use the 
# "match" keyword. For example, to search for all laptops whose name contains Apple,
# we search for the value "Apple" in the "name" field of each Doc. 
# The search is case-insensitive: both the indexed names and the query text are lowercased by the 
# "standard" analyzer, so "Apple", "apple" and "APPLE" all match the same Docs.
# In Kibana, the query to use is:
        # GET laptops-demo/_search
        # {
        #   "query": {
        #     "match": {
        #       "name": "Apple" 
        #     }
        #   }
        # }

# The corresponding Python code is:
search_query = {
    "query": {
        "match": {
        "name": "Apple"
        }
    }
}
es_client.search(index="laptops-demo", body=search_query)
# "hits"-"total"-"value"=6 shows the total number of "Docs" that match "Apple"




{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 6, 'relation': 'eq'},
  'max_score': 3.8337994,
  'hits': [{'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '131',
    '_score': 3.8337994,
    '_source': {'id': 131,
     'name': 'Apple MacBook Air',
     'price': 16795.0,
     'brand': 'Apple',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-8210Y'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_value': '256GB'}]}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '132',
    '_score': 3.8337994,
    '_source': {'id': 132,
     'name': 'Apple MacBook Pro',
     'price': 18990.0,
     'brand': 'Apple',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-8279U'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': '

In [7]:
# ------------------------- 【4】Analyze: How Does the Standard Analyzer Work? -------------------------
# We know from the "mappings" above that the "name" field is a text field and will be analyzed. Here we use 
# the "standard" analyzer, which, as mentioned above, lowercases the text, splits it into tokens and 
# removes punctuation. You can read more about analysis at this link: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html 
# To see how a text will be analyzed, we can use the analyze endpoint:
        # GET laptops-demo/_analyze
        # {
        #   "text": "Apple MackBook!",
        #   "analyzer": "standard"
        # }

# The corresponding Python code is:
analyze_query = {
    "text": "Apple MackBook!",
    "analyzer": "standard"
    }
es_index_client.analyze(index="laptops-demo", body=analyze_query)
# Explain: the "start_offset" of the second word "MackBook" is 6 (0-indexed) and its "end_offset" is 14.
# The "end_offset" is exclusive: index 14 is where the "!" sits. The "standard" tokenizer drops the 
# punctuation, so the token covers the characters at indices 6 to 13, that is, "MackBook".


{'tokens': [{'token': 'apple',
   'start_offset': 0,
   'end_offset': 5,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'mackbook',
   'start_offset': 6,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 1}]}
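
To make the offsets concrete, here is a rough pure-Python stand-in for what the analyzer did to this input. It is only a sketch: the real "standard" analyzer follows Unicode text segmentation rules, while this hypothetical `simple_standard_analyze` function just splits on non-word characters with a regular expression, which happens to give the same tokens for this ASCII example.

```python
import re


def simple_standard_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),  # exclusive: index just past the last character
            "position": position,
        })
    return tokens


tokens = simple_standard_analyze("Apple MackBook!")
for token in tokens:
    print(token)
```

The second token starts at offset 6 and ends at offset 14, matching the `_analyze` output above.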

In [9]:
# ------------------------- 【5-1】Analyze: How the "ngram_analyzer" Analyzer Works? -------------------------
# Since we defined a special analyzer "ngram_analyzer" in the "settings" of the index, let’s use this analyzer 
# to analyze our text and see what we will get:
        # GET laptops-demo/_analyze
        # {
        #   "text": "Apple MackBook!",
        #   "analyzer": "ngram_analyzer"
        # }

# The corresponding Python code is:
analyze_query = {
    "text": "Apple MackBook!",
    "analyzer": "ngram_analyzer"
    }
es_index_client.analyze(index="laptops-demo", body=analyze_query)
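# Explain: for each word a series of edge N-grams is generated, starting from its first 2 letters ("ap" and "ma") 
# and growing one letter at a time up to the whole word (or 15 letters, the "max_gram" of the filter).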

{'tokens': [{'token': 'ap',
   'start_offset': 0,
   'end_offset': 5,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'app',
   'start_offset': 0,
   'end_offset': 5,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'appl',
   'start_offset': 0,
   'end_offset': 5,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'apple',
   'start_offset': 0,
   'end_offset': 5,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'ma',
   'start_offset': 6,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'mac',
   'start_offset': 6,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'mack',
   'start_offset': 6,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'mackb',
   'start_offset': 6,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'mackbo',
   'start_offset': 6,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'mackboo',
   'start_
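
The token list above follows directly from the "min_gram" and "max_gram" settings of the filter. As a minimal sketch, assuming the words have already been tokenized and lowercased, the prefixes can be reproduced in plain Python. The function `edge_ngrams` is a hypothetical helper, not part of the Elasticsearch client.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of `token` whose length is between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


for word in "apple mackbook".split():
    print(word, "->", edge_ngrams(word))
```

Every prefix is indexed as its own term, which is what later allows a partial input such as "appl" to find the document.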

In [None]:
# ------------------------- 【5-2】Analyze: How the "ngram_analyzer" Analyzer Works? -------------------------
# These N-grams are useful for search-as-you-type or autocompletion. Let’s try to search with a partial input:
# In Kibana, the query to use is:
        # GET laptops-demo/_search
        # {
        #   "query": {
        #     "match": {
        #       "name.ngrams": "Appl"
        #     }
        #   }
        # }

# The corresponding Python code is:
search_query = {
    "query": {
        "match": {
        "name.ngrams": "Appl"
        }
    }
}
es_client.search(index="laptops-demo", body=search_query)
# With the queries "Apple" and "Appl", you get the same results. This is the power of N-grams in Elasticsearch.
# If you search for "Appl" in the "name" field rather than the "name.ngrams" field, you can’t find anything 
# because the "name" field is analyzed by the "standard" analyzer and thus does not have the N-grams.
# In the mappings, the "name" field has "type": "text", "analyzer": "standard", and "fields" that define
# 2 sub-fields: "keyword": {"type": "keyword"} and "ngrams": {"type": "text", "analyzer": "ngram_analyzer"}.
# To search the sub-field indexed with the "ngram_analyzer" from the "settings", address it with dot 
# notation, "name.ngrams", as in {"name.ngrams": "Appl"}.

You might be wondering what the relationship is between the "name" field and the "name.ngrams" field. This is called <u>**multi-fields**</u>, which means <u>indexing the same field in different ways for different purposes</u>. As a text field is always analyzed, we often add some additional fields to it. For example, a <u>**"keyword" type**</u> field is often added to a text field. The <u>**"keyword" type**</u> means the text is stored as it is and won’t be analyzed. <u>Multi-fields</u> of the same field can be analyzed with different analyzers. In this article, the "name" field is analyzed with the "standard" analyzer and the "name.ngrams" field is analyzed with the custom "ngram_analyzer", so each serves a different kind of search: whole-word matching and partial-input matching, respectively.
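
The effect of multi-fields can be sketched without a cluster by treating each sub-field as its own set of indexed terms. The code below is a simplified model under stated assumptions, not how Elasticsearch is implemented: `standard_tokens`, `ngram_tokens` and `match` are hypothetical helpers that approximate the two analyzers and a "match" query with its default OR behaviour.

```python
import re


def standard_tokens(text):
    """Approximate the standard analyzer: lowercase word tokens."""
    return [word.lower() for word in re.findall(r"\w+", text)]


def ngram_tokens(text, min_gram=2, max_gram=15):
    """Approximate ngram_analyzer: standard tokens expanded into edge n-grams."""
    grams = []
    for token in standard_tokens(text):
        longest = min(len(token), max_gram)
        grams.extend(token[:size] for size in range(min_gram, longest + 1))
    return grams


def match(indexed_terms, query_terms):
    """Model of a match query: a hit if any query term is among the indexed terms."""
    return bool(set(indexed_terms) & set(query_terms))


name = "Apple MacBook Air"
# Each sub-field stores its own terms for the same source string.
index = {
    "name": standard_tokens(name),
    "name.ngrams": ngram_tokens(name),
}

print(match(index["name"], standard_tokens("Apple")))     # True
print(match(index["name"], standard_tokens("Appl")))      # False
print(match(index["name.ngrams"], ngram_tokens("Appl")))  # True
```

Since the mapping defines no separate search analyzer for "name.ngrams", the query text is expanded into N-grams as well ("ap", "app", "appl"). In this model, that means any name with a word starting with "ap" would also count as a match.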


In [10]:
# ------------------------- 【6】Search: Bypass Analyzers -------------------------
# There are also fields of other data types such as "long" and "float". These fields are like the 
# "keyword" type of the text field and won’t be analyzed. If we don’t want the query input to be 
# analyzed when searching, we can use the "term" query:
        # GET laptops-demo/_search
        # {
        #   "query": {
        #     "term": {
        #       "name.keyword": {
        #         "value": "Apple MacBook Air"
        #       }
        #     }
        #   }
        # }

# The corresponding Python code is:
search_query = {
    "query": {
    "term": {
      "name.keyword": {
        "value": "Apple MacBook Air"
      }
    }
  }
}
es_client.search(index="laptops-demo", body=search_query)
# With this query, we get the laptop whose name is exactly “Apple MacBook Air”. If we change any word 
# in the query to lower case or remove any letter, we will get nothing back. This is because with the 
# "term" query, we search for the query string as it is and don’t analyze it. If you change "name.keyword" 
# to "name" in this query, you will also get nothing back: the "name" field is a text field, so what is 
# indexed are the lowercased tokens "apple", "macbook" and "air", and none of them equals the 
# unanalyzed string "Apple MacBook Air". 
# The data stored in the Elasticsearch search engine for a text field is not the "original string", but 
# a bunch of "tokens" as demonstrated above with the analyzer.



{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 4.89784,
  'hits': [{'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '131',
    '_score': 4.89784,
    '_source': {'id': 131,
     'name': 'Apple MacBook Air',
     'price': 16795.0,
     'brand': 'Apple',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-8210Y'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_value': '256GB'}]}}]}}
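
The difference between querying "name.keyword" and "name" with a "term" query can be illustrated with a small model of what each field stores. This is only a sketch under simplifying assumptions: the hypothetical `term_query` function checks set membership, and the text field is approximated by lowercased word tokens.

```python
import re

name = "Apple MacBook Air"

# What each field stores for this document (approximation of the mapping above).
indexed_terms = {
    "name": {word.lower() for word in re.findall(r"\w+", name)},  # analyzed text field
    "name.keyword": {name},  # the whole string, unchanged
}


def term_query(field, value):
    """Model of a term query: compare the unanalyzed value with the stored terms."""
    return value in indexed_terms[field]


print(term_query("name.keyword", "Apple MacBook Air"))  # True
print(term_query("name.keyword", "apple macbook air"))  # False
print(term_query("name", "Apple MacBook Air"))          # False
print(term_query("name", "apple"))                      # True
```

The last line shows that a "term" query against a text field only hits when the value equals one of the stored tokens exactly, which is rarely what you want for free text.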

In [11]:
# ------------------------- 【7】Search: Multiple Values -------------------------
# If we want to search for multiple values (as they are, not analyzed), we can use the "terms" query. 
# For example, if we want to search for laptops whose ids are 1, 10 and 100:
        # GET laptops-demo/_search
        # {
        #   "query": {
        #     "terms": {
        #       "id": [
        #         1,
        #         10,
        #         100
        #       ]
        #     }
        #   }
        # }

# The corresponding Python code is:
search_query = {
    "query": {
    "terms": {
      "id": [1, 10, 100]
    }
  }
}
es_client.search(index="laptops-demo", body=search_query)




{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 3, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '1',
    '_score': 1.0,
    '_source': {'id': 1,
     'name': 'HP EliteBook 820 G2',
     'price': 38842.0,
     'brand': 'HP',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i7-5500U'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_value': '256GB'}]}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '10',
    '_score': 1.0,
    '_source': {'id': 10,
     'name': 'Lenovo ThinkPad P51',
     'price': 26715.0,
     'brand': 'Lenovo',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i7-7700HQ'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_

In [12]:
# ------------------------- 【8】Search: Values in a Range -------------------------
# We can use the range query to search for documents whose field values fall within a provided range. 
# For example, let’s search for laptops whose prices are between 10,000 and 20,000 Kr:
        # GET laptops-demo/_search
        # {
        #   "query": {
        #     "range": {
        #       "price": {
        #         "gte": 10000,
        #         "lte": 20000
        #       }
        #     }
        #   }
        # }

# The corresponding Python code is:
search_query = {
    "query": {
    "range": {
      "price": {
        "gte": 10000,
        "lte": 20000
      }
    }
  }
}
es_client.search(index="laptops-demo", body=search_query)



{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 50, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '3',
    '_score': 1.0,
    '_source': {'id': 3,
     'name': 'HP ProBook 640 G2',
     'price': 17589.0,
     'brand': 'HP',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-6200U'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_value': '128GB'}]}},
   {'_index': 'laptops-demo',
    '_type': '_doc',
    '_id': '27',
    '_score': 1.0,
    '_source': {'id': 27,
     'name': 'HP ProBook 470 G5',
     'price': 12145.0,
     'brand': 'HP',
     'attributes': [{'attribute_name': 'cpu',
       'attribute_value': 'Intel Core i5-8250U'},
      {'attribute_name': 'memory', 'attribute_value': '8GB'},
      {'attribute_name': 'storage', 'attribute_value': 
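
The bounds of a "range" query are inclusive for "gte"/"lte" and exclusive for "gt"/"lt". As a minimal sketch of that comparison logic, the hypothetical helper `in_range` below applies the same bounds to a few prices copied from the outputs in this notebook. It only mirrors the semantics locally and does not call Elasticsearch.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Check a number against the bounds accepted by a range query."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True


# Prices copied from the outputs shown in this notebook.
prices = {
    "HP EliteBook 820 G2": 38842.0,
    "Lenovo IdeaPad Y700-15": 9405.0,
    "HP ProBook 640 G2": 17589.0,
    "HP ProBook 470 G5": 12145.0,
}
bounds = {"gte": 10000, "lte": 20000}

matching = [name for name, price in prices.items() if in_range(price, **bounds)]
print(matching)
```

Swapping "gte" for "gt" in `bounds` would exclude a laptop priced at exactly 10000.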