What is Mapping?
According to the Elasticsearch Reference, "Mapping is the process of defining how a document, and the fields it contains, are stored and indexed."

How does it help?
A mapping makes search retrieval and aggregations faster, because each field is stored and indexed according to its declared type. Your mapping therefore determines how effectively you can handle your data, and a bad mapping can have severe consequences for the performance of your system.

To learn more about mappings: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
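
As a minimal sketch of what a mapping looks like in practice, the snippet below builds a mapping body as a plain Python dictionary. The helper name `text_with_keyword` is a hypothetical convenience, not part of any library; the dictionary shape (a `properties` object with a `type` per field) follows the mapping format used in this notebook.

```python
def text_with_keyword(ignore_above=256):
    """Return a text field definition with a keyword sub-field (hypothetical helper)."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": ignore_above}},
    }


mapping_body = {
    "properties": {
        "city": text_with_keyword(),
        "country": text_with_keyword(),
        # Assumption: a 24-hour pattern (HH) so that hours such as 21 can be parsed
        "datetime": {"type": "date", "format": "yyyy,MM,dd,HH,mm,ss"},
    }
}

# Show which fields the mapping declares
print(sorted(mapping_body["properties"]))
```

A dictionary of this shape is the kind of body that `es.indices.put_mapping` accepts for an existing index.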

In [1]:
# Documents to insert into the Elasticsearch index "travel"
doc1 = {"city":"Bangalore", "country":"India","datetime":"2018,01,01,10,20,00"} # datetime format: yyyy,MM,dd,HH,mm,ss (24-hour clock)
doc2 = {"city":"London", "country":"England","datetime":"2018,01,02,03,12,00"}
doc3 = {"city":"Los Angeles", "country":"USA","datetime":"2018,04,19,21,02,00"}

In [4]:
from elasticsearch import Elasticsearch

es = Elasticsearch(
    [{'host': 'localhost', 'port': 9200, 'scheme': 'http'}],
    basic_auth=('elastic', '123456')
)

es.index(index = "travel", id = 1, body = doc1)
es.index(index = "travel", id = 2, body = doc2)
es.index(index = "travel", id = 3, body = doc3)


ObjectApiResponse({'_index': 'travel', '_id': '3', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1})

Mapping 

In [7]:
es.indices.create(index = 'travel1')

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'travel1'})

In [8]:
es.indices.put_mapping(
    index = "travel1",
    body = {
        "properties": {
            "city": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "country": {
                "type": "text",
                "fields": {
                    "keyword" :{
                        "type": "keyword",
                        "ignore_above" : 256
                    }
                }
            },
            "datetime":{
                "type": "date",
                "format" : "yyyy,MM,dd,hh,mm,ss"
            }
        }
    }
)

ObjectApiResponse({'acknowledged': True})

In [11]:
res = es.indices.get_mapping(index = 'travel1')

res

ObjectApiResponse({'travel1': {'mappings': {'properties': {'city': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'country': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'datetime': {'type': 'date', 'format': 'yyyy,MM,dd,hh,mm,ss'}}}}})

In [14]:
es.index(index = "travel1", id = 1, body = doc1)
es.index(index = "travel1", id = 2, body = doc2)
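# doc3 is left out: its hour (21) does not fit the 12-hour pattern "hh" used in the mapping, so indexing it would fail
# es.index(index = "travel1", id = 3, body = doc3)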

ObjectApiResponse({'_index': 'travel1', '_id': '2', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 4, '_primary_term': 1})
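
The difference between a 12-hour and a 24-hour hour pattern can be illustrated without Elasticsearch. The sketch below uses Python's `datetime.strptime` purely as an analogy: `%I` plays the role of `hh` (12-hour clock) and `%H` the role of `HH` (24-hour clock). It does not reproduce how Elasticsearch parses dates internally, and the helper name `parses` is an assumption.

```python
from datetime import datetime


def parses(value, pattern):
    """Return True if value can be parsed with the given strptime pattern."""
    try:
        datetime.strptime(value, pattern)
        return True
    except ValueError:
        return False


TWELVE_HOUR = "%Y,%m,%d,%I,%M,%S"       # analogous to yyyy,MM,dd,hh,mm,ss
TWENTY_FOUR_HOUR = "%Y,%m,%d,%H,%M,%S"  # analogous to yyyy,MM,dd,HH,mm,ss

# Compare both patterns on the three datetime values used in this notebook
for value in ["2018,01,01,10,20,00", "2018,01,02,03,12,00", "2018,04,19,21,02,00"]:
    print(value, parses(value, TWELVE_HOUR), parses(value, TWENTY_FOUR_HOUR))
```

Only the value with hour 21 is rejected by the 12-hour pattern, which is why a 24-hour pattern is the safer choice for this data.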

In [18]:
es.indices.delete(index = "travel")

ObjectApiResponse({'acknowledged': True})

Aggregations

https://www.elastic.co/blog/aggregate-all-the-things-new-aggregations-in-elasticsearch-7

Aggregations are one of the most important applications of Elasticsearch. They give you quick, powerful analysis of your data. Below we discuss aggregations over date values.

Date Histogram

A lot of analysis happens on time-series scales, for example quarterly iPhone sales across the world. It is therefore essential to have fast aggregation over large datasets at different levels of granularity. ES provides this via the date histogram aggregation. Calendar units (minute through year) are passed as calendar_interval, while second and millisecond granularities require fixed_interval. The granularities over which you can aggregate are:

year
quarter
month
week
day
hour
minute
second
millisecond

For more info: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
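
To make the bucketing idea concrete, here is a small pure-Python sketch of what a date histogram computes: each timestamp is truncated to the start of its year or quarter, and the documents per bucket are counted. It only imitates the shape of the result. Elasticsearch's own implementation, its time zone handling, and its filling of empty buckets are not reproduced, and the function names are assumptions.

```python
from collections import Counter
from datetime import datetime

FORMAT = "%Y,%m,%d,%H,%M,%S"  # Python counterpart of yyyy,MM,dd,HH,mm,ss


def bucket_start(ts, interval):
    """Truncate a datetime to the start of its calendar year or quarter."""
    if interval == "year":
        return datetime(ts.year, 1, 1)
    if interval == "quarter":
        first_month = 3 * ((ts.month - 1) // 3) + 1
        return datetime(ts.year, first_month, 1)
    raise ValueError(f"unsupported interval: {interval}")


def date_histogram(values, interval):
    """Count values per bucket; returns (bucket_start, doc_count) pairs in time order."""
    counts = Counter(
        bucket_start(datetime.strptime(v, FORMAT), interval) for v in values
    )
    return sorted(counts.items())


datetimes = ["2018,01,01,10,20,00", "2018,01,02,03,12,00", "2018,04,19,05,02,00"]
print(date_histogram(datetimes, "year"))
print(date_histogram(datetimes, "quarter"))
```

With these three values, the yearly histogram has a single 2018 bucket, while the quarterly one splits the documents between the first and second quarter.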

In [19]:
doc1 = {"city":"Bangalore", "country":"India","datetime":"2018,01,01,10,20,00"} # datetime format: yyyy,MM,dd,HH,mm,ss (24-hour clock)
doc2 = {"city":"London", "country":"England","datetime":"2018,01,02,03,12,00"}
doc3 = {"city":"Los Angeles", "country":"USA","datetime":"2018,04,19,05,02,00"}
doc4 = {"city":"Sydney", "country":"Australia","datetime":"2019,01,01,10,20,00"}

In [20]:
es.index(index="travel", id=1, body=doc1)
es.index(index="travel", id=2, body=doc2)
es.index(index="travel", id=3, body=doc3)
es.index(index="travel", id=4, body=doc4)

ObjectApiResponse({'_index': 'travel', '_id': '4', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1})

In [22]:
es.indices.get_mapping(index="travel")

ObjectApiResponse({'travel': {'mappings': {'properties': {'city': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'country': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'datetime': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}})
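
In the response above, `datetime` is mapped as `text` rather than `date`: the index was created implicitly by the first `es.index` call, and the comma-separated value does not match the default date detection formats. A date histogram needs a `date` field, so it can help to check the types first. The sketch below works on a trimmed plain-dictionary copy of the response; `field_types` is a hypothetical helper, and it assumes the response supports dictionary-style access.

```python
def field_types(mapping_response, index):
    """Map each top-level field name to its mapped type for one index."""
    properties = mapping_response[index]["mappings"]["properties"]
    return {name: spec.get("type") for name, spec in properties.items()}


# Trimmed copy of the get_mapping output, written as a plain dict
response = {
    "travel": {"mappings": {"properties": {
        "city": {"type": "text"},
        "country": {"type": "text"},
        "datetime": {"type": "text"},
    }}}
}

types = field_types(response, "travel")
# Fields we intend to aggregate on by date, but that are not mapped as date
not_dates = [name for name in ["datetime"] if types.get(name) != "date"]
print(types)
print("fields that still need a date mapping:", not_dates)
```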

In [25]:
res = es.search(index='travel',
                body = {'from':0,
                        'size':5,
                        'query':{
                            'match_all':{}
                        }
                       }
               )

In [27]:
res['hits']

{'total': {'value': 4, 'relation': 'eq'},
 'max_score': 1.0,
 'hits': [{'_index': 'travel',
   '_id': '1',
   '_score': 1.0,
   '_source': {'city': 'Bangalore',
    'country': 'India',
    'datetime': '2018,01,01,10,20,00'}},
  {'_index': 'travel',
   '_id': '2',
   '_score': 1.0,
   '_source': {'city': 'London',
    'country': 'England',
    'datetime': '2018,01,02,03,12,00'}},
  {'_index': 'travel',
   '_id': '3',
   '_score': 1.0,
   '_source': {'city': 'Los Angeles',
    'country': 'USA',
    'datetime': '2018,04,19,05,02,00'}},
  {'_index': 'travel',
   '_id': '4',
   '_score': 1.0,
   '_source': {'city': 'Sydney',
    'country': 'Australia',
    'datetime': '2019,01,01,10,20,00'}}]}

In [29]:
res = es.search(index='travel',
                body = {'from':0,
                        'size':5,
                        'query':{
                            'match':{
                                "city":"London"
                            }
                        }
                       }
               )

In [31]:
res['hits']

{'total': {'value': 1, 'relation': 'eq'},
 'max_score': 1.3112575,
 'hits': [{'_index': 'travel',
   '_id': '2',
   '_score': 1.3112575,
   '_source': {'city': 'London',
    'country': 'England',
    'datetime': '2018,01,02,03,12,00'}}]}

In [36]:
# Delete the index if it exists (ignore_status replaces the deprecated ignore parameter)
es.options(ignore_status=[400, 404]).indices.delete(index='travel')

ObjectApiResponse({'acknowledged': True})

In [37]:
# Create the index with the correct mappings
es.indices.create(
    index='travel',
    body={
        "mappings": {
            "properties": {
                "city": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "country": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "datetime": {
                    "type": "date",
                    "format": "yyyy,MM,dd,HH,mm,ss"  # Correct the date format
                }
            }
        }
    }
)

# Create the index with its mapping first, then index the documents by ID


ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'travel'})

In [42]:
es.index(index='travel', id=1, body=doc1)
es.index(index='travel', id=2, body=doc2)
es.index(index='travel', id=3, body=doc3)

ObjectApiResponse({'_index': 'travel', '_id': '3', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1})

In [54]:
# Search query with a yearly date histogram aggregation
res = es.search(
    index="travel",
    body={
        "from": 0,
        "size": 4,
        "query": {
            "match_all": {}
        },
        "aggs": {
            "basic": {
                "date_histogram": {
                    "field": "datetime",
                    "calendar_interval": "year"  # calendar_interval replaces the older interval parameter
                }
            }
        }
    }
)

In [55]:
print(res['aggregations'])

{'basic': {'buckets': [{'key_as_string': '2018,01,01,00,00,00', 'key': 1514764800000, 'doc_count': 3}]}}


In [56]:
res = es.search(index="travel",
              body={"from": 0, "size": 0, "query": {"match_all": {}}, "aggs": {
                  "country": {
                      "date_histogram": {"field": "datetime", "calendar_interval": "quarter"}}}})

In [58]:
res['aggregations']

{'country': {'buckets': [{'key_as_string': '2018,01,01,00,00,00',
    'key': 1514764800000,
    'doc_count': 2},
   {'key_as_string': '2018,04,01,00,00,00',
    'key': 1522540800000,
    'doc_count': 1}]}}
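
Each bucket's `key` is the bucket start in milliseconds since the Unix epoch (UTC), while `key_as_string` is the same instant rendered with the field's date format. As a small sketch, the buckets can be turned into readable pairs like this; `summarize_buckets` is a hypothetical helper, and the bucket list is copied from the output above.

```python
from datetime import datetime, timezone


def summarize_buckets(buckets):
    """Turn date histogram buckets into (ISO date, doc_count) pairs."""
    summary = []
    for bucket in buckets:
        # key is in milliseconds, fromtimestamp expects seconds
        start = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc)
        summary.append((start.date().isoformat(), bucket["doc_count"]))
    return summary


buckets = [
    {"key_as_string": "2018,01,01,00,00,00", "key": 1514764800000, "doc_count": 2},
    {"key_as_string": "2018,04,01,00,00,00", "key": 1522540800000, "doc_count": 1},
]
print(summarize_buckets(buckets))
```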