# Aggregations

To translate into rough SQL terms:

>SELECT COUNT(color) 

>FROM table

>GROUP BY color 


- COUNT(color) is equivalent to a metric.
- GROUP BY color is equivalent to a bucket.
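
As a rough sketch of how the SQL above could map onto a request body, the cell below builds a `terms` aggregation as a plain Python dict. Nothing is sent to a cluster. The aggregation name `by_color` and the helper `color_count_body` are made up for illustration, and the sketch assumes `fav_color` is mapped as a keyword field.

```python
# Sketch only: build the request body as a plain dict; no cluster is contacted.
def color_count_body(field="fav_color", size=10):
    """Return a search body that groups documents by `field`."""
    return {
        "size": 0,  # skip the individual hits, return only aggregation results
        "aggs": {
            "by_color": {  # hypothetical aggregation name
                # the bucket part: one bucket per distinct value (GROUP BY)
                "terms": {"field": field, "size": size}
            }
        },
    }


body = color_count_body()
print(body)
```

Each bucket returned by a `terms` aggregation carries a `doc_count`, which plays the role of `COUNT(color)` here.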


## 1. Metric
Aggregations that keep track of and compute metrics over a set of documents.
## 2. Matrix
Aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Matrix aggregations do not support scripting.
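
A minimal sketch of a matrix aggregation body, assuming the `matrix_stats` aggregation is available on the cluster. The helper `matrix_stats_body` and the aggregation name `revenue_vs_orders` are hypothetical. The field names come from the sample documents in this notebook.

```python
# Sketch only: a matrix_stats request body as a plain dict.
def matrix_stats_body(fields):
    """Return a search body that computes matrix statistics over numeric fields."""
    return {
        "size": 0,  # only the aggregation result is of interest
        "aggs": {
            "revenue_vs_orders": {  # hypothetical aggregation name
                "matrix_stats": {"fields": list(fields)}
            }
        },
    }


matrix_body = matrix_stats_body(["revenue", "num_of_orders"])
print(matrix_body)
```

The response for such a request includes, per field, values such as the mean and variance plus covariance and correlation against the other listed fields.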
## 3. Pipeline
Aggregations that aggregate the output of other aggregations and their associated metrics.
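
To illustrate, here is a hedged sketch of a pipeline request: a `date_histogram` buckets users by signup month, an `avg` metric runs inside each bucket, and a sibling `avg_bucket` averages those per-month values. All aggregation names are hypothetical. The `interval` key matches older Elasticsearch releases. Newer releases use `calendar_interval` instead.

```python
# Sketch only: a pipeline aggregation reading the output of another aggregation.
def monthly_revenue_pipeline_body(interval_key="interval"):
    """Return a body averaging the per-month average revenue."""
    return {
        "size": 0,
        "aggs": {
            "signups_per_month": {  # hypothetical name: one bucket per month
                "date_histogram": {"field": "signup_date", interval_key: "month"},
                "aggs": {
                    "avg_revenue": {"avg": {"field": "revenue"}}  # metric per bucket
                },
            },
            "avg_monthly_revenue": {  # hypothetical name: the pipeline step
                # buckets_path points at the metric inside the sibling aggregation
                "avg_bucket": {"buckets_path": "signups_per_month>avg_revenue"}
            },
        },
    }


pipeline_body = monthly_revenue_pipeline_body()
print(pipeline_body)
```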


## 4. Bucketing
Each bucket is associated with a key and a document criterion. When the aggregation is executed, all the bucket criteria are evaluated on every document in the context, and when a criterion matches, the document is considered to "fall in" the relevant bucket. By the end of the aggregation process, we end up with a list of buckets, each one with a set of documents that "belong" to it.

>Bucketing aggregations can have sub-aggregations (bucketing or metric). The sub-aggregations are computed for each bucket that their parent aggregation generates.
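
The idea can be mimicked in plain Python, without Elasticsearch. The sketch below is a toy model, not how the engine works internally: it drops each document into the bucket matching its key, then computes a metric sub-aggregation (an average) per bucket. The sample documents and the helper `bucket_by` are made up for illustration.

```python
from collections import defaultdict

# Made-up documents, shaped like the user records in this notebook.
docs = [
    {"fav_color": "red", "revenue": 100},
    {"fav_color": "blue", "revenue": 300},
    {"fav_color": "red", "revenue": 200},
]


def bucket_by(documents, key_field, metric_field):
    """Group documents by key_field, then average metric_field per bucket."""
    buckets = defaultdict(list)
    for doc in documents:
        # the document "falls in" the bucket whose key matches its value
        buckets[doc[key_field]].append(doc)

    result = {}
    for key, members in buckets.items():
        values = [d[metric_field] for d in members]
        # the sub-aggregation is computed once per bucket
        result[key] = {"doc_count": len(members), "avg": sum(values) / len(values)}
    return result


result = bucket_by(docs, "fav_color", "revenue")
print(result)
```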


## Examples of Bucket Aggregations
- Date Histogram Aggregation
- Date Range Aggregation
- Filter(s) Aggregation
- Geo Distance Aggregation
- Histogram Aggregation
- Missing Aggregation
- Nested Aggregation
- Range Aggregation
- Reverse nested Aggregation
- Sampler Aggregation
- Significant Terms Aggregation
- Terms Aggregation
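
As one example from the list, a Range Aggregation body might be sketched as below. The helper `revenue_range_body`, the aggregation name `revenue_ranges`, and the boundary values are all hypothetical. In a `range` aggregation, `from` is inclusive and `to` is exclusive.

```python
# Sketch only: turn a sorted list of boundaries into a range aggregation body.
def revenue_range_body(boundaries, field="revenue"):
    """Return a body with one open-ended bucket at each end and closed ones between."""
    ranges = [{"to": boundaries[0]}]  # everything below the first boundary
    for low, high in zip(boundaries, boundaries[1:]):
        ranges.append({"from": low, "to": high})
    ranges.append({"from": boundaries[-1]})  # everything from the last boundary up
    return {
        "size": 0,
        "aggs": {
            "revenue_ranges": {  # hypothetical aggregation name
                "range": {"field": field, "ranges": ranges}
            }
        },
    }


range_body = revenue_range_body([2000, 5000])
print(range_body)
```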

In [27]:
# A few libraries we will be using
#!pip install ujson requests elasticsearch ipywidgets certifi elasticsearch_dsl
#!pip install -U certifi
#!pip install faker
import faker
import requests
import ujson as json
from elasticsearch import Elasticsearch
from elasticsearch import helpers
from elasticsearch_dsl import Search, DocType, Date, Integer, Keyword, Text
from datetime import datetime
from elasticsearch_dsl.connections import connections
import pandas as pd
from ipywidgets import interact, interactive, fixed, interact_manual, HBox, Dropdown
from IPython.display import clear_output


ES_HOST = 'http://34.205.15.150:9200'
INDEX ='umbrellacorp'
DOC_TYPE = 'user'
es = Elasticsearch(ES_HOST)
print(es)
!pwd


<Elasticsearch([{'port': 9200, 'host': '34.205.15.150'}])>
/home/jovyan/work/ES


In [23]:
# save match all query as python variable
myquery={"query": 
         {"match_all": {}}
        }

# execute the query using body parameter and return total number of records
# select count(*) from table
res = es.search(index=INDEX, body=myquery)  

print("Total records found: {rec}".format(rec=res['hits']['total']))
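# iterate over the hits actually returned (10 by default), not the total count
for x, hit in enumerate(res['hits']['hits']):
    print("\n" + str(x+1))
    for key, value in hit['_source'].items():
        print(str(key) + ": " + str(value))
    if x == 1:
        # stop after printing the first two records
        print("-- breaking--")
        break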

Total records found: 4025732

1
country: Solomon Islands
lifecycle: 8
email: nicholasharvey@example.org
r_score: 5
name: Allen Ellis
job: Copy
address: PSC 5253, Box 9321
APO AE 43941-8966
m_score: 0
city: Donnahaven
total_discount_revenue: 58.68
f_score: 3
avg_revenue_per_month: 279.42857142857144
discount_percentage: 3
rfm_score: 21
revenue: 7446
email_unsubscribe: False
num_of_orders: 10.0
signup_date: 2017-01-25T00:19:27
fav_color: ForestGreen

2
country: Saint Kitts and Nevis
lifecycle: 7
email: iayala@example.net
r_score: 7
name: Annette Diaz
job: Higher education careers adviser
address: 3853 Derek Shores Suite 428
Jonathanfurt, SC 84266
m_score: 6
city: West Thomas
total_discount_revenue: 58.68
f_score: 8
avg_revenue_per_month: 244.5
discount_percentage: 28
rfm_score: 21
revenue: 8114
email_unsubscribe: True
num_of_orders: 10.0
signup_date: 2017-04-10T11:03:09
fav_color: DarkGoldenRod
-- breaking--


In [26]:
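# metric aggregation: average of the revenue field over all documents
# select avg(revenue) from table
body_json = {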
   "aggs":{
      "revenue_avg":{"avg":{"field":"revenue"}}
   }
}

res = es.search(index=INDEX, body=json.dumps(body_json))
print(res['aggregations'])

{'revenue_avg': {'value': 4836.429386258201}}
