<a href="https://colab.research.google.com/github/michalis0/BigScaleAnalytics/blob/master/week4/elasticsearch_python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Querying Elasticsearch with the elasticsearch-dsl Python package

This week we will send queries to Elasticsearch from Python. We will use the [elasticsearch-dsl](https://elasticsearch-dsl.readthedocs.io/en/latest/index.html) package, a high-level library for writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py).

It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure.

In [16]:
import pandas as pd
import requests

In [17]:
!pip install elasticsearch-dsl



In [18]:
import elasticsearch_dsl
from elasticsearch_dsl import connections
from elasticsearch import Elasticsearch
from elasticsearch_dsl import A, Search, Q

In the connection cell below, replace the URL ("https..:9243") and the password (the second element of `http_auth`) with information from your own deployment.

You can get the endpoint URL by going to your deployment's configuration > Elasticsearch > Copy endpoint. For the password, you should have downloaded a JSON credentials file when you first set up your deployment. If you don't have that file anymore, you can reset the password under the "Security" section of the configuration page.

In [19]:
# Connect to your Elasticsearch deployment.
# First argument: the endpoint URL (Elasticsearch Service > Deployment management > Copy Elasticsearch endpoint of your deployment).
# http_auth: the username and password from the credentials you received when you set up your deployment.
client = Elasticsearch('https://bsa-test.es.europe-west1.gcp.cloud.es.io:9243',
                      http_auth=('elastic','u5IaYDYQLRkADcMGtwRoWX6z'))

Below you can find a few examples of simple queries using the Kibana ecommerce sample data.

In [20]:
# search for all orders placed by customer_id 38
# set index to the name under which you uploaded your data in your deployment
s = Search(using=client, index="kibana_sample_data_ecommerce").query("match", customer_id="38")
response = s.execute()

In [21]:
response

<Response: [<Hit(kibana_sample_data_ecommerce/z-pKmH8B53QsrRNeEA9X): {'category': ["Men's Clothing", "Women's Accessories"], 'cur...}>, <Hit(kibana_sample_data_ecommerce/xHxKmH8BiBfco8ywEtWk): {'category': ["Men's Clothing"], 'currency': 'EUR', 'custome...}>, <Hit(kibana_sample_data_ecommerce/23xKmH8BiBfco8ywEtWl): {'category': ["Men's Clothing"], 'currency': 'EUR', 'custome...}>, <Hit(kibana_sample_data_ecommerce/DXxKmH8BiBfco8ywEtal): {'category': ["Men's Clothing", "Men's Shoes"], 'currency': ...}>, <Hit(kibana_sample_data_ecommerce/uepKmH8B53QsrRNeERCI): {'category': ["Men's Accessories", "Women's Accessories", "M...}>, <Hit(kibana_sample_data_ecommerce/IOpKmH8B53QsrRNeERGI): {'category': ["Men's Clothing", "Men's Shoes"], 'currency': ...}>, <Hit(kibana_sample_data_ecommerce/IepKmH8B53QsrRNeERGI): {'category': ["Men's Clothing", "Men's Shoes"], 'currency': ...}>, <Hit(kibana_sample_data_ecommerce/I-pKmH8B53QsrRNeERGI): {'category': ["Men's Clothing", "Men's Accessories"], 'curre...

In [22]:
print(s.to_dict())

{'query': {'match': {'customer_id': '38'}}}
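
The dictionary printed above is the JSON request body that the DSL builds for Elasticsearch. As a minimal sketch of that correspondence, the same body can be written by hand in plain Python. `match_body` is a hypothetical helper introduced here for illustration; it is not part of elasticsearch-dsl.

```python
import json

def match_body(field, value):
    """Build the request body for a single `match` query by hand."""
    return {"query": {"match": {field: value}}}

body = match_body("customer_id", "38")

# Serialize the body as the JSON document Elasticsearch would receive.
print(json.dumps(body, indent=2))
```

Comparing a hand-written body like this with `s.to_dict()` is a quick way to check that a DSL expression produces the request you expect.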


In [23]:
df = []
for h in response.hits.hits:
    df.append(h["_source"].to_dict())
    

In [24]:
# total number of documents matching the query (only the first 10 are returned by default)
response.hits.total.value

100
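
Note that `response.hits.total.value` counts every matching document (100 here), while `response.hits.hits` holds only the first page, which is 10 documents by default. Paging is controlled by the `from` and `size` keys of the request body; in the DSL, slicing a `Search` object (for example `s[0:100]`) sets those keys. The sketch below shows the effect on the raw body. `with_page` is a hypothetical helper, not a library function.

```python
def with_page(body, start, size):
    """Return a copy of `body` that requests `size` hits starting at offset `start`."""
    paged = dict(body)      # shallow copy so the original body is left untouched
    paged["from"] = start
    paged["size"] = size
    return paged

base = {"query": {"match": {"customer_id": "38"}}}

all_hundred = with_page(base, 0, 100)   # all 100 matches in a single page
second_page = with_page(base, 10, 10)   # matches 10 to 19
```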

In [25]:
# this is how you retrieve the first 10 entries that correspond to the query
pd.DataFrame(df)

Unnamed: 0,category,currency,customer_first_name,customer_full_name,customer_gender,customer_id,customer_last_name,customer_phone,day_of_week,day_of_week_i,...,products,sku,taxful_total_price,taxless_total_price,total_quantity,total_unique_products,type,user,geoip,event
0,"[Men's Clothing, Women's Accessories]",EUR,Eddie,Eddie Clayton,MALE,38,Clayton,,Wednesday,2,...,"[{'base_price': 22.99, 'discount_percentage': ...","[ZO0279602796, ZO0605006050]",41.98,41.98,2,2,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
1,[Men's Clothing],EUR,Eddie,Eddie Gomez,MALE,38,Gomez,,Saturday,5,...,"[{'base_price': 22.99, 'discount_percentage': ...","[ZO0593805938, ZO0287502875]",82.98,82.98,2,2,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
2,[Men's Clothing],EUR,Eddie,Eddie Mccormick,MALE,38,Mccormick,,Wednesday,2,...,"[{'base_price': 10.99, 'discount_percentage': ...","[ZO0436704367, ZO0455104551]",39.98,39.98,2,2,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
3,"[Men's Clothing, Men's Shoes]",EUR,Eddie,Eddie Perry,MALE,38,Perry,,Saturday,5,...,"[{'base_price': 59.99, 'discount_percentage': ...","[ZO0424204242, ZO0403504035, ZO0506705067, ZO0...",180.96,180.96,4,4,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
4,"[Men's Accessories, Women's Accessories, Men's...",EUR,Eddie,Eddie Thompson,MALE,38,Thompson,,Monday,0,...,"[{'base_price': 20.99, 'discount_percentage': ...","[ZO0609406094, ZO0320003200, ZO0531305313, ZO0...",91.96,91.96,4,4,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
5,"[Men's Clothing, Men's Shoes]",EUR,Eddie,Eddie Holland,MALE,38,Holland,,Thursday,3,...,"[{'base_price': 14.99, 'discount_percentage': ...","[ZO0532805328, ZO0590805908, ZO0279402794, ZO0...",116.96,116.96,4,4,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
6,"[Men's Clothing, Men's Shoes]",EUR,Eddie,Eddie Summers,MALE,38,Summers,,Monday,0,...,"[{'base_price': 22.99, 'discount_percentage': ...","[ZO0590405904, ZO0403904039, ZO0515005150, ZO0...",174.96,174.96,4,4,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
7,"[Men's Clothing, Men's Accessories]",EUR,Eddie,Eddie Massey,MALE,38,Massey,,Friday,4,...,"[{'base_price': 11.99, 'discount_percentage': ...","[ZO0616506165, ZO0284402844, ZO0465104651, ZO0...",93.96,93.96,4,4,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
8,"[Men's Clothing, Men's Accessories, Men's Shoes]",EUR,Eddie,Eddie Hodges,MALE,38,Hodges,,Friday,4,...,"[{'base_price': 14.99, 'discount_percentage': ...","[ZO0118601186, ZO0438904389, ZO0468004680, ZO0...",185.96,185.96,4,4,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
9,[Men's Clothing],EUR,Eddie,Eddie Underwood,MALE,38,Underwood,,Monday,0,...,"[{'base_price': 11.99, 'discount_percentage': ...","[ZO0549605496, ZO0299602996]",36.98,36.98,2,2,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
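
The loop that collects each hit's `_source` is repeated for every query in this notebook. A small helper can do the same job while also keeping the document id and score, which the `_source` alone does not contain. This is a sketch that works on the raw response dictionary; `hits_to_rows` is a hypothetical name, and the sample below is a hand-written stand-in with a placeholder score.

```python
def hits_to_rows(raw_response):
    """Collect one flat dict per hit, keeping the document id and score."""
    rows = []
    for hit in raw_response["hits"]["hits"]:
        row = dict(hit["_source"])          # copy the document fields
        row["_id"] = hit["_id"]             # document id assigned by Elasticsearch
        row["_score"] = hit.get("_score")   # relevance score, may be missing or None
        rows.append(row)
    return rows

# Hand-written stand-in shaped like a search response (score is a placeholder).
sample = {
    "hits": {
        "hits": [
            {
                "_id": "z-pKmH8B53QsrRNeEA9X",
                "_score": 1.0,
                "_source": {"customer_full_name": "Eddie Clayton", "taxful_total_price": 41.98},
            }
        ]
    }
}

rows = hits_to_rows(sample)
```

With a live response, `pd.DataFrame(hits_to_rows(response.to_dict()))` should give a table like the one above with two extra columns.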


#### Combining queries

In [26]:
q = Q("match", customer_gender="FEMALE") | Q("match", category="shoes")

s = Search(using=client, index="kibana_sample_data_ecommerce").query(q)
response = s.execute()

In [27]:
s.to_dict()

{'query': {'bool': {'should': [{'match': {'customer_gender': 'FEMALE'}},
    {'match': {'category': 'shoes'}}]}}}
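
The `|` operator was translated into a `bool` query with a `should` clause, as the output above shows. The DSL's other operators follow the same pattern: `&` produces a `must` clause and `~` produces `must_not`. Below is a minimal sketch of the resulting bodies, built by hand; the helper names are hypothetical.

```python
def match(field, value):
    """A single `match` clause."""
    return {"match": {field: value}}

def any_of(*clauses):
    """Body shape produced by q1 | q2: at least one clause should match."""
    return {"bool": {"should": list(clauses)}}

def all_of(*clauses):
    """Body shape produced by q1 & q2: every clause must match."""
    return {"bool": {"must": list(clauses)}}

def none_of(*clauses):
    """Body shape produced by ~q: no clause may match."""
    return {"bool": {"must_not": list(clauses)}}

female = match("customer_gender", "FEMALE")
shoes = match("category", "shoes")

either = {"query": any_of(female, shoes)}
both = {"query": all_of(female, shoes)}
not_shoes = {"query": none_of(shoes)}
```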

In [28]:
response.hits.total.value

3354

In [29]:
df = []
for h in response.hits.hits:
    df.append(h["_source"].to_dict())
    
pd.DataFrame(df)

Unnamed: 0,category,currency,customer_first_name,customer_full_name,customer_gender,customer_id,customer_last_name,customer_phone,day_of_week,day_of_week_i,...,products,sku,taxful_total_price,taxless_total_price,total_quantity,total_unique_products,type,user,geoip,event
0,[Women's Shoes],EUR,Elyssa,Elyssa Mccormick,FEMALE,27,Mccormick,,Tuesday,1,...,"[{'base_price': 64.99, 'discount_percentage': ...","[ZO0666606666, ZO0139201392]",97.98,97.98,2,2,order,elyssa,"{'country_iso_code': 'US', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
1,[Women's Shoes],EUR,Elyssa,Elyssa Rodriguez,FEMALE,27,Rodriguez,,Monday,0,...,"[{'base_price': 59.99, 'discount_percentage': ...","[ZO0242302423, ZO0676006760]",134.98,134.98,2,2,order,elyssa,"{'country_iso_code': 'US', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
2,[Women's Shoes],EUR,Rabbia Al,Rabbia Al Jensen,FEMALE,5,Jensen,,Friday,4,...,"[{'base_price': 32.99, 'discount_percentage': ...","[ZO0024300243, ZO0015300153]",74.98,74.98,2,2,order,rabbia,"{'country_iso_code': 'AE', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
3,[Women's Shoes],EUR,Clarice,Clarice Baker,FEMALE,18,Baker,,Wednesday,2,...,"[{'base_price': 24.99, 'discount_percentage': ...","[ZO0009900099, ZO0252202522]",109.98,109.98,2,2,order,clarice,"{'country_iso_code': 'GB', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
4,[Women's Shoes],EUR,Wilhemina St.,Wilhemina St. Jackson,FEMALE,17,Jackson,,Tuesday,1,...,"[{'base_price': 64.99, 'discount_percentage': ...","[ZO0668406684, ZO0023200232]",97.98,97.98,2,2,order,wilhemina,"{'country_iso_code': 'MC', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
5,[Women's Shoes],EUR,Wilhemina St.,Wilhemina St. Strickland,FEMALE,17,Strickland,,Tuesday,1,...,"[{'base_price': 24.99, 'discount_percentage': ...","[ZO0004800048, ZO0011000110]",53.98,53.98,2,2,order,wilhemina,"{'country_iso_code': 'MC', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
6,[Women's Shoes],EUR,Pia,Pia Rivera,FEMALE,45,Rivera,,Saturday,5,...,"[{'base_price': 84.99, 'discount_percentage': ...","[ZO0680206802, ZO0373103731]",134.98,134.98,2,2,order,pia,"{'country_iso_code': 'FR', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
7,[Women's Shoes],EUR,Clarice,Clarice Schultz,FEMALE,18,Schultz,,Thursday,3,...,"[{'base_price': 25.99, 'discount_percentage': ...","[ZO0132301323, ZO0373603736]",105.98,105.98,2,2,order,clarice,"{'country_iso_code': 'GB', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
8,[Women's Shoes],EUR,Pia,Pia Willis,FEMALE,45,Willis,,Friday,4,...,"[{'base_price': 49.99, 'discount_percentage': ...","[ZO0322103221, ZO0373903739]",139.98,139.98,2,2,order,pia,"{'country_iso_code': 'FR', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
9,[Women's Shoes],EUR,Wilhemina St.,Wilhemina St. Tyler,FEMALE,17,Tyler,,Sunday,6,...,"[{'base_price': 28.99, 'discount_percentage': ...","[ZO0028700287, ZO0136201362]",53.98,53.98,2,2,order,wilhemina,"{'country_iso_code': 'MC', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}


#### Filtering
A `match` query runs in query context and answers the question "how well does this document match the query clause?" with a relevance score. A filter instead answers "does this document match the query clause?", so the answer is a simple yes or no and no score is involved (https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-filter-context.html).

In [30]:
s = Search(using=client, index='kibana_sample_data_ecommerce').filter('terms', day_of_week=['Tuesday', 'Thursday'])
response = s.execute()

In [31]:
s.to_dict()

{'query': {'bool': {'filter': [{'terms': {'day_of_week': ['Tuesday',
       'Thursday']}}]}}}
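
A scored clause and a filter can sit side by side in one `bool` query: the `must` clause contributes to the relevance score, while the `filter` clause only includes or excludes documents. The sketch below builds such a body by hand; in the DSL, chaining `.query(...)` and `.filter(...)` on the same `Search` should produce a body of this shape. `scored_and_filtered` is a hypothetical helper.

```python
def scored_and_filtered(scored_clause, filter_clause):
    """Combine one scored clause with one yes/no filter in a single bool query."""
    return {
        "query": {
            "bool": {
                "must": [scored_clause],    # affects the relevance score
                "filter": [filter_clause],  # only includes or excludes documents
            }
        }
    }

body = scored_and_filtered(
    {"match": {"category": "shoes"}},
    {"terms": {"day_of_week": ["Tuesday", "Thursday"]}},
)
```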

In [32]:
df = []
for h in response.hits.hits:
    df.append(h["_source"].to_dict())
    
pd.DataFrame(df)

Unnamed: 0,category,currency,customer_first_name,customer_full_name,customer_gender,customer_id,customer_last_name,customer_phone,day_of_week,day_of_week_i,...,products,sku,taxful_total_price,taxless_total_price,total_quantity,total_unique_products,type,user,geoip,event
0,"[Men's Accessories, Men's Clothing]",EUR,Eddie,Eddie Gregory,MALE,38,Gregory,,Tuesday,1,...,"[{'base_price': 17.99, 'discount_percentage': ...","[ZO0700707007, ZO0459704597, ZO0293702937, ZO0...",68.96,68.96,4,4,order,eddie,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
1,"[Men's Clothing, Men's Shoes, Women's Accessor...",EUR,Sultan Al,Sultan Al Thompson,MALE,19,Thompson,,Tuesday,1,...,"[{'base_price': 32.99, 'discount_percentage': ...","[ZO0125301253, ZO0507105071, ZO0428704287, ZO0...",174.96,174.96,4,4,order,sultan,"{'country_iso_code': 'AE', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
2,"[Men's Clothing, Men's Shoes]",EUR,George,George Hubbard,MALE,32,Hubbard,,Thursday,3,...,"[{'base_price': 16.99, 'discount_percentage': ...","[ZO0580905809, ZO0507105071]",41.98,41.98,2,2,order,george,"{'country_iso_code': 'GB', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
3,[Men's Clothing],EUR,Yahya,Yahya Rivera,MALE,23,Rivera,,Tuesday,1,...,"[{'base_price': 24.99, 'discount_percentage': ...","[ZO0457304573, ZO0562905629]",37.98,37.98,2,2,order,yahya,"{'country_iso_code': 'MA', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
4,"[Women's Clothing, Women's Accessories]",EUR,Brigitte,Brigitte Morris,FEMALE,12,Morris,,Tuesday,1,...,"[{'base_price': 59.99, 'discount_percentage': ...","[ZO0353103531, ZO0079500795]",74.98,74.98,2,2,order,brigitte,"{'country_iso_code': 'US', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
5,"[Women's Shoes, Women's Accessories]",EUR,rania,rania Padilla,FEMALE,24,Padilla,,Thursday,3,...,"[{'base_price': 41.99, 'discount_percentage': ...","[ZO0141301413, ZO0209102091]",55.98,55.98,2,2,order,rani,"{'country_iso_code': 'EG', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
6,[Women's Clothing],EUR,Sonya,Sonya Foster,FEMALE,28,Foster,,Tuesday,1,...,"[{'base_price': 64.99, 'discount_percentage': ...","[ZO0652906529, ZO0104801048]",87.98,87.98,2,2,order,sonya,"{'country_iso_code': 'CO', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
7,[Women's Shoes],EUR,Brigitte,Brigitte King,FEMALE,12,King,,Thursday,3,...,"[{'base_price': 32.99, 'discount_percentage': ...","[ZO0216402164, ZO0666306663]",97.98,97.98,2,2,order,brigitte,"{'country_iso_code': 'US', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
8,[Men's Clothing],EUR,Kamal,Kamal Jenkins,MALE,39,Jenkins,,Thursday,3,...,"[{'base_price': 10.99, 'discount_percentage': ...","[ZO0474204742, ZO0574005740]",31.98,31.98,2,2,order,kamal,"{'country_iso_code': 'TR', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}
9,[Men's Clothing],EUR,Muniz,Muniz Rivera,MALE,37,Rivera,,Thursday,3,...,"[{'base_price': 209.99, 'discount_percentage':...","[ZO0291602916, ZO0292302923]",221.98,221.98,2,2,order,muniz,"{'country_iso_code': 'MA', 'location': {'lon':...",{'dataset': 'sample_ecommerce'}


#### Aggregations

In [33]:
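from elasticsearch_dsl import A

# A builds a standalone aggregation object; it can be attached to a search
# with s.aggs.bucket('gender', a). The next cell uses the equivalent shortcut form.
a = A('terms', field='customer_gender')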

In [34]:
s = Search(using=client, index='kibana_sample_data_ecommerce')
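# Note: value_count counts field values (one per order here), not distinct customers;
# a 'cardinality' metric would count distinct customer_id values instead.
s.aggs.bucket('gender', 'terms', field='customer_gender')\
    .metric('num_customers', 'value_count', field='customer_id')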

Terms(aggs={'num_customers': ValueCount(field='customer_id')}, field='customer_gender')

In [35]:
s.to_dict()

{'aggs': {'gender': {'aggs': {'num_customers': {'value_count': {'field': 'customer_id'}}},
   'terms': {'field': 'customer_gender'}}}}

In [36]:
response = s.execute()

In [37]:
response.aggregations.to_dict()

{'gender': {'buckets': [{'doc_count': 2433,
    'key': 'FEMALE',
    'num_customers': {'value': 2433}},
   {'doc_count': 2242, 'key': 'MALE', 'num_customers': {'value': 2242}}],
  'doc_count_error_upper_bound': 0,
  'sum_other_doc_count': 0}}

In [38]:
df = []
for r in response.aggregations.gender.buckets:
    df.append(r.to_dict())
pd.DataFrame(df)

Unnamed: 0,key,doc_count,num_customers
0,FEMALE,2433,{'value': 2433}
1,MALE,2242,{'value': 2242}
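
In the table above, the `num_customers` column still holds nested dictionaries such as `{'value': 2433}`, because single-value metrics are returned wrapped in a `value` key. Below is a minimal sketch that unwraps them before building the DataFrame; `flatten_buckets` is a hypothetical helper, and the sample buckets are copied from the aggregation output above.

```python
def flatten_buckets(buckets):
    """Unwrap single-value metrics ({'value': x} -> x) in each bucket dict."""
    rows = []
    for bucket in buckets:
        row = {}
        for key, value in bucket.items():
            if isinstance(value, dict) and set(value) == {"value"}:
                row[key] = value["value"]   # keep only the metric's number
            else:
                row[key] = value            # key, doc_count, etc. stay as they are
        rows.append(row)
    return rows

buckets = [
    {"doc_count": 2433, "key": "FEMALE", "num_customers": {"value": 2433}},
    {"doc_count": 2242, "key": "MALE", "num_customers": {"value": 2242}},
]

rows = flatten_buckets(buckets)
```

With a live response, `pd.DataFrame(flatten_buckets([r.to_dict() for r in response.aggregations.gender.buckets]))` should give a table with plain numbers in the `num_customers` column.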
