# Elasticsearch scripting with Python

The requests below were run against Elasticsearch from the Kibana Dev Tools console.
The examples were taken from these pages:

- `https://www.elastic.co/blog/an-introduction-to-elasticsearch-sql-with-practical-examples-part-1`
- `https://www.elastic.co/blog/an-introduction-to-elasticsearch-sql-with-practical-examples-part-2`

To use the `python` scripts below, install the `elasticsearch` and `pandas` libraries:

- `pip install elasticsearch`
- `pip install pandas`

```

# Examples extracted from `https://www.elastic.co/blog/an-introduction-to-elasticsearch-sql-with-practical-examples-part-1` website


POST _sql
{
  "query":"DESCRIBE kibana_sample_data_flights"
}


POST _sql?format=txt
{
  "query":"DESCRIBE kibana_sample_data_flights"
}

POST _sql?format=txt 
{
  "query": "SELECT * from kibana_sample_data_flights"
}

POST _sql?format=txt 
{ 
  "query": "show tables"
}

# Column names are case sensitive

POST _sql?format=txt
{
  "query": "SELECT FlightNum from kibana_sample_data_flights limit 1"  
}

POST _sql?format=txt
{
  "query": "SELECT OriginCountry, OriginCityName from kibana_sample_data_flights limit 1"  
}

POST _sql?format=txt
{
  "query":"SELECT OriginCityName, DestCityName FROM kibana_sample_data_flights WHERE FlightTimeHour > 5 AND OriginCountry='US' ORDER BY FlightTimeHour DESC LIMIT 10"
}


POST _sql?format=txt
{
  "query":"SELECT MONTH_OF_YEAR(timestamp), OriginCityName, DestCityName FROM kibana_sample_data_flights WHERE FlightTimeHour > 5 AND MONTH_OF_YEAR(timestamp) > 6 ORDER BY FlightTimeHour DESC LIMIT 10"
}


POST _sql?format=txt
{
  "query":"SELECT timestamp, FlightNum, AvgTicketPrice FROM kibana_sample_data_flights WHERE AvgTicketPrice > 500 ORDER BY AvgTicketPrice"
}

# Translate SQL to Query DSL. This helps Elasticsearch users understand the query that actually runs.
# The `WHERE` clause is expressed with `range` and `term` keys.

POST _sql/translate
{
  "query":"SELECT OriginCityName, DestCityName FROM kibana_sample_data_flights WHERE FlightTimeHour > 5 AND OriginCountry='US' ORDER BY FlightTimeHour DESC LIMIT 10"
}

# POST _sql/translate
POST _sql?format=txt
{
  "query":"SELECT timestamp, FlightNum, OriginCityName, DestCityName, ROUND(DistanceMiles) AS distance, ROUND(DistanceMiles/CAST(FlightTimeHour as float)) AS speed, DAY_OF_WEEK(timestamp) AS day_of_week FROM kibana_sample_data_flights WHERE DAY_OF_WEEK(timestamp) >= 0 AND DAY_OF_WEEK(timestamp) <= 2 AND HOUR_OF_DAY(timestamp) >=9 AND HOUR_OF_DAY(timestamp) <= 10 ORDER BY speed DESC, distance DESC LIMIT 2"
}
```
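The `POST _sql/translate` request in the console block above can also be sent from Python. The sketch below assumes `client` is an `elasticsearch.Elasticsearch` instance like the one created later in this document; `translate_sql` and `where_filters` are hypothetical helper names, and the exact shape of the returned DSL may differ between Elasticsearch versions.

```python
def translate_sql(client, sql):
    """Return the Query DSL that Elasticsearch would run for `sql`."""
    # Assumption: `client` is an elasticsearch.Elasticsearch instance.
    return client.sql.translate(body={"query": sql})


def where_filters(dsl):
    """Collect (clause, field) pairs such as ('range', 'FlightTimeHour')."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ("range", "term") and isinstance(value, dict):
                    found.extend((key, field) for field in value)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(dsl)
    return found
```

Calling `where_filters(translate_sql(client, query))` gives a quick view of how each `WHERE` condition was translated.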

```
#
# Examples extracted from `https://www.elastic.co/blog/an-introduction-to-elasticsearch-sql-with-practical-examples-part-2` website
#

POST _sql?format=txt
{
  "query": "SELECT AVG(CAST(FlightTimeHour as float)) Avg_Flight_Time, OriginCountry FROM kibana_sample_data_flights GROUP BY OriginCountry ORDER BY OriginCountry LIMIT 5"
}

POST _sql?format=txt
{
  "query": "SELECT COUNT(*), MONTH_OF_YEAR(timestamp) AS month_of_year, AVG(CAST(FlightTimeHour as float)) AS Avg_Flight_Time FROM kibana_sample_data_flights GROUP BY month_of_year"
}

POST _sql?format=txt
{
  "query": "SELECT AVG(CAST(FlightTimeHour as float)) Avg_Flight_Time, OriginCountry FROM kibana_sample_data_flights GROUP BY OriginCountry ORDER BY Avg_Flight_Time"
}

POST _sql?format=txt
{
  "query": "SELECT OriginCityName, ROUND(AVG(DistanceKilometers)) avg_distance, COUNT(*) c, ROUND(PERCENTILE(DistanceKilometers,95)) AS percentile_distance FROM kibana_sample_data_flights GROUP BY OriginCityName HAVING avg_distance BETWEEN 3000 AND 4000"
}

POST _sql?format=txt
{
  "query": "SELECT timestamp, FlightNum, OriginCityName, DestCityName FROM kibana_sample_data_flights WHERE QUERY('Cologne') AND FlightDelay=true AND timestamp > '2021-10-25' AND timestamp < '2021-10-27' ORDER BY timestamp"
}

# This query does not quite work as intended: the search text 'Barcelona' only matches the *City* fields, not the *Weather fields.

POST _sql?format=txt
{
  "query": "SELECT Score(), timestamp, FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM kibana_sample_data_flights WHERE MATCH('*Weather,*City*', 'Barcelona', 'type=cross_fields;operator=AND') ORDER BY Score() DESC LIMIT 5"
}

POST /_aliases
{
    "actions" : [
       
        { "add" : { "index" : "kibana_sample_data_flights", "alias" : "f_alias" } }
    ]
}

POST _sql?format=txt
{
  "query": "SELECT FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM kibana_sample_data_flights ORDER BY timestamp DESC LIMIT 1"
}

POST _sql?format=txt
{
  "query": "SELECT FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather FROM f_alias ORDER BY timestamp DESC LIMIT 1"
}

# Elasticsearch SQL has no direct support for JOIN at this point.

####################################################### Examples

GET /_cat/indices

POST _sql?format=txt
{
  "query": "SELECT OriginCityName from kibana_sample_data_flights where OriginCityName like '%Barcelona%'"
}


POST _sql?format=txt
{
  "query": "SELECT DestCityName, DestWeather from kibana_sample_data_flights where DestWeather like '%Lightning%' and DestCityName like '%Vienna%'"
}

```
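The alias created with `POST /_aliases` in the console block above can also be set up from Python. This is a minimal sketch, assuming `client` is an `elasticsearch.Elasticsearch` instance; `add_alias` and `latest_flight_sql` are hypothetical helper names. Note that `indices.put_alias` calls the single-alias endpoint rather than `POST /_aliases`, which should have the same effect for one alias.

```python
def add_alias(client, index, alias):
    """Point `alias` at `index` so SQL queries can use the shorter name."""
    # Assumption: `client` is an elasticsearch.Elasticsearch instance.
    return client.indices.put_alias(index=index, name=alias)


def latest_flight_sql(source):
    """Build the 'latest flight' query against an index or alias name."""
    return (
        "SELECT FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather "
        f"FROM {source} ORDER BY timestamp DESC LIMIT 1"
    )
```

After `add_alias(client, "kibana_sample_data_flights", "f_alias")`, the queries built by `latest_flight_sql("f_alias")` and `latest_flight_sql("kibana_sample_data_flights")` should return the same row.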

In [1]:
from elasticsearch import Elasticsearch
from datetime import datetime
import pandas as pd
import json
import warnings
warnings.filterwarnings('ignore')

## Connecting to the Elasticsearch server on port 9200
Note that the cell above suppresses all warnings, including security warnings. Do not do this in a real-world application.
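One way to avoid silencing every warning for the whole session is to suppress a chosen category only around the calls that need it. The sketch below uses only the standard library; `quiet` is a hypothetical helper name, and which warning category the Elasticsearch client emits depends on the client version, so the category to pass is an assumption you need to check.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet(category=Warning):
    """Ignore warnings of `category` only inside the `with` block."""
    with warnings.catch_warnings():
        # The filter is dropped again when the block exits.
        warnings.simplefilter("ignore", category=category)
        yield
```

A call such as `with quiet(UserWarning): resp = client.info()` keeps warnings visible everywhere else in the notebook.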


In [2]:
elasticsearch_url = 'http://localhost:9200'
client = Elasticsearch(elasticsearch_url)

## Elasticsearch basic operations

### Display information about the Elasticsearch cluster

In [3]:
resp = client.info()
resp

{'name': '6ea4eaa25fcb',
 'cluster_name': 'docker-cluster',
 'cluster_uuid': 'aKEIeA72RmaMHbIf6VmLgw',
 'version': {'number': '7.14.0',
  'build_flavor': 'default',
  'build_type': 'docker',
  'build_hash': 'dd5a0a2acaa2045ff9624f3729fc8a6f40835aa1',
  'build_date': '2021-07-29T20:49:32.864135063Z',
  'build_snapshot': False,
  'lucene_version': '8.9.0',
  'minimum_wire_compatibility_version': '6.8.0',
  'minimum_index_compatibility_version': '6.0.0-beta1'},
 'tagline': 'You Know, for Search'}

### Basic Operations of elasticsearch

In [4]:
doc = {
    'author': 'kimchy',
    'text': 'Elasticsearch: cool. bonsai cool.',
    'timestamp': datetime.now(),
}
doc

{'author': 'kimchy',
 'text': 'Elasticsearch: cool. bonsai cool.',
 'timestamp': datetime.datetime(2021, 11, 7, 12, 37, 18, 932544)}

In [5]:
res = client.index(index="test-index", id=1, document=doc)
print(res['result'])

updated


In [6]:
res = client.get(index="test-index", id=1)
print(res['_source'])


{'author': 'kimchy', 'text': 'Elasticsearch: cool. bonsai cool.', 'timestamp': '2021-11-07T12:37:18.932544'}


In [7]:
client.indices.refresh(index="test-index")

{'_shards': {'total': 2, 'successful': 1, 'failed': 0}}

In [8]:
res = client.search(index="test-index", query={"match_all": {}})
print("Got %d Hits:" % res['hits']['total']['value'])
for hit in res['hits']['hits']:
    print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])

Got 1 Hits:
2021-11-07T12:37:18.932544 kimchy: Elasticsearch: cool. bonsai cool.


## Elasticsearch SQL
From here on, the examples use Elasticsearch SQL.


### Describing the flights table/index
Note that the index is named `kibana_sample_data_flights`, not just `flights`.

In [9]:
query = 'DESCRIBE kibana_sample_data_flights'
resp = client.sql.query(body={'query':query})

### Converting into Pandas DataFrame
Convert the response, which has the shape shown below, into a pandas DataFrame.

```json
{
    'columns': [{'name': ..., },],
    'rows': [...]
}
```
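When pandas is not needed, the same `columns`/`rows` shape can be turned into plain dictionaries. This is a small dependency-free sketch; `rows_to_records` is a hypothetical helper name, and it assumes every row has one value per column, in column order.

```python
def rows_to_records(resp):
    """Turn a SQL response into a list of {column_name: value} dicts."""
    names = [col["name"] for col in resp["columns"]]
    return [dict(zip(names, row)) for row in resp["rows"]]
```

Each record can then be accessed by column name, for example `rows_to_records(resp)[0]["column"]` for a `DESCRIBE` response.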

In [10]:
column_names = []
for c in resp['columns']:
    column_names.append(c['name'])
    
df = pd.DataFrame(resp['rows'], columns=column_names)
print('About the kibana_sample_data_flights table:')
# print(df)
'''
                column       type    mapping
0       AvgTicketPrice       REAL      float
1            Cancelled    BOOLEAN    boolean
2              Carrier    VARCHAR    keyword
3                 Dest    VARCHAR    keyword
4        DestAirportID    VARCHAR    keyword
5         DestCityName    VARCHAR    keyword
6          DestCountry    VARCHAR    keyword
7         DestLocation   GEOMETRY  geo_point
8           DestRegion    VARCHAR    keyword
9          DestWeather    VARCHAR    keyword
10  DistanceKilometers       REAL      float
11       DistanceMiles       REAL      float
12         FlightDelay    BOOLEAN    boolean
13      FlightDelayMin    INTEGER    integer
14     FlightDelayType    VARCHAR    keyword
15           FlightNum    VARCHAR    keyword
16      FlightTimeHour    VARCHAR    keyword
17       FlightTimeMin       REAL      float
18              Origin    VARCHAR    keyword
19     OriginAirportID    VARCHAR    keyword
20      OriginCityName    VARCHAR    keyword
21       OriginCountry    VARCHAR    keyword
22      OriginLocation   GEOMETRY  geo_point
23        OriginRegion    VARCHAR    keyword
24       OriginWeather    VARCHAR    keyword
25           dayOfWeek    INTEGER    integer
26           timestamp  TIMESTAMP   datetime

'''
df

About the kibana_sample_data_flights table:


Unnamed: 0,column,type,mapping
0,AvgTicketPrice,REAL,float
1,Cancelled,BOOLEAN,boolean
2,Carrier,VARCHAR,keyword
3,Dest,VARCHAR,keyword
4,DestAirportID,VARCHAR,keyword
5,DestCityName,VARCHAR,keyword
6,DestCountry,VARCHAR,keyword
7,DestLocation,GEOMETRY,geo_point
8,DestRegion,VARCHAR,keyword
9,DestWeather,VARCHAR,keyword


### Query:

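Get rows from the `kibana_sample_data_flights` table. Without a `LIMIT`, a single request returns only the first page of results (1000 rows here), not the whole table.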


In [11]:
query = "SELECT * from kibana_sample_data_flights"
resp = client.sql.query(body={'query':query})

In [12]:
column_names = []
for c in resp['columns']:
    column_names.append(c['name'])
    
df = pd.DataFrame(resp['rows'], columns=column_names)
# df.columns
print('Full content of the kibana_sample_data_flights table')

df

Full content of the kibana_sample_data_flights table


Unnamed: 0,AvgTicketPrice,Cancelled,Carrier,Dest,DestAirportID,DestCityName,DestCountry,DestLocation,DestRegion,DestWeather,...,FlightTimeMin,Origin,OriginAirportID,OriginCityName,OriginCountry,OriginLocation,OriginRegion,OriginWeather,dayOfWeek,timestamp
0,841.26560,False,Kibana Airlines,Sydney Kingsford Smith International Airport,SYD,Sydney,AU,POINT (151.177002 -33.94609833),SE-BD,Rain,...,1030.77040,Frankfurt am Main Airport,FRA,Frankfurt am Main,DE,POINT (8.570556 50.033333),DE-HE,Sunny,0,2021-10-25T00:00:00.000Z
1,882.98267,False,Logstash Airways,Venice Marco Polo Airport,VE05,Venice,IT,POINT (12.3519 45.505299),IT-34,Sunny,...,464.38950,Cape Town International Airport,CPT,Cape Town,ZA,POINT (18.60169983 -33.96480179),SE-BD,Clear,0,2021-10-25T18:27:00.000Z
2,190.63690,False,Logstash Airways,Venice Marco Polo Airport,VE05,Venice,IT,POINT (12.3519 45.505299),IT-34,Cloudy,...,0.00000,Venice Marco Polo Airport,VE05,Venice,IT,POINT (12.3519 45.505299),IT-34,Rain,0,2021-10-25T17:11:14.000Z
3,181.69421,True,Kibana Airlines,Treviso-Sant'Angelo Airport,TV01,Treviso,IT,POINT (12.1944 45.648399),IT-34,Clear,...,222.74905,Naples International Airport,NA01,Naples,IT,POINT (14.2908 40.886002),IT-72,Thunder & Lightning,0,2021-10-25T10:33:28.000Z
4,730.04175,False,Kibana Airlines,Xi'an Xianyang International Airport,XIY,Xi'an,CN,POINT (108.751999 34.447102),SE-BD,Clear,...,785.77905,Licenciado Benito Juarez International Airport,AICM,Mexico City,MX,POINT (-99.072098 19.4363),MX-DIF,Damaging Wind,0,2021-10-25T05:13:00.000Z
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,931.95290,False,ES-Air,Sydney Kingsford Smith International Airport,SYD,Sydney,AU,POINT (151.177002 -33.94609833),SE-BD,Hail,...,561.40295,Ministro Pistarini International Airport,EZE,Buenos Aires,AR,POINT (-58.5358 -34.8222),AR-B,Sunny,3,2021-10-28T06:34:53.000Z
996,978.93440,False,Logstash Airways,Tampa International Airport,TPA,Tampa,US,POINT (-82.53320313 27.97550011),US-FL,Cloudy,...,828.49286,OR Tambo International Airport,JNB,Johannesburg,ZA,POINT (28.246 -26.1392),SE-BD,Rain,3,2021-10-28T18:20:29.000Z
997,716.15850,False,Kibana Airlines,Pittsburgh International Airport,PIT,Pittsburgh,US,POINT (-80.23290253 40.49150085),US-PA,Sunny,...,373.98520,Pisa International Airport,PI05,Pisa,IT,POINT (10.3927 43.683899),IT-52,Cloudy,3,2021-10-28T23:38:13.000Z
998,777.72780,False,ES-Air,Sheremetyevo International Airport,SVO,Moscow,RU,POINT (37.4146 55.972599),RU-MOS,Cloudy,...,564.15594,San Diego International Airport,SAN,San Diego,US,POINT (-117.1900024 32.73360062),US-CA,Sunny,3,2021-10-28T02:29:57.000Z


In [13]:
len(df)

1000
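The 1000 rows above match the default page size (`fetch_size`) of Elasticsearch SQL, while the monthly counts later in this document add up to 13,059 rows. The sketch below pages through a full result using the `cursor` that the SQL API returns with each page. `fetch_all_rows` and `max_pages` are hypothetical names, and the code assumes `client` is an `elasticsearch.Elasticsearch` instance whose responses behave like dictionaries.

```python
def fetch_all_rows(client, sql, fetch_size=1000, max_pages=100):
    """Follow the SQL cursor until no pages are left (or max_pages is hit)."""
    resp = client.sql.query(body={"query": sql, "fetch_size": fetch_size})
    # Column metadata is taken from the first page.
    columns = [col["name"] for col in resp["columns"]]
    rows = list(resp["rows"])
    pages = 1
    while resp.get("cursor") and pages < max_pages:
        resp = client.sql.query(body={"cursor": resp["cursor"]})
        rows.extend(resp["rows"])
        pages += 1
    if resp.get("cursor"):
        # Stopped early: release the server-side cursor.
        client.sql.clear_cursor(body={"cursor": resp["cursor"]})
    return columns, rows
```

The result can be passed to pandas with `pd.DataFrame(rows, columns=columns)`. The `max_pages` cap is a safety limit so that a very large index does not get pulled into memory by accident.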

### Routines
It is time to make a shared function that does the conversion for us.


In [14]:
def parse_resp(resp):
    """Convert an Elasticsearch SQL response into a pandas DataFrame."""
    column_names = [c['name'] for c in resp['columns']]
    return pd.DataFrame(resp['rows'], columns=column_names)

In [15]:
query = "SELECT * from kibana_sample_data_flights"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
print('Full content of the kibana_sample_data_flights table')
df

Full content of the kibana_sample_data_flights table


Unnamed: 0,AvgTicketPrice,Cancelled,Carrier,Dest,DestAirportID,DestCityName,DestCountry,DestLocation,DestRegion,DestWeather,...,FlightTimeMin,Origin,OriginAirportID,OriginCityName,OriginCountry,OriginLocation,OriginRegion,OriginWeather,dayOfWeek,timestamp
0,841.26560,False,Kibana Airlines,Sydney Kingsford Smith International Airport,SYD,Sydney,AU,POINT (151.177002 -33.94609833),SE-BD,Rain,...,1030.77040,Frankfurt am Main Airport,FRA,Frankfurt am Main,DE,POINT (8.570556 50.033333),DE-HE,Sunny,0,2021-10-25T00:00:00.000Z
1,882.98267,False,Logstash Airways,Venice Marco Polo Airport,VE05,Venice,IT,POINT (12.3519 45.505299),IT-34,Sunny,...,464.38950,Cape Town International Airport,CPT,Cape Town,ZA,POINT (18.60169983 -33.96480179),SE-BD,Clear,0,2021-10-25T18:27:00.000Z
2,190.63690,False,Logstash Airways,Venice Marco Polo Airport,VE05,Venice,IT,POINT (12.3519 45.505299),IT-34,Cloudy,...,0.00000,Venice Marco Polo Airport,VE05,Venice,IT,POINT (12.3519 45.505299),IT-34,Rain,0,2021-10-25T17:11:14.000Z
3,181.69421,True,Kibana Airlines,Treviso-Sant'Angelo Airport,TV01,Treviso,IT,POINT (12.1944 45.648399),IT-34,Clear,...,222.74905,Naples International Airport,NA01,Naples,IT,POINT (14.2908 40.886002),IT-72,Thunder & Lightning,0,2021-10-25T10:33:28.000Z
4,730.04175,False,Kibana Airlines,Xi'an Xianyang International Airport,XIY,Xi'an,CN,POINT (108.751999 34.447102),SE-BD,Clear,...,785.77905,Licenciado Benito Juarez International Airport,AICM,Mexico City,MX,POINT (-99.072098 19.4363),MX-DIF,Damaging Wind,0,2021-10-25T05:13:00.000Z
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,931.95290,False,ES-Air,Sydney Kingsford Smith International Airport,SYD,Sydney,AU,POINT (151.177002 -33.94609833),SE-BD,Hail,...,561.40295,Ministro Pistarini International Airport,EZE,Buenos Aires,AR,POINT (-58.5358 -34.8222),AR-B,Sunny,3,2021-10-28T06:34:53.000Z
996,978.93440,False,Logstash Airways,Tampa International Airport,TPA,Tampa,US,POINT (-82.53320313 27.97550011),US-FL,Cloudy,...,828.49286,OR Tambo International Airport,JNB,Johannesburg,ZA,POINT (28.246 -26.1392),SE-BD,Rain,3,2021-10-28T18:20:29.000Z
997,716.15850,False,Kibana Airlines,Pittsburgh International Airport,PIT,Pittsburgh,US,POINT (-80.23290253 40.49150085),US-PA,Sunny,...,373.98520,Pisa International Airport,PI05,Pisa,IT,POINT (10.3927 43.683899),IT-52,Cloudy,3,2021-10-28T23:38:13.000Z
998,777.72780,False,ES-Air,Sheremetyevo International Airport,SVO,Moscow,RU,POINT (37.4146 55.972599),RU-MOS,Cloudy,...,564.15594,San Diego International Airport,SAN,San Diego,US,POINT (-117.1900024 32.73360062),US-CA,Sunny,3,2021-10-28T02:29:57.000Z


### Query:
List tables/indices

In [16]:
query = "show tables"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
print('list all tables/indices')
df

list all tables/indices


Unnamed: 0,name,type,kind
0,.apm-agent-configuration,TABLE,INDEX
1,.apm-custom-link,TABLE,INDEX
2,.async-search,TABLE,INDEX
3,.kibana,VIEW,ALIAS
4,.kibana-event-log-7.14.0,VIEW,ALIAS
5,.kibana-event-log-7.14.0-000001,TABLE,INDEX
6,.kibana_7.14.0,VIEW,ALIAS
7,.kibana_7.14.0_001,TABLE,INDEX
8,.kibana_task_manager,VIEW,ALIAS
9,.kibana_task_manager_7.14.0,VIEW,ALIAS


### Query:
List flight numbers.

In [17]:
# Single-row variant: "SELECT FlightNum from kibana_sample_data_flights limit 1"
query = "SELECT FlightNum from kibana_sample_data_flights"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,FlightNum
0,9HY9SWR
1,X98CCZO
2,UFK2WIZ
3,EAYQW69
4,58U013N
...,...
995,4YOA44N
996,40RL4H9
997,J45CHYH
998,GJ8AJIY


### Query:
List origin countries and origin city names.

In [18]:
query = "SELECT OriginCountry, OriginCityName from kibana_sample_data_flights limit 5"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,OriginCountry,OriginCityName
0,DE,Frankfurt am Main
1,ZA,Cape Town
2,IT,Venice
3,IT,Naples
4,MX,Mexico City


### Query:
List origin city names and destination city names where the flight time is more than 5 hours and the origin country is US,
sorted by flight time in hours in descending order.


In [19]:
query = "SELECT OriginCityName, DestCityName FROM kibana_sample_data_flights "\
      + "WHERE FlightTimeHour > 5 AND OriginCountry='US' " \
      + "ORDER BY FlightTimeHour DESC LIMIT 10"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,OriginCityName,DestCityName
0,Chicago,Oslo
1,Cleveland,Seoul
2,Denver,Chitose / Tomakomai
3,Nashville,Verona
4,Minneapolis,Tokyo
5,Portland,Treviso
6,Spokane,Vienna
7,Kansas City,Zurich
8,Kansas City,Shanghai
9,Los Angeles,Zurich


### Query:
List the month, origin city name, and destination city name where the flight time is more than 5 hours
and the month is after June (July onward), sorted by flight time in descending order.


In [20]:
query = "SELECT MONTH_OF_YEAR(timestamp) as Month, OriginCityName, DestCityName FROM kibana_sample_data_flights " \
      + "WHERE FlightTimeHour > 5 AND MONTH_OF_YEAR(timestamp) > 6 "\
      + "ORDER BY FlightTimeHour DESC LIMIT 10 "
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,Month,OriginCityName,DestCityName
0,11,Chicago,Oslo
1,11,Osaka,Spokane
2,11,Quito,Tucson
3,11,Shanghai,Stockholm
4,11,Tokyo,Venice
5,11,Tokyo,Venice
6,12,Tokyo,Venice
7,12,Buenos Aires,Treviso
8,12,Amsterdam,Birmingham
9,11,Edmonton,Milan


### Query:
List time, flight number, and average ticket price where the average ticket price is more than $500, ordered by the
average ticket price.

In [21]:
query = "SELECT timestamp, FlightNum, AvgTicketPrice FROM kibana_sample_data_flights " \
      + "WHERE AvgTicketPrice > 500 " \
      + "ORDER BY AvgTicketPrice"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,timestamp,FlightNum,AvgTicketPrice
0,2021-11-07T09:04:20.000Z,QG5DXD3,500.05316
1,2021-11-13T23:18:27.000Z,NXA71BT,500.12125
2,2021-10-29T01:55:18.000Z,VU8K9DM,500.15213
3,2021-11-05T08:46:45.000Z,UM8IKF8,500.22598
4,2021-10-31T19:38:41.000Z,J9P7G64,500.38850
...,...,...,...
995,2021-10-26T15:06:39.000Z,640A7I9,566.56490
996,2021-11-10T09:47:25.000Z,I1WK1BY,566.61224
997,2021-11-12T12:08:18.000Z,FDOFRUY,566.71070
998,2021-11-19T04:22:55.000Z,CCSJZVJ,566.71387


### Query:
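List time, flight number, origin city name, destination city name, distance in miles, speed, and day of the week
where the day of the week is Sunday or Monday (`DAY_OF_WEEK` returns 1 for Sunday, so the filter `>= 0 AND <= 2` matches the values 1 and 2) and the hour is from 9 o'clock until before 11 o'clock,
sorted by speed and then by distance, both in descending order.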

In [22]:
query = "SELECT timestamp, FlightNum, OriginCityName, DestCityName, "\
      + "       ROUND(DistanceMiles) AS distance, ROUND(DistanceMiles/CAST(FlightTimeHour as float)) AS speed, "\
      + "       DAY_OF_WEEK(timestamp) AS day_of_week "\
      + "FROM kibana_sample_data_flights "\
      + "WHERE DAY_OF_WEEK(timestamp) >= 0 "\
      + "  AND DAY_OF_WEEK(timestamp) <= 2 "\
      + "  AND HOUR_OF_DAY(timestamp) >=9 "\
      + "  AND HOUR_OF_DAY(timestamp) <= 10 "\
      + "ORDER BY speed DESC, distance DESC LIMIT 10"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,timestamp,FlightNum,OriginCityName,DestCityName,distance,speed,day_of_week
0,2021-11-28T10:53:52.000Z,LAJSKLT,Guangzhou,Lima,11398.0,783.0,1
1,2021-11-08T09:30:39.000Z,VLUDO2H,Buenos Aires,Moscow,8377.0,783.0,2
2,2021-11-15T10:35:51.000Z,F0HHFTH,Mexico City,Chitose / Tomakomai,6674.0,783.0,2
3,2021-11-15T09:18:23.000Z,GMVHXOX,Winnipeg,Xi'an,6410.0,783.0,2
4,2021-11-22T10:44:54.000Z,Z8AUAEK,Guangzhou,Rome,5699.0,783.0,2
5,2021-11-29T10:40:04.000Z,4TIXU86,Adelaide,Chitose / Tomakomai,5351.0,783.0,2
6,2021-10-25T10:30:50.000Z,T6CS587,Berlin,Shanghai,5229.0,783.0,2
7,2021-11-07T10:15:02.000Z,PRBKLR5,Edmonton,Seoul,5213.0,783.0,1
8,2021-11-01T09:45:47.000Z,A7P58N6,Barcelona,Hyderabad,4746.0,783.0,2
9,2021-11-29T09:47:41.000Z,RU2W6AO,Rome,Montreal,4102.0,783.0,2
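The `day_of_week` values above follow the Elasticsearch SQL numbering, where Sunday is 1 and Saturday is 7, which differs from Python's own weekday numbering. The sketch below recomputes the value from a timestamp string so the rows can be cross-checked; `sql_day_of_week` and `DAY_NAMES` are hypothetical names, and the timestamp format is assumed to match the one shown in the output.

```python
from datetime import datetime

# Elasticsearch SQL numbering: Sunday = 1 ... Saturday = 7.
DAY_NAMES = {
    1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
    5: "Thursday", 6: "Friday", 7: "Saturday",
}


def sql_day_of_week(timestamp):
    """Return the DAY_OF_WEEK-style number for a '...T...Z' timestamp."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    # isoweekday() is Monday = 1 ... Sunday = 7; shift so Sunday becomes 1.
    return parsed.isoweekday() % 7 + 1
```

For the first row above, `DAY_NAMES[sql_day_of_week("2021-11-28T10:53:52.000Z")]` gives `"Sunday"`, which agrees with the `1` in the `day_of_week` column.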


### Query:
List the average flight time in hours per origin country, grouped and ordered by origin country.

In [23]:
query = "SELECT AVG(CAST(FlightTimeHour as float)) Avg_Flight_Time, OriginCountry "\
      + "FROM kibana_sample_data_flights "\
      + "GROUP BY OriginCountry "\
      + "ORDER BY OriginCountry "\
      + "LIMIT 5"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,Avg_Flight_Time,OriginCountry
0,9.34218,AE
1,13.495823,AR
2,4.704097,AT
3,15.081367,AU
4,7.998943,CA


### Query:
List the count, month, and average flight time in hours, grouped by month.

In [24]:
query = "SELECT COUNT(*), MONTH_OF_YEAR(timestamp) AS month_of_year, "\
      + "AVG(CAST(FlightTimeHour as float)) AS Avg_Flight_Time "\
      + "FROM kibana_sample_data_flights "\
      + "GROUP BY month_of_year"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,COUNT(*),month_of_year,Avg_Flight_Time
0,2202,10,8.568505
1,9358,11,8.516041
2,1499,12,8.462987


### Query:
List the average flight time in hours per origin country, grouped by origin country and ordered by average flight time.

In [25]:
query = "SELECT AVG(CAST(FlightTimeHour as float)) Avg_Flight_Time, OriginCountry "\
      + "FROM kibana_sample_data_flights "\
      + "GROUP BY OriginCountry "\
      + "ORDER BY Avg_Flight_Time"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,Avg_Flight_Time,OriginCountry
0,0.588542,CH
1,4.704097,AT
2,5.594127,SE
3,5.842859,NO
4,5.967528,PL
5,6.485994,IT
6,6.636758,RU
7,6.754419,DK
8,6.764488,GB
9,6.807198,NL


### Query:
List origin city names, average distance in kilometers, counts, and the 95th percentile of distance,
grouped by origin city name and keeping only cities with an average distance between 3000 and 4000 kilometers.


In [26]:
query = "SELECT OriginCityName, ROUND(AVG(DistanceKilometers)) avg_distance, "\
      + "COUNT(*) c, ROUND(PERCENTILE(DistanceKilometers,95)) AS percentile_distance "\
      + "FROM kibana_sample_data_flights "\
      + "GROUP BY OriginCityName "\
      + "HAVING avg_distance "\
      + "BETWEEN 3000 AND 4000"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,OriginCityName,avg_distance,c,percentile_distance
0,Verona,3078.0,120,7927.0
1,Vienna,3596.0,120,7436.0
2,Xi'an,3842.0,114,7964.0


### Query:
List time, flight number, origin city name, and destination city name where either the origin or the destination
city is `Cologne`, the flight was delayed, and it took place between `25 and 27 Oct 2021`, ordered by time.

In [27]:
query = "SELECT timestamp, FlightNum, OriginCityName, DestCityName "\
      + "FROM kibana_sample_data_flights "\
      + "WHERE QUERY('Cologne') "\
      + "  AND FlightDelay=true "\
      + "  AND timestamp > '2021-10-25' "\
      + "  AND timestamp < '2021-10-27' "\
      + "ORDER BY timestamp"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,timestamp,FlightNum,OriginCityName,DestCityName
0,2021-10-26T22:19:48.000Z,1C0ZWE9,Cologne,Edmonton
1,2021-10-26T22:29:23.000Z,6E5W6T5,Cagliari,Cologne


### Query:
List scores (relevance), time, flight number, origin city name, destination city name, destination weather,
and origin weather where the text `Barcelona` matches any of the `*Weather` or `*City*` fields, ordered by score in descending order. The query as written searches only for `Barcelona`, so it does not filter on a weather value such as `Rain`.


**Score()**:

`Input`: none

`Output`: double numeric value

`Description`: Returns the relevance of a given input to the executed query. The higher the score, the more relevant the data.


In [28]:
query = "SELECT Score(), timestamp, FlightNum, OriginCityName, DestCityName, DestWeather, OriginWeather "\
      + "FROM kibana_sample_data_flights "\
      + "WHERE MATCH('*Weather,*City*', 'Barcelona', 'type=cross_fields;operator=AND') "\
      + "ORDER BY Score() DESC "\
      + "LIMIT 5"
resp = client.sql.query(body={'query':query})
df = parse_resp(resp)
df

Unnamed: 0,Score(),timestamp,FlightNum,OriginCityName,DestCityName,DestWeather,OriginWeather
0,4.653004,2021-10-25T02:22:09.000Z,X25P1R0,Barcelona,Hyderabad,Rain,Sunny
1,4.653004,2021-10-25T11:58:34.000Z,KMBHQOV,Barcelona,Genova,Heavy Fog,Rain
2,4.653004,2021-10-26T05:12:59.000Z,T9I72LF,Barcelona,Buenos Aires,Rain,Cloudy
3,4.653004,2021-10-26T04:43:01.000Z,6SAJBDX,Barcelona,Cagliari,Hail,Hail
4,4.653004,2021-10-26T00:56:52.000Z,LYBY8VP,Barcelona,Seoul,Thunder & Lightning,Rain
