# Elasticsearch python API overview 

In [1]:
import warnings
# from elasticsearch import Elasticsearch, RequestsHttpConnection
from elasticsearch import Elasticsearch
# from elasticsearch.connection import RequestsHttpConnection
warnings.filterwarnings('ignore')

## Avant de commencer 

### Lancer elasticsearch avec docker 

Pour ce faire, on va run un cluster elastic dans un container. Si vous n'avez pas deja l'image elastic dans votre registery local il faut la pull du hub avec la commande suivante: 

```
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.11.1
```

puis on run le container sur le port 9200 tel que: 

```
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.11.1
```

### Lancer elasticsearch avec docker-compose 

On peut aussi lancer plusieurs noeud au sein d'un meme cluster avec docker-compose  tel que 

```Dockerfile 
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic

volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  elastic:
    driver: bridge
```

Plus d'info sur le doc [ici](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html)

#### 🚧Attention à votre configuration Docker 🚧
Elastic demande beaucoup de ressource à votre docker (et donc à votre machine) il faut avoir au moins configurer 4GB de memoire que Docker peut utiliser. Vous pouvez aussi changer directement la configuration de la JVM des container avec le paramètre `ES_JAVA_OPTS=-Xms512m -Xmx512m` et le passer à `256m` ou bien `128m`. 


### 📟 Exercice [optionnel]

**Ecrire un fichier `docker-compose.yml` avec un service Elasticsearch sur le port 9200 (un seul noeud) et un service Kibana sur le port 5601 ainsi qu'un network elnet**

## Ping du container 

In [2]:
import requests
res = requests.get('http://localhost:9200?pretty')
print(res.content)

b'{\n  "name" : "071480ecdbd1",\n  "cluster_name" : "docker-cluster",\n  "cluster_uuid" : "BSEbwWMhQaSpGiZkNxDYNA",\n  "version" : {\n    "number" : "7.11.1",\n    "build_flavor" : "default",\n    "build_type" : "docker",\n    "build_hash" : "ff17057114c2199c9c1bbecc727003a907c0db7a",\n    "build_date" : "2021-02-15T13:44:09.394032Z",\n    "build_snapshot" : false,\n    "lucene_version" : "8.7.0",\n    "minimum_wire_compatibility_version" : "6.8.0",\n    "minimum_index_compatibility_version" : "6.0.0-beta1"\n  },\n  "tagline" : "You Know, for Search"\n}\n'


In [3]:
es = Elasticsearch('http://localhost:9200')

## Create, delete and verify index

In [4]:
#create
es.indices.create(index="first_index",ignore=400)

#verify
print(es.indices.exists(index="first_index"))

#delete
print(es.indices.delete(index="first_index", ignore=[400,404]))

True
{'acknowledged': True}


## Insert documents

In [5]:
#documents to insert in the elasticsearch index "cities"
doc1 = {"city":"New Delhi", "country":"India"}
doc2 = {"city":"London", "country":"England"}
doc3 = {"city":"Los Angeles", "country":"USA"}

#Inserting doc1 in id=1
es.index(index="cities", doc_type="places", id=1, body=doc1)

#Inserting doc2 in id=2
es.index(index="cities", doc_type="places", id=2, body=doc2)

#Inserting doc3 in id=3
es.index(index="cities", doc_type="places", id=3, body=doc3)


{'_index': 'cities',
 '_type': 'places',
 '_id': '3',
 '_version': 5,
 'result': 'updated',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 14,
 '_primary_term': 2}

### 📟 Exercice [optionnel]
Trouver la fonction qui vérifie que votre index est bien crée.  

In [6]:
dir(es)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'async_search',
 'autoscaling',
 'bulk',
 'cat',
 'ccr',
 'clear_scroll',
 'close',
 'close_point_in_time',
 'cluster',
 'count',
 'create',
 'dangling_indices',
 'data_frame',
 'delete',
 'delete_by_query',
 'delete_by_query_rethrottle',
 'delete_script',
 'deprecation',
 'enrich',
 'eql',
 'exists',
 'exists_source',
 'explain',
 'field_caps',
 'get',
 'get_script',
 'get_script_context',
 'get_script_languages',
 'get_source',
 'graph',
 'ilm',
 'index',
 'indices',
 'info',
 'ingest',
 'license',
 'mget',
 'migration',
 'ml',
 'monitoring',
 'msearch',
 'msearch_template',
 'mtermvectors',
 

In [7]:
print(es.exists("cities", 1, doc_type='places'),
      es.exists("cities", 2, doc_type='places'),
      es.exists("cities", 3, doc_type='places'))

True True True


### Retrieve data with id : `get`

In [8]:
res = es.get(index="cities", doc_type="places", id=2)
res

{'_index': 'cities',
 '_type': 'places',
 '_id': '2',
 '_version': 5,
 '_seq_no': 13,
 '_primary_term': 2,
 'found': True,
 '_source': {'city': 'London', 'country': 'England'}}

### 📟 Exercice [optionnel]
Afficher uniquement les informations ci-dessous à partir de la variable `res` 

In [9]:
# res=es.get(index="cities", doc_type="places", id=2, _source_includes=["city", "country"])
res=es.get(index="cities", doc_type="places", id=2, _source=True)

res['_source']

{'city': 'London', 'country': 'England'}

### Mapping

In [10]:
es.indices.get_mapping(index='cities')

{'cities': {'mappings': {'properties': {'city': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'country': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}

More about mappings: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

## Le endpoint `_search` et les `query`

Pour la suite des exemples assurez vous d'avoir importer les data via la `_bulk api`

In [11]:
res = es.search(index="cities", body={"query":{"match_all":{}}})
res

{'took': 0,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 3, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'cities',
    '_type': 'places',
    '_id': '1',
    '_score': 1.0,
    '_source': {'city': 'New Delhi', 'country': 'India'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '2',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '3',
    '_score': 1.0,
    '_source': {'city': 'Los Angeles', 'country': 'USA'}}]}}

### 📟 Exercice [optionnel]
Afficher uniquement les informations ci-dessous à partir de la variable `res` 

In [12]:
res['hits']['hits']

[{'_index': 'cities',
  '_type': 'places',
  '_id': '1',
  '_score': 1.0,
  '_source': {'city': 'New Delhi', 'country': 'India'}},
 {'_index': 'cities',
  '_type': 'places',
  '_id': '2',
  '_score': 1.0,
  '_source': {'city': 'London', 'country': 'England'}},
 {'_index': 'cities',
  '_type': 'places',
  '_id': '3',
  '_score': 1.0,
  '_source': {'city': 'Los Angeles', 'country': 'USA'}}]

### Affiner ces critères de recherche avec `_source`

In [13]:
es.search(index="movies", body={
  "_source": {
    "includes": [
      "*.title",
      "*.directors"
    ],
    "excludes": [
      "*.actors*",
      "*.genres"
    ]
  },
  "query": {
    "match": {
      "fields.directors": "George"
    }
  }
})

{'took': 3,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 56, 'relation': 'eq'},
  'max_score': 5.019411,
  'hits': [{'_index': 'movies',
    '_type': 'movie',
    '_id': '3378',
    '_score': 5.019411,
    '_source': {'fields': {'directors': ['George Miller', 'George Ogilvie'],
      'title': 'Mad Max Beyond Thunderdome'}}},
   {'_index': 'movies',
    '_type': 'movie',
    '_id': '226',
    '_score': 4.6606607,
    '_source': {'fields': {'directors': ['George Lucas'],
      'title': 'Star Wars'}}},
   {'_index': 'movies',
    '_type': 'movie',
    '_id': '371',
    '_score': 4.6606607,
    '_source': {'fields': {'directors': ['George Lucas'],
      'title': 'Star Wars: Episode III - Revenge of the Sith'}}},
   {'_index': 'movies',
    '_type': 'movie',
    '_id': '469',
    '_score': 4.6606607,
    '_source': {'fields': {'directors': ['George Lucas'],
      'title': 'Star Wars: Episode I - The Phantom Menace'}}

### Logique booléenne 

In [14]:
es.search(index="movies", body=
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "fields.directors": "George"
          }
        },
        {
          "match": {
            "fields.title": "Star Wars"
          }
        }
      ]
    }
  }
})


{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 4, 'relation': 'eq'},
  'max_score': 17.335049,
  'hits': [{'_index': 'movies',
    '_type': 'movie',
    '_id': '226',
    '_score': 17.335049,
    '_source': {'fields': {'directors': ['George Lucas'],
      'release_date': '1977-05-25T00:00:00Z',
      'rating': 8.7,
      'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'],
      'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTU4NTczODkwM15BMl5BanBnXkFtZTcwMzEyMTIyMw@@._V1_SX400_.jpg',
      'plot': "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a wookiee and two droids to save the universe from the Empire's world-destroying battle-station, while also attempting to rescue Princess Leia from the evil Darth Vader.",
      'title': 'Star Wars',
      'rank': 226,
      'running_time_secs': 7260,
      'actors': ['Mark Hamill', 'Harrison Ford', 'Carrie Fisher'],
      'year': 1977}

### Les critères : SHOULD / MUST

In [15]:
res=es.search(index="movies", size=10, body=
{
  "query": {
    "bool": {
      "must": [
                  { "match": { "fields.title": "Star Wars"}}
      ],
      "must_not": { "match_phrase": { "fields.directors": "George Miller" }},
      "should": [
                  { "match": { "fields.title": "Star" }},
                  { "match": { "fields.directors": "George Lucas"}}
      ]
    }
  }
})
for i in range(len(res['hits']['hits'])):
    print(res['hits']['hits'][i]['_source']['fields']['title'], 
          res['hits']['hits'][i]['_source']['fields']['directors'],
          res['hits']['hits'][i]['_score'])

Star Wars ['George Lucas'] 29.932575
Star Wars: Episode I - The Phantom Menace ['George Lucas'] 21.437202
Star Wars: Episode III - Revenge of the Sith ['George Lucas'] 20.595882
Star Wars: Episode II - Attack of the Clones ['George Lucas'] 20.595882
Star Wars: Episode VII ['J.J. Abrams'] 13.803417
Star Trek ['J.J. Abrams'] 11.640135
Lone Star ['John Sayles'] 11.640135
Dark Star ['John Carpenter'] 11.640135
Rock Star ['Stephen Herek'] 11.640135
Bright Star ['Jane Campion'] 11.640135


In [16]:
res['hits']

{'total': {'value': 29, 'relation': 'eq'},
 'max_score': 29.932575,
 'hits': [{'_index': 'movies',
   '_type': 'movie',
   '_id': '226',
   '_score': 29.932575,
   '_source': {'fields': {'directors': ['George Lucas'],
     'release_date': '1977-05-25T00:00:00Z',
     'rating': 8.7,
     'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'],
     'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTU4NTczODkwM15BMl5BanBnXkFtZTcwMzEyMTIyMw@@._V1_SX400_.jpg',
     'plot': "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a wookiee and two droids to save the universe from the Empire's world-destroying battle-station, while also attempting to rescue Princess Leia from the evil Darth Vader.",
     'title': 'Star Wars',
     'rank': 226,
     'running_time_secs': 7260,
     'actors': ['Mark Hamill', 'Harrison Ford', 'Carrie Fisher'],
     'year': 1977},
    'id': 'tt0076759',
    'type': 'add'}},
  {'_index': 'movies',
   '_type': 'movie',
   '_id': '469',
   '_score': 21.4372

In [17]:
es.search(index="movies", body=
{
  "query": {
    "bool": {
      "must": [
                  { "match": { "fields.title": "Star Wars"}}
                  
      ],
      "must_not": { "match": { "fields.directors": "George Miller" }},
      "should": [
                  { "match": { "fields.title": "Star" }},
                  { "match": { "fields.directors": "George Lucas"}}
      ]
    }
  }
})

{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 25, 'relation': 'eq'},
  'max_score': 13.803417,
  'hits': [{'_index': 'movies',
    '_type': 'movie',
    '_id': '168',
    '_score': 13.803417,
    '_source': {'fields': {'directors': ['J.J. Abrams'],
      'release_date': '2015-01-01T00:00:00Z',
      'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'],
      'plot': 'A continuation of the saga created by George Lucas.',
      'title': 'Star Wars: Episode VII',
      'rank': 168,
      'year': 2015,
      'actors': ['Mark Hamill', 'Harrison Ford', 'Carrie Fisher']},
     'id': 'tt2488496',
     'type': 'add'}},
   {'_index': 'movies',
    '_type': 'movie',
    '_id': '128',
    '_score': 11.640135,
    '_source': {'fields': {'directors': ['J.J. Abrams'],
      'release_date': '2009-04-06T00:00:00Z',
      'rating': 8,
      'genres': ['Action', 'Adventure', 'Sci-Fi'],
      'image_url': 'http://ia.me

### Filtrer ses query avec `filter` 

On cherche ici les recettes avec un ingrédient de type `parmesan` sans ingrédient `tuna` en filtrant les recettes avec un temps de préparation inférieur ou egale à 15minutes.  

In [18]:
res=es.search(index="receipe", body={
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "ingredients.name": "parmesan"
          }
        }
      ], 
      "must_not": [
        {
          "match": {
            "ingredients.name": "tuna"
          }
        }
      ], 
      "filter": [
        {
          "range":{
            "preparation_time_minutes": {
              "lte":15
            }
          }
        }
        ]
    }
  }
})

for i in res['hits']['hits'] : print(i['_source']['preparation_time_minutes'])

12
15


### Recherche avec un prefix 

Les query de type `prefix` permettent de trouver tout les termes commencant par le(s) caractère(s) correspondant.  

In [19]:
es.search(index="cities", body={"query": {"prefix" : { "city" : "l" }}})

{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'cities',
    '_type': 'places',
    '_id': '2',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '3',
    '_score': 1.0,
    '_source': {'city': 'Los Angeles', 'country': 'USA'}}]}}

### Rechercher avec des regex 

In [20]:
#tout afficher 
es.search(index="cities", body={"query": {"regexp" : { "city" : ".*" }}})

{'took': 0,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 3, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'cities',
    '_type': 'places',
    '_id': '1',
    '_score': 1.0,
    '_source': {'city': 'New Delhi', 'country': 'India'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '2',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '3',
    '_score': 1.0,
    '_source': {'city': 'Los Angeles', 'country': 'USA'}}]}}

In [21]:
#afficher les cities qui commencent par L
es.search(index="cities", body={"query": {"regexp" : { "city" : "l.*" }}})

{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'cities',
    '_type': 'places',
    '_id': '2',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '3',
    '_score': 1.0,
    '_source': {'city': 'Los Angeles', 'country': 'USA'}}]}}

In [22]:
#afficher les cities qui commencent par L et terminent par n 
es.search(index="cities", body={"query": {"regexp" : { "city" : "l.*n" }}})

{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'cities',
    '_type': 'places',
    '_id': '2',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'}}]}}

### Agregation 

In [23]:
#agregation simple -> movies/years
res = es.search(index="movies",body={"aggs" : {
    "nb_par_annee" : {
        "terms" : {"field" : "fields.year"}
}}})
res['aggregations']

{'nb_par_annee': {'doc_count_error_upper_bound': 0,
  'sum_other_doc_count': 2192,
  'buckets': [{'key': 2013, 'doc_count': 448},
   {'key': 2012, 'doc_count': 404},
   {'key': 2011, 'doc_count': 308},
   {'key': 2009, 'doc_count': 253},
   {'key': 2010, 'doc_count': 249},
   {'key': 2008, 'doc_count': 207},
   {'key': 2006, 'doc_count': 204},
   {'key': 2007, 'doc_count': 200},
   {'key': 2005, 'doc_count': 170},
   {'key': 2014, 'doc_count': 152}]}}

In [24]:
#agregation et stats simple -> moyennes des raitings 
res = es.search(index="movies",body={"aggs" : {
    "note_moyenne" : {
        "avg" : {"field" : "fields.rating"}
}}})
res['aggregations']

{'note_moyenne': {'value': 6.387107691895831}}

In [25]:
#agregation et stats simple -> stats basiques raitings/years
res = es.search(index="movies",body={"aggs" : {
    "group_year" : {
        "terms" : { "field" : "fields.year" },
        "aggs" : {
            "note_moyenne" : {"avg" : {"field" : "fields.rating"}},
            "note_min" : {"min" : {"field" : "fields.rating"}},
            "note_max" : {"max" : {"field" : "fields.rating"}}
        }
}}})
res["aggregations"]

{'group_year': {'doc_count_error_upper_bound': 0,
  'sum_other_doc_count': 2192,
  'buckets': [{'key': 2013,
    'doc_count': 448,
    'note_max': {'value': 8.699999809265137},
    'note_moyenne': {'value': 5.962700002789497},
    'note_min': {'value': 2.5}},
   {'key': 2012,
    'doc_count': 404,
    'note_max': {'value': 8.600000381469727},
    'note_moyenne': {'value': 5.961786593160322},
    'note_min': {'value': 2.4000000953674316}},
   {'key': 2011,
    'doc_count': 308,
    'note_max': {'value': 8.5},
    'note_moyenne': {'value': 6.114285714440531},
    'note_min': {'value': 1.7000000476837158}},
   {'key': 2009,
    'doc_count': 253,
    'note_max': {'value': 8.399999618530273},
    'note_moyenne': {'value': 6.268774692248921},
    'note_min': {'value': 2.700000047683716}},
   {'key': 2010,
    'doc_count': 249,
    'note_max': {'value': 8.800000190734863},
    'note_moyenne': {'value': 6.239759046868627},
    'note_min': {'value': 1.7999999523162842}},
   {'key': 2008,
    'd

### 📟 Exercice [optionnel]
Tester d'autres requetes

### Datetime agrégation 

Pour illuster l'agrégation par datetime on va créer un index `travel` et utiliser des data de type :
```
doc1 = {"city":"Bangalore", "country":"India","datetime": datetime.datetime(2018,1,1,10,20,0)} 
```

In [26]:
#specify mapping and create index 
if es.indices.exists(index="travel"):
    es.indices.delete(index="travel", ignore=[400,404])

settings = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    },
    "mappings": {
            "properties": {
                "city": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "country": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "datetime": {
                        "type": "date",
                    }
        }
     }
}
es.indices.create(index="travel", ignore=400, body=settings)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'travel'}

In [27]:
import datetime
doc1 = {"city":"Bangalore", "country":"India","datetime": datetime.datetime(2018,1,1,10,20,0)} #datetime format: yyyy,MM,dd,hh,mm,ss
doc2 = {"city":"London", "country":"England","datetime": datetime.datetime(2018,1,2,22,30,0)}
doc3 = {"city":"Los Angeles", "country":"USA","datetime": datetime.datetime(2018,4,19,18,20,0)}
es.index(index="travel", id=1, body=doc1)
es.index(index="travel", id=2, body=doc2)
es.index(index="travel", id=3, body=doc3)

{'_index': 'travel',
 '_type': '_doc',
 '_id': '3',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 2,
 '_primary_term': 1}

In [28]:
es.indices.get_mapping(index='travel')

{'travel': {'mappings': {'properties': {'city': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'country': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'datetime': {'type': 'date'}}}}}

In [29]:
es.search(index="travel", body={"from": 0, "size": 0, "query": {"match_all": {}}, "aggs": {
                  "country": {
                      "date_histogram": {"field": "datetime", "calendar_interval": "year"}}}})

{'took': 1,
 'timed_out': False,
 '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 0, 'relation': 'eq'},
  'max_score': None,
  'hits': []},
 'aggregations': {'country': {'buckets': []}}}

In [30]:
#la clé est le nombrede millisecondes depuis 1970
1514764800000/365/24/3600/1000, 2018-1970


(48.03287671232877, 48)

### 📟 Exercice [optionnel]
Créer le document suivant et inserer le en base afin de rafficher l'histogramme precedent, dite ce qui à changer.
```
doc4 = {"city":"Sydney", "country":"Australia","datetime":datetime.datetime(2019,4,19,18,20,0)}
```

In [31]:
doc4 = {"city":"Sydney", "country":"Australia","datetime":datetime.datetime(2019,4,19,18,20,0)}
es.index(index='travel', id=4, body=doc4)

{'_index': 'travel',
 '_type': '_doc',
 '_id': '4',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 0,
 '_primary_term': 1}

In [32]:
es.search(index="travel", body={ "size": 0, "query": {"match_all": {}}, "aggs": {
                  "country": {
                      "date_histogram": {"field": "datetime", "calendar_interval": "year"}}}})

{'took': 9,
 'timed_out': False,
 '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 0, 'relation': 'eq'},
  'max_score': None,
  'hits': []},
 'aggregations': {'country': {'buckets': []}}}

In [33]:
# La date ajoutée représente un second élément dans l'histogramme, ce qui fait 3 points en 2018 et 1 en 2019.

## Search text introduction : endpoint `_analyze`

### Construire un Analyzer
Avant de commencer cette partie assurez vous d'avoir créer un `french analyzer` dans elasticsearch. 
Ci joint l'exemple d'analyzer francais vu dans le cour : 
```json
PUT french
{
  "settings": {
    "analysis": { 
      "filter": {
        "french_elision": {
          "type": "elision",
          "articles_case": true,
          "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
        },
        "french_synonym": {
          "type": "synonym",
          "ignore_case": true,
          "expand": true,
          "synonyms": [
            "réviser, étudier, bosser",
            "mayo, mayonnaise",
            "grille, toaste"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        }
      },
      "analyzer": {
        "french_heavy": {
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding",
            "french_synonym",
            "french_stemmer"
          ]
        },
        "french_light": {
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding"
          ]
        }
      }
    }
  }
}
```

🤓 Assurer vous d'installer le pluging qui contient `icu_tokenizer` avant sinon vous allez avoir une erreur. 

In [34]:
# docker exec [docker_id for elasticsearch] bin/elasticsearch-plugin install analysis-icu
# docker restart [docker_id for elasticsearch]

In [35]:
if es.indices.exists("french") : es.indices.delete(index="french")
es.indices.create(index='french' , body={
  "settings": {
    "analysis": { 
      "filter": {
        "french_elision": {
          "type": "elision",
          "articles_case": True,
          "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
        },
        "french_synonym": {
          "type": "synonym",
          "ignore_case": True,
          "expand": True,
          "synonyms": [
            "réviser, étudier, bosser",
            "mayo, mayonnaise",
            "grille, toaste"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        }
      },
      "analyzer": {
        "french_heavy": {
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding",
            "french_synonym",
            "french_stemmer"
          ]
        },
        "french_light": {
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding"
          ]
        }
      }
    }
  }
})


{'acknowledged': True, 'shards_acknowledged': True, 'index': 'french'}

In [36]:
doc1 = {"text" : "Une phrase en français :) ..."}
es.index(index="french", body=doc1)

{'_index': 'french',
 '_type': '_doc',
 '_id': 'fEv-5YYBqjgdNdAf_1UK',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 0,
 '_primary_term': 1}

In [37]:
es.indices.analyze(index="french",body={
  "text" : "Je dois bosser pour mon QCM sinon je vais avoir une sale note :( ..."
})

{'tokens': [{'token': 'je',
   'start_offset': 0,
   'end_offset': 2,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'dois',
   'start_offset': 3,
   'end_offset': 7,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'bosser',
   'start_offset': 8,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 2},
  {'token': 'pour',
   'start_offset': 15,
   'end_offset': 19,
   'type': '<ALPHANUM>',
   'position': 3},
  {'token': 'mon',
   'start_offset': 20,
   'end_offset': 23,
   'type': '<ALPHANUM>',
   'position': 4},
  {'token': 'qcm',
   'start_offset': 24,
   'end_offset': 27,
   'type': '<ALPHANUM>',
   'position': 5},
  {'token': 'sinon',
   'start_offset': 28,
   'end_offset': 33,
   'type': '<ALPHANUM>',
   'position': 6},
  {'token': 'je',
   'start_offset': 34,
   'end_offset': 36,
   'type': '<ALPHANUM>',
   'position': 7},
  {'token': 'vais',
   'start_offset': 37,
   'end_offset': 41,
   'type': '<ALPHANUM>',
   'position': 8},
  {'token': 'avoir',
   's

### 📟 Exercice [optionnel]
Ajouter une fonctionnalités de reconnaissance de smiley à votre analyzer, de sorte qu'il fasse le lien suivant : 
```
:) -> _content_
:( -> _triste_
```
Faite ensuite une requete en python sur le document ci-dessous : 
```json
{
     "text" : "Je dois bosser pour mon QCM sinon je vais avoir une sale note :( ..."
}
```

In [38]:
#Redéfinition de l'analyzer:
if es.indices.exists("french") : es.indices.delete(index="french")
es.indices.create(index='french' , body={
  "settings": {
    "analysis": {
      "analyzer": {
        "french_heavy": {
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding",
            "french_synonym",
            "french_stemmer"
          ]
        },
        "french_light": {
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding"
          ]
        }
      },
      "char_filter": {
        "emoticons": { 
          "type": "mapping",
          "mappings": [
            ":) => _content_",
            ":( => _triste_"
          ]
        }
      },
      "filter": {
        "french_elision": {
          "type": "elision",
          "articles_case": True,
          "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
        },
        "french_synonym": {
          "type": "synonym",
          "ignore_case": True,
          "expand": True,
          "synonyms": [
            "réviser, étudier, bosser",
            "mayo, mayonnaise",
            "grille, toaste"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        }
      }
    }
  }
})

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'french'}

In [39]:
#Analyse
es.indices.analyze(index="french",body={"analyzer":"french_light",
  "text" : "Je dois bosser pour mon QCM sinon je vais avoir une sale note :( ..."
})

{'tokens': [{'token': 'je',
   'start_offset': 0,
   'end_offset': 2,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'dois',
   'start_offset': 3,
   'end_offset': 7,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'bosser',
   'start_offset': 8,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 2},
  {'token': 'pour',
   'start_offset': 15,
   'end_offset': 19,
   'type': '<ALPHANUM>',
   'position': 3},
  {'token': 'mon',
   'start_offset': 20,
   'end_offset': 23,
   'type': '<ALPHANUM>',
   'position': 4},
  {'token': 'qcm',
   'start_offset': 24,
   'end_offset': 27,
   'type': '<ALPHANUM>',
   'position': 5},
  {'token': 'sinon',
   'start_offset': 28,
   'end_offset': 33,
   'type': '<ALPHANUM>',
   'position': 6},
  {'token': 'je',
   'start_offset': 34,
   'end_offset': 36,
   'type': '<ALPHANUM>',
   'position': 7},
  {'token': 'vais',
   'start_offset': 37,
   'end_offset': 41,
   'type': '<ALPHANUM>',
   'position': 8},
  {'token': 'avoir',
   's