# Elasticsearch Python API overview

In [2]:
import warnings
from elasticsearch import Elasticsearch, RequestsHttpConnection
warnings.filterwarnings('ignore')

## Before you start

### Running Elasticsearch with Docker

We will run an Elasticsearch cluster in a container. If the Elasticsearch image is not already in your local registry, pull it first with the following command:

```
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.11.1
```

Then start the container, publishing port 9200:

```
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.11.1
```

### Running Elasticsearch with docker-compose

You can also run several nodes within the same cluster with docker-compose, as follows:

```yaml
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic

volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local

networks:
  elastic:
    driver: bridge
```

More information in the documentation [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html)

#### 🚧 Check your Docker configuration 🚧
Elasticsearch demands a lot of resources from Docker (and therefore from your machine): Docker should be allowed to use at least 4 GB of memory. You can also change the JVM heap settings of the containers directly through the `ES_JAVA_OPTS=-Xms512m -Xmx512m` parameter, lowering the values to `256m` or even `128m`.


### 📟 Exercise [optional]

**Write a `docker-compose.yml` file with an Elasticsearch service on port 9200 (a single node), a Kibana service on port 5601, and a network named elnet**

## Pinging the container

In [3]:
import requests
res = requests.get('http://localhost:9200?pretty')
print(res.content)

b'{\n  "name" : "a61bef1bcc47",\n  "cluster_name" : "docker-cluster",\n  "cluster_uuid" : "YHpEfOfyRte--4t9yYEUhQ",\n  "version" : {\n    "number" : "7.11.1",\n    "build_flavor" : "default",\n    "build_type" : "docker",\n    "build_hash" : "ff17057114c2199c9c1bbecc727003a907c0db7a",\n    "build_date" : "2021-02-15T13:44:09.394032Z",\n    "build_snapshot" : false,\n    "lucene_version" : "8.7.0",\n    "minimum_wire_compatibility_version" : "6.8.0",\n    "minimum_index_compatibility_version" : "6.0.0-beta1"\n  },\n  "tagline" : "You Know, for Search"\n}\n'
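
The bytes printed above are a JSON document, so they can be decoded into a dictionary instead of being read by eye. The sketch below is only illustrative: `raw` is a shortened copy of the output above standing in for `res.content`, and `info` and `es_version` are names introduced here.

```python
import json

# Shortened copy of the response body shown above; in the notebook this
# would be `res.content` as returned by `requests.get(...)`.
raw = (
    b'{"name": "a61bef1bcc47", "cluster_name": "docker-cluster", '
    b'"version": {"number": "7.11.1", "lucene_version": "8.7.0"}, '
    b'"tagline": "You Know, for Search"}'
)

info = json.loads(raw)  # json.loads accepts bytes as well as str
es_version = info["version"]["number"]
print(info["cluster_name"], es_version)
```

With `requests`, calling `res.json()` should return the same dictionary without the explicit `json.loads` step.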


In [3]:
es = Elasticsearch('http://localhost:9200')

## Create, delete and verify index

```python
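# create (ignore=400 silences the error raised when the index already exists)
es.indices.create(index="first_index", ignore=400)

# verify
print(es.indices.exists(index="first_index"))

# delete (ignoring 400/404 so that a missing index does not raise)
print(es.indices.delete(index="first_index", ignore=[400, 404]))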
```
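
The calls above can be combined into a small helper that creates an index only when it is missing. This is a sketch rather than part of the client library: `ensure_index` and the `_Fake*` classes are hypothetical names, and the fake client exists only so the logic can be tried without a running cluster.

```python
def ensure_index(es, name):
    """Create `name` if it does not exist yet; return True when it was created."""
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name)
    return True


class _FakeIndices:
    """Stand-in for `es.indices`, used only to exercise the helper offline."""

    def __init__(self):
        self.names = set()

    def exists(self, index):
        return index in self.names

    def create(self, index):
        self.names.add(index)
        return {"acknowledged": True, "index": index}


class _FakeClient:
    """Minimal object exposing an `indices` attribute like the real client."""

    def __init__(self):
        self.indices = _FakeIndices()


fake = _FakeClient()
first_call = ensure_index(fake, "first_index")   # index is missing, so it is created
second_call = ensure_index(fake, "first_index")  # index now exists, nothing happens
print(first_call, second_call)
```

With a live cluster, the same function could be called as `ensure_index(es, "first_index")`.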

In [6]:
#verify
print(es.indices.exists(index="movies"))

True


In [7]:
es.indices.create(index="first_index",ignore=400)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'first_index'}

## Insert documents

```python 
#documents to insert in the elasticsearch index "cities"
doc1 = {"city":"New Delhi", "country":"India"}
doc2 = {"city":"London", "country":"England"}
doc3 = {"city":"Los Angeles", "country":"USA"}

#Inserting doc1 in id=1
es.index(index="cities", doc_type="places", id=1, body=doc1)

#Inserting doc2 in id=2
es.index(index="cities", doc_type="places", id=2, body=doc2)

#Inserting doc3 in id=3
es.index(index="cities", doc_type="places", id=3, body=doc3)

```

In [10]:
doc1 = {"city":"New Delhi", "country":"India"}
doc2 = {"city":"London", "country":"England"}
doc3 = {"city":"Los Angeles", "country":"USA"}

In [12]:
#Inserting doc1 in id=1
es.index(index="cities", doc_type="places", id=1, body=doc1)

#Inserting doc2 in id=2
es.index(index="cities", doc_type="places", id=2, body=doc2)

#Inserting doc3 in id=3
es.index(index="cities", doc_type="places", id=3, body=doc3)

{'_index': 'cities',
 '_type': 'places',
 '_id': '3',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 2,
 '_primary_term': 1}

### 📟 Exercise [optional]
Find the function that checks that your index has been created.

In [13]:
print(es.indices.exists(index="cities"))

True


### Retrieve data with id : `get`

In [15]:
res = es.get(index="cities", doc_type="places", id=2)
res, type(res)

({'_index': 'cities',
  '_type': 'places',
  '_id': '2',
  '_version': 1,
  '_seq_no': 1,
  '_primary_term': 1,
  'found': True,
  '_source': {'city': 'London', 'country': 'England'}},
 dict)

### 📟 Exercise [optional]
Display only the information below, using the `res` variable.

In [16]:
res["_source"]

{'city': 'London', 'country': 'England'}

### Mapping

In [17]:
es.indices.get_mapping(index='cities')

{'cities': {'mappings': {'properties': {'city': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'country': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}

More about mappings: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
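
The mapping above shows that each dynamically mapped string is indexed twice: as analyzed `text` and as an exact-value `keyword` sub-field. A small sketch to flatten such a response into a readable table follows; `field_types` is a hypothetical helper and `mapping` is a copy of the output above.

```python
def field_types(mapping, index):
    """Flatten a `get_mapping` response into {field_path: type}."""
    flat = {}

    def walk(properties, prefix=""):
        for name, spec in properties.items():
            path = prefix + name
            if "type" in spec:
                flat[path] = spec["type"]
            # multi-fields such as `city.keyword` live under "fields"
            walk(spec.get("fields", {}), path + ".")
            # object fields nest their children under "properties"
            walk(spec.get("properties", {}), path + ".")

    walk(mapping[index]["mappings"].get("properties", {}))
    return flat


# Copy of the response printed above
mapping = {
    "cities": {"mappings": {"properties": {
        "city": {"type": "text",
                 "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
        "country": {"type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    }}}
}

types = field_types(mapping, "cities")
for path, kind in types.items():
    print(path, "->", kind)
```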

## The `_search` endpoint and queries

For the following examples, make sure you have imported the data through the `_bulk` API.

In [23]:
res = es.search(index="cities", body={"query":{"match_all":{}}})
res

{'took': 0,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 3, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'cities',
    '_type': 'places',
    '_id': '1',
    '_score': 1.0,
    '_source': {'city': 'New Delhi', 'country': 'India'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '2',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '3',
    '_score': 1.0,
    '_source': {'city': 'Los Angeles', 'country': 'USA'}}]}}

### 📟 Exercise [optional]
Display only the information below, using the `res` variable.

In [24]:
res["hits"]["hits"]

[{'_index': 'cities',
  '_type': 'places',
  '_id': '1',
  '_score': 1.0,
  '_source': {'city': 'New Delhi', 'country': 'India'}},
 {'_index': 'cities',
  '_type': 'places',
  '_id': '2',
  '_score': 1.0,
  '_source': {'city': 'London', 'country': 'England'}},
 {'_index': 'cities',
  '_type': 'places',
  '_id': '3',
  '_score': 1.0,
  '_source': {'city': 'Los Angeles', 'country': 'USA'}}]
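
Going one step further, the documents themselves sit under each hit's `_source` key. The sketch below uses a hypothetical helper named `sources` and a shortened copy of the response above, so it can be run without a cluster.

```python
def sources(response):
    """Return the `_source` document of every hit in a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Shortened copy of the `match_all` response shown above
response = {
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "hits": [
            {"_id": "1", "_score": 1.0, "_source": {"city": "New Delhi", "country": "India"}},
            {"_id": "2", "_score": 1.0, "_source": {"city": "London", "country": "England"}},
            {"_id": "3", "_score": 1.0, "_source": {"city": "Los Angeles", "country": "USA"}},
        ],
    }
}

docs = sources(response)
city_names = [doc["city"] for doc in docs]
print(city_names)
```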

### Refining the returned fields with `_source`

In [28]:
res = es.search(index="movies", body={
  "_source": {
    "includes": [
      "*.title",
      "*.directors"
    ]
  },
  "query": {
    "match": {
      "fields.directors": "George"
    }
  }
})

res["hits"]["hits"]

[{'_index': 'movies',
  '_type': 'movie',
  '_id': '3378',
  '_score': 5.019411,
  '_source': {'fields': {'directors': ['George Miller', 'George Ogilvie'],
    'title': 'Mad Max Beyond Thunderdome'}}},
 {'_index': 'movies',
  '_type': 'movie',
  '_id': '226',
  '_score': 4.6606607,
  '_source': {'fields': {'directors': ['George Lucas'],
    'title': 'Star Wars'}}},
 {'_index': 'movies',
  '_type': 'movie',
  '_id': '371',
  '_score': 4.6606607,
  '_source': {'fields': {'directors': ['George Lucas'],
    'title': 'Star Wars: Episode III - Revenge of the Sith'}}},
 {'_index': 'movies',
  '_type': 'movie',
  '_id': '469',
  '_score': 4.6606607,
  '_source': {'fields': {'directors': ['George Lucas'],
    'title': 'Star Wars: Episode I - The Phantom Menace'}}},
 {'_index': 'movies',
  '_type': 'movie',
  '_id': '475',
  '_score': 4.6606607,
  '_source': {'fields': {'directors': ['George Clooney'],
    'title': 'The Monuments Men'}}},
 {'_index': 'movies',
  '_type': 'movie',
  '_id': '690',

### Boolean logic

In [31]:
es.search(index="movies", body=
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "fields.directors": "George"
          }
        },
        {
          "match": {
            "fields.title": "Star Wars"
          }
        }
      ]
    }
  }
})["hits"]["hits"]


[{'_index': 'movies',
  '_type': 'movie',
  '_id': '226',
  '_score': 17.335049,
  '_source': {'fields': {'directors': ['George Lucas'],
    'release_date': '1977-05-25T00:00:00Z',
    'rating': 8.7,
    'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'],
    'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTU4NTczODkwM15BMl5BanBnXkFtZTcwMzEyMTIyMw@@._V1_SX400_.jpg',
    'plot': "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a wookiee and two droids to save the universe from the Empire's world-destroying battle-station, while also attempting to rescue Princess Leia from the evil Darth Vader.",
    'title': 'Star Wars',
    'rank': 226,
    'running_time_secs': 7260,
    'actors': ['Mark Hamill', 'Harrison Ford', 'Carrie Fisher'],
    'year': 1977},
   'id': 'tt0076759',
   'type': 'add'}},
 {'_index': 'movies',
  '_type': 'movie',
  '_id': '469',
  '_score': 11.513107,
  '_source': {'fields': {'directors': ['George Lucas'],
    'release_date': '1999-05-19T00:00

### The SHOULD / MUST clauses

In [33]:
es.search(index="movies", body=
{
  "query": {
    "bool": {
      "must": [
                  { "match": { "fields.title": "Star Wars"}}
                  
      ],
      "must_not": { "match": { "fields.directors": "George Miller" }},
      "should": [
                  { "match": { "fields.title": "Star" }},
                  { "match": { "fields.directors": "George Lucas"}}
      ]
    }
  }
})["hits"]["hits"]

[{'_index': 'movies',
  '_type': 'movie',
  '_id': '168',
  '_score': 13.803417,
  '_source': {'fields': {'directors': ['J.J. Abrams'],
    'release_date': '2015-01-01T00:00:00Z',
    'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'],
    'plot': 'A continuation of the saga created by George Lucas.',
    'title': 'Star Wars: Episode VII',
    'rank': 168,
    'year': 2015,
    'actors': ['Mark Hamill', 'Harrison Ford', 'Carrie Fisher']},
   'id': 'tt2488496',
   'type': 'add'}},
 {'_index': 'movies',
  '_type': 'movie',
  '_id': '128',
  '_score': 11.640135,
  '_source': {'fields': {'directors': ['J.J. Abrams'],
    'release_date': '2009-04-06T00:00:00Z',
    'rating': 8,
    'genres': ['Action', 'Adventure', 'Sci-Fi'],
    'image_url': 'http://ia.media-imdb.com/images/M/MV5BMjE5NDQ5OTE4Ml5BMl5BanBnXkFtZTcwOTE3NDIzMw@@._V1_SX400_.jpg',
    'plot': "The brash James T. Kirk tries to live up to his father's legacy with Mr. Spock keeping him in check as a vengeful, time-traveling Romu

### Filtering queries with `filter`

Here we look for recipes that contain `parmesan` but no `tuna`, keeping only those with a preparation time of at most 15 minutes.

In [36]:
es.search(index="receipe", body={
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "ingredients.name": "parmesan"
          }
        }
      ], 
      "must_not": [
        {
          "match": {
            "ingredients.name": "tuna"
          }
        }
      ], 
      "filter": [
        {
          "range":{
            "preparation_time_minutes": {
              "lte":15
            }
          }
        }
        ]
    }
  }
})["hits"]["hits"]

# The filter clause does not affect the score, unlike the must clause

[{'_index': 'receipe',
  '_type': '_doc',
  '_id': '10',
  '_score': 1.3155547,
  '_source': {'title': 'Penne With Hot-As-You-Dare Arrabbiata Sauce',
   'description': 'Exceedingly simple in concept and execution, arrabbiata sauce is tomato sauce with the distinction of being spicy enough to earn its "angry" moniker. Here\'s how to make it, from start to finish.',
   'preparation_time_minutes': 15,
   'servings': {'min': 4, 'max': 4},
   'steps': ['In a medium saucepan of boiling salted water, cook penne until just short of al dente, about 1 minute less than the package recommends.',
    'Meanwhile, in a large skillet, combine oil, garlic, and pepper flakes. Cook over medium heat until garlic is very lightly golden, about 5 minutes. (Adjust heat as necessary to keep it gently sizzling.)',
    'Add tomatoes, stir to combine, and bring to a bare simmer. When pasta is ready, transfer it to sauce using a strainer or slotted spoon. (Alternatively, drain pasta through a colander, reserving 1
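
To recap the `bool` clauses used so far: `must` clauses are required and contribute to the score, `should` clauses are optional and raise the score when they match, `must_not` clauses exclude documents, and `filter` clauses are required but do not contribute to the score. The helper below is a sketch for assembling such bodies; `bool_query` and its parameter names are hypothetical and not part of the Elasticsearch client.

```python
def bool_query(must=None, should=None, must_not=None, filters=None):
    """Build a search body with a `bool` query, skipping empty clauses."""
    clauses = {
        "must": must,
        "should": should,
        "must_not": must_not,
        "filter": filters,
    }
    return {"query": {"bool": {name: value for name, value in clauses.items() if value}}}


# Same request as the recipe search above
body = bool_query(
    must=[{"match": {"ingredients.name": "parmesan"}}],
    must_not=[{"match": {"ingredients.name": "tuna"}}],
    filters=[{"range": {"preparation_time_minutes": {"lte": 15}}}],
)
print(sorted(body["query"]["bool"]))
```

The resulting dictionary could then be passed as `es.search(index="receipe", body=body)`.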

### Searching with a prefix

`prefix` queries find every term that starts with the given character(s).

In [37]:
es.search(index="cities", body={"query": {"prefix" : { "city" : "l" }}})

{'took': 0,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'cities',
    '_type': 'places',
    '_id': '2',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '3',
    '_score': 1.0,
    '_source': {'city': 'Los Angeles', 'country': 'USA'}}]}}
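
The word "term" matters here: on a `text` field the prefix is compared with the tokens produced by the analyzer, not with the original string. The sketch below imitates this locally. `toy_analyze` is a rough, hypothetical stand-in for the default analyzer (lowercase, split on non-word characters), not the real implementation.

```python
import re


def toy_analyze(text):
    """Rough stand-in for the default analyzer: lowercase, split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def prefix_match(docs, field, prefix):
    """Keep the documents where at least one token of `field` starts with `prefix`."""
    return [
        doc for doc in docs
        if any(token.startswith(prefix) for token in toy_analyze(doc[field]))
    ]


cities = [
    {"city": "New Delhi", "country": "India"},
    {"city": "London", "country": "England"},
    {"city": "Los Angeles", "country": "USA"},
]

with_l = [doc["city"] for doc in prefix_match(cities, "city", "l")]
with_a = [doc["city"] for doc in prefix_match(cities, "city", "a")]
with_upper_l = [doc["city"] for doc in prefix_match(cities, "city", "L")]
print(with_l, with_a, with_upper_l)
```

In this toy model the prefix `a` matches "Los Angeles" through its second token, and an uppercase `L` matches nothing because the tokens are lowercased, which is presumably why the query above uses a lowercase `l`.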

### Searching with regular expressions

In [38]:
# display everything
es.search(index="cities", body={"query": {"regexp" : { "city" : ".*" }}})

{'took': 1,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 3, 'relation': 'eq'},
  'max_score': 1.0,
  'hits': [{'_index': 'cities',
    '_type': 'places',
    '_id': '1',
    '_score': 1.0,
    '_source': {'city': 'New Delhi', 'country': 'India'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '2',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'}},
   {'_index': 'cities',
    '_type': 'places',
    '_id': '3',
    '_score': 1.0,
    '_source': {'city': 'Los Angeles', 'country': 'USA'}}]}}

In [16]:
# display the cities that start with L
es.search(index="cities", body={"query": {"regexp" : { "city" : "l.*" }}})

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': '2',
    '_index': 'cities',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'},
    '_type': 'places'},
   {'_id': '3',
    '_index': 'cities',
    '_score': 1.0,
    '_source': {'city': 'Los Angeles', 'country': 'USA'},
    '_type': 'places'}],
  'max_score': 1.0,
  'total': {'relation': 'eq', 'value': 2}},
 'timed_out': False,
 'took': 15}

In [17]:
# display the cities that start with L and end with n
es.search(index="cities", body={"query": {"regexp" : { "city" : "l.*n" }}})

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1},
 'hits': {'hits': [{'_id': '2',
    '_index': 'cities',
    '_score': 1.0,
    '_source': {'city': 'London', 'country': 'England'},
    '_type': 'places'}],
  'max_score': 1.0,
  'total': {'relation': 'eq', 'value': 1}},
 'timed_out': False,
 'took': 50}

### Aggregation

In [55]:
# simple aggregation -> movies per year
res = es.search(index="movies",body=
{
    "aggs" : {
        "nb_par_annee" : {
            "terms" : {
                "field" : "fields.year"
            }
        }
    }
})
res['aggregations']

{'nb_par_annee': {'doc_count_error_upper_bound': 0,
  'sum_other_doc_count': 2192,
  'buckets': [{'key': 2013, 'doc_count': 448},
   {'key': 2012, 'doc_count': 404},
   {'key': 2011, 'doc_count': 308},
   {'key': 2009, 'doc_count': 253},
   {'key': 2010, 'doc_count': 249},
   {'key': 2008, 'doc_count': 207},
   {'key': 2006, 'doc_count': 204},
   {'key': 2007, 'doc_count': 200},
   {'key': 2005, 'doc_count': 170},
   {'key': 2014, 'doc_count': 152}]}}
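
A `terms` aggregation returns only the largest buckets (10 by default), and `sum_other_doc_count` counts the documents that fall into the buckets that were left out. The sketch below turns the buckets into a plain dictionary and adds both figures up; `aggregation` is a copy of the output above and the other names are introduced here.

```python
# Copy of res['aggregations'] as printed above
aggregation = {
    "nb_par_annee": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 2192,
        "buckets": [
            {"key": 2013, "doc_count": 448}, {"key": 2012, "doc_count": 404},
            {"key": 2011, "doc_count": 308}, {"key": 2009, "doc_count": 253},
            {"key": 2010, "doc_count": 249}, {"key": 2008, "doc_count": 207},
            {"key": 2006, "doc_count": 204}, {"key": 2007, "doc_count": 200},
            {"key": 2005, "doc_count": 170}, {"key": 2014, "doc_count": 152},
        ],
    }
}

per_year = {b["key"]: b["doc_count"] for b in aggregation["nb_par_annee"]["buckets"]}
listed = sum(per_year.values())                              # documents in the returned buckets
others = aggregation["nb_par_annee"]["sum_other_doc_count"]  # documents in the omitted buckets
print(per_year[2013], listed, listed + others)
```

To get more buckets back, the `terms` aggregation accepts a `size` parameter next to `field`.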

In [None]:
{
    'nb_par_annee': {
        'doc_count_error_upper_bound': 0,
        'sum_other_doc_count': 2192,
        'buckets': [
            {'key': 2013, 'doc_count': 448},
            {'key': 2012, 'doc_count': 404},
            {'key': 2011, 'doc_count': 308},
            {'key': 2009, 'doc_count': 253},
            {'key': 2010, 'doc_count': 249},
            {'key': 2008, 'doc_count': 207},
            {'key': 2006, 'doc_count': 204},
            {'key': 2007, 'doc_count': 200},
            {'key': 2005, 'doc_count': 170},
            {'key': 2014, 'doc_count': 152}
        ]
    }
}

In [57]:
# aggregation with a simple statistic -> average rating
res = es.search(index="movies",body={"aggs" : {
    "note_moyenne" : {
        "avg" : {"field" : "fields.rating"}
}}})
res['aggregations']

{'note_moyenne': {'value': 6.387107691895831}}

In [58]:
# aggregation with simple statistics -> basic rating stats per year
res = es.search(index="movies",body={"aggs" : {
    "group_year" : {
        "terms" : { "field" : "fields.year" },
        "aggs" : {
            "note_moyenne" : {"avg" : {"field" : "fields.rating"}},
            "note_min" : {"min" : {"field" : "fields.rating"}},
            "note_max" : {"max" : {"field" : "fields.rating"}}
        }
}}})
res["aggregations"]

{'group_year': {'doc_count_error_upper_bound': 0,
  'sum_other_doc_count': 2192,
  'buckets': [{'key': 2013,
    'doc_count': 448,
    'note_max': {'value': 8.699999809265137},
    'note_moyenne': {'value': 5.962700002789497},
    'note_min': {'value': 2.5}},
   {'key': 2012,
    'doc_count': 404,
    'note_max': {'value': 8.600000381469727},
    'note_moyenne': {'value': 5.961786593160322},
    'note_min': {'value': 2.4000000953674316}},
   {'key': 2011,
    'doc_count': 308,
    'note_max': {'value': 8.5},
    'note_moyenne': {'value': 6.114285714440531},
    'note_min': {'value': 1.7000000476837158}},
   {'key': 2009,
    'doc_count': 253,
    'note_max': {'value': 8.399999618530273},
    'note_moyenne': {'value': 6.268774692248921},
    'note_min': {'value': 2.700000047683716}},
   {'key': 2010,
    'doc_count': 249,
    'note_max': {'value': 8.800000190734863},
    'note_moyenne': {'value': 6.239759046868627},
    'note_min': {'value': 1.7999999523162842}},
   {'key': 2008,
    'd

### 📟 Exercise [optional]
Try other queries.

### Datetime aggregation

To illustrate aggregation by datetime, we create a `travel` index and use data of the following form:
```
doc1 = {"city":"Bangalore", "country":"India","datetime": datetime.datetime(2018,1,1,10,20,0)} 
```

In [60]:
#specify mapping and create index 
if es.indices.exists(index="travel"):
    es.indices.delete(index="travel", ignore=[400,404])

settings = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    },
    "mappings": {
            "properties": {
                "city": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "country": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "datetime": {
                        "type": "date",
                    }
        }
     }
}
es.indices.create(index="travel", ignore=400, body=settings)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'travel'}

In [61]:
import datetime
doc1 = {"city":"Bangalore", "country":"India","datetime": datetime.datetime(2018,1,1,10,20,0)} #datetime format: yyyy,MM,dd,hh,mm,ss
doc2 = {"city":"London", "country":"England","datetime": datetime.datetime(2018,1,2,22,30,0)}
doc3 = {"city":"Los Angeles", "country":"USA","datetime": datetime.datetime(2018,4,19,18,20,0)}
es.index(index="travel", id=1, body=doc1)
es.index(index="travel", id=2, body=doc2)
es.index(index="travel", id=3, body=doc3)

{'_index': 'travel',
 '_type': '_doc',
 '_id': '3',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 2,
 '_primary_term': 1}

In [62]:
es.indices.get_mapping(index='travel')

{'travel': {'mappings': {'properties': {'city': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'country': {'type': 'text',
     'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}},
    'datetime': {'type': 'date'}}}}}

In [63]:
es.search(index="travel", body=
{
    "from": 0,
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "country": {
            "date_histogram": {
                "field": "datetime", 
                "calendar_interval": "year"
            }
        }
    }
})

{'took': 21,
 'timed_out': False,
 '_shards': {'total': 2, 'successful': 2, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 3, 'relation': 'eq'},
  'max_score': None,
  'hits': []},
 'aggregations': {'country': {'buckets': [{'key_as_string': '2018-01-01T00:00:00.000Z',
     'key': 1514764800000,
     'doc_count': 3}]}}}

In [None]:
{
    'took': 21,
    'timed_out': False,
    '_shards': {
        'total': 2, 
        'successful': 2, 
        'skipped': 0, 
        'failed': 0
    },
    'hits': {
        'total': {
            'value': 3, 
            'relation': 'eq'
        },
        'max_score': None,
        'hits': []
    },
    'aggregations': {
        'country': {
            'buckets': [{
                'key_as_string': '2018-01-01T00:00:00.000Z',
                'key': 1514764800000,
                'doc_count': 3
            }]
        }
    }
}
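
Each bucket's `key` is the start of the interval expressed in milliseconds since the Unix epoch, and `key_as_string` is the same instant formatted as a date. A minimal sketch of the conversion, using a copy of the bucket above (`bucket` and `bucket_start` are names introduced here):

```python
import datetime

# Copy of the single bucket returned above
bucket = {
    "key_as_string": "2018-01-01T00:00:00.000Z",
    "key": 1514764800000,
    "doc_count": 3,
}

# The key is in milliseconds, while fromtimestamp expects seconds
bucket_start = datetime.datetime.fromtimestamp(
    bucket["key"] / 1000, tz=datetime.timezone.utc
)
print(bucket_start.isoformat(), bucket["doc_count"])
```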

### 📟 Exercise [optional]
Create the following document and index it, then display the previous histogram again and describe what has changed.
```
doc4 = {"city":"Sydney", "country":"Australia","datetime":datetime.datetime(2019,4,19,18,20,0)}
```

In [64]:
doc4 = {"city":"Sydney", "country":"Australia","datetime":datetime.datetime(2019,4,19,18,20,0)}
es.index(index="travel", id=4, body=doc4)

{'_index': 'travel',
 '_type': '_doc',
 '_id': '4',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 0,
 '_primary_term': 1}
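
To anticipate what the histogram should look like after this insertion, the yearly bucketing can be imitated locally. This is only a toy model of `calendar_interval: year` written for this exercise; `yearly_histogram` is a hypothetical helper, not a client function.

```python
import datetime
from collections import Counter


def yearly_histogram(docs, field="datetime"):
    """Count documents per calendar year, mimicking a yearly date_histogram."""
    counts = Counter(doc[field].year for doc in docs)
    return [
        {"key_as_string": f"{year}-01-01T00:00:00.000Z", "doc_count": count}
        for year, count in sorted(counts.items())
    ]


docs = [
    {"city": "Bangalore", "datetime": datetime.datetime(2018, 1, 1, 10, 20, 0)},
    {"city": "London", "datetime": datetime.datetime(2018, 1, 2, 22, 30, 0)},
    {"city": "Los Angeles", "datetime": datetime.datetime(2018, 4, 19, 18, 20, 0)},
]
before = yearly_histogram(docs)

docs.append({"city": "Sydney", "datetime": datetime.datetime(2019, 4, 19, 18, 20, 0)})
after = yearly_histogram(docs)
print(before)
print(after)
```

Under this model, a second bucket for 2019 with one document appears next to the 2018 bucket.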

## Introduction to text search: the `_analyze` endpoint

### Building an analyzer
Before starting this part, make sure you have created the French analyzers in Elasticsearch.
Below is the French analyzer example seen in the course:
```json
PUT french
{
  "settings": {
    "analysis": { 
      "filter": {
        "french_elision": {
          "type": "elision",
          "articles_case": true,
          "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
        },
        "french_synonym": {
          "type": "synonym",
          "ignore_case": true,
          "expand": true,
          "synonyms": [
            "réviser, étudier, bosser",
            "mayo, mayonnaise",
            "grille, toaste"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        }
      },
      "analyzer": {
        "french_heavy": {
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding",
            "french_synonym",
            "french_stemmer"
          ]
        },
        "french_light": {
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding"
          ]
        }
      }
    }
  }
}
```

🤓 Make sure you install the plugin that provides `icu_tokenizer` (`analysis-icu`) first, otherwise you will get an error.

In [67]:
doc1 = {"text" : "Une phrase en français :) ..."}
es.index(index="french", id=1, body=doc1)

{'_index': 'french',
 '_type': '_doc',
 '_id': '1',
 '_version': 1,
 'result': 'created',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 0,
 '_primary_term': 1}

In [68]:
es.indices.analyze(index="french",body={
  "text" : "Je dois bosser pour mon QCM sinon je vais avoir une sale note :( ..."
})

{'tokens': [{'token': 'je',
   'start_offset': 0,
   'end_offset': 2,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'dois',
   'start_offset': 3,
   'end_offset': 7,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'bosser',
   'start_offset': 8,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 2},
  {'token': 'pour',
   'start_offset': 15,
   'end_offset': 19,
   'type': '<ALPHANUM>',
   'position': 3},
  {'token': 'mon',
   'start_offset': 20,
   'end_offset': 23,
   'type': '<ALPHANUM>',
   'position': 4},
  {'token': 'qcm',
   'start_offset': 24,
   'end_offset': 27,
   'type': '<ALPHANUM>',
   'position': 5},
  {'token': 'sinon',
   'start_offset': 28,
   'end_offset': 33,
   'type': '<ALPHANUM>',
   'position': 6},
  {'token': 'je',
   'start_offset': 34,
   'end_offset': 36,
   'type': '<ALPHANUM>',
   'position': 7},
  {'token': 'vais',
   'start_offset': 37,
   'end_offset': 41,
   'type': '<ALPHANUM>',
   'position': 8},
  {'token': 'avoir',
   's

### 📟 Exercise [optional]
Add smiley recognition to your analyzer so that it performs the following mapping:
```
:) -> _content_
:( -> _triste_
```
Then run a request in Python on the document below:
```json
{
     "text" : "Je dois bosser pour mon QCM sinon je vais avoir une sale note :( ..."
}
```
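
One possible approach, sketched below under assumptions, is a character filter of type `mapping`, which rewrites character sequences before tokenization. The filter name `smileys` and the helpers `with_smileys` and `preview` are hypothetical; the settings would have to be supplied when (re)creating the index, and `preview` only imitates the substitution locally.

```python
import copy

SMILEY_MAPPINGS = [":) => _content_", ":( => _triste_"]


def with_smileys(analysis):
    """Return a copy of an `analysis` settings dict with a smiley char_filter on every analyzer."""
    updated = copy.deepcopy(analysis)
    updated.setdefault("char_filter", {})["smileys"] = {
        "type": "mapping",
        "mappings": SMILEY_MAPPINGS,
    }
    for analyzer in updated.get("analyzer", {}).values():
        analyzer.setdefault("char_filter", []).append("smileys")
    return updated


def preview(text):
    """Local imitation of the mapping char_filter, for illustration only."""
    for rule in SMILEY_MAPPINGS:
        source, target = rule.split(" => ")
        text = text.replace(source, target)
    return text


# Shortened version of the analysis settings shown earlier
analysis = {
    "analyzer": {
        "french_light": {
            "tokenizer": "icu_tokenizer",
            "filter": ["french_elision", "icu_folding"],
        }
    }
}

new_analysis = with_smileys(analysis)
rewritten = preview("Je dois bosser pour mon QCM sinon je vais avoir une sale note :( ...")
print(new_analysis["analyzer"]["french_light"]["char_filter"], rewritten)
```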

In [71]:
help(es.indices.analyze)

Help on method analyze in module elasticsearch.client.indices:

analyze(body=None, index=None, params=None, headers=None) method of elasticsearch.client.indices.IndicesClient instance
    Performs the analysis process on a text and return the tokens breakdown of the
    text.
    
    `<https://www.elastic.co/guide/en/elasticsearch/reference/7.13/indices-analyze.html>`_
    
    :arg body: Define analyzer/tokenizer parameters and the text on
        which the analysis should be performed
    :arg index: The name of the index to scope the operation



In [7]:
es.indices.analyze(index="french",body={
    "analyzer": "french_heavy",
    "text" : "Je dois bosser pour mon QCM sinon je vais avoir une sale note :) ..."
})

{'tokens': [{'token': 'je',
   'start_offset': 0,
   'end_offset': 2,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'doi',
   'start_offset': 3,
   'end_offset': 7,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'bos',
   'start_offset': 8,
   'end_offset': 14,
   'type': '<ALPHANUM>',
   'position': 2},
  {'token': 'revis',
   'start_offset': 8,
   'end_offset': 14,
   'type': 'SYNONYM',
   'position': 2},
  {'token': 'etudi',
   'start_offset': 8,
   'end_offset': 14,
   'type': 'SYNONYM',
   'position': 2},
  {'token': 'taf',
   'start_offset': 8,
   'end_offset': 14,
   'type': 'SYNONYM',
   'position': 2},
  {'token': 'pour',
   'start_offset': 15,
   'end_offset': 19,
   'type': '<ALPHANUM>',
   'position': 3},
  {'token': 'mon',
   'start_offset': 20,
   'end_offset': 23,
   'type': '<ALPHANUM>',
   'position': 4},
  {'token': 'qcm',
   'start_offset': 24,
   'end_offset': 27,
   'type': '<ALPHANUM>',
   'position': 5},
  {'token': 'sinon',
   'start_offset': 

In [76]:
es.indices.analyze(index="french",body={
    "analyzer": "french_light",
    "text" : "Je vais réviser et bosser"
})

{'tokens': [{'token': 'je',
   'start_offset': 0,
   'end_offset': 2,
   'type': '<ALPHANUM>',
   'position': 0},
  {'token': 'vais',
   'start_offset': 3,
   'end_offset': 7,
   'type': '<ALPHANUM>',
   'position': 1},
  {'token': 'reviser',
   'start_offset': 8,
   'end_offset': 15,
   'type': '<ALPHANUM>',
   'position': 2},
  {'token': 'etudier',
   'start_offset': 8,
   'end_offset': 15,
   'type': 'SYNONYM',
   'position': 2},
  {'token': 'bosser',
   'start_offset': 8,
   'end_offset': 15,
   'type': 'SYNONYM',
   'position': 2},
  {'token': 'taffer',
   'start_offset': 8,
   'end_offset': 15,
   'type': 'SYNONYM',
   'position': 2},
  {'token': 'et',
   'start_offset': 16,
   'end_offset': 18,
   'type': '<ALPHANUM>',
   'position': 3},
  {'token': 'bosser',
   'start_offset': 19,
   'end_offset': 25,
   'type': '<ALPHANUM>',
   'position': 4},
  {'token': 'reviser',
   'start_offset': 19,
   'end_offset': 25,
   'type': 'SYNONYM',
   'position': 4},
  {'token': 'etudier',
   '

### Rebuilding the cities index from real data

From https://github.com/lutangar/cities.json

In [39]:
es.indices.delete(index="cities", ignore=[400,404])

{'acknowledged': True}

In [52]:
# Insert our cities data
# NOTE: consider the bulk API, which is probably more efficient
# By default, only 1 shard

import sys
import json

with open("cities.json") as json_file:
    json_docs = json.load(json_file)


len(json_docs), type(json_docs[0])

for i in range(len(json_docs)):
    es.index(index="cities", doc_type="places", id=i+1, body=json_docs[i])
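
As the comment in the cell above suggests, indexing the documents one request at a time is slow for a large file. A sketch of the bulk alternative follows: the generator produces one action per document in the format expected by the client's `helpers.bulk` function. `generate_actions` is a hypothetical name, the second sample document is a placeholder, and the actual bulk call is left as a comment because it needs a live cluster.

```python
def generate_actions(docs, index, first_id=1):
    """Yield one bulk action per document, numbering ids from `first_id`."""
    for offset, doc in enumerate(docs):
        yield {"_index": index, "_id": first_id + offset, "_source": doc}


sample_docs = [
    {"country": "AR", "name": "El Alcázar", "lat": "-26.71459", "lng": "-54.81523"},
    {"country": "XX", "name": "Placeholder"},  # hypothetical entry, for illustration
]

actions = list(generate_actions(sample_docs, "cities"))
print(actions[0]["_id"], actions[1]["_id"])

# With a running cluster, the generator could be consumed directly:
#   from elasticsearch import helpers
#   helpers.bulk(es, generate_actions(json_docs, "cities"))
```

Unlike the loop above, this sketch does not set a document type such as `places`.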

In [54]:
res = es.get(index="cities", doc_type="places", id=1312)
res

{'_index': 'cities',
 '_type': 'places',
 '_id': '1312',
 '_version': 1,
 '_seq_no': 1311,
 '_primary_term': 1,
 'found': True,
 '_source': {'country': 'AR',
  'name': 'El Alcázar',
  'lat': '-26.71459',
  'lng': '-54.81523'}}

In [124]:
# search for city names matching a regular expression
es.search(index="cities", body=
{
    "query": {
        "regexp": {
            "name" : "lès.*Nancy"
        }
    }
})

{'took': 4,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 0, 'relation': 'eq'},
  'max_score': None,
  'hits': []}}
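
The empty result is probably due to how `regexp` works on a `text` field: the pattern must match one whole analyzed token, and those tokens are lowercased and split on punctuation. The sketch below imitates this with Python's `re` module, which is only an approximation of the Lucene regular expression syntax. `toy_analyze` is a hypothetical stand-in for the default analyzer, and the city name is an assumed example of the kind of entry the query is looking for.

```python
import re


def toy_analyze(text):
    """Rough stand-in for the default analyzer: lowercase, split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


name = "Vandœuvre-lès-Nancy"  # assumed example entry
pattern = "lès.*Nancy"

# On the text field, the pattern is tested against each token separately
tokens = toy_analyze(name)
token_hits = [token for token in tokens if re.fullmatch(pattern, token)]

# On an exact-value field, the pattern must cover the whole original string
keyword_hit = re.fullmatch(".*" + pattern, name) is not None

# Query body targeting the keyword sub-field, assuming dynamic mapping created it
keyword_body = {"query": {"regexp": {"name.keyword": ".*lès.*Nancy"}}}
print(tokens, token_hits, keyword_hit)
```

If the dynamic mapping created a `name.keyword` sub-field, as it did for `city` earlier, passing `keyword_body` to `es.search(index="cities", body=keyword_body)` may return the expected cities.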