# Elastic Stack
The goal of this lab is to keep getting familiar with the Elastic Stack (Elasticsearch & Kibana) running locally, this time using aggregation queries.

## 1- Elasticsearch
Use the environment already set up in lab 1 (TP1) and used in lab 2 (TP2).

Check that the environment is up and running:

* Elasticsearch URL: http://localhost:9200
* Cerebro URL: http://localhost:9000
* Kibana URL: http://localhost:5601
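
The check above can also be scripted. The following is a minimal sketch using only the standard library; the helper name `is_up` and the 2-second timeout are assumptions made for this example, and the URLs are the ones listed above.

```python
import urllib.error
import urllib.request

# URLs listed above; adjust them if your local ports differ.
SERVICES = {
    "Elasticsearch": "http://localhost:9200",
    "Cerebro": "http://localhost:9000",
    "Kibana": "http://localhost:5601",
}


def is_up(url, timeout=2.0):
    """Return True if something answers HTTP at this URL (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status
        # (for example 401 when security is enabled).
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, ...
        return False


for name, url in SERVICES.items():
    status = "up" if is_up(url) else "DOWN"
    print(f"{name:<14} {url:<24} {status}")
```

A service reported as `DOWN` here simply did not answer within the timeout; it may still be starting.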

## 2- Aggregations

* 1- Write the aggregation that returns the number of films per year in the index.

In [4]:
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

body = {
    "size": 0,
    "aggs": {
        "movies_per_year": {
            "terms": {
                "field": "fields.year",
                "size": 10
            }
        }
    }
}

response = es.search(index="movies2", body=body)

for bucket in response['aggregations']['movies_per_year']['buckets']:
    print(f"Year: {bucket['key']}, quantity of films: {bucket['doc_count']}")


Year: 2013, quantity of films: 448
Year: 2012, quantity of films: 404
Year: 2011, quantity of films: 308
Year: 2009, quantity of films: 253
Year: 2010, quantity of films: 249
Year: 2008, quantity of films: 207
Year: 2006, quantity of films: 204
Year: 2007, quantity of films: 200
Year: 2005, quantity of films: 170
Year: 2014, quantity of films: 152
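
The `terms` aggregation above returns the 10 largest buckets by document count. If a chronological listing is preferred, one option is to order the buckets by key. This is a sketch only: the helper name `films_per_year_body` and the `size` of 200 (assumed large enough to cover every distinct year) are choices made for this example.

```python
def films_per_year_body(size=200):
    """Build a terms aggregation on fields.year sorted by year (hypothetical helper)."""
    return {
        "size": 0,  # no hits, only the aggregation result
        "aggs": {
            "movies_per_year": {
                "terms": {
                    "field": "fields.year",
                    "size": size,  # assumption: at most `size` distinct years
                    "order": {"_key": "asc"},  # sort buckets by year, oldest first
                }
            }
        },
    }


body = films_per_year_body()
print(body["aggs"]["movies_per_year"]["terms"])
```

The resulting `body` can be passed to `es.search` in the same way as in the cell above.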




* 2- Give the average rating of the films.

In [11]:
body = {
    "size": 0,
    "aggs": {
        "average_rating": {
            "avg": {"field": "fields.rating"}
        }
    }
}

response = es.search(index="movies2", body=body)

avg_rating = response['aggregations']['average_rating']['value']
print(f"the average rating : {avg_rating:.2f}")

the average rating : 6.39




* 3- Give the average rating and the average rank of George Lucas's films.

In [12]:
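body = {
    "size": 0,
    "query": {
        # On an analyzed text field, `match` ORs the tokens "george" and "lucas",
        # so other directors named George or Lucas may also match;
        # `match_phrase` would restrict the query to the exact phrase.
        "match": {"fields.directors": "George Lucas"}
    },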
    "aggs": {
        "average_rating": {
            "avg": {"field": "fields.rating"}
        },
        "average_rank": {
            "avg": {"field": "fields.rank"}
        }
    }
}

response = es.search(index="movies2", body=body)

avg_rating = response['aggregations']['average_rating']['value']
avg_rank = response['aggregations']['average_rank']['value']

print(f"Average rating of George Lucas films: {avg_rating:.2f}")
print(f"Average rank of George Lucas films: {avg_rank:.2f}")

Average rating of George Lucas films: 6.92
Average rank of George Lucas films: 2580.98




* 4- Give the average rating of the films per year. Note that this requires nested aggregations.

In [13]:
body = {
    "size": 0,
    "aggs": {
        "by_year": {
            "terms": {"field": "fields.year", "size": 10},
            "aggs": {
                "average_rating": {"avg": {"field": "fields.rating"}}
            }
        }
    }
}

response = es.search(index="movies2", body=body)

for bucket in response['aggregations']['by_year']['buckets']:
    print(f"Year: {bucket['key']}, Average rating : {bucket['average_rating']['value']:.2f}")

Year: 2013, Average rating : 5.96
Year: 2012, Average rating : 5.96
Year: 2011, Average rating : 6.11
Year: 2009, Average rating : 6.27
Year: 2010, Average rating : 6.24
Year: 2008, Average rating : 6.23
Year: 2006, Average rating : 6.32
Year: 2007, Average rating : 6.42
Year: 2005, Average rating : 6.29
Year: 2014, Average rating : 4.86




* 5- Give the minimum, maximum and average rating of the films per year.

In [14]:
body = {
    "size": 0,
    "aggs": {
        "ratings_by_year": {
            "terms": {"field": "fields.year", "size": 10},
            "aggs": {
                "min_rating": {"min": {"field": "fields.rating"}},
                "max_rating": {"max": {"field": "fields.rating"}},
                "avg_rating": {"avg": {"field": "fields.rating"}}
            }
        }
    }
}

response = es.search(index="movies2", body=body)

for bucket in response['aggregations']['ratings_by_year']['buckets']:
    print(f"Year: {bucket['key']}, min: {bucket['min_rating']['value']:.2f}, max: {bucket['max_rating']['value']:.2f}, avg: {bucket['avg_rating']['value']:.2f}")

Year: 2013, min: 2.50, max: 8.70, avg: 5.96
Year: 2012, min: 2.40, max: 8.60, avg: 5.96
Year: 2011, min: 1.70, max: 8.50, avg: 6.11
Year: 2009, min: 2.70, max: 8.40, avg: 6.27
Year: 2010, min: 1.80, max: 8.80, avg: 6.24
Year: 2008, min: 1.80, max: 9.00, avg: 6.23
Year: 2006, min: 1.80, max: 8.50, avg: 6.32
Year: 2007, min: 2.20, max: 8.30, avg: 6.42
Year: 2005, min: 2.30, max: 8.30, avg: 6.29
Year: 2014, min: 4.86, max: 4.86, avg: 4.86
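
As an alternative to three separate sub-aggregations, the `stats` aggregation returns `count`, `min`, `max`, `avg` and `sum` in one object. The sketch below only builds the request body and formats one bucket; the helper names and the sample bucket are made up for illustration and are not results from the `movies2` index.

```python
def stats_by_year_body(size=10):
    """Terms aggregation per year with a single `stats` sub-aggregation (hypothetical helper)."""
    return {
        "size": 0,
        "aggs": {
            "ratings_by_year": {
                "terms": {"field": "fields.year", "size": size},
                "aggs": {"rating_stats": {"stats": {"field": "fields.rating"}}},
            }
        },
    }


def format_bucket(bucket):
    """Format one bucket of the aggregation above as a single line."""
    stats = bucket["rating_stats"]
    return (
        f"Year: {bucket['key']}, min: {stats['min']:.2f}, "
        f"max: {stats['max']:.2f}, avg: {stats['avg']:.2f}"
    )


# Made-up bucket with the shape returned for a `stats` sub-aggregation.
sample_bucket = {
    "key": 2000,
    "doc_count": 2,
    "rating_stats": {"count": 2, "min": 5.0, "max": 7.0, "avg": 6.0, "sum": 12.0},
}
print(format_bucket(sample_bucket))
```

With a live cluster, `stats_by_year_body()` would be sent with `es.search` as in the cell above, and `format_bucket` applied to each entry of `response['aggregations']['ratings_by_year']['buckets']`.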




* 6- Give the average rank of the films per year, sorted in descending order.

In [15]:
body = {
    "size": 0,
    "aggs": {
        "rank_by_year": {
            "terms": {
                "field": "fields.year",
                "size": 10,
                "order": {"avg_rank": "desc"}
            },
            "aggs": {
                "avg_rank": {"avg": {"field": "fields.rank"}}
            }
        }
    }
}

response = es.search(index="movies2", body=body)

for bucket in response['aggregations']['rank_by_year']['buckets']:
    print(f"Year: {bucket['key']}, Average rank: {bucket['avg_rank']['value']:.2f}")

Year: 1920, Average rank: 4950.00
Year: 1921, Average rank: 4925.00
Year: 2018, Average rank: 4896.00
Year: 1932, Average rank: 4243.00
Year: 1948, Average rank: 3786.00
Year: 1930, Average rank: 3784.00
Year: 1931, Average rank: 3658.00
Year: 1957, Average rank: 3657.77
Year: 1925, Average rank: 3564.50
Year: 1933, Average rank: 3551.00




* 7- Count the number of films per rating bracket (0-1.9, 2-3.9, 4-5.9...).

Hint: group_range.


In [None]:
body = {
    "size": 0,
    "aggs": {
        "rating_ranges": {
            "range": {
                "field": "fields.rating",
                # `from` is inclusive and `to` is exclusive, so the bounds must be
                # contiguous to avoid dropping ratings such as 1.9 or 3.9.
                "ranges": [
                    {"from": 0, "to": 2},
                    {"from": 2, "to": 4},
                    {"from": 4, "to": 6},
                    {"from": 6, "to": 8},
                    {"from": 8}
                ]
            }
        }
    }
}

response = es.search(index="movies2", body=body)

for bucket in response['aggregations']['rating_ranges']['buckets']:
    print(f"Rating from {bucket['from']} to {bucket.get('to', 'max')}: quantity of films: {bucket['doc_count']}")
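
For fixed-width brackets, the `histogram` aggregation is another option. Elasticsearch assigns each value to the bucket whose key is `floor(value / interval) * interval` (with the default offset of 0). The sketch below builds such a request and reproduces that key computation in plain Python; the aggregation name `rating_histogram` and the helper `bucket_key` are assumptions made for this example.

```python
import math

# Request body: one bucket per interval of width 2 on the rating field.
histogram_body = {
    "size": 0,
    "aggs": {
        "rating_histogram": {
            "histogram": {"field": "fields.rating", "interval": 2}
        }
    },
}


def bucket_key(value, interval=2):
    """Lower bound of the histogram bucket a value falls into (offset 0)."""
    return math.floor(value / interval) * interval


for rating in (1.9, 2.0, 7.9, 10.0):
    print(rating, "->", bucket_key(rating))
```

Note that a rating of exactly 10 lands in its own bucket (key 10) with this interval, whereas an open-ended last range keeps it with the 8+ bracket.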

* 8- Group the films by genre and give the number of occurrences of each genre:

Hint: a mapping of the properties may be necessary.
 

In [None]:
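body = {
    "size": 0,
    "aggs": {
        "movies_by_genre": {
            "terms": {
                # Assumes fields.genres is mapped as keyword; with the default
                # dynamic mapping (text + .keyword sub-field), use "fields.genres.keyword".
                "field": "fields.genres",
                "size": 20
            }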
        }
    }
}

response = es.search(index="movies2", body=body)

print("Quantity of movies by genre:")
for bucket in response['aggregations']['movies_by_genre']['buckets']:
    print(f"Genre: {bucket['key']}, quantity: {bucket['doc_count']}")
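
The hint about mappings can be illustrated as follows. A `terms` aggregation needs a field that is not analyzed, typically of type `keyword`, and the type of an existing field cannot be changed in place. The sketch below therefore shows a mapping for a new index; the index name `movies_keyword` is hypothetical, and the existing documents would still have to be reindexed into it.

```python
# Hypothetical mapping for a new index in which genres are stored as keywords,
# so that they can be used directly in a terms aggregation.
genres_keyword_mapping = {
    "properties": {
        "fields": {
            "properties": {
                "genres": {"type": "keyword"},
            }
        }
    }
}

# Assumption: with an 8.x Python client the index could be created with
#   es.indices.create(index="movies_keyword", mappings=genres_keyword_mapping)
# before reindexing the documents from movies2.
print(genres_keyword_mapping["properties"]["fields"]["properties"]["genres"])
```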

### Bonus

   * Give the number of occurrences of each director.

   * Give the average rating and the minimum and maximum rank of the films per actor.

   * Number of distinct directors for adventure films.

   * Most frequently used terms (aggregation: significant_terms) in the descriptions of George Lucas's films.


* Give the average rating per genre,
* Give the minimum, maximum and average rating for each genre,
* Give the average rank of the films per year, sorted in ascending order,
* Give the average rank and the average rating of the films for each director, sorted by average rating in descending order,
* Give the occurrences of the terms extracted from the title of each film.
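
As a starting point for the "distinct directors" exercise, one possible approach is a `cardinality` aggregation restricted by a query on the genre. This is a sketch under assumptions: the sub-field `fields.directors.keyword`, the genre value `"Adventure"` and the helper names are not taken from the lab, and the sample response is made up. Keep in mind that `cardinality` returns an approximate count.

```python
def distinct_directors_body(genre="Adventure"):
    """Count distinct directors among films of one genre (hypothetical helper)."""
    return {
        "size": 0,
        "query": {"match": {"fields.genres": genre}},
        "aggs": {
            "distinct_directors": {
                # Assumption: a keyword sub-field exists for directors.
                "cardinality": {"field": "fields.directors.keyword"}
            }
        },
    }


def read_count(response):
    """Extract the cardinality value from a search response."""
    return response["aggregations"]["distinct_directors"]["value"]


# Made-up response fragment, only to show where the value sits.
sample_response = {"aggregations": {"distinct_directors": {"value": 3}}}
print(read_count(sample_response))
```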



## 3- Visualize the results with Kibana

Start Kibana: https://www.elastic.co/fr/downloads/kibana

Check that it has started via the URL: http://localhost:5601

Note: review the various parameters in the configuration file: config/kibana.yml

Load your movies index into Elasticsearch and build a Dashboard with 3 relevant charts of your choice.

## Useful resources:

* https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
* https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html

