Features described in this module are primarily designed for interactive usage, for instance in an `ipython shell <https://ipython.org/>`_, since one of their key features is intuitive navigation through auto-completion.
:func:`~pandagg.discovery.discover` lists all indices on a cluster that match a provided pattern:
>>> from elasticsearch import Elasticsearch
>>> from pandagg.discovery import discover
>>> client = Elasticsearch(hosts=['xxx'])
>>> indices = discover(client, index='mov*')
>>> indices
<Indices> ['movies', 'movies_fake']
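Matching an index pattern such as ``'mov*'`` behaves like shell-style wildcard matching. The sketch below mimics it locally with Python's standard ``fnmatch`` module on a hypothetical list of index names. It is only an approximation: ``discover`` presumably lets Elasticsearch resolve the pattern itself, and ``fnmatch`` also gives a special meaning to ``?`` and ``[...]``. The helper name ``match_indices`` is made up for illustration.

```python
from fnmatch import fnmatchcase


def match_indices(index_names, pattern):
    """Return the index names matching a wildcard pattern such as 'mov*'.

    Hypothetical helper: approximates index-pattern expansion with
    shell-style matching, where '*' matches any run of characters.
    """
    # fnmatchcase is case-sensitive, like Elasticsearch index names
    return sorted(name for name in index_names if fnmatchcase(name, pattern))


# Hypothetical index names on a cluster
names = ["movies", "movies_fake", "ratings", ".kibana"]
print(match_indices(names, "mov*"))  # ['movies', 'movies_fake']
```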
Each index is accessible as an attribute, with autocompletion:
>>> indices.movies
<Index 'movies'>
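Attribute access with autocompletion, as in ``indices.movies``, can be obtained in plain Python by resolving unknown attributes in ``__getattr__`` and listing the item names in ``__dir__``, since IPython largely relies on ``dir()`` to suggest attribute completions. The ``AttrContainer`` class below is a hypothetical minimal sketch of that technique, not pandagg's implementation; plain strings stand in for index objects.

```python
class AttrContainer:
    """Hypothetical container exposing its items as attributes."""

    def __init__(self, items):
        self._items = dict(items)

    def __getattr__(self, name):
        # Only called when regular attribute lookup fails
        try:
            return self._items[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Adding item names to dir() makes them auto-completable in IPython
        return sorted(set(super().__dir__()) | set(self._items))

    def __repr__(self):
        return "<AttrContainer> %s" % sorted(self._items)


indices = AttrContainer({"movies": "<Index 'movies'>", "movies_fake": "<Index 'movies_fake'>"})
print(indices.movies)  # <Index 'movies'>
print("movies_fake" in dir(indices))  # True
```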
An :class:`~pandagg.discovery.Index` exposes its settings, its (interactive) mapping, its aliases and its name:
>>> movies = indices.movies
>>> movies.settings
{'index': {'creation_date': '1591824202943',
  'number_of_shards': '1',
  'number_of_replicas': '1',
  'uuid': 'v6Amj9x1Sk-trBShI-188A',
  'version': {'created': '7070199'},
  'provided_name': 'movies'}}
>>> movies.mapping
<Mapping>
_
├── directors [Nested]
│   ├── director_id Keyword
│   ├── first_name Text
│   │   └── raw ~ Keyword
│   ├── full_name Text
│   │   └── raw ~ Keyword
│   ├── genres Keyword
│   └── last_name Text
│       └── raw ~ Keyword
├── genres Keyword
├── movie_id Keyword
├── name Text
│   └── raw ~ Keyword
├── nb_directors Integer
├── nb_roles Integer
├── rank Float
├── roles [Nested]
│   ├── actor_id Keyword
│   ├── first_name Text
│   │   └── raw ~ Keyword
│   ├── full_name Text
│   │   └── raw ~ Keyword
│   ├── gender Keyword
│   ├── last_name Text
│   │   └── raw ~ Keyword
│   └── role Keyword
└── year Integer
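The tree above is a text rendering of the raw mapping JSON, in which object and nested fields keep their sub-fields under ``properties`` while multi-fields (shown with ``~``) sit under ``fields``. The sketch below draws a similar tree from a hypothetical subset of the movies mapping; the helper names are made up, and pandagg's own rendering (for example its column alignment) differs.

```python
def label(field):
    """Short type label: '[Nested]' for nested fields, else the capitalized type."""
    field_type = field.get("type", "object")  # object fields may omit "type"
    return "[Nested]" if field_type == "nested" else field_type.capitalize()


def children_of(field):
    """Sub-fields of a field, as (name, definition, is_multi_field) tuples."""
    # Object and nested fields hold their sub-fields under "properties"...
    kids = [(n, f, False) for n, f in sorted(field.get("properties", {}).items())]
    # ...while multi-fields (e.g. a keyword "raw" copy of a text field) live under "fields"
    kids += [(n, f, True) for n, f in sorted(field.get("fields", {}).items())]
    return kids


def tree_lines(children, prefix=""):
    lines = []
    for i, (name, field, is_multi) in enumerate(children):
        is_last = i == len(children) - 1
        branch = "└── " if is_last else "├── "
        marker = "~ " if is_multi else ""
        lines.append(f"{prefix}{branch}{name} {marker}{label(field)}")
        # Descendants keep a vertical bar unless this node is the last child
        lines.extend(tree_lines(children_of(field), prefix + ("    " if is_last else "│   ")))
    return lines


def render_mapping(mapping):
    """Render a mapping body ({"properties": {...}}) as a text tree rooted at '_'."""
    return "\n".join(["_"] + tree_lines(children_of(mapping)))


mapping = {  # hypothetical subset of the movies mapping
    "properties": {
        "name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        "roles": {
            "type": "nested",
            "properties": {
                "gender": {"type": "keyword"},
                "first_name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            },
        },
        "year": {"type": "integer"},
    }
}
print(render_mapping(mapping))
```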
The ``mapping`` attribute of an :class:`~pandagg.discovery.Index` returns a :class:`~pandagg.interactive.mapping.IMapping` instance, which provides navigation with autocompletion to quickly explore a large mapping:
>>> movies.mapping.roles
<Mapping subpart: roles>
roles [Nested]
├── actor_id Keyword
├── first_name Text
│   └── raw ~ Keyword
├── full_name Text
│   └── raw ~ Keyword
├── gender Keyword
├── last_name Text
│   └── raw ~ Keyword
└── role Keyword
>>> movies.mapping.roles.first_name
<Mapping subpart: roles.first_name>
first_name Text
└── raw ~ Keyword
.. note::

    A navigable mapping can be obtained directly from the :class:`~pandagg.interactive.mapping.IMapping` class, without using the discovery module:

    >>> from pandagg.mapping import IMapping
    >>> from examples.imdb.load import mapping
    >>> m = IMapping(mapping)
    >>> m.roles.first_name
    <Mapping subpart: roles.first_name>
    first_name Text
    └── raw ~ Keyword
To get the complete field definition, just call it:
>>> movies.mapping.roles.first_name()
<Mapping Field first_name> of type text:
{
    "type": "text",
    "fields": {
        "raw": {
            "type": "keyword"
        }
    }
}
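Navigation and the call syntax can be sketched with a small wrapper around the raw mapping dict: ``__getattr__`` descends into ``properties`` or ``fields``, ``__dir__`` exposes the sub-field names for autocompletion, and ``__call__`` returns the complete definition. ``FieldNode`` is hypothetical and much simpler than :class:`~pandagg.interactive.mapping.IMapping`.

```python
import json


class FieldNode:
    """Hypothetical navigable view over a raw Elasticsearch mapping dict."""

    def __init__(self, definition, path=""):
        self._definition = definition
        self._path = path

    def _children(self):
        # Both object/nested sub-fields and multi-fields are navigable
        children = dict(self._definition.get("properties", {}))
        children.update(self._definition.get("fields", {}))
        return children

    def __getattr__(self, name):
        # Only reached when regular lookup fails; names starting with "_"
        # are treated as private in this sketch
        if name.startswith("_"):
            raise AttributeError(name)
        children = self._children()
        if name not in children:
            raise AttributeError(name)
        path = f"{self._path}.{name}" if self._path else name
        return FieldNode(children[name], path)

    def __dir__(self):
        # Listing sub-field names lets IPython auto-complete them
        return sorted(set(super().__dir__()) | set(self._children()))

    def __call__(self):
        # Calling a node returns its complete definition
        return self._definition

    def __repr__(self):
        return f"<FieldNode {self._path or '_'}>"


mapping = {
    "properties": {
        "roles": {
            "type": "nested",
            "properties": {
                "first_name": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            },
        }
    }
}
m = FieldNode(mapping)
print(m.roles.first_name)  # <FieldNode roles.first_name>
print(json.dumps(m.roles.first_name(), indent=4))
```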
An IMapping instance can be bound to an Elasticsearch client to quickly compute aggregations on mapping fields.
Suppose you have the following client:
>>> from elasticsearch import Elasticsearch
>>> client = Elasticsearch(hosts=['localhost:9200'])
The client can be bound at instantiation:
>>> movies = IMapping(mapping, client=client, index_name='movies')
Doing so generates an ``a`` attribute on mapping fields; this attribute lists all aggregations available for that field type (with autocompletion):
>>> movies.roles.gender.a.terms()
[('M', {'key': 'M', 'doc_count': 2296792}),
 ('F', {'key': 'F', 'doc_count': 1135174})]
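.. note::

    Nested clauses are automatically taken into account: although ``roles`` is a nested field, no explicit ``nested`` aggregation is needed to aggregate on ``roles.gender``.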
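In the Elasticsearch query DSL, a field of nested documents such as ``roles.gender`` can only be aggregated inside a ``nested`` aggregation, which is presumably the clause added for you. A hand-written equivalent of the request, and a parser producing the ``(key, bucket)`` pairs shown above, could look like the sketch below. The aggregation names ``nested`` and ``values`` and both helper names are assumptions, and the body pandagg actually sends may differ.

```python
def nested_terms_body(field, nested_path, size=10):
    """Search body computing a terms aggregation on a field of nested documents.

    The aggregation names "nested" and "values" are arbitrary choices for this sketch.
    """
    return {
        "size": 0,  # only aggregations are needed, no hits
        "aggs": {
            "nested": {
                # Fields of nested documents are only reachable inside a "nested" aggregation
                "nested": {"path": nested_path},
                "aggs": {"values": {"terms": {"field": field, "size": size}}},
            }
        },
    }


def terms_buckets(response):
    """Extract (key, bucket) pairs from a response to nested_terms_body."""
    buckets = response["aggregations"]["nested"]["values"]["buckets"]
    return [(bucket["key"], bucket) for bucket in buckets]


body = nested_terms_body("roles.gender", "roles")
# e.g. raw = client.search(index="movies", body=body)  (elasticsearch-py, not run here)
raw = {  # trimmed, hypothetical response reusing the counts shown above
    "aggregations": {
        "nested": {
            "values": {
                "buckets": [
                    {"key": "M", "doc_count": 2296792},
                    {"key": "F", "doc_count": 1135174},
                ]
            }
        }
    }
}
print(terms_buckets(raw))
```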
When executing a :class:`~pandagg.search.Search` request with aggregations, the resulting aggregations can be parsed in multiple formats, as described in :doc:`user-guide.response`.
Suppose we execute the following search request:
>>> from elasticsearch import Elasticsearch
>>> from pandagg.search import Search
>>> client = Elasticsearch(hosts=['localhost:9200'])
>>> response = Search(using=client, index='movies')\
...     .size(0)\
...     .groupby('decade', 'histogram', interval=10, field='year')\
...     .groupby('genres', size=3)\
...     .aggs('avg_rank', 'avg', field='rank')\
...     .aggs('avg_nb_roles', 'avg', field='nb_roles')\
...     .filter('range', year={"gte": 1990})\
...     .execute()
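For readers familiar with the Elasticsearch query DSL, the chained call above roughly corresponds to the hand-written request body below: each ``groupby`` nests the following aggregations under it, and ``filter`` becomes a ``bool`` filter clause. This is an approximation; the body pandagg generates may differ in details. The ``agg_paths`` helper is hypothetical and only lists how the aggregations are nested.

```python
body = {
    "size": 0,
    "query": {"bool": {"filter": [{"range": {"year": {"gte": 1990}}}]}},
    "aggs": {
        "decade": {
            "histogram": {"field": "year", "interval": 10},
            "aggs": {
                "genres": {
                    "terms": {"field": "genres", "size": 3},
                    # Metrics are computed inside each (decade, genre) bucket
                    "aggs": {
                        "avg_rank": {"avg": {"field": "rank"}},
                        "avg_nb_roles": {"avg": {"field": "nb_roles"}},
                    },
                }
            },
        }
    },
}


def agg_paths(aggs, prefix=""):
    """Dotted paths of all aggregations, parents before children."""
    paths = []
    for name, agg in aggs.items():
        paths.append(prefix + name)
        paths.extend(agg_paths(agg.get("aggs", {}), prefix + name + "."))
    return paths


# e.g. client.search(index="movies", body=body)  (elasticsearch-py, not run here)
print(agg_paths(body["aggs"]))
```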
One of the available serialization methods for aggregations, :func:`~pandagg.response.Aggregations.to_interactive_tree`, generates an interactive tree of class :class:`~pandagg.interactive.response.IResponse`:
>>> tree = response.aggregations.to_interactive_tree()
>>> tree
<IResponse>
root
├── decade=1990 79495
│   ├── genres=Documentary 8393
│   │   ├── avg_nb_roles 3.7789824854045038
│   │   └── avg_rank 6.517093241977517
│   ├── genres=Drama 12232
│   │   ├── avg_nb_roles 18.518067364290385
│   │   └── avg_rank 5.981429367965072
│   └── genres=Short 12197
│       ├── avg_nb_roles 3.023284414200213
│       └── avg_rank 6.311325829450123
└── decade=2000 57649
    ├── genres=Documentary 8639
    │   ├── avg_nb_roles 5.581433036231045
    │   └── avg_rank 6.980897812811443
    ├── genres=Drama 11500
    │   ├── avg_nb_roles 14.385391304347825
    │   └── avg_rank 6.269675415719865
    └── genres=Short 13451
        ├── avg_nb_roles 4.053081555274701
        └── avg_rank 6.83625304327684
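Each level of the interactive tree corresponds to one level of nested buckets in the raw aggregation response. As an illustration, the hypothetical ``flatten`` helper below walks a trimmed response shaped like Elasticsearch's (reusing values from the tree above) and turns each innermost bucket into a row of keys, document count and metric values; it is not pandagg's code.

```python
def flatten(aggs, group_names, path=()):
    """Turn nested bucket aggregations into (keys, doc_count, metrics) rows.

    group_names: bucket aggregation names, outermost first.
    """
    name, *rest = group_names
    rows = []
    for bucket in aggs[name]["buckets"]:
        keys = path + ((name, bucket["key"]),)
        if rest:
            # Sub-aggregations are stored inside the parent bucket
            rows.extend(flatten(bucket, rest, keys))
        else:
            # Metric aggregations (avg, ...) are dicts holding a "value"
            metrics = {k: v["value"] for k, v in bucket.items() if isinstance(v, dict) and "value" in v}
            rows.append((dict(keys), bucket["doc_count"], metrics))
    return rows


aggregations = {  # trimmed, hypothetical raw "aggregations" section
    "decade": {
        "buckets": [
            {
                "key": 1990.0,
                "doc_count": 79495,
                "genres": {
                    "buckets": [
                        {"key": "Documentary", "doc_count": 8393,
                         "avg_nb_roles": {"value": 3.7789824854045038},
                         "avg_rank": {"value": 6.517093241977517}},
                        {"key": "Drama", "doc_count": 12232,
                         "avg_nb_roles": {"value": 18.518067364290385},
                         "avg_rank": {"value": 5.981429367965072}},
                    ]
                },
            }
        ]
    }
}
for row in flatten(aggregations, ["decade", "genres"]):
    print(row)
```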
This tree provides auto-completion on each node to select a subpart of the tree:
>>> tree.decade_1990
<IResponse subpart: decade_1990>
decade=1990 79495
├── genres=Documentary 8393
│   ├── avg_nb_roles 3.7789824854045038
│   └── avg_rank 6.517093241977517
├── genres=Drama 12232
│   ├── avg_nb_roles 18.518067364290385
│   └── avg_rank 5.981429367965072
└── genres=Short 12197
    ├── avg_nb_roles 3.023284414200213
    └── avg_rank 6.311325829450123
>>> tree.decade_1990.genres_Drama
<IResponse subpart: decade_1990.genres_Drama>
genres=Drama 12232
├── avg_nb_roles 18.518067364290385
└── avg_rank 5.981429367965072
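Node labels such as ``decade=1990`` are not valid Python identifiers, so the attribute names replace the ``=`` with ``_`` (``decade_1990``). The exact rule pandagg applies is not documented here; the sketch below assumes a simple one, replacing every non-word character with an underscore and prefixing names that start with a digit or clash with a Python keyword. ``attr_name`` is a hypothetical helper.

```python
import keyword
import re


def attr_name(label):
    """Map a node label such as 'decade=1990' to a valid Python attribute name."""
    # Replace every character that cannot appear in an identifier ('=', '-', ' ', ...)
    name = re.sub(r"\W", "_", label)
    # Identifiers cannot start with a digit or be a reserved keyword
    if not name.isidentifier() or keyword.iskeyword(name):
        name = "_" + name
    return name


print(attr_name("decade=1990"), attr_name("genres=Drama"))  # decade_1990 genres_Drama
```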
:func:`~pandagg.interactive.response.IResponse.get_bucket_filter` returns the query that filters documents belonging to the given bucket:
>>> tree.decade_1990.genres_Drama.get_bucket_filter()
{'bool': {
    'must': [
        {'term': {'genres': {'value': 'Drama'}}},
        {'range': {'year': {'gte': 1990.0, 'lt': 2000.0}}}
    ],
    'filter': [{'range': {'year': {'gte': 1990}}}]
}}
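The query above combines one clause per enclosing bucket with the filters of the original search: a ``terms`` bucket becomes a ``term`` query, and a ``histogram`` bucket with key ``k`` becomes the range ``[k, k + interval)``. The hypothetical ``bucket_filter`` function below reproduces this query under those assumptions (``terms`` and ``histogram`` buckets only, no histogram ``offset``); it is a sketch, not pandagg's implementation.

```python
def bucket_filter(bucket_path, search_filters):
    """Bool query selecting the documents of one bucket.

    bucket_path: (agg_type, field, key, params) tuples, outermost bucket first.
    search_filters: filter clauses of the original search, kept as-is.
    """
    must = []
    # Innermost bucket first, to match the ordering printed above
    # (clause order does not change which documents match)
    for agg_type, field, key, params in reversed(bucket_path):
        if agg_type == "terms":
            must.append({"term": {field: {"value": key}}})
        elif agg_type == "histogram":
            # A histogram bucket with key k covers [k, k + interval), assuming no offset
            must.append({"range": {field: {"gte": key, "lt": key + params["interval"]}}})
        else:
            raise NotImplementedError(f"no bucket filter for {agg_type!r} in this sketch")
    return {"bool": {"must": must, "filter": list(search_filters)}}


query = bucket_filter(
    [("histogram", "year", 1990.0, {"interval": 10}), ("terms", "genres", "Drama", {})],
    [{"range": {"year": {"gte": 1990}}}],
)
print(query)
```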
:func:`~pandagg.interactive.response.IResponse.list_documents` executes this query to list the documents belonging to the bucket:
>>> tree.decade_1990.genres_Drama.list_documents(size=2, _source={"include": ['name']})
{'took': 10,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 10000, 'relation': 'gte'},
  'max_score': 2.4539857,
  'hits': [{'_index': 'movies',
    '_type': '_doc',
    '_id': '706',
    '_score': 2.4539857,
    '_source': {'name': '100 meter fri'}},
   {'_index': 'movies',
    '_type': '_doc',
    '_id': '714',
    '_score': 2.4539857,
    '_source': {'name': '100 Proof'}}]}}
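Listing documents amounts to sending the bucket filter as the query of a regular search request. The sketch below builds such a body and extracts the ``_source`` of each hit from a trimmed copy of the response above; ``documents_body`` and ``hit_sources`` are hypothetical helpers, and the ``client.search`` call is only shown as a comment.

```python
def documents_body(bucket_query, size=10, source=None):
    """Search body listing the documents matched by a bucket filter."""
    body = {"query": bucket_query, "size": size}
    if source is not None:
        body["_source"] = source  # restrict the returned fields
    return body


def hit_sources(response):
    """The _source of each returned hit."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


bucket_query = {"bool": {"must": [{"term": {"genres": {"value": "Drama"}}}]}}  # simplified
body = documents_body(bucket_query, size=2, source=["name"])
# e.g. response = client.search(index="movies", body=body)  (elasticsearch-py, not run here)
response = {  # trimmed copy of the response printed above
    "hits": {
        "total": {"value": 10000, "relation": "gte"},
        "hits": [
            {"_id": "706", "_source": {"name": "100 meter fri"}},
            {"_id": "714", "_source": {"name": "100 Proof"}},
        ],
    }
}
print(hit_sources(response))  # [{'name': '100 meter fri'}, {'name': '100 Proof'}]
```

Note that ``'total': {'value': 10000, 'relation': 'gte'}`` does not mean the bucket holds exactly 10,000 documents: by default Elasticsearch 7 stops counting hits at 10,000 (see the ``track_total_hits`` search parameter), while the ``genres=Drama`` bucket of the 1990s counts 12,232 documents.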