Updated documentation

commit 7532c97d92eddb2d3526ab72188121d808201655 1 parent c872c40
Alberto Paro authored
7 AUTHORS
@@ -1,10 +1,11 @@
-Origin based on a pyelasticsearch of Robert Eanes and Matt Dennewitz
-
Ordered by date of first contribution:
Alberto Paro <alberto.paro@gmail.com>
+ George Sakkis
sandymahalo
andrei
- Tavis Aitken
+ Tavis Aitken
Richard Boulton
matterkkila
Matt Chu <matt.chu@gmail.com>
+
+Origin based on a pyelasticsearch of Robert Eanes and Matt Dennewitz
398 Changelog
@@ -1,209 +1,379 @@
-Changelog
-=========
-v. 0.19.0:
-
- Use default_indices instead of hardcoding ['_all'] (gsakkis)
-
- Complete rewrite of connection_http (gsakkis)
-
- Don't collect info on creation of ES object (patricksmith)
-
- Add interval to histogram facet. (vrachil)
+================
+ Change history
+================
- Improved connection string construction and added more flexibility. (ferhatsb)
+.. contents::
- Fixed pickling DotDict.
+.. _version-0.19.1:
- Fixed a bug in Decoder.
+0.19.1
+======
- Added execution to TermsFilter. Fixed missing _name attribute in serialized object
+News
+----
- Added _cache and _cache_key parameters to filters.
+- Created managers to group API actions as ElasticSearch does.
- Added scope, filter and global parameters to facets. closes #119
-
- Use a single global ConnectionPool instead of initializing it on every execute call. (gsakkis)
+  This allows simplifying the ES object and moving grouped functionality into managers. We are following the ElasticSearch
+  grouping of actions. For now we are adding:
- Allow partial_fields to be passed in the Search class. (tehmaze)
-
- Propagated parameters to bulker.
+ - Indices Manager: to manage index operations
- Support params for analyze. (akheron)
+ - Cluster Manager: to manage cluster operations
- Added LimitFilter.
+- Renamed field_name to name in ScriptFields
- Fixed support for query as dict in Search object.
+- Got docs building on readthedocs.org (Wraithan - Chris McDonald)
- Added ListBulker implementation and create_bulker method.
+- Added model and scan to search.
- Moved imports to absolute ones.
+  So one can pass a custom object to be created.
- Removed inused urllib3 files and added timeout to connection_http.
+- Added document exists call, to check if a document exists.
- Add NotFilter as facet filter (junckritter)
+Deprecated
+----------
- Add terms facet filter
+Using managers, many ES methods have been refactored into the manager objects. The moved methods are listed below (a short usage sketch follows the list):
-v. 0.18.7-rc1:
+- .aliases -> .indices.aliases
- Tested against 0.18.7, with all tests passing
+- .status -> .indices.status
- Added support for index_stats
+- .create_index -> .indices.create_index
-v. 0.17.0:
+- .create_index_if_missing -> .indices.create_index_if_missing
- API BREAKING: Added new searcher iterator API. (To use the old code rename ".search" in ".search_raw")
+- .delete_index -> .indices.delete_index
- API BREAKING: renamed indexes in indices. To be complaint to ES documentation.
+- .exists_index -> .indices.exists_index
- Tests refactory.
-
- Add model object to objetify a dict.
-
-v. 0.16.0:
+- .delete_index_if_exists -> .indices.delete_index_if_exists
- Updated documentation.
+- .get_indices -> .indices.get_indices
- Added TextQuery and some clean up of code.
+- .get_closed_indices -> .indices.get_closed_indices
- Added percolator (matterkkila).
+- .get_alias -> .indices.get_alias
- Added date_histogram facet (zebuline).
+- .change_aliases -> .indices.change_aliases
- Added script fields to Search object, also add "fields" to TermFacet (aguereca).
+- .add_alias -> .indices.add_alias
- Added analyze_wildcard param to StringQuery (available for ES 0.16.0) (zebuline).
+- .delete_alias -> .indices.delete_alias
- Add ScriptFields object used as parameter script_fields of Search object (aguereca).
+- .set_alias -> .indices.set_alias
- Add IdsQuery, IdsFilter and delete_by_query (aguereca).
+- .close_index -> .indices.close_index
- Bulk delete (acdha).
+- .open_index -> .indices.open_index
-v. 0.15.0:
+- .flush -> .indices.flush
- Only require simplejson for python < 2.6 (matterkkila)
+- .refresh -> .indices.refresh
- Added basic version support to ES.index and Search (merrellb)
+- .optimize -> .indices.optimize
- Added scan method to ES. This is only supported on ES Master (pre 0.16) (merrellb)
+- .analyze -> .indices.analyze
- Added GeoPointField to mapping types (merrellb)
+- .gateway_snapshot -> .indices.gateway_snapshot
- Disable thrift in setup.py.
+- .put_mapping -> .indices.put_mapping
- Added missing _routing property in ObjectField
+- .get_mapping -> .indices.get_mapping
- Added ExistsFilter
+- .cluster_health -> .cluster.cluster_health
- Improved HasChildren
+- .cluster_state -> .cluster.state
- Add min_similarity and prefix_length to flt.
+- .cluster_nodes -> .cluster.nodes_info
- Added _scope to HasChildQuery. (andreiz)
+- .cluster_stats -> .cluster.node_stats
- Added parent/child document in test indexing. Added _scope to HasChildFilter.
+- .index_stats -> .indices.stats
- Added MissingFilter as a subclass of TermFilter
+- .delete_mapping -> .indices.delete_mapping
- Fixed error in checking TermsQuery (merrellb)
+- .get_settings -> .indices.get_settings
- If an analyzer is set on a field, the returned mapping will have an analyzer
+- .update_settings -> .indices.update_settings
- Add a specific error subtype for mapper parsing exceptions (rboulton)
- Add support for Float numeric field mappings (rboulton)
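For illustration, a minimal before/after sketch of the manager-based calls (an editorial example, not part of the commit; it assumes a local node at the default HTTP port and uses only methods from the mapping above):

.. code-block:: python

    >>> import pyes
    >>> conn = pyes.ES('127.0.0.1:9200')
    >>> # deprecated, pre-0.19.1 style
    >>> conn.create_index_if_missing("test-index")
    >>> conn.refresh(["test-index"])
    >>> # new, manager-based style
    >>> conn.indices.create_index_if_missing("test-index")
    >>> conn.indices.refresh(["test-index"])
    >>> conn.cluster.cluster_health()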
+Fixes
+-----
- ES.get() now accepts "fields" as well as other keyword arguments (eg "routing") (rboulton)
+- Fixed ResultSet slicing.
- Allow dump_curl to be passed a filehandle (or still a filename), don't for filenames to be in /tmp, and add a basic test of it.
+- Moved tests outside pyes code dir. Update references. Upgraded test elasticsearch to 0.19.9.
- Add alias handling (rboulton)
+- Added documentation links.
- Add ElasticSearchIllegalArgumentException - used for example when writing to an alias which refers to more than one index. (rboulton)
+- Renamed scroll_timeout to scroll.
- Handle errors produced by deleting a missing document, and add a test for it. (rboulton)
+- Renamed field_name to name in ScriptFields.
- Split Query object into a Search object, for the search specific parts, and a Query base class. Allow ES.search() to take a query or a search object. Make some of the methods of Query base classes chainable, where that is an obviously reasonable thing to do. (rboulton)
+- Added routing to delete document call.
-v. 0.14.0: Added delete of mapping type.
+- Removed minimum_number_should_match parameter. It is not supported by ElasticSearch and causes errors when using a BoolFilter. (Jernej Kos)
- Embedded urllib3 to be buildout safe and for users sake.
+- Improved the speed of JSON conversion of datetime values.
- Some code cleanup.
+- Added boost argument to TextQuery. (Jernej Kos)
- Added reindex by query (usable only with my elasticsearch git branch).
+- Go back to urllib3 instead of requests. (gsakkis)
- Added contrib with mailman indexing.
+- Enhance Twitter River class. (thanks @dendright)
- Autodetect if django is available and added related functions.
+- Add OAuth authentication and filtering abilities to Twitter River. (Jack Riches)
- Code cleanup and PEP8.
+- HasChildFilter expects a Query. (gsakkis)
- Reactivated the morelikethis query.
+- Fixed _parent being pulled from _meta rather than the instance itself. (merrellb)
- Fixed river support plus unittest. (Tavis Aitken)
+- Add support of all_terms to TermFacet. (mouad)
- Added autorefresh to sync search and write.
- Added QueryFilter.
+0.19.0
+======
- Forced name attribute in multifield declaration.
- Added is_empty to ConstantScoreQuery and fixed some bad behaviour.
+- Use default_indices instead of hardcoding ['_all'] (gsakkis)
- Added CustomScoreQuery.
+- Complete rewrite of connection_http (gsakkis)
- Added parent/children indexing.
+- Don't collect info on creation of ES object (patricksmith)
- Added dump commands in a script file "curl" way.
+- Add interval to histogram facet. (vrachil)
- Added a lot of fix from Richard Boulton.
+- Improved connection string construction and added more flexibility. (ferhatsb)
-v. 0.13.1: Added jython support (HTTP only for now).
+- Fixed pickling DotDict.
-v. 0.13.0: API Changes: errors -> exceptions.
+- Fixed a bug in Decoder.
- Splitting of query/filters.
+- Added execution to TermsFilter. Fixed missing _name attribute in serialized object
- Added open/close of index.
+- Added _cache and _cache_key parameters to filters.
- Added the number of retries if server is down.
+- Added scope, filter and global parameters to facets. closes #119
- Refactory Range query. (Andrei)
+- Use a single global ConnectionPool instead of initializing it on every execute call. (gsakkis)
- Improved HTTP connection timeout/retries. (Sandymahalo)
+- Allow partial_fields to be passed in the Search class. (tehmaze)
- Cleanup some imports. (Sandymahalo)
+- Propagated parameters to bulker.
-v. 0.12.1: Added collecting server info.
+- Support params for analyze. (akheron)
- Version 0.12 or above requirement.
+- Added LimitFilter.
- Fixed attachment plugin.
+- Fixed support for query as dict in Search object.
- Updated bulk insert to use new api.
+- Added ListBulker implementation and create_bulker method.
- Added facet support (except geotypes).
+- Moved imports to absolute ones.
- Added river support.
+- Removed unused urllib3 files and added timeout to connection_http.
- Cleanup some method.
+- Add NotFilter as facet filter (junckritter)
- Added default_indexes variable.
+- Add terms facet filter
- Added datetime deserialization.
+0.18.7-rc1
+==========
- Improved performance and memory usage in bulk insert replacing list with StringIO.
- Initial propagation of elasticsearch exception to python.
+- Tested against 0.18.7, with all tests passing
-v. 0.12.0: added http transport, added autodetect of transport, updated thrift interface.
+- Added support for index_stats
-v. 0.10.3: added bulk insert, explain and facet.
+0.17.0
+======
-v. 0.10.2: added new geo query type.
+- API BREAKING: Added new searcher iterator API. (To use the old code, rename ".search" to ".search_raw")
-v. 0.10.1: added new connection pool system based on pycassa one.
+- API BREAKING: renamed indexes to indices, to be compliant with the ES documentation.
-v. 0.10.0: initial working version.
+- Tests refactory.
+
+- Add model object to objectify a dict.
+
+0.16.0
+======
+
+- Updated documentation.
+
+- Added TextQuery and some clean up of code.
+
+- Added percolator (matterkkila).
+
+- Added date_histogram facet (zebuline).
+
+- Added script fields to Search object, also add "fields" to TermFacet (aguereca).
+
+- Added analyze_wildcard param to StringQuery (available for ES 0.16.0) (zebuline).
+
+- Add ScriptFields object used as parameter script_fields of Search object (aguereca).
+
+- Add IdsQuery, IdsFilter and delete_by_query (aguereca).
+
+- Bulk delete (acdha).
+
+
+0.15.0
+======
+
+
+- Only require simplejson for python < 2.6 (matterkkila)
+
+- Added basic version support to ES.index and Search (merrellb)
+
+- Added scan method to ES. This is only supported on ES Master (pre 0.16) (merrellb)
+
+- Added GeoPointField to mapping types (merrellb)
+
+- Disable thrift in setup.py.
+
+- Added missing _routing property in ObjectField
+
+- Added ExistsFilter
+
+- Improved HasChildren
+
+- Add min_similarity and prefix_length to flt.
+
+- Added _scope to HasChildQuery. (andreiz)
+
+- Added parent/child document in test indexing. Added _scope to HasChildFilter.
+
+- Added MissingFilter as a subclass of TermFilter
+
+- Fixed error in checking TermsQuery (merrellb)
+
+- If an analyzer is set on a field, the returned mapping will have an analyzer
+
+- Add a specific error subtype for mapper parsing exceptions (rboulton)
+
+- Add support for Float numeric field mappings (rboulton)
+
+- ES.get() now accepts "fields" as well as other keyword arguments (eg "routing") (rboulton)
+
+- Allow dump_curl to be passed a filehandle (or still a filename), don't force filenames to be in /tmp, and add a basic test of it.
+
+- Add alias handling (rboulton)
+
+- Add ElasticSearchIllegalArgumentException - used for example when writing to an alias which refers to more than one index. (rboulton)
+
+- Handle errors produced by deleting a missing document, and add a test for it. (rboulton)
+
+- Split Query object into a Search object, for the search specific parts, and a Query base class. Allow ES.search() to take a query or a search object. Make some of the methods of Query base classes chainable, where that is an obviously reasonable thing to do. (rboulton)
+
+0.14.0
+======
+
+
+- Added delete of mapping type.
+
+- Embedded urllib3 to be buildout safe and for users sake.
+
+- Some code cleanup.
+
+- Added reindex by query (usable only with my elasticsearch git branch).
+
+- Added contrib with mailman indexing.
+
+- Autodetect if django is available and added related functions.
+
+- Code cleanup and PEP8.
+
+- Reactivated the morelikethis query.
+
+- Fixed river support plus unittest. (Tavis Aitken)
+
+- Added autorefresh to sync search and write.
+
+- Added QueryFilter.
+
+- Forced name attribute in multifield declaration.
+
+- Added is_empty to ConstantScoreQuery and fixed some bad behaviour.
+
+- Added CustomScoreQuery.
+
+- Added parent/children indexing.
+
+- Added dump commands in a script file "curl" way.
+
+- Added a lot of fix from Richard Boulton.
+
+0.13.1
+======
+
+- Added jython support (HTTP only for now).
+
+0.13.0
+======
+
+- API Changes: errors -> exceptions.
+
+- Splitting of query/filters.
+
+- Added open/close of index.
+
+- Added the number of retries if server is down.
+
+- Refactory Range query. (Andrei)
+
+- Improved HTTP connection timeout/retries. (Sandymahalo)
+
+- Cleanup some imports. (Sandymahalo)
+
+0.12.1
+======
+
+- Added collecting server info.
+
+- Version 0.12 or above requirement.
+
+- Fixed attachment plugin.
+
+- Updated bulk insert to use new api.
+
+- Added facet support (except geotypes).
+
+- Added river support.
+
+- Cleanup some method.
+
+- Added default_indexes variable.
+
+- Added datetime deserialization.
+
+- Improved performance and memory usage in bulk insert replacing list with StringIO.
+
+- Initial propagation of elasticsearch exception to python.
+
+0.12.0
+======
+
+- Added http transport, added autodetect of transport, updated thrift interface.
+
+0.10.3
+======
+
+- Added bulk insert, explain and facet.
+
+0.10.2
+======
+
+- Added new geo query type.
+
+0.10.1
+======
+
+- Added new connection pool system based on pycassa one.
+
+0.10.0
+======
+
+- Initial working version.
29 FAQ
@@ -1,3 +1,5 @@
+.. _faq:
+
============================
Frequently Asked Questions
============================
@@ -5,4 +7,29 @@
.. contents::
:local:
-TO be written
+.. _faq-general:
+
+General
+=======
+
+.. _faq-when-to-use:
+
+What connection type should I use?
+----------------------------------
+
+For general usage I suggest using an HTTP connection to your server.
+
+For faster performance, mainly when indexing, I suggest using thrift because its latency is lower.
+
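A rough sketch of both options (an editorial example; hosts and ports are placeholders, see :ref:`pyes-connections`):

.. code-block:: python

    >>> import pyes
    >>> http_conn = pyes.ES(("http", "127.0.0.1", "9200"))      # general usage
    >>> thrift_conn = pyes.ES(("thrift", "127.0.0.1", "9500"))  # lower latency, useful for heavy indexing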
+How can I return a plain dict from a ResultSet?
+------------------------------------------------
+
+ResultSet iterates over ElasticSearchModel objects by default. To change this behaviour you need to pass an object that
+receives a connection and a dict object.
+
+To return plain dict objects, pass a model parameter to the search call:
+
+.. code-block:: python
+
+ model=lambda x,y:y
+
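For example, a hypothetical call that yields plain dicts (an editorial sketch; the index name and query are placeholders):

.. code-block:: python

    >>> q = TermQuery("name", "joe")
    >>> results = conn.search(q, indices=["test-index"], model=lambda conn, doc: doc)
    >>> for r in results:
    ...     print type(r)  # each result is a plain dict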
52 README.rst
@@ -36,21 +36,61 @@ http://pyes.readthedocs.org/en/latest/
Changelog
=========
-v. 0.18.7-rc1:
+v. 0.19.1:
- Tested against 0.18.7, with all tests passing
+ Renamed field_name to name in ScriptFields
- Added support for index_stats
+ Fixed ResultSet slicing.
-v. 0.17.0:
+ Created managers to group API actions as ElasticSearch does.
+
+ Moved tests outside pyes code dir. Update references. Upgraded test elasticsearch to 0.19.9.
+
+ Added documentation links
+
+ Got docs building on readthedocs.org (Wraithan - Chris McDonald)
+
+ Renamed scroll_timeout to scroll
+
+ Moved FacetFactory include
+
+ Renamed field_name to name in ScriptFields
+
+ Using only thrift_connect to manage thrift existence
+
+ Added model and scan to query
+
+ Added exists document call
+
+ Added routing to delete
+
+ Removed minimum_number_should_match parameter. It is not supported by ElasticSearch and causes errors when using a BoolFilter. (Jernej Kos)
+
+ Improved the speed of JSON conversion of datetime values
+
+ Add boost argument to TextQuery
+
+ Added boost argument to TextQuery. (Jernej Kos)
+
+ Go back to urllib3 instead of requests. (gsakkis)
+
+ Enhance Twitter River class. (thanks @dendright)
+
+ Add OAuth authentication and filtering abilities to Twitter River. (Jack Riches)
+
+ HasChildFilter expects a Query. (gsakkis)
+
+ Fixed _parent being pulled from _meta rather than the instance itself. (merrellb)
+
+ Add support of all_terms to TermFacet. (mouad)
- API BREAKING: Added new searcher iterator API. (To use the old code rename ".search" in ".search_raw")
- Tests refactory.
TODO
----
+- add ORM to manage objects
+- much more documentation
- add coverage
- add jython native client protocol
350 docs/guide/appendix/glossary.rst
@@ -4,226 +4,210 @@
Glossary
========
-glossary:
--
- id: analysis
- text: >
- Analysis is the process of converting full text_ to terms_.
- Depending on which analyzer is used, these phrases: "**FOO BAR**",
- "**Foo-Bar**", "**foo,bar**" will probably all result in the terms "**foo**"
- and "**bar**". These terms are what is actually stored in the index.
-
-
- A full text query (not a term_ query) for "**FoO:bAR**" will
- also be analyzed to the terms "**foo**","**bar**" and will thus match
- the terms stored in the index.
-
-
- It is this process of analysis (both at index time and at search time)
- that allows elasticsearch to perform full text queries.
-
-
- Also see text_ and term_.
--
- id: cluster
- text: >
- A cluster consists of one or more nodes_ which share the same
- cluster name. Each cluster has a single master node which is
- chosen automatically by the cluster and which can be replaced if
- the current master node fails.
-
--
- id: document
- text: >
- A document is a JSON document which is stored in elasticsearch. It is
- like a row in a table in a relational database. Each document is
- stored in an index_ and has a type_
- and an id_.
+.. _glossary-analysis:
+analysis
+ Analysis is the process of converting full :ref:`text <glossary-text>` to :ref:`terms <glossary-term>`.
+ Depending on which analyzer is used, these phrases: "**FOO BAR**",
+ "**Foo-Bar**", "**foo,bar**" will probably all result in the terms "**foo**"
+ and "**bar**". These terms are what is actually stored in the index.
- A document is a JSON object (also known in other languages
- as a hash / hashmap / associative array) which contains zero or more
- fields_, or key-value pairs.
+ A full text query (not a :ref:`term <glossary-term>` query) for "**FoO:bAR**" will
+ also be analyzed to the terms "**foo**","**bar**" and will thus match
+ the terms stored in the index.
+ It is this process of analysis (both at index time and at search time)
+ that allows elasticsearch to perform full text queries.
- The original JSON document that is indexed will be stored in the
- **_source** field_, which is returned by default
- when getting or searching for a document.
+ Also see :ref:`text <glossary-text>` and :ref:`term <glossary-term>`.
--
- id: id
- text: >
- The ID of a document_ identifies a document. The
- **index/type/id** of a document must be unique. If no ID is provided,
- then it will be auto-generated. (also see routing_)
+.. _glossary-cluster:
--
- id: field
- text: >
- A document_ contains a list of fields, or key-value pairs.
- The value can be a simple (scalar) value (eg a string, integer, date),
- or a nested structure like an array or an object. A field is similar
- to a column in a table in a relational database.
+cluster
+ A cluster consists of one or more :ref:`nodes <glossary-node>` which share the same
+ cluster name. Each cluster has a single master node which is
+ chosen automatically by the cluster and which can be replaced if
+ the current master node fails.
+.. _glossary-document:
- The mapping_ for each field has a field 'type'
- (not to be confused with document type_) which indicates the
- type of data that can be stored in that field, eg
- **integer**, **string**, **object**.
- The mapping also allows you to define (amongst other things) how the
- value for a field should be analyzed.
+document
+ A document is a JSON document which is stored in elasticsearch. It is
+ like a row in a table in a relational database. Each document is
+ stored in an :ref:`index <glossary-index>` and has a :ref:`type <glossary-type>`
+ and an :ref:`id <glossary-id>`.
--
- id: index
- text: >
- An index is like a 'database' in a relational database. It has a
- mapping_ which defines multiple
- types_.
+ A document is a JSON object (also known in other languages
+ as a hash / hashmap / associative array) which contains zero or more
+ :ref:`fields <glossary-field>`, or key-value pairs.
+ The original JSON document that is indexed will be stored in the
+ **_source** :ref:`field <glossary-field>`, which is returned by default
+ when getting or searching for a document.
- An index is a logical namespace which maps to one or more
- primary shards_ and can have zero or more
- replica shards_.
+.. _glossary-id:
--
- id: mapping
- text: >
- A mapping is like a 'schema definition' in a relational database.
- Each index_ has a mapping, which defines each
- type_ within the index, plus a number of
- index-wide settings.
+id
+ The ID of a :ref:`document <glossary-document>` identifies a document. The
+ **index/type/id** of a document must be unique. If no ID is provided,
+ then it will be auto-generated. (also see :ref:`routing <glossary-routing>`)
+.. _glossary-field:
- A mapping can either be defined explicitly, or it will be generated
- automatically when a document is indexed.
--
- id: node
- text: >
- A node is a running instance of elasticsearch which belongs to a
- cluster_. Multiple nodes can be started on a single
- server for testing purposes, but usually you should have one node
- per server.
-
-
- At startup, a node will use unicast (or multicast, if specified)
- to discover an existing cluster with the same cluster name and will
- try to join that cluster.
+field
+ A :ref:`document <glossary-document>` contains a list of fields, or key-value pairs.
+ The value can be a simple (scalar) value (eg a string, integer, date),
+ or a nested structure like an array or an object. A field is similar
+ to a column in a table in a relational database.
--
- id: primary shard
- text: >
- Each document is stored in a single primary shard_. When you
- index a document, it is indexed first on the primary shard, then
- on all replicas_ of the primary shard.
+ The :ref:`mapping <glossary-mapping>` for each field has a field 'type'
+ (not to be confused with document :ref:`type <glossary-type>`) which indicates the
+ type of data that can be stored in that field, eg
+ **integer**, **string**, **object**.
+ The mapping also allows you to define (amongst other things) how the
+ value for a field should be analyzed.
+.. _glossary-index:
- By default, an index_ has 5 primary shards. You can specify fewer
- or more primary shards to scale the number of documents_
- that your index can handle.
-
-
- You cannot change the number of primary shards in an index, once the
- index is created.
-
-
- See also routing_
-
--
- id: replica shard
- text: >
- Each primary shard_ can have zero or more replicas.
- A replica is a copy of the primary shard, and has two purposes:
+index
+ An index is like a 'database' in a relational database. It has a
+ :ref:`mapping <glossary-mapping>` which defines multiple
+ :ref:`types <glossary-type>`.
- # increase failover: a replica shard can be promoted
- to a primary shard if the primary fails
+ An index is a logical namespace which maps to one or more
+ primary :ref:`shards <glossary-shard>` and can have zero or more
+ replica :ref:`shards <glossary-shard>`.
- # increase performance: get and search requests can be handled by
- primary or replica shards.
+.. _glossary-mapping:
+mapping
+ A mapping is like a 'schema definition' in a relational database.
+ Each :ref:`index <glossary-index>` has a mapping, which defines each
+ :ref:`type <glossary-type>` within the index, plus a number of
+ index-wide settings.
- By default, each primary shard has one replica, but the number
- of replicas can be changed dynamically on an existing index.
- A replica shard will never be started on the same node as its primary
- shard.
-
--
- id: routing
- text: >
- When you index a document, it is stored on a single
- primary shard_. That shard is chosen by hashing
- the **routing** value. By default, the **routing** value is derived
- from the ID of the document or, if the document has a specified
- parent document, from the ID of the parent document (to ensure
- that child and parent documents are stored on the same shard).
+ A mapping can either be defined explicitly, or it will be generated
+ automatically when a document is indexed.
+.. _glossary-node:
- This value can be overridden by specifying a **routing** value at index
- time, or a :ref:`routing field <es-guide-reference-mapping-routing-field>` in the mapping_.
+node
+ A node is a running instance of elasticsearch which belongs to a
+ :ref:`cluster <glossary-cluster>`. Multiple nodes can be started on a single
+ server for testing purposes, but usually you should have one node
+ per server.
--
- id: shard
- text: >
- A shard is a single Lucene instance. It is a low-level "worker" unit
- which is managed automatically by elasticsearch. An index
- is a logical namespace which points to primary_
- and replica_ shards.
+ At startup, a node will use unicast (or multicast, if specified)
+ to discover an existing cluster with the same cluster name and will
+ try to join that cluster.
+.. _glossary-primary-shard:
- Other than defining the number of primary and replica shards that
- an index should have, you never need to refer to shards directly.
- Instead, your code should deal only with an index.
+primary shard
+ Each document is stored in a single primary :ref:`shard <glossary-shard>`. When you
+ index a document, it is indexed first on the primary shard, then
+ on all :ref:`replicas <glossary-replica-shard>` of the primary shard.
+ By default, an :ref:`index <glossary-index>` has 5 primary shards. You can specify fewer
+ or more primary shards to scale the number of :ref:`documents <glossary-document>`
+ that your index can handle.
- Elasticsearch distributes shards amongst all nodes_ in
- the cluster_, and can be move shards automatically from
- one node to another in the case of node failure, or the addition
- of new nodes.
+ You cannot change the number of primary shards in an index, once the
+ index is created.
--
- id: source field
- text: >
- By default, the JSON document that you index will be stored in the
- **_source** field and will be returned by all get and search requests.
- This allows you access to the original object directly from search
- results, rather than requiring a second step to retrieve the object
- from an ID.
+ See also :ref:`routing <glossary-routing>`
- Note: the exact JSON string that you indexed will be returned to you,
- even if it contains invalid JSON. The contents of this field do not
- indicate anything about how the data in the object has been indexed.
--
- id: term
- text: >
- A term is an exact value that is indexed in elasticsearch. The terms
- **foo**, **Foo**, **FOO are NOT equivalent. Terms (ie exact values) can
- be searched for using 'term' queries.
+.. _glossary-replica-shard:
- See also text_ and analysis_.
--
- id: text
- text: >
- Text (or full text) is ordinary unstructured text, such as this
- paragraph. By default, text will by :ref:`analyzed <es-guide-appendix-analysis>` into
- terms_, which is what is actually stored in the index.
+replica shard
+ Each primary :ref:`shard <glossary-shard>` can have zero or more replicas.
+ A replica is a copy of the primary shard, and has two purposes:
+ # increase failover: a replica shard can be promoted
+ to a primary shard if the primary fails
- Text fields_ need to be analyzed at index time in order to
- be searchable as full text, and keywords in full text queries must
- be analyzed at search time to produce (and search for) the same
- terms that were generated at index time.
+ # increase performance: get and search requests can be handled by
+ primary or replica shards.
+ By default, each primary shard has one replica, but the number
+ of replicas can be changed dynamically on an existing index.
+ A replica shard will never be started on the same node as its primary
+ shard.
- See also term_ and analysis_.
--
- id: type
- text: >
- A type is like a 'table' in a relational database. Each type has
- a list of fields_ that can be specified for
- documents_ of that type. The
- mapping_ defines how each field in the document
- is analyzed.
+.. _glossary-routing:
+routing
+ When you index a document, it is stored on a single
+ primary :ref:`shard <glossary-shard>`. That shard is chosen by hashing
+ the **routing** value. By default, the **routing** value is derived
+ from the ID of the document or, if the document has a specified
+ parent document, from the ID of the parent document (to ensure
+ that child and parent documents are stored on the same shard).
+ This value can be overridden by specifying a **routing** value at index
+ time, or a :ref:`routing field <es-guide-reference-mapping-routing-field>` in the :ref:`mapping <glossary-mapping>`.
+.. _glossary-shard:
+
+shard
+ A shard is a single Lucene instance. It is a low-level "worker" unit
+ which is managed automatically by elasticsearch. An index
+ is a logical namespace which points to :ref:`primary <glossary-primary-shard>`
+ and :ref:`replica <glossary-replica-shard>` shards.
+
+ Other than defining the number of primary and replica shards that
+ an index should have, you never need to refer to shards directly.
+ Instead, your code should deal only with an index.
+
+ Elasticsearch distributes shards amongst all :ref:`nodes <glossary-node>` in
+ the :ref:`cluster <glossary-cluster>`, and can move shards automatically from
+ one node to another in the case of node failure, or the addition
+ of new nodes.
+
+.. _glossary-source-field:
+
+source field
+ By default, the JSON document that you index will be stored in the
+ **_source** field and will be returned by all get and search requests.
+ This allows you access to the original object directly from search
+ results, rather than requiring a second step to retrieve the object
+ from an ID.
+
+
+ Note: the exact JSON string that you indexed will be returned to you,
+ even if it contains invalid JSON. The contents of this field do not
+ indicate anything about how the data in the object has been indexed.
+
+.. _glossary-term:
+
+term
+ A term is an exact value that is indexed in elasticsearch. The terms
+ **foo**, **Foo**, **FOO** are NOT equivalent. Terms (ie exact values) can
+ be searched for using 'term' queries.
+
+ See also :ref:`text <glossary-text>` and :ref:`analysis <glossary-analysis>`.
+
+.. _glossary-text:
+
+text
+ Text (or full text) is ordinary unstructured text, such as this
+ paragraph. By default, text will be :ref:`analyzed <glossary-analysis>` into
+ :ref:`terms <glossary-term>`, which is what is actually stored in the index.
+
+ Text :ref:`fields <glossary-field>` need to be analyzed at index time in order to
+ be searchable as full text, and keywords in full text queries must
+ be analyzed at search time to produce (and search for) the same
+ terms that were generated at index time.
+
+ See also :ref:`term <glossary-term>` and :ref:`analysis <glossary-analysis>`.
+
+.. _glossary-type:
+
+type
+ A type is like a 'table' in a relational database. Each type has
+ a list of :ref:`fields <glossary-field>` that can be specified for
+ :ref:`documents <glossary-document>` of that type. The
+ :ref:`mapping <glossary-mapping>` defines how each field in the document
+ is analyzed.
1  docs/index.rst
@@ -14,6 +14,7 @@ Contents:
links
guide/reference/index
guide/appendix/index
+ guide/appendix/glossary
Indices and tables
6 docs/manual/connections.rst
@@ -1,3 +1,5 @@
+.. _pyes-connections:
+
Connections
===========
@@ -16,12 +18,16 @@ For thrift:
>>> conn = pyes.ES() # Defaults to connecting to the server at '127.0.0.1:9500'
>>> conn = pyes.ES(['127.0.0.1:9500'])
+ >>> conn = pyes.ES(("thrift", "127.0.0.1", "9500"))
+ >>> conn = pyes.ES([("thrift", "127.0.0.1", "9500"), ("thrift", "192.168.1.1", "9500"),])
For http:
.. code-block:: python
>>> conn = pyes.ES(['127.0.0.1:9200'])
+ >>> conn = pyes.ES(("http", "127.0.0.1","9200"))
+ >>> conn = pyes.ES([("http", "127.0.0.1", "9200"), ("http", "192.168.1.1", "8000"),])
Connections are robust to server failures. Upon a disconnection, it will attempt to connect to each server in the list in turn. If no server is available, it will raise a NoServerAvailable exception.
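A minimal sketch of handling that failure mode (an editorial example; the exception's import path is an assumption):

.. code-block:: python

    >>> from pyes.exceptions import NoServerAvailable  # assumed location of the exception
    >>> try:
    ...     conn = pyes.ES(['127.0.0.1:9200', '192.168.1.1:9200'])
    ...     results = conn.search(query=TermQuery("name", "joe"))
    ... except NoServerAvailable:
    ...     pass  # no server in the list could be reached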
4 docs/manual/index.rst
@@ -11,4 +11,6 @@
installation
usage
connections
- queries
+ models
+ queries
+ resultset
65 docs/manual/models.rst
@@ -0,0 +1,65 @@
+.. _pyes-models:
+
+Models
+======
+
+DotDict
+-------
+
+The DotDict is the base model used. It lets you access a dict's keys with dot notation.
+
+.. code-block:: python
+
+ >>> from copy import deepcopy
+ >>> dotdict = DotDict(foo="bar")
+ >>> dotdict2 = deepcopy(dotdict)
+ >>> dotdict2["foo"] = "baz"
+ >>> dotdict.foo == "bar"
+ True
+ >>> dotdict2.foo == "baz"
+ True
+
+ElasticSearchModel
+------------------
+
+It extends DotDict, adding methods for common operations.
+
+Every search returns ElasticSearchModel objects as results: iterating over a ResultSet yields ElasticSearchModel objects.
+
+You can create a new one with the factory method or obtain one via the search/get calls.
+
+.. code-block:: python
+
+ obj = self.conn.factory_object(self.index_name, self.document_type, {"name": "test", "val": 1})
+ assert obj.name=="test"
+
+You can change value via dot notation or dictionary.
+
+.. code-block:: python
+
+ obj.name = "aaa"
+ assert obj.name == "aaa"
+ assert obj.val == 1
+
+You can read or change ES metadata via the ._meta property or the get_meta() call.
+
+.. code-block:: python
+
+ assert obj._meta.id is None
+ obj._meta.id = "dasdas"
+ assert obj._meta.id == "dasdas"
+
+Remember that it works as a dict object.
+
+.. code-block:: python
+
+ assert sorted(obj.keys()) == ["name", "val"]
+
+You can save it.
+
+.. code-block:: python
+
+ obj.save()
+ obj.name = "test2"
+ obj.save()
+
+ reloaded = self.conn.get(self.index_name, self.document_type, obj._meta.id)
+ assert reloaded.name == "test2"
2  docs/manual/queries.rst
@@ -1,3 +1,5 @@
+.. _pyes-queries:
+
Queries
=======
40 docs/manual/resultset.rst
@@ -0,0 +1,40 @@
+.. _pyes-resultset:
+
+ResultSet
+=========
+
+This object is returned as the result of a query. It is lazy.
+
+.. code-block:: python
+
+ >>> resultset = self.conn.search(Search(MatchAllQuery(), size=20), self.index_name, self.document_type)
+
+It contains the matched records, limited by the requested size. This is very useful for pagination.
+
+.. code-block:: python
+
+ >>> len([p for p in resultset])
+ 20
+
+The total number of matched results is available in the total property.
+
+.. code-block:: python
+
+ >>> resultset.total
+ 1000
+
+You can slice it.
+
+.. code-block:: python
+
+ >>> resultset = self.conn.search(Search(MatchAllQuery(), size=10), self.index_name, self.document_type)
+ >>> len([p for p in resultset[:10]])
+ 10
+
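A hedged pagination sketch built on the slicing above (an editorial example; the page size is arbitrary and arbitrary start/stop slices are assumed to work):

.. code-block:: python

    >>> page, page_size = 2, 10
    >>> resultset = self.conn.search(Search(MatchAllQuery()), self.index_name, self.document_type)
    >>> current_page = resultset[page * page_size:(page + 1) * page_size]
    >>> len([p for p in current_page])
    10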
+Remember that, by default, all results are ElasticSearchModel objects.
+
+.. code-block:: python
+
+ >>> resultset[10].uuid
+ "11111"
+
38 docs/manual/usage.rst
@@ -1,12 +1,12 @@
Usage
=====
-Creating a connection:
+Creating a connection (see :ref:`pyes-connections` for more details):
.. code-block:: python
>>> from pyes import *
- >>> conn = ES('127.0.0.1:9200')
+ >>> conn = ES('127.0.0.1:9200') #for http
Deleting an index:
@@ -17,7 +17,7 @@ Deleting an index:
>>> except:
>>> pass
-(an exception is fored if the index is not present)
+(an exception is raised if the index is not present)
Create an index:
@@ -25,7 +25,7 @@ Create an index:
>>> conn.create_index("test-index")
-Creating a mapping:
+Creating a mapping via dictionary:
.. code-block:: python
@@ -52,6 +52,29 @@ Creating a mapping:
>>> 'type': u'string'}}
>>> conn.put_mapping("test-type", {'properties':mapping}, ["test-index"])
+Creating a mapping via objects:
+
+.. code-block:: python
+
+ >>> from pyes.mappings import *
+ >>> docmapping = DocumentObjectField(name=self.document_type)
+ >>> docmapping.add_property(
+ >>> StringField(name="parsedtext", store=True, term_vector="with_positions_offsets", index="analyzed"))
+ >>> docmapping.add_property(
+ >>> StringField(name="name", store=True, term_vector="with_positions_offsets", index="analyzed"))
+ >>> docmapping.add_property(
+ >>> StringField(name="title", store=True, term_vector="with_positions_offsets", index="analyzed"))
+ >>> docmapping.add_property(IntegerField(name="position", store=True))
+ >>> docmapping.add_property(StringField(name="uuid", store=True, index="not_analyzed"))
+ >>> nested_object = NestedObject(name="nested")
+ >>> nested_object.add_property(StringField(name="name", store=True))
+ >>> nested_object.add_property(StringField(name="value", store=True))
+ >>> nested_object.add_property(IntegerField(name="num", store=True))
+ >>> docmapping.add_property(nested_object)
+ >>> settings.add_mapping(docmapping)
+ >>> conn.ensure_index(self.index_name, settings)
+
+
Index some documents:
.. code-block:: python
@@ -63,15 +86,18 @@ Refresh an index:
.. code-block:: python
+ >>> conn.refresh("test-index")
>>> conn.refresh(["test-index"])
-Execute a query
+Execute a query (see :ref:`pyes-queries`):
.. code-block:: python
>>> q = TermQuery("name", "joe")
>>> results = conn.search(query = q)
+results is a ResultSet (see :ref:`pyes-resultset`); you can iterate over it. It caches some results and pages them. The default returned objects are ElasticSearchModel objects (see :ref:`pyes-models`).
+
Iterate on results:
.. code-block:: python
@@ -79,4 +105,4 @@ Iterate on results:
>>> for r in results:
>>> print r
-For more examples looks at the tests.
+The tests directory contains many more examples of the available functionality.
4 pyes/__init__.py
@@ -14,9 +14,9 @@
def is_stable_release():
- if len(VERSION) > 3 and isinstance(VERSION[3], basestring):
+ if len(VERSION) > 3:
return False
- return not VERSION[1] % 2
+ return True
def version_with_meta():
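Under the new check, any fourth element in VERSION marks a pre-release. A hypothetical illustration (the tuples below are made up, not taken from the package):

    VERSION = (0, 19, 1)         # is_stable_release() -> True
    VERSION = (0, 19, 1, 'dev')  # is_stable_release() -> False (fourth element present)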