Updated documentation

commit 7532c97d92eddb2d3526ab72188121d808201655 1 parent c872c40
Alberto Paro authored
7 AUTHORS
... ... @@ -1,10 +1,11 @@
1   -Origin based on a pyelasticsearch of Robert Eanes and Matt Dennewitz
2   -
3 1 Ordered by date of first contribution:
4 2 Alberto Paro <alberto.paro@gmail.com>
  3 + George Sakkis
5 4 sandymahalo
6 5 andrei
7   - Tavis Aitken
  6 + Tavis Aitken
8 7 Richard Boulton
9 8 matterkkila
10 9 Matt Chu <matt.chu@gmail.com>
  10 +
  11 +Origin based on a pyelasticsearch of Robert Eanes and Matt Dennewitz
398 Changelog
... ... @@ -1,209 +1,379 @@
1   -Changelog
2   -=========
3   -v. 0.19.0:
4   -
5   - Use default_indices instead of hardcoding ['_all'] (gsakkis)
6   -
7   - Complete rewrite of connection_http (gsakkis)
8   -
9   - Don't collect info on creation of ES object (patricksmith)
10   -
11   - Add interval to histogram facet. (vrachil)
  1 +================
  2 + Change history
  3 +================
12 4
13   - Improved connection string construction and added more flexibility. (ferhatsb)
  5 +.. contents::
14 6
15   - Fixed pickling DotDict.
  7 +.. _version-0.19.1:
16 8
17   - Fixed a bug in Decoder.
  9 +0.19.1
  10 +======
18 11
19   - Added execution to TermsFilter. Fixed missing _name attribute in serialized object
  12 +News
  13 +----
20 14
21   - Added _cache and _cache_key parameters to filters.
  15 +- Created Manager classes that group API actions the same way Elasticsearch does.
22 16
23   - Added scope, filter and global parameters to facets. closes #119
24   -
25   - Use a single global ConnectionPool instead of initializing it on every execute call. (gsakkis)
  17 +- This simplifies the ES object by moving grouped functionality into managers. We are following the ElasticSearch
  18 +  grouping of actions. For now we are adding:
26 19
27   - Allow partial_fields to be passed in the Search class. (tehmaze)
28   -
29   - Propagated parameters to bulker.
  20 + - Indices Manager: to manage index operations
30 21
31   - Support params for analyze. (akheron)
  22 + - Cluster Manager: to manage cluster operations
32 23
33   - Added LimitFilter.
  24 +- Renamed field_name to name in ScriptFields
34 25
35   - Fixed support for query as dict in Search object.
  26 +- Got docs building on readthedocs.org (Wraithan - Chris McDonald)
36 27
37   - Added ListBulker implementation and create_bulker method.
  28 +- Added model and scan to search.
38 29
39   - Moved imports to absolute ones.
  30 +- This lets callers pass a custom object type to be created.
40 31
41   - Removed inused urllib3 files and added timeout to connection_http.
  32 +- Added document exists call, to check if a document exists.
42 33
43   - Add NotFilter as facet filter (junckritter)
  34 +Deprecated
  35 +----------
44 36
45   - Add terms facet filter
  37 +With the managers in place, many ES methods have moved into them. This is the list of moved methods:
46 38
47   -v. 0.18.7-rc1:
  39 +- .aliases -> .indices.aliases
48 40
49   - Tested against 0.18.7, with all tests passing
  41 +- .status -> .indices.status
50 42
51   - Added support for index_stats
  43 +- .create_index -> .indices.create_index
52 44
53   -v. 0.17.0:
  45 +- .create_index_if_missing -> .indices.create_index_if_missing
54 46
55   - API BREAKING: Added new searcher iterator API. (To use the old code rename ".search" in ".search_raw")
  47 +- .delete_index -> .indices.delete_index
56 48
57   - API BREAKING: renamed indexes in indices. To be complaint to ES documentation.
  49 +- .exists_index -> .indices.exists_index
58 50
59   - Tests refactory.
60   -
61   - Add model object to objetify a dict.
62   -
63   -v. 0.16.0:
  51 +- .delete_index_if_exists -> .indices.delete_index_if_exists
64 52
65   - Updated documentation.
  53 +- .get_indices -> .indices.get_indices
66 54
67   - Added TextQuery and some clean up of code.
  55 +- .get_closed_indices -> .indices.get_closed_indices
68 56
69   - Added percolator (matterkkila).
  57 +- .get_alias -> .indices.get_alias
70 58
71   - Added date_histogram facet (zebuline).
  59 +- .change_aliases -> .indices.change_aliases
72 60
73   - Added script fields to Search object, also add "fields" to TermFacet (aguereca).
  61 +- .add_alias -> .indices.add_alias
74 62
75   - Added analyze_wildcard param to StringQuery (available for ES 0.16.0) (zebuline).
  63 +- .delete_alias -> .indices.delete_alias
76 64
77   - Add ScriptFields object used as parameter script_fields of Search object (aguereca).
  65 +- .set_alias -> .indices.set_alias
78 66
79   - Add IdsQuery, IdsFilter and delete_by_query (aguereca).
  67 +- .close_index -> .indices.close_index
80 68
81   - Bulk delete (acdha).
  69 +- .open_index -> .indices.open_index
82 70
83   -v. 0.15.0:
  71 +- .flush -> .indices.flush
84 72
85   - Only require simplejson for python < 2.6 (matterkkila)
  73 +- .refresh -> .indices.refresh
86 74
87   - Added basic version support to ES.index and Search (merrellb)
  75 +- .optimize -> .indices.optimize
88 76
89   - Added scan method to ES. This is only supported on ES Master (pre 0.16) (merrellb)
  77 +- .analyze -> .indices.analyze
90 78
91   - Added GeoPointField to mapping types (merrellb)
  79 +- .gateway_snapshot -> .indices.gateway_snapshot
92 80
93   - Disable thrift in setup.py.
  81 +- .put_mapping -> .indices.put_mapping
94 82
95   - Added missing _routing property in ObjectField
  83 +- .get_mapping -> .indices.get_mapping
96 84
97   - Added ExistsFilter
  85 +- .cluster_health -> .cluster.cluster_health
98 86
99   - Improved HasChildren
  87 +- .cluster_state -> .cluster.state
100 88
101   - Add min_similarity and prefix_length to flt.
  89 +- .cluster_nodes -> .cluster.nodes_info
102 90
103   - Added _scope to HasChildQuery. (andreiz)
  91 +- .cluster_stats -> .cluster.node_stats
104 92
105   - Added parent/child document in test indexing. Added _scope to HasChildFilter.
  93 +- .index_stats -> .indices.stats
106 94
107   - Added MissingFilter as a subclass of TermFilter
  95 +- .delete_mapping -> .indices.delete_mapping
108 96
109   - Fixed error in checking TermsQuery (merrellb)
  97 +- .get_settings -> .indices.get_settings
110 98
111   - If an analyzer is set on a field, the returned mapping will have an analyzer
  99 +- .update_settings -> .indices.update_settings
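The renames listed above are mechanical, so they can be captured as a lookup table. The following is only a sketch: `MOVED_METHODS` holds a subset of the list, and `migrate_call` is a hypothetical helper for illustration that is not part of pyes.

```python
# Old ES method name -> new manager path, copied from a subset of the list above.
MOVED_METHODS = {
    "create_index": "indices.create_index",
    "delete_index": "indices.delete_index",
    "refresh": "indices.refresh",
    "put_mapping": "indices.put_mapping",
    "index_stats": "indices.stats",
    "cluster_health": "cluster.cluster_health",
    "cluster_state": "cluster.state",
    "cluster_nodes": "cluster.nodes_info",
    "cluster_stats": "cluster.node_stats",
}


def migrate_call(old_call):
    """Rewrite a dotted call such as 'es.refresh' to its manager form.

    Hypothetical helper for illustration; it is not part of pyes.
    """
    receiver, _, method = old_call.rpartition(".")
    if not receiver or receiver.split(".")[-1] in ("indices", "cluster"):
        return old_call  # no receiver, or already addressed through a manager
    new_path = MOVED_METHODS.get(method)
    if new_path is None:
        return old_call  # not one of the moved methods
    return receiver + "." + new_path


print(migrate_call("es.refresh"))
print(migrate_call("es.cluster_state"))
print(migrate_call("es.search"))
```

Note that some names change as well as move (for example `.cluster_state` becomes `.cluster.state`), which is why a table is safer than a blanket prefix substitution.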
112 100
113   - Add a specific error subtype for mapper parsing exceptions (rboulton)
114 101
115   - Add support for Float numeric field mappings (rboulton)
  102 +Fixes
  103 +-----
116 104
117   - ES.get() now accepts "fields" as well as other keyword arguments (eg "routing") (rboulton)
  105 +- Fixed ResultSet slicing.
118 106
119   - Allow dump_curl to be passed a filehandle (or still a filename), don't for filenames to be in /tmp, and add a basic test of it.
  107 +- Moved tests outside pyes code dir. Update references. Upgraded test elasticsearch to 0.19.9.
120 108
121   - Add alias handling (rboulton)
  109 +- Added documentation links.
122 110
123   - Add ElasticSearchIllegalArgumentException - used for example when writing to an alias which refers to more than one index. (rboulton)
  111 +- Renamed scroll_timeout to scroll.
124 112
125   - Handle errors produced by deleting a missing document, and add a test for it. (rboulton)
  113 +- Renamed field_name to name in ScriptFields.
126 114
127   - Split Query object into a Search object, for the search specific parts, and a Query base class. Allow ES.search() to take a query or a search object. Make some of the methods of Query base classes chainable, where that is an obviously reasonable thing to do. (rboulton)
  115 +- Added routing to delete document call.
128 116
129   -v. 0.14.0: Added delete of mapping type.
  117 +- Removed minimum_number_should_match parameter. It is not supported by ElasticSearch and causes errors when using a BoolFilter. (Jernej Kos)
130 118
131   - Embedded urllib3 to be buildout safe and for users sake.
  119 +- Improved speed json conversion of datetime values
132 120
133   - Some code cleanup.
  121 +- Added boost argument to TextQuery. (Jernej Kos)
134 122
135   - Added reindex by query (usable only with my elasticsearch git branch).
  123 +- Go back to urllib3 instead of requests. (gsakkis)
136 124
137   - Added contrib with mailman indexing.
  125 +- Enhance Twitter River class. (thanks @dendright)
138 126
139   - Autodetect if django is available and added related functions.
  127 +- Add OAuth authentication and filtering abilities to Twitter River. (Jack Riches)
140 128
141   - Code cleanup and PEP8.
  129 +- HasChildFilter expects a Query. (gsakkis)
142 130
143   - Reactivated the morelikethis query.
  131 +- Fixed _parent being pulled from _meta rather than the instance itself. (merrellb)
144 132
145   - Fixed river support plus unittest. (Tavis Aitken)
  133 +- Add support of all_terms to TermFacet. (mouad)
146 134
147   - Added autorefresh to sync search and write.
148 135
149   - Added QueryFilter.
  136 +0.19.0
  137 +======
150 138
151   - Forced name attribute in multifield declaration.
152 139
153   - Added is_empty to ConstantScoreQuery and fixed some bad behaviour.
  140 +- Use default_indices instead of hardcoding ['_all'] (gsakkis)
154 141
155   - Added CustomScoreQuery.
  142 +- Complete rewrite of connection_http (gsakkis)
156 143
157   - Added parent/children indexing.
  144 +- Don't collect info on creation of ES object (patricksmith)
158 145
159   - Added dump commands in a script file "curl" way.
  146 +- Add interval to histogram facet. (vrachil)
160 147
161   - Added a lot of fix from Richard Boulton.
  148 +- Improved connection string construction and added more flexibility. (ferhatsb)
162 149
163   -v. 0.13.1: Added jython support (HTTP only for now).
  150 +- Fixed pickling DotDict.
164 151
165   -v. 0.13.0: API Changes: errors -> exceptions.
  152 +- Fixed a bug in Decoder.
166 153
167   - Splitting of query/filters.
  154 +- Added execution to TermsFilter. Fixed missing _name attribute in serialized object
168 155
169   - Added open/close of index.
  156 +- Added _cache and _cache_key parameters to filters.
170 157
171   - Added the number of retries if server is down.
  158 +- Added scope, filter and global parameters to facets. closes #119
172 159
173   - Refactory Range query. (Andrei)
  160 +- Use a single global ConnectionPool instead of initializing it on every execute call. (gsakkis)
174 161
175   - Improved HTTP connection timeout/retries. (Sandymahalo)
  162 +- Allow partial_fields to be passed in the Search class. (tehmaze)
176 163
177   - Cleanup some imports. (Sandymahalo)
  164 +- Propagated parameters to bulker.
178 165
179   -v. 0.12.1: Added collecting server info.
  166 +- Support params for analyze. (akheron)
180 167
181   - Version 0.12 or above requirement.
  168 +- Added LimitFilter.
182 169
183   - Fixed attachment plugin.
  170 +- Fixed support for query as dict in Search object.
184 171
185   - Updated bulk insert to use new api.
  172 +- Added ListBulker implementation and create_bulker method.
186 173
187   - Added facet support (except geotypes).
  174 +- Moved imports to absolute ones.
188 175
189   - Added river support.
  176 +- Removed unused urllib3 files and added timeout to connection_http.
190 177
191   - Cleanup some method.
  178 +- Add NotFilter as facet filter (junckritter)
192 179
193   - Added default_indexes variable.
  180 +- Add terms facet filter
194 181
195   - Added datetime deserialization.
  182 +0.18.7-rc1
  183 +==========
196 184
197   - Improved performance and memory usage in bulk insert replacing list with StringIO.
198 185
199   - Initial propagation of elasticsearch exception to python.
  186 +- Tested against 0.18.7, with all tests passing
200 187
201   -v. 0.12.0: added http transport, added autodetect of transport, updated thrift interface.
  188 +- Added support for index_stats
202 189
203   -v. 0.10.3: added bulk insert, explain and facet.
  190 +0.17.0
  191 +======
204 192
205   -v. 0.10.2: added new geo query type.
  193 +- API BREAKING: Added new searcher iterator API. (To use the old code rename ".search" to ".search_raw")
206 194
207   -v. 0.10.1: added new connection pool system based on pycassa one.
  195 +- API BREAKING: renamed indexes to indices, to be compliant with the ES documentation.
208 196
209   -v. 0.10.0: initial working version.
  197 +- Refactored tests.
  198 +
  199 +- Add model object to objectify a dict.
  200 +
  201 +0.16.0
  202 +======
  203 +
  204 +- Updated documentation.
  205 +
  206 +- Added TextQuery and some clean up of code.
  207 +
  208 +- Added percolator (matterkkila).
  209 +
  210 +- Added date_histogram facet (zebuline).
  211 +
  212 +- Added script fields to Search object, also add "fields" to TermFacet (aguereca).
  213 +
  214 +- Added analyze_wildcard param to StringQuery (available for ES 0.16.0) (zebuline).
  215 +
  216 +- Add ScriptFields object used as parameter script_fields of Search object (aguereca).
  217 +
  218 +- Add IdsQuery, IdsFilter and delete_by_query (aguereca).
  219 +
  220 +- Bulk delete (acdha).
  221 +
  222 +
  223 +0.15.0
  224 +======
  225 +
  226 +
  227 +- Only require simplejson for python < 2.6 (matterkkila)
  228 +
  229 +- Added basic version support to ES.index and Search (merrellb)
  230 +
  231 +- Added scan method to ES. This is only supported on ES Master (pre 0.16) (merrellb)
  232 +
  233 +- Added GeoPointField to mapping types (merrellb)
  234 +
  235 +- Disable thrift in setup.py.
  236 +
  237 +- Added missing _routing property in ObjectField
  238 +
  239 +- Added ExistsFilter
  240 +
  241 +- Improved HasChildren
  242 +
  243 +- Add min_similarity and prefix_length to flt.
  244 +
  245 +- Added _scope to HasChildQuery. (andreiz)
  246 +
  247 +- Added parent/child document in test indexing. Added _scope to HasChildFilter.
  248 +
  249 +- Added MissingFilter as a subclass of TermFilter
  250 +
  251 +- Fixed error in checking TermsQuery (merrellb)
  252 +
  253 +- If an analyzer is set on a field, the returned mapping will have an analyzer
  254 +
  255 +- Add a specific error subtype for mapper parsing exceptions (rboulton)
  256 +
  257 +- Add support for Float numeric field mappings (rboulton)
  258 +
  259 +- ES.get() now accepts "fields" as well as other keyword arguments (eg "routing") (rboulton)
  260 +
  261 +- Allow dump_curl to be passed a filehandle (or still a filename), don't force filenames to be in /tmp, and add a basic test of it.
  262 +
  263 +- Add alias handling (rboulton)
  264 +
  265 +- Add ElasticSearchIllegalArgumentException - used for example when writing to an alias which refers to more than one index. (rboulton)
  266 +
  267 +- Handle errors produced by deleting a missing document, and add a test for it. (rboulton)
  268 +
  269 +- Split Query object into a Search object, for the search specific parts, and a Query base class. Allow ES.search() to take a query or a search object. Make some of the methods of Query base classes chainable, where that is an obviously reasonable thing to do. (rboulton)
  270 +
  271 +0.14.0
  272 +======
  273 +
  274 +
  275 +- Added delete of mapping type.
  276 +
  277 +- Embedded urllib3 to be buildout safe and for users sake.
  278 +
  279 +- Some code cleanup.
  280 +
  281 +- Added reindex by query (usable only with my elasticsearch git branch).
  282 +
  283 +- Added contrib with mailman indexing.
  284 +
  285 +- Autodetect if django is available and added related functions.
  286 +
  287 +- Code cleanup and PEP8.
  288 +
  289 +- Reactivated the morelikethis query.
  290 +
  291 +- Fixed river support plus unittest. (Tavis Aitken)
  292 +
  293 +- Added autorefresh to sync search and write.
  294 +
  295 +- Added QueryFilter.
  296 +
  297 +- Forced name attribute in multifield declaration.
  298 +
  299 +- Added is_empty to ConstantScoreQuery and fixed some bad behaviour.
  300 +
  301 +- Added CustomScoreQuery.
  302 +
  303 +- Added parent/children indexing.
  304 +
  305 +- Added dump commands in a script file "curl" way.
  306 +
  307 +- Added a lot of fixes from Richard Boulton.
  308 +
  309 +0.13.1
  310 +======
  311 +
  312 +- Added jython support (HTTP only for now).
  313 +
  314 +0.13.0
  315 +======
  316 +
  317 +- API Changes: errors -> exceptions.
  318 +
  319 +- Splitting of query/filters.
  320 +
  321 +- Added open/close of index.
  322 +
  323 +- Added the number of retries if server is down.
  324 +
  325 +- Refactored Range query. (Andrei)
  326 +
  327 +- Improved HTTP connection timeout/retries. (Sandymahalo)
  328 +
  329 +- Cleanup some imports. (Sandymahalo)
  330 +
  331 +0.12.1
  332 +======
  333 +
  334 +- Added collecting server info.
  335 +
  336 +- Version 0.12 or above requirement.
  337 +
  338 +- Fixed attachment plugin.
  339 +
  340 +- Updated bulk insert to use new api.
  341 +
  342 +- Added facet support (except geotypes).
  343 +
  344 +- Added river support.
  345 +
  346 +- Cleaned up some methods.
  347 +
  348 +- Added default_indexes variable.
  349 +
  350 +- Added datetime deserialization.
  351 +
  352 +- Improved performance and memory usage in bulk insert replacing list with StringIO.
  353 +
  354 +- Initial propagation of elasticsearch exception to python.
  355 +
  356 +0.12.0
  357 +======
  358 +
  359 +- Added http transport, added autodetect of transport, updated thrift interface.
  360 +
  361 +0.10.3
  362 +======
  363 +
  364 +- Added bulk insert, explain and facet.
  365 +
  366 +0.10.2
  367 +======
  368 +
  369 +- Added new geo query type.
  370 +
  371 +0.10.1
  372 +======
  373 +
  374 +- Added new connection pool system based on pycassa one.
  375 +
  376 +0.10.0
  377 +======
  378 +
  379 +- Initial working version.
29 FAQ
... ... @@ -1,3 +1,5 @@
  1 +.. _faq:
  2 +
1 3 ============================
2 4 Frequently Asked Questions
3 5 ============================
@@ -5,4 +7,29 @@
5 7 .. contents::
6 8 :local:
7 9
8   -TO be written
  10 +.. _faq-general:
  11 +
  12 +General
  13 +=======
  14 +
  15 +.. _faq-when-to-use:
  16 +
  17 +What connection type should I use?
  18 +----------------------------------
  19 +
  20 +For general usage I suggest using an HTTP connection to your server.
  21 +
  22 +For faster performance, mainly when indexing, I suggest using Thrift because its latency is lower.
  23 +
  24 +How can you return a plain dict from a resultset?
  25 +=================================================
  26 +
  27 +ResultSet iterates over ElasticSearchModel objects by default. To change this behaviour, pass a callable that
  28 +receives a connection and a dict object.
  29 +
  30 +To return plain dict objects, pass a model parameter to the search call:
  31 +
  32 +.. code-block:: python
  33 +
  34 + model=lambda connection, data: data
  35 +
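The contract described above, a callable that takes a connection and a dict and returns the object to yield, can be tried out without a running server. This is a minimal sketch under stated assumptions: `Wrapped` and `build_results` are hypothetical stand-ins for the default model and for ResultSet iteration, and the hit data is made up.

```python
# Stand-in for the raw hits a search would return; the data is made up.
raw_hits = [{"_id": "1", "name": "first"}, {"_id": "2", "name": "second"}]


class Wrapped(dict):
    """Hypothetical default model: wraps each hit in a custom object."""

    def __init__(self, connection, data):
        super().__init__(data)
        self.connection = connection


def build_results(connection, hits, model=Wrapped):
    # Mimics the documented contract: model(connection, dict) -> object.
    return [model(connection, hit) for hit in hits]


# Passing a callable that returns its second argument yields plain dicts.
plain = build_results(None, raw_hits, model=lambda connection, data: data)
wrapped = build_results(None, raw_hits)

print(type(plain[0]) is dict)
print(type(wrapped[0]).__name__)
```

Because the model is just a callable, any function or class with the same two-argument signature can be substituted.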
52 README.rst
@@ -36,21 +36,61 @@ http://pyes.readthedocs.org/en/latest/
36 36 Changelog
37 37 =========
38 38
39   -v. 0.18.7-rc1:
  39 +v. 0.19.1:
40 40
41   - Tested against 0.18.7, with all tests passing
  41 + Renamed field_name to name in ScriptFields
42 42
43   - Added support for index_stats
  43 + Fixed ResultSet slicing.
44 44
45   -v. 0.17.0:
  45 + Created Manager classes that group API actions the same way Elasticsearch does.
  46 +
  47 + Moved tests outside pyes code dir. Update references. Upgraded test elasticsearch to 0.19.9.
  48 +
  49 + Added documentation links
  50 +
  51 + Got docs building on readthedocs.org (Wraithan - Chris McDonald)
  52 +
  53 + Renamed scroll_timeout to scroll
  54 +
  55 + Moved FacetFactory include
  56 +
  57 + Renamed field_name to name in ScriptFields
  58 +
  59 + Using only thrift_connect to manage thrift existence
  60 +
  61 + Added model and scan to query
  62 +
  63 + Added exists document call
  64 +
  65 + Added routing to delete
  66 +
  67 + Removed minimum_number_should_match parameter. It is not supported by elastic search and causes errors when using a BoolFilter. (Jernej Kos)
  68 +
  69 + Improved speed json conversion of datetime values
  70 +
  71 + Add boost argument to TextQuery
  72 +
  73 + Added boost argument to TextQuery. (Jernej Kos)
  74 +
  75 + Go back to urllib3 instead of requests. (gsakkis)
  76 +
  77 + Enhance Twitter River class. (thanks @dendright)
  78 +
  79 + Add OAuth authentication and filtering abilities to Twitter River. (Jack Riches)
  80 +
  81 + HasChildFilter expects a Query. (gsakkis)
  82 +
  83 + Fixed _parent being pulled from _meta rather than the instance itself. (merrellb)
  84 +
  85 + Add support of all_terms to TermFacet. (mouad)
46 86
47   - API BREAKING: Added new searcher iterator API. (To use the old code rename ".search" in ".search_raw")
48 87
49   - Tests refactory.
50 88
51 89 TODO
52 90 ----
53 91
  92 +- add ORM to manage objects
  93 +- much more documentation
54 94 - add coverage
55 95 - add jython native client protocol
56 96
350 docs/guide/appendix/glossary.rst
@@ -4,226 +4,210 @@
4 4 Glossary
5 5 ========
6 6
7   -glossary:
8   --
9   - id: analysis
10   - text: >
11   - Analysis is the process of converting full text_ to terms_.
12   - Depending on which analyzer is used, these phrases: "**FOO BAR**",
13   - "**Foo-Bar**", "**foo,bar**" will probably all result in the terms "**foo**"
14   - and "**bar**". These terms are what is actually stored in the index.
15   -
16   -
17   - A full text query (not a term_ query) for "**FoO:bAR**" will
18   - also be analyzed to the terms "**foo**","**bar**" and will thus match
19   - the terms stored in the index.
20   -
21   -
22   - It is this process of analysis (both at index time and at search time)
23   - that allows elasticsearch to perform full text queries.
24   -
25   -
26   - Also see text_ and term_.
27   --
28   - id: cluster
29   - text: >
30   - A cluster consists of one or more nodes_ which share the same
31   - cluster name. Each cluster has a single master node which is
32   - chosen automatically by the cluster and which can be replaced if
33   - the current master node fails.
34   -
35   --
36   - id: document
37   - text: >
38   - A document is a JSON document which is stored in elasticsearch. It is
39   - like a row in a table in a relational database. Each document is
40   - stored in an index_ and has a type_
41   - and an id_.
  7 +.. _glossary-analysis:
42 8
  9 +analysis
  10 + Analysis is the process of converting full :ref:`text <glossary-text>` to :ref:`terms <glossary-term>`.
  11 + Depending on which analyzer is used, these phrases: "**FOO BAR**",
  12 + "**Foo-Bar**", "**foo,bar**" will probably all result in the terms "**foo**"
  13 + and "**bar**". These terms are what is actually stored in the index.
43 14
44   - A document is a JSON object (also known in other languages
45   - as a hash / hashmap / associative array) which contains zero or more
46   - fields_, or key-value pairs.
  15 + A full text query (not a :ref:`term <glossary-term>` query) for "**FoO:bAR**" will
  16 + also be analyzed to the terms "**foo**","**bar**" and will thus match
  17 + the terms stored in the index.
47 18
  19 + It is this process of analysis (both at index time and at search time)
  20 + that allows elasticsearch to perform full text queries.
48 21
49   - The original JSON document that is indexed will be stored in the
50   - **_source** field_, which is returned by default
51   - when getting or searching for a document.
  22 + Also see :ref:`text <glossary-text>` and :ref:`term <glossary-term>`.
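The effect of analysis on the example phrases can be illustrated with a toy function. This is only a sketch: `toy_analyze` is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, and it is not the analyzer that elasticsearch actually runs.

```python
import re


def toy_analyze(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


# All four phrases from the glossary entry reduce to the same two terms.
for phrase in ["FOO BAR", "Foo-Bar", "foo,bar", "FoO:bAR"]:
    print(phrase, "->", toy_analyze(phrase))
```

Since the query text goes through the same function as the indexed text, both sides end up with identical terms, which is what makes the match possible.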
52 23
53   --
54   - id: id
55   - text: >
56   - The ID of a document_ identifies a document. The
57   - **index/type/id** of a document must be unique. If no ID is provided,
58   - then it will be auto-generated. (also see routing_)
  24 +.. _glossary-cluster:
59 25
60   --
61   - id: field
62   - text: >
63   - A document_ contains a list of fields, or key-value pairs.
64   - The value can be a simple (scalar) value (eg a string, integer, date),
65   - or a nested structure like an array or an object. A field is similar
66   - to a column in a table in a relational database.
  26 +cluster
  27 + A cluster consists of one or more :ref:`nodes <glossary-node>` which share the same
  28 + cluster name. Each cluster has a single master node which is
  29 + chosen automatically by the cluster and which can be replaced if
  30 + the current master node fails.
67 31
  32 +.. _glossary-document:
68 33
69   - The mapping_ for each field has a field 'type'
70   - (not to be confused with document type_) which indicates the
71   - type of data that can be stored in that field, eg
72   - **integer**, **string**, **object**.
73   - The mapping also allows you to define (amongst other things) how the
74   - value for a field should be analyzed.
  34 +document
  35 + A document is a JSON document which is stored in elasticsearch. It is
  36 + like a row in a table in a relational database. Each document is
  37 + stored in an :ref:`index <glossary-index>` and has a :ref:`type <glossary-type>`
  38 + and an :ref:`id <glossary-id>`.
75 39
76   --
77   - id: index
78   - text: >
79   - An index is like a 'database' in a relational database. It has a
80   - mapping_ which defines multiple
81   - types_.
  40 + A document is a JSON object (also known in other languages
  41 + as a hash / hashmap / associative array) which contains zero or more
  42 + :ref:`fields <glossary-field>`, or key-value pairs.
82 43
  44 + The original JSON document that is indexed will be stored in the
  45 + **_source** :ref:`field <glossary-field>`, which is returned by default
  46 + when getting or searching for a document.
83 47
84   - An index is a logical namespace which maps to one or more
85   - primary shards_ and can have zero or more
86   - replica shards_.
  48 +.. _glossary-id:
87 49
88   --
89   - id: mapping
90   - text: >
91   - A mapping is like a 'schema definition' in a relational database.
92   - Each index_ has a mapping, which defines each
93   - type_ within the index, plus a number of
94   - index-wide settings.
  50 +id
  51 + The ID of a :ref:`document <glossary-document>` identifies a document. The
  52 + **index/type/id** of a document must be unique. If no ID is provided,
  53 + then it will be auto-generated. (also see :ref:`routing <glossary-routing>`)
95 54
  55 +.. _glossary-field:
96 56
97   - A mapping can either be defined explicitly, or it will be generated
98   - automatically when a document is indexed.
99   --
100   - id: node
101   - text: >
102   - A node is a running instance of elasticsearch which belongs to a
103   - cluster_. Multiple nodes can be started on a single
104   - server for testing purposes, but usually you should have one node
105   - per server.
106   -
107   -
108   - At startup, a node will use unicast (or multicast, if specified)
109   - to discover an existing cluster with the same cluster name and will
110   - try to join that cluster.
  57 +field
  58 + A :ref:`document <glossary-document>` contains a list of fields, or key-value pairs.
  59 + The value can be a simple (scalar) value (eg a string, integer, date),
  60 + or a nested structure like an array or an object. A field is similar
  61 + to a column in a table in a relational database.
111 62
112   --
113   - id: primary shard
114   - text: >
115   - Each document is stored in a single primary shard_. When you
116   - index a document, it is indexed first on the primary shard, then
117   - on all replicas_ of the primary shard.
  63 + The :ref:`mapping <glossary-mapping>` for each field has a field 'type'
  64 + (not to be confused with document :ref:`type <glossary-type>`) which indicates the
  65 + type of data that can be stored in that field, eg
  66 + **integer**, **string**, **object**.
  67 + The mapping also allows you to define (amongst other things) how the
  68 + value for a field should be analyzed.
118 69
  70 +.. _glossary-index:
119 71
120   - By default, an index_ has 5 primary shards. You can specify fewer
121   - or more primary shards to scale the number of documents_
122   - that your index can handle.
123   -
124   -
125   - You cannot change the number of primary shards in an index, once the
126   - index is created.
127   -
128   -
129   - See also routing_
130   -
131   --
132   - id: replica shard
133   - text: >
134   - Each primary shard_ can have zero or more replicas.
135   - A replica is a copy of the primary shard, and has two purposes:
  72 +index
  73 + An index is like a 'database' in a relational database. It has a
  74 + :ref:`mapping <glossary-mapping>` which defines multiple
  75 + :ref:`types <glossary-type>`.
136 76
137   - # increase failover: a replica shard can be promoted
138   - to a primary shard if the primary fails
  77 + An index is a logical namespace which maps to one or more
  78 + primary :ref:`shards <glossary-shard>` and can have zero or more
  79 + replica :ref:`shards <glossary-shard>`.
139 80
140   - # increase performance: get and search requests can be handled by
141   - primary or replica shards.
142 81
  82 +.. _glossary-mapping:
143 83
  84 +mapping
  85 + A mapping is like a 'schema definition' in a relational database.
  86 + Each :ref:`index <glossary-index>` has a mapping, which defines each
  87 + :ref:`type <glossary-type>` within the index, plus a number of
  88 + index-wide settings.
144 89
145   - By default, each primary shard has one replica, but the number
146   - of replicas can be changed dynamically on an existing index.
147   - A replica shard will never be started on the same node as its primary
148   - shard.
149   -
150   --
151   - id: routing
152   - text: >
153   - When you index a document, it is stored on a single
154   - primary shard_. That shard is chosen by hashing
155   - the **routing** value. By default, the **routing** value is derived
156   - from the ID of the document or, if the document has a specified
157   - parent document, from the ID of the parent document (to ensure
158   - that child and parent documents are stored on the same shard).
  90 + A mapping can either be defined explicitly, or it will be generated
  91 + automatically when a document is indexed.
159 92
  93 +.. _glossary-node:
160 94
161   - This value can be overridden by specifying a **routing** value at index
162   - time, or a :ref:`routing field <es-guide-reference-mapping-routing-field>` in the mapping_.
  95 +node
  96 + A node is a running instance of elasticsearch which belongs to a
  97 + :ref:`cluster <glossary-cluster>`. Multiple nodes can be started on a single
  98 + server for testing purposes, but usually you should have one node
  99 + per server.
163 100
164   --
165   - id: shard
166   - text: >
167   - A shard is a single Lucene instance. It is a low-level "worker" unit
168   - which is managed automatically by elasticsearch. An index
169   - is a logical namespace which points to primary_
170   - and replica_ shards.
  101 + At startup, a node will use unicast (or multicast, if specified)
  102 + to discover an existing cluster with the same cluster name and will
  103 + try to join that cluster.
171 104
  105 +.. _glossary-primary-shard:
172 106
173   - Other than defining the number of primary and replica shards that
174   - an index should have, you never need to refer to shards directly.
175   - Instead, your code should deal only with an index.
  107 +primary shard
  108 + Each document is stored in a single primary :ref:`shard <glossary-shard>`. When you
  109 + index a document, it is indexed first on the primary shard, then
  110 + on all :ref:`replicas <glossary-replica-shard>` of the primary shard.
176 111
  112 + By default, an :ref:`index <glossary-index>` has 5 primary shards. You can specify fewer
  113 + or more primary shards to scale the number of :ref:`documents <glossary-document>`
  114 + that your index can handle.
177 115
178   - Elasticsearch distributes shards amongst all nodes_ in
179   - the cluster_, and can be move shards automatically from
180   - one node to another in the case of node failure, or the addition
181   - of new nodes.
  116 + You cannot change the number of primary shards in an index once the
  117 + index has been created.
182 118
183   --
184   - id: source field
185   - text: >
186   - By default, the JSON document that you index will be stored in the
187   - **_source** field and will be returned by all get and search requests.
188   - This allows you access to the original object directly from search
189   - results, rather than requiring a second step to retrieve the object
190   - from an ID.
  119 + See also :ref:`routing <glossary-routing>`
191 120
192 121
193   - Note: the exact JSON string that you indexed will be returned to you,
194   - even if it contains invalid JSON. The contents of this field do not
195   - indicate anything about how the data in the object has been indexed.
196   --
197   - id: term
198   - text: >
199   - A term is an exact value that is indexed in elasticsearch. The terms
200   - **foo**, **Foo**, **FOO are NOT equivalent. Terms (ie exact values) can
201   - be searched for using 'term' queries.
  122 +.. _glossary-replica-shard:
202 123
203   - See also text_ and analysis_.
204   --
205   - id: text
206   - text: >
207   - Text (or full text) is ordinary unstructured text, such as this
208   - paragraph. By default, text will by :ref:`analyzed <es-guide-appendix-analysis>` into
209   - terms_, which is what is actually stored in the index.
  124 +replica shard
  125 + Each primary :ref:`shard <glossary-shard>` can have zero or more replicas.
  126 + A replica is a copy of the primary shard, and has two purposes:
210 127
  128 + #. increase failover: a replica shard can be promoted
  129 + to a primary shard if the primary fails
211 130
212   - Text fields_ need to be analyzed at index time in order to
213   - be searchable as full text, and keywords in full text queries must
214   - be analyzed at search time to produce (and search for) the same
215   - terms that were generated at index time.
  131 + #. increase performance: get and search requests can be handled by
  132 + primary or replica shards.
216 133
  134 + By default, each primary shard has one replica, but the number
  135 + of replicas can be changed dynamically on an existing index.
  136 + A replica shard will never be started on the same node as its primary
  137 + shard.
217 138
218   - See also term_ and analysis_.
219   --
220   - id: type
221   - text: >
222   - A type is like a 'table' in a relational database. Each type has
223   - a list of fields_ that can be specified for
224   - documents_ of that type. The
225   - mapping_ defines how each field in the document
226   - is analyzed.
  139 +.. _glossary-routing:
227 140
  141 +routing
  142 + When you index a document, it is stored on a single
  143 + primary :ref:`shard <glossary-shard>`. That shard is chosen by hashing
  144 + the **routing** value. By default, the **routing** value is derived
  145 + from the ID of the document or, if the document has a specified
  146 + parent document, from the ID of the parent document (to ensure
  147 + that child and parent documents are stored on the same shard).
228 148
  149 + This value can be overridden by specifying a **routing** value at index
  150 + time, or a :ref:`routing field <es-guide-reference-mapping-routing-field>` in the :ref:`mapping <glossary-mapping>`.
229 151
  152 +.. _glossary-shard:
  153 +
  154 +shard
  155 + A shard is a single Lucene instance. It is a low-level "worker" unit
  156 + which is managed automatically by elasticsearch. An index
  157 + is a logical namespace which points to :ref:`primary <glossary-primary-shard>`
  158 + and :ref:`replica <glossary-replica-shard>` shards.
  159 +
  160 + Other than defining the number of primary and replica shards that
  161 + an index should have, you never need to refer to shards directly.
  162 + Instead, your code should deal only with an index.
  163 +
  164 + Elasticsearch distributes shards amongst all :ref:`nodes <glossary-node>` in
  165 + the :ref:`cluster <glossary-cluster>`, and can move shards automatically from
  166 + one node to another in the case of node failure, or the addition
  167 + of new nodes.
  168 +
  169 +.. _glossary-source-field:
  170 +
  171 +source field
  172 + By default, the JSON document that you index will be stored in the
  173 + **_source** field and will be returned by all get and search requests.
  174 + This gives you access to the original object directly from search
  175 + results, rather than requiring a second step to retrieve the object
  176 + from an ID.
  177 +
  178 +
  179 + Note: the exact JSON string that you indexed will be returned to you,
  180 + even if it contains invalid JSON. The contents of this field do not
  181 + indicate anything about how the data in the object has been indexed.
  182 +
  183 +.. _glossary-term:
  184 +
  185 +term
  186 + A term is an exact value that is indexed in elasticsearch. The terms
  187 + **foo**, **Foo**, **FOO** are NOT equivalent. Terms (i.e. exact values) can
  188 + be searched for using 'term' queries.
  189 +
  190 + See also :ref:`text <glossary-text>` and :ref:`analysis <glossary-analysis>`.
  191 +
  192 +.. _glossary-text:
  193 +
  194 +text
  195 + Text (or full text) is ordinary unstructured text, such as this
  196 + paragraph. By default, text will be :ref:`analyzed <glossary-analysis>` into
  197 + :ref:`terms <glossary-term>`, which is what is actually stored in the index.
  198 +
  199 + Text :ref:`fields <glossary-field>` need to be analyzed at index time in order to
  200 + be searchable as full text, and keywords in full text queries must
  201 + be analyzed at search time to produce (and search for) the same
  202 + terms that were generated at index time.
  203 +
  204 + See also :ref:`term <glossary-term>` and :ref:`analysis <glossary-analysis>`.
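The difference between a term and analyzed text can be illustrated with a toy analyzer. The ``analyze`` function below is an assumption made for this sketch (lowercase, then split on non-alphanumeric characters); real analyzers are configurable and do considerably more.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase, then split on non-alphanumeric characters."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


# Terms produced at index time are what is actually stored.
index_terms = set(analyze("The Quick brown FOX"))

# A full-text query is analyzed the same way before lookup, so it matches.
full_text_hit = all(term in index_terms for term in analyze("quick Fox"))

# A term query looks up the exact value without analysis, so "Fox" misses.
term_hit = "Fox" in index_terms
```

Here the full-text lookup succeeds because both sides pass through the same analysis, while the exact-value lookup for ``Fox`` fails because only the lowercased term ``fox`` was stored.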
  205 +
  206 +.. _glossary-type:
  207 +
  208 +type
  209 + A type is like a 'table' in a relational database. Each type has
  210 + a list of :ref:`fields <glossary-field>` that can be specified for
  211 + :ref:`documents <glossary-document>` of that type. The
  212 + :ref:`mapping <glossary-mapping>` defines how each field in the document
  213 + is analyzed.
1  docs/index.rst
@@ -14,6 +14,7 @@ Contents:
14 14 links
15 15 guide/reference/index
16 16 guide/appendix/index
  17 + guide/appendix/glossary
17 18
18 19
19 20 Indices and tables
6 docs/manual/connections.rst
... ... @@ -1,3 +1,5 @@
  1 +.. _pyes-connections:
  2 +
1 3 Connections
2 4 ===========
3 5
@@ -16,12 +18,16 @@ For thrift:
16 18
17 19 >>> conn = pyes.ES() # Defaults to connecting to the server at '127.0.0.1:9500'
18 20 >>> conn = pyes.ES(['127.0.0.1:9500'])
  21 + >>> conn = pyes.ES(("thrift", "127.0.0.1", "9500"))
  22 + >>> conn = pyes.ES([("thrift", "127.0.0.1", "9500"), ("thrift", "192.168.1.1", "9500"),])
19 23
20 24 For http:
21 25
22 26 .. code-block:: python
23 27
24 28 >>> conn = pyes.ES(['127.0.0.1:9200'])
  29 + >>> conn = pyes.ES(("http", "127.0.0.1", "9200"))
  30 + >>> conn = pyes.ES([("http", "127.0.0.1", "9200"), ("http", "192.168.1.1", "8000")])
25 31
26 32 Connections are robust to server failures. Upon a disconnection, the client will attempt to connect to each server in the list in turn. If no server is available, it will raise a NoServerAvailable exception.
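The failover behaviour described above can be sketched roughly as follows. This is not the pyes implementation: ``connect_with_failover`` and its ``connect`` callback are hypothetical, and the ``NoServerAvailable`` class here is a local stand-in for the exception named in the text.

```python
class NoServerAvailable(Exception):
    """Local stand-in for the exception named in the text."""


def connect_with_failover(servers, connect):
    """Try each server in turn and return the first successful connection.

    ``servers`` is a list of "host:port" strings; ``connect`` is a
    caller-supplied function that raises ConnectionError on failure.
    """
    for server in servers:
        try:
            return connect(server)
        except ConnectionError:
            continue  # this server is down, try the next one
    raise NoServerAvailable("tried: %s" % ", ".join(servers))
```

Passing the ``connect`` callable in keeps the sketch independent of the transport, so the same loop would apply to http or thrift connections.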
27 33
4 docs/manual/index.rst
@@ -11,4 +11,6 @@
11 11 installation
12 12 usage
13 13 connections
14   - queries
  14 + models
  15 + queries
  16 + resultset
65 docs/manual/models.rst
... ... @@ -0,0 +1,65 @@
  1 +.. _pyes-models:
  2 +
  3 +Models
  4 +======
  5 +
  6 +DotDict
  7 +-------
  8 +
  9 +DotDict is the base model. It lets you access the keys of a dict using dot notation.
  10 +
  11 +.. code-block:: python
  12 +
  13 + >>> dotdict = DotDict(foo="bar")
  14 + >>> dotdict2 = deepcopy(dotdict)
  15 + >>> dotdict2["foo"] = "baz"
  16 + >>> assert dotdict.foo == "bar"  # the original is unchanged by the copy
  17 + >>> dotdict2.foo == "baz"
  18 + True
  19 +
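To make the idea concrete, here is a minimal sketch of a dict with attribute access. ``DotDictSketch`` is a hypothetical class written for this illustration; it is not the pyes ``DotDict`` and may differ from it in details such as handling of nested dicts.

```python
from copy import deepcopy


class DotDictSketch(dict):
    """Hypothetical minimal dict with attribute access; not the pyes class."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Attribute assignment writes through to the dict.
        self[name] = value

    def __deepcopy__(self, memo):
        # Copy each value so the copy is independent of the original.
        return DotDictSketch((key, deepcopy(value, memo)) for key, value in self.items())


original = DotDictSketch(foo="bar")
copied = deepcopy(original)
copied["foo"] = "baz"
```

After this runs, ``original.foo`` is still ``"bar"`` while ``copied.foo`` is ``"baz"``, mirroring the doctest above.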
  20 +ElasticSearchModel
  21 +------------------
  22 +
  23 +It extends DotDict, adding methods for common uses.
  24 +
  25 +Every search returns its hits as ElasticSearchModel objects, so iterating over the results yields ElasticSearchModel instances.
  26 +
  27 +You can create a new one with the factory, or get one from the search/get methods.
  28 +
  29 +.. code-block:: python
  30 +
  31 + obj = self.conn.factory_object(self.index_name, self.document_type, {"name": "test", "val": 1})
  32 + assert obj.name=="test"
  33 +
  34 +You can change values via dot notation or dictionary access.
  35 +
  36 +.. code-block:: python
  37 +
  38 + obj.name = "aaa"
  39 + assert obj.name == "aaa"
  40 + assert obj.val == 1
  41 +
  42 +You can change the ES metadata via the ``_meta`` property or the ``get_meta`` call.
  43 +
  44 +.. code-block:: python
  45 +
  46 + assert obj._meta.id is None
  47 + obj._meta.id = "dasdas"
  48 + assert obj._meta.id == "dasdas"
  49 +
  50 +Remember that it works as a dict object.
  51 +
  52 +.. code-block:: python
  53 +
  54 + assert sorted(obj.keys()) == ["name", "val"]
  55 +
  56 +You can save it.
  57 +
  58 +.. code-block:: python
  59 +
  60 + obj.save()
  61 + obj.name = "test2"
  62 + obj.save()
  63 +
  64 + reloaded = self.conn.get(self.index_name, self.document_type, obj._meta.id)
  65 + assert reloaded.name == "test2"
2  docs/manual/queries.rst
... ... @@ -1,3 +1,5 @@
  1 +.. _pyes-queries:
  2 +
1 3 Queries
2 4 =======
3 5
40 docs/manual/resultset.rst
... ... @@ -0,0 +1,40 @@
  1 +.. _pyes-resultset:
  2 +
  3 +ResultSet
  4 +=========
  5 +
  6 +This object is returned as the result of a query. It is evaluated lazily.
  7 +
  8 +.. code-block:: python
  9 +
  10 + >>> resultset = self.conn.search(Search(MatchAllQuery(), size=20), self.index_name, self.document_type)
  11 +
  12 +It contains the matched records, limited to the requested size, which makes it useful for pagination.
  13 +
  14 +.. code-block:: python
  15 +
  16 + >>> len([p for p in resultset])
  17 + 20
  18 +
  19 +The total number of matched results is available in the ``total`` property.
  20 +
  21 +.. code-block:: python
  22 +
  23 + >>> resultset.total
  24 + 1000
  25 +
  26 +You can slice it.
  27 +
  28 +.. code-block:: python
  29 +
  30 + >>> resultset = self.conn.search(Search(MatchAllQuery(), size=10), self.index_name, self.document_type)
  31 + >>> len([p for p in resultset[:10]])
  32 + 10
  33 +
  34 +Remember that all results are ElasticSearchModel objects by default.
  35 +
  36 +.. code-block:: python
  37 +
  38 + >>> resultset[10].uuid
  39 + "11111"
  40 +
38 docs/manual/usage.rst
... ... @@ -1,12 +1,12 @@
1 1 Usage
2 2 =====
3 3
4   -Creating a connection:
  4 +Creating a connection (see :ref:`pyes-connections` for more details):
5 5
6 6 .. code-block:: python
7 7
8 8 >>> from pyes import *
9   - >>> conn = ES('127.0.0.1:9200')
  9 + >>> conn = ES('127.0.0.1:9200')  # for http
10 10
11 11 Deleting an index:
12 12
@@ -17,7 +17,7 @@ Deleting an index:
17 17 >>> except:
18 18 >>> pass
19 19
20   -(an exception is fored if the index is not present)
  20 +(an exception is raised if the index is not present)
21 21
22 22 Create an index:
23 23
@@ -25,7 +25,7 @@ Create an index:
25 25
26 26 >>> conn.create_index("test-index")
27 27
28   -Creating a mapping:
  28 +Creating a mapping via dictionary:
29 29
30 30 .. code-block:: python
31 31
@@ -52,6 +52,29 @@ Creating a mapping:
52 52 >>> 'type': u'string'}}
53 53 >>> conn.put_mapping("test-type", {'properties':mapping}, ["test-index"])