
Added proper metadata to elasticsearch node; updated documentation; regenerated node reference (to contain the new node)
commit 2f602fc7db22f53ff7ac1a8143981aac46d2c254 1 parent decf1be
Stefan Urbanek authored
3  brewery/nodes/__init__.py
@@ -29,12 +29,15 @@
29 29 "FunctionSelectNode",
30 30 "AuditNode",
31 31
  32 + # Source nodes
32 33 "RowListSourceNode",
33 34 "RecordListSourceNode",
34 35 "StreamSourceNode",
35 36 "CSVSourceNode",
36 37 "YamlDirectorySourceNode",
  38 + "ESSourceNode",
37 39
  40 + # Target nodes
38 41 "RowListTargetNode",
39 42 "RecordListTargetNode",
40 43 "StreamTargetNode",
41 brewery/nodes/source_nodes.py
@@ -525,24 +525,37 @@ def finalize(self):
525 525
526 526 class ESSourceNode(SourceNode):
527 527 """Source node that reads from an ElasticSearch index.
  528 +
  529 + See ElasticSearch home page for more information:
  530 + http://www.elasticsearch.org/
528 531 """
  532 +
529 533 node_info = {
530   - "label" : "SQL Source",
531   - "icon": "sql_source_node",
532   - "description" : "Read data from a sql table.",
  534 + "label" : "ElasticSearch Source",
  535 + "icon": "generic_node",
  536 + "description" : "Read data from ElasticSearch engine",
533 537 "attributes" : [
534   - {
535   - "name": "uri",
536   - "description": "ElasticSearch URL"
  538 + {
  539 + "name": "document_type",
  540 + "description": "ElasticSearch document type name"
  541 + },
  542 + {
  543 + "name": "expand",
  544 + "description": "expand dictionary values and treat children as "\
  545 + " top-level keys with dot '.' separated key path to the child"
537 546 },
538   - {
539   - "name": "index",
540   - "description": "index name",
541   - },
542   - {
543   - "name": "type",
544   - "description": "type name",
545   - }
  547 + {
  548 + "name": "database",
  549 + "description": "database name"
  550 + },
  551 + {
  552 + "name": "host",
  553 + "description": "database server host, default is localhost"
  554 + },
  555 + {
  556 + "name": "port",
  557 + "description": "database server port, default is 27017"
  558 + }
546 559 ]
547 560 }
548 561 def __init__(self, *args, **kwargs):
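
Taken together, the new metadata implies configuration like the following; a sketch only, with placeholder values (note that the quoted default port 27017 is MongoDB's rather than ElasticSearch's usual 9200, so the `database`/`host`/`port` entries look carried over from the MongoDB node's metadata):

    node = ESSourceNode()
    node.document_type = "user"   # ElasticSearch document type name
    node.database = "mydb"        # database name
    node.host = "localhost"       # database server host (the documented default)
    node.port = 27017             # documented default; ElasticSearch itself usually listens on 9200
    node.expand = True            # expand nested dictionaries into dot-separated top-level keys
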
73 doc/node_reference.rst
@@ -44,6 +44,41 @@ read from the file header if specified by `read_header` flag. Field storage type
44 44 * - quotechar
45 45 - character used for quoting string values, default is double quote
46 46
  47 +.. _ESSourceNode:
  48 +
  49 +ElasticSearch Source
  50 +--------------------
  51 +
  52 +.. image:: nodes/generic_node.png
  53 + :align: right
  54 +
  55 +**Synopsis:** *Read data from an ElasticSearch engine*
  56 +
  57 +**Identifier:** es_source (class: :class:`brewery.nodes.ESSourceNode`)
  58 +
  59 +Source node that reads from an ElasticSearch index.
  60 +
  61 +See ElasticSearch home page for more information:
  62 +http://www.elasticsearch.org/
  63 +
  64 +
  65 +.. list-table:: Attributes
  66 + :header-rows: 1
  67 + :widths: 40 80
  68 +
  69 + * - attribute
  70 + - description
  71 + * - document_type
  72 + - ElasticSearch document type name
  73 + * - expand
  74 + - expand dictionary values and treat children as top-level keys with dot '.' separated key path to the child
  75 + * - database
  76 + - database name
  77 + * - host
  78 + - database server host, default is localhost
  79 + * - port
  80 + - database server port, default is 27017
  81 +
47 82 .. _GeneratorFunctionSourceNode:
48 83
49 84 Callable Generator Source
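
Returning to the `expand` attribute documented above: its flattening can be illustrated in plain Python. A sketch only; the `flatten` helper is hypothetical and not part of brewery:

    def flatten(record, prefix=""):
        # Turn {"geo": {"lat": 48.1}} into {"geo.lat": 48.1},
        # i.e. children become top-level keys with dot '.' separated key paths.
        flat = {}
        for key, value in record.items():
            path = prefix + key
            if isinstance(value, dict):
                flat.update(flatten(value, path + "."))
            else:
                flat[path] = value
        return flat

    flatten({"name": "hut", "geo": {"lat": 48.1, "lon": 17.1}})
    # -> {"name": "hut", "geo.lat": 48.1, "geo.lon": 17.1}
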
@@ -213,7 +248,7 @@ Data Stream Source
213 248
214 249 **Identifier:** stream_source (class: :class:`brewery.nodes.StreamSourceNode`)
215 250
216   -Generic data stream source. Wraps a :mod:`brewery.ds` data source and feeds data to the
  251 +Generic data stream source. Wraps a :mod:`brewery.ds` data source and feeds data to the
217 252 output.
218 253
219 254 The source data stream should configure fields on initialize().
@@ -417,9 +452,9 @@ You can use ``**record`` to catch all or the rest of the fields as a dictionary:
417 452
418 453 def get_half(**record):
419 454 return record["i"] / 2
420   -
  455 +
421 456 node.formula = get_half
422   -
  457 +
423 458
424 459 The formula can also be a string with a Python expression where local variables are record field
425 460 values:
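
The string form introduced by the trailing line would read, by analogy with the `get_half` example above:

    node.formula = "i / 2"
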
@@ -539,7 +574,7 @@ Merge Node
539 574
540 575 Merge two or more streams (join).
541 576
542   -Inputs are joined in a star-like fashion: one input is considered master and others are
  577 +Inputs are joined in a star-like fashion: one input is considered master and others are
543 578 details adding information to the master. By default, the master is the first input.
544 579 Joins are specified as a list of tuples: (`input_tag`, `master_input_key`, `other_input_key`).
545 580
@@ -547,7 +582,7 @@ Following configuration code shows how to add region and category details:
547 582
548 583 .. code-block:: python
549 584
550   - node.keys = [ [1, "region_code", "code"],
  585 + node.keys = [ [1, "region_code", "code"],
551 586 [2, "category_code", "code"] ]
552 587
553 588 Master input should have fields `region_code` and `category_code`, other inputs should have
@@ -555,7 +590,7 @@ Master input should have fields `region_code` and `category_code`, other inputs
555 590
556 591 .. code-block:: python
557 592
558   - node.keys = [ [1, "region_code", "code"],
  593 + node.keys = [ [1, "region_code", "code"],
559 594 [2, ("category_code", "year"), ("code", "year")] ]
560 595
561 596 As a key you might use either the name of a single field or a list of fields for compound keys. If
@@ -566,7 +601,7 @@ The detail key might be omitted if it is the same as in the master input:
566 601
567 602 .. code-block:: python
568 603
569   - node.keys = [ [1, "region_code"],
  604 + node.keys = [ [1, "region_code"],
570 605 [2, "category_code"] ]
571 606
572 607 Master input should have fields `region_code` and `category_code`, input #1 should have
@@ -574,7 +609,7 @@ Master input should have fields `region_code` and `category_code`, input #1 shou
574 609
575 610 To filter out fields you do not want in your output or to rename fields you can use `maps`. It
576 611 should be a dictionary where keys are input tags and values are either
577   -:class:`brewery.FieldMap` objects or dictionaries with keys ``rename`` and ``drop``.
  612 +:class:`FieldMap` objects or dictionaries with keys ``rename`` and ``drop``.
578 613
579 614 Following example renames ``source_region_name`` field in input 0 and drops field `id` in
580 615 input 1:
@@ -582,8 +617,8 @@ input 1:
582 617 .. code-block:: python
583 618
584 619 node.maps = {
585   - 0: brewery.FieldMap(rename = {"source_region_name":"region_name"}),
586   - 1: brewery.FieldMap(drop = ["id"])
  620 + 0: FieldMap(rename = {"source_region_name":"region_name"}),
  621 + 1: FieldMap(drop = ["id"])
587 622 }
588 623
589 624 It is the same as:
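
For reference, the dictionary form that the trailing context line introduces would look like this, inferred from the ``rename``/``drop`` description above rather than quoted from the file:

    node.maps = {
        0: {"rename": {"source_region_name": "region_name"}},
        1: {"drop": ["id"]}
    }
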
@@ -656,7 +691,7 @@ and rest is discarded. When it is true, then sample is discarded and rest is pas
656 691
657 692 * - attribute
658 693 - description
659   - * - sample_size
  694 + * - size
660 695 - Size of the sample to be passed to the output
661 696 * - discard
662 697 - flag whether the sample is discarded or included
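
The regenerated reference now documents the attribute as `size` rather than `sample_size`; a usage sketch following the table above:

    node.size = 1000       # size of the sample passed to the output
    node.discard = False   # when True, the sample is discarded and the rest is passed
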
@@ -690,9 +725,9 @@ You can use ``**record`` to catch all or the rest of the fields as a dictionary:
690 725
691 726 def is_big_enough(**record):
692 727 return record["i"] > 1000000
693   -
  728 +
694 729 node.condition = is_big_enough
695   -
  730 +
696 731
697 732 The condition can also be a string with a Python expression where local variables are record field
698 733 values:
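
The string form that the trailing line introduces would read, by analogy with the `is_big_enough` function above:

    node.condition = "i > 1000000"
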
@@ -774,8 +809,6 @@ Binning modes:
774 809 * n-tiles by count or by sum
775 810 * record rank
776 811
777   -
778   -
779 812
780 813 .. _CoalesceValueToTypeNode:
781 814
@@ -924,15 +957,14 @@ For example:
924 957 The generated field will be `amount_threshold` and will contain one of three possible values:
925 958 `low`, `medium`, `high`
926 959
927   -Another possible use case might be for binning after data audit: we want to measure null
  960 +Another possible use case might be for binning after data audit: we want to measure null
928 961 record count and we set thresholds:
929   -
  962 +
930 963 * ratio < 5% is ok
931 964 * 5% <= ratio <= 15% is fair
932 965 * ratio > 15% is bad
933   -
  966 +
934 967 We set thresholds as ``(0.05, 0.15)`` and values to ``("ok", "fair", "bad")``
935   -
936 968
937 969
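
Written out as plain Python for clarity, the three audit bins described above map ratios to values like this (illustrative only, not brewery API):

    def bin_ratio(ratio):
        if ratio < 0.05:      # ratio < 5% is ok
            return "ok"
        if ratio <= 0.15:     # 5% <= ratio <= 15% is fair
            return "fair"
        return "bad"          # ratio > 15% is bad

    bin_ratio(0.02)   # -> "ok"
    bin_ratio(0.10)   # -> "fair"
    bin_ratio(0.20)   # -> "bad"
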
938 970 .. list-table:: Attributes
@@ -971,7 +1003,6 @@ Node that writes rows into a comma separated values (CSV) file.
971 1003 * resource: target object - might be a filename or file-like object
972 1004 * write_headers: write field names as headers into output file
973 1005 * truncate: remove data from file before writing, default: True
974   -
975 1006
976 1007
977 1008 .. list-table:: Attributes
@@ -1232,7 +1263,7 @@ Data Stream Target
1232 1263
1233 1264 **Identifier:** stream_target (class: :class:`brewery.nodes.StreamTargetNode`)
1234 1265
1235   -Generic data stream target. Wraps a :mod:`brewery.ds` data target and feeds data from the
  1266 +Generic data stream target. Wraps a :mod:`brewery.ds` data target and feeds data from the
1236 1267 input to the target stream.
1237 1268
1238 1269 The data target should match stream fields.
30 doc/stores.rst
@@ -42,21 +42,21 @@ information see :mod:`metadata` where you can find more information.
42 42 Data Sources
43 43 ------------
44 44
45   -============= ========================================== ============================
46   -Data source Description Dataset reference
47   -============= ========================================== ============================
48   -csv Comma separated values (CSV) file/URI file path, file-like object,
49   - resource URL
  45 +============== ========================================== ============================
  46 +Data source Description Dataset reference
  47 +============== ========================================== ============================
  48 +csv Comma separated values (CSV) file/URI file path, file-like object,
  49 + resource URL
50 50
51   -xls MS Excel spreadsheet file path, URL
52   -gdoc Google Spreadsheet spreadsheet key or name
53   -sql Relational database table connection + table name
54   -mongodb MongoDB database collection connection + table name
55   -yamldir Directory containing yaml files directory
56   - - one file per record
57   -jsondir Directory containing json files directory
58   - - one file per record (not yet)
59   -============= ========================================== ============================
  51 +xls MS Excel spreadsheet file path, URL
  52 +gdoc Google Spreadsheet spreadsheet key or name
  53 +sql Relational database table connection + table name
  54 +mongodb MongoDB database collection connection + table name
  55 +yamldir Directory containing yaml files directory
  56 + - one file per record
  57 +elasticsearch Elastic Search – Open Source, Distributed,
  58 + RESTful, Search Engine
  59 +============== ========================================== ============================
60 60
61 61 Data sources should implement:
62 62
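
The tags in the table map to data stream classes in :mod:`brewery.ds`. A hedged sketch of the generic source protocol for the new entry (the class name `ESDataSource` and its constructor arguments are assumptions based on the node attributes above):

    from brewery import ds

    src = ds.ESDataSource(document_type="user", database="mydb",
                          host="localhost", port=27017)
    src.initialize()                # connect and configure fields
    for record in src.records():    # records are dictionaries
        print(record)
    src.finalize()
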
@@ -79,6 +79,8 @@ yamldir Directory containing yaml files - one file per record
79 79 jsondir Directory containing json files - one file per record
80 80 (not yet)
81 81 html HTML file or a string target
  82 +elasticsearch Elastic Search – Open Source, Distributed,
  83 + RESTful, Search Engine
82 84 ==================== ======================================================
83 85
84 86 Data targets should implement:
