
Frequency Transformer

Martin Danielsson edited this page Jul 30, 2015 · 2 revisions

Calculating Source Value Frequencies

Using the Frequency Transform from the statistics plugin, you can calculate value frequencies for arbitrary expressions (parameters).

Unlike, for example, the SAP Transformer, this transform does not render any output fields; instead, it writes its results to an additional CSV file.

To define a frequency transform, use the following structure:

<SourceTransform>
  <Transform config="[output config]">stats://frequency</Transform>
  <Parameters>
    <Parameter name="[param name]">[expression]</Parameter>
    ...
  </Parameters>
  <Settings>
    <Setting name="target">file://[path to target CSV]</Setting>
  </Settings>
</SourceTransform>

For a definition of output config, see the CSV Writer which is leveraged internally for writing the output CSV file. Note that the Frequency Plugin always omits the headers when writing the result.

The output CSV file is written to the file given in path to target CSV. Currently, only writing to files, and only to CSV files, is supported. If you pass anything other than file:// as a prefix, the transform will throw an error. The file suffix is ignored; even if you pass .xml, a CSV file will be written.

In the parameters, you can specify for which expressions a frequency analysis should be performed. This can be any kind of expression which NFT is able to parse. In many cases, you just pass source fields (e.g. $LastName) here, but in some cases you may also want to use boolean expressions or concatenated fields.

If the expression evaluates to an empty string for some records, the frequency transform counts those records under an extra value, (empty), which is also included in the frequency analysis.

Examples: <Parameter name="HasFirstName">Not(IsEmpty($FirstName))</Parameter> calculates whether a value is present in the field FirstName or not. This will render an output like this:

HasFirstName
Value;Frequency
false;29923
true;662886

<Parameter name="Country">$Country</Parameter> will create a histogram of the values in the field Country; example output:

Country
Value;Frequency
Germany;129
France;38
UK;1159
Sweden;98
USA;2278
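
Putting the two examples above together, a complete transform definition might look like the following sketch. The target path is hypothetical, and [output config] is left as a placeholder because its format is defined by the CSV Writer. The XML comments are only annotations for this example.

```xml
<SourceTransform>
  <!-- [output config] is a placeholder; see the CSV Writer for its format -->
  <Transform config="[output config]">stats://frequency</Transform>
  <Parameters>
    <!-- Both expressions are taken from the examples above -->
    <Parameter name="HasFirstName">Not(IsEmpty($FirstName))</Parameter>
    <Parameter name="Country">$Country</Parameter>
  </Parameters>
  <Settings>
    <!-- Hypothetical path (an assumption); only the file:// prefix is supported -->
    <Setting name="target">file://C:\Temp\frequencies.csv</Setting>
  </Settings>
</SourceTransform>
```

With a definition along these lines, the frequency tables for both HasFirstName and Country would be expected to end up in the one target file.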

See also:

  • There is an example in the Nil Writer documentation