
Transforms

Transforming Data

For datasource transforms such as SourceData/SinkData, see :ref:`data_sources_link`. For the Expression transform, see :ref:`expression_transform_link`.

Column selection/dropping

  • SelectColumns(cols)
    • Selects columns from the table and drops everything else.
  • DropColumns(cols)
    • Drops columns from the table.
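The difference between the two transforms can be sketched in plain Python. This is only an illustration of the behavior described above, not Datamode's implementation: the table is modeled as a list of row dicts, and the helper names `select_columns` and `drop_columns` are hypothetical.

```python
# Illustrative only: a table is modeled as a list of row dicts.
# select_columns / drop_columns are hypothetical helpers, not Datamode's API.

def select_columns(rows, cols):
    """Keep only the named columns and drop everything else."""
    return [{c: row[c] for c in cols} for row in rows]


def drop_columns(rows, cols):
    """Remove the named columns and keep everything else."""
    dropped = set(cols)
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


table = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "email": "grace@example.com"},
]

selected = select_columns(table, ["id", "name"])
remaining = drop_columns(table, ["email"])
```

Here both calls produce the same two-column table; which transform reads better depends on whether the list of columns to keep or the list to discard is shorter.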

Working with nulls / bad values

  • BlanksToNones(cols)
    • Changes whitespace strings into None.
  • ProcessNones(cols, action)
    • Tests for None equivalents (NaT, nan, None, etc.) and performs the action on each match: either drop (drop the whole row) or zero (set the value to zero or blank).
  • StringToNumber(cols)
    • Converts all strings in the columns to floats.
  • DropDuplicates(cols, ignore_nones=False)
    • Makes the values in cols unique by keeping one row per distinct value and dropping the other rows.
    • By default, all None/NULL/NaN records are collapsed into one. Set ignore_nones=True to preserve all None records.
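The semantics of these four transforms can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Datamode's code: the table is a list of row dicts, all helper names are hypothetical, only None and float NaN are treated as None equivalents, the zero action always writes 0, and deduplication keeps the first row it sees (the source does not say which row survives).

```python
import math

# Illustrative only: hypothetical helpers over a list of row dicts.


def is_none_like(value):
    """True for None and float NaN (a stand-in for the None equivalents)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def blanks_to_nones(rows, cols):
    """Replace empty or whitespace-only strings with None."""
    out = []
    for row in rows:
        row = dict(row)
        for c in cols:
            if isinstance(row[c], str) and row[c].strip() == "":
                row[c] = None
        out.append(row)
    return out


def process_nones(rows, cols, action):
    """action='drop' removes the whole row; action='zero' sets the value to 0."""
    out = []
    for row in rows:
        missing = [c for c in cols if is_none_like(row[c])]
        if missing and action == "drop":
            continue
        row = dict(row)
        if action == "zero":
            for c in missing:
                row[c] = 0
        out.append(row)
    return out


def string_to_number(rows, cols):
    """Convert string values in cols to floats; leave other values untouched."""
    return [
        {k: float(v) if k in cols and isinstance(v, str) else v
         for k, v in row.items()}
        for row in rows
    ]


def drop_duplicates(rows, cols, ignore_nones=False):
    """Keep the first row for each distinct combination of values in cols."""
    seen, out = set(), []
    for row in rows:
        key = tuple(None if is_none_like(row[c]) else row[c] for c in cols)
        if ignore_nones and any(k is None for k in key):
            out.append(row)  # assumption: rows with a None key are never merged
            continue
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


table = [
    {"sku": "a", "price": "1.5"},
    {"sku": "a", "price": "2.0"},
    {"sku": "  ", "price": "3.0"},
    {"sku": "", "price": "4.0"},
]

cleaned = blanks_to_nones(table, ["sku"])
numeric = string_to_number(cleaned, ["price"])
collapsed = drop_duplicates(numeric, ["sku"])
preserved = drop_duplicates(numeric, ["sku"], ignore_nones=True)
without_nones = process_nones(numeric, ["sku"], "drop")
zeroed = process_nones(numeric, ["sku"], "zero")
```

With this sample, `collapsed` keeps one row for "a" and one row for None, while `preserved` keeps both None rows because ignore_nones=True exempts them from deduplication.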

Distributions

  • DropNumericalOutliers(cols, drop_threshold_mult)
    • Drops rows whose values lie more than drop_threshold_mult standard deviations from the column average.
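As a rough illustration of the threshold rule, the sketch below filters a single column in plain Python. It is not Datamode's implementation: the helper name is hypothetical, the table is a list of row dicts, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample one is an assumption.

```python
import statistics

# Illustrative only: hypothetical helper over a list of row dicts.


def drop_numerical_outliers(rows, col, drop_threshold_mult):
    """Keep rows within drop_threshold_mult standard deviations of the mean."""
    values = [row[col] for row in rows]
    mean = statistics.mean(values)
    # Assumption: population standard deviation; the source does not say which.
    limit = drop_threshold_mult * statistics.pstdev(values)
    return [row for row in rows if abs(row[col] - mean) <= limit]


readings = [{"latency_ms": v} for v in [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]]
kept = drop_numerical_outliers(readings, "latency_ms", drop_threshold_mult=2)
```

Note that the outlier itself inflates both the mean and the standard deviation, so a small multiplier on a small sample may keep more rows than expected.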

Working with dates

  • CanonicalizeDate(cols, format=None)
    • Converts all string and datetime-string columns into Python datetime columns.
    • You can optionally specify a format, which is passed to datetime.datetime.strptime. Supplying a format can make the conversion substantially faster.
  • CombineDateAndTimeFragments(col_date, col_time, col_new, consume_originals)
    • Combines a date column and a time column into a new column col_new.
    • Setting consume_originals=True will drop the original columns.
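The two date transforms can be sketched with the standard `datetime` module. This is an illustration under stated assumptions, not Datamode's code: the helper names are hypothetical, the table is a list of row dicts, and when no format is given the sketch falls back to `datetime.fromisoformat` as a stand-in for whatever format detection Datamode performs.

```python
from datetime import date, datetime, time

# Illustrative only: hypothetical helpers over a list of row dicts.


def canonicalize_date(rows, col, format=None):
    """Parse string values in col into datetime objects.

    The parameter is named `format` only to mirror the documented signature.
    """
    out = []
    for row in rows:
        row = dict(row)
        value = row[col]
        if isinstance(value, str):
            if format is not None:
                value = datetime.strptime(value, format)
            else:
                # Assumption: ISO 8601 input when no format is supplied.
                value = datetime.fromisoformat(value)
        row[col] = value
        out.append(row)
    return out


def combine_date_and_time_fragments(rows, col_date, col_time, col_new,
                                    consume_originals):
    """Merge a date column and a time column into one datetime column."""
    out = []
    for row in rows:
        row = dict(row)
        row[col_new] = datetime.combine(row[col_date], row[col_time])
        if consume_originals:
            del row[col_date], row[col_time]
        out.append(row)
    return out


events = [{"created": "01/03/2019 08:30"}]
parsed = canonicalize_date(events, "created", format="%d/%m/%Y %H:%M")

fragments = [{"day": date(2019, 3, 1), "clock": time(8, 30)}]
combined = combine_date_and_time_fragments(
    fragments, "day", "clock", "stamp", consume_originals=True
)
```

An explicit format also removes ambiguity: the string "01/03/2019" is parsed here as 1 March because the format puts the day first.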

Aggregations

  • CombineTables(merged_table, [tables], [relations])
    • Combines tables into merged_table. The tables are joined on the first element of relations.
    • Currently only one relation is supported, with one column on each side.
    • The tables and relations arguments must each be a list.
  • NormalizeJson(cols)
    • Flattens JSON objects in a column and creates a new column/row for every key/value.
    • New column names will be assigned by using the original column name as a prefix and the JSON key as the suffix.
    • For example, column 'A' has object {"foo": 1, "bar": 2}. This will create two columns named A.foo and A.bar with values 1 and 2, respectively.
  • Expression(expr)
    • See :ref:`expression_transform_link`.
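The join and the JSON flattening described above can be sketched in plain Python. This is an illustration only, not Datamode's implementation, and it rests on several assumptions the source does not confirm: the helper names are hypothetical, a relation is modeled as a `(left_column, right_column)` pair, the join is an inner join, the merged table is returned rather than registered under a name, and the original JSON column is removed after flattening.

```python
import json

# Illustrative only: hypothetical helpers over lists of row dicts.


def combine_tables(tables, relations):
    """Inner-join two tables on the first relation (one column per side)."""
    left, right = tables
    left_col, right_col = relations[0]  # assumed shape: (left_col, right_col)
    index = {}
    for right_row in right:
        index.setdefault(right_row[right_col], []).append(right_row)
    merged = []
    for left_row in left:
        for right_row in index.get(left_row[left_col], []):
            merged.append({**left_row, **right_row})
    return merged


def normalize_json(rows, col):
    """Flatten a JSON object column into one new column per key."""
    out = []
    for row in rows:
        row = dict(row)
        obj = row.pop(col)  # assumption: the original column is dropped
        if isinstance(obj, str):
            obj = json.loads(obj)
        for key, value in obj.items():
            row[f"{col}.{key}"] = value  # original name as prefix, key as suffix
        out.append(row)
    return out


orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 11},
    {"order_id": 3, "customer_id": 99},
]
customers = [{"id": 10, "name": "Ada"}, {"id": 11, "name": "Grace"}]
merged = combine_tables([orders, customers], [("customer_id", "id")])

flat = normalize_json([{"id": 1, "A": '{"foo": 1, "bar": 2}'}], "A")
```

The `flat` result mirrors the example above: column A becomes the columns A.foo and A.bar with values 1 and 2.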

In future versions, Datamode will let you specify your own transforms or functions that can run at multiple data levels (cell/row/column/table/multitable).