# Pivot
Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

Azure ML Data Prep can pivot a Dataflow using the `pivot` transformation.

In [None]:
import azureml.dataprep as dprep

In [None]:
dflow = dprep.auto_read_file('../data/crime-full.csv')

To `pivot` a Dataflow, provide the `columns_to_pivot` and the `value_column`.

The values in the columns to pivot become the new column names in the resulting Dataflow; these names are generated at design time. If multiple columns are selected, each row's values from those columns are concatenated to form a new column name.
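To make the naming rule concrete, here is a plain-Python sketch that derives new column names from one or more pivot columns. It is not the Data Prep implementation: the function `pivot_column_names`, the `"_"` separator, the first-seen ordering, and the sample rows are all assumptions made for illustration.

```python
# Hypothetical sketch of pivot column naming; not the Data Prep internals.
# Assumptions: names keep first-seen order, and multi-column values are
# joined with a separator chosen here for illustration ("_").


def pivot_column_names(rows, columns_to_pivot, sep="_"):
    """Return the distinct concatenated values of columns_to_pivot."""
    names = []
    seen = set()
    for row in rows:
        # Concatenate this row's values for every pivot column.
        name = sep.join(str(row[col]) for col in columns_to_pivot)
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Hypothetical sample rows.
rows = [
    {"Location Description": "STREET", "Arrest": False},
    {"Location Description": "RESIDENCE", "Arrest": True},
    {"Location Description": "STREET", "Arrest": True},
]
print(pivot_column_names(rows, ["Location Description"]))
print(pivot_column_names(rows, ["Location Description", "Arrest"]))
```

With one pivot column this yields `['STREET', 'RESIDENCE']`; with two it yields `['STREET_False', 'RESIDENCE_True', 'STREET_True']`, one name per distinct combination.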

The value column supplies the values that fill these new columns.

Then, provide a `summary_function` and `group_by_columns`: rows are grouped by the `group_by_columns`, and the `summary_function` aggregates the values that fall into each new column within each group.

In the following example, `pivot` generates a new column for each distinct value of the "Location Description" column in the original dataset and fills these columns with the corresponding "Damage Cost" values. It then groups the rows by "Arrest" and "Domestic" and sums the damage cost for each location within each group.
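The summarization step can be sketched in plain Python over a list of dict rows. This only illustrates the SUM-pivot technique described above; the function `pivot_sum`, the sample rows, and the choice to use `None` for a group with no rows at a given location are assumptions, not documented Data Prep behavior.

```python
from collections import defaultdict


def pivot_sum(rows, column_to_pivot, value_column, group_by_columns):
    """Sketch of a SUM pivot over a list of dict rows."""
    # New column names, in first-seen order (an assumption of this sketch).
    names = list(dict.fromkeys(row[column_to_pivot] for row in rows))
    # group key -> {new column name: running total of value_column}
    totals = defaultdict(dict)
    for row in rows:
        key = tuple(row[col] for col in group_by_columns)
        name = row[column_to_pivot]
        totals[key][name] = totals[key].get(name, 0) + row[value_column]
    result = []
    for key, sums in totals.items():
        out = dict(zip(group_by_columns, key))
        for name in names:
            # None marks a group with no rows for this location (assumption).
            out[name] = sums.get(name)
        result.append(out)
    return result


# Hypothetical sample rows, not taken from crime-full.csv.
rows = [
    {"Location Description": "STREET", "Damage Cost": 100, "Arrest": False, "Domestic": False},
    {"Location Description": "STREET", "Damage Cost": 50, "Arrest": False, "Domestic": False},
    {"Location Description": "RESIDENCE", "Damage Cost": 200, "Arrest": True, "Domestic": True},
    {"Location Description": "RESIDENCE", "Damage Cost": 30, "Arrest": False, "Domestic": False},
]
for out in pivot_sum(rows, "Location Description", "Damage Cost", ["Arrest", "Domestic"]):
    print(out)
```

Here the `(False, False)` group gets `STREET` = 150 and `RESIDENCE` = 30, while the `(True, True)` group gets `RESIDENCE` = 200 and no `STREET` value.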

In [None]:
dflow_pivoted = dflow.pivot(columns_to_pivot=['Location Description'],
                            value_column='Damage Cost',
                            summary_function=dprep.SummaryFunction.SUM,
                            group_by_columns=['Arrest', 'Domestic'])
dflow_pivoted.head(5)

Note that the new column names generated from `columns_to_pivot` are stored in the Dataflow step. If an error or null value is encountered while generating the new column names, the default replacement strings `'ERROR'` and `'NULL'` are used. To override these defaults, specify `error_replacement_string` and `null_replacement_string`.
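The replacement rule can be pictured as a small mapping from a pivot-column value to a column name. This plain-Python sketch reuses the defaults stated above and mirrors the two parameter names, but `column_name_for` and the `ErrorValue` class are hypothetical stand-ins; Data Prep represents errors with its own types, which this sketch does not model.

```python
class ErrorValue:
    """Hypothetical stand-in for a Data Prep error value (not the real type)."""


def column_name_for(value, null_replacement_string="NULL",
                    error_replacement_string="ERROR"):
    """Turn one pivot-column value into a new column name."""
    if value is None:
        return null_replacement_string
    if isinstance(value, ErrorValue):
        return error_replacement_string
    return str(value)


values = ["STREET", None, ErrorValue()]
print([column_name_for(v) for v in values])
print([column_name_for(v, null_replacement_string="UNKNOWN") for v in values])
```

The first call yields `['STREET', 'NULL', 'ERROR']`; overriding only the null replacement yields `['STREET', 'UNKNOWN', 'ERROR']`.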

To have more control over the new column names generated from `columns_to_pivot`, create a builder using `Dataflow.builders.pivot`. The builder allows you to preview and modify the new column names before generating a new Dataflow.

In [None]:
builder = dflow.builders.pivot(columns_to_pivot=['Location Description'],
                               value_column='Damage Cost',
                               summary_function=dprep.SummaryFunction.MEAN,
                               group_by_columns=['Arrest', 'Domestic'])
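This builder uses `SummaryFunction.MEAN` instead of `SUM`, so each cell holds an average rather than a total. The plain-Python sketch below collects the values that land in one (group, new column) cell and compares the two summaries; `cell_values` and the sample rows are hypothetical and only illustrate the aggregation, not the Data Prep internals.

```python
from collections import defaultdict
from statistics import mean


def cell_values(rows, column_to_pivot, value_column, group_by_columns):
    """Collect value_column per (group key..., new column name) cell."""
    cells = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_by_columns) + (row[column_to_pivot],)
        cells[key].append(row[value_column])
    return cells


# Hypothetical sample rows.
rows = [
    {"Location Description": "STREET", "Damage Cost": 100, "Arrest": False, "Domestic": False},
    {"Location Description": "STREET", "Damage Cost": 50, "Arrest": False, "Domestic": False},
    {"Location Description": "RESIDENCE", "Damage Cost": 200, "Arrest": True, "Domestic": True},
]
cells = cell_values(rows, "Location Description", "Damage Cost", ["Arrest", "Domestic"])
street = cells[(False, False, "STREET")]
print(sum(street))   # SUM  -> 150
print(mean(street))  # MEAN -> 75
```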

To generate the new column names, call the `learn` method on the builder object.

In [None]:
builder.learn()

To preview the new column names, access the `pivoted_columns` property on the builder object.

In [None]:
builder.pivoted_columns

To modify the new column names, assign a new value to `pivoted_columns` or modify the existing one.

In [None]:
# Keep only the names at positions 1 through 5 (the slice end is exclusive)
builder.pivoted_columns = builder.pivoted_columns[1:6]
builder.pivoted_columns
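The cell above slices `pivoted_columns`, which suggests it behaves like a Python list. Assuming that holds, this plain-Python sketch shows what the `[1:6]` slice keeps and how a predicate could filter the names instead. The names in `learned` are hypothetical placeholders, not the values actually learned from crime-full.csv.

```python
# Hypothetical learned names standing in for builder.pivoted_columns.
learned = ["ALLEY", "APARTMENT", "BAR OR TAVERN", "CTA BUS",
           "DRIVEWAY", "PARKING LOT", "RESIDENCE"]

# A slice [1:6] keeps positions 1 through 5 (the end index is exclusive).
kept = learned[1:6]
print(kept)  # ['APARTMENT', 'BAR OR TAVERN', 'CTA BUS', 'DRIVEWAY', 'PARKING LOT']

# Filtering the existing list with a predicate is another option.
no_transit = [name for name in learned if "BUS" not in name]
print(no_transit)
```

The resulting list would then be assigned back to `builder.pivoted_columns`, as in the cell above.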

Once the desired results are achieved, call `builder.to_dataflow()` to get the new pivoted Dataflow.

In [None]:
dflow_pivoted = builder.to_dataflow()
dflow_pivoted.head(5)