Grouped DataFrame structure grouping DataFrame rows by column value.
df
DataFrame The DataFrame to group by.
columnNames
groups
hashes
columnName
String The column used for the group by.
Convert GroupedDataFrame into a collection (Array) of dictionaries (Object).
groupedDF.toCollection();
Returns Array An Array of groups, each of the form {groupKey, group}.
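To make the {groupKey, group} shape concrete, here is a minimal plain-JavaScript sketch of grouping rows by column value. It does not use the library: `groupRows` is a hypothetical helper, rows are plain objects, and each `group` is a plain array, whereas the library may hold each group in a different type.

```javascript
// Plain-JavaScript sketch, not the library's implementation.
// groupRows is a hypothetical helper name.
function groupRows(rows, ...columnNames) {
  const groups = new Map();
  for (const row of rows) {
    // Build the group key from the grouping columns only.
    const groupKey = {};
    for (const name of columnNames) groupKey[name] = row[name];
    // Use a serialized key so equal keys land in the same bucket.
    const hash = JSON.stringify(groupKey);
    if (!groups.has(hash)) groups.set(hash, { groupKey, group: [] });
    groups.get(hash).group.push(row);
  }
  return [...groups.values()];
}

const cars = [
  { carType: 'suv', kms: 10 },
  { carType: 'van', kms: 5 },
  { carType: 'suv', kms: 7 },
];
const collection = groupRows(cars, 'carType');
```

Here `collection` holds two entries, one per distinct `carType`, in order of first appearance.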
Display the GroupedDataFrame as String Table.
quiet
Boolean Quiet mode. If true, it doesn't trigger console.log(). (optional, default false)
groupedDf.show()
Returns String The GroupedDataFrame as String Table.
List GroupedDataFrame groups.
gdf.listGroups()
Returns Array An Array containing GroupedDataFrame group names.
List GroupedDataFrame groups as hash codes.
gdf.listHashCodes()
Returns Array An Array containing GroupedDataFrame hash codes.
Map on DataFrame groups.
func
Function The function to apply to each row of each group.
groupedDF.map((row,i) => row.set('b', row.get('a')*i));
Returns DataFrame A new DataFrame containing the result.
Filter a grouped DataFrame.
condition
Function A filter function or a column/value object.
groupedDF.filter((row,i) => (i === 0));
Returns DataFrame A new filtered DataFrame.
Chain map and filter functions on a DataFrame while optimizing their execution. If a function returns a boolean, it is treated as a filter; otherwise it is treated as a map. This can be 10 to 100 times faster than standard chains of .map() and .filter().
funcs
...Function Functions to apply on the DataFrame rows taking the row as parameter.
groupedDF.chain(
(row, i) => (i === 0), // filter
row => row.set('column1', 3), // map
row => row.get('column2') === '5' // filter
)
Returns DataFrame A new DataFrame with modified rows.
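The boolean-versus-map rule can be illustrated with a single-pass loop over plain objects. This is only a sketch under assumptions: `chainRows` is a hypothetical helper, rows are plain objects rather than library rows, and `i` is taken to be the row's original index. The library's actual optimization may work differently.

```javascript
// Plain-JavaScript sketch, not the library's implementation.
// chainRows is a hypothetical helper name.
function chainRows(rows, ...funcs) {
  const result = [];
  rows.forEach((row, i) => {
    let current = row;
    for (const func of funcs) {
      const out = func(current, i);
      if (out === false) return;       // a filter rejected the row: skip it
      if (out !== true) current = out; // a non-boolean result is the mapped row
    }
    result.push(current);
  });
  return result;
}

const sampleRows = [
  { column1: 1, column2: '5' },
  { column1: 2, column2: '4' },
  { column1: 9, column2: '5' },
];
const chained = chainRows(
  sampleRows,
  (row, i) => i < 2,                 // filter
  row => ({ ...row, column1: 3 }),   // map
  row => row.column2 === '5'         // filter
);
```

Each row passes through all functions once, so no intermediate arrays are built between steps. In this example only the first row survives both filters.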
Create an aggregation from a function.
func
Function The aggregation function.
columnName
String The column name created by the aggregation. (optional, default 'aggregation')
groupedDF.aggregate(group => group.stat.sum('column1'));
Returns DataFrame A new DataFrame with a column 'aggregation' containing the result.
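As a rough illustration of per-group aggregation, the sketch below applies a function to each group and stores the result under a configurable column name. It works on plain objects: `aggregateGroups` and `sumColumn` are hypothetical helpers, and keeping the grouping columns next to the result is an assumption of the sketch.

```javascript
// Plain-JavaScript sketch, not the library's implementation.
// aggregateGroups and sumColumn are hypothetical helper names.
function aggregateGroups(groups, func, columnName = 'aggregation') {
  return groups.map(({ groupKey, group }) => ({
    ...groupKey,               // assumption: grouping columns are kept
    [columnName]: func(group), // one aggregated value per group
  }));
}

const sumColumn = (rows, name) =>
  rows.reduce((total, row) => total + row[name], 0);

const carGroups = [
  {
    groupKey: { carType: 'suv' },
    group: [
      { carType: 'suv', column1: 10 },
      { carType: 'suv', column1: 7 },
    ],
  },
  { groupKey: { carType: 'van' }, group: [{ carType: 'van', column1: 5 }] },
];
const totals = aggregateGroups(carGroups, group => sumColumn(group, 'column1'));
```

The result has one row per group, with the sum stored in the `aggregation` column.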
Pivot a GroupedDataFrame.
columnToPivot
String The column which will be transposed as columns.
func
Function The function to define each column value from a DataFrame. (optional, default (gdf)=>gdf.count())
df.groupBy('carType').pivot('carModel', values => values.stat.sum('kms'))
Returns DataFrame The pivot DataFrame.
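The pivot idea can be sketched on plain objects: each distinct value of the pivoted column becomes an output column, and its cell is computed from the rows that share the group and that value. `pivotRows` is a hypothetical helper, and leaving out cells with no matching rows is a choice of this sketch, not something the description above specifies.

```javascript
// Plain-JavaScript sketch, not the library's implementation.
// pivotRows is a hypothetical helper name.
function pivotRows(rows, groupColumn, columnToPivot, func = values => values.length) {
  const table = new Map(); // group value -> Map(pivot value -> rows)
  for (const row of rows) {
    const groupValue = row[groupColumn];
    if (!table.has(groupValue)) table.set(groupValue, new Map());
    const cells = table.get(groupValue);
    const pivotValue = row[columnToPivot];
    if (!cells.has(pivotValue)) cells.set(pivotValue, []);
    cells.get(pivotValue).push(row);
  }
  // One output row per group, one column per pivoted value seen in that group.
  return [...table].map(([groupValue, cells]) => {
    const out = { [groupColumn]: groupValue };
    for (const [pivotValue, values] of cells) out[pivotValue] = func(values);
    return out;
  });
}

const trips = [
  { carType: 'suv', carModel: 'x1', kms: 10 },
  { carType: 'suv', carModel: 'x2', kms: 3 },
  { carType: 'suv', carModel: 'x1', kms: 7 },
  { carType: 'van', carModel: 'v9', kms: 5 },
];
const pivoted = pivotRows(trips, 'carType', 'carModel',
  values => values.reduce((total, row) => total + row.kms, 0));
```

In this example the `suv` row gets columns `x1` and `x2`, and the `van` row gets `v9`.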
Melt a DataFrame to make it tidy. It's the reverse of GroupedDataFrame.pivot().
variableColumnName
String The column name containing values. (optional, default 'value')
valueColumnName
(optional, default "value")
df.groupBy('carType').melt('kms')
Returns DataFrame The tidy DataFrame.
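Melting can be pictured as the inverse reshaping: every non-identifier column of a wide row becomes its own row holding the column name and its value. The sketch below uses plain objects: `meltRows` is a hypothetical helper, and both output column names are passed explicitly rather than relying on defaults.

```javascript
// Plain-JavaScript sketch, not the library's implementation.
// meltRows is a hypothetical helper name.
function meltRows(rows, idColumns, variableColumnName, valueColumnName) {
  const result = [];
  for (const row of rows) {
    for (const [name, value] of Object.entries(row)) {
      if (idColumns.includes(name)) continue; // identifier columns stay as-is
      const out = {};
      for (const id of idColumns) out[id] = row[id];
      out[variableColumnName] = name; // former column name
      out[valueColumnName] = value;   // former cell value
      result.push(out);
    }
  }
  return result;
}

const wide = [
  { carType: 'suv', x1: 17, x2: 3 },
  { carType: 'van', v9: 5 },
];
const tidy = meltRows(wide, ['carType'], 'carModel', 'kms');
```

Each wide row expands into one tidy row per non-identifier column, so `tidy` contains three rows here.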