Pivot table horizontal grouping, is it supported? #97

hfazai · 2022-04-14T13:24:34Z

Thanks for creating this great library 👍 !

I follow this documentation to create a pivot table.

Is it possible to make an horizontal grouping like in this example (add gender to rows) ?

pivot { col1 then col2 } .groupBy { row1 and row2 } col1 and c2 are grouped hierarchically but row1 and row2 are not.

The text was updated successfully, but these errors were encountered:

koperagen · 2022-04-16T22:04:19Z

Hi! Can you provide a screenshot of the desired table and what you get with pivot { col1 then col2 } .groupBy { row1 and row2 }? What i see on the link:

Not sure if this is the desired table

hfazai · 2022-04-17T14:51:02Z

Hi,
With pivot { "Party" } .groupBy { "Province" and "Gender" }.sum("Age") I want to get:

Using dataframe I get Province and Gender pivoted independently :

Alberta, Female, ...
.....
Alberta, Male, ...

hfazai · 2022-04-18T12:37:00Z

@koperagen Maybe we need to have the function then in ColumnsSelectionDsl like in PivotDsl. This way, my expression become: pivot { "Party" } .groupBy { "Province" then "Gender" }.sum("Age")

koperagen · 2022-04-18T15:42:41Z

Thank you for the use case, it's very interesting! Maybe df.pivot { Gender then Party }.groupBy { Province }.sum { Age } does the thing? It yields a dataframe with this schema:

hfazai · 2022-04-19T09:12:31Z

If I have to group only two columns, df.pivot { Gender then Party }.groupBy { Province }.sum { Age } does the thing.
But actualy I'm using dataframe to add the Pivot Table feature to the framework Galite. It should allow grouping data by rows and by columns. How could I make the table below with dataframe?

I have gender and smoker grouped horizontally / day and time are grouped vertically.

koperagen · 2022-04-19T12:02:34Z

Now that you mention horizontal / vertical grouping, documentation states https://kotlin.github.io/dataframe/pivot.html

df.pivot { pivotColumns }.groupBy { indexColumns }
pivotColumns — columns with values for horizontal data grouping and generation of new columns
indexColumns — columns with values for vertical data grouping

According to I have gender and smoker grouped horizontally / day and time are grouped vertically. this code should work:
df.pivot { smoker then gender }.groupBy { day and time }
Does it work as you would expect?

Also, can you share this dataset if it's public?

hfazai · 2022-04-19T12:19:53Z

~~df.pivot { smoker then gender }.groupBy { day and time }~~ Gives:
Update: df.pivot { day then time }.groupBy { gender and smoker } Gives:

~~My actual workaround is to sort Values on the rows and try to span them df.pivot { smoker then gender }.groupBy { day and time }.sortBy { day and time } which gives:~~
Update: My actual workaround is to sort Values on the rows and try to span them df.pivot { day then time }.groupBy { gender and smoker }.sortBy { gender and smoker } which gives:

koperagen · 2022-04-19T12:42:48Z

I see the problem, yes. Can you provide the dataset so i can give it a try myself? It would be easier for me to figure it out in REPL :) Or we can pick other dataset

nikitinas · 2022-04-19T12:52:00Z

Hi, @hfazai. As far as I understand, you need hierarchical grouping of rows, so that df.groupBy { a then b } will return DataFrame with a single row for every distinct a value and every row will contain nested DataFrame grouped by b. Currently DataFrame supports nesting only for columns, but not for rows. It is a good feature request, thank you! We'll consider it

nikitinas · 2022-04-19T12:56:49Z

Currently this is the only way to perform something similar to nested rows:

hfazai · 2022-04-19T13:00:55Z

I see the problem, yes. Can you provide the dataset so i can give it a try myself? It would be easier for me to figure it out in REPL :) Or we can pick other dataset

Try: val df = DataFrame.read("https://pivottable.js.org/examples/mps.csv")

hfazai · 2022-04-19T13:03:27Z

Hi, @hfazai. As far as I understand, you need hierarchical grouping of rows, so that df.groupBy { a then b } will return DataFrame with a single row for every distinct a value and every row will contain nested DataFrame grouped by b. Currently DataFrame supports nesting only for columns, but not for rows. It is a good feature request, thank you! We'll consider it

Hi @nikitinas, yes exactly :)

hfazai · 2022-04-19T13:08:28Z

Currently this is the only way to perform something similar to nested rows:

I'll try it, thanks!
Actually I did a workaround with the sortBy { } and it works fine and it would be great to add this feature by adding then to ColumnsSelectionDsl so that we can call df.groupBy { a then b }

nikitinas · 2022-04-19T13:20:15Z

My actual workaround is to sort Values on the rows and try to span them df.pivot { smoker then gender }.groupBy { day and time }.sortBy { day and time } which gives:

I can't match this image with your code. According to attached screenshot, you do
df.groupBy { gender and smoker }.pivot { day then time }.sortBy { gender and smoker }
And you want df.groupBy { gender then smoker }.pivot { day and time } to have all rows with the same gender stored consequently (by default, without sorting) and also merge equal values in gender vertically. Is it correct?

hfazai · 2022-04-19T13:35:15Z

Yes, sorry I was wrong I copied the last code in my tests, I mean df.groupBy { gender and smoker }.pivot { day then time }.sortBy { gender and smoker }

nikitinas · 2022-04-19T13:59:02Z

Let's discuss implementation.

An obvious solution is to add GroupedDataRow that will store value for the first grouping key and nested DataFrame with other grouping keys. It will be similar to FrameColumn from my previous comment:

but will have a better presentation: nested DataFrames will be injected into the root table without expanders and group column.

This approach raises several questions:

How many rows should return df.rowsCount(): 2 or 4?
Can gender and party columns be addressed directly as df.gender? If yes, will they have size 4 while province have size 2? This will break current restriction that all columns in DataFrame should have equal size.
Should df.sortBy { gender } work across the whole table (breaking original row grouping) or sort only within row groups?
...

I think that fair implementation of row grouping may overcomplicate DataFrame usage. I assume that in most cases grouped rows are needed only for better visual representation, so we can just add some info into DataColumn API with span ranges indicating subsequent values that should be rendered in a single merged cell. So internally nothing will change and such "merged" column will behave just as any other column. What do you think about it?

nikitinas · 2022-04-19T14:02:33Z

And we also should add groupBy { a then b }.

hfazai · 2022-04-19T14:24:57Z

we can just add some info into DataColumn API with span ranges indicating subsequent values that should be rendered in a single merged cell. So internally nothing will change and such "merged" column will behave just as any other column. What do you think about it?

I agree with you concerns about changing the internal implementation. The solution of adding the additional informations about the row grouping looks good to me.
I think to add span ranges indicating subsequent values that should be rendered in a single merged cell, the groupBy { a then b } should sort the rows before.

nikitinas · 2022-04-19T17:40:03Z

I think that sorting should be performed explicitly, because in some cases original order of rows must be preserved. We can add .asc() and .desc() modifiers in column selector, similar to sortBy:
df.groupBy { a.asc() and b.desc() }

hfazai · 2022-04-20T15:03:16Z

Sounds good 👍

Jolanrensen added the enhancement New feature or request label Apr 25, 2023

Jolanrensen added this to the 0.11.0 milestone Apr 25, 2023

zaleslaw modified the milestones: 0.11.0, Backlog Jun 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pivot table horizontal grouping, is it supported? #97

Pivot table horizontal grouping, is it supported? #97

hfazai commented Apr 14, 2022 •

edited

Loading

koperagen commented Apr 16, 2022 •

edited

Loading

hfazai commented Apr 17, 2022

hfazai commented Apr 18, 2022 •

edited

Loading

koperagen commented Apr 18, 2022

hfazai commented Apr 19, 2022

koperagen commented Apr 19, 2022

hfazai commented Apr 19, 2022 •

edited

Loading

koperagen commented Apr 19, 2022

nikitinas commented Apr 19, 2022

nikitinas commented Apr 19, 2022

hfazai commented Apr 19, 2022

hfazai commented Apr 19, 2022

hfazai commented Apr 19, 2022

nikitinas commented Apr 19, 2022 •

edited

Loading

hfazai commented Apr 19, 2022 •

edited

Loading

nikitinas commented Apr 19, 2022 •

edited

Loading

nikitinas commented Apr 19, 2022

hfazai commented Apr 19, 2022

nikitinas commented Apr 19, 2022 •

edited

Loading

hfazai commented Apr 20, 2022

Pivot table horizontal grouping, is it supported? #97

Pivot table horizontal grouping, is it supported? #97

Comments

hfazai commented Apr 14, 2022 • edited Loading

koperagen commented Apr 16, 2022 • edited Loading

hfazai commented Apr 17, 2022

hfazai commented Apr 18, 2022 • edited Loading

koperagen commented Apr 18, 2022

hfazai commented Apr 19, 2022

koperagen commented Apr 19, 2022

hfazai commented Apr 19, 2022 • edited Loading

koperagen commented Apr 19, 2022

nikitinas commented Apr 19, 2022

nikitinas commented Apr 19, 2022

hfazai commented Apr 19, 2022

hfazai commented Apr 19, 2022

hfazai commented Apr 19, 2022

nikitinas commented Apr 19, 2022 • edited Loading

hfazai commented Apr 19, 2022 • edited Loading

nikitinas commented Apr 19, 2022 • edited Loading

nikitinas commented Apr 19, 2022

hfazai commented Apr 19, 2022

nikitinas commented Apr 19, 2022 • edited Loading

hfazai commented Apr 20, 2022

hfazai commented Apr 14, 2022 •

edited

Loading

koperagen commented Apr 16, 2022 •

edited

Loading

hfazai commented Apr 18, 2022 •

edited

Loading

hfazai commented Apr 19, 2022 •

edited

Loading

nikitinas commented Apr 19, 2022 •

edited

Loading

hfazai commented Apr 19, 2022 •

edited

Loading

nikitinas commented Apr 19, 2022 •

edited

Loading

nikitinas commented Apr 19, 2022 •

edited

Loading