### group by

In [1]:
import pandas as pd

In [2]:
df_books = pd.read_csv('/work/bestsellers.csv', sep=',', header=0)
df_books.head(5)

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction


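groupby() splits a DataFrame into groups based on the values of one or more columns, so that an aggregation function can be applied to each group. In the cells below, count() returns the number of non-null values in each column per author, and median() returns the median of each numeric column per author.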

In [10]:
df_books.groupby('Author').count().head(3)

Author,Name,User Rating,Reviews,Price,Year,Genre
Abraham Verghese,2,2,2,2,2,2
Adam Gasiewski,1,1,1,1,1,1
Adam Mansbach,1,1,1,1,1,1


In [9]:
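# numeric_only=True restricts the median to numeric columns; recent pandas versions raise a TypeError on text columns otherwise
df_books.groupby('Author').median(numeric_only=True).head(3)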

Author,User Rating,Reviews,Price,Year
Abraham Verghese,4.6,4866.0,11.0,2010.5
Adam Gasiewski,4.4,3113.0,6.0,2017.0
Adam Mansbach,4.8,9568.0,9.0,2011.0
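
To make the difference between count() and median() easier to see, here is a minimal sketch on a tiny DataFrame. The `toy` data is hypothetical and only mirrors a few columns of the bestsellers file.

```python
import pandas as pd

# Hypothetical toy data (not taken from bestsellers.csv).
toy = pd.DataFrame({
    "Author": ["A", "A", "B"],
    "Name": ["Book 1", "Book 1", "Book 2"],
    "User Rating": [4.6, 4.6, 4.4],
    "Reviews": [100, 300, 50],
})

# count() reports the number of non-null values per column in each group.
counts = toy.groupby("Author").count()

# median() is only defined for numbers; numeric_only=True skips text columns.
medians = toy.groupby("Author").median(numeric_only=True)

print(counts)
print(medians)
```

Author "A" appears in two rows, so every column of `counts` shows 2 for that group, while `medians` keeps only the numeric columns.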


This cell groups the data by author, sums each numeric column within every group, and then uses .loc to return the row for William Davis.

In [5]:
# numeric_only=True keeps text columns (Name, Genre) out of the sum
df_books.groupby('Author').sum(numeric_only=True).loc['William Davis']

User Rating        8.8
Reviews        14994.0
Price             12.0
Year            4025.0
Name: William Davis, dtype: float64
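
Because the grouping column becomes the index of the result, .loc can look up one group by its label. The sketch below uses hypothetical toy data to show this, along with an equivalent approach that filters the rows first and then sums.

```python
import pandas as pd

# Hypothetical toy data (not taken from bestsellers.csv).
toy = pd.DataFrame({
    "Author": ["A", "A", "B"],
    "User Rating": [4.4, 4.4, 4.8],
    "Reviews": [7497, 7497, 9568],
})

# After groupby().sum(), the index holds the author names.
totals = toy.groupby("Author").sum(numeric_only=True)

# .loc with an index label returns that group's row as a Series.
row_a = totals.loc["A"]

# Same totals, computed by filtering the rows before summing.
direct = toy.loc[toy["Author"] == "A", ["User Rating", "Reviews"]].sum()

print(row_a)
print(direct)
```

Filtering first avoids aggregating groups that are never used, which may matter on larger data.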

reset_index() replaces the current index with the default integer index starting from 0 and moves the old index into a regular column. After groupby(), the grouping key (Author) becomes the index of the result; calling reset_index() turns it back into an ordinary column.

In [6]:
# numeric_only=True keeps text columns (Name, Genre) out of the sum
df_books.groupby('Author').sum(numeric_only=True).reset_index().head(3)

Unnamed: 0,Author,User Rating,Reviews,Price,Year
0,Abraham Verghese,9.2,9732,22,4021
1,Adam Gasiewski,4.4,3113,6,2017
2,Adam Mansbach,4.8,9568,9,2011


The agg() method applies one or more functions to each group. Here, the "min" and "max" functions are applied to every column for each author. For text columns such as Name and Genre, they return the alphabetically first and last values.

In [7]:
df_books.groupby('Author').agg(['min','max'])

,Name,Name,User Rating,User Rating,Reviews,Reviews,Price,Price,Year,Year,Genre,Genre
Author,min,max,min,max,min,max,min,max,min,max,min,max
Abraham Verghese,Cutting for Stone,Cutting for Stone,4.6,4.6,4866,4866,11,11,2010,2011,Fiction,Fiction
Adam Gasiewski,Milk and Vine: Inspirational Quotes From Class...,Milk and Vine: Inspirational Quotes From Class...,4.4,4.4,3113,3113,6,6,2017,2017,Non Fiction,Non Fiction
Adam Mansbach,Go the F**k to Sleep,Go the F**k to Sleep,4.8,4.8,9568,9568,9,9,2011,2011,Fiction,Fiction
Adir Levy,What Should Danny Do? (The Power to Choose Ser...,What Should Danny Do? (The Power to Choose Ser...,4.8,4.8,8170,8170,13,13,2019,2019,Fiction,Fiction
Admiral William H. McRaven,Make Your Bed: Little Things That Can Change Y...,Make Your Bed: Little Things That Can Change Y...,4.7,4.7,10199,10199,11,11,2017,2017,Non Fiction,Non Fiction
...,...,...,...,...,...,...,...,...,...,...,...,...
Walter Isaacson,Leonardo da Vinci,Steve Jobs,4.5,4.6,3014,7827,20,21,2011,2017,Non Fiction,Non Fiction
William Davis,"Wheat Belly: Lose the Wheat, Lose the Weight, ...","Wheat Belly: Lose the Wheat, Lose the Weight, ...",4.4,4.4,7497,7497,6,6,2012,2013,Non Fiction,Non Fiction
William P. Young,The Shack: Where Tragedy Confronts Eternity,The Shack: Where Tragedy Confronts Eternity,4.6,4.6,19720,19720,8,8,2009,2017,Fiction,Fiction
Wizards RPG Team,Player's Handbook (Dungeons & Dragons),Player's Handbook (Dungeons & Dragons),4.8,4.8,16990,16990,27,27,2017,2019,Fiction,Fiction
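
Passing a list of functions to agg() produces two-level column labels (column name, function name). The sketch below, on hypothetical toy data, shows how those columns can be selected afterwards.

```python
import pandas as pd

# Hypothetical toy data (not taken from bestsellers.csv).
toy = pd.DataFrame({
    "Author": ["A", "A", "B"],
    "Reviews": [3014, 7827, 9568],
    "Price": [20, 21, 9],
})

# Each original column gets one sub-column per aggregation function.
stats = toy.groupby("Author").agg(["min", "max"])

# A tuple selects a single (column, function) pair as a Series.
reviews_max = stats[("Reviews", "max")]

# The top-level name alone selects all sub-columns for that column.
reviews_only = stats["Reviews"]

print(stats.columns.tolist())
print(reviews_max)
print(reviews_only)
```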


The next cell passes a dictionary to agg() so that different functions are applied to different columns. In total, it applies three aggregation functions across two columns of the grouped data:

agg({'Reviews':['min','max']}) applies the "min" and "max" functions to the "Reviews" column, giving the minimum and maximum number of reviews for each group.

agg({'User Rating':'sum'}) applies the "sum" function to the "User Rating" column, giving the total of the ratings for each group.

The output is a new DataFrame with one row for each unique author in the "Author" column. Its columns form a two-level MultiIndex: ("Reviews", "min"), ("Reviews", "max"), and ("User Rating", "sum"). The first two show the minimum and maximum values of the "Reviews" column for each author, and the third shows the sum of the "User Rating" column for each author.

In [8]:
df_books.groupby('Author').agg({'Reviews':['min','max'], 'User Rating':'sum'})

,Reviews,Reviews,User Rating
Author,min,max,sum
Abraham Verghese,4866,4866,9.2
Adam Gasiewski,3113,3113,4.4
Adam Mansbach,9568,9568,4.8
Adir Levy,8170,8170,4.8
Admiral William H. McRaven,10199,10199,4.7
...,...,...,...
Walter Isaacson,3014,7827,13.7
William Davis,7497,7497,8.8
William P. Young,19720,19720,9.2
Wizards RPG Team,16990,16990,14.4
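
If flat column names such as "Reviews_min" are preferred over two-level labels, there are at least two ways to get them. The sketch below, on hypothetical toy data, either joins the two levels after aggregating or uses named aggregation, where each keyword argument becomes an output column name. The name `User_Rating_sum` is chosen here only because keyword arguments cannot contain spaces.

```python
import pandas as pd

# Hypothetical toy data (not taken from bestsellers.csv).
toy = pd.DataFrame({
    "Author": ["A", "A", "B"],
    "Reviews": [3014, 7827, 9568],
    "User Rating": [4.5, 4.6, 4.8],
})

# Dictionary form: the result has two-level column labels.
nested = toy.groupby("Author").agg({"Reviews": ["min", "max"], "User Rating": "sum"})

# Option 1: join the two levels into a single string per column.
flat = nested.copy()
flat.columns = ["_".join(col) for col in flat.columns]

# Option 2: named aggregation, output_name=(input_column, function).
named = toy.groupby("Author").agg(
    Reviews_min=("Reviews", "min"),
    Reviews_max=("Reviews", "max"),
    User_Rating_sum=("User Rating", "sum"),
)

print(flat.columns.tolist())
print(named)
```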

