# Grouping and Aggregating with Pandas:
<h1>Aggregation in Pandas:</h1>
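<P>Aggregation means applying a function that summarizes data, reducing many values to one or a few (for example a sum or a mean). In Pandas the <b>agg()</b> method does this; its argument is the function to apply, or a list of functions, usually given by name as strings such as "sum".<br>
Some commonly used aggregation functions are:</P>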
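<table>
<tr><th>Function</th><th>Description</th></tr>
<tr><td><b>sum()</b></td><td>Sum of the column values</td></tr>
<tr><td><b>min()</b></td><td>Minimum of the column values</td></tr>
<tr><td><b>max()</b></td><td>Maximum of the column values</td></tr>
<tr><td><b>mean()</b></td><td>Mean of the column values</td></tr>
<tr><td><b>size()</b></td><td>Number of rows in each group, missing values included</td></tr>
<tr><td><b>describe()</b></td><td>Descriptive statistics (count, mean, std, min, quartiles, max)</td></tr>
<tr><td><b>first()</b></td><td>First value in each group</td></tr>
<tr><td><b>last()</b></td><td>Last value in each group</td></tr>
<tr><td><b>count()</b></td><td>Number of non-missing values in the column</td></tr>
<tr><td><b>std()</b></td><td>Standard deviation of the column values</td></tr>
<tr><td><b>var()</b></td><td>Variance of the column values</td></tr>
<tr><td><b>sem()</b></td><td>Standard error of the mean of the column values</td></tr>
</table>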

In [1]:
import pandas as pd

df = pd.DataFrame([[9, 4, 8, 9],
                   [8, 10, 7, 6],
                   [7, 6, 8, 5]],
                  columns=['Maths',  'English', 
                           'Science', 'History'])

print(df)

   Maths  English  Science  History
0      9        4        8        9
1      8       10        7        6
2      7        6        8        5


<b>Applying Multiple Aggregations at Once (agg()):</b>

In [3]:
print(df.agg(["sum","min","max","mean"]))

      Maths    English    Science    History
sum    24.0  20.000000  23.000000  20.000000
min     7.0   4.000000   7.000000   5.000000
max     9.0  10.000000   8.000000   9.000000
mean    8.0   6.666667   7.666667   6.666667
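<p>Besides a list of names, agg() also accepts a single function name or a dict that maps column names to functions. The sketch below rebuilds the marks table from above so that it runs on its own; the variable names <b>totals</b> and <b>per_column</b> are chosen here for illustration only.</p>

```python
import pandas as pd

# Same marks table as in the example above
df = pd.DataFrame([[9, 4, 8, 9],
                   [8, 10, 7, 6],
                   [7, 6, 8, 5]],
                  columns=['Maths', 'English',
                           'Science', 'History'])

# A single function name gives one value per column (a Series)
totals = df.agg("sum")
print(totals)

# A dict applies different functions to different columns;
# cells with no matching function are filled with NaN
per_column = df.agg({"Maths": ["min", "max"], "English": ["mean"]})
print(per_column)
```

<p>The dict form is handy when columns need different summaries, for example a range for one subject and an average for another.</p>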


<h1>Grouping in Pandas:</h1>
Grouping in Pandas means splitting the rows of a DataFrame into groups that share the same value in one or more columns.<br>
The <b>groupby()</b> method follows a <b>split-apply-combine</b> process:<br>
<p><b>Split</b> the data into groups.<br>
<b>Apply</b> a calculation such as a sum or an average to each group.<br>
<b>Combine</b> the results into a new table.</p>
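<p>The three steps can be written out by hand to see what groupby() does for us. This is only a minimal sketch: the small <b>sales</b> table and the names <b>totals</b>, <b>manual</b> and <b>builtin</b> are assumptions made for illustration.</p>

```python
import pandas as pd

# Illustrative data: item names and prices
sales = pd.DataFrame({
    "Item": ["Cake", "Cake", "Bread", "Pastry", "Cake"],
    "Price": [250, 220, 80, 120, 250],
})

# Split: iterating over a groupby yields (key, sub-DataFrame) pairs
# Apply: sum the prices inside each sub-DataFrame
totals = {}
for item, group in sales.groupby("Item"):
    totals[item] = group["Price"].sum()

# Combine: collect the per-group results into one Series
manual = pd.Series(totals, name="Price")
print(manual)

# The usual one-liner performs the same three steps
builtin = sales.groupby("Item")["Price"].sum()
print(builtin)
```

<p>Both approaches should give the same totals per item; in practice the one-liner is shorter and faster than an explicit loop.</p>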

In [5]:
import pandas as pd

data = {
    'Item': ['Cake', 'Cake', 'Bread', 'Pastry', 'Cake'],
    'Flavor': ['Chocolate', 'Vanilla', 'Whole Wheat', 'Strawberry', 'Chocolate'],
    'Price': [250, 220, 80, 120, 250]
}

df2 = pd.DataFrame(data)
print(df2)

     Item       Flavor  Price
0    Cake    Chocolate    250
1    Cake      Vanilla    220
2   Bread  Whole Wheat     80
3  Pastry   Strawberry    120
4    Cake    Chocolate    250


<h3>1. Grouping Data by One Column Using groupby():</h3>

In [7]:
grouped = df2.groupby("Item")
print(grouped)

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x00000243D045B830>


<P>This does not show the result directly; it only creates a grouped object. To see actual data, apply a method such as .sum(), .mean() or .first() to it.</P>

In [8]:
# Print the total price for each item:
print(df2.groupby("Item")["Price"].sum())

Item
Bread      80
Cake      720
Pastry    120
Name: Price, dtype: int64
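<p>Other methods can be applied to the same grouped object in the same way. The sketch below rebuilds the bakery table so that it runs on its own and tries .first(), .mean() and .size(); the result variable names are illustrative.</p>

```python
import pandas as pd

# Same bakery table as above
df2 = pd.DataFrame({
    'Item': ['Cake', 'Cake', 'Bread', 'Pastry', 'Cake'],
    'Flavor': ['Chocolate', 'Vanilla', 'Whole Wheat', 'Strawberry', 'Chocolate'],
    'Price': [250, 220, 80, 120, 250]
})

grouped = df2.groupby("Item")

first_rows = grouped.first()          # first Flavor and Price seen for each item
mean_price = grouped["Price"].mean()  # average price per item
group_sizes = grouped.size()          # number of rows per item

print(first_rows)
print(mean_price)
print(group_sizes)
```

<p>Selecting a column with ["Price"] before calling the method limits the calculation to that column, which avoids trying to average text columns such as Flavor.</p>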


<h3>2. Grouping by Multiple Columns:</h3>

In [9]:
# Print the total price for each (Item, Flavor) combination:
print(df2.groupby(["Item","Flavor"])["Price"].sum())

Item    Flavor     
Bread   Whole Wheat     80
Cake    Chocolate      500
        Vanilla        220
Pastry  Strawberry     120
Name: Price, dtype: int64
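<p>Grouping and agg() can also be combined, so that several summaries are computed per group in one call. The following is a minimal sketch on the same bakery table; <b>summary</b> and <b>flat</b> are names chosen here for illustration.</p>

```python
import pandas as pd

# Same bakery table as above
df2 = pd.DataFrame({
    'Item': ['Cake', 'Cake', 'Bread', 'Pastry', 'Cake'],
    'Flavor': ['Chocolate', 'Vanilla', 'Whole Wheat', 'Strawberry', 'Chocolate'],
    'Price': [250, 220, 80, 120, 250]
})

# Several aggregations per group: one row per item, one column per function
summary = df2.groupby("Item")["Price"].agg(["sum", "mean", "count"])
print(summary)

# reset_index() turns the group keys back into ordinary columns
flat = df2.groupby(["Item", "Flavor"])["Price"].sum().reset_index()
print(flat)
```

<p>A flat table like this is often easier to export or merge with other data than a result with a multi-level index.</p>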
