# Grouping Data

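**df.groupby(by="col")**: Splits the DataFrame into groups based on the values in the specified column.

The result is a GroupBy object; nothing is computed until an aggregation such as sum() is called on it.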

In [1]:
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40, 50, 60]
}
df = pd.DataFrame(data)
df

  Category  Value
0        A     10
1        B     20
2        A     30
3        B     40
4        A     50
5        B     60


In [2]:
grouped_by_category = df.groupby(by="Category")
print(grouped_by_category.sum())


          Value
Category       
A            90
B           120
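
To make the split-then-aggregate step concrete, the sketch below rebuilds the same sample data, iterates over the groups by hand, and compares the result with the built-in sum(). The names manual_sums and builtin_sums are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40, 50, 60],
})

grouped = df.groupby(by="Category")

# Iterating over a GroupBy yields (group key, sub-DataFrame) pairs.
manual_sums = {}
for key, group in grouped:
    manual_sums[key] = int(group['Value'].sum())

# The built-in aggregation computes the same totals in one call.
builtin_sums = grouped['Value'].sum().to_dict()

print(manual_sums)
print(builtin_sums)
```

Both dictionaries hold one total per category, which is what grouped_by_category.sum() reports in tabular form.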


### Summary Functions

**size()**: Returns the number of rows in each group.

In [3]:
print(grouped_by_category.size())


Category
A    3
B    3
dtype: int64
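
One detail worth seeing in code is that size() counts every row in a group, including rows with missing values. The sketch below contrasts it with count(), which is not covered above and is introduced here only for comparison; the df_missing data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Value in group B.
df_missing = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10.0, np.nan, 30.0, 40.0],
})

grouped = df_missing.groupby('Category')

# size() counts rows per group, regardless of missing values.
sizes = grouped.size()

# count() counts only the non-missing entries of the selected column.
counts = grouped['Value'].count()

print(sizes)
print(counts)
```

Group B has two rows but only one non-missing Value, so the two results differ for that group.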


**agg(function)**: Aggregates each group using the specified function, here the mean.

In [4]:
print(grouped_by_category.agg("mean"))


          Value
Category       
A          30.0
B          40.0
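
agg() also accepts a list of function names, which returns one column per function. A minimal sketch on the same sample data, with summary as an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40, 50, 60],
})

# Select the column first, then request several aggregations at once.
summary = df.groupby("Category")["Value"].agg(["mean", "min", "max"])

print(summary)
```

Selecting the Value column before aggregating keeps the result flat, with the columns mean, min, and max.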


### Shift Function 

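**shift(1)**: Shifts values down by one row within each group, so each row receives the previous value from its own group (a lag). The first row of each group becomes NaN.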

In [5]:
print(grouped_by_category['Value'].shift(1))


0     NaN
1     NaN
2    10.0
3    20.0
4    30.0
5    40.0
Name: Value, dtype: float64


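**shift(-1)**: Shifts values up by one row within each group, so each row receives the next value from its own group (a lead). The last row of each group becomes NaN.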

In [6]:
print(grouped_by_category['Value'].shift(-1))

0    30.0
1    40.0
2    50.0
3    60.0
4     NaN
5     NaN
Name: Value, dtype: float64
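
A common use of a grouped shift is comparing each row with the previous row of the same group. The sketch below is one way to do that on the sample data; the Change column name is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40, 50, 60],
})

# Previous value within the same category (NaN for each group's first row).
previous_value = df.groupby("Category")["Value"].shift(1)

# Row-to-row change within each category.
df["Change"] = df["Value"] - previous_value

print(df)
```

The first row of each category has no predecessor, so its Change is NaN; every other row shows the difference from the prior row in its own group.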


### Rank Functions

**rank(method='dense')**: Ranks values within each group; tied values share a rank, and the next rank follows with no gap.

In [7]:
print(grouped_by_category['Value'].rank(method='dense'))

0    1.0
1    1.0
2    2.0
3    2.0
4    3.0
5    3.0
Name: Value, dtype: float64


**rank(method='min')**: Ranks values within each group; tied values all receive the lowest rank in the tie, leaving a gap before the next rank.

In [8]:
print(grouped_by_category['Value'].rank(method='min'))


0    1.0
1    1.0
2    2.0
3    2.0
4    3.0
5    3.0
Name: Value, dtype: float64


**rank(method='first')**: Ranks values within each group; tied values are ranked in the order they appear, so every rank is unique.

In [9]:
print(grouped_by_category['Value'].rank(method='first'))

0    1.0
1    1.0
2    2.0
3    2.0
4    3.0
5    3.0
Name: Value, dtype: float64


**rank(pct=True)**: Returns each rank as a fraction of the group size, so values fall in the interval (0, 1].

In [10]:
print(grouped_by_category['Value'].rank(pct=True))


0    0.333333
1    0.333333
2    0.666667
3    0.666667
4    1.000000
5    1.000000
Name: Value, dtype: float64
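
The sample data has no tied values within a group, so the dense, min, and first methods all produce the same ranks there. The sketch below uses a small made-up DataFrame with ties (the tied and ranks names are illustrative) to show where the methods differ.

```python
import pandas as pd

# Hypothetical data: group A has a tie at 20, group B has a tie at 5.
tied = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Value': [10, 20, 20, 30, 5, 5],
})

values = tied.groupby('Category')['Value']

# Collect the three ranking methods side by side.
ranks = pd.DataFrame({
    'Category': tied['Category'],
    'Value': tied['Value'],
    'dense': values.rank(method='dense'),
    'min': values.rank(method='min'),
    'first': values.rank(method='first'),
})

print(ranks)
```

In group A, the value 30 is ranked 3 by dense (no gap after the tie) but 4 by min and first. The two tied 20s share rank 2 under dense and min, while first assigns them 2 and 3 in order of appearance.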


### Cumulative Functions

**cumsum()**: Cumulative sum of the values in the group.

In [11]:
print(grouped_by_category['Value'].cumsum())

0     10
1     20
2     40
3     60
4     90
5    120
Name: Value, dtype: int64


**cummax()**: Cumulative maximum of the values in the group.

In [12]:
print(grouped_by_category['Value'].cummax())

0    10
1    20
2    30
3    40
4    50
5    60
Name: Value, dtype: int64


**cummin()**: Cumulative minimum of the values in the group.

In [13]:
print(grouped_by_category['Value'].cummin())

0    10
1    20
2    10
3    20
4    10
5    20
Name: Value, dtype: int64


**cumprod()**: Cumulative product of the values in the group.

In [14]:
print(grouped_by_category['Value'].cumprod())

0       10
1       20
2      300
3      800
4    15000
5    48000
Name: Value, dtype: int64
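
Cumulative functions return a Series aligned with the original index, so the result can be assigned back as a new column. The sketch below does this for cumsum() and checks that the last running total in each group equals the group sum; the RunningTotal column name is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40, 50, 60],
})

# Running total within each category, stored next to the original values.
df["RunningTotal"] = df.groupby("Category")["Value"].cumsum()

# The final running total of each group should match the plain group sum.
final_totals = df.groupby("Category")["RunningTotal"].last()
group_sums = df.groupby("Category")["Value"].sum()

print(df)
print(final_totals.equals(group_sums))
```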
