# Grouping and Aggregation

## Summary of `groupby` in Pandas

### Key Functions and Examples

1. **Grouping and Aggregation**
   - `carstocks.groupby("Symbol")["Close"].mean()`  
     Computes the mean of the "Close" column for each group identified by the "Symbol" column.

2. **Group Information**
   - `gbo = carstocks.groupby("Symbol")`  
     Creates a `DataFrameGroupBy` object. No aggregation is performed until you call a method such as `.mean()` on it.
   - `gbo.ngroups`  
     Returns the number of groups.
   - `gbo.groups`  
     Returns a dict-like mapping from each group name to the index labels of the rows in that group.

3. **Accessing Group Data**
   - `gbo.first()`  
     Returns the first non-null value of each column within each group (usually, but not always, the group's first row).
   - `gbo.get_group("GM")`  
     Retrieves all rows belonging to the "GM" group.

4. **Iterating Over Groups**
   ```python
   for name, group in gbo:
       print(name)
       print("------------------")
       print(group)
   ```

5. **Common Aggregation Functions**
   - `.mean()`, `.max()`, `.min()`  
     Standard aggregation methods for numeric columns; each returns one value per group. Others include `.sum()`, `.count()` and `.median()`.

6. **Using `agg` for Multiple Aggregations**
   - Example 1: Passing a list of functions.  
     `gbo["Close"].agg(["min", "max", "mean"])`
   - Example 2: Passing a dict to apply different functions to different columns. The result has two-level (MultiIndex) column labels.  
     `gbo.agg({"Close": ["min", "max"], "High": ["mean"]})`

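7. **Custom Functions**
   - You can pass your own function. It receives each group's column as a Series and should return a single value:  
     `gbo["Close"].agg(my_func)`, where `my_func` is any function you define.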

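8. **Renaming Columns in Aggregations**  
   Use named aggregation: each keyword becomes an output column name, and its value is a `(column, aggregation)` tuple.

   ```python
   gbo.agg(
       min_open=("Open", "min"),
       max_open=("Open", "max"),
       min_close=("Close", "min"),
       max_close=("Close", "max")
   )
   ```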

### Worked Examples

```python
import pandas as pd
carstocks = pd.read_csv("../data/car_stocks.csv")
```

```python
carstocks.groupby("Symbol")["Close"].mean()
```

```
Symbol
GM       62.164615
LCID     49.829231
RIVN    127.523077
Name: Close, dtype: float64
```
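To see what `groupby(...).mean()` does under the hood, here is a minimal sketch of the split-apply-combine steps done by hand. The DataFrame is hypothetical toy data (symbols `X` and `Y`), not the contents of `car_stocks.csv`.

```python
import pandas as pd

# Hypothetical toy data, not the car_stocks.csv values
df = pd.DataFrame({
    "Symbol": ["X", "Y", "X", "Y", "X"],
    "Close": [10.0, 20.0, 14.0, 22.0, 12.0],
})

# groupby-based result: one mean per distinct Symbol
by_groupby = df.groupby("Symbol")["Close"].mean()

# The same computation by hand:
# split by key (boolean mask), apply mean, combine into a dict
manual = {
    key: df.loc[df["Symbol"] == key, "Close"].mean()
    for key in df["Symbol"].unique()
}

print(by_groupby)
print(manual)
```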

```python
gbo = carstocks.groupby("Symbol")
gbo.ngroups
```

```
3
```
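`ngroups` counts only groups that actually exist. A sketch on hypothetical toy data, assuming pandas 1.1 or later for the `dropna` argument, shows that rows with a missing key are left out by default:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing key
df = pd.DataFrame({
    "Symbol": ["X", "Y", np.nan, "X"],
    "Close": [1.0, 2.0, 3.0, 4.0],
})

# Default dropna=True: the NaN-keyed row forms no group
default_groups = df.groupby("Symbol").ngroups
# dropna=False keeps NaN as a group of its own
with_nan_groups = df.groupby("Symbol", dropna=False).ngroups

# By default, ngroups matches the number of distinct non-null keys
print(default_groups, df["Symbol"].nunique(), with_nan_groups)
```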

```python
gbo.groups
```

```
{'GM': [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38], 'LCID': [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], 'RIVN': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]}
```
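`groups` reports index labels, which only look like row positions here because `carstocks` has a default integer index. The sketch below, on hypothetical toy data with a string index, contrasts it with `gbo.indices`, which returns integer positions as NumPy arrays (usable with `.iloc`):

```python
import pandas as pd

# Hypothetical toy data with a non-default (string) index
df = pd.DataFrame(
    {"Symbol": ["X", "Y", "X"], "Close": [1.0, 2.0, 3.0]},
    index=["r0", "r1", "r2"],
)
gbo = df.groupby("Symbol")

# .groups: key -> index labels of the rows in that group
labels = {k: list(v) for k, v in gbo.groups.items()}
# .indices: key -> integer positions of the rows in that group
positions = {k: v.tolist() for k, v in gbo.indices.items()}

print(labels)
print(positions)
```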

```python
gbo.first()
```

| Symbol | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| GM | 2021-11-10 | 57.849998 | 60.560001 | 57.73 | 59.27 | 59.27 | 22778600 |
| LCID | 2021-11-10 | 42.299999 | 45.0 | 39.341 | 40.75 | 40.75 | 79342800 |
| RIVN | 2021-11-10 | 106.75 | 119.459999 | 95.199997 | 100.730003 | 100.730003 | 103679500 |
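`first()` is not quite "the first row": it takes the first non-null value of each column separately. A sketch on hypothetical toy data, comparing it with `head(1)`, which returns each group's literal first row and keeps the original index:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the first "X" row has a missing Close
df = pd.DataFrame({
    "Symbol": ["X", "X", "Y"],
    "Close": [np.nan, 5.0, 7.0],
})
gbo = df.groupby("Symbol")

# first(): skips the NaN and takes 5.0 from the second X row
firsts = gbo.first()
# head(1): the actual first row per group, NaN included
heads = gbo.head(1)

print(firsts)
print(heads)
```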


```python
gbo.get_group("GM")
```

| | Symbol | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|---|
| 26 | GM | 2021-11-10 | 57.849998 | 60.560001 | 57.73 | 59.27 | 59.27 | 22778600 |
| 27 | GM | 2021-11-11 | 59.82 | 62.150002 | 59.310001 | 61.82 | 61.82 | 29033900 |
| 28 | GM | 2021-11-12 | 61.59 | 64.029999 | 61.279999 | 63.400002 | 63.400002 | 31142800 |
| 29 | GM | 2021-11-15 | 63.66 | 63.73 | 62.630001 | 62.970001 | 62.970001 | 14372100 |
| 30 | GM | 2021-11-16 | 63.240002 | 63.279999 | 61.93 | 62.610001 | 62.610001 | 16115000 |
| 31 | GM | 2021-11-17 | 63.330002 | 65.07 | 62.380001 | 64.610001 | 64.610001 | 29983400 |
| 32 | GM | 2021-11-18 | 64.330002 | 65.18 | 62.09 | 62.330002 | 62.330002 | 23308100 |
| 33 | GM | 2021-11-19 | 62.48 | 62.970001 | 61.560001 | 61.799999 | 61.799999 | 19508500 |
| 34 | GM | 2021-11-22 | 61.950001 | 64.959999 | 61.759998 | 64.059998 | 64.059998 | 19836200 |
| 35 | GM | 2021-11-23 | 63.73 | 64.040001 | 62.259998 | 63.049999 | 63.049999 | 16225100 |

The GM group has 13 rows (26 through 38); only the first 10 are reproduced here.
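`get_group` selects the same rows as a boolean mask on the key column, keeping the original index. A quick check on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Symbol": ["X", "Y", "X"],
    "Close": [1.0, 2.0, 3.0],
})
gbo = df.groupby("Symbol")

# Both approaches return rows 0 and 2, including the Symbol column
via_groupby = gbo.get_group("X")
via_mask = df[df["Symbol"] == "X"]

pd.testing.assert_frame_equal(via_groupby, via_mask)
```

Asking for a key that does not exist (for example `gbo.get_group("Z")`) raises a `KeyError`.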


```python
for name, group in gbo:
    print(name)
    print("------------------")
    print(group)
    print()
```

```
GM
------------------
   Symbol        Date       Open       High        Low      Close  Adj Close  \
26     GM  2021-11-10  57.849998  60.560001  57.730000  59.270000  59.270000   
27     GM  2021-11-11  59.820000  62.150002  59.310001  61.820000  61.820000   
28     GM  2021-11-12  61.590000  64.029999  61.279999  63.400002  63.400002   
29     GM  2021-11-15  63.660000  63.730000  62.630001  62.970001  62.970001   
30     GM  2021-11-16  63.240002  63.279999  61.930000  62.610001  62.610001   
31     GM  2021-11-17  63.330002  65.070000  62.380001  64.610001  64.610001   
32     GM  2021-11-18  64.330002  65.180000  62.090000  62.330002  62.330002   
33     GM  2021-11-19  62.480000  62.970001  61.560001  61.799999  61.799999   
34     GM  2021-11-22  61.950001  64.959999  61.759998  64.059998  64.059998   
35     GM  2021-11-23  63.730000  64.040001  62.259998  63.049999  63.049999   
36     GM  2021-11-24  62.299999  62.599998  61.630001  62.189999  62.189999   
...
```
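Iteration yields `(name, group)` pairs in sorted key order by default. The sketch below, on hypothetical toy data, shows the effect of `sort=False` and a common pattern of building a dict while iterating:

```python
import pandas as pd

# Hypothetical toy data; keys deliberately out of order
df = pd.DataFrame({
    "Symbol": ["Y", "X", "Y", "Z"],
    "Close": [1.0, 2.0, 3.0, 4.0],
})

# Default sort=True: groups come out in sorted key order
sorted_names = [name for name, _ in df.groupby("Symbol")]
# sort=False: groups come out in order of first appearance
appearance_names = [name for name, _ in df.groupby("Symbol", sort=False)]
# Each group is a DataFrame, so ordinary DataFrame operations work on it
sizes = {name: len(group) for name, group in df.groupby("Symbol")}

print(sorted_names, appearance_names, sizes)
```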

```python
gbo["Close"].agg(["min", "max", "mean"])
```

| Symbol | min | max | mean |
|---|---|---|---|
| GM | 59.27 | 64.610001 | 62.164615 |
| LCID | 40.75 | 55.52 | 49.829231 |
| RIVN | 100.730003 | 172.009995 | 127.523077 |
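With a list, `agg` produces one column per function, labeled with the function's name. A sketch on hypothetical toy data confirming that a column matches the corresponding single aggregation:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Symbol": ["X", "Y", "X", "Y"],
    "Close": [10.0, 20.0, 14.0, 30.0],
})
gbo = df.groupby("Symbol")

# One row per group, one column per function
stats = gbo["Close"].agg(["min", "max", "mean"])
# The "max" column holds the same values as a plain .max() call
max_only = gbo["Close"].max()

print(stats)
```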


```python
gbo.agg({"Close": ["min", "max"], "High": ["mean"]})
```

| Symbol | Close (min) | Close (max) | High (mean) |
|---|---|---|---|
| GM | 59.27 | 64.610001 | 63.129231 |
| LCID | 40.75 | 55.52 | 51.811538 |
| RIVN | 100.730003 | 172.009995 | 135.30923 |
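Because the dict form produces two-level column labels such as `("Close", "min")`, it is often convenient to flatten them. One common approach, sketched on hypothetical toy data, joins the levels with an underscore (the `Close_min` naming style is just a convention chosen here):

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Symbol": ["X", "Y", "X"],
    "Close": [1.0, 2.0, 3.0],
    "High": [1.5, 2.5, 3.5],
})
result = df.groupby("Symbol").agg({"Close": ["min", "max"], "High": ["mean"]})

# Iterating a MultiIndex yields tuples like ("Close", "min");
# join each tuple's parts into one flat string label
result.columns = ["_".join(col) for col in result.columns]

print(result)
```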


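```python
# Named price_range rather than range, to avoid shadowing Python's built-in range()
def price_range(x):
    # Spread between the highest and lowest value in the group
    return x.max() - x.min()

gbo["Close"].agg(price_range)
```

```
Symbol
GM       5.340001
LCID    14.770000
RIVN    71.279992
Name: Close, dtype: float64
```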
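When a custom function is mixed with built-in aggregations in a list, pandas uses the function's `__name__` as its column label, while a lambda gets the label `<lambda>`. A sketch on hypothetical toy data with a helper named `price_range` (a hypothetical name chosen for this example):

```python
import pandas as pd

def price_range(x):
    """Hypothetical helper: largest value minus smallest value."""
    return x.max() - x.min()

# Hypothetical toy data
df = pd.DataFrame({
    "Symbol": ["X", "Y", "X", "Y"],
    "Close": [10.0, 20.0, 14.0, 30.0],
})
gbo = df.groupby("Symbol")

# Built-in names and a custom callable in one call
summary = gbo["Close"].agg(["min", "max", price_range])

# A lambda works too when passed on its own, returning a Series
spread = gbo["Close"].agg(lambda s: s.max() - s.min())

print(summary)
print(spread)
```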

```python
gbo.agg(
    min_open=("Open", "min"),
    max_open=("Open", "max"),
    min_close=("Close", "min"),
    max_close=("Close", "max")
)
```

| Symbol | min_open | max_open | min_close | max_close |
|---|---|---|---|---|
| GM | 57.849998 | 64.330002 | 59.27 | 64.610001 |
| LCID | 42.299999 | 56.200001 | 40.75 | 55.52 |
| RIVN | 106.75 | 163.800003 | 100.730003 | 172.009995 |
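The `(column, aggfunc)` tuples used in named aggregation can also be spelled with `pd.NamedAgg`, a named tuple with fields `column` and `aggfunc`. A sketch on hypothetical toy data showing that both forms give the same result:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "Symbol": ["X", "Y", "X"],
    "Open": [1.0, 2.0, 3.0],
    "Close": [1.5, 2.5, 3.5],
})
gbo = df.groupby("Symbol")

# Tuple form: output_name=(column, aggfunc)
tuple_form = gbo.agg(min_open=("Open", "min"), max_close=("Close", "max"))

# Equivalent, more explicit spelling with pd.NamedAgg
named_form = gbo.agg(
    min_open=pd.NamedAgg(column="Open", aggfunc="min"),
    max_close=pd.NamedAgg(column="Close", aggfunc="max"),
)

pd.testing.assert_frame_equal(tuple_form, named_form)
print(tuple_form)
```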
