# Aggregate Methods 

# A. Aggregations in Series Object

### `agg()`

#### 📌 Strings accepted by `.agg()` in Pandas (and their functionality)

| Method                          | Description                                                                                        |
| ------------------------------- | -------------------------------------------------------------------------------------------------- |
| **'all'**                       | Returns `True` if every value is truthy.                                                           |
| **'any'**                       | Returns `True` if any value is truthy.                                                             |
| **'autocorr'**                  | Pearson correlation of series with a shifted version of itself. Default `lag=1`, but can override. |
| **'corr'**                      | Pearson correlation of series with another series. Need to specify `other`.                        |
| **'count'**                     | Count of non-missing values.                                                                       |
| **'cov'**                       | Covariance of series with another series. Need to specify `other`.                                 |
| **'dtype' / 'dtypes'**          | Data type of the series.                                                                           |
| **'empty'**                     | `True` if the series has no values.                                                                |
| **'hasnans'**                   | `True` if series contains missing values (`NaN`).                                                  |
| **'idxmax'**                    | Index label of the maximum value.                                                                  |
| **'idxmin'**                    | Index label of the minimum value.                                                                  |
| **'is\_monotonic\_decreasing'** | `True` if each value is less than or equal to the previous one (ties allowed).                     |
| **'is\_monotonic\_increasing'** | `True` if each value is greater than or equal to the previous one (ties allowed).                  |
| **'kurt'**                      | Excess kurtosis (0 for normal distribution; >0 → more outliers).                                   |
| **'max'**                       | Maximum value.                                                                                     |
| **'mean'**                      | Mean value.                                                                                        |
| **'median'**                    | Median value.                                                                                      |
| **'min'**                       | Minimum value.                                                                                     |
| **'nbytes'**                    | Number of bytes of the series data.                                                                |
| **'ndim'**                      | Number of dimensions (always `1` for a Series).                                                    |
| **'nunique'**                   | Number of unique values.                                                                           |
| **'quantile'**                  | Quantile value (defaults to median if no `q` specified).                                           |
| **'sem'**                       | Standard error of the mean.                                                                        |
| **'size'**                      | Size of the series (number of elements).                                                           |
| **'skew'**                      | Unbiased skewness (negative = left tail, positive = right tail).                                   |
| **'std'**                       | Standard deviation.                                                                                |
| **'sum'**                       | Sum of values.                                                                                     |


In [1]:
# import the libraries
import pandas as pd
import numpy as np

In [2]:
s = pd.Series([1, 2, 3, 4, 5, None])

# Single aggregation
print(s.agg('mean'))   # 3.0

# Multiple aggregations
print(s.agg(['sum', 'count', 'nunique']))
# sum        15.0
# count       5.0
# nunique     5.0

# With parameters (quantile)
print(s.agg('quantile', q=0.75))   # 4.0

3.0
sum        15.0
count       5.0
nunique     5.0
dtype: float64
4.0


In [5]:
# Sample Series
s = pd.Series([1, 2, 3, 4, 5, np.nan])

print("Series:")
print(s)
print("-" * 50)

# Apply all aggregations
print("all:", s.agg("all"))  # all values truthy?
print("any:", s.agg("any"))  # any value truthy?

print("autocorr:", s.agg("autocorr", lag=1))  # autocorrelation with lag=1
print("corr:", s.agg("corr", other=pd.Series([1, 2, 1, 2, 1, 2])))  # correlation with another series
print("count:", s.agg("count"))  # count non-missing
print("cov:", s.agg("cov", other=pd.Series([2, 4, 6, 8, 10, 12])))  # covariance with another series

print("dtype:", s.agg("dtype"))
print("dtypes:", s.agg("dtypes"))
print("empty:", s.agg("empty"))
print("hasnans:", s.agg("hasnans"))

print("idxmax:", s.agg("idxmax"))  # index of max value
print("idxmin:", s.agg("idxmin"))  # index of min value

print("is_monotonic_increasing:", s.agg("is_monotonic_increasing"))
print("is_monotonic_decreasing:", s.agg("is_monotonic_decreasing"))

print("kurt:", s.agg("kurt"))  # kurtosis
print("max:", s.agg("max"))
print("mean:", s.agg("mean"))
print("median:", s.agg("median"))
print("min:", s.agg("min"))

print("nbytes:", s.agg("nbytes"))
print("ndim:", s.agg("ndim"))
print("nunique:", s.agg("nunique"))
print("quantile (0.25):", s.agg("quantile", q=0.25))

print("sem:", s.agg("sem"))  # standard error
print("size:", s.agg("size"))
print("skew:", s.agg("skew"))
print("std:", s.agg("std"))
print("sum:", s.agg("sum"))

Series:
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    NaN
dtype: float64
--------------------------------------------------
all: True
any: True
autocorr: 1.0
corr: 0.0
count: 5
cov: 5.0
dtype: float64
dtypes: float64
empty: False
hasnans: True
idxmax: 4
idxmin: 0
is_monotonic_increasing: False
is_monotonic_decreasing: False
kurt: -1.2000000000000002
max: 5.0
mean: 3.0
median: 3.0
min: 1.0
nbytes: 48
ndim: 1
nunique: 5
quantile (0.25): 2.0
sem: 0.7071067811865476
size: 6
skew: 0.0
std: 1.5811388300841898
sum: 15.0


### Other Series aggregation methods and properties
The table below lists the direct **Series aggregation methods and properties** (method calls like `s.mean()` and properties like `s.size`). These are the explicit equivalents of the string forms passed to `.agg()` earlier, such as `.agg("mean")`.

A code demo that exercises all of them follows the table.

#### 📌 Pandas Series Aggregation Methods & Properties

| Method / Property                                        | Description                                                                                     |
| -------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| `s.agg(func=None, axis=0, *args, **kwargs)`              | Applies one or multiple aggregation functions. Returns scalar for single func, Series for list. |
| `s.all(axis=0, bool_only=False, skipna=True)`            | `True` if **every value** is truthy. (The `level` argument of older releases was removed in pandas 2.0.) |
| `s.any(axis=0, bool_only=False, skipna=True)`            | `True` if **at least one value** is truthy. (Same note on `level`.)                             |
| `s.autocorr(lag=1)`                                      | Pearson correlation between `s` and shifted `s` with lag.                                       |
| `s.corr(other, method='pearson')`                        | Correlation with another Series (`pearson`, `spearman`, `kendall`, or callable).                |
| `s.cov(other, min_periods=None)`                         | Covariance with another Series.                                                                 |
| `s.max(...)`                                             | Maximum value.                                                                                  |
| `s.min(...)`                                             | Minimum value.                                                                                  |
| `s.mean(...)`                                            | Mean value.                                                                                     |
| `s.median(...)`                                          | Median value.                                                                                   |
| `s.prod(...)`                                            | Product of values.                                                                              |
| `s.quantile(q=0.5, interpolation='linear')`              | Quantile (default = median).                                                                    |
| `s.sem(...)`                                             | Standard error of mean.                                                                         |
| `s.std(...)`                                             | Standard deviation.                                                                             |
| `s.var(...)`                                             | Variance.                                                                                       |
| `s.skew(...)`                                            | Skewness.                                                                                       |
| `s.kurtosis(...)` (alias `s.kurt`)                       | Kurtosis.                                                                                       |
| `s.nunique(dropna=True)`                                 | Count of unique items.                                                                          |
| `s.count(...)`                                           | Count of non-missing values.                                                                    |
| `s.size` (property)                                      | Number of items (including NaNs).                                                               |
| `s.is_unique` (property)                                 | `True` if all values are unique.                                                                |
| `s.is_monotonic_increasing` (property)                   | `True` if values are non-decreasing (each value >= the previous one; ties allowed).             |
| `s.is_monotonic_decreasing` (property)                   | `True` if values are non-increasing (each value <= the previous one; ties allowed).             |


In [7]:
# Sample Series
s = pd.Series([1, 2, 3, 4, 5, np.nan])

print("Series:")
print(s)
print("-" * 50)

# Using .agg directly
print("agg (mean):", s.agg("mean"))
print("agg (['min','max']):")
print(s.agg(["min", "max"]))
print("-" * 50)

# Boolean checks
print("all:", s.all())
print("any:", s.any())

# Correlations & covariance
print("autocorr:", s.autocorr(lag=1))
print("corr (with [1,2,3,4,5,6]):", s.corr(pd.Series([1,2,3,4,5,6])))
print("cov (with [2,4,6,8,10,12]):", s.cov(pd.Series([2,4,6,8,10,12])))

# Basic stats
print("max:", s.max())
print("min:", s.min())
print("mean:", s.mean())
print("median:", s.median())
print("prod:", s.prod(skipna=True))
print("quantile (0.75):", s.quantile(0.75))
print("sem:", s.sem())
print("std:", s.std())
print("var:", s.var())
print("skew:", s.skew())
print("kurtosis:", s.kurtosis())

# Counts
print("nunique:", s.nunique())
print("count (non-missing):", s.count())
print("size (all incl NaN):", s.size)

# Properties
print("is_unique:", s.is_unique)
print("is_monotonic_increasing:", s.is_monotonic_increasing)
print("is_monotonic_decreasing:", s.is_monotonic_decreasing)

Series:
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    NaN
dtype: float64
--------------------------------------------------
agg (mean): 3.0
agg (['min','max']):
min    1.0
max    5.0
dtype: float64
--------------------------------------------------
all: True
any: True
autocorr: 1.0
corr (with [1,2,3,4,5,6]): 0.9999999999999999
cov (with [2,4,6,8,10,12]): 5.0
max: 5.0
min: 1.0
mean: 3.0
median: 3.0
prod: 120.0
quantile (0.75): 4.0
sem: 0.7071067811865476
std: 1.5811388300841898
var: 2.5
skew: 0.0
kurtosis: -1.2000000000000002
nunique: 5
count (non-missing): 5
size (all incl NaN): 6
is_unique: True
is_monotonic_increasing: False
is_monotonic_decreasing: False
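
Both monotonic flags print `False` above even though the values 1 to 5 are rising. The following is a minimal sketch of why, assuming the same `s` as in the demo: the trailing `NaN` cannot be ordered, and the flags allow ties, so a strict check also needs `is_unique`. The names `clean`, `with_ties`, and `strictly_increasing` are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, np.nan])

# NaN cannot be ordered against numbers, so both flags are False
print(s.is_monotonic_increasing)          # False
print(s.is_monotonic_decreasing)          # False

# Dropping the missing value restores the expected answer
clean = s.dropna()
print(clean.is_monotonic_increasing)      # True

# Ties are allowed: "increasing" here means non-decreasing
with_ties = pd.Series([1, 2, 2, 3])
print(with_ties.is_monotonic_increasing)  # True

# A strict check combines the flag with is_unique
strictly_increasing = with_ties.is_monotonic_increasing and with_ties.is_unique
print(strictly_increasing)                # False
```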


---

# B. Looping & Aggregations on DataFrame Object 

#### 📘 DataFrame Methods with Axis Importance
| Method                                                                     | Description                                                                  | Axis Importance                                                                                                                              |
| -------------------------------------------------------------------------- | ---------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| **`items()`**                                                          | Iterates over **(column\_name, Series)** pairs.                              | No axis (always column-wise).                                                                                                                |
| **`iterrows()`**                                                           | Iterates over **(index, Series)** pairs (row-wise). Type info not preserved. | No axis (always row-wise).                                                                                                                   |
| **`itertuples(index=True, name="Pandas")`**                                | Iterates over rows as **namedtuples**. Faster than `iterrows`.               | No axis (always row-wise).                                                                                                                   |
| **`sum(axis=0, skipna=True, numeric_only=False, min_count=0)`**            | Returns sum.                                                                 | `axis=0` → sum **down columns** (per column, across rows). <br> `axis=1` → sum **across rows** (per row, across columns).                    |
| **`min(axis=0, …)`**                                                       | Minimum value.                                                               | Same axis rule: `axis=0` → per column, `axis=1` → per row.                                                                                   |
| **`max(axis=0, …)`**                                                       | Maximum value.                                                               | Same as above.                                                                                                                               |
| **`idxmin(axis=0, …)`**                                                    | Index (row/col label) of the **first minimum**.                              | `axis=0` → row label of min in each column. <br> `axis=1` → column label of min in each row.                                                 |
| **`idxmax(axis=0, …)`**                                                    | Index of the **first maximum**.                                              | Same as above.                                                                                                                               |
| **`agg(func=None, axis=0, …)`**                                            | Aggregate using function(s). Supports string, list, dict.                    | `axis=0` → aggregate each column. <br> `axis=1` → aggregate each row.                                                                        |
| **`describe(percentiles=[.25,.5,.75], …)`**                                | Summary statistics of DataFrame.                                             | No axis parameter (stats are always computed per column).                                                                                    |
| **`apply(func=None, axis=0, …)`**                                          | Apply custom function.                                                       | `axis=0` → function applied **column-wise** (each column is a Series). <br> `axis=1` → function applied **row-wise** (each row is a Series). |
| **`np.select(condlist, choicelist, default=0)`**                           | Simulates an **if-else** chain.                                              | Axis doesn't apply; it operates on the boolean conditions you pass.                                                                          |



#### 🔑 Axis Summary
- `axis=0` → operate down columns (each function applied column-wise).
- `axis=1` → operate across rows (each function applied row-wise).

In [None]:
# Sample DataFrame
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "C": [100, 200, 300, 400]
}, index=["row1", "row2", "row3", "row4"])

print("Original DataFrame:\n", df, "\n")

# 1. Iteration methods
print("---- Iteration Methods ----")
for col_name, col_data in df.items():
    print(f"items → Column: {col_name}, Values:\n{col_data}\n")
    print(col_data.values) # Extracting the data of each column as an array

for idx, row in df.iterrows():
    print(f"iterrows → Index: {idx}, Row:\n{row}\n")

for row in df.itertuples():
    # faster than iterrows()
    print("itertuples →", row)
    print(row[1:]) # Extracting the rows as tuples 
print()

# 2. Aggregations with axis
print("---- Sum ----")
print("Sum over columns (axis=0):\n", df.sum(axis=0))   # per column
print("Sum over rows (axis=1):\n", df.sum(axis=1), "\n") # per row

print("---- Min ----")
print("Min over columns (axis=0):\n", df.min(axis=0))
print("Min over rows (axis=1):\n", df.min(axis=1), "\n")

print("---- Max ----")
print("Max over columns (axis=0):\n", df.max(axis=0))
print("Max over rows (axis=1):\n", df.max(axis=1), "\n")

print("---- idxmin ----")
print("Index of min per column:\n", df.idxmin(axis=0))
print("Column of min per row:\n", df.idxmin(axis=1), "\n")

print("---- idxmax ----")
print("Index of max per column:\n", df.idxmax(axis=0))
print("Column of max per row:\n", df.idxmax(axis=1), "\n")

# 3. agg with different functions
print("---- Aggregate ----")
print("Mean and Sum per column:\n", df.agg(["mean", "sum"], axis=0), "\n")
print("Mean and Sum per row:\n", df.agg(["mean", "sum"], axis=1), "\n")

# 4. Describe
print("---- Describe ----")
print("Describe:\n", df.describe(), "\n")

# 5. Apply custom function
print("---- Apply ----")
print("Apply column-wise (mean of each col):\n", df.apply(np.mean, axis=0), "\n")
print("Apply row-wise (sum of each row):\n", df.apply(np.sum, axis=1), "\n")

# 6. np.select (simulate if-else)
print("---- np.select ----")
conditions = [
    df["A"] < 2,
    df["A"] > 3
]
choices = ["Less than 2", "Greater than 3"]
df["A_category"] = np.select(conditions, choices, default="Between 2 and 3")
print(df, "\n")

Original DataFrame:
       A   B    C
row1  1  10  100
row2  2  20  200
row3  3  30  300
row4  4  40  400 

---- Iteration Methods ----
items → Column: A, Values:
row1    1
row2    2
row3    3
row4    4
Name: A, dtype: int64

[1 2 3 4]
items → Column: B, Values:
row1    10
row2    20
row3    30
row4    40
Name: B, dtype: int64

[10 20 30 40]
items → Column: C, Values:
row1    100
row2    200
row3    300
row4    400
Name: C, dtype: int64

[100 200 300 400]
iterrows → Index: row1, Row:
A      1
B     10
C    100
Name: row1, dtype: int64

iterrows → Index: row2, Row:
A      2
B     20
C    200
Name: row2, dtype: int64

iterrows → Index: row3, Row:
A      3
B     30
C    300
Name: row3, dtype: int64

iterrows → Index: row4, Row:
A      4
B     40
C    400
Name: row4, dtype: int64

itertuples → Pandas(Index='row1', A=1, B=10, C=100)
(1, 10, 100)
itertuples → Pandas(Index='row2', A=2, B=20, C=200)
(2, 20, 200)
itertuples → Pandas(Index='row3', A=3, B=30, C=300)
(3, 30, 30

### `np.select()`
Let us take some more examples to understand how `np.select` behaves! 

#### 🔹 What is `np.select`?

`numpy.select` is a function that lets you **apply multiple conditions** on an array and choose **different outputs** depending on which condition is satisfied.
Think of it as a **vectorized if-elif-else** for NumPy arrays.


#### 🔹 Syntax

```python
np.select(condlist, choicelist, default=0)
```

* **`condlist`** → list of boolean arrays (conditions).
* **`choicelist`** → list of results, same length as `condlist`.
* **`default`** → value used when no condition is satisfied (optional, default = 0).


#### 🔹 How it works

* It checks each condition in `condlist` **in order**.
* When a condition is `True`, it assigns the corresponding value from `choicelist`.
* If multiple conditions are `True`, it picks the **first match**.
* If no condition is `True`, it assigns `default`.

#### 🔑 Key Takeaways

* `np.select` = vectorized **if-elif-else** for arrays.
* `condlist` = list of boolean masks.
* `choicelist` = values assigned for each condition.
* `default` = fallback if no condition matches.
* Always evaluated **in order** → first `True` wins.
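
Because the first `True` wins, the order of `condlist` changes the result. The sketch below is a small illustration with made-up marks; the names `strict_first` and `loose_first` are hypothetical. It runs the same three thresholds in two different orders.

```python
import numpy as np

marks = np.array([95, 67, 45, 80, 30])

# Strictest condition first: each mark lands in its intended band
strict_first = np.select(
    [marks >= 90, marks >= 60, marks >= 40],
    ["A", "B", "C"],
    default="F",
)
print(strict_first)   # ['A' 'B' 'C' 'B' 'F']

# Loosest condition first: `marks >= 40` matches almost everything,
# so the later, stricter conditions are never reached
loose_first = np.select(
    [marks >= 40, marks >= 60, marks >= 90],
    ["C", "B", "A"],
    default="F",
)
print(loose_first)    # ['C' 'C' 'C' 'C' 'F']
```

When conditions overlap like this, list them from most specific to least specific.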



In [14]:
# GRADING SYSTEM

marks = np.array([95, 67, 45, 80, 30])

# Define conditions
conditions = [
    marks >= 90,
    marks >= 60,
    marks >= 40
]

# Define choices
choices = ["A", "B", "C"]

grades = np.select(conditions, choices, default="F")
print(grades)

['A' 'B' 'C' 'B' 'F']


In [16]:
# Sample DataFrame
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [95, 67, 45, 80, 30]
})

print("Original DF:\n", df, "\n")

# Define conditions on df["Score"]
conditions = [
    df["Score"] >= 90,
    df["Score"] >= 60,
    df["Score"] >= 40
]

# Define choices
choices = ["A", "B", "C"]

# Apply np.select
df["Grade"] = np.select(conditions, choices, default="F")

print("With Grades:\n", df)

Original DF:
       Name  Score
0    Alice     95
1      Bob     67
2  Charlie     45
3    David     80
4      Eva     30 

With Grades:
       Name  Score Grade
0    Alice     95     A
1      Bob     67     B
2  Charlie     45     C
3    David     80     B
4      Eva     30     F


In [None]:
# MULTIPLE NUMERIC OUTPUTS

x = np.array([1, 2, 3, 4, 5])

conditions = [
    x % 2 == 0,   # even
    x % 2 != 0    # odd
]

choices = [
    x**2,   # square if even
    x**3    # cube if odd
]

result = np.select(conditions, choices)
print(result)

[  1   4  27  16 125]


###  Compare `np.select` vs `np.where` vs `pd.cut` for the same grading problem.

In [17]:
# Sample DataFrame
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Score": [95, 67, 45, 80, 30]
})

print("Original DF:\n", df, "\n")

Original DF:
       Name  Score
0    Alice     95
1      Bob     67
2  Charlie     45
3    David     80
4      Eva     30 



In [19]:
# using np.select 
conditions = [
    df["Score"] >= 90,
    df["Score"] >= 60,
    df["Score"] >= 40
]
choices = ["A", "B", "C"]

df["Grade_select"] = np.select(conditions, choices, default="F")
df

      Name  Score Grade_select
0    Alice     95            A
1      Bob     67            B
2  Charlie     45            C
3    David     80            B
4      Eva     30            F


- ✅ Pros: Clean, readable for multiple conditions.
- ❌ Cons: Slightly verbose compared to `pd.cut`.

In [None]:
df["Grade_where"] = np.where(
    df["Score"] >= 90, "A", 
    np.where(df["Score"] >= 60, "B",
             np.where(df["Score"] >= 40, "C", "F")
             )
    )
df

      Name  Score Grade_select Grade_where
0    Alice     95            A           A
1      Bob     67            B           B
2  Charlie     45            C           C
3    David     80            B           B
4      Eva     30            F           F


- ✅ Pros: Simple if only 2–3 conditions.
- ❌ Cons: Gets messy with many nested conditions (harder to read).

In [21]:
bins = [0, 40, 60, 90, 100]      # boundaries
labels = ["F", "C", "B", "A"]    # labels for bins

df["Grade_cut"] = pd.cut(df["Score"], bins=bins, labels=labels, right=False)
df

      Name  Score Grade_select Grade_where Grade_cut
0    Alice     95            A           A         A
1      Bob     67            B           B         B
2  Charlie     45            C           C         C
3    David     80            B           B         B
4      Eva     30            F           F         F


- ✅ Pros: Super clean for continuous ranges.
- ❌ Cons: Less flexible if conditions are not numeric ranges (e.g., mixing categories with numbers).

Let us take this opportunity to understand how **binning** works in `pd.cut`.

We will break down the **syntax of `pd.cut`** using our example:

```python
df["Grade_cut"] = pd.cut(df["Score"], bins=bins, labels=labels, right=False)
```

#### 🔹 General Syntax

```python
pd.cut(
    x,             # the data to bin (Series, array, or list)
    bins,          # how to split into intervals
    right=True,    # whether intervals include the right endpoint
    labels=None,   # assign custom labels instead of interval objects
    include_lowest=False,  # whether the first interval should be left-inclusive
    duplicates="raise",    # how to handle duplicate bin edges
    ordered=True           # whether labels are ordered categories
)
```

#### 🔹 Parameter-by-Parameter Explanation

##### 1. **`x`**

* The data you want to **bin**.
* Here → `df["Score"]` is the column of numbers (0–100).


##### 2. **`bins`**

* Defines the boundaries of the bins (intervals).
* You can pass:

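  * **list of edges** → `[0, 40, 60, 90, 100]` → with `right=False` (as in our example) this makes the intervals:
    `[0,40), [40,60), [60,90), [90,100)`
  * **integer** → e.g., `bins=4` → pandas automatically splits the range of values into 4 equal-width bins.

Here → `[0, 40, 60, 90, 100]` defines **custom cut points**. Note that the last interval is open on the right, so a score of exactly `100` would fall outside it.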
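
As a rough illustration of the integer form, the sketch below bins the sample scores into 4 equal-width bins. It uses two standard `pd.cut` options that the example above does not: `labels=False` (return integer bin codes) and `retbins=True` (also return the computed edges). The names `codes` and `edges` are illustrative.

```python
import pandas as pd

scores = pd.Series([95, 67, 45, 80, 30])

# bins=4 splits the observed range (30..95) into 4 equal-width intervals
codes, edges = pd.cut(scores, bins=4, labels=False, retbins=True)

print(codes.tolist())   # [3, 2, 0, 3, 0]
print(len(edges))       # 5 edges delimit 4 bins
print(edges)            # lowest edge is nudged slightly below 30 so the minimum is included
```

Equal-width bins depend on the data's minimum and maximum, so they suit exploration better than fixed grading rules.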


##### 3. **`labels`**

* Labels assigned to each bin instead of showing interval ranges.
* If `None` → bins will be shown like `(0, 40]`, `(40, 60]`, etc.
* If a list → must have same length as `len(bins) - 1`.
* Here → `["F", "C", "B", "A"]` → directly map bins to grades.


##### 4. **`right`**

* Controls **interval closure**.
* If `right=True` (default): intervals are **right-inclusive** → `(a, b]`
* If `right=False`: intervals are **left-inclusive** → `[a, b)`

Here → `right=False`, so bins are:

* `[0,40) → F`
* `[40,60) → C`
* `[60,90) → B`
* `[90,100) → A`
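
One consequence of `right=False` is worth checking: the top edge itself is excluded. The sketch below uses made-up scores to show a perfect `100` becoming `NaN`, and one possible workaround (an assumption of this sketch, not part of the original example) that replaces the last edge with `np.inf`.

```python
import numpy as np
import pandas as pd

scores = pd.Series([100, 90, 0])
labels = ["F", "C", "B", "A"]

# Last bin is [90, 100), so 100 itself matches no bin
closed_at_100 = pd.cut(scores, bins=[0, 40, 60, 90, 100], labels=labels, right=False)
print(closed_at_100.tolist())   # [nan, 'A', 'F']

# Open-ended top bin [90, inf) keeps 100 in grade A
open_ended = pd.cut(scores, bins=[0, 40, 60, 90, np.inf], labels=labels, right=False)
print(open_ended.tolist())      # ['A', 'A', 'F']
```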


##### 5. **`include_lowest`**

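* Whether the first interval should also include its **left edge**.
* Only matters with `right=True` (the default), where the first bin `(0, 40]` would otherwise exclude a value equal to the lowest edge.

E.g., with `right=True`, a `Score` of `0` becomes `NaN` unless `include_lowest=True`. In our example, `right=False` already makes the first bin `[0,40)` left-inclusive, so the flag is not needed.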
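
A minimal sketch of `include_lowest` with the default `right=True`, using two made-up scores and hypothetical labels `"low"` and `"mid"`:

```python
import pandas as pd

scores = pd.Series([0, 40])
edges = [0, 40, 60]

# Default right=True: first bin is (0, 40], so 0 is left out
default = pd.cut(scores, bins=edges, labels=["low", "mid"])
print(default.tolist())     # [nan, 'low']

# include_lowest=True closes the first bin on the left as well
inclusive = pd.cut(scores, bins=edges, labels=["low", "mid"], include_lowest=True)
print(inclusive.tolist())   # ['low', 'low']
```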


##### 6. **`ordered`**

* Decides whether the resulting categorical is treated as **ordered** (`ordered=False` requires `labels` to be provided).
* Example: Grades `["F", "C", "B", "A"]` are naturally ordered, so keeping `ordered=True` (default) is helpful for sorting.
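
To illustrate what ordering buys you, here is a small sketch that reuses the grading bins from the example. The variable name `grades` is illustrative.

```python
import pandas as pd

scores = pd.Series([95, 67, 45, 80, 30])
grades = pd.cut(
    scores,
    bins=[0, 40, 60, 90, 100],
    labels=["F", "C", "B", "A"],   # listed from lowest to highest
    right=False,
)

print(grades.cat.ordered)             # True
print((grades >= "B").tolist())       # [True, True, False, True, False]
print(grades.sort_values().tolist())  # ['F', 'C', 'B', 'B', 'A']
print(grades.max())                   # A
```

Comparisons and sorting follow the label order given to `pd.cut`, not alphabetical order, which is why `"A"` ranks highest here.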


##### 7. **`duplicates`**

* If bin edges are repeated, it throws an error by default.
* `duplicates="drop"` → drops duplicate bin edges silently.


#### 🔹 Summary for Example

```python
df["Grade_cut"] = pd.cut(
    df["Score"],                  # data column
    bins=[0, 40, 60, 90, 100],    # custom bin edges
    labels=["F", "C", "B", "A"],  # grade labels
    right=False                   # intervals are left-inclusive
)
```

This means:

* `[0,40) → F`
* `[40,60) → C`
* `[60,90) → B`
* `[90,100) → A`

## Pandas `.apply()` Returning Series Demo

In [22]:
# Sample DataFrame
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "C": [100, 200, 300, 400]
}, index=["row1", "row2", "row3", "row4"])

print("Original DataFrame:\n", df, "\n")

# ---- Normal apply examples ----
print("Column-wise mean (axis=0):\n", df.apply(np.mean, axis=0), "\n")
print("Row-wise sum (axis=1):\n", df.apply(np.sum, axis=1), "\n")

# ---- Special Case: Apply returning a Series ----
print("Apply with lambda returning Series (expands to new columns):\n")

def row_stats(row):
    """Return multiple values for each row as a Series."""
    return pd.Series({
        "sum": row.sum(),
        "mean": row.mean(),
        "max": row.max()
    })

expanded = df.apply(row_stats, axis=1)
print(expanded, "\n")


Original DataFrame:
       A   B    C
row1  1  10  100
row2  2  20  200
row3  3  30  300
row4  4  40  400 

Column-wise mean (axis=0):
 A      2.5
B     25.0
C    250.0
dtype: float64 

Row-wise sum (axis=1):
 row1    111
row2    222
row3    333
row4    444
dtype: int64 

Apply with lambda returning Series (expands to new columns):

        sum   mean    max
row1  111.0   37.0  100.0
row2  222.0   74.0  200.0
row3  333.0  111.0  300.0
row4  444.0  148.0  400.0 
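
For comparison, the same three columns can be built without a row-wise `apply`, by calling the built-in reductions with `axis=1`. This is a sketch of an alternative, not a claim about which form the notebook prefers; `direct` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "C": [100, 200, 300, 400]
}, index=["row1", "row2", "row3", "row4"])

def row_stats(row):
    """Return multiple values for each row as a Series."""
    return pd.Series({"sum": row.sum(), "mean": row.mean(), "max": row.max()})

# Row-wise apply: the function is called once per row
expanded = df.apply(row_stats, axis=1)

# Built-in reductions: one call per statistic, no Python-level row loop
direct = pd.DataFrame({
    "sum": df.sum(axis=1),
    "mean": df.mean(axis=1),
    "max": df.max(axis=1),
})

# Same numbers either way (dtypes may differ: apply upcasts to float)
print(np.allclose(expanded.to_numpy(), direct.to_numpy()))   # True
```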



## Custom function to `.agg()` in Pandas

In [23]:
# Custom functions on a Series
s = pd.Series([1, 2, 3, 4, 5])

# Custom function: range = max - min
def my_range(x):
    return x.max() - x.min()

print(s.agg(my_range))   # → 4

4


In [25]:
# Custom functions on a DataFrame
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "C": [100, 200, 300, 400]
})

# Apply custom range function to each column
print(df.agg(my_range, axis=0))

A      3
B     30
C    300
dtype: int64


In [27]:
# Multiple functions all at once
print(df.agg(["mean", "std", my_range]))

                 A          B           C
mean      2.500000  25.000000  250.000000
std       1.290994  12.909944  129.099445
my_range  3.000000  30.000000  300.000000


In [None]:
# Column-specific functions with a dict
print(df.agg({
    "A": ["sum", "mean"],
    "B": my_range,
    "C": lambda x: (x**2).sum()
}))

             A     B         C
sum       10.0   NaN       NaN
mean       2.5   NaN       NaN
my_range   NaN  30.0       NaN
<lambda>   NaN   NaN  300000.0


---

# C. Looping
We have already looked at the methods that help in looping in a dataframe. In this section we'll have a more detailed look at Looping.

In [30]:
# dataframe 
df

   A   B    C
0  1  10  100
1  2  20  200
2  3  30  300
3  4  40  400


### 1. **`df.items()`** (was `iteritems()` in older versions)

- **What it does:** Iterates over **columns**.
- **Returns:** `(column_name, column_series)` for each column.
- **Use when:** You need to process columns one at a time.


### 2. **`df.iterrows()`**

- **What it does:** Iterates over **rows**.
- **Returns:** `(index, row_series)` for each row.
- **Use when:** You need labels and row-wise access (though slower).
- **Drawback:** Each row is a **Series**, which is inefficient for large DataFrames.
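
A second cost of getting each row as a Series is that the row must have a single dtype. The sketch below uses a made-up two-column frame (`mixed`, with hypothetical columns `qty` and `ratio`) to show an integer column being upcast to float inside `iterrows()`, while `itertuples()` keeps the integer.

```python
import numpy as np
import pandas as pd

mixed = pd.DataFrame({"qty": [1, 2], "ratio": [0.5, 1.5]})

# iterrows: the row Series holds int and float together, so it becomes float64
_, first_row = next(mixed.iterrows())
print(first_row.dtype)      # float64
print(first_row["qty"])     # 1.0

# itertuples: each field keeps the type of its own column
first_tuple = next(mixed.itertuples())
print(first_tuple.qty)      # 1
print(isinstance(first_tuple.qty, (int, np.integer)))   # True
```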


### 3. **`df.itertuples()`**

- **What it does:** Iterates over **rows as namedtuples**.
- **Returns:** One **namedtuple** per row.
- **Use when:** You want row-wise access **faster** than `iterrows()`.


### ⚡ Performance Comparison

| Method        | Iterates Over | Returns        | Speed     | Best Use Case                     |
|---------------|---------------|----------------|-----------|-----------------------------------|
| `items()`     | Columns        | col name + Series | Fastest   | Processing or transforming columns |
| `iterrows()`  | Rows           | index + Series | Slow      | Easy-to-read but inefficient row access |
| `itertuples()`| Rows           | namedtuple     | Fast      | Efficient row-wise operations     |

### ✅ Recommendation

- Use **`items()`** for column-wise logic.
- Use **`itertuples()`** if you must iterate over rows.
- Avoid `iterrows()` in performance-critical code.


In [31]:
for col, col_data in df.items():
    print(f"{col} : {col_data.values}")

A : [1 2 3 4]
B : [10 20 30 40]
C : [100 200 300 400]


In [34]:
for row in df.itertuples():
    print(f"{row.Index}: {row[1:]}")

0: (1, 10, 100)
1: (2, 20, 200)
2: (3, 30, 300)
3: (4, 40, 400)


---