# DataFrame - Looping and Aggregation

In [1]:
# import the libraries 
import pandas as pd
import numpy as np

In [2]:
# load a dataset 
data = pd.read_csv("./Data/titanic.csv")
data.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [3]:
# iterate over columns; each item is a (column_name, Series) tuple
for col_name, col in data.items():
    print(col_name, type(col))
    break

PassengerId <class 'pandas.core.series.Series'>


In [4]:
# iterate over rows; each item is an (index, row as a Series) tuple
for idx, row in data.iterrows():
    print(idx, type(row))
    break

0 <class 'pandas.core.series.Series'>


In [21]:
# iterate over rows as namedtuples (the index is the first field)
for tup in data.itertuples():
    print(tup)
    print(tup[0], tup.Name)
    break

Pandas(Index=0, PassengerId=1, Survived=0, Pclass=3, Name='Braund, Mr. Owen Harris', Sex='male', Age=22.0, SibSp=1, Parch=0, Ticket='A/5 21171', Fare=7.25, Cabin=nan, Embarked='S')
0 Braund, Mr. Owen Harris


### 1. **`df.items()`** (`iteritems()` was a deprecated alias, removed in pandas 2.0)

- **What it does:** Iterates over **columns**.
- **Returns:** `(column_name, column_series)` for each column.
- **Use when:** You need to process columns one at a time.
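
As a minimal sketch of column-wise processing, the loop below counts missing values per column. The small `df` is a hypothetical stand-in for the Titanic data, and the `missing` dictionary is an illustrative name, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data.
df = pd.DataFrame({
    "Age": [22.0, 38.0, None],
    "Fare": [7.25, 71.2833, 7.925],
    "Sex": ["male", "female", "female"],
})

# Visit each (column_name, Series) pair and count its missing values.
missing = {}
for col_name, col in df.items():
    missing[col_name] = int(col.isna().sum())

print(missing)  # {'Age': 1, 'Fare': 0, 'Sex': 0}
```

For this particular task `df.isna().sum()` gives the same counts without a loop; the loop form is mainly useful when each column needs different handling.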


### 2. **`df.iterrows()`**

- **What it does:** Iterates over **rows**.
- **Returns:** `(index, row_series)` for each row.
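- **Use when:** You need the index label and access to row values by column name, and speed is not a concern.
- **Drawback:** Each row is built as a new **Series**, which is slow for large DataFrames, and dtypes are not preserved across the row (mixed numeric columns are upcast).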
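
The dtype side effect of `iterrows()` is easy to miss, so here is a small sketch of it. The two-column `df` is a hypothetical example: an integer column and a float column end up in one float Series per row.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column.
df = pd.DataFrame({"Pclass": [3, 1], "Fare": [7.25, 71.2833]})

# Inside the DataFrame, Pclass keeps an integer dtype.
print(df["Pclass"].dtype)

# Each row from iterrows() is a single Series, so its values share one dtype.
_, first_row = next(df.iterrows())
print(first_row.dtype)      # float64
print(first_row["Pclass"])  # 3.0
```

If the original types matter, `itertuples()` is usually the safer choice because it reads each value from its own column.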


### 3. **`df.itertuples()`**

- **What it does:** Iterates over **rows as namedtuples**.
- **Returns:** One **namedtuple** per row.
- **Use when:** You want row-wise access **faster** than `iterrows()`.
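
A short sketch of attribute access on the namedtuples follows. The toy `df`, the `Passenger` tuple name, and the `family_sizes` list are illustrative assumptions; `index=False` and `name=` are parameters of `itertuples()` that drop the index field and rename the tuple type.

```python
import pandas as pd

# Hypothetical subset of the Titanic columns.
df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
    "SibSp": [1, 0],
    "Parch": [0, 0],
})

# index=False leaves the index out of each tuple; name= sets the tuple type name.
family_sizes = []
for row in df.itertuples(index=False, name="Passenger"):
    # Fields are read as attributes named after the columns.
    family_sizes.append(int(row.SibSp + row.Parch + 1))

print(family_sizes)  # [2, 1]
```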


### ⚡ Performance Comparison

| Method         | Iterates Over | Returns           | Speed                        | Best Use Case                            |
|----------------|---------------|-------------------|------------------------------|------------------------------------------|
| `items()`      | Columns       | col name + Series | Fast (one step per column)   | Processing or transforming columns       |
| `iterrows()`   | Rows          | index + Series    | Slow (builds a Series per row) | Easy-to-read but inefficient row access |
| `itertuples()` | Rows          | namedtuple        | Faster than `iterrows()`     | Efficient row-wise operations            |

---

### ✅ Recommendation

- Use **`items()`** for column-wise logic.
- Use **`itertuples()`** if you must iterate over rows.
- Avoid `iterrows()` in performance-critical code.

---


## `.agg()`

The `.agg()` (short for **aggregate**) method applies **one or more functions** to the **columns of a DataFrame** or to the **groups of a `groupby`**.

The syntax and the most common use cases are summarized below.


### 🧠 Syntax Overview

```python
df.agg(func_or_dict)
```

- `func_or_dict` can be:
  - A string naming a built-in aggregation (`'mean'`, `'sum'`, etc.)
  - A callable (for example `np.mean` or a lambda)
  - A list of functions or function names
  - A dictionary mapping column names to functions
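
To make the accepted argument types concrete, here is a minimal sketch on a hypothetical two-column numeric frame. The variable names `means`, `summary`, and `per_column` are illustrative.

```python
import pandas as pd

# Hypothetical numeric frame (values borrowed from the Titanic preview).
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1],
})

# A string applies one function to every column and returns a Series.
means = df.agg("mean")

# A list applies several functions and returns a DataFrame
# with one row per function.
summary = df.agg(["min", "max"])

# A dictionary picks a different function for each column.
per_column = df.agg({"Age": "max", "Fare": "sum"})

print(means)
print(summary)
print(per_column)
```

Note that the shape of the result depends on the argument: a single function gives a Series indexed by column, while a list gives a DataFrame indexed by function name.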


### ✅ Summary

| Use Case                                     | Example                                  |
|----------------------------------------------|------------------------------------------|
| Several functions on every (numeric) column  | `df.agg(['mean', 'max'])`                |
| Different agg for each column                | `df.agg({'Math': 'mean', 'Sci': 'max'})` |
| With groupby                                 | `df.groupby('Class').agg(...)`           |
| Custom lambda                                | `df.agg({'Math': lambda x: ...})`        |
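
The `groupby` and lambda rows can be combined in one call. The sketch below uses the first five Titanic rows from the preview instead of the `Math`/`Sci`/`Class` columns in the table; the age-range lambda is an illustrative choice, not something from the notebook.

```python
import pandas as pd

# First five Titanic rows shown earlier, reduced to three columns.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Per passenger class: mean fare, plus a custom age range (max minus min).
by_class = df.groupby("Pclass").agg({
    "Fare": "mean",
    "Age": lambda s: s.max() - s.min(),
})

print(by_class)
```

The result has one row per `Pclass` value and one column per key in the dictionary.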

---


## `.pipe()`
The `.pipe()` method lets you apply your own functions inside a **method chain**, which keeps a sequence of DataFrame operations clean and readable.

---

### 🧠 Basic Idea:
`.pipe(func)` passes the **whole DataFrame** (or Series) as the **first argument** to the function `func`.

---

### ✅ General Syntax:
```python
df.pipe(function, *args, **kwargs)
```

Which is equivalent to:
```python
function(df, *args, **kwargs)
```

---

### 🔧 Why Use `.pipe()`?

- Improves **readability** in method chaining
- Allows you to insert **custom functions** into a method chain
- Prevents breaking the flow of chained pandas operations

---

### 🔢 Example 1: Without `.pipe()`
```python
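# Assumes `df` is an existing DataFrame with only numeric columns.
def add_5_and_square(df):
    """Add 5 to every value, then square the result."""
    return (df + 5) ** 2

# Plain function call: the DataFrame is passed explicitly.
result = add_5_and_square(df)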
```

### ✅ Example 1: With `.pipe()`
```python
result = df.pipe(add_5_and_square)
```

---

### 🔢 Example 2: With Arguments

```python
def add_and_multiply(df, add_val, mul_val):
    return (df + add_val) * mul_val

df.pipe(add_and_multiply, add_val=2, mul_val=10)
```

This is the same as calling the function directly with the DataFrame as its first argument:
```python
add_and_multiply(df, add_val=2, mul_val=10)
```
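
When a function does not take the DataFrame as its first argument, `.pipe()` also accepts a `(callable, data_keyword)` tuple that names the parameter the DataFrame should be passed to. The sketch below assumes a hypothetical `rescale` function whose data parameter comes second.

```python
import pandas as pd

# Hypothetical function whose DataFrame parameter is NOT the first one.
def rescale(factor, data):
    return data * factor

df = pd.DataFrame({"value": [1, 2, 3]})

# The tuple tells pipe to pass the DataFrame as the keyword argument `data`;
# the remaining positional argument (10) becomes `factor`.
result = df.pipe((rescale, "data"), 10)

print(result["value"].tolist())  # [10, 20, 30]
```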

---

### ✅ Real-World Chaining Example:

```python
(df
 .dropna()
 .pipe(lambda d: d[d['value'] > 0])
 .assign(log_value=lambda d: np.log(d['value']))
)
```

- `dropna()` drops rows that contain missing values
- `pipe()` keeps only the rows where `value` is greater than 0
- `assign()` adds a new column, `log_value`, with the natural log of `value`
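
A runnable version of the chain, under the assumption of a tiny hypothetical `df` with a single `value` column, might look like this. The named helper `keep_positive` replaces the inline lambda to show how a reusable function with an argument fits into the chain.

```python
import numpy as np
import pandas as pd

# Hypothetical input: one valid value, one negative, one missing, one equal to e.
df = pd.DataFrame({"value": [1.0, -2.0, np.nan, np.e]})

def keep_positive(d, column):
    """Return only the rows where `column` is strictly positive."""
    return d[d[column] > 0]

result = (
    df
    .dropna()                                        # remove the NaN row
    .pipe(keep_positive, column="value")             # remove the negative row
    .assign(log_value=lambda d: np.log(d["value"]))  # log of the remaining values
)

print(result)
```

Filtering before the `assign()` step matters here: taking the log first would produce `NaN` (with a runtime warning) for the negative value.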

---

### ✨ Summary:
| Feature               | Benefit                                  |
|-----------------------|-------------------------------------------|
| Clean method chaining | Keeps data pipelines readable             |
| Passes full df/series | Enables functional composition            |
| Accepts arguments     | Supports complex custom transformations   |

---
