# Manipulation Methods 

## 🔹 Pandas Series Manipulation Methods

| Method               | Syntax                                                                                                                                                     | Description                                                                                                                         |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| **apply**            | `s.apply(func, convert_dtype=True, args=(), **kwds)`                                                                                                       | Apply a Python or NumPy function to each element of the Series. Returns a Series (or DataFrame if func returns Series).             |
| **where**            | `s.where(cond, other=np.nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)`                                                        | Keep values where `cond` is True; replace others with `other`. Works like an element-wise if-else.                                  |
| **fillna**           | `s.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)`                                                                   | Fill missing values (`NaN`) with scalar, dict, Series, or method (e.g. `"ffill"`, `"bfill"`).                                       |
| **interpolate**      | `s.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, **kwargs)`                                       | Fill missing values using interpolation (linear, time, spline, etc.).                                                               |
| **clip**             | `s.clip(lower=None, upper=None, axis=None, inplace=False, *args, **kwargs)`                                                                                | Restrict values to a given range `[lower, upper]`. Values outside are clipped.                                                      |
| **sort\_values**     | `s.sort_values(axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)`                                 | Sort values in the Series. Choose algorithm via `kind`. NaNs go last by default.                                                    |
| **sort\_index**      | `s.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)` | Sort Series by index instead of values.                                                                                             |
| **drop\_duplicates** | `s.drop_duplicates(keep='first', inplace=False)`                                                                                                           | Remove duplicate values. Keep `"first"`, `"last"`, or remove all duplicates with `False`.                                           |
| **rank**             | `s.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)`                                                         | Return ranks of values in the Series. Handles ties with `"average"`, `"min"`, `"max"`, `"first"`, `"dense"`.                        |
| **replace**          | `s.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')`                                                             | Replace values in Series using string, number, list, regex, or dictionary mapping.                                                  |
| **cut**              | `pd.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)`                             | Bin continuous values into discrete intervals defined by `bins`. Useful for grading, grouping.                                      |
| **qcut**             | `pd.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')`                                                                               | Bin values into quantile-based bins (e.g. quartiles, deciles). Useful for splitting into equal-sized groups.                        |
| **np.select**        | `np.select(condlist, choicelist, default=0)`                                                                                                               | Vectorized conditional logic. For each condition in `condlist`, apply the matching choice from `choicelist`. Returns a NumPy array. |

Note that these signatures follow older pandas releases and vary between versions. For example, `try_cast` and `errors` were removed from `where` in pandas 2.0, and the `method` argument of `fillna` is deprecated since pandas 2.1 in favour of `ffill()` and `bfill()`.


Let us now consider a DataFrame with missing values and a repeated entry, which the examples below build on:

In [1]:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "A": [10, 20, 30, 40, np.nan, 60, 70, 70],
    "B": [5, 15, np.nan, 25, 35, 45, np.nan, 65],
    "C": ["x", "y", "z", "x", "y", "z", "x", "y"]
})

print(df)

      A     B  C
0  10.0   5.0  x
1  20.0  15.0  y
2  30.0   NaN  z
3  40.0  25.0  x
4   NaN  35.0  y
5  60.0  45.0  z
6  70.0   NaN  x
7  70.0  65.0  y


## 1. `.apply()` → Apply a function to each element

In [2]:
df['A_Squared'] = df['A'].apply(lambda x: x**2 if pd.notna(x) else x)
df

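|   | A    | B    | C | A\_Squared |
| - | ---- | ---- | - | ---------- |
| 0 | 10.0 | 5.0  | x | 100.0      |
| 1 | 20.0 | 15.0 | y | 400.0      |
| 2 | 30.0 | NaN  | z | 900.0      |
| 3 | 40.0 | 25.0 | x | 1600.0     |
| 4 | NaN  | 35.0 | y | NaN        |
| 5 | 60.0 | 45.0 | z | 3600.0     |
| 6 | 70.0 | NaN  | x | 4900.0     |
| 7 | 70.0 | 65.0 | y | 4900.0     |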
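
For simple arithmetic like squaring, a vectorized expression is usually the more idiomatic choice. The sketch below uses a small stand-in Series `a` (illustrative values, not the DataFrame above) and suggests that `NaN` propagates through `**` on its own, so the explicit `pd.notna` check should not be needed:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for column A
a = pd.Series([10, 20, np.nan, 70])

via_apply = a.apply(lambda x: x**2 if pd.notna(x) else x)  # element by element
vectorized = a ** 2                                        # whole Series at once

# Series.equals treats NaN values in the same position as equal
print(via_apply.equals(vectorized))
```

`.apply()` remains useful when the logic cannot be expressed with built-in vectorized operations.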


## 2. `.where()` → Conditional replacement

In [3]:
df['B_where'] = df['B'].where(df['B']>20, other=0)
df

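|   | A    | B    | C | A\_Squared | B\_where |
| - | ---- | ---- | - | ---------- | -------- |
| 0 | 10.0 | 5.0  | x | 100.0      | 0.0      |
| 1 | 20.0 | 15.0 | y | 400.0      | 0.0      |
| 2 | 30.0 | NaN  | z | 900.0      | 0.0      |
| 3 | 40.0 | 25.0 | x | 1600.0     | 25.0     |
| 4 | NaN  | 35.0 | y | NaN        | 35.0     |
| 5 | 60.0 | 45.0 | z | 3600.0     | 45.0     |
| 6 | 70.0 | NaN  | x | 4900.0     | 0.0      |
| 7 | 70.0 | 65.0 | y | 4900.0     | 65.0     |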


We can get the same result with `np.where` too. Let's do that!

In [5]:
df['B_np.where'] = np.where(
    df['B']>20,
    df['B'],
    0
)
df

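|   | A    | B    | C | A\_Squared | B\_where | B\_np.where |
| - | ---- | ---- | - | ---------- | -------- | ----------- |
| 0 | 10.0 | 5.0  | x | 100.0      | 0.0      | 0.0         |
| 1 | 20.0 | 15.0 | y | 400.0      | 0.0      | 0.0         |
| 2 | 30.0 | NaN  | z | 900.0      | 0.0      | 0.0         |
| 3 | 40.0 | 25.0 | x | 1600.0     | 25.0     | 25.0        |
| 4 | NaN  | 35.0 | y | NaN        | 35.0     | 35.0        |
| 5 | 60.0 | 45.0 | z | 3600.0     | 45.0     | 45.0        |
| 6 | 70.0 | NaN  | x | 4900.0     | 0.0      | 0.0         |
| 7 | 70.0 | 65.0 | y | 4900.0     | 65.0     | 65.0        |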
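
The values match, but the return types differ. A minimal sketch on a stand-in Series `b` (the name, values, and string index are illustrative assumptions) shows that `Series.where` keeps the index while `np.where` returns a plain NumPy array:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for column B, with a non-default index
b = pd.Series([5, 15, np.nan, 25], index=["a", "b", "c", "d"])

kept = b.where(b > 20, other=0)  # pandas Series, index preserved
arr = np.where(b > 20, b, 0)     # NumPy ndarray, index discarded

print(type(kept).__name__, list(kept.index))
print(type(arr).__name__)
```

When the result is assigned back to a DataFrame column with the same row order, as above, the difference does not matter; it starts to matter once indexes are not aligned.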


## 3. `.fillna()` → Fill missing values

In [7]:
df["B_filled"] = df["B"].fillna(99)
df[["B", "B_filled"]]

|   | B    | B\_filled |
| - | ---- | --------- |
| 0 | 5.0  | 5.0       |
| 1 | 15.0 | 15.0      |
| 2 | NaN  | 99.0      |
| 3 | 25.0 | 25.0      |
| 4 | 35.0 | 35.0      |
| 5 | 45.0 | 45.0      |
| 6 | NaN  | 99.0      |
| 7 | 65.0 | 65.0      |
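
A constant is only one option. As a rough sketch, the forward-fill and backward-fill strategies mentioned in the table can be written with `ffill()` and `bfill()`, and a statistic such as the mean can be passed to `fillna`. The Series `b` below is a stand-in with the same values as column B:

```python
import numpy as np
import pandas as pd

# Same values as column B, rebuilt here so the block is self-contained
b = pd.Series([5, 15, np.nan, 25, 35, 45, np.nan, 65])

forward = b.ffill()               # carry the previous valid value forward
backward = b.bfill()              # pull the next valid value backward
mean_filled = b.fillna(b.mean())  # replace gaps with the mean of the valid values

print(pd.DataFrame({"B": b, "ffill": forward, "bfill": backward, "mean": mean_filled}))
```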


## 4. `.interpolate()` → Fill missing values smoothly

In [8]:
df["B_interp"] = df["B"].interpolate(method="linear")
print(df[["B", "B_interp"]])

      B  B_interp
0   5.0       5.0
1  15.0      15.0
2   NaN      20.0
3  25.0      25.0
4  35.0      35.0
5  45.0      45.0
6   NaN      55.0
7  65.0      65.0
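
The `limit` parameter caps how many consecutive `NaN` values get filled. The sketch below uses an illustrative Series `s` (not part of the DataFrame above) with a leading gap and a two-value gap; with the default forward direction, the leading `NaN` has no earlier value to interpolate from and stays missing:

```python
import numpy as np
import pandas as pd

# Illustrative Series: a leading gap and a gap of two values
s = pd.Series([np.nan, 10, np.nan, np.nan, 40])

full = s.interpolate(method="linear")             # fills the inner gap completely
capped = s.interpolate(method="linear", limit=1)  # fills at most one consecutive NaN

print(full.tolist())
print(capped.tolist())
```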


## 5. `.clip()` → Restrict values to a range

In [10]:
df['A_clipped'] = df['A'].clip(lower=15, upper=60)
print(df[["A", "A_clipped"]])

      A  A_clipped
0  10.0       15.0
1  20.0       20.0
2  30.0       30.0
3  40.0       40.0
4   NaN        NaN
5  60.0       60.0
6  70.0       60.0
7  70.0       60.0


- ✅ Values in A less than 15 become 15; greater than 60 become 60.

## 6. `.sort_values()` → Sort by column values

In [11]:
df.sort_values(by='B')

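|   | A    | B    | C | A\_Squared | B\_where | B\_np.where | B\_filled | B\_interp | A\_clipped |
| - | ---- | ---- | - | ---------- | -------- | ----------- | --------- | --------- | ---------- |
| 0 | 10.0 | 5.0  | x | 100.0      | 0.0      | 0.0         | 5.0       | 5.0       | 15.0       |
| 1 | 20.0 | 15.0 | y | 400.0      | 0.0      | 0.0         | 15.0      | 15.0      | 20.0       |
| 3 | 40.0 | 25.0 | x | 1600.0     | 25.0     | 25.0        | 25.0      | 25.0      | 40.0       |
| 4 | NaN  | 35.0 | y | NaN        | 35.0     | 35.0        | 35.0      | 35.0      | NaN        |
| 5 | 60.0 | 45.0 | z | 3600.0     | 45.0     | 45.0        | 45.0      | 45.0      | 60.0       |
| 7 | 70.0 | 65.0 | y | 4900.0     | 65.0     | 65.0        | 65.0      | 65.0      | 60.0       |
| 2 | 30.0 | NaN  | z | 900.0      | 0.0      | 0.0         | 99.0      | 20.0      | 30.0       |
| 6 | 70.0 | NaN  | x | 4900.0     | 0.0      | 0.0         | 99.0      | 55.0      | 60.0       |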


- Note: rows where B is `NaN` are placed last by default (`na_position='last'`).
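
Both the direction and the `NaN` placement can be changed. A minimal sketch on an illustrative four-row frame `demo` (a hypothetical name, not the DataFrame above):

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing value
demo = pd.DataFrame({"B": [5, np.nan, 25, 15]})

# Largest first, with the NaN row moved to the top
desc_nan_first = demo.sort_values(by="B", ascending=False, na_position="first")

# Same ordering, but with a fresh 0..n-1 index
renumbered = demo.sort_values(by="B", ascending=False, na_position="first",
                              ignore_index=True)

print(desc_nan_first)
print(renumbered)
```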

## 7. `.sort_index()` → Sort by index

In [12]:
df.sort_index(ascending=False)

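|   | A    | B    | C | A\_Squared | B\_where | B\_np.where | B\_filled | B\_interp | A\_clipped |
| - | ---- | ---- | - | ---------- | -------- | ----------- | --------- | --------- | ---------- |
| 7 | 70.0 | 65.0 | y | 4900.0     | 65.0     | 65.0        | 65.0      | 65.0      | 60.0       |
| 6 | 70.0 | NaN  | x | 4900.0     | 0.0      | 0.0         | 99.0      | 55.0      | 60.0       |
| 5 | 60.0 | 45.0 | z | 3600.0     | 45.0     | 45.0        | 45.0      | 45.0      | 60.0       |
| 4 | NaN  | 35.0 | y | NaN        | 35.0     | 35.0        | 35.0      | 35.0      | NaN        |
| 3 | 40.0 | 25.0 | x | 1600.0     | 25.0     | 25.0        | 25.0      | 25.0      | 40.0       |
| 2 | 30.0 | NaN  | z | 900.0      | 0.0      | 0.0         | 99.0      | 20.0      | 30.0       |
| 1 | 20.0 | 15.0 | y | 400.0      | 0.0      | 0.0         | 15.0      | 15.0      | 20.0       |
| 0 | 10.0 | 5.0  | x | 100.0      | 0.0      | 0.0         | 5.0       | 5.0       | 15.0       |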


## 8. `.drop_duplicates()` → Remove duplicate rows

In [None]:
df.drop_duplicates(subset=['A'])  # subset limits the duplicate check to the listed columns

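|   | A    | B    | C | A\_Squared | B\_where | B\_np.where | B\_filled | B\_interp | A\_clipped |
| - | ---- | ---- | - | ---------- | -------- | ----------- | --------- | --------- | ---------- |
| 0 | 10.0 | 5.0  | x | 100.0      | 0.0      | 0.0         | 5.0       | 5.0       | 15.0       |
| 1 | 20.0 | 15.0 | y | 400.0      | 0.0      | 0.0         | 15.0      | 15.0      | 20.0       |
| 2 | 30.0 | NaN  | z | 900.0      | 0.0      | 0.0         | 99.0      | 20.0      | 30.0       |
| 3 | 40.0 | 25.0 | x | 1600.0     | 25.0     | 25.0        | 25.0      | 25.0      | 40.0       |
| 4 | NaN  | 35.0 | y | NaN        | 35.0     | 35.0        | 35.0      | 35.0      | NaN        |
| 5 | 60.0 | 45.0 | z | 3600.0     | 45.0     | 45.0        | 45.0      | 45.0      | 60.0       |
| 6 | 70.0 | NaN  | x | 4900.0     | 0.0      | 0.0         | 99.0      | 55.0      | 60.0       |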


- ✅ Keeps first occurrence of duplicate A=70, drops the rest.
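
The `keep` argument controls which occurrence survives. A small sketch on an illustrative Series `a` (stand-in values) compares the three options by looking at which index labels remain:

```python
import pandas as pd

# Illustrative Series with one duplicated value (70 at positions 1 and 2)
a = pd.Series([10, 70, 70, 20])

first = a.drop_duplicates(keep="first")  # keep the earliest 70
last = a.drop_duplicates(keep="last")    # keep the latest 70
none = a.drop_duplicates(keep=False)     # drop every value that is duplicated

print(list(first.index), list(last.index), list(none.index))
```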

## 9. `.rank()` → Rank values

In [18]:
df["A_rank"] = df["A"].rank(method="min")
df[["A", "A_rank"]]

|   | A    | A\_rank |
| - | ---- | ------- |
| 0 | 10.0 | 1.0     |
| 1 | 20.0 | 2.0     |
| 2 | 30.0 | 3.0     |
| 3 | 40.0 | 4.0     |
| 4 | NaN  | NaN     |
| 5 | 60.0 | 5.0     |
| 6 | 70.0 | 6.0     |
| 7 | 70.0 | 6.0     |


### 👍 Let's build a **side-by-side comparison table** of all ranking methods in Pandas.


```python
import pandas as pd

# Separate name so the running `df` from the earlier cells is not overwritten
rank_df = pd.DataFrame({"A": [100, 200, 200, 300]})
```

#### 🔢 Ranking Comparison

| Value (A) | `average` | `min` | `max` | `first` | `dense` |
| --------- | --------- | ----- | ----- | ------- | ------- |
| **100**   | 1.0       | 1.0   | 1.0   | 1.0     | 1.0     |
| **200**   | 2.5       | 2.0   | 3.0   | 2.0     | 2.0     |
| **200**   | 2.5       | 2.0   | 3.0   | 3.0     | 2.0     |
| **300**   | 4.0       | 4.0   | 4.0   | 4.0     | 3.0     |

#### ✅ Explanation

* **`average`** → ties get the mean of ranks (200 gets (2+3)/2 = 2.5).
* **`min`** → ties take the lowest rank available (200 → 2).
* **`max`** → ties take the highest rank available (200 → 3).
* **`first`** → ties broken by order of appearance (1st 200 → 2, 2nd 200 → 3).
* **`dense`** → ties like `min`, but next rank is consecutive (300 → 3, not 4).
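
One way to reproduce the comparison table is to call `rank` once per method and collect the results side by side. This is a sketch; the names `values`, `methods`, and `comparison` are illustrative:

```python
import pandas as pd

# Same four values as in the comparison table
values = pd.Series([100, 200, 200, 300], name="A")
methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per tie-breaking method
comparison = pd.DataFrame({m: values.rank(method=m) for m in methods})
comparison.insert(0, "A", values)

print(comparison)
```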


## 10. `.replace()` → Replace values

In [17]:
df["C_replaced"] = df["C"].replace({"x": "X-ray", 
                                    "y": "Yellow"})
df[["C", "C_replaced"]]

|   | C | C\_replaced |
| - | - | ----------- |
| 0 | x | X-ray       |
| 1 | y | Yellow      |
| 2 | z | z           |
| 3 | x | X-ray       |
| 4 | y | Yellow      |
| 5 | z | z           |
| 6 | x | X-ray       |
| 7 | y | Yellow      |
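
A dictionary mapping matches whole values only. With `regex=True`, the pattern is instead substituted inside each string. The sketch below uses an illustrative Series `c` with an extra value `"xy"` (an assumption added to show the difference):

```python
import pandas as pd

# "xy" is added to show how partial matches are handled
c = pd.Series(["x", "y", "z", "xy"])

exact = c.replace({"x": "X-ray"})            # whole-value match: "xy" is left alone
pattern = c.replace(r"^x", "X", regex=True)  # regex substitution: leading "x" becomes "X"

print(exact.tolist())
print(pattern.tolist())
```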


## 11. `pd.cut()` → Bin continuous values

In [19]:
df["A_bin"] = pd.cut(df["A"], bins=[0, 30, 60, 100], labels=["Low", "Mid", "High"])
df[["A", "A_bin"]]

|   | A    | A\_bin |
| - | ---- | ------ |
| 0 | 10.0 | Low    |
| 1 | 20.0 | Low    |
| 2 | 30.0 | Low    |
| 3 | 40.0 | Mid    |
| 4 | NaN  | NaN    |
| 5 | 60.0 | Mid    |
| 6 | 70.0 | High   |
| 7 | 70.0 | High   |


- ✅ Groups A values into categories (Low, Mid, High).
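
Values that sit exactly on a bin edge (30 and 60 here) are assigned according to `right`. With the default `right=True` the bins are `(0, 30]`, `(30, 60]`, `(60, 100]`; with `right=False` they become `[0, 30)`, `[30, 60)`, `[60, 100)`. A minimal sketch on an illustrative Series `a`:

```python
import pandas as pd

# Illustrative values, two of them exactly on a bin edge
a = pd.Series([10, 30, 60, 70])
bins = [0, 30, 60, 100]
labels = ["Low", "Mid", "High"]

right_closed = pd.cut(a, bins=bins, labels=labels)              # edge value joins the lower bin
left_closed = pd.cut(a, bins=bins, labels=labels, right=False)  # edge value joins the upper bin

print(pd.DataFrame({"A": a, "right=True": right_closed, "right=False": left_closed}))
```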

## 12. `pd.qcut()` → Quantile-based binning

In [20]:
df["B_qbin"] = pd.qcut(df["B"], q=3, labels=["Low", "Medium", "High"])
df[["B", "B_qbin"]]

|   | B    | B\_qbin |
| - | ---- | ------- |
| 0 | 5.0  | Low     |
| 1 | 15.0 | Low     |
| 2 | NaN  | NaN     |
| 3 | 25.0 | Medium  |
| 4 | 35.0 | Medium  |
| 5 | 45.0 | High    |
| 6 | NaN  | NaN     |
| 7 | 65.0 | High    |


- ✅ Splits the non-missing values of B into 3 equal-sized groups by rank; `NaN` rows stay unassigned.


In **`pd.qcut`**, the parameter **`q`** is the number of **quantiles** (it can also be a list of quantile edges such as `[0, 0.25, 0.5, 0.75, 1]`).

* It divides the data into **q equal-sized groups** based on rank (percentiles), not on fixed value ranges.
* Each bin has (approximately) the same number of observations.

### Example

```python
import pandas as pd

# Separate name so the running `df` from the earlier cells is not overwritten
q_df = pd.DataFrame({"B": [10, 20, 30, 40, 50, 60, 70, 80, 90]})

q_df["B_qbin"] = pd.qcut(q_df["B"], q=3, labels=["Low", "Medium", "High"])
print(q_df)
```

**Output:**

| B  | B\_qbin |
| -- | ------- |
| 10 | Low     |
| 20 | Low     |
| 30 | Low     |
| 40 | Medium  |
| 50 | Medium  |
| 60 | Medium  |
| 70 | High    |
| 80 | High    |
| 90 | High    |


### Explanation

* `q=3` → split into **3 equal groups**.
* Each group gets ~⅓ of the data:

  * **Low** → bottom 33%
  * **Medium** → middle 33%
  * **High** → top 33%
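
The contrast with `pd.cut` shows up most clearly on skewed data. In the sketch below, the Series `b` and its outlier of 1000 are illustrative assumptions: `pd.qcut` still yields three groups of three, while `pd.cut` with three equal-width bins puts nearly everything into the lowest bin.

```python
import pandas as pd

# Illustrative data with one large outlier
b = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 1000])
labels = ["Low", "Medium", "High"]

by_quantile = pd.qcut(b, q=3, labels=labels)  # equal counts per bin
by_width = pd.cut(b, bins=3, labels=labels)   # equal value ranges per bin

print(by_quantile.value_counts(sort=False))
print(by_width.value_counts(sort=False))
```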



## 13. `np.select()` → Multi-condition classification

In [21]:
condlist = [
    df["A"] < 30,
    (df["A"] >= 30) & (df["A"] <= 60),
    df["A"] > 60
]
choicelist = ["Small", "Medium", "Large"]

df["A_size"] = np.select(condlist, choicelist, default="Unknown")
print(df[["A", "A_size"]])

      A   A_size
0  10.0    Small
1  20.0    Small
2  30.0   Medium
3  40.0   Medium
4   NaN  Unknown
5  60.0   Medium
6  70.0    Large
7  70.0    Large
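
The conditions above are mutually exclusive, so their order does not matter. When conditions overlap, `np.select` uses the first one that is true for each element, so the most specific condition should come first. A sketch on an illustrative Series `a` (stand-in values and thresholds):

```python
import numpy as np
import pandas as pd

# Illustrative values, including one missing entry
a = pd.Series([10, 30, 70, np.nan])

# Broad condition first: it matches every non-missing value
broad_first = np.select(
    [a > 5, a > 20, a > 60],
    ["Small", "Medium", "Large"],
    default="Unknown",
)

# Most specific condition first: each value gets its intended label
specific_first = np.select(
    [a > 60, a > 20, a > 5],
    ["Large", "Medium", "Small"],
    default="Unknown",
)

print(broad_first.tolist())
print(specific_first.tolist())
```

Comparisons with `NaN` are always false, which is why the missing entry falls through to `default` in both cases.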


---

# Final Manipulated DataFrame

In [22]:
df

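|   | A    | B    | C | A\_Squared | B\_where | B\_np.where | B\_filled | B\_interp | A\_clipped | A\_rank | C\_replaced | A\_bin | B\_qbin | A\_size  |
| - | ---- | ---- | - | ---------- | -------- | ----------- | --------- | --------- | ---------- | ------- | ----------- | ------ | ------- | -------- |
| 0 | 10.0 | 5.0  | x | 100.0      | 0.0      | 0.0         | 5.0       | 5.0       | 15.0       | 1.0     | X-ray       | Low    | Low     | Small    |
| 1 | 20.0 | 15.0 | y | 400.0      | 0.0      | 0.0         | 15.0      | 15.0      | 20.0       | 2.0     | Yellow      | Low    | Low     | Small    |
| 2 | 30.0 | NaN  | z | 900.0      | 0.0      | 0.0         | 99.0      | 20.0      | 30.0       | 3.0     | z           | Low    | NaN     | Medium   |
| 3 | 40.0 | 25.0 | x | 1600.0     | 25.0     | 25.0        | 25.0      | 25.0      | 40.0       | 4.0     | X-ray       | Mid    | Medium  | Medium   |
| 4 | NaN  | 35.0 | y | NaN        | 35.0     | 35.0        | 35.0      | 35.0      | NaN        | NaN     | Yellow      | NaN    | Medium  | Unknown  |
| 5 | 60.0 | 45.0 | z | 3600.0     | 45.0     | 45.0        | 45.0      | 45.0      | 60.0       | 5.0     | z           | Mid    | High    | Medium   |
| 6 | 70.0 | NaN  | x | 4900.0     | 0.0      | 0.0         | 99.0      | 55.0      | 60.0       | 6.0     | X-ray       | High   | NaN     | Large    |
| 7 | 70.0 | 65.0 | y | 4900.0     | 65.0     | 65.0        | 65.0      | 65.0      | 60.0       | 6.0     | Yellow      | High   | High    | Large    |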


---