# Mutation

## Element-wise Operations

```python
# Assume value is a scalar
df + value  # Add value to all elements
df - value  # Subtract value from all elements
df * value  # Multiply all elements by value
df / value  # Divide all elements by value

# Operations between two DataFrames (aligned on row and column labels)
df1 + df2   # Add corresponding elements
df1 - df2   # Subtract corresponding elements
df1 * df2   # Multiply corresponding elements
df1 / df2   # Divide corresponding elements
```
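Because operations between two DataFrames align on labels, a label that appears in only one operand produces `NaN`. The sketch below uses two small hypothetical frames, `a` and `b`, to show this. It also shows the method form `add`, whose `fill_value` argument treats an entry missing from one side as that value.

```python
import pandas as pd

# Hypothetical frames that share only some row and column labels
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])
b = pd.DataFrame({"x": [10, 20], "z": [5, 6]}, index=["r2", "r3"])

# Operator form: the result has the union of labels; only (r2, x) exists in both
summed = a + b
print(summed)

# Method form: fill_value=0 substitutes for an entry missing on one side.
# Cells missing on both sides, such as (r1, z), stay NaN.
summed_filled = a.add(b, fill_value=0)
print(summed_filled)
```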

## DataFrame and Series Operations

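Arithmetic between a DataFrame and a Series aligns the Series index with the DataFrame's columns by default (`axis='columns'`). To align the Series with the row index instead, use the method form (`add`, `subtract`, `divide`, ...) with `axis=0`: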

```python
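# DataFrame divided by Series: the Series index is matched to the DataFrame's columns
df / series  # Each column is divided by the Series value with the same label

# Align on the row index (axis=0): series[label] applies to every value in row `label`
df.add(series, axis=0)       # Add the matching Series value to each row
df.subtract(series, axis=0)  # Subtract the matching Series value from each row

# Align on the columns (axis=1, the default): series[col] applies to every value in column `col`
df.add(series, axis=1)       # Add the matching Series value to each column
df.subtract(series, axis=1)  # Subtract the matching Series value from each column
df.divide(series, axis=1)    # Divide each column by the matching Series value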
```
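Two common cases illustrate the two alignments. In the sketch below, the DataFrame mirrors the numeric columns used in the examples later on, and `row_offsets` is a hypothetical Series keyed by row labels. `df.mean()` is indexed by column labels, so the default operator form centers each column. A Series keyed by row labels needs `axis=0`.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35, 40],
                   "Salary": [50000, 60000, 70000, 80000]},
                  index=["p1", "p2", "p3", "p4"])

# df.mean() is indexed by column labels, so the default (axis='columns')
# subtracts each column's mean from that column
centered = df - df.mean()

# A Series indexed by row labels must be aligned with axis=0
row_offsets = pd.Series([1, 2, 3, 4], index=df.index)
shifted = df.sub(row_offsets, axis=0)

print(centered)
print(shifted)
```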

## Apply Function

`df.apply` calls a function along one axis of the DataFrame:
- Apply function to each row: `df.apply(func, axis=1)`
- Apply function to each column: `df.apply(func, axis=0)`
- Objects passed to the function are `pd.Series` whose index is either the DataFrame’s index (`axis=0`) or columns (`axis=1`).

```python
# Apply function to each column
df.apply(func, axis=0) 

# Apply function to each row
df.apply(func, axis=1)

# Using a custom function
df.apply(lambda col: f(col), axis=0)
df.apply(lambda row: f(row), axis=1)
```
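To see what the function actually receives, you can return each Series' `.name`. With `axis=0` this is the column label, and with `axis=1` it is the row label. The small frame below is made up for illustration. A function that returns a scalar reduces the frame to a Series.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30], "Salary": [50000, 60000]}, index=["p1", "p2"])

# axis=0: each call receives one column; its .name is the column label
col_names = df.apply(lambda col: col.name, axis=0)

# axis=1: each call receives one row; its .name is the row label
row_names = df.apply(lambda row: row.name, axis=1)

# A scalar-returning function yields one value per column
col_ranges = df.apply(lambda col: col.max() - col.min(), axis=0)
print(col_ranges)
```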

## DataFrame.map()

- Applies a function that accepts and returns a scalar to every element of a DataFrame (element-wise).
- Available since pandas 2.1; earlier versions call this method `applymap`.

```python
df.map(lambda x: func(x))
```

## Transform Function

- Calls `func` on the DataFrame and returns a DataFrame with the same shape as the input.
- Functions that mutate the passed object can produce unexpected behavior or errors and are not supported.

```python
df.transform(func)
```
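The same-shape requirement is what separates `transform` from `apply`. In the sketch below, each function is written for a single column (a Series). A shape-preserving function is accepted. A reducing function such as a column sum would work with `apply`, but pandas raises `ValueError` when it is passed to `transform`.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35, 40],
                   "Salary": [50000, 60000, 70000, 80000]},
                  index=["p1", "p2", "p3", "p4"])

# Shape-preserving: each value as a fraction of its column's maximum
pct_of_max = df.transform(lambda col: col / col.max())

# Reducing: returns one value per column, so transform rejects it
try:
    df.transform(lambda col: col.sum())
    reduced_ok = True
except ValueError:
    reduced_ok = False

print(pct_of_max)
print("Reducing function accepted:", reduced_ok)
```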

## Built-in Statistical Functions

### Basic Statistics
```python
df.mean()        # Mean of values
df.median()      # Median of values
df.mode()        # Mode of values
df.sum()         # Sum of values
df.min()         # Minimum
df.max()         # Maximum
df.count()       # Count of non-NA/null values
```
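These reductions interact with column types and missing values. In pandas 2.x, calling `mean()` on a frame with a string column raises `TypeError`, and `numeric_only=True` restricts the computation to numeric columns. `count()` skips missing values, and `mode()` returns a DataFrame because a column can have several modes. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical data: a string column and one missing salary
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"],
                   "Age": [25, 30, 30],
                   "Salary": [50000, None, 70000]})

# Only numeric columns are averaged; the missing Salary is skipped
means = df.mean(numeric_only=True)

# Non-missing values per column: Salary counts 2, not 3
counts = df.count()

# mode() returns a DataFrame (one row per modal value)
modes = df[["Age"]].mode()

print(means, counts, modes, sep="\n\n")
```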

### Measures of Spread
```python
df.std()         # Standard deviation
df.var()         # Variance
df.sem()         # Standard error of mean
(df - df.mean()).abs().mean()  # Mean absolute deviation (df.mad() was removed in pandas 2.0)
```
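These spread measures are related. The variance is the square of the standard deviation. The standard error of the mean is the standard deviation divided by sqrt(n). The quick check below uses the `Age` values from the examples later on. The mean absolute deviation is computed by hand because `DataFrame.mad()` no longer exists in pandas 2.x.

```python
import numpy as np
import pandas as pd

age = pd.Series([25, 30, 35, 40])

std = age.std()   # sample standard deviation (ddof=1 by default)
var = age.var()   # sample variance (ddof=1 by default)
sem = age.sem()   # standard error of the mean
mad = (age - age.mean()).abs().mean()  # mean absolute deviation

print(std, var, sem, mad)
```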

### Quantiles and Percentiles
```python
df.quantile(0.5)               # Median (50th percentile)
df.quantile([0.25, 0.5, 0.75]) # Return multiple quantiles
df.describe()                  # Summary statistics including quantiles
```
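By default, `quantile` interpolates linearly between data points. The `interpolation` parameter can instead pick an existing value, and `describe(percentiles=...)` reports custom percentiles (the median is always included). Here is a small sketch on the `Age` values.

```python
import pandas as pd

age = pd.Series([25, 30, 35, 40])

# Linear (default): the 25th percentile lies 3/4 of the way from 25 to 30
q_linear = age.quantile(0.25)

# Alternatives that return an actual data point
q_lower = age.quantile(0.25, interpolation="lower")
q_nearest = age.quantile(0.25, interpolation="nearest")

# describe() with custom percentiles; 50% is added automatically
summary = age.describe(percentiles=[0.1, 0.9])
print(summary)
```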

### Axis Parameter
- Most statistical functions accept an `axis` parameter
- `axis=0` (or `'index'`, the default): reduce down the rows, giving one result per column
- `axis=1` (or `'columns'`): reduce across the columns, giving one result per row

```python
df.mean(axis=0)  # Column means (default)
df.mean(axis=1)  # Row means
```
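The shape of the result shows which axis was collapsed. With `axis=0`, the index of the result is the column labels. With `axis=1` (or `'columns'`), it is the row labels.

```python
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35, 40],
                   "Salary": [50000, 60000, 70000, 80000]},
                  index=["p1", "p2", "p3", "p4"])

per_column = df.sum(axis=0)       # one value per column
per_row = df.sum(axis="columns")  # one value per row (same as axis=1)

print(per_column)
print(per_row)
```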

# Examples

```python
import pandas as pd
import numpy as np

# Enable Copy-on-Write behavior
pd.set_option("mode.copy_on_write", True)
```

```python
# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data, index=['p1', 'p2', 'p3', 'p4'])
df
```

|    | Name    | Age | Salary |
|----|---------|-----|--------|
| p1 | Alice   | 25  | 50000  |
| p2 | Bob     | 30  | 60000  |
| p3 | Charlie | 35  | 70000  |
| p4 | David   | 40  | 80000  |


## Example: Element-wise Operations

```python
df_numeric = df[['Age', 'Salary']]
df_numeric * 2
```

|    | Age | Salary |
|----|-----|--------|
| p1 | 50  | 100000 |
| p2 | 60  | 120000 |
| p3 | 70  | 140000 |
| p4 | 80  | 160000 |


```python
# Operations between DataFrames
df2 = pd.DataFrame({
    'Age': [1, 2, 3, 4],
    'Salary': [1000, 2000, 3000, 4000]
}, index=['p1', 'p2', 'p3', 'p4'])

# Addition between DataFrames
df_numeric + df2
```

|    | Age | Salary |
|----|-----|--------|
| p1 | 26  | 51000  |
| p2 | 32  | 62000  |
| p3 | 38  | 73000  |
| p4 | 44  | 84000  |


## Example: Using apply() on DataFrame

```python
# Apply to columns (axis=0)
df_numeric.apply(lambda col: col.max())
```

```
Age          40
Salary    80000
dtype: int64
```
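If the function returns a Series instead of a scalar, `apply` assembles the results into a DataFrame, with one output column per input column. The sketch below rebuilds `df_numeric` so it runs on its own.

```python
import pandas as pd

df_numeric = pd.DataFrame({"Age": [25, 30, 35, 40],
                           "Salary": [50000, 60000, 70000, 80000]},
                          index=["p1", "p2", "p3", "p4"])

# Each column produces a two-element Series, so the result has rows "min" and "max"
ranges = df_numeric.apply(lambda col: pd.Series({"min": col.min(), "max": col.max()}))
print(ranges)
```

For plain built-in reductions like these, `df_numeric.agg(['min', 'max'])` produces the same table more directly.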

```python
# Apply to rows (axis=1)
def process_row(row):
    if row['Age'] >= 35:
        return f"{row['Name']} is senior with ${row['Salary']:,}"
    else:
        return f"{row['Name']} is junior with ${row['Salary']:,}"

df.apply(process_row, axis=1)
```

```
p1      Alice is junior with $50,000
p2        Bob is junior with $60,000
p3    Charlie is senior with $70,000
p4      David is senior with $80,000
dtype: object
```
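Row-wise `apply` calls a Python function once per row, which is slow on large frames. As an alternative (a swapped-in technique, not the method used above), you can build the same strings with whole-column operations: `np.where` picks the label and string concatenation joins the pieces.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Salary": [50000, 60000, 70000, 80000],
}, index=["p1", "p2", "p3", "p4"])

# Choose the label for every row at once
level = pd.Series(np.where(df["Age"] >= 35, "senior", "junior"), index=df.index)

# Thousands separators, e.g. 50000 -> "50,000"
salary = df["Salary"].map("{:,}".format)

# Element-wise string concatenation of aligned Series
labels = df["Name"] + " is " + level + " with $" + salary
print(labels)
```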

## Example: Series Operations

```python
# DataFrame and Series operations
# Create a Series with the same index as DataFrame
multiplier = pd.Series([2, 3, 1, 2], index=['p1', 'p2', 'p3', 'p4'])
print("Original DataFrame:")
print(df_numeric)

print("\nMultiplier Series:")
print(multiplier)

print("\nDataFrame divided by Series (row-wise):")
print(df_numeric.divide(multiplier, axis=0))
```

```
Original DataFrame:
    Age  Salary
p1   25   50000
p2   30   60000
p3   35   70000
p4   40   80000

Multiplier Series:
p1    2
p2    3
p3    1
p4    2
dtype: int64

DataFrame divided by Series (row-wise):
     Age   Salary
p1  12.5  25000.0
p2  10.0  20000.0
p3  35.0  70000.0
p4  20.0  40000.0
```
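The `axis=0` above matters. With the plain `/` operator, pandas aligns `multiplier`'s row labels with the DataFrame's columns. Nothing matches, so the result gains the extra columns `p1`..`p4` and every cell is `NaN`. The sketch below reproduces both cases.

```python
import pandas as pd

df_numeric = pd.DataFrame({"Age": [25, 30, 35, 40],
                           "Salary": [50000, 60000, 70000, 80000]},
                          index=["p1", "p2", "p3", "p4"])
multiplier = pd.Series([2, 3, 1, 2], index=["p1", "p2", "p3", "p4"])

# Operator form: Series labels are matched against the columns -> all NaN
misaligned = df_numeric / multiplier

# Method form with axis=0: Series labels are matched against the rows
aligned = df_numeric.divide(multiplier, axis=0)

print(misaligned)
print(aligned)
```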


```python
# DataFrame and Series operations
# This time the Series is indexed by column labels, not row labels
print("Original DataFrame:")
print(df_numeric)

# Column-wise operation using Series with column labels
col_adjuster = pd.Series([10, 1000], index=['Age', 'Salary'])
print("\nColumn adjuster Series:")
print(col_adjuster)

print("\nAdd Series to DataFrame columns:")
print(df_numeric.add(col_adjuster, axis=1))
```

```
Original DataFrame:
    Age  Salary
p1   25   50000
p2   30   60000
p3   35   70000
p4   40   80000

Column adjuster Series:
Age         10
Salary    1000
dtype: int64

Add Series to DataFrame columns:
    Age  Salary
p1   35   51000
p2   40   61000
p3   45   71000
p4   50   81000
```


## Example: Transform Function

```python
# Basic transform
# Standardize numeric columns (z-score normalization);
# transform passes each column to the function as a Series
print("Original DataFrame:")
print(df)
print("\nAfter z-score standardization (numeric columns only):")
print(df_numeric.transform(lambda col: (col - col.mean()) / col.std()))
```

```
Original DataFrame:
       Name  Age  Salary
p1    Alice   25   50000
p2      Bob   30   60000
p3  Charlie   35   70000
p4    David   40   80000

After z-score standardization (numeric columns only):
         Age    Salary
p1 -1.161895 -1.161895
p2 -0.387298 -0.387298
p3  0.387298  0.387298
p4  1.161895  1.161895
```


## Example: DataFrame.map() Function

```python
# Using map on DataFrame (applies elementwise)

# Create a sample numeric DataFrame
numeric_df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# Apply elementwise function
result = numeric_df.map(lambda x: x**2 if x > 3 else x)
print("Original DataFrame:")
print(numeric_df)
print("\nAfter applying map function:")
print(result)
```

```
Original DataFrame:
   A  B
0  1  5
1  2  6
2  3  7
3  4  8

After applying map function:
    A   B
0   1  25
1   2  36
2   3  49
3  16  64
```


## Example: Built-in Statistical Functions

```python
df_numeric = df[['Age', 'Salary']]
df_numeric
```

|    | Age | Salary |
|----|-----|--------|
| p1 | 25  | 50000  |
| p2 | 30  | 60000  |
| p3 | 35  | 70000  |
| p4 | 40  | 80000  |


```python
# Demonstrate basic statistical functions
print("Mean values by column:")
print(df_numeric.mean())

print("\nMedian values by column:")
print(df_numeric.median())

print("\nSum values by column:")
print(df_numeric.sum())

print("\nMin and max values:")
print("Min:\n", df_numeric.min())
print("Max:\n", df_numeric.max())

# Calculate statistics by row instead of column
# (averaging Age with Salary only illustrates axis=1; the numbers are not meaningful)
print("\nMean values by row (axis=1):")
print(df_numeric.mean(axis=1))
```

```
Mean values by column:
Age          32.5
Salary    65000.0
dtype: float64

Median values by column:
Age          32.5
Salary    65000.0
dtype: float64

Sum values by column:
Age          130
Salary    260000
dtype: int64

Min and max values:
Min:
 Age          25
Salary    50000
dtype: int64
Max:
 Age          40
Salary    80000
dtype: int64

Mean values by row (axis=1):
p1    25012.5
p2    30015.0
p3    35017.5
p4    40020.0
dtype: float64
```


```python
# Standard deviation and variance
print("Standard deviation:")
print(df_numeric.std())

print("\nVariance:")
print(df_numeric.var())
```

```
Standard deviation:
Age           6.454972
Salary    12909.944487
dtype: float64

Variance:
Age       4.166667e+01
Salary    1.666667e+08
dtype: float64
```


```python
# Quantiles and percentiles
print("25th percentile (Q1):")
print(df_numeric.quantile(0.25))

print("\n50th percentile (Median):")
print(df_numeric.quantile(0.5))

print("\n75th percentile (Q3):")
print(df_numeric.quantile(0.75))

# Multiple quantiles at once
print("\nMultiple quantiles:")
print(df_numeric.quantile([0.1, 0.25, 0.5, 0.75, 0.9]))
```

```
25th percentile (Q1):
Age          28.75
Salary    57500.00
Name: 0.25, dtype: float64

50th percentile (Median):
Age          32.5
Salary    65000.0
Name: 0.5, dtype: float64

75th percentile (Q3):
Age          36.25
Salary    72500.00
Name: 0.75, dtype: float64

Multiple quantiles:
        Age   Salary
0.10  26.50  53000.0
0.25  28.75  57500.0
0.50  32.50  65000.0
0.75  36.25  72500.0
0.90  38.50  77000.0
```
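`describe()` bundles several of these statistics into one table, and its `25%`/`50%`/`75%` rows match the `quantile` values above. Here is a short check on the same data.

```python
import pandas as pd

df_numeric = pd.DataFrame({"Age": [25, 30, 35, 40],
                           "Salary": [50000, 60000, 70000, 80000]},
                          index=["p1", "p2", "p3", "p4"])

# count, mean, std, min, quartiles and max for every numeric column
summary = df_numeric.describe()
print(summary)
```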
