# NumPy Statistical Functions Guide

## 📊 Core Statistical Operations

### 1. **Max & Min Functions**
   - **Overall**: Find maximum/minimum across entire array
   - **Vertical (axis=0)**: Find max/min down each column
   - **Horizontal (axis=1)**: Find max/min across each row

### 2. **Element-wise Maximum & Minimum**
   - Returns a **new array** by comparing two arrays element-by-element
   - Broadcasts the inputs when their shapes differ but are compatible (for example `(2, 5)` with `(5,)`)

### 3. **Argument Max & Min (argmax, argmin)**
   - Returns the **index position** of maximum/minimum values
   - Works with axis parameter for multi-dimensional arrays

### 4. **Peak-to-Peak (ptp)**
   - Calculates the **difference** between maximum and minimum values
   - Represents the range or spread of data

## 📈 Central Tendency & Distribution

### 5. **Percentile**
   - Find values at specific percentage positions (0-100%)
   - Useful for understanding data distribution

### 6. **Median**
   - The middle value when data is sorted
   - 50th percentile of the dataset

### 7. **Average (Mean)**
   - Arithmetic mean of all values
   - Sum of all values divided by count

### 8. **Variance & Standard Deviation**
   - **Variance (var)**: Average squared deviation of the values from the mean
   - **Standard Deviation (std)**: Square root of the variance, expressed in the same units as the data

### 9. **Mode**
   - Most frequently occurring value in dataset
   - Requires: `from scipy import stats`

## 🔧 Array Operations

### 10. **Axis Understanding**
   - **Row operations**: `axis=1` (horizontal →)
   - **Column operations**: `axis=0` (vertical ↓)

### 11. **Flatten**
   - Convert multi-dimensional array to 1D
   - Useful for overall statistics across entire dataset
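A minimal sketch of flattening, reusing the 3×4 example array that appears later in this guide. It assumes nothing beyond NumPy itself and shows that `flatten()` returns a 1D copy, so statistics on it match the axis-free call on the original array.

```python
import numpy as np

arr = np.array([[2, 5, 1, 7],
                [3, 0, 6, 9],
                [4, 12, 5, 8]])

flat = arr.flatten()              # 1D copy holding all 12 elements
print(flat.shape)                 # (12,)

# A statistic on the flattened copy equals the axis-free call on the 2D array.
print(np.median(flat) == np.median(arr))

# flatten() copies, so editing `flat` leaves `arr` untouched.
flat[0] = 99
print(arr[0, 0])                  # still 2
```

If a copy is not needed, `arr.ravel()` returns a view where possible instead.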

## Tip: Understanding the `axis` Parameter in Max/Min Functions

When using functions like `np.max()` or `np.min()` with the `axis` parameter:

- **No axis**: Operates on the **entire array** (all elements)
- **`axis=0`**: Operates **vertically** (↓ down the columns) - collapses rows
- **`axis=1`**: Operates **horizontally** (→ across the rows) - collapses columns

### Visual Guide:
```
axis=0 (↓)
            [2,  5,  1,  7]
            [3,  0,  6,  9]
            [4, 12,  5,  8]
             ↓   ↓   ↓   ↓
    Result: [4, 12, 6, 9]  (max of each column)

axis=1 (→)
    [2,  5,  1,  7] → Result: 1 (min of row)
    [3,  0,  6,  9] → Result: 0 (min of row)
    [4, 12,  5,  8] → Result: 4 (min of row)
```

**Remember**: The axis you specify is the one that gets **collapsed** or **reduced**.
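One way to see the "collapsed axis" rule is to compare shapes before and after the reduction. The sketch below is illustrative only; the variable names are made up for this example.

```python
import numpy as np

arr = np.array([[2, 5, 1, 7],
                [3, 0, 6, 9],
                [4, 12, 5, 8]])          # shape (3, 4): 3 rows, 4 columns

col_max = np.max(arr, axis=0)            # axis 0 (length 3) is collapsed
row_min = np.min(arr, axis=1)            # axis 1 (length 4) is collapsed
print(arr.shape, col_max.shape, row_min.shape)   # (3, 4) (4,) (3,)

# keepdims=True keeps the collapsed axis with length 1,
# so the result still broadcasts against the original array.
row_min_2d = np.min(arr, axis=1, keepdims=True)
print(row_min_2d.shape)                  # (3, 1)
print(arr - row_min_2d)                  # each row shifted so its minimum is 0
```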

In [30]:
# 1. Max, Min function
import numpy as np

arr = np.array([[2, 5, 1, 7], 
               [3, 0, 6, 9],
               [4, 12, 5, 8]])

print(np.max(arr))               
print(np.max(arr, axis = 0).tolist())
print(np.min(arr, axis = 1).tolist())

12
[4, 12, 6, 9]
[1, 0, 4]


In [None]:
# 2. Maximum and Minimum Function
# Returns new array
import numpy as np 

arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])
arr2 = np.array([5,4,5,6,7])
max_result = np.maximum(arr, arr2)                     # Broadcasting works here
print(max_result)

arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])
arr3 = arr.flatten()[:5]                              # Take first 5: [1,2,3,4,5]
arr2 = np.array([5,4,5,6,7])                          # 5 elements
max_result = np.maximum(arr3, arr2)
print(max_result)    


[[ 5  4  5  6  7]
 [ 6  7  8  9 10]]
[5 4 5 6 7]
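`np.maximum` is easy to confuse with `np.max`. The sketch below contrasts the two: `np.maximum`/`np.minimum` compare two inputs element by element and keep the broadcast shape, while `np.max` reduces a single array. The scalar `4` is an arbitrary value chosen for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10]])       # shape (2, 5)
arr2 = np.array([5, 4, 5, 6, 7])         # shape (5,), broadcast across both rows

pairwise_max = np.maximum(arr, arr2)     # element-wise: result keeps shape (2, 5)
pairwise_min = np.minimum(arr, arr2)
print(pairwise_max.shape)
print(pairwise_min)

print(np.max(arr))                       # reduction: a single value, 10

# A scalar broadcasts too, which gives a simple way to clip values from below.
floored = np.maximum(arr, 4)
print(floored)
```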


In [17]:
# 3. argmax, argmin
import numpy as np 
arr3 = np.random.randint(1,15,9)          # 9 random integers in [1, 15)
comparemax = np.argmax(arr3)              # index of the first occurrence of the maximum
comparemin = np.argmin(arr3)              # index of the first occurrence of the minimum
print(arr3.tolist(), "\nThe max value's index is", f"{comparemax}")
print(arr3.tolist(), "\nThe min value's index is", f"{comparemin}")

[3, 13, 1, 12, 3, 14, 5, 1, 10] 
The max value's index is 5
[3, 13, 1, 12, 3, 14, 5, 1, 10] 
The min value's index is 2
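The cell above uses a 1D array. For a 2D array, a sketch like the following shows how `argmax` without an axis returns an index into the flattened array, and how `np.unravel_index` converts it back to a (row, column) pair. The fixed array is used here so the results are reproducible.

```python
import numpy as np

arr = np.array([[2, 5, 1, 7],
                [3, 0, 6, 9],
                [4, 12, 5, 8]])

flat_idx = int(np.argmax(arr))                   # position in the flattened array
row, col = np.unravel_index(flat_idx, arr.shape) # convert back to 2D coordinates
print(flat_idx, int(row), int(col))              # 9 2 1 (the value 12)

# With an axis, one index is returned per column or per row.
col_argmax = np.argmax(arr, axis=0)              # row index of the max in each column
row_argmin = np.argmin(arr, axis=1)              # column index of the min in each row
print(col_argmax.tolist())                       # [2, 2, 1, 1]
print(row_argmin.tolist())                       # [2, 1, 0]
```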


In [None]:
# 4. ptp (range)
import numpy as np

arr = np.random.choice([10,50,60,70,80,90],2)
print(np.ptp(arr))

10
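Because the cell above draws random values, its output changes on every run. A deterministic sketch with a fixed array makes it easier to confirm that `np.ptp` is simply max minus min, with or without an axis.

```python
import numpy as np

arr = np.array([[2, 5, 1, 7],
                [3, 0, 6, 9],
                [4, 12, 5, 8]])

overall = np.ptp(arr)                    # 12 - 0
per_col = np.ptp(arr, axis=0)            # range of each column
per_row = np.ptp(arr, axis=1)            # range of each row

print(overall)                           # 12
print(per_col.tolist())                  # [2, 12, 5, 2]
print(per_row.tolist())                  # [6, 9, 8]

# ptp is equivalent to max minus min along the same axis.
print(np.array_equal(per_row, arr.max(axis=1) - arr.min(axis=1)))
```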


In [28]:
# 5. Percentile, median, average, var, std
import numpy as np 

arr = np.arange(3,50,5)
arr2 = np.random.randint(70,1000,20)

calMed = np.median(arr)
calPer = np.percentile(arr,70) # range 0 - 100 percentile
calAvg = np.average(arr2)
calVar = np.var(arr)
calStd = np.std(arr2)

print(calMed, calPer,calAvg,calVar, calStd)

25.5 34.5 545.05 206.25 222.71113016641087
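The relationships between these functions can be checked on the same deterministic array `np.arange(3, 50, 5)`. The sketch below is a sanity check rather than a full treatment; the `ddof=1` call and the weights are extra illustrations that do not appear in the cell above.

```python
import numpy as np

arr = np.arange(3, 50, 5)                # [3, 8, 13, ..., 48], 10 values

# The median is the 50th percentile.
print(np.percentile(arr, 50), np.median(arr))    # 25.5 25.5

# Several percentiles can be requested at once.
quartiles = np.percentile(arr, [25, 50, 75])
print(quartiles.tolist())                        # [14.25, 25.5, 36.75]

# Standard deviation is the square root of the variance.
print(np.isclose(np.std(arr), np.sqrt(np.var(arr))))

# NumPy defaults to the population variance (ddof=0);
# ddof=1 gives the sample variance, which divides by n - 1.
pop_var = np.var(arr)
sample_var = np.var(arr, ddof=1)
print(pop_var, sample_var)

# np.average matches np.mean unless weights are supplied.
weighted = np.average([1, 2, 3], weights=[3, 1, 1])
print(weighted)                                  # (3 + 2 + 3) / 5 = 1.6
```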


## Understanding `scipy.stats.mode()` - Finding the Most Frequent Value

**Mode** is the value that appears most frequently in a dataset.

### How SciPy Calculates Mode:

NumPy doesn't have a built-in `mode()` function, so we use **`scipy.stats.mode()`** instead.

The function returns:
1. **`mode`**: The most frequent value(s)
2. **`count`**: How many times that value appears

### Key Points:
- Works with the **`axis`** parameter (just like `max`, `min`, etc.)
- **`axis=None`**: Finds mode across entire flattened array
- **`axis=0`**: Finds mode down each column (vertically ↓)
- **`axis=1`**: Finds mode across each row (horizontally →)
- If multiple values have the same highest frequency, it returns the **smallest** value
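- Returns a `ModeResult` object with `.mode` and `.count` attributes. **SciPy 1.9** added the `keepdims` parameter, and since **SciPy 1.11** the reduced axis is dropped from the result by default (`keepdims=False`)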

In [29]:
from scipy import stats
import numpy as np

# Example 1: Simple array with clear mode
arr1 = np.array([1, 2, 2, 3, 3, 3, 4, 4])
result1 = stats.mode(arr1)
print("Array 1:", arr1)
print("Mode:", result1.mode)
print("Count:", result1.count)
print()

# Example 2: 2D array
arr2 = np.array([[1, 2, 2, 3],
                 [2, 3, 3, 1],
                 [2, 1, 3, 3]])

print("Array 2:")
print(arr2)
print()

# Mode across entire array (axis=None)
result_all = stats.mode(arr2, axis=None)
print("Mode (entire array):", result_all.mode)
print("Count:", result_all.count)
print()

# Mode down each column (axis=0)
result_col = stats.mode(arr2, axis=0)
print("Mode (axis=0, down columns):", result_col.mode)
print("Count:", result_col.count)
print()

# Mode across each row (axis=1)
result_row = stats.mode(arr2, axis=1)
print("Mode (axis=1, across rows):", result_row.mode)
print("Count:", result_row.count)

Array 1: [1 2 2 3 3 3 4 4]
Mode: 3
Count: 3

Array 2:
[[1 2 2 3]
 [2 3 3 1]
 [2 1 3 3]]

Mode (entire array): 3
Count: 5

Mode (axis=0, down columns): [2 1 3 3]
Count: [2 1 2 2]

Mode (axis=1, across rows): [2 3 3]
Count: [2 2 2]


### Visual Breakdown of Mode Calculation:

For the array:
```
        Col 0  Col 1  Col 2  Col 3
Row 0:    1      2      2      3
Row 1:    2      3      3      1
Row 2:    2      1      3      3
```

**axis=0 (↓ down columns)**:
- Column 0: [1, 2, 2] → mode is **2** (appears 2 times)
- Column 1: [2, 3, 1] → mode is **1** (all appear once, returns smallest)
- Column 2: [2, 3, 3] → mode is **3** (appears 2 times)
- Column 3: [3, 1, 3] → mode is **3** (appears 2 times)

**axis=1 (→ across rows)**:
- Row 0: [1, 2, 2, 3] → mode is **2** (appears 2 times)
- Row 1: [2, 3, 3, 1] → mode is **3** (appears 2 times)
- Row 2: [2, 1, 3, 3] → mode is **3** (appears 2 times)

**Entire array (axis=None)**:
- All values: [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
- Mode is **3** (appears 5 times)
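The breakdown above can be reproduced with NumPy alone. The sketch below uses a hypothetical helper, `mode_1d`, built on `np.unique`; it is not how SciPy implements `stats.mode()`, only a way to check the counts by hand. Because `np.unique` returns sorted values and `np.argmax` picks the first maximum, ties resolve to the smallest value.

```python
import numpy as np

arr2 = np.array([[1, 2, 2, 3],
                 [2, 3, 3, 1],
                 [2, 1, 3, 3]])

def mode_1d(values):
    """Hypothetical helper: (mode, count) of a 1D array, smallest value on ties."""
    uniq, counts = np.unique(values, return_counts=True)  # uniq is sorted ascending
    best = np.argmax(counts)                              # first maximum wins
    return int(uniq[best]), int(counts[best])

whole = mode_1d(arr2.ravel())
by_col = [mode_1d(col) for col in arr2.T]   # iterate over columns
by_row = [mode_1d(row) for row in arr2]     # iterate over rows

print(whole)     # (3, 5)
print(by_col)    # [(2, 2), (1, 1), (3, 2), (3, 2)]
print(by_row)    # [(2, 2), (3, 2), (3, 2)]
```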

## Mode - Example Code

In [4]:
import numpy as np
from scipy import stats

myArray = np.array([[10, 20, 20],
                    [50, 60, 70 ],
                    [10, 60, 10]])

print(stats.mode(myArray))

ModeResult(mode=array([10, 60, 10], dtype=int64), count=array([2, 2, 1], dtype=int64))


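## Important: Default Behavior of `stats.mode()`

When **no axis** is specified, `stats.mode()` uses **`axis=0`** (column-wise), as the output above shows. It does **not** flatten the array; pass `axis=None` explicitly to get the mode of the entire array.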
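A quick way to check which axis the default uses is to compare the no-argument call against explicit `axis=0` and `axis=None` calls. The sketch assumes SciPy is installed; `np.ravel` is applied to the results only because the shape of `.mode` and `.count` differs between SciPy versions.

```python
import numpy as np
from scipy import stats

myArray = np.array([[10, 20, 20],
                    [50, 60, 70],
                    [10, 60, 10]])

default_res = stats.mode(myArray)             # no axis given
col_res = stats.mode(myArray, axis=0)         # explicit column-wise mode
flat_res = stats.mode(myArray, axis=None)     # mode of the whole array

# np.ravel hides shape differences between SciPy versions.
default_mode = np.ravel(default_res.mode).tolist()
col_mode = np.ravel(col_res.mode).tolist()
flat_mode = np.ravel(flat_res.mode).tolist()
flat_count = np.ravel(flat_res.count).tolist()

print(default_mode)             # [10, 60, 10]
print(col_mode)                 # [10, 60, 10], same as the default
print(flat_mode, flat_count)    # [10] [3]
```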