# Mathematical & Statistical Operations in NumPy

## Why These Operations Matter

In real-world data science and AI tasks, we often need to calculate **summaries or transformations** of large datasets — like totals, averages, deviations, or combined values. NumPy provides **efficient, vectorized operations** for this purpose that are not only faster but also cleaner than using loops.

These operations fall into two categories:

1. **Mathematical Operations** — performing math on arrays.
2. **Statistical & Aggregation Functions** — summarizing data.

These operations underpin many ML computations, such as gradient descent updates, mean squared error, and standardization, so understanding them well is essential for building reliable models.
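
To make the loop-versus-vectorized contrast concrete, here is a minimal sketch. The array `values` and the variable names are illustrative assumptions, not part of the original notebook. Both approaches produce the same numbers, but the vectorized form states the computation in a single expression.

```python
import numpy as np

values = np.arange(1, 6)                 # array with the integers 1..5

# Explicit Python loop: square each element one at a time
squares_loop = []
for v in values:
    squares_loop.append(int(v) ** 2)

# Vectorized form: one expression applied to the whole array
squares_vec = values ** 2

print(squares_loop)                      # [1, 4, 9, 16, 25]
print(squares_vec.tolist())              # [1, 4, 9, 16, 25]
```

On large arrays the vectorized version is typically much faster, because the element-wise work runs in compiled code rather than in the Python interpreter.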

### Mathematical Operations

Mathematical operations allow us to perform calculations on arrays element-wise, meaning the operation is applied to each element individually without needing explicit loops.

Some of the common mathematical operations include:

- **Addition, subtraction, multiplication, division**
    
    We can do simple arithmetic with arrays, just like with numbers.

In [1]:
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Addition
print(arr1 + arr2)

# Subtraction
print(arr2 - arr1)

# Multiplication
print(arr1 * arr2)

# Division
print(arr2 / arr1)

[5 7 9]
[3 3 3]
[ 4 10 18]
[4.  2.5 2. ]


- **Exponential and logarithmic functions**
    
    We can calculate square roots (`np.sqrt()`), powers, exponentials (`np.exp()`), and natural logarithms (`np.log()`).

In [2]:
arr = np.array([1, 4, 9, 16])

sqrt_arr = np.sqrt(arr)          # Square root of each element
exp_arr = np.exp(arr)            # Exponential (e^x) of each element
log_arr = np.log(arr)            # Natural logarithm of each element

print("Square roots:", sqrt_arr)
print("Exponentials:", exp_arr)
print("Logarithms:", log_arr)

Square roots: [1. 2. 3. 4.]
Exponentials: [2.71828183e+00 5.45981500e+01 8.10308393e+03 8.88611052e+06]
Logarithms: [0.         1.38629436 2.19722458 2.77258872]
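
The cell above covers square roots, exponentials, and natural logarithms. As a small complementary sketch, powers and logarithms in other bases can be computed the same way; the input arrays here are made up for illustration.

```python
import numpy as np

base = np.array([1, 2, 3])

powers = np.power(base, 2)                      # same result as base ** 2
log10_vals = np.log10(np.array([1, 10, 100]))   # base-10 logarithm
log2_vals = np.log2(np.array([1, 2, 8]))        # base-2 logarithm

print("Squares:", powers)
print("log10:", log10_vals)
print("log2:", log2_vals)
```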


- **Trigonometric functions**
    
    NumPy provides sine (`np.sin()`), cosine (`np.cos()`), tangent (`np.tan()`), and their inverses.

In [3]:
sin_arr = np.sin(np.pi * arr)    # Sine of pi times each element of arr from In [2] (radians)
print("Sine values:", sin_arr)

Sine values: [ 1.22464680e-16 -4.89858720e-16  1.10218212e-15 -1.95943488e-15]
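
The values printed above are tiny but not exactly zero, even though the sine of any integer multiple of pi is mathematically zero. This is ordinary floating-point rounding. A hedged sketch of how one might check such results with a tolerance instead of exact equality:

```python
import numpy as np

arr = np.array([1, 4, 9, 16])
sin_arr = np.sin(np.pi * arr)

# Compare against zero with a tolerance rather than with ==
all_near_zero = np.allclose(sin_arr, 0.0)
print("All close to zero:", all_near_zero)      # True

# Trigonometric functions expect radians; convert degrees first
sin_90_degrees = np.sin(np.deg2rad(90))
print("sin(90 degrees):", sin_90_degrees)
```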


- **Rounding functions**
    
    Such as `np.round()`, `np.floor()`, and `np.ceil()` to control decimal values.

In [4]:
arr = np.array([1.2, 2.5, 3.7, 4.0, 5.9])

rounded = np.round(arr)   # Round to nearest integer (ties go to the even value, so 2.5 -> 2.)
floored = np.floor(arr)   # Round down (floor)
ceiled = np.ceil(arr)     # Round up (ceil)

print("Original array:", arr)
print("Rounded:", rounded)
print("Floored:", floored)
print("Ceiled:", ceiled)

Original array: [1.2 2.5 3.7 4.  5.9]
Rounded: [1. 2. 4. 4. 6.]
Floored: [1. 2. 3. 4. 5.]
Ceiled: [2. 3. 4. 4. 6.]
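
Note that `2.5` was rounded to `2.` above: `np.round()` sends values exactly halfway between two integers to the nearest even one. The sketch below illustrates this, along with the optional `decimals` argument; the sample arrays are assumptions chosen for the demonstration.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])
rounded_halves = np.round(halves)          # ties go to the nearest even integer
print(rounded_halves)                      # [0. 2. 2. 4.]

constants = np.array([3.14159, 2.71828])
two_decimals = np.round(constants, 2)      # keep two decimal places
print(two_decimals)                        # [3.14 2.72]
```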


### Statistical Operations in NumPy

Statistical operations help us summarize and describe data. These are essential when we want to understand data distribution, central tendency, and variability.

Common statistical operations we use are:

- **Mean (`np.mean()`)** — The Average Value
    
    The **mean** is what we usually call the average. It tells us the central value of a dataset. We calculate the mean by adding all the data points together and then dividing by the total number of points.
    
    $$
    \text{mean} = \frac{1}{n} \sum_{i=1}^{n} x_i
    $$

In [5]:
data = np.array([10, 20, 30, 40, 50])

mean_val = np.mean(data)
print("Mean:", mean_val)

Mean: 30.0


- **Median (`np.median()`)** — The Middle Value
    
    The **median** is the middle value in our dataset when we arrange all values in order from smallest to largest. It splits our data into two equal halves.
    
    - If we have an **odd number** of data points, the median is simply the middle one.
    - If we have an **even number**, the median is the average of the two middle values.
    
    The median is especially useful because it is **far less sensitive to extreme values or outliers** than the mean: changing the largest value in a dataset shifts the mean but usually leaves the median unchanged.

In [6]:
median_val = np.median(data)
print("Median:", median_val)

Median: 30.0
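
To see the difference in outlier sensitivity, the following sketch replaces the largest value of `data` with a made-up extreme value (`1000` is an assumption for illustration) and compares the two statistics.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
with_outlier = np.array([10, 20, 30, 40, 1000])   # last value replaced by an outlier

mean_clean = np.mean(data)
mean_outlier = np.mean(with_outlier)
median_clean = np.median(data)
median_outlier = np.median(with_outlier)

print("Mean:  ", mean_clean, "->", mean_outlier)      # 30.0 -> 220.0
print("Median:", median_clean, "->", median_outlier)  # 30.0 -> 30.0
```

The mean jumps from 30 to 220, while the median stays at 30.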


- **Standard Deviation (`np.std()`)** — How Spread Out Our Data Is
    
    The **standard deviation** measures how spread out or dispersed our data points are around the mean. A small standard deviation means data points are close to the mean, while a large one means they are spread out.
    
    $$
    \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}
    $$

In [7]:
std_dev = np.std(data)
print("Standard Deviation:", std_dev)

Standard Deviation: 14.142135623730951
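
As a check on the formula, the standard deviation can be computed step by step and compared with `np.std()`. This is a sketch for understanding, not a replacement for the built-in function; the intermediate variable names are illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

mu = np.mean(data)                        # 30.0
squared_deviations = (data - mu) ** 2     # [400. 100. 0. 100. 400.]
manual_std = np.sqrt(np.mean(squared_deviations))

print("Manual:", manual_std)
print("np.std:", np.std(data))
```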


- **Variance (`np.var()`)** — The Average of Squared Deviations
    
    **Variance** tells us how much the data varies, but unlike standard deviation, it’s in squared units. It’s the average of the squared differences from the mean:
    
    $$
    \text{variance} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2
    $$
    
    - Variance is simply the square of the standard deviation σ.
    - It’s useful in statistics but less interpretable directly because it’s in squared units (like squared meters instead of meters).

In [8]:
variance = np.var(data)
print("Variance:", variance)

Variance: 200.0
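
The sketch below confirms that the variance equals the squared standard deviation. It also shows the `ddof` parameter: by default `np.var()` and `np.std()` divide by `n` (the formula above), while `ddof=1` divides by `n - 1`, which is the usual sample estimate.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

population_var = np.var(data)             # divides by n
sample_var = np.var(data, ddof=1)         # divides by n - 1
std_squared = np.std(data) ** 2

print("Population variance:", population_var)   # 200.0
print("Sample variance:", sample_var)            # 250.0
print("Matches std squared:", np.isclose(population_var, std_squared))
```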


- **Minimum and Maximum (`np.min()`, `np.max()`)** — Extremes of Our Data
    
    - **Minimum** is the smallest value in our dataset.
    - **Maximum** is the largest value.
    
    These values give us an idea of the range of our data and help detect outliers or anomalies.

In [9]:
min_val = np.min(data)
max_val = np.max(data)

print("Min:", min_val)
print("Max:", max_val)

Min: 10
Max: 50
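
Beyond the extreme values themselves, it is often useful to know where they occur and how far apart they are. A short sketch using `np.argmin()` and `np.argmax()`, which return positions rather than values:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

min_index = np.argmin(data)                  # position of the smallest value
max_index = np.argmax(data)                  # position of the largest value
value_range = np.max(data) - np.min(data)    # spread between the extremes

print("Index of min:", min_index)            # 0
print("Index of max:", max_index)            # 4
print("Range:", value_range)                 # 40
```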


- **Percentiles (`np.percentile()`)** — Position-Based Values in Our Data
    
    A **percentile** tells us the value below which a certain percentage of our data falls. For example, the 25th percentile (also called the first quartile) is the value below which 25% of data lies.
    
    This is helpful in understanding the distribution and spread of data without assuming a normal distribution.

In [10]:
percentile_25 = np.percentile(data, 25)
print("25th Percentile:", percentile_25)

25th Percentile: 20.0
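
`np.percentile()` also accepts a list of percentiles, which makes it easy to get all three quartiles at once. The sketch below additionally derives the interquartile range (IQR), a common spread measure; the variable names are illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                                # spread of the middle 50% of the data

print("Quartiles:", q1, q2, q3)              # 20.0 30.0 40.0
print("IQR:", iqr)                           # 20.0
```

The 50th percentile is the same value as the median computed earlier.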


### Key takeaways
    
  When we use NumPy’s statistical functions, we get quick and efficient ways to describe and understand our data:
    
  - **Mean** shows us the average value — a measure of central tendency.
  - **Median** gives the middle value, which is resistant to outliers.
  - **Standard deviation** tells us how spread out our data points are.
  - **Variance** quantifies that spread in squared units.
  - **Minimum and Maximum** tell us the boundaries of our data.
  - **Percentiles** help us understand data distribution by position.
    
  Together, these help us build a strong foundation to analyze datasets before moving on to more complex AI and ML tasks.

### Axis-Based Operations

NumPy lets us compute along rows or columns using the `axis` parameter.

In [11]:
mat = np.array([[1, 2, 3],
                [4, 5, 6]])

print(np.sum(mat, axis=0))  # Column-wise sum
print(np.sum(mat, axis=1))  # Row-wise sum

[5 7 9]
[ 6 15]


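- `axis=0`: operates down the rows, producing one result per column (column-wise, vertical)
- `axis=1`: operates across the columns, producing one result per row (row-wise, horizontal)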
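
One way to keep the two axes straight is to look at the shape of the result: the axis we reduce over disappears. The sketch below also uses `keepdims=True`, which keeps the reduced axis with length 1 so the result can be subtracted from the original matrix directly.

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6]])                  # shape (2, 3)

col_means = np.mean(mat, axis=0)             # shape (3,): one mean per column
row_means = np.mean(mat, axis=1)             # shape (2,): one mean per row

# keepdims=True keeps shape (2, 1), so each row's mean lines up with that row
row_means_2d = np.mean(mat, axis=1, keepdims=True)
centered_rows = mat - row_means_2d

print(col_means)
print(row_means)
print(centered_rows)
```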

### Cumulative Operations

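Cumulative functions return an array of the same length in which each element is the running total (`np.cumsum()`) or running product (`np.cumprod()`) of all elements up to that position.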

In [12]:
x = np.array([1, 2, 3, 4])
print(np.cumsum(x))      
print(np.cumprod(x))     

[ 1  3  6 10]
[ 1  2  6 24]
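
Cumulative functions accept the `axis` parameter as well. The following sketch applies `np.cumsum()` to a small 2D array (chosen here for illustration) to show the difference between accumulating along rows, along columns, and over the flattened array.

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6]])

along_rows = np.cumsum(mat, axis=1)     # running total within each row
down_columns = np.cumsum(mat, axis=0)   # running total within each column
flattened = np.cumsum(mat)              # no axis: the array is flattened first

print(along_rows)
print(down_columns)
print(flattened)
```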


### Handling Missing Data (NaN) in NumPy

In real-world datasets, missing or undefined values are often represented as `NaN` (Not a Number). Standard NumPy functions like `np.mean()` or `np.sum()` will return `NaN` if any element in the array is `NaN`, which can lead to incorrect or unusable results.

To handle this, NumPy provides **NaN-aware functions** that ignore these missing values while performing calculations. These include:

| Function | Description |
| --- | --- |
| `np.nanmean()` | Computes mean ignoring `NaN` values |
| `np.nansum()` | Computes sum ignoring `NaN` values |
| `np.nanstd()` | Computes standard deviation ignoring `NaN` |
| `np.nanvar()` | Computes variance ignoring `NaN` values |
| `np.nanmin()` | Computes minimum ignoring `NaN` values |
| `np.nanmax()` | Computes maximum ignoring `NaN` values |

These functions are essential for robust statistical analysis on datasets with missing entries.

### Example

In [13]:
data = np.array([10, 20, np.nan, 40, 50])

print("Mean with np.mean():", np.mean(data))
print("Mean with np.nanmean():", np.nanmean(data))

print("Sum with np.sum():", np.sum(data))
print("Sum with np.nansum():", np.nansum(data))

Mean with np.mean(): nan
Mean with np.nanmean(): 30.0
Sum with np.sum(): nan
Sum with np.nansum(): 120.0


In this example, `np.mean()` and `np.sum()` return `NaN` because the input contains a `NaN` value. But the NaN-aware functions ignore the missing value and compute the mean and sum correctly.
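
Before choosing a NaN-aware function, it can help to check how many values are missing. A minimal sketch using `np.isnan()` to count and filter missing entries, with `np.nanmedian()` shown as one more member of the same family:

```python
import numpy as np

data = np.array([10, 20, np.nan, 40, 50])

missing_mask = np.isnan(data)                # True where the value is NaN
n_missing = int(missing_mask.sum())          # number of missing entries
clean = data[~missing_mask]                  # keep only the valid values

print("Missing values:", n_missing)          # 1
print("Valid values:", clean)
print("Median of valid values:", np.median(clean))
print("np.nanmedian():", np.nanmedian(data))
```

Filtering with a mask and calling the NaN-aware function give the same result here; the NaN-aware form is shorter when the cleaned array is not needed elsewhere.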

## NumPy Mathematical, Statistical, Axis-Based & Cumulative Operations

These are the most commonly used and essential operations in NumPy for AI/ML tasks — helping us analyze, preprocess, and manipulate numerical data efficiently and cleanly.

| Category | Operation | NumPy Function | Plain-Text Formula / Explanation |
| --- | --- | --- | --- |
| **Mathematical** | Addition | `np.add()` or `+` | Element-wise addition: result[i] = a[i] + b[i] |
|  | Subtraction | `np.subtract()` or `-` | result[i] = a[i] - b[i] |
|  | Multiplication | `np.multiply()` or `*` | result[i] = a[i] * b[i] |
|  | Division | `np.divide()` or `/` | result[i] = a[i] / b[i] |
|  | Exponentiation | `np.power()` or `**` | result[i] = a[i] ** b[i] (e.g., a ** 2 squares each element) |
|  | Square Root | `np.sqrt()` | result[i] = √a[i] |
|  | Logarithm (natural) | `np.log()` | result[i] = ln(a[i]) |
|  | Exponential | `np.exp()` | result[i] = e ** a[i] |
|  | Trigonometric | `np.sin()`, `np.cos()` | Applies trig function element-wise (in radians) |
|  | Rounding | `np.round()`, `np.floor()`, `np.ceil()` | Rounds elements to nearest, down, or up |
|  | Clipping | `np.clip()` | Clips values between min and max: x = min if x < min, x = max if x > max |
| **Statistical** | Mean | `np.mean()` | mean = (1 / n) * sum(x) |
|  | Median | `np.median()` | Middle value in sorted data |
|  | Standard Deviation | `np.std()` | std = sqrt((1 / n) * sum((x - mean)^2)) |
|  | Variance | `np.var()` | var = (1 / n) * sum((x - mean)^2) |
|  | Minimum & Maximum | `np.min()`, `np.max()` | Smallest or largest value in the array |
|  | Percentile | `np.percentile()` | Value below which a certain percentage of data falls (e.g., 25th percentile) |
| **Axis-Based** | Sum across axis | `np.sum(axis=...)` | axis=0 → column-wise sum, axis=1 → row-wise sum |
|  | Mean across axis | `np.mean(axis=...)` | Same as above, but for average |
|  | Std/Var across axis | `np.std(axis=...)`, `np.var(axis=...)` | Standard deviation / variance row-wise or column-wise |
| **Cumulative** | Cumulative Sum | `np.cumsum()` | Running total: [1, 2, 3] → [1, 3, 6] |
|  | Cumulative Product | `np.cumprod()` | Running product: [1, 2, 3] → [1, 2, 6] |

### Exercises

Q1. Create an array of numbers from 1 to 10. Compute their square root and exponential values.

In [14]:
arr = np.arange(1,11)
sqr_root = np.sqrt(arr)
exp_vals = np.exp(arr)

print(sqr_root)

[1.         1.41421356 1.73205081 2.         2.23606798 2.44948974
 2.64575131 2.82842712 3.         3.16227766]


Q2. Create a 3×3 matrix with random numbers between 0 and 1. Compute row-wise and column-wise means.

In [15]:
matrix = np.random.rand(3, 3)

row_means = np.mean(matrix, axis=1)
col_means = np.mean(matrix, axis=0)

print("Matrix:\n", matrix)
print("Row-wise means:", row_means)
print("Column-wise means:", col_means)

Matrix:
 [[0.45080786 0.50651084 0.42056793]
 [0.39880731 0.52739195 0.60951747]
 [0.82984523 0.13110932 0.61413349]]
Row-wise means: [0.45929554 0.51190558 0.52502935]
Column-wise means: [0.55982013 0.38833737 0.54807296]


Q3. Given an array `[5, 10, 15, 20, 25]`, calculate the mean, median, variance, and standard deviation.

In [16]:
array = np.array([5, 10, 15, 20, 25])

mean = np.mean(array)
median = np.median(array)
variance = np.var(array)
standard_deviation = np.std(array)

print(f'Mean: {mean}\n')
print(f'Median: {median}\n')
print(f'Variance: {variance}\n')
print(f'Standard Deviation: {standard_deviation}')

Mean: 15.0

Median: 15.0

Variance: 50.0

Standard Deviation: 7.0710678118654755


Q4. Generate a 1D array of 50 elements using `np.random.randn()`. Compute its cumulative sum and standard deviation.

In [17]:
array = np.random.randn(50)

cumulative_sum = np.cumsum(array)
standard_deviation = np.std(array)

print("Array:\n", array)
print("\nCumulative sum:\n", cumulative_sum)
print("\nStandard deviation:", standard_deviation)

Array:
 [ 0.69501592 -0.34571229 -0.48841055 -2.89742547 -0.49443368  0.19206757
 -0.01603356  0.69031621 -0.04582029 -1.38123119 -0.1068228  -0.8031473
  0.95542854 -1.28445272  0.98285541 -0.20374313 -0.6633023  -0.90100581
  0.64068999  1.31908736  1.05064224  2.12329012 -0.83638833  1.86960797
 -1.44619088 -1.00334417  0.48332542  0.41467908 -0.45284347  1.51702818
 -1.34251157  0.34248723  1.00767271 -0.20925123 -1.36627708 -0.23889153
 -2.06305363 -0.10284394  1.49743159 -0.0203543  -0.76366849  1.11422219
  0.26417732  0.89948184  0.87501179  1.73464941 -0.3404871  -0.14574515
  0.97403136  0.86345383]

Cumulative sum:
 [ 0.69501592  0.34930363 -0.13910692 -3.03653239 -3.53096607 -3.3388985
 -3.35493205 -2.66461584 -2.71043613 -4.09166733 -4.19849012 -5.00163742
 -4.04620888 -5.3306616  -4.34780619 -4.55154931 -5.21485161 -6.11585742
 -5.47516743 -4.15608007 -3.10543783 -0.98214771 -1.81853603  0.05107194
 -1.39511894 -2.39846311 -1.9151377  -1.50045861 -1.95330209 -0.43627391
 

Q5. Create a matrix and clip all values below 2 and above 8 using `np.clip()`.

In [18]:
matrix = np.random.randint(0, 11, size=(4, 4))
print("Original matrix:\n", matrix)

clipped_matrix = np.clip(matrix, 2, 8)
print("\nClipped matrix:\n", clipped_matrix)

Original matrix:
 [[ 9  5  8  2]
 [ 0  3  7 10]
 [ 4  8  1  1]
 [ 9 10  6  1]]

Clipped matrix:
 [[8 5 8 2]
 [2 3 7 8]
 [4 8 2 2]
 [8 8 6 2]]


### Summary

Mathematical and statistical operations in NumPy are essential tools that give us deep insight into our data. These include **element-wise math**, **summarization stats**, **axis-based calculations**, and **cumulative functions** — all of which are fast, memory-efficient, and crucial for AI workflows.

Whether we’re normalizing features using the formula:

$$
z = \frac{x - \mu}{\sigma}
$$

or calculating loss over thousands of predictions, NumPy’s functions make these tasks simple and scalable. These operations aren’t just academic — they’re used **daily** in ML pipelines, deep learning training loops, and exploratory data analysis (EDA).
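
Both tasks mentioned above reduce to a few vectorized lines. The sketch below standardizes a feature with the z-score formula and computes a mean squared error; the arrays `features`, `y_true`, and `y_pred` are hypothetical values invented for illustration.

```python
import numpy as np

# Standardization: z = (x - mu) / sigma
features = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
z_scores = (features - np.mean(features)) / np.std(features)

print("z-scores:", z_scores)
print("Mean of z:", np.mean(z_scores))       # approximately 0
print("Std of z:", np.std(z_scores))         # approximately 1

# Mean squared error between hypothetical targets and predictions
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 3.0])
mse = np.mean((y_true - y_pred) ** 2)

print("MSE:", mse)
```

After standardization the feature has mean 0 and standard deviation 1 (up to floating-point rounding), which is the property the formula is designed to produce.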

NumPy’s operations allow loop-free, fast, vectorized computations, which are critical for scaling up to big data and neural networks. By mastering them, we can apply the full power of numerical computing in Python to machine learning, deep learning, and beyond.