## Basic Statistics Formulas

---

**Mean**

The mean (average) is:

$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$

---

**Variance (Population)**

The population variance is the average of the squared deviations from the mean:

$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$

---

**Standard Deviation**

The standard deviation is the square root of the variance:

$\sigma = \sqrt{\sigma^2}$

---

**Where:**

- $n$ is the number of data points  
- $x_i$ is each individual data point  
- $\mu$ is the mean  
- $\sigma^2$ is the variance  
- $\sigma$ is the standard deviation


In [2]:
# Implement the formulas for the mean, population variance, and standard deviation

import numpy as np

# Sample data: e.g., vibration measurements in mm
data = np.array([2.3, 2.5, 2.4, 2.6, 2.7, 2.4, 2.5, 2.6, 2.2, 2.5, 2.2])

# Step 1: Calculate mean
mean = np.mean(data)

# Step 2: Calculate variance (population)
variance = np.sum((data - mean) ** 2) / len(data)

# Step 3: Calculate standard deviation
std_dev = np.sqrt(variance)

print("Mean:", mean)
print("Variance:", variance)
print("Standard Deviation:", std_dev)


Mean: 2.4454545454545453
Variance: 0.02429752066115703
Standard Deviation: 0.15587661999529318
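
As a cross-check on the manual calculation above, NumPy's built-in `np.var` and `np.std` use `ddof=0` by default, which corresponds to the population formulas given earlier. The sketch below reuses the same sample data. The variable names are chosen here for illustration only, and the `ddof=1` line is an added aside showing the sample variance, which divides by $n - 1$ instead of $n$.

```python
import numpy as np

# Same sample data as in the cell above
data = np.array([2.3, 2.5, 2.4, 2.6, 2.7, 2.4, 2.5, 2.6, 2.2, 2.5, 2.2])

# Manual population variance, as in the formula for sigma^2
manual_variance = np.sum((data - np.mean(data)) ** 2) / len(data)

# NumPy built-ins: ddof=0 (the default) divides by n
builtin_variance = np.var(data)
builtin_std = np.std(data)

# Illustrative aside: ddof=1 divides by n - 1 (sample variance)
sample_variance = np.var(data, ddof=1)

print("Manual matches np.var:", np.isclose(manual_variance, builtin_variance))
print("Manual matches np.std:", np.isclose(np.sqrt(manual_variance), builtin_std))
print("Sample variance (ddof=1):", sample_variance)
```

For a data set this small, the choice between dividing by $n$ and $n - 1$ changes the result noticeably, so it helps to state which one is being reported.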


## Least Squares Linear Regression Equations

The goal is to fit a line of the form:

$y = mx + c$

where:  
- $m$ is the slope of the line.  
- $c$ is the intercept.

---

### **Formulas**

**Slope ($m$)**

$m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$

---

**Intercept ($c$)**

$c = \frac{\sum y - m \sum x}{n}$

---

Where:  
- $n$ = number of data points  
- $x$ = independent variable (e.g., strain)  
- $y$ = dependent variable (e.g., stress)
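
These formulas follow from minimizing the sum of squared residuals. A brief sketch of the derivation, where $S$ is a symbol introduced here for illustration and every sum runs over the $n$ data points:

$S(m, c) = \sum (y - mx - c)^2$

Setting both partial derivatives to zero:

$\frac{\partial S}{\partial m} = -2 \sum x (y - mx - c) = 0$

$\frac{\partial S}{\partial c} = -2 \sum (y - mx - c) = 0$

Rearranging gives the two normal equations:

$m \sum x^2 + c \sum x = \sum xy$

$m \sum x + n c = \sum y$

The second equation yields $c = \frac{\sum y - m \sum x}{n}$. Substituting this into the first and multiplying through by $n$ gives $m \left( n \sum x^2 - (\sum x)^2 \right) = n \sum xy - \sum x \sum y$, which is the slope formula above.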


In [4]:
# Suppose you measure the stress (force per unit area) and the corresponding strain (deformation) in a material test.
# The relationship is approximately linear in the elastic region:

# Stress = E * Strain
# where E = Young’s Modulus (slope of the line)

# Goal: Given a set of (strain, stress) pairs, fit a straight line
# y = mx + c using the least squares method, so that the slope m estimates E.

import numpy as np

def linear_regression(x_values, y_values):
    """
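    Perform simple linear regression (least squares) on paired data.

    Args:
        x_values (array-like of float): Independent variable (e.g., strain)
        y_values (array-like of float): Dependent variable (e.g., stress)

    Returns:
        slope (float): Slope of the best fit line
        intercept (float): Intercept of the best fit line
    """
    # Convert to NumPy arrays so that plain Python lists also work with * and **
    x_values = np.asarray(x_values, dtype=float)
    y_values = np.asarray(y_values, dtype=float)

    # Calculate sums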
    n = len(x_values)
    sum_x = np.sum(x_values)
    sum_y = np.sum(y_values)
    sum_xy = np.sum(x_values * y_values)
    sum_x2 = np.sum(x_values**2)
    
    # Least squares formulas
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    c = (sum_y - m * sum_x) / n

    return m, c



# Example data
x = np.array([0.001, 0.002, 0.003, 0.004, 0.005])  # strain (dimensionless)
y = np.array([200, 400, 610, 800, 1000])  # stress (MPa)


m, c = linear_regression(x, y)

print(f"Slope (Young's Modulus) = {m:.2e} MPa")
print(f"Intercept = {c:.2f} MPa")

# Predict
new_strain = 0.0035
predicted_stress = m * new_strain + c
print(f"Predicted Stress at Strain {new_strain} = {predicted_stress:.2f} MPa")


Slope (Young's Modulus) = 2.00e+05 MPa
Intercept = 2.00 MPa
Predicted Stress at Strain 0.0035 = 702.00 MPa
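
One way to sanity-check the hand-written formulas is to compare them with `np.polyfit`, which fits a polynomial by least squares and, for degree 1, returns the slope followed by the intercept. The sketch below reuses the example data. The names `slope`, `intercept`, and `residuals` are chosen here for illustration and are not part of the cell above.

```python
import numpy as np

# Same example data as above
x = np.array([0.001, 0.002, 0.003, 0.004, 0.005])  # strain (dimensionless)
y = np.array([200, 400, 610, 800, 1000])           # stress (MPa)

# Degree-1 fit: coefficients come back highest power first, i.e. [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

print(f"Slope (np.polyfit) = {slope:.2e} MPa")
print(f"Intercept (np.polyfit) = {intercept:.2f} MPa")

# Residuals: measured stress minus fitted stress at each strain value
residuals = y - (slope * x + intercept)
print("Residuals (MPa):", np.round(residuals, 2))
```

For a least squares line with an intercept term, the residuals sum to approximately zero. Checking this is a quick way to catch a mistake in the slope or intercept.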
