# Weighted Least Squares

## What is Weighted Least Squares (WLS)?

**Weighted Least Squares (WLS)** is a variation of the Ordinary Least Squares (OLS) method. In OLS, it's assumed that the variance of the errors (residuals) is constant across all observations. This assumption is called **homoscedasticity**. However, in many real-world scenarios, this assumption doesn't hold; the errors might be larger for some observations than for others. This situation is called **heteroscedasticity**.

When heteroscedasticity is present, OLS gives equal weight to all data points. This can lead to inefficient parameter estimates (meaning the estimates are not the most precise possible) and incorrect standard errors, which in turn affect the reliability of confidence intervals and hypothesis tests.

WLS addresses this by assigning different **weights** to each data point in the regression. The goal of WLS is to minimize the sum of the *weighted* squared residuals:

$$\sum_{i=1}^{n} w_i (y_i - f(x_i, \beta))^2$$

where:
* $w_i$ is the weight for the $i$-th data point.
* $y_i$ is the observed dependent variable for the $i$-th point.
* $f(x_i, \beta)$ is the predicted value from the model for the $i$-th point, with parameters $\beta$.
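As a quick numerical check of the objective above, here is a minimal Python/NumPy sketch that evaluates $\sum w_i (y_i - f(x_i, \beta))^2$ for given predictions. The straight-line model, the data, and the weights are hypothetical and chosen only for illustration. Setting all $w_i = 1$ recovers the ordinary (OLS) sum of squared residuals.

```python
import numpy as np

def weighted_ssr(y, y_pred, w):
    """Sum of weighted squared residuals: sum_i w_i * (y_i - f(x_i, beta))**2."""
    y, y_pred, w = (np.asarray(a, dtype=float) for a in (y, y_pred, w))
    residuals = y - y_pred
    return float(np.sum(w * residuals**2))

# Hypothetical straight-line model f(x, beta) = beta[0] + beta[1] * x
def f(x, beta):
    return beta[0] + beta[1] * np.asarray(x, dtype=float)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.5, 3.5, 4.5, 8.0])
w = np.array([4.0, 1.0, 1.0, 0.5])   # illustrative weights, not derived from data
beta = (1.0, 2.0)

print("weighted:  ", weighted_ssr(y, f(x, beta), w))
print("unweighted:", weighted_ssr(y, f(x, beta), np.ones_like(y)))
```

With these numbers the weighted sum is 2.0 and the unweighted sum is 1.75: the first point's residual counts four times as much as in OLS, and the last point's only half as much.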

**How are weights determined?**
The weights are typically inversely proportional to the variance of the errors for each observation. If $\sigma_i^2$ is the variance of the error for the $i$-th observation, then the weight $w_i$ is usually $1/\sigma_i^2$. This means:
* Observations with smaller errors (lower variance) get larger weights, influencing the fit more.
* Observations with larger errors (higher variance) get smaller weights, influencing the fit less.
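For a model that is linear in its parameters, $f(x_i, \beta) = X_i \beta$, minimizing the weighted objective with $w_i = 1/\sigma_i^2$ has the closed-form solution $\hat{\beta} = (X^\top W X)^{-1} X^\top W y$, where $W = \operatorname{diag}(w_1, \dots, w_n)$. The sketch below (NumPy, with a hypothetical straight-line model and a hypothetical noise level that grows with $x$) computes it two ways: through these normal equations, and by dividing each row of $X$ and $y$ by $\sigma_i$ and then solving an ordinary least-squares problem. The helper names `wls_fit` and `wls_fit_rescaled` are illustrative, not from any library.

```python
import numpy as np

def wls_fit(X, y, sigma):
    """Linear WLS with weights w_i = 1 / sigma_i**2 via the normal equations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    XtW = X.T * w                        # same as X.T @ np.diag(w), without building W
    return np.linalg.solve(XtW @ X, XtW @ y)

def wls_fit_rescaled(X, y, sigma):
    """Same estimate: divide each row by sigma_i, then solve ordinary least squares."""
    sigma = np.asarray(sigma, dtype=float)
    Xs = np.asarray(X, dtype=float) / sigma[:, None]
    ys = np.asarray(y, dtype=float) / sigma
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
sigma = 0.2 + 0.3 * x                    # hypothetical: noise grows with x
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)
X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept + slope

print(wls_fit(X, y, sigma))
print(wls_fit_rescaled(X, y, sigma))
```

Both calls should print the same estimates. The rescaled form is usually the numerically safer choice, since it avoids forming $X^\top W X$ explicitly.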

## Standard deviation



### Formal Notes on Measurement Uncertainty, Standard Deviation, and Weighted Least Squares

In experimental sciences and data analysis, observations are inherently subject to **measurement uncertainty** (often colloquially referred to as "error"). This uncertainty reflects the lack of perfect knowledge about the true value of a quantity due to limitations of instruments, environmental variations, or inherent stochastic processes.

#### 1. Measurement Uncertainty and Standard Deviation

The **standard deviation ($\sigma$)** is the most common statistical measure used to quantify the spread or dispersion of a set of data points around their mean. In the context of individual experimental measurements, the standard deviation of a measurement (or its uncertainty) refers to the expected variability if that measurement were to be repeated multiple times under identical conditions.

* If a measurement $Y$ is reported as $Y \pm \delta Y$, where $\delta Y$ represents the uncertainty, this $\delta Y$ is frequently taken to be the **standard deviation** of that measurement. Assuming normally distributed errors, approximately 68.3% of repeated measurements would fall within $\pm\delta Y$ of the true value; equivalently, an interval $[Y - \delta Y, Y + \delta Y]$ built around a single measurement contains the true value about 68.3% of the time.
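A short Monte Carlo sketch (NumPy and SciPy; the true value, $\delta Y = 0.5$, the seed, and the sample size are arbitrary choices) illustrates both readings of the 68.3% figure: the fraction of simulated measurements within $\pm\delta Y$ of the true value, and the fraction of intervals $[Y - \delta Y, Y + \delta Y]$ that contain the true value. Both should be close to the exact normal probability $P(|Z| \le 1) \approx 0.6827$.

```python
import numpy as np
from scipy.stats import norm

true_value = 10.0
delta_y = 0.5                        # standard deviation of a single measurement
rng = np.random.default_rng(42)
measurements = rng.normal(true_value, delta_y, size=200_000)

# Fraction of repeated measurements within one standard deviation of the true value
within_one_sigma = np.mean(np.abs(measurements - true_value) <= delta_y)

# Fraction of intervals [Y - dY, Y + dY] that contain the true value
covers_truth = np.mean((measurements - delta_y <= true_value)
                       & (true_value <= measurements + delta_y))

# Exact value for a normal distribution: P(|Z| <= 1)
expected = norm.cdf(1.0) - norm.cdf(-1.0)

print(f"within 1 sigma: {within_one_sigma:.4f}")
print(f"interval coverage: {covers_truth:.4f}")
print(f"exact: {expected:.4f}")
```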

#### 2. Interpretation of Individual Data Points in Weighted Least Squares

When performing regression analysis, especially Weighted Least Squares (WLS), each data point $(x_i, y_i)$ is treated as follows:

* **$x_i$ (Independent Variable):** Typically assumed to be known precisely, or to have negligible uncertainty compared to $y_i$.
* **$y_i$ (Dependent Variable):** This value is considered the **best estimate** (for example, the sample mean) of the true underlying value of the dependent variable at $x_i$. If multiple independent measurements of $y$ were taken at $x_i$, $y_i$ would be their average, which reduces the influence of random errors.
* **$\sigma_i$ (Uncertainty/Standard Deviation of $y_i$):** This parameter, provided to the fitting algorithm (e.g., via the `sigma` argument in `scipy.optimize.curve_fit`), quantifies the **standard deviation of the measurement $y_i$**. It reflects the precision with which $y_i$ was determined: a smaller $\sigma_i$ indicates a more precise (less uncertain) measurement, and vice versa.
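When $y_i$ is itself the average of $n$ repeated readings, the $\sigma_i$ that describes the precision of that average is the standard deviation of the mean, $s/\sqrt{n}$, where $s$ is the sample standard deviation of the individual readings. A minimal NumPy sketch, assuming independent repeats with a common spread at each $x_i$; the readings are made up for illustration:

```python
import numpy as np

# Hypothetical repeated readings of y at three x positions (rows = x_i, columns = repeats)
x = np.array([0.0, 1.0, 2.0])
readings = np.array([
    [2.1, 1.9, 2.0, 2.2, 1.8],
    [4.2, 3.8, 4.1, 3.9, 4.0],
    [6.5, 5.6, 6.1, 5.9, 5.9],
])

n_repeats = readings.shape[1]
y = readings.mean(axis=1)                  # best estimate y_i at each x_i
s = readings.std(axis=1, ddof=1)           # sample std of a single reading
sigma = s / np.sqrt(n_repeats)             # std of the mean y_i (standard error)

for xi, yi, si in zip(x, y, sigma):
    print(f"x={xi:.1f}  y={yi:.3f}  sigma={si:.3f}")
```

Here the averages are 2.0, 4.0 and 6.0, and the third position gets the largest $\sigma_i$ (about 0.148) because its readings scatter the most.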

#### 3. Application to Provided Data

Given your requirement: "each Y-point now has 0.5 error ($\pm 0.5$ for Y values) except the first point which is Y = 100."

* **For $y_i$ where $i > 0$ (all points except the first):** The stated "$\pm 0.5$ error" is directly interpreted as the **standard deviation ($\sigma_i$)** of that specific measurement. Therefore, for these points, we set $\sigma_i = 0.5$. This indicates that the measured $y_i$ is expected to deviate from its true value with a standard deviation of $0.5$ units.

* **For $y_0$ (the first point, $Y = 100$):** This point is treated as exact, with no measurement error. In the context of WLS, this means $y_0$ is known with exceptionally high precision and effectively acts as a fixed or constrained point. To implement this in a weighted least squares framework, we assign it an extremely small standard deviation, such as $\sigma_0 = 1 \times 10^{-9}$. Because its weight is then $1/\sigma_0^2 = 10^{18}$, this point dominates the minimization and forces the fitted curve to pass almost exactly through $(x_0, y_0)$.
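The `sigma` array described above can be built in a couple of lines. This NumPy sketch uses a hypothetical helper `make_sigma` and an arbitrary number of points (six); it also prints the implied weights $w_i = 1/\sigma_i^2$, which are 4 for the $\pm 0.5$ points and about $10^{18}$ for the first point.

```python
import numpy as np

def make_sigma(n_points, point_sigma=0.5, first_sigma=1e-9):
    """Hypothetical helper: sigma_i = 0.5 everywhere except index 0, which is near zero."""
    sigma = np.full(n_points, point_sigma, dtype=float)
    sigma[0] = first_sigma          # treat the first point (Y = 100) as essentially exact
    return sigma

sigma = make_sigma(6)
weights = 1.0 / sigma**2            # w_i = 1 / sigma_i**2
print(sigma)
print(weights)
```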

#### 4. Role in Weighted Least Squares (WLS)

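The `scipy.optimize.curve_fit` function performs a Weighted Least Squares minimization whenever a `sigma` array is provided. Setting `absolute_sigma=True` does not change the fitted parameters; it tells `curve_fit` to treat the $\sigma_i$ as absolute standard deviations, so the returned parameter covariance matrix `pcov` is computed directly from them instead of being rescaled so that the reduced chi-square equals one.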

* **Weight Calculation:** For each data point $(x_i, y_i)$ with associated standard deviation $\sigma_i$, the residual is divided by $\sigma_i$ before squaring, which is equivalent to using the weight $w_i = \frac{1}{\sigma_i^2}$ (the inverse of the variance).
* **Minimization Objective:** The Levenberg-Marquardt algorithm (the default `method='lm'` in `curve_fit` when no bounds are given) then seeks to minimize the **weighted sum of squared residuals (WSSR)** over the $N$ data points, indexed from $0$ as above:
    $$\text{WSSR} = \sum_{i=0}^{N-1} w_i (y_i - f(x_i, \beta))^2 = \sum_{i=0}^{N-1} \frac{(y_i - f(x_i, \beta))^2}{\sigma_i^2}$$
    where $f(x_i, \beta)$ is the model's predicted value and $\beta$ represents the model parameters (A, B in your case).
* **Impact of Weights:** Measurements with smaller $\sigma_i$ (higher precision) receive larger weights ($w_i$), thus exerting a greater influence on the determination of the fitted parameters. Conversely, measurements with larger $\sigma_i$ (lower precision) receive smaller weights, having less impact on the fit. This ensures that the fitting process prioritizes minimizing deviations for the more reliable data points.

By incorporating these standard deviations, WLS provides more statistically efficient (more precise) estimates of the model parameters when the assumption of constant error variance (homoscedasticity) is violated, as is the case when different data points have different known uncertainties.
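The efficiency claim can be checked with a small Monte Carlo sketch. Assumptions, all hypothetical: a straight line $y = 1 + 2x$, error standard deviations $\sigma_i = 0.1 x_i^2$ that are known exactly, 30 points, and 2000 simulated data sets. Both OLS and WLS slopes should average close to the true value 2, but the WLS slopes should scatter noticeably less, roughly matching the predicted standard deviation taken from the slope entry of $(X^\top W X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(1.0, 10.0, 30)
sigma = 0.1 * x**2                          # hypothetical: error std grows with x
X = np.column_stack([np.ones_like(x), x])   # intercept + slope design matrix
XtW = X.T / sigma**2                        # X^T W with W = diag(1 / sigma_i**2)

slopes_ols, slopes_wls = [], []
for _ in range(2000):
    y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)
    # OLS: every point weighted equally
    slopes_ols.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    # WLS: rescale each row by 1 / sigma_i, then solve ordinary least squares
    slopes_wls.append(np.linalg.lstsq(X / sigma[:, None], y / sigma, rcond=None)[0][1])

slopes_ols, slopes_wls = np.array(slopes_ols), np.array(slopes_wls)
theory_wls = np.sqrt(np.linalg.inv(XtW @ X)[1, 1])   # predicted WLS slope std

print(f"slope mean  OLS {slopes_ols.mean():.3f}  WLS {slopes_wls.mean():.3f}")
print(f"slope std   OLS {slopes_ols.std():.3f}  WLS {slopes_wls.std():.3f} (theory {theory_wls:.3f})")
```

Both estimators are unbiased here; the difference shows up only in their spread, which is exactly what "more efficient" means.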