The closed-form Ordinary Least Squares (OLS) solution for linear regression does not scale well to large datasets or high-dimensional problems, because its core operations are expensive in both computation and memory. Here's why:

### 1. **Matrix Inversion Complexity**
   - In OLS, the coefficients are calculated using the formula:  
     \[
     \hat{\beta} = (X^T X)^{-1} X^T y
     \]
   - The step \((X^T X)^{-1}\) involves inverting a matrix of size \(n \times n\), where \(n\) is the number of features (predictors).
   - The time complexity of matrix inversion is \(O(n^3)\) with standard algorithms, so the cost grows cubically with the number of features.
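
As a rough illustration of the formula above, here is a minimal NumPy sketch. The function name `ols_normal_equations` and the synthetic data are made up for this example. It solves the linear system \((X^T X)\beta = X^T y\) with `np.linalg.solve` instead of forming the inverse explicitly, which is the usual practice, but the solve step is still \(O(n^3)\).

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve (X^T X) beta = X^T y without forming the explicit inverse."""
    gram = X.T @ X      # n x n matrix, costs O(m n^2) to build
    moment = X.T @ y    # length-n vector, costs O(m n)
    return np.linalg.solve(gram, moment)  # O(n^3) factorization and solve

# Synthetic data (illustrative only): m = 200 observations, n = 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta  # noise-free, so the fit should recover true_beta

beta_hat = ols_normal_equations(X, y)
print(beta_hat)
```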

### 2. **Memory Usage**
   - The matrix \(X^T X\) has size \(n \times n\), and storing it in memory becomes impractical for high-dimensional datasets (large \(n\)).
   - Similarly, storing the input matrix \(X\) itself can be infeasible for datasets with many observations (\(m\)) and features (\(n\)).
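
To put rough numbers on this, the short sketch below estimates the memory needed for dense float64 matrices. The sizes \(n = 100{,}000\) and \(m = 10{,}000{,}000\) are hypothetical values chosen for illustration, and `dense_matrix_gib` is a helper invented for this example.

```python
def dense_matrix_gib(rows, cols, bytes_per_value=8):
    """Memory in GiB for a dense rows x cols matrix (8 bytes per float64)."""
    return rows * cols * bytes_per_value / 2**30

n = 100_000      # hypothetical number of features
m = 10_000_000   # hypothetical number of observations

gram_gib = dense_matrix_gib(n, n)    # X^T X
design_gib = dense_matrix_gib(m, n)  # X itself
print(f"X^T X: {gram_gib:.1f} GiB, X: {design_gib:.1f} GiB")
```

With these assumed sizes, \(X^T X\) alone needs roughly 75 GiB, and a dense \(X\) needs about a hundred times that.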

### 3. **Numerical Stability**
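   - The matrix inversion process can be numerically unstable, especially if \(X^T X\) is close to singular (i.e., ill-conditioned), for example when features are highly correlated. Forming \(X^T X\) also squares the condition number of \(X\), which amplifies rounding errors and can lead to inaccurate coefficients.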
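
The conditioning problem can be seen on a small synthetic example. The sketch below builds two nearly collinear features (an assumption made purely for illustration) and compares the condition number of \(X\) with that of \(X^T X\). It also fits the model with `np.linalg.lstsq`, which works on \(X\) directly through an SVD-based routine rather than going through \(X^T X\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nearly collinear columns make X ill-conditioned (synthetic data).
x1 = rng.normal(size=500)
x2 = x1 + 1e-4 * rng.normal(size=500)
X = np.column_stack([x1, x2])

cond_X = np.linalg.cond(X)
cond_gram = np.linalg.cond(X.T @ X)  # approximately cond_X squared
print(f"cond(X) = {cond_X:.3e}, cond(X^T X) = {cond_gram:.3e}")

# Least squares on X itself avoids squaring the condition number.
y = X @ np.array([1.0, 1.0])
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_lstsq)
```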

### 4. **Scaling with Large Data**
   - When the number of observations (\(m\)) is very large, computing \(X^T X\) requires \(O(mn^2)\) operations. This cost grows linearly in \(m\) but is multiplied by \(n^2\), and it needs a full pass over the data, so it becomes expensive when both \(m\) and \(n\) are large.
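
One way to make the \(O(mn^2)\) cost concrete is to build \(X^T X\) and \(X^T y\) chunk by chunk, as in the sketch below. The function `accumulate_normal_equations`, the chunk size of 100, and the synthetic data are all assumptions for this example. Chunking keeps only an \(n \times n\) matrix in memory at once, but every row still contributes \(O(n^2)\) work, and the final solve is still \(O(n^3)\).

```python
import numpy as np

def accumulate_normal_equations(chunks, n_features):
    """Accumulate X^T X and X^T y over (X_chunk, y_chunk) pairs."""
    gram = np.zeros((n_features, n_features))
    moment = np.zeros(n_features)
    for X_chunk, y_chunk in chunks:
        gram += X_chunk.T @ X_chunk    # O(b n^2) for a chunk of b rows
        moment += X_chunk.T @ y_chunk  # O(b n)
    return gram, moment

# Synthetic data (illustrative only): m = 1000 rows, n = 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X @ np.array([1.0, -2.0, 0.0, 3.0]) + 0.1 * rng.normal(size=1000)

# Feed the data in chunks of 100 rows, as if it were streamed from disk.
chunks = ((X[i:i + 100], y[i:i + 100]) for i in range(0, 1000, 100))
gram, moment = accumulate_normal_equations(chunks, n_features=4)
beta_chunked = np.linalg.solve(gram, moment)
print(beta_chunked)
```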

### Alternatives for Scalability
To address these limitations, more scalable approaches are used for linear regression:

1. **Gradient Descent**  
   - Iterative optimization algorithms like Stochastic Gradient Descent (SGD) update the coefficients using small batches of data. This avoids matrix inversion entirely and reduces memory requirements, since only one batch needs to be in memory at a time.
   - Time complexity per iteration: \(O(mn)\) for full-batch gradient descent, \(O(bn)\) for mini-batch SGD with batch size \(b\), and \(O(n)\) for single-sample SGD.

2. **Coordinate Descent**
   - Coordinate descent updates one parameter at a time while holding the others fixed, so each update is cheap and no matrix inversion is needed.

3. **Distributed Computing**
   - For massive datasets, distributed frameworks like Apache Spark implement linear regression using iterative methods across clusters.

4. **Regularization Techniques**
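   - Regularization (e.g., Ridge or Lasso) changes the optimization problem itself. Ridge adds \(\lambda I\) to \(X^T X\), which improves conditioning. Lasso has no closed-form solution in general, so it is fit with iterative solvers such as coordinate descent, which tend to scale better.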
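
To show what the gradient-based alternative looks like in practice, here is a minimal mini-batch SGD sketch for least squares. The function name `minibatch_sgd`, the learning rate, batch size, epoch count, and the synthetic data are all illustrative assumptions, not tuned recommendations. Each update touches only `batch_size` rows, so the per-step cost is \(O(bn)\) and no \(n \times n\) matrix is ever formed.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.05, batch_size=32, epochs=50, seed=0):
    """Fit linear regression coefficients with mini-batch SGD."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(epochs):
        order = rng.permutation(m)  # reshuffle the rows every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ beta - y[idx]
            # Gradient of half the mean squared error on this batch.
            grad = X[idx].T @ residual / len(idx)
            beta -= lr * grad
    return beta

# Synthetic data (illustrative only): m = 2000 rows, n = 3 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3))
true_beta = np.array([1.5, -0.5, 2.0])
y = X @ true_beta + 0.01 * rng.normal(size=2000)

beta_sgd = minibatch_sgd(X, y)
print(beta_sgd)
```

Unlike the closed-form solution, the result is approximate and depends on the learning rate and number of epochs, which usually need tuning on real data.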

OLS remains useful for small to medium-sized datasets due to its simplicity and exact solutions, but its scalability issues necessitate alternative methods for large-scale applications.