# Mean-Reversion and Optimization (Kakushadze)

**Abstract**
- **Systematic Quantitative Framework**:
  - Designed in a **pedagogical** manner to discuss **mean-reversion** and **optimization**.

- **Sequential Approach to Complexity**:
  - **Pair Trading**
  - **Mean-Reversion via Demeaning**
  - **Regression**
  - **Weighted Regression**
  - **(Constrained) Optimization**
  - **Factor Models**

- **Mean-Reversion Implementation**:
  - Detailed methodology based on the established sequence.
  - **Common Pitfalls**:
    - Differentiating between **maximizing the Sharpe ratio** and **minimizing an objective function** when **trading costs** are considered.

- **Optimization Algorithms**:
  - Explicit algorithms addressing:
    - **Linear Costs**
    - **Constraints**
    - **Bounds**

- **Practical Illustration**:
  - Example of an explicit **intraday mean-reversion alpha**.

## Introduction

- **Statistical Arbitrage (StatArb)**:
  - **Definition**: Highly technical **short-term mean-reversion strategies**.
  - **Characteristics**:
    - Involves **large numbers of securities** (hundreds to thousands).
    - Utilizes **very short holding periods** (days to seconds).
    - Requires substantial **computational**, **trading**, and **IT infrastructure** (Lo, 2010).

- **Mean-Reversion Concept**:
  - **Basic Idea**:
    - Certain quantities are **historically correlated**.
    - **Temporary disruptions** in correlations occur due to unusual market conditions.
    - **Expectation**: Correlations will **restore** in the future.
  - **Objective**: Capture profit from **temporary mispricings**.

- **Purpose of the Notes**:
  - Provide a **systematic quantitative framework** for **mean-reversion** and **optimization** in a **pedagogical** manner.
  - **Implementation Approach**:
    - Sequence: **Mean-reversion via demeaning** → **Regression** → **Weighted regression** → **(Constrained) Optimization** → **Factor models**.
    - Start with **pair trading** and progressively add complexity.

- **Framework Details**:
  - **Mean-Reversion Around Means of Returns**.
  - **Regression** and **Weighted Regression**:
    - **Weighted regression** is viewed as a **zero specific risk limit** of **optimization with a factor model**.
  - **Optimization** and **Factor Models**.

- **Practical Intricacies and Pitfalls**:
  - **Common Issues**:
    - Determining if **regression weights** should be based on **historical risk** or **specific risk**.
    - Methods to **optimize regressed returns**.
    - Techniques to **include constraints** in optimization.
    - Distinguishing between **maximizing the Sharpe ratio** and **minimizing an objective function** when **trading costs** are involved.
    - Strategies to **optimize with linear costs**, **constraints**, and **bounds**.

- **Organization of the Notes**:
  - **Section 2: Mean Reversion**:
    - Topics: Pair trading → Multiple stocks → Multiple binary clusters (industries) → Regression → Non-binary generalization → Weighted regression.
  - **Section 3: Optimization**:
    - Topics: Maximizing Sharpe ratio → Adding multiple linear constraints (including dollar neutrality) → Regression as a limit of optimization → Factor models → Optimization with a factor model with linear constraints (including pitfalls).
  - **Section 5: Optimization with Constraints and Costs**:
    - Topics: Difference between **Sharpe ratio maximization** and **objective function minimization**, and scenarios where the latter approximates the former.
  - **Section 6: Explicit Algorithms for Optimization**:
    - Focus: Optimization with **linear costs**, **constraints**, and **bounds**, including within **factor models**.
  - **Section 7: Practical Illustration**:
    - Example: **Intraday mean-reversion alpha** with a **5-year simulated performance**.
    - Based on **overnight returns** and **industry classification**.
    - Includes **risk management** and **outlier handling**.
  - **Section 8: Concluding Remarks**.

- **Footnotes**:
  - **Mean-Reversion Strategy**:
    - Primarily **trader terminology**.
    - Referred to as **contrarian investment strategy** in **academic finance literature**.
    - This paper consistently uses **"mean-reversion (strategy)"**.
  - **Specific Risk**:
    - Also known as **"idiosyncratic risk"** in **multi-factor risk model terminology**.

## **Section 2: Mean-Reversion**

### **2.1 Pair Trading**
- **Statistical Arbitrage (StatArb) Example**: Commonly illustrated using **pair trading**.
- **Strategy Overview**:
  - **Historical Correlation**: Involves two historically correlated stocks within the same sector, e.g., **Exxon Mobil (XOM)** and **Royal Dutch Shell (RDS.A)**.
  - **Scenario**:
    - **Stock A (A) Increases**: A becomes **rich**.
    - **Stock B (B) Decreases**: B becomes **cheap**.
  - **Action**:
    - **Short A** and **Buy B**.
    - **Dollar Neutrality**: Ensures the total position is **dollar neutral**, making it **insensitive to overall market movements** and acting as a **hedge against market risk**.
- **Mean-Reversion Assumption**:
  - **Spread Convergence**: The expectation that the **spread between A and B** will **revert to historical values**.
  - **Profit Source**: Capitalizes on **temporary mispricings** between the two stocks.
- **Challenges in Quantifying "Rich" and "Cheap"**:
  - **Different Price Levels**: Stocks A and B typically have **different price levels**.
  - **Non-Constant Averages**: Prices often exhibit an **upward drift**, making average prices **non-constant**.
  - **Solution**: Utilize **returns** instead of prices to determine relative richness or cheapness.

### **2.2 Returns, Not Prices**
- **Defining "Rich" and "Cheap" via Returns**:
  - **Relative Movement**: If **Stock A's return ($R_A$)** exceeds **Stock B's return ($R_B$)**, then **A is rich** and **B is cheap**.
  - **Action**: **Short A** and **Buy B** based on returns.
- **Return Calculations**:
  - **Price Definitions**:
    - $P_A(t_1)$ and $P_B(t_1)$: Prices of A and B at initial time $t_1$ (e.g., yesterday's close), adjusted for splits and dividends.
    - $P_A(t_2)$ and $P_B(t_2)$: Prices of A and B at later time $t_2$ (e.g., today's open).
  - **Simple Returns**:
    $$
    \begin{align*}
    & R_A = \frac{P_A(t_2)}{P_A(t_1)} - 1 \quad \text{(1)}\\
    & R_B = \frac{P_B(t_2)}{P_B(t_1)} - 1 \quad \text{(2)}
    \end{align*}
    $$
  - **Logarithmic Returns** (approximately equal to the simple returns above when returns are small):
    $$
    \begin{align*}
    R_A & \equiv \ln\left(\frac{P_A(t_2)}{P_A(t_1)}\right) \quad \text{(3)}\\
    R_B & \equiv \ln\left(\frac{P_B(t_2)}{P_B(t_1)}\right) \quad \text{(4)}
    \end{align*}
    $$
- **Demeaned Returns**:
  - **Mean Return Calculation**:
    $$
    \bar{R} \equiv \frac{1}{2}(R_A + R_B) \quad \text{(5)}
    $$
  - **Demeaned Returns**:
    $$
    \begin{align*}
    \widetilde{R}_A & \equiv R_A - \bar{R} \quad \text{(6)}\\
    \widetilde{R}_B & \equiv R_B - \bar{R} \quad \text{(7)}
    \end{align*}
    $$
  - **Interpretation**:
    - **$\widetilde{R}_i > 0$**: Stock $i$ is **rich**.
    - **$\widetilde{R}_i < 0$**: Stock $i$ is **cheap**.
  - **Strategy**:
    - **Short** stocks with **positive demeaned returns**.
    - **Buy** stocks with **negative demeaned returns**.
- **Dollar Neutral Positioning**:
  - **Number of Shares**: $Q_i$ for each stock $i$ (A, B).
  - **Conditions**:
    $$
    \begin{align*}
    & P_A |Q_A| + P_B |Q_B| = I \quad \text{(8)}\\
    & P_A Q_A + P_B Q_B = 0 \quad \text{(9)}
    \end{align*}
    $$
    - **$I$**: Total desired **dollar investment**.
    - **$Q_i < 0$**: Indicates **short sales**.
    - **$Q_i > 0$**: Indicates **buy positions**.
    - **Assumptions**: No leverage and **zero margins** (see footnote 6).
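
The pair construction above can be sketched numerically. This is a minimal illustration, not the paper's code: the function name `pair_positions`, the made-up prices, and the choice to size positions at the $t_2$ prices are all assumptions. It computes log returns (3)-(4), demeans them via (5)-(7), and picks share counts that satisfy (8) and (9) by putting $I/2$ dollars on each side.

```python
import numpy as np

def pair_positions(prices_t1, prices_t2, investment):
    """Return (demeaned log returns, share counts) for a two-stock pair."""
    p1 = np.asarray(prices_t1, dtype=float)
    p2 = np.asarray(prices_t2, dtype=float)
    r = np.log(p2 / p1)                 # log returns, Eqs. (3)-(4)
    r_tilde = r - r.mean()              # demeaned returns, Eqs. (5)-(7)
    # Short the rich stock and buy the cheap one, with I/2 dollars per side.
    dollars = -np.sign(r_tilde) * investment / 2.0
    shares = dollars / p2               # assumption: sized at the t2 prices
    return r_tilde, shares

prices_t1 = np.array([100.0, 50.0])     # made-up prices for A and B
prices_t2 = np.array([102.0, 49.5])
r_tilde, shares = pair_positions(prices_t1, prices_t2, investment=1_000_000.0)
```

If the two returns are exactly equal, both demeaned returns vanish and the sketch takes no position.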

### **2.3 Generalization to Multiple Stocks**
- **Extending Beyond Pairs**:
  - **Example Set**: Multiple historically correlated stocks within the same sector, e.g., **Exxon Mobil (XOM)**, **Royal Dutch Shell (RDS.A)**, **Total (TOT)**, **Chevron (CVX)**, and **BP (BP)**.
  - **Objective**: Develop a **mean-reversion strategy** for the entire set rather than individual pairs.
  - **Solution**: Utilize **demeaned returns** to simplify the strategy.
- **Definitions for Multiple Stocks**:
  - **Returns** for $N$ historically correlated stocks:
    $$
    \begin{align*}
    R_i & = \ln\left(\frac{P_i(t_2)}{P_i(t_1)}\right) \quad \text{(10)}\\
    \bar{R} & \equiv \frac{1}{N} \sum_{i=1}^{N} R_i \quad \text{(11)}\\
    \widetilde{R}_i & \equiv R_i - \bar{R} \quad \text{(12)}
    \end{align*}
    $$
  - **Interpretation**:
    - **$\widetilde{R}_i > 0$**: Stock $i$ is **rich**.
    - **$\widetilde{R}_i < 0$**: Stock $i$ is **cheap**.
  - **Strategy**:
    - **Short** stocks with **positive demeaned returns ($\widetilde{R}_i > 0$)**.
    - **Buy** stocks with **negative demeaned returns ($\widetilde{R}_i < 0$)**.
- **Position Sizing Conditions**:
  - **Total Investment and Dollar Neutrality**:
    $$
    \begin{align*}
    & \sum_{i=1}^{N} P_i |Q_i| = I \quad \text{(13)}\\
    & \sum_{i=1}^{N} P_i Q_i = 0 \quad \text{(14)}
    \end{align*}
    $$
    - **$I$**: Total desired **dollar investment**.
    - **Dollar Neutrality**: Ensures $\sum_{i=1}^{N} P_i Q_i = 0$.
    - **Challenge**: With **$N > 2$**, there are **more unknowns** ($Q_i$) than **equations**.
- **Specifying the Number of Shares ($Q_i$)**:
  - **Dollar Positions Definition**:
    $$
    D_i \equiv P_i Q_i \quad \text{(15)}
    $$
  - **Proportional to Demeaned Returns**:
    $$
    D_i = -\gamma \widetilde{R}_i \quad \text{(16)}
    $$
    - **$\gamma > 0$**: Scaling factor.
    - **Implications**:
      - **Short** stocks where **$\widetilde{R}_i > 0$**.
      - **Buy** stocks where **$\widetilde{R}_i < 0$**.
  - **Ensuring Dollar Neutrality**:
    - Since $\sum_{i=1}^{N} \widetilde{R}_i = 0$, **Equation (14)** is automatically satisfied.
  - **Determining $\gamma$**:
    $$
    \gamma = \frac{I}{\sum_{i=1}^{N} |\widetilde{R}_i|} \quad \text{(17)}
    $$
  - **Strategy Definition**: **Equation (16)** outlines one possible **mean-reversion strategy**.
  - **Drawback**:
    - **Volatility Bias**: Positions are **larger in more volatile stocks** due to larger **$|\widetilde{R}_i|$**.
  - **Future Considerations**:
    - **Risk Management**: Addressing the volatility bias and other risks.
    - **Alternative Strategies**: Exploring different methods for constructing **$D_i$**.
    - **Further Generalizations**: Additional enhancements to the mean-reversion strategy.

- **Footnotes**:
    - **Footnote 5**:
        - **$\bar{R}$**: Refers to the **cross-sectional mean return**, not the **time series mean return**.
        - **$\widetilde{R}_A$, $\widetilde{R}_B$, and $\widetilde{R}_i$**: Represent the **deviation from the mean return ($\bar{R}$)**.
    - **Footnote 6**:
        - We assume no leverage and 0 margins. Nontrivial leverage simply rescales the investment level $I$. If margins are present, on top of $I$ invested in stocks, we need an additional amount $I^{\prime}$ to maintain margins, which simply reduces the strategy return due to the borrowing interest rate.
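
As a hedged sketch of (15)-(17), the snippet below sizes dollar positions in proportion to demeaned returns. The function name `demeaned_dollar_positions` and the sample returns are illustrative assumptions; the formulas follow (12), (16), and (17).

```python
import numpy as np

def demeaned_dollar_positions(returns, investment):
    """Dollar holdings D_i = -gamma * R~_i with sum |D_i| = investment."""
    r = np.asarray(returns, dtype=float)
    r_tilde = r - r.mean()                        # Eq. (12)
    gamma = investment / np.abs(r_tilde).sum()    # Eq. (17)
    return -gamma * r_tilde                       # Eq. (16)

returns = np.array([0.012, -0.004, 0.003, -0.009, 0.001])  # made-up returns
d = demeaned_dollar_positions(returns, investment=1_000_000.0)
```

Dollar neutrality (14) holds automatically because the demeaned returns sum to zero. The sketch does not guard against the degenerate case where all returns are equal, in which case (17) divides by zero.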

### **2.4 Generalization to Multiple Clusters**
- **Clusters Definition**:
  - **Clusters**: Groups of stocks analyzed collectively.
  - **Classification**: Based on industry classification schemes (e.g., industries, sub-industries).
  - **Notation**:
    - **$K$ Clusters**: Labeled as $A = 1, \ldots, K$.
    - **$\Lambda_{iA}$**: **Loadings matrix** ($N \times K$) where:
      - $\Lambda_{iA} = 1$ if **stock $i$** belongs to **cluster $A$**.
      - $\Lambda_{iA} = 0$ otherwise.
  - **Assumptions**:
    - **Exclusive Membership**: Each stock belongs to **one and only one cluster**.
    - **Equations**:
      $$
      \begin{align*}
      & N_A \equiv \sum_{i=1}^{N} \Lambda_{iA} > 0 \quad \text{(18)}\\
      & N = \sum_{A=1}^{K} N_A \quad \text{(19)}
      \end{align*}
      $$
    - **Mapping Function**:
      $$
      \begin{align*}
      & \Lambda_{iA} = \delta_{G(i), A} \quad \text{(20)}\\
      & G: \{1, \ldots, N\} \mapsto \{1, \ldots, K\} \quad \text{(21)}
      \end{align*}
      $$
      - **$G(i)$**: Maps **stock $i$** to **cluster $A$**.
      - **$\delta_{ab}$**: **Kronecker delta** (1 if $a = b$, else 0).
- **Mean-Reversion Across Clusters**:
  - **Separate Mean-Reversion**: Perform mean-reversion **individually for each cluster**.
  - **Regression Integration**: Facilitates **compact representation** of demeaned returns across all clusters.

- **Footnotes**:
  - **Footnote 7**:
    - **Cluster Examples**: 
      - Oil sector
      - Technology sector
      - Healthcare sector
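
The binary loadings matrix in (18)-(21) can be built directly from the map $G$. The sketch below is illustrative only: the helper name `binary_loadings`, the 0-based cluster labels, and the sample assignment are assumptions.

```python
import numpy as np

def binary_loadings(cluster_of, n_clusters):
    """Lambda[i, A] = 1 if stock i is in cluster A, else 0 (Eq. (20))."""
    g = np.asarray(cluster_of)
    return (g[:, None] == np.arange(n_clusters)[None, :]).astype(float)

g = np.array([0, 0, 1, 2, 1, 0])   # made-up map G(i), with 0-based cluster labels
lam = binary_loadings(g, n_clusters=3)
n_a = lam.sum(axis=0)              # cluster sizes N_A, Eq. (18)
```

Each row of `lam` contains exactly one 1, which is the exclusive-membership assumption in matrix form.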

### **2.5 Regression**
- **Linear Regression Setup**:
  - **Objective**: Regress **stock returns ($R_i$)** on **cluster indicators ($\Lambda_{iA}$)**.
  - **Model Specification**:
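    - **Without Intercept**: The `-1` in (22) drops the intercept, which would be redundant because the binary columns of $\Lambda$ sum to the unit vector (see (32)).
    - **Unit Weights**: Each stock (observation) is weighted equally in the regression.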
    - **R Notation**:
      $$
      R \sim -1 + \Lambda \quad \text{(22)}
      $$
      - **$R$**: **$N$-vector** of stock returns ($R_i$).
      - **$\Lambda$**: **$N \times K$ loadings matrix**.
- **Regression Equation**:
  $$
  R_i = \sum_{A=1}^{K} \Lambda_{iA} f_A + \varepsilon_i \quad \text{(23)}
  $$
  - **Components**:
    - **$f_A$**: **Regression coefficients** for each cluster.
    - **$\varepsilon_i$**: **Regression residuals** (demeaned returns).
- **Regression Coefficients Calculation**:
  $$
  \begin{align*}
  & f = Q^{-1} \Lambda^T R \quad \text{(24)}\\
  & Q \equiv \Lambda^T \Lambda \quad \text{(25)}
  \end{align*}
  $$
  - **$Q$**: The $K \times K$ matrix $\Lambda^T \Lambda$; for binary loadings it is diagonal with the cluster sizes $N_A$ on the diagonal (see (27)).
- **Residuals Interpretation**:
  $$
  \begin{align*}
  & \varepsilon = R - \Lambda Q^{-1} \Lambda^T R \quad \text{(26)}\\
  & Q_{AB} = N_A \delta_{AB} \quad \text{(27)}\\
  & \bar{R}_A \equiv \frac{1}{N_A} \sum_{j \in J_A} R_j \quad \text{(28)}\\
  & \varepsilon_i = R_i - \bar{R}_{G(i)} = \widetilde{R}_i \quad \text{(29)}
  \end{align*}
  $$
  - **Explanation**:
    - **$J_A$**: Set of stocks in **cluster $A$**, i.e., $J_A \equiv \{i \mid G(i) = A\}$.
    - **$\bar{R}_A$**: **Mean return** for **cluster $A$**.
    - **$\widetilde{R}_i$**: **Demeaned return** for **stock $i$** (residual).
- **Demeaned Returns Properties**:
  $$
  \begin{align*}
  & \sum_{i=1}^{N} \widetilde{R}_i \Lambda_{iA} = 0, \quad A = 1, \ldots, K \quad \text{(30)}\\
  & \sum_{i=1}^{N} \widetilde{R}_i \nu_i = 0 \quad \text{(31)}\\
  & \sum_{A=1}^{K} \Lambda_{iA} = \nu_i \quad \text{(32)}
  \end{align*}
  $$
  - where $\nu_{i} \equiv 1, i=1, \ldots, N$, i.e., the $N$-vector $\nu$ is the unit vector. In the regression language, $\nu$ is referred to as the intercept. 
  - **Interpretations**:
    - **(30)**: **Cluster Neutrality** - Demeaned returns sum to zero within each cluster.
    - **(31)**: **Dollar Neutrality** - Sum of demeaned returns across all stocks is zero.
    - **(32)**: **Loadings Sum** - Each row of $\Lambda$ sums to **1** (since each stock belongs to one cluster).
  - **Implications**:
    - **Intercept Subsumed**: No need to add an **intercept** to the regression model as it's already incorporated via cluster means.
    - **General Case**: To satisfy **(31)**, an **intercept column** would need to be added to the **loadings matrix**.
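    - **Dollar Neutrality vs. (31)**:
      - **When They Coincide**: If dollar holdings are proportional to the demeaned returns, $D_i = -\gamma \widetilde{R}_i$ as in (16), then **(31)** is exactly the **dollar neutrality** condition $\sum_{i=1}^{N} D_i = 0$.
      - **General Case**: For other ways of specifying $D_i$, **(31)** and **dollar neutrality** are distinct conditions; neither implies the other.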

- **Footnotes**:
  - **Footnote 8**:
    - **R Package**: Refers to **R**, the programming language used for statistical computing.
    - **Model Notation**: The symbol **"$\sim$"** in equation (22) denotes a **linear model** in R.
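
A small numerical check of (24)-(31), assuming made-up returns and cluster labels: regressing returns on binary loadings without an intercept should reproduce plain within-cluster demeaning. The variable names are illustrative.

```python
import numpy as np

g = np.array([0, 0, 1, 2, 1, 0, 2])    # made-up cluster map G(i), 0-based
r = np.array([0.010, -0.002, 0.004, 0.007, -0.006, 0.001, 0.003])
lam = (g[:, None] == np.arange(3)[None, :]).astype(float)   # binary loadings

q = lam.T @ lam                         # Eq. (25)
f = np.linalg.solve(q, lam.T @ r)       # Eq. (24): regression coefficients
eps = r - lam @ f                       # Eq. (26): residuals

# Direct within-cluster demeaning, Eqs. (28)-(29).
cluster_means = np.array([r[g == a].mean() for a in range(3)])
r_tilde = r - cluster_means[g]
```

Here `f` equals the vector of cluster means and `eps` equals `r_tilde`, so the residuals satisfy both cluster neutrality (30) and the zero-sum condition (31).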

### **2.6 Non-binary Generalization**
- **Orthogonality Condition**:
  - **Cluster Neutrality**: Demeaned returns are **orthogonal** to cluster vectors.
  - **Mathematical Representation**:
    $$
    \widetilde{R}^T v^{(A)} = 0 \quad \text{(33)}
    $$
    - **$v_i^{(A)}$**: Defined as $\Lambda_{iA}$.
  - **Applicability**: This orthogonality holds for **any loadings matrix**, not just binary.
- **Generalized Loadings Matrix ($\Omega_{iA}$)**:
  - **Composition**:
    - **Binary Columns**: Represent **industry (cluster) based risk factors**.
    - **Non-binary Columns**: Represent **non-industry based risk factors**.
  - **Regressed Returns**:
    - **Definition**: Residuals from the regression of $R_i$ over $\Omega_{iA}$.
    - **Formula**:
      $$
      \widetilde{R} \equiv R - \Omega Q^{-1} \Omega^T R \quad \text{(35)}
      $$
      $$
      Q \equiv \Omega^T \Omega \quad \text{(36)}
      $$
- **Intercept Inclusion**:
  - **Purpose**: To satisfy the condition $\sum_{i=1}^{N} \widetilde{R}_i = 0$.
  - **Implementation**:
    - **R Notation**:
      $$
      R \sim \Omega \quad \text{(37)}
      $$
    - **Effect**: Adds a **unit column** to $\Omega$:
      $$
      \Omega_{iA_1} \equiv \nu_i = 1, \quad \forall i \in \{1, \ldots, N\}
      $$
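
The regressed returns (35)-(36) with an intercept column can be sketched as follows. The random returns and the single non-binary factor `style` are hypothetical inputs chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
r = rng.normal(0.0, 0.01, size=n)               # made-up stock returns
style = rng.normal(size=n)                      # hypothetical non-binary factor
omega = np.column_stack([np.ones(n), style])    # unit column acts as the intercept

q = omega.T @ omega                             # Eq. (36)
r_tilde = r - omega @ np.linalg.solve(q, omega.T @ r)   # Eq. (35)
```

One caveat on the design: if $\Omega$ already contains a full set of binary industry columns, those columns sum to the unit vector, so adding a separate intercept column would make $Q$ singular. In that case the intercept is already subsumed.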


### **2.7 Weighted Regression**
- **Objective**: Address the **volatility bias** where positions are dominated by **volatile stocks**.
- **Weighted Regression Strategy**:
  - **Modification**:
    - Divide $\widetilde{R}_i$ by $\sigma_i^2$ (the variance of stock $i$'s returns) to reduce the influence of volatile stocks.
    - **Weighted Returns**:
      $$
      \widehat{R}_i \equiv \frac{\widetilde{R}_i}{\sigma_i^2} \quad \text{(Preferred Scaling)}
      $$
      - **Reasoning**: Maximizes the **Sharpe ratio** by incorporating risk scaling.
  - **Challenges**:
    - **Dollar Neutrality**:
      - Dividing $\widetilde{R}_i$ by $\sigma_i^2$ generally **violates dollar neutrality**:
        $$
        \sum_{i=1}^{N} \widehat{R}_i \neq 0
        $$
    - **Solution**: Utilize **weighted regression** to maintain dollar neutrality.
- **Weighted Regression Formulation**:
  - **Definitions**:
    $$
    \begin{align*}
    & \varepsilon \equiv R - \Omega Q^{-1} \Omega^T Z R \quad \text{(39)}\\
    & Z \equiv \text{diag}(z_i) \quad \text{(40)}\\
    & Q \equiv \Omega^T Z \Omega \quad \text{(41)}\\
    & \widetilde{R} \equiv Z \varepsilon \quad \text{(42)}
    \end{align*}
    $$
    - **$Z$**: **Weight Matrix** with $z_i = \frac{1}{\sigma_i^2}$.
  - **Orthogonality Condition**:
    $$
    \sum_{i=1}^{N} \widetilde{R}_i \Omega_{iA} = 0, \quad \forall A \in \{1, \ldots, K\} \quad \text{(43)}
    $$
  - **Intercept Inclusion**:
    - **If Intercept is Included**:
      $$
      \sum_{i=1}^{N} \widetilde{R}_i = 0
      $$
  - **Benefits**:
    - **Dollar Neutrality Maintained**: Ensures $\sum_{i=1}^{N} D_i = 0$.
    - **Risk Management**: Positions are **neutral** with respect to risk factors in $\Omega$.
    - **Volatility Control**: Positions are **not dominated** by volatile stocks.
  - **Resulting Strategy**:
    - **Real Mean-Reversion Strategy**: Combines **dollar neutrality** with **risk management**.

- **Footnotes**:
  - **Footnote 9**:
    - **Clarification**: The choice between $\frac{\widetilde{R}_i}{\sigma_i}$ and $\frac{\widetilde{R}_i}{\sigma_i^2}$ is addressed in the context of **optimization**.
    - **Rationale**: Scaling by $\sigma_i^2$ maximizes the **Sharpe ratio**, despite initial appearances.
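
The weighted regression (39)-(42) can be sketched as below. The helper name `weighted_regressed_returns` and all input numbers are assumptions made for illustration. The snippet also computes the naive rescaling $\widetilde{R}_i / \sigma_i^2$ of plainly demeaned returns to show that it is not dollar neutral, whereas the weighted-regression output is.

```python
import numpy as np

def weighted_regressed_returns(r, omega, sigma):
    """Regressed returns (42) from a weighted regression with z_i = 1/sigma_i^2."""
    z = 1.0 / sigma**2
    z_omega = omega * z[:, None]                          # Z Omega
    q = omega.T @ z_omega                                 # Eq. (41)
    eps = r - omega @ np.linalg.solve(q, z_omega.T @ r)   # Eq. (39)
    return z * eps                                        # Eq. (42)

r = np.array([0.02, -0.01, 0.005, -0.015])      # made-up returns (mean zero)
sigma = np.array([0.01, 0.02, 0.03, 0.04])      # made-up volatilities
beta = np.array([1.2, 0.8, 1.0, 0.9])           # hypothetical non-binary loading
omega = np.column_stack([np.ones(4), beta])     # intercept + one factor

naive = (r - r.mean()) / sigma**2               # plain rescaling of demeaned returns
r_tilde = weighted_regressed_returns(r, omega, sigma)
```

Because `omega` contains the unit column, `r_tilde` sums to zero and is orthogonal to `beta`, as in (43).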


### **2.8 Remarks**
- **Diverse Strategies for Specifying $D_i$**:
  - **(16) as One Approach**:
    - **Dollar Neutrality**: Achieved automatically if regression includes an **intercept**.
    - **Risk Management**: Weighted regression ($z_i = \frac{1}{\sigma_i^2}$) downweights contributions from **high volatility stocks**.
- **Other Mean-Reversion Strategies**:
  - **Equally Weighted Holdings**:
    - **Formula**:
      $$
      D_i = -\gamma \operatorname{sign}(\widetilde{R}_i) \quad \text{(44)}
      $$
      - **$\gamma > 0$**: Every position has the same **absolute dollar amount** $\gamma$; with $\sum_{i=1}^{N} |D_i| = I$ this gives $\gamma = I/N$.
    - **Shortcomings**:
      - **Dollar Neutrality Issue**:
        $$
        \sum_{i=1}^{N} D_i = \frac{N_- - N_+}{N} I \quad \text{(45)}
        $$
        - **$N_+$**: Number of stocks with **positive regressed returns**.
        - **$N_-$**: Number of stocks with **negative regressed returns**.
        - **Impact**: The mismatch $|N_+ - N_-|$ is typically of order $\sqrt{N}$, so the net dollar position is of order $\frac{I}{\sqrt{N}}$, e.g., about **2%** of $I$ for $N \sim 2{,}500$.
      - **Discontinuity in $\operatorname{sign}(x)$**:
        - **Issue**: **Instability** due to **sign flips** near zero.
        - **Consequence**: **Unnecessary portfolio turnover** and **additional trading costs**.
    - **Mitigation**:
      - **Smoothing $\operatorname{sign}(x)$**:
        - **Approximation**:
          $$
          D_i = -\gamma \tanh\left(\frac{\widetilde{R}_i}{\kappa}\right) \quad \text{(46)}
          $$
          - **$\kappa$**: **Cross-sectional standard deviation** of $\widetilde{R}_i$.
        - **Behavior**:
          - **For $\left|\widetilde{R}_i\right| \ll \kappa$**: Approximates to **(16)**.
          - **For $\left|\widetilde{R}_i\right| \gtrsim \kappa$**: **Dollar holdings are "squashed"**.
        - **Impact**:
          - **Unit Weights**: More volatile stocks have larger $\left|\widetilde{R}_i\right|$, leading to **suppression**.
          - **Weighted Regression** ($z_i = \frac{1}{\sigma_i^2}$):
            - **Effect**: **Suppresses contributions** from **more volatile stocks**.
            - **Dollar Neutrality**: Achieved by **scaling down** or **setting $D_i$ to zero** for volatile stocks.
  - **Alternative Approaches**:
    - **Nonlinear Alphas**:
      - **Formula**:
        $$
        D_i = -\gamma \widetilde{R}_i |\widetilde{R}_i| \quad \text{(47)}
        $$
      - **General Form**:
        $$
        D_i = -\gamma \widetilde{R}_i f(\widetilde{R}_i) \quad \text{(48)}
        $$
        - **$f(x)$**: **Nonlinear function**; e.g., $f(x) = |x|$ reproduces (47).
      - **Characteristics**:
        - **Requires Additional Adjustments**: To achieve **dollar neutrality**.
        - **Alpha Selection**: Based on **backtesting performance**; **alphas are ephemeral**.
        - **Empirical Nature**: **No theoretical prescription** for choosing $f(x)$.
  - **Ranking-Based Strategies**:
    - **Method**:
      - **Rank Stocks**: Cross-sectionally by $\left|\widetilde{R}_i\right|$, assign **integer rank** $r_i$.
      - **Formula**:
        $$
        D_i = -\gamma \operatorname{sign}(\widetilde{R}_i) r_i \quad \text{(49)}
        $$
    - **Alternative**:
      - **Thresholding**: Set $D_i = 0$ for stocks with $r_i < r_*$.
    - **Considerations**:
      - **Risk Management** and **Dollar Neutrality**: Similar issues as with other strategies.
      - **Nonlinear Functions**: Possible to use **nonlinear functions** of $r_i$.
  - **Asymmetric Dollar Neutrality**:
    - **Alternative Approaches**:
      - **Long Cash and Short Futures**:
        - **Example**: **S&P outperformance portfolio**.
        - **Constraints**: **Lower bounds** $D_i \geq 0$.
      - **Shorting Tracking Portfolios**:
        - **Options**:
          - **Futures**.
          - **Minimum Variance Portfolio**.
        - **Purpose**: **Diversified index exposure** or **proprietary trading universe**.
  - **Summary of Remarks**:
    - **Variety of Strategies**: Numerous ways to specify **$D_i$** based on **regressed returns**.
    - **Dollar Neutrality**: Often requires additional adjustments to maintain.
    - **Risk Management**: Essential to prevent **volatility dominance** and **overtrading**.
    - **Empirical Nature**: **Alpha selection** is **dynamic** and **empirical**, lacking a **theoretical foundation**.

- **Footnotes**:
  - **Footnote 10**:
    - **Context**: Discusses that the described strategies are **limits of optimization**.
  - **Footnote 11**:
    - **Clarification**: **Instability** due to **changes** or **computational uncertainties** can lead to **overtrading**.
  - **Footnote 12**:
    - **Additional Methods**:
      - **Minimum Variance Portfolio**: Short positions can be based on a **minimum variance** approach rather than index tracking.
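
The equally weighted rule (44) and its smoothed version (46) can be compared in a short sketch. The function names, the sample regressed returns, the use of the population standard deviation for $\kappa$, and the normalization $\sum_i |D_i| = I$ for the $\tanh$ rule are all assumptions made for illustration.

```python
import numpy as np

def sign_positions(r_tilde, investment):
    """Eq. (44) with gamma = I / N, so every |D_i| is the same."""
    gamma = investment / len(r_tilde)
    return -gamma * np.sign(r_tilde)

def tanh_positions(r_tilde, investment):
    """Eq. (46), normalized so that sum |D_i| = I (an assumed convention)."""
    kappa = np.std(r_tilde)              # cross-sectional standard deviation
    raw = -np.tanh(r_tilde / kappa)
    return investment * raw / np.abs(raw).sum()

investment = 1_000_000.0
r_tilde = np.array([0.03, 0.01, 0.02, -0.06])   # made-up regressed returns, sum zero
d_sign = sign_positions(r_tilde, investment)
d_tanh = tanh_positions(r_tilde, investment)
```

With three positive and one negative regressed return, (45) gives a net position of $(1 - 3)/4 \cdot I = -I/2$ for the sign rule. The $\tanh$ rule is continuous near zero but, like the sign rule, is not dollar neutral without further adjustment.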



## **Section 3: Optimization**

### **3.1 Maximizing Sharpe Ratio**
- **Covariance and Correlation Matrices**:
  - **Sample Covariance Matrix ($C_{ij}$)**: Represents the covariance of $N$ stock returns $R_i(t_s)$ over $M+1$ time periods.
  - **Correlation Matrix ($\Psi_{ij}$)**:
    $$
    C_{ij} = \sigma_i \sigma_j \Psi_{ij} \quad \text{(50)}
    $$
    - **$\Psi_{ii} = 1$**: Diagonal elements are unity.
- **Portfolio Metrics**:
  - **Portfolio P&L ($P$)**:
    $$
    P = \sum_{i=1}^{N} R_i D_i \quad \text{(51)}
    $$
  - **Portfolio Volatility ($V$)**:
    $$
    V = \sqrt{\sum_{i,j=1}^{N} C_{ij} D_i D_j} \quad \text{(52)}
    $$
  - **Sharpe Ratio ($S$)**:
    $$
    S = \frac{P}{V} \quad \text{(53)}
    $$
- **Holding Weights**:
  - **Dimensionless Weights ($w_i$)**:
    $$
    w_i \equiv \frac{D_i}{I} \quad \text{(54)}
    $$
    - **Normalization Condition**:
      $$
      \sum_{i=1}^{N} |w_i| = 1 \quad \text{(55)}
      $$
    - **Expressions in Terms of $w_i$**:
      $$
      P = I \widetilde{P} \equiv I \sum_{i=1}^{N} R_i w_i \quad \text{(56)}
      $$
      $$
      V = I \widetilde{V} \equiv I \sqrt{\sum_{i,j=1}^{N} C_{ij} w_i w_j} \quad \text{(57)}
      $$
- **Sharpe Ratio Maximization**:
  - **Objective**:
    $$
    S \rightarrow \max \quad \text{(58)}
    $$
  - **Solution Without Constraints**:
    $$
    w_i = \gamma \sum_{j=1}^{N} C_{ij}^{-1} R_j \quad \text{(59)}
    $$
    - **$\gamma$**: Normalization coefficient determined by (55).
    - **Note**: The covariance matrix $C_{ij}$ is assumed to be **invertible**; this assumption is revisited in Subsection 3.5.
  - **Issue**:
    - **Non-Dollar Neutrality**: Holding weights from (59) do not ensure $\sum_{i=1}^{N} w_i = 0$.
    - **Example**: If $C_{ij}$ is diagonal and all $R_i > 0$, then all $w_i > 0$.
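
A minimal sketch of (53)-(59), assuming a made-up diagonal covariance matrix and positive expected returns. The function names are illustrative. It shows that the unconstrained solution maximizes the Sharpe ratio, is scale invariant, and is not dollar neutral.

```python
import numpy as np

def max_sharpe_weights(cov, exp_ret):
    """Unconstrained solution (59), normalized via (55)."""
    w = np.linalg.solve(cov, exp_ret)    # C^{-1} R
    return w / np.abs(w).sum()

def sharpe(w, cov, exp_ret):
    """Sharpe ratio (53) expressed through weights, using (56)-(57)."""
    return (exp_ret @ w) / np.sqrt(w @ cov @ w)

sigma = np.array([0.01, 0.02, 0.04])         # made-up volatilities
cov = np.diag(sigma**2)                      # diagonal C for the example
exp_ret = np.array([0.001, 0.002, 0.001])    # made-up expected returns, all > 0
w = max_sharpe_weights(cov, exp_ret)
s_opt = sharpe(w, cov, exp_ret)
s_equal = sharpe(np.full(3, 1.0 / 3.0), cov, exp_ret)
```

For the optimal weights the Sharpe ratio equals $\sqrt{R^T C^{-1} R}$, and all weights are positive here, so the portfolio is long-only rather than dollar neutral.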

### **3.2 Linear Constraints; Dollar Neutrality**
- **Achieving Dollar Neutrality**:
  - **Scale Invariance of Sharpe Ratio**: Sharpe ratio remains unchanged under scaling $w_i \rightarrow \zeta w_i$, $\zeta > 0$.
  - **Reformulation as Quadratic Minimization**:
    $$
    \begin{align*}
    g(w, \lambda) & \equiv \frac{\lambda}{2} \sum_{i,j=1}^{N} C_{ij} w_i w_j - \sum_{i=1}^{N} R_i w_i \quad \text{(60)} \\
    g(w, \lambda) & \rightarrow \min \quad \text{(61)}
    \end{align*}
    $$
    - **$\lambda > 0$**: Scaling parameter.
    - **Solution**:
      $$
      w_i = \frac{1}{\lambda} \sum_{j=1}^{N} C_{ij}^{-1} R_j \quad \text{(62)}
      $$
  - **Introducing Constraints**:
    - **Lagrangian with Constraints**:
      $$
      \begin{align*}
      g(w, \mu, \lambda) & \equiv \frac{\lambda}{2} \sum_{i,j=1}^{N} C_{ij} w_i w_j - \sum_{i=1}^{N} R_i w_i - \sum_{a=1}^{m} \sum_{i=1}^{N} w_i Y_{ia} \mu_a \quad \text{(63)} \\
      g(w, \mu, \lambda) & \rightarrow \min \quad \text{(64)}
      \end{align*}
      $$
      - **Loadings Matrix ($Y_{ia}$)**: $N \times m$ matrix representing $m$ homogeneous linear constraints.
      - **Lagrange Multipliers ($\mu_a$)**: Associated with each constraint.
    - **First-Order Conditions**:
      $$
      \begin{align*}
      \lambda \sum_{j=1}^{N} C_{ij} w_j &= R_i + \sum_{a=1}^{m} Y_{ia} \mu_a \quad \text{(65)} \\
      \sum_{i=1}^{N} w_i Y_{ia} &= 0 \quad \text{(66)}
      \end{align*}
      $$
      - **Example Constraint**: Dollar neutrality, obtained when $Y_{ia_1} = \nu_i = 1$ for some column $a_1$.
  - **Solution with Constraints**:
    - **Matrix Formulation**:
      $$
      \begin{align*}
      w &= \frac{1}{\lambda} \left[ C^{-1} - C^{-1} Y \left( Y^T C^{-1} Y \right)^{-1} Y^T C^{-1} \right] R \quad \text{(67)} \\
      \mu &= -\left( Y^T C^{-1} Y \right)^{-1} Y^T C^{-1} R \quad \text{(68)}
      \end{align*}
      $$
    - **Expanded Notation**:
      $$
      \begin{align*}
      \omega &= \frac{1}{\lambda} \Gamma^{-1} \rho \quad \text{(69)} \\
      \omega^T &\equiv \left( w^T, -\lambda^{-1} \mu^T \right) \quad \text{(70)} \\
      \rho^T &\equiv \left( R^T, O^T \right) \quad \text{(71)} \\
      \Gamma &\equiv \begin{pmatrix} C & Y \\ Y^T & \mathbb{O} \end{pmatrix} \quad \text{(72)}
      \end{align*}
      $$
      - **$\omega$**: $(N+m)$-vector combining $w$ and $\mu$.
      - **$\rho$**: $(N+m)$-vector combining $R$ and zero vector $O$.
      - **$\Gamma$**: $(N+m) \times (N+m)$ matrix combining $C$, $Y$, and zero matrices.
  - **Handling Inhomogeneous Constraints**:
    - **Extension**: Constraints of the form $\sum_{i=1}^{N} w_i Y_{ia} + y_a = 0$.
    - **Modification**:
      - **$\rho_a = -\lambda y_a$**.
    - **Implications**:
      - **Different from Sharpe Ratio Maximization**: Inhomogeneous constraints break scale invariance.
      - **Additional Considerations**: Refer to Section 5 for detailed handling.
  - **Dollar Neutrality Constraint**:
    - **Implementation**: Set $Y_{i1} = \nu_i = 1$ for dollar neutrality.
    - **Number of Constraints**: $m = 1$ suffices for single constraint.

- **Footnotes**:
  - **Footnote 13**:
    - **Extension to Inhomogeneous Constraints**:
      - **Constraints Form**: $\sum_{i=1}^{N} w_i Y_{ia} + y_a = 0$.
      - **Solution Adjustment**: $\rho_a = -\lambda y_a$.
      - **Impact**:
        - **Sharpe Ratio Maximization**: Inhomogeneous constraints disrupt scale invariance.
        - **Additional Care Needed**: Refer to Section 5 for handling inhomogeneous constraints.
      - **Note**: Homogeneous constraints ($y_a = 0$) align with Sharpe ratio maximization via objective function minimization.
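
The constrained solution can be checked numerically in two ways: through the closed form (67)-(68) and through the block system (69)-(72). The sketch below uses a randomly generated positive-definite covariance matrix and a single dollar-neutrality constraint. The function name `constrained_weights`, the variable `lam` (standing for $\lambda$), and all inputs are illustrative assumptions.

```python
import numpy as np

def constrained_weights(cov, exp_ret, y, lam=1.0):
    """Closed-form solution (67)-(68) for homogeneous constraints Y^T w = 0."""
    c_inv_r = np.linalg.solve(cov, exp_ret)                 # C^{-1} R
    c_inv_y = np.linalg.solve(cov, y)                       # C^{-1} Y
    mu = -np.linalg.solve(y.T @ c_inv_y, y.T @ c_inv_r)     # Eq. (68)
    w = (c_inv_r + c_inv_y @ mu) / lam                      # Eq. (67)
    return w, mu

rng = np.random.default_rng(1)
n, m, lam = 5, 1, 2.0
a = rng.normal(size=(n, 20))
cov = a @ a.T / 20.0                        # made-up positive-definite covariance
exp_ret = rng.normal(0.0, 0.01, size=n)     # made-up expected returns
y = np.ones((n, m))                         # unit column: dollar neutrality

w, mu = constrained_weights(cov, exp_ret, y, lam)

# Same problem via the (N+m) x (N+m) block system, Eqs. (69)-(72).
gamma_mat = np.block([[cov, y], [y.T, np.zeros((m, m))]])
rho = np.concatenate([exp_ret, np.zeros(m)])
omega_vec = np.linalg.solve(gamma_mat, rho) / lam
w_block = omega_vec[:n]
mu_block = -lam * omega_vec[n:]             # from omega^T = (w^T, -mu^T / lambda)
```

Both routes give the same weights and multipliers, and the weights sum to zero.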

### **3.3 Regression as Constrained Diagonal Optimization**
  
  - **Diagonal Covariance Matrix Assumption**:
    - **Form**: $C_{ij} = \sigma_i^2 \delta_{ij}$.
  
  - **Solution under Diagonal Covariance**:
    $$
    w = \frac{1}{\lambda} \left[ Z - Z Y \left( Y^T Z Y \right)^{-1} Y^T Z \right] R = \frac{1}{\lambda} Z \varepsilon = \frac{1}{\lambda} \widetilde{R} \quad \text{(73)}
    $$
    - **Definitions**:
      - **$Z$**:
        $$
        Z \equiv \operatorname{diag}\left( \frac{1}{\sigma_i^2} \right)
        $$
      - **$\varepsilon_i$**: Residuals from the weighted regression with weights $z_i = \frac{1}{\sigma_i^2}$.
      - **$\widetilde{R}_i$**: Regressed returns.
  
  - **Interpretation**:
    - **Equivalence**: Diagonal constrained optimization **is equivalent** to the **weighted regression** discussed in Subsection 2.7.
    - **Loadings Matrix Identification**: $\Omega$ is identified with the constraint matrix $Y$.
    - **Regression Weights**: $z_i = \frac{1}{\sigma_i^2}$.
    - **Holding Weights**: Given by **regressed returns** $\widetilde{R}_i$ up to normalization via (55).
    - **Dollar Neutrality**:
      - **Condition**: If the constraint matrix includes the **intercept** (unit vector), then holding weights correspond to a **dollar neutral** portfolio.
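
The equivalence in (73) can be verified with two independent computations: the constrained optimization formula with a diagonal covariance matrix, and a weighted least-squares fit. In the sketch below the weighted fit is done as ordinary least squares on data rescaled by $\sqrt{z_i} = 1/\sigma_i$, which is a standard reformulation rather than something taken from the source. All inputs are made up, and $\lambda = 1$.

```python
import numpy as np

sigma = np.array([0.01, 0.02, 0.03, 0.04, 0.025])    # made-up volatilities
r = np.array([0.02, -0.01, 0.005, -0.015, 0.01])     # made-up returns
beta = np.array([1.2, 0.8, 1.0, 0.9, 1.1])           # hypothetical loading
y = np.column_stack([np.ones(5), beta])              # constraints = intercept + beta

# Route 1: constrained optimization (67) with C = diag(sigma^2), lambda = 1.
z_mat = np.diag(1.0 / sigma**2)
zy = z_mat @ y
w_opt = (z_mat - zy @ np.linalg.solve(y.T @ zy, zy.T)) @ r

# Route 2: weighted least squares as OLS on sqrt(z)-scaled data.
root_z = 1.0 / sigma
coef, *_ = np.linalg.lstsq(y * root_z[:, None], r * root_z, rcond=None)
eps = r - y @ coef               # weighted-regression residuals
r_tilde = eps / sigma**2         # regressed returns, Z * eps
```

The two vectors agree up to rounding, and both sum to zero because the intercept is among the constraints.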

### **3.4 Regression as Limit of Optimization**
  
  - **Auxiliary Matrix Construction**:
    $$
    \begin{align*}
    \Theta &\equiv \Xi + \zeta \Omega \Omega^T \quad \text{(74)} \\
    \Xi &\equiv Z^{-1} \quad \text{(75)}
    \end{align*}
    $$
    - **Parameters**:
      - **$\zeta$**: Scaling parameter.
      - **$\Omega$**: Loadings matrix.
  
  - **Inverse of $\Theta$**:
    $$
    \begin{align*}
    \Theta^{-1} &= Z - \zeta Z \Omega \widetilde{Q}^{-1} \Omega^T Z \quad \text{(76)} \\
    \widetilde{Q}_{AB} &\equiv \delta_{AB} + \zeta \sum_{i=1}^{N} z_i \Omega_{iA} \Omega_{iB} \quad \text{(77)}
    \end{align*}
    $$
  
  - **Limit as $\zeta \rightarrow \infty$**:
    $$
    \begin{align*}
    \Theta^{-1} &= Z - Z \Omega Q^{-1} \Omega^T Z \quad \text{(78)} \\
    Q &\equiv \Omega^T Z \Omega \quad \text{(79)} \\
    \widetilde{R} &= \Theta^{-1} R \quad \text{(80)}
    \end{align*}
    $$
    - **Implications**:
      - **Regression as a Limit**: **Regression** is a **limit of optimization** where the covariance matrix is $\Theta$.
      - **Relation to Factor Models**: Aligns with **factor model forms** with a subtle distinction (discussed below).
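
Both the inverse formula (76)-(77) and its large-$\zeta$ limit (78)-(79) can be checked numerically. The sketch below uses made-up volatilities of order one and a hypothetical two-column loadings matrix. The function name `theta_inverse` is an assumption.

```python
import numpy as np

def theta_inverse(z, omega, zeta):
    """Inverse of Theta = Z^{-1} + zeta * Omega Omega^T via Eqs. (76)-(77)."""
    k = omega.shape[1]
    z_omega = omega * z[:, None]                       # Z Omega
    q_tilde = np.eye(k) + zeta * (omega.T @ z_omega)   # Eq. (77)
    return np.diag(z) - zeta * z_omega @ np.linalg.solve(q_tilde, z_omega.T)

sigma = np.array([0.5, 1.0, 1.5, 2.0, 0.8])            # made-up, O(1) scale
z = 1.0 / sigma**2
omega = np.column_stack([np.ones(5), [0.5, -1.0, 0.3, -0.4, 0.6]])

# Finite zeta: compare with a direct matrix inverse of (74) at zeta = 1.
theta = np.diag(sigma**2) + 1.0 * omega @ omega.T
direct = np.linalg.inv(theta)

# Large zeta: compare with the regression form (78)-(79).
z_omega = omega * z[:, None]
limit = np.diag(z) - z_omega @ np.linalg.solve(omega.T @ z_omega, z_omega.T)
near_limit = theta_inverse(z, omega, zeta=1e8)
```

The difference between `near_limit` and `limit` shrinks like $1/\zeta$, which is the sense in which weighted regression is a limit of optimization.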

### **3.5 Factor Models**
  
  - **Multi-Factor Risk Model Structure**:
    $$
    \begin{align*}
    \Theta &\equiv \Xi + \widetilde{\Omega} \Phi \widetilde{\Omega}^T \quad \text{(81)} \\
    \Xi_{ij} &\equiv \xi_i^2 \delta_{ij} \quad \text{(82)}
    \end{align*}
    $$
    - **Components**:
      - **$\xi_i$**: **Specific (Idiosyncratic) Risk** for each stock.
      - **$\widetilde{\Omega}_{iA}$**: $N \times K$ **Factor Loadings Matrix**.
      - **$\Phi_{AB}$**: **Factor Covariance Matrix**, $A, B = 1, \ldots, K$.
  
  - **Model Representation**:
    $$
    \begin{align*}
    \Upsilon_i &= \chi_i + \sum_{A=1}^{K} \widetilde{\Omega}_{iA} f_A \quad \text{(83)} \\
    \langle \chi_i, \chi_j \rangle &= \Xi_{ij} \quad \text{(84)} \\
    \langle \chi_i, f_A \rangle &= 0 \quad \text{(85)} \\
    \langle f_A, f_B \rangle &= \Phi_{AB} \quad \text{(86)} \\
    \langle \Upsilon_i, \Upsilon_j \rangle &= \Theta_{ij} \quad \text{(87)}
    \end{align*}
    $$
    - **Interpretations**:
      - **$\Upsilon_i$**: Random process for stock $i$.
      - **$\chi_i$**: **Specific Risk Component**.
      - **$f_A$**: **Factor Risk Components**.
  
  - **Covariance Matrix Construction**:
    $$
    \begin{align*}
    \Theta &= \Xi + \Omega \Omega^T \quad \text{(88)} \\
    \Omega &\equiv \widetilde{\Omega} \widetilde{\Phi} \quad \text{(89)} \\
    \widetilde{\Phi} \widetilde{\Phi}^T &= \Phi \quad \text{(90)}
    \end{align*}
    $$
    - **$\widetilde{\Phi}_{AB}$**: **Cholesky Decomposition** of $\Phi_{AB}$.
    - **Normalization**: (88) is of the form (74) from the previous subsection with $\zeta = 1$.
  
  - **Factor Model Advantages**:
    - **Stability**: $\Theta_{ij}$ is expected to be **more stable out-of-sample** compared to $C_{ij}$.
    - **Dimensionality Reduction**: The factor covariance matrix $\Phi_{AB}$ is only $K \times K$ with **$K \ll N$**, so it involves far fewer parameters than the $N \times N$ matrix $C_{ij}$.
    - **Singular Covariance Matrix Handling**:
      - **Condition**: If $M < N$, then $C_{ij}$ is **singular** with only $M < N$ nonzero eigenvalues.
      - **Factor Model Solution**: Ensures **invertibility** of $\Theta_{ij}$ by incorporating specific risks $\xi_i > 0$ and a **positive-definite** factor covariance matrix $\Phi_{AB}$.
  
  - **Model Representation Summary**:
    - **Factor Model Form**:
      $$
      \Theta = \Xi + \widetilde{\Omega} \Phi \widetilde{\Omega}^T
      $$
    - **Random Processes**:
      $$
      \begin{align*}
      \Upsilon_i &= \chi_i + \sum_{A=1}^{K} \widetilde{\Omega}_{iA} f_A \\
      \langle \chi_i, \chi_j \rangle &= \Xi_{ij} \\
      \langle \chi_i, f_A \rangle &= 0 \\
      \langle f_A, f_B \rangle &= \Phi_{AB} \\
      \langle \Upsilon_i, \Upsilon_j \rangle &= \Theta_{ij}
      \end{align*}
      $$
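As an illustration of (81), (82) and (88) through (90), here is a small NumPy sketch built on made-up inputs (random loadings, specific risks, and a random positive-definite $\Phi$; none of these come from the source). It builds $\Theta$ in both forms, checks that the Cholesky rotation leaves it unchanged, and confirms that it is positive-definite.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 6, 2                                 # hypothetical sizes
Omega_tilde = rng.normal(size=(N, K))       # unrotated factor loadings
xi = rng.uniform(0.1, 0.3, size=N)          # specific risks xi_i > 0
A = rng.normal(size=(K, K))
Phi = A @ A.T + 0.1 * np.eye(K)             # a positive-definite factor covariance matrix

Xi = np.diag(xi**2)                                  # (82)
Theta = Xi + Omega_tilde @ Phi @ Omega_tilde.T       # (81)

# np.linalg.cholesky returns a lower-triangular factor with Phi_tilde @ Phi_tilde.T == Phi, as in (90).
Phi_tilde = np.linalg.cholesky(Phi)
Omega = Omega_tilde @ Phi_tilde                      # (89)
Theta_rotated = Xi + Omega @ Omega.T                 # (88)

same = np.allclose(Theta, Theta_rotated)
min_eig = np.linalg.eigvalsh(Theta).min()            # positive, so Theta is invertible
print(same, min_eig)
```

The smallest eigenvalue is bounded below by the smallest $\xi_i^2$, which is why positive specific risks are enough to guarantee invertibility.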

### 3.6 Optimization with Factor Model

Suppose we have a **factor model covariance matrix** $\Theta_{ij}$. When we maximize the Sharpe ratio using this factor model covariance matrix, the resulting holding weights are given by (where $\lambda$ is fixed via equation (55)):

$$
\begin{align*}
w_i &= \frac{1}{\lambda \xi_i^2} \left( R_i - \sum_{j=1}^{N} \frac{R_j}{\xi_j^2} \sum_{A,B=1}^{K} \Omega_{iA} \Omega_{jB} \widetilde{Q}_{AB}^{-1} \right) \tag{91} \\
\widetilde{Q}_{AB} &\equiv \delta_{AB} + \sum_{i=1}^{N} \frac{1}{\xi_i^2} \Omega_{iA} \Omega_{iB} \tag{92}
\end{align*}
$$

As in the general case, these holding weights are **not dollar neutral**.
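A minimal NumPy sketch of (91) and (92) follows, using hypothetical random inputs. It computes the unnormalized weights $\lambda w_i$, fixes $\lambda$ through the normalization (55), and compares the result with a direct solve of $\Theta w \propto R$. The point of (91) is that only a $K \times K$ system has to be solved rather than an $N \times N$ one.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 10, 3                                # hypothetical sizes
Omega = rng.normal(size=(N, K))             # factor loadings (already rotated, as in (89))
xi2 = rng.uniform(0.01, 0.05, size=N)       # specific variances xi_i^2
R = rng.normal(scale=0.01, size=N)          # expected returns

def factor_model_weights(R, Omega, xi2):
    """Unnormalized weights lambda * w from (91)-(92)."""
    K = Omega.shape[1]
    Q_tilde = np.eye(K) + (Omega.T / xi2) @ Omega               # (92), a K x K matrix
    proj = Omega @ np.linalg.solve(Q_tilde, Omega.T @ (R / xi2))
    return (R - proj) / xi2

v = factor_model_weights(R, Omega, xi2)
w = v / np.abs(v).sum()                     # lambda fixed via sum_i |w_i| = 1, see (55)

# Cross-check against a direct N x N solve with Theta = Xi + Omega Omega^T.
Theta = np.diag(xi2) + Omega @ Omega.T
v_direct = np.linalg.solve(Theta, R)

print(np.allclose(v, v_direct), w.sum())    # the net weight is generally nonzero
```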

#### 3.6.1 Linear Constraints

Similar to the general case, in the context of the factor model, we can incorporate multiple **homogeneous linear constraints** (equation (66)). Let $\widehat{\Omega}_{i\alpha}$, where $\alpha \in H \equiv \{a\} \cup \{A\}$ (i.e., the index $\alpha$ has $m$ values corresponding to the index $a$ and $K$ values corresponding to the index $A$), be the following $N \times (K+m)$ matrix:

$$
\begin{align*}
\widehat{\Omega}_{ia} &\equiv Y_{ia} \tag{93} \\
\widehat{\Omega}_{iA} &\equiv \Omega_{iA} \tag{94}
\end{align*}
$$

The corresponding holding weights are then given by:

$$
\begin{align*}
w_i &= \frac{1}{\lambda \xi_i^2} \left( R_i - \sum_{j=1}^{N} \frac{R_j}{\xi_j^2} \sum_{\alpha,\beta \in H} \widehat{\Omega}_{i\alpha} \widehat{\Omega}_{j\beta} \widehat{Q}_{\alpha\beta}^{-1} \right) \tag{95} \\
\widehat{Q}_{\alpha\beta} &\equiv \varphi_{\alpha\beta} + \sum_{i=1}^{N} \frac{1}{\xi_i^2} \widehat{\Omega}_{i\alpha} \widehat{\Omega}_{i\beta} \tag{96}
\end{align*}
$$

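where $\widehat{Q}_{\alpha\beta}^{-1}$ is the inverse of $\widehat{Q}_{\alpha\beta}$, and $\varphi_{\alpha\beta}$ is nonzero only on the factor block: $\varphi_{AB} \equiv \delta_{AB}$, $\varphi_{ab} \equiv 0$, $\varphi_{aA} = \varphi_{Aa} \equiv 0$. Using $\sum_{i=1}^{N} \widehat{\Omega}_{ia} \widehat{\Omega}_{i\alpha} / \xi_i^2 = \widehat{Q}_{a\alpha} - \varphi_{a\alpha}$, we have:

$$
\sum_{i=1}^{N} w_i Y_{ia} = \sum_{i=1}^{N} w_i \widehat{\Omega}_{ia} = \frac{1}{\lambda} \sum_{\alpha,\beta \in H} \sum_{j=1}^{N} \frac{R_j}{\xi_j^2} \widehat{\Omega}_{j\beta} \varphi_{a\alpha} \widehat{Q}_{\alpha\beta}^{-1} = 0 \tag{97}
$$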

Thus, the holding weights $w_i$ satisfy the constraints (66).
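The construction in (93) through (97) can be sketched in NumPy as below. This is an illustration under stated assumptions rather than a reference implementation: the inputs are random, and $\varphi$ is taken to be the identity on the factor block and zero on the constraint block, which is the choice that makes (96) reduce to (92) when there are no constraints. The sketch also compares the result with the generic Lagrange-multiplier solution of the same constrained problem.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, m = 12, 3, 2                                       # hypothetical sizes
Omega = rng.normal(size=(N, K))                          # factor loadings
Y = np.column_stack([np.ones(N), rng.normal(size=N)])    # constraints: intercept plus one more
xi2 = rng.uniform(0.01, 0.05, size=N)                    # specific variances
R = rng.normal(scale=0.01, size=N)                       # expected returns

Omega_hat = np.hstack([Y, Omega])                        # (93)-(94): N x (m + K)
phi = np.zeros((m + K, m + K))
phi[m:, m:] = np.eye(K)                                  # assumed: delta_AB on factors, 0 on constraints
Q_hat = phi + (Omega_hat.T / xi2) @ Omega_hat            # (96)

# Unnormalized weights lambda * w from (95), then normalization (55).
v = (R - Omega_hat @ np.linalg.solve(Q_hat, Omega_hat.T @ (R / xi2))) / xi2
w = v / np.abs(v).sum()
constraint_violation = np.abs(Y.T @ w).max()             # (97): should vanish

# Cross-check: stationarity of (1/2) w^T Theta w - R^T w subject to Y^T w = 0.
Theta = np.diag(xi2) + Omega @ Omega.T
Ti_R = np.linalg.solve(Theta, R)
Ti_Y = np.linalg.solve(Theta, Y)
v_kkt = Ti_R - Ti_Y @ np.linalg.solve(Y.T @ Ti_Y, Y.T @ Ti_R)

print(constraint_violation, np.allclose(v, v_kkt))
```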

#### 3.6.2 Optimization with Constraints

The constraints (66) typically relate to **risk management**. Apart from **dollar neutrality** (i.e., roughly, the market neutrality constraint), other constraints often include requirements for **neutrality with respect to other risk factors**, such as:

- **Industry Neutrality**
- **Style Risk Factors Neutrality** (e.g., size, liquidity, volatility, momentum)
- **Other Non-industry Risk Factors** (e.g., principal component-based risk factors or betas)

In practice, one often uses the same **risk factors** in $Y_{ia}$ as those in the **factor loadings matrix** $\Omega_{iA}$.¹ If that is the case, there is a certain **redundancy** in the matrix $\widehat{\Omega}_{i\alpha}$, which we address next.

¹ **Footnote 14**:
> More precisely, usually one would use the **unrotated factor loadings** $\widetilde{\Omega}_{iA}$—recall that $\Omega = \widetilde{\Omega} \widetilde{\Phi}$, where $\widetilde{\Phi}$ is the Cholesky decomposition of the factor covariance matrix $\Phi$. However, a rotation $Y \rightarrow YU$ by an arbitrary nonsingular $m \times m$ matrix $U_{ab}$ does not change the constraints (66).

Since we can always **rotate the constraints** (66) by an arbitrary **nonsingular $m \times m$ matrix**, we can separate these constraints into two sets:

- $\{a\} = \{a'\} \cup \{a''\} \equiv J' \cup J''$
  
  such that $Y_{ia''}$ are **orthogonal** to $\Omega_{iA}$ and no further rotation can make $Y_{ia'}$ orthogonal to $\Omega_{iA}$:

$$
\sum_{i=1}^{N} \frac{1}{\xi_i^2} \Omega_{iA} Y_{ia''} = 0, \quad A = 1, \ldots, K, \quad a'' \in J'' \tag{98}
$$

Assume $J''$ is **not empty**. If it is empty, we can still proceed as below, except that $\varepsilon_i'' = R_i$ in this case.

Let $H' \equiv \{A\} \cup J' = H \setminus J''$. Then:

$$
\begin{align*}
w_i &= \frac{1}{\lambda \xi_i^2} \left( R_i - \sum_{j=1}^{N} \frac{R_j}{\xi_j^2} \sum_{\alpha,\beta \in H} \widehat{\Omega}_{i\alpha} \widehat{\Omega}_{j\beta} \widehat{Q}_{\alpha\beta}^{-1} \right) \\
&= \frac{1}{\lambda \xi_i^2} \left( \varepsilon_i'' - \sum_{j=1}^{N} \frac{\varepsilon_j''}{\xi_j^2} \sum_{\alpha', \beta' \in H'} \widehat{\Omega}_{i\alpha'} \widehat{\Omega}_{j\beta'} \widehat{Q}_{\alpha'\beta'}^{-1} \right) \tag{99}
\end{align*}
$$

where:

$$
\begin{align*}
\varepsilon_i'' &\equiv R_i - \sum_{j=1}^{N} \frac{R_j}{\xi_j^2} \sum_{a'', b'' \in J''} Y_{ia''} Y_{jb''} Q_{a''b''}^{-1} \tag{100} \\
Q_{a''b''} &\equiv \sum_{i=1}^{N} \frac{1}{\xi_i^2} Y_{ia''} Y_{ib''} \tag{101}
\end{align*}
$$

Thus, $\varepsilon_i''$ are **regression residuals** of $R_i$ regressed over $Y_{ia''}$ with regression weights $z_i'' \equiv 1/\xi_i^2$. In other words, our original constrained optimization has reduced to constrained optimization with a subset of the original constraints:

$$
\sum_{i=1}^{N} w_i Y_{ia'} = 0, \quad a' \in J' \tag{102}
$$

This is because the original matrix $\widehat{Q}_{\alpha\beta}$ is **block-diagonal**:

$$
\begin{align*}
\widehat{Q}_{\alpha' b''} &= 0, \quad \alpha' \in H', \quad b'' \in J'' \tag{103} \\
\widehat{Q}_{a'' b''} &= Q_{a'' b''}, \quad a'' , b'' \in J'' \tag{104}
\end{align*}
$$

In fact, we can break this down further. Assume that the $\left|J'\right|$ columns in the remaining loadings $Y_{ia'}$, $a' \in J'$, are a subset of the columns in the factor loadings $\Omega_{iA}$. Thus:

- $\{A\} = \{A'\} \cup J' \equiv F' \cup J'$
- $K' \equiv |F'| = K - |J'|$
- The index $A'$ runs over the columns in $\Omega_{iA}$ that differ from those in $Y_{ia'}$.
  
To avoid notational confusion, denote $\Omega_{iA}$ for $A = A' \in F'$ as:

$$
\Omega_{iA'}' \equiv \Omega_{iA'}, \quad A' \in F' \tag{105}
$$

It is then not difficult to show that:

$$
\widehat{Q}_{\alpha' \beta'}^{-1} = \begin{pmatrix}
D^{-1} & \mathbb{O} & -D^{-1} E \Delta^{-1} \\
\mathbb{O} & \mathbb{I} & -\mathbb{I} \\
-\Delta^{-1} E^T D^{-1} & -\mathbb{I} & \mathbb{I} + \Delta^{-1} + \Delta^{-1} E^T D^{-1} E \Delta^{-1}
\end{pmatrix} \tag{105}
$$

where:

$$
\begin{align*}
D &\equiv \widetilde{Q}' - E \Delta^{-1} E^T \tag{106} \\
\widetilde{Q}_{A' B'}' &\equiv \delta_{A' B'} + \sum_{i=1}^{N} \frac{1}{\xi_i^2} \Omega_{iA'}' \Omega_{iB'}', \quad A', B' \in F' \tag{107} \\
E_{A' b'} &\equiv \sum_{i=1}^{N} \frac{1}{\xi_i^2} \Omega_{iA'}' Y_{ib'}, \quad A' \in F', \quad b' \in J' \tag{108} \\
\Delta_{a' b'} &\equiv \sum_{i=1}^{N} \frac{1}{\xi_i^2} Y_{ia'} Y_{ib'}, \quad a', b' \in J' \tag{109}
\end{align*}
$$

Therefore, we have:

$$
w = \frac{1}{\lambda} \left\{ \Xi^{-1} - \Xi^{-1} \left[ \Omega' D^{-1} \Omega'^{T} - \Omega' D^{-1} E \Delta^{-1} Y^T - Y \Delta^{-1} E^T D^{-1} \Omega'^{T} + Y \left( \Delta^{-1} + \Delta^{-1} E^T D^{-1} E \Delta^{-1} \right) Y^T \right] \Xi^{-1} \right\} \varepsilon'' \tag{110}
$$

Furthermore:

$$
Y^T w = 0 \tag{111}
$$

In fact, $w_i$ given by (110) correspond to optimizing the residuals $\varepsilon_i''$ using a **reduced factor model** with the same specific risk but the factor loadings given by $\Omega_{iA'}'$, subject to the constraints:

$$
\sum_{i=1}^{N} w_i Y_{ia'} = 0, \quad a' \in J' \tag{113}
$$

The solution to this optimization problem is given by:

$$
w_i = \frac{1}{\lambda \xi_i^2} \left( \varepsilon_i'' - \sum_{j=1}^{N} \frac{\varepsilon_j''}{\xi_j^2} \sum_{\alpha^*, \beta^* \in F^*} \widehat{\Omega}_{i\alpha^*} \widehat{\Omega}_{j\beta^*} \widehat{Q}_{\alpha^*\beta^*}^{-1} \right) \tag{114}
$$

where:

- $F^* \equiv F' \cup J'$
- $\widehat{\Omega}_{i\alpha^*}$ is defined as:
  - $\widehat{\Omega}_{i\alpha^*} = \Omega_{iA'}'$, for $\alpha^* = A' \in F'$
  - $\widehat{\Omega}_{i\alpha^*} = Y_{ia'}$, for $\alpha^* = a' \in J'$
  
And:

$$
\widehat{Q}_{\alpha^* \beta^*}^{-1} = \begin{pmatrix}
D^{-1} & -D^{-1} E \Delta^{-1} \\
-\Delta^{-1} E^T D^{-1} & \Delta^{-1} + \Delta^{-1} E^T D^{-1} E \Delta^{-1}
\end{pmatrix} \tag{115}
$$

It then follows that $w_i$ in (110) and (114) are **identical**.

**Summary**: Optimization is performed with the columns in the factor loadings matrix $\Omega_{iA}$ corresponding to the columns in $Y_{ia}$ omitted.
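The summary statement can be checked numerically. The sketch below is illustrative only: it uses random hypothetical inputs and the generic Lagrange-multiplier solution of the constrained problem, not the block formulas (110) and (114). It compares the constrained optimum under the full factor model with the one under the reduced model, in which the factor columns that also appear as constraints are dropped.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, m = 15, 4, 2                          # hypothetical sizes
Omega = rng.normal(size=(N, K))             # full factor loadings
Omega[:, 0] = 1.0                           # first factor column is the intercept
Y = Omega[:, :m]                            # constraints reuse the first m factor columns
Omega_reduced = Omega[:, m:]                # remaining columns (Omega' in the text)
xi2 = rng.uniform(0.01, 0.05, size=N)       # specific variances
R = rng.normal(scale=0.01, size=N)          # expected returns

def constrained_weights(R, Theta, Y):
    """Maximize Sharpe with covariance Theta subject to Y^T w = 0, sum |w_i| = 1."""
    Ti_R = np.linalg.solve(Theta, R)
    Ti_Y = np.linalg.solve(Theta, Y)
    v = Ti_R - Ti_Y @ np.linalg.solve(Y.T @ Ti_Y, Y.T @ Ti_R)
    return v / np.abs(v).sum()

w_full = constrained_weights(R, np.diag(xi2) + Omega @ Omega.T, Y)
w_reduced = constrained_weights(R, np.diag(xi2) + Omega_reduced @ Omega_reduced.T, Y)

max_diff = np.abs(w_full - w_reduced).max()
print(max_diff)
```

The agreement reflects the fact that, on the surface $Y^T w = 0$, the dropped columns contribute nothing to the portfolio variance $w^T \Theta w$.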


### 3.7 Pitfalls

So, what happens if we run **constrained optimization** with the **factor loadings matrix** as $Y_{ia}$? Specifically, set $m = K$, let the index $a$ take the same values as the index $A$, and define $\left.Y_{ia}\right|_{a=A} = \Omega_{iA}$. In this scenario, using the results from the previous subsection:

$$
\begin{align*}
w_i &= \frac{1}{\lambda \xi_i^2} \left( R_i - \sum_{j=1}^{N} \frac{R_j}{\xi_j^2} \sum_{A,B=1}^{K} \Omega_{iA} \Omega_{jB} Q_{AB}^{-1} \right) \tag{116} \\
Q_{AB} &\equiv \sum_{i=1}^{N} \frac{1}{\xi_i^2} \Omega_{iA} \Omega_{iB} \tag{117}
\end{align*}
$$

Thus, $w_i$ are identical to those obtained from the **weighted regression** with regression weights $z_i = 1 / \xi_i^2$.
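As a quick numerical illustration of (116) and (117), the NumPy sketch below (random hypothetical inputs) solves the constrained optimization with $Y = \Omega$ via Lagrange multipliers and compares it with the weighted-regression formula using $z_i = 1/\xi_i^2$.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 15, 3                                # hypothetical sizes
Omega = rng.normal(size=(N, K))             # factor loadings, also used as constraints
xi2 = rng.uniform(0.01, 0.05, size=N)       # specific variances
R = rng.normal(scale=0.01, size=N)          # expected returns

# Constrained optimization with the factor model covariance and Y = Omega.
Theta = np.diag(xi2) + Omega @ Omega.T
Ti_R = np.linalg.solve(Theta, R)
Ti_O = np.linalg.solve(Theta, Omega)
v_opt = Ti_R - Ti_O @ np.linalg.solve(Omega.T @ Ti_O, Omega.T @ Ti_R)

# Weighted regression (116)-(117) with weights z_i = 1 / xi_i^2 (no delta_AB term in Q).
z = 1.0 / xi2
Q = (Omega.T * z) @ Omega
v_reg = z * (R - Omega @ np.linalg.solve(Q, Omega.T @ (z * R)))

print(np.allclose(v_opt, v_reg))
```

Both vectors are the unnormalized weights $\lambda w_i$; the factor part of $\Theta$ has dropped out entirely, leaving only the specific risk.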

#### 3.7.1 Specific Risk or Total Risk?

In the **optimization context**, the regression weights in equation (116) naturally emerge as $z_i = 1 / \xi_i^2$, the inverse **specific variances**, not the inverse **total variances** (i.e., $z_i \neq 1 / \sigma_i^2$). This addresses the subtlety mentioned at the end of Subsection 3.4. However, **a priori**, there is nothing inherently wrong with using $z_i = 1 / \sigma_i^2$ in the weighted regression outside the factor model context.

- **Specific Risk ($\xi_i$)**:
  - **Definition**: The idiosyncratic risk component of each stock.
  - **Availability**: Not known unless a factor model is available or carefully constructed.
  
- **Total Risk ($\sigma_i$)**:
  - **Definition**: The overall volatility of each stock.
  - **Usage**: Typically available and can be used as regression weights when specific risk is not known.

**Implications**:
- **When Specific Risk is Available**:
  - Use $z_i = 1 / \xi_i^2$ for regression weights.
  
- **When Specific Risk is Not Available**:
  - Use $z_i = 1 / \sigma_i^2$ as regression weights.
  - **Note**: This is acceptable outside the factor model context.

#### 3.7.2 Optimization of Regression Residuals

Instead of imposing constraints directly in optimization, one might be tempted to **first regress** the returns $R_i$ over some factor loadings $\Omega_{iA}$ to obtain regression residuals $\varepsilon_i$, and then perform optimization based on these residuals (as opposed to the returns $R_i$ themselves). The rationale is that the **regressed returns** $\widetilde{R}_i = z_i \varepsilon_i$ (where $z_i$ are the regression weights) are neutral with respect to the loadings used in the regression. However, unless this is done correctly, the resulting holding weights will **not** be neutral with respect to $\Omega_{iA}$, as the optimization process can **undo** any such neutrality.

Consider the following strategy:

$$
\begin{align*}
\varepsilon &\equiv R - Y \left( Y^T Z Y \right)^{-1} Y^T Z R \quad \text{(118)} \\
Z &\equiv \operatorname{diag}(z_i) \quad \text{(119)} \\
w_i &\equiv \frac{1}{\lambda \xi_i^2} \left( \varepsilon_i - \sum_{j=1}^{N} \frac{\varepsilon_j}{\xi_j^2} \sum_{A,B=1}^{K} \Omega_{iA} \Omega_{jB} \widetilde{Q}_{AB}^{-1} \right) \quad \text{(120)} \\
\widetilde{Q}_{AB} &\equiv \delta_{AB} + \sum_{i=1}^{N} \frac{1}{\xi_i^2} \Omega_{iA} \Omega_{iB} \quad \text{(121)}
\end{align*}
$$

Here, we have purposefully kept the loadings $Y_{ia}$ in the weighted regression (first equation above) **distinct** from the factor loadings $\Omega_{iA}$ in the optimization (third equation above). Note that the **optimization** is performed on the **regression residuals** $\varepsilon_i$, not directly on the returns $R_i$.

**Key Insight**: If the purpose of the regression is to ensure that the holding weights are neutral with respect to $Y_{ia}$, then the **loadings matrix** $Y_{ia}$ must be **identical** to $\Omega_{iA}$ (up to immaterial rotations; see Footnote 14). Specifically:

- Set $m = K$.
- Define $\left.Y_{ia}\right|_{a=A} = \Omega_{iA}$.
- Choose regression weights as $z_i = 1 / \xi_i^2$.

From equation (118), we then have:

$$
\begin{align*}
\sum_{i=1}^{N} \widetilde{R}_i \Omega_{iA} &= 0 \quad \text{(122)} \\
\widetilde{R}_i &\equiv z_i \varepsilon_i = \frac{1}{\xi_i^2} \varepsilon_i \quad \text{(123)}
\end{align*}
$$

Thus:

$$
\begin{align*}
w_i &= \frac{1}{\lambda} \widetilde{R}_i \quad \text{(124)} \\
\sum_{i=1}^{N} w_i \Omega_{iA} &= 0 \quad \text{(125)}
\end{align*}
$$

**Conclusion**:
- **Optimization of Regression Residuals**:
  - Up to an overall proportionality constant $1/\lambda$, the holding weights $w_i$ correspond to the **regressed returns** $\widetilde{R}_i$.
  - These holding weights are **neutral** with respect to $\Omega_{iA}$.

**Summary**:
- **Directly Optimizing Regression Residuals**:
  - Requires careful alignment of regression and optimization parameters.
  - Specifically, regression loadings $Y_{ia}$ must match factor loadings $\Omega_{iA}$.
  - Regression weights should be set to $z_i = 1 / \xi_i^2$ to ensure neutrality.
  
- **Potential Pitfall**:
  - **Mismatch Between Regression and Optimization**:
    - If $Y_{ia}$ does not align with $\Omega_{iA}$, neutrality constraints may not hold.
    - Optimization can inadvertently **undo** neutrality, leading to unintended exposures.
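The pitfall can be reproduced with a short NumPy sketch. Everything here is hypothetical (random loadings, risks, and returns), and the mismatched case is just one example: a unit-weight regression on the intercept only, followed by the factor model optimization (120) and (121). The aligned case uses $Y = \Omega$ and $z_i = 1/\xi_i^2$ as prescribed above.

```python
import numpy as np

rng = np.random.default_rng(6)
N, K = 12, 3                                # hypothetical sizes
Omega = rng.normal(size=(N, K))             # factor loadings
xi2 = rng.uniform(0.01, 0.05, size=N)       # specific variances
R = rng.normal(scale=0.01, size=N)          # expected returns

def residuals(R, Y, z):
    """Weighted-regression residuals, as in (118)."""
    YtZ = Y.T * z
    return R - Y @ np.linalg.solve(YtZ @ Y, YtZ @ R)

def optimize(eps, Omega, xi2):
    """Unnormalized weights lambda * w from (120)-(121)."""
    K = Omega.shape[1]
    Q_tilde = np.eye(K) + (Omega.T / xi2) @ Omega
    return (eps - Omega @ np.linalg.solve(Q_tilde, Omega.T @ (eps / xi2))) / xi2

# Mismatched: regress on the intercept with unit weights, then optimize.
Y = np.ones((N, 1))
eps_bad = residuals(R, Y, np.ones(N))
v_bad = optimize(eps_bad, Omega, xi2)
exposure_bad = abs(v_bad.sum())             # dollar neutrality is undone by the optimization

# Aligned: Y = Omega and z_i = 1 / xi_i^2.
z = 1.0 / xi2
eps_good = residuals(R, Omega, z)
v_good = optimize(eps_good, Omega, xi2)
R_tilde = z * eps_good                      # regressed returns (123)
exposure_good = np.abs(Omega.T @ v_good).max()

print(exposure_bad, exposure_good, np.allclose(v_good, R_tilde))
```

In the aligned case the optimization step leaves the regressed returns untouched, which is the content of (124) and (125).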



## 4. Intermezzo

In **Section 2**, we began with simple **pair trading**, and by the end of the last section, the discussion had become substantially more involved. This trend is expected to continue in the following sections, making this an ideal place for an "intermezzo." We'll aim to keep it light.

Consider two stocks, **A** and **B**, with the following **sample covariance matrix**:

$$
C = \begin{pmatrix}
\sigma_A^2 & \rho \sigma_A \sigma_B \\
\rho \sigma_A \sigma_B & \sigma_B^2
\end{pmatrix} \tag{126}
$$

- **$\sigma_A$** and **$\sigma_B$**: Volatilities of stocks A and B, respectively.
- **$\rho$**: Correlation between stocks A and B.

Let our portfolio have dollar holdings **$D_A$** and **$D_B$** in stocks A and B, respectively. Let **$R_A$** and **$R_B$** be the expected returns for stocks A and B. The **expected Sharpe ratio** of this portfolio is:

$$
S = \frac{D_A R_A + D_B R_B}{\sqrt{(\sigma_A D_A)^2 + (\sigma_B D_B)^2 + 2\rho (\sigma_A D_A)(\sigma_B D_B)}} \tag{127}
$$

This Sharpe ratio is maximized by:

$$
\begin{align*}
D_A &= \gamma \left( \frac{R_A}{\sigma_A^2} - \frac{\rho R_B}{\sigma_A \sigma_B} \right) \tag{128} \\
D_B &= \gamma \left( \frac{R_B}{\sigma_B^2} - \frac{\rho R_A}{\sigma_A \sigma_B} \right) \tag{129}
\end{align*}
$$

- **$\gamma > 0$**: An arbitrary constant resulting from the **scale invariance** of the Sharpe ratio under simultaneous rescalings $D_A \rightarrow \zeta D_A$, $D_B \rightarrow \zeta D_B$ for $\zeta > 0$.
- **Normalization**: $\gamma$ is determined by the requirement that $|D_A| + |D_B| = I$, where **$I$** is the investment level.

### Equal Volatilities Assumption

Assume that the volatilities are identical: $\sigma_A = \sigma_B \equiv \sigma$. The holdings simplify to:

$$
\begin{align*}
D_A &= \frac{\gamma}{\sigma^2} (R_A - \rho R_B) \tag{130} \\
D_B &= \frac{\gamma}{\sigma^2} (R_B - \rho R_A) \tag{131}
\end{align*}
$$

As **$\rho \rightarrow 1$**, we observe that:

$$
D_A + D_B \rightarrow 0 \tag{132}
$$

That is, the optimal portfolio becomes **dollar neutral** in this limit: from (130) and (131), $D_A + D_B = \frac{\gamma}{\sigma^2} (1 - \rho)(R_A + R_B)$, which vanishes as $\rho \rightarrow 1$ (with $\gamma$ kept finite by the normalization, assuming $R_A \neq R_B$).
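A plain-Python sketch of (128) and (129) makes the limit concrete. The return and volatility numbers are purely illustrative, and the helper name `optimal_holdings` is hypothetical.

```python
def optimal_holdings(R_A, R_B, sigma_A, sigma_B, rho, I=1.0):
    """Dollar holdings (128)-(129), with gamma fixed by |D_A| + |D_B| = I."""
    d_A = R_A / sigma_A**2 - rho * R_B / (sigma_A * sigma_B)
    d_B = R_B / sigma_B**2 - rho * R_A / (sigma_A * sigma_B)
    gamma = I / (abs(d_A) + abs(d_B))
    return gamma * d_A, gamma * d_B

# Equal volatilities, correlation approaching unity.
for rho in (0.0, 0.9, 0.99, 0.999):
    D_A, D_B = optimal_holdings(0.05, 0.04, 0.2, 0.2, rho)
    print(f"rho={rho}: D_A={D_A:+.4f}  D_B={D_B:+.4f}  net={D_A + D_B:+.4f}")
```

The net exposure `D_A + D_B` shrinks toward zero as `rho` approaches 1, while the gross exposure stays at `I`.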

### Why Dollar Neutrality Emerges as $\rho \rightarrow 1$

There are two primary reasons for this phenomenon:

1. **Singular Covariance Matrix**:
   
   - When $\sigma_A = \sigma_B$ and $\rho = 1$, the covariance matrix $C$ becomes **singular**.
   - The **eigenvector** corresponding to the **null eigenvalue** is $V^T = (1, -1)$.
   - In this eigen-direction, the **portfolio volatility** vanishes, and the **Sharpe ratio** tends to infinity.
   - Consequently, the optimal holdings satisfy $D_B = -D_A$, ensuring **dollar neutrality**.

   $$
   V^T C V = 0 \quad \text{implies} \quad D_A = -D_B \tag{133}
   $$

   **Footnote 15**:
   > As $\rho \rightarrow 1$ with equal volatilities, the covariance matrix loses rank, and the portfolio can exploit this to achieve an infinite Sharpe ratio by constructing a **market-neutral** position. For $\rho = \pm 1$, the Sharpe ratio goes to infinity if $R_A \neq \pm R_B$: the volatility vanishes for $D_B = \mp D_A$ (recall that $\sigma_A = \sigma_B$). If $R_A = \pm R_B$, then the two instruments, $A$ and (long for the plus sign, short for the minus sign) $B$, are indistinguishable for optimization purposes.

2. **Connection to Factor Models (Subsection 3.4)**:
   
   - Consider a **one-factor model** for two stocks A and B:
     
     $$
     \Theta = \Xi + \zeta \Omega \Omega^T \tag{134}
     $$
     
     where:
     
     $$
     \Xi = \begin{pmatrix}
     \xi_A^2 & 0 \\
     0 & \xi_B^2
     \end{pmatrix}, \quad \Omega^T = (1, 1) \tag{135}
     $$
     
   - This setup implies:
     
     $$
     \Theta = C \quad \text{with} \quad \sigma_A^2 = \xi_A^2 + \zeta, \quad \sigma_B^2 = \xi_B^2 + \zeta, \quad \rho = \frac{\zeta}{\sigma_A \sigma_B} \tag{136}
     $$
     
   - In the **limit** as $\zeta \rightarrow \infty$ (while keeping $\xi_A$ and $\xi_B$ fixed):
     
     $$
     \sigma_A^2 \approx \sigma_B^2 \approx \zeta \equiv \sigma^2, \quad \rho \rightarrow 1 \tag{137}
     $$
     
   - This mirrors the earlier scenario where volatilities are equal, and correlation approaches unity.
   
   - **From Subsection 3.4**:
     - In this limit, **optimization reduces** to a **regression** over $\Omega$, which in this case is the **intercept**.
     - Consequently, the holding weights **achieve dollar neutrality**.

   $$
   w_i = \frac{1}{\lambda} \widetilde{R}_i \quad \text{with} \quad \sum_{i=1}^{N} w_i = 0 \tag{138}
   $$
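The one-factor argument can also be traced numerically. The plain-Python sketch below uses hypothetical specific risks and returns, and the helper name `one_factor_weights` is made up for illustration. It inverts the $2 \times 2$ matrix $\Theta$ from (134) through (136) in closed form and tracks the implied correlation and the net weight as $\zeta$ grows.

```python
import math

def one_factor_weights(R_A, R_B, xi_A, xi_B, zeta):
    """Normalized weights proportional to Theta^{-1} R for Theta = Xi + zeta * Omega Omega^T, Omega^T = (1, 1)."""
    s_AA = xi_A**2 + zeta                   # sigma_A^2 in (136)
    s_BB = xi_B**2 + zeta                   # sigma_B^2 in (136)
    s_AB = zeta                             # rho * sigma_A * sigma_B in (136)
    # 2 x 2 inverse applied to R, up to the positive factor 1 / det(Theta).
    v_A = s_BB * R_A - s_AB * R_B
    v_B = s_AA * R_B - s_AB * R_A
    norm = abs(v_A) + abs(v_B)
    rho = s_AB / math.sqrt(s_AA * s_BB)
    return v_A / norm, v_B / norm, rho

for zeta in (0.01, 1.0, 100.0, 10000.0):
    w_A, w_B, rho = one_factor_weights(0.05, 0.04, 0.1, 0.2, zeta)
    print(f"zeta={zeta}: rho={rho:.6f}  w_A={w_A:+.5f}  w_B={w_B:+.5f}  net={w_A + w_B:+.6f}")
```

As $\zeta$ grows, `rho` approaches 1 and the weights approach the dollar-neutral pair obtained by regressing the returns over the intercept.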

#### Summary

- **As Correlation Approaches Unity**:
  - The optimal portfolio becomes **dollar neutral**.
  - This arises due to the **singular covariance matrix** and the **alignment with factor models**.
  


## 5. Optimization with Costs

In previous sections, we have streamlined the portfolio optimization process by **ignoring trading costs**. However, in real-world scenarios, trading costs can significantly impact portfolio performance. This section introduces **transaction costs** into the optimization framework, starting with **linear costs** and later expanding to more complex cost structures.

### 5.1 Linear Costs

**Linear costs** are the simplest form of transaction costs, assuming that the cost incurred is directly proportional to the volume traded. This model does not account for **market impact**, meaning that trading does not influence stock prices. Linear costs encompass various fixed fees associated with trading, such as **SEC fees**, **exchange fees**, **broker-dealer fees**, and **linear slippage**.

**Footnote 16**:
> For the sake of simplicity, the transaction costs for buys and sells are assumed to be the same.

#### Incorporating Linear Costs into Portfolio P&L

Previously, the **Portfolio Profit & Loss (P&L)** was defined without considering trading costs:
$$
P = \sum_{i=1}^{N} R_i D_i \tag{51}
$$

When **linear costs** are introduced, the P&L must also account for the cost of trading from the current holdings to the desired holdings. Dividing the cost-adjusted dollar P&L by the investment level $I$, i.e., working with the holding weights $w_i = D_i / I$, we have:
$$
\widetilde{P} = \sum_{i=1}^{N} R_i w_i - \sum_{i=1}^{N} L_i \left| w_i - w_i^* \right| \tag{133}
$$

- **$L_i$**: Represents the **per-dollar trading cost** for stock $i$, encompassing all fixed trading fees and linear slippage.
- **$w_i$**: **Desired holding weight** for stock $i$.
- **$w_i^*$**: **Current holding weight** for stock $i$.
- **$I$**: **Investment level**, serving as a scaling factor to transition from dollar holdings ($D_i$) to dimensionless holding weights ($w_i$).
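A minimal plain-Python sketch of the cost-adjusted P&L (133) is shown below. The function name and the two-stock numbers are illustrative choices, not part of the source formulas.

```python
def pnl_with_linear_costs(R, w, w_star, L):
    """Cost-adjusted P&L per unit of investment level, as in (133)."""
    gross = sum(r * x for r, x in zip(R, w))
    cost = sum(l * abs(x - x0) for l, x, x0 in zip(L, w, w_star))
    return gross - cost

R = [0.05, 0.04]            # expected returns
w_star = [0.4, 0.6]         # current holding weights
L = [0.001, 0.001]          # per-dollar linear costs

stay = pnl_with_linear_costs(R, w_star, w_star, L)        # no trading, so no cost
move = pnl_with_linear_costs(R, [1.0, 0.0], w_star, L)    # trade everything into the first stock
print(stay, move)
```

Holding the current weights incurs no cost, while any trade is charged in proportion to the absolute change in each weight.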

#### Constraints with Current Holdings

To ensure consistency and feasibility in portfolio adjustments, we impose the following constraints:

1. **Linear Constraints**:
   $$
   \sum_{i=1}^{N} w_i Y_{ia} = 0, \quad a = 1, \ldots, m \tag{134}
   $$
   
   These constraints typically enforce **risk management** objectives, such as **dollar neutrality** and **factor neutrality**.

2. **Current Holdings Compliance**:
   $$
   \sum_{i=1}^{N} w_i^* Y_{ia} = 0, \quad a = 1, \ldots, m \tag{135}
   $$
   
   This ensures that the **current portfolio** already satisfies the desired constraints, facilitating smoother transitions to the **desired portfolio** without violating risk management principles.

3. **Normalization Condition**:
   $$
   \sum_{i=1}^{N} |w_i| = 1 \tag{55}
   $$
   
   This condition fixes the overall scale of the holding weights, so that the portfolio stays at the investment level $I$ (i.e., $\sum_{i=1}^{N} |D_i| = I$).

#### Optimization Objective with Linear Costs

The introduction of linear costs modifies the **optimization objective**. Instead of merely maximizing the Sharpe ratio, we now aim to **maximize the adjusted Sharpe ratio** that accounts for transaction costs. The optimization problem can be formulated as:

**Objective Function**:
$$
\widetilde{P} = \sum_{i=1}^{N} R_i w_i - \sum_{i=1}^{N} L_i \left| w_i - w_i^* \right| \tag{133}
$$

**Sharpe Ratio**:
$$
S = \frac{\widetilde{P}}{\widetilde{V}} \tag{136}
$$

Where:
$$
\widetilde{V} = \sqrt{\sum_{i,j=1}^{N} C_{ij} w_i w_j} \tag{137}
$$

**Maximization Problem**:
$$
S \rightarrow \max \tag{138}
$$

#### Handling Absolute Values in Optimization

The **absolute values** in the cost term make the objective **non-smooth**. The cost-adjusted P&L $\widetilde{P}$ (the numerator of $S$) is nevertheless piecewise linear and concave in $w_i$, so maximizing it on its own can be cast as a **linear program (LP)** by linearizing the absolute values. The volatility $\widetilde{V}$ in the denominator is not linear and lies outside this reformulation.

#### Reformulating the Optimization Problem

To convert the problem into a form suitable for LP, introduce auxiliary variables to handle the absolute values:

1. **Auxiliary Variables**:
   For each stock $i$, introduce a non-negative variable $t_i \geq 0$ such that:
   $$
   t_i \geq w_i - w_i^* \quad \text{and} \quad t_i \geq -(w_i - w_i^*) \tag{139}
   $$

2. **Rewritten Objective Function**:
   The optimization objective becomes:
   $$
   \widetilde{P} = \sum_{i=1}^{N} R_i w_i - \sum_{i=1}^{N} L_i t_i \tag{140}
   $$

3. **Constraints**:
   - **Linear Constraints** (as before):
     $$
     \sum_{i=1}^{N} w_i Y_{ia} = 0, \quad a = 1, \ldots, m \tag{134}
     $$
     
   - **Auxiliary Variable Constraints**:
     $$
     t_i \geq w_i - w_i^*, \quad t_i \geq -(w_i - w_i^*), \quad t_i \geq 0, \quad \forall i \tag{141}
     $$

4. **Normalization**:
   $$
   \sum_{i=1}^{N} |w_i| = 1 \tag{55}
   $$

#### Complete Linear Programming Formulation

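The **linear programming (LP)** formulation of the cost-adjusted P&L maximization is as follows. Note that the normalization condition (55) is linear only once the signs of $w_i$ are fixed (e.g., $\sum_{i=1}^{N} w_i = 1$ for a long-only portfolio):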

- **Variables**:
  - **$w_i$**: Desired holding weights.
  - **$t_i$**: Auxiliary variables representing the absolute deviations from current holdings.

- **Objective**:
  $$
  \text{Maximize} \quad \sum_{i=1}^{N} R_i w_i - \sum_{i=1}^{N} L_i t_i \tag{142}
  $$

- **Subject to**:
  $$
  \begin{align*}
  & \sum_{i=1}^{N} w_i Y_{ia} = 0, \quad a = 1, \ldots, m \tag{134} \\
  & t_i \geq w_i - w_i^*, \quad \forall i \tag{141} \\
  & t_i \geq -(w_i - w_i^*), \quad \forall i \tag{141} \\
  & t_i \geq 0, \quad \forall i \tag{141} \\
  & \sum_{i=1}^{N} |w_i| = 1 \tag{55}
  \end{align*}
  $$

#### Solution Approach

Given the LP formulation, standard optimization solvers can be employed to find the optimal holding weights $w_i$ and auxiliary variables $t_i$. The steps include:

1. **Define the Objective Function**:
   - **Maximize** the expected returns **minus** the total linear costs.

2. **Incorporate Constraints**:
   - **Risk Management**: Enforce linear constraints to maintain neutrality with respect to specified factors.
   - **Transaction Costs**: Ensure that $t_i$ accurately captures the absolute deviations in holdings.

3. **Normalization**:
   - Maintain the investment scale through the normalization condition.

#### Implications of Linear Costs on Optimal Holdings

- **Trade-Off Between Returns and Costs**:
  - **Higher Expected Returns**: Favor positions with higher $R_i$, but may incur higher transaction costs if deviating significantly from $w_i^*$.
  - **Lower Transaction Costs**: Encourage smaller adjustments to existing positions, potentially sacrificing some expected returns.

- **Sparsity in Holdings**:
  - **High Costs ($L_i$)**: Discourage frequent trading and large adjustments, leading to **sparser changes** in holdings.
  - **Low Costs ($L_i$)**: Allow for more significant adjustments, enabling the portfolio to better capitalize on expected returns.

- **Stability of Portfolio**:
  - **Reduced Turnover**: Linear costs penalize large shifts in holdings, promoting **portfolio stability** over time.

#### Example Illustration

Consider a simple case with two stocks, A and B, where:

- **Current Holdings**:
  $$
  w_A^* = 0.4, \quad w_B^* = 0.6
  $$

- **Expected Returns**:
  $$
  R_A = 0.05, \quad R_B = 0.04
  $$

- **Volatilities**:
  $$
  \sigma_A = \sigma_B = 0.2
  $$

- **Correlation**:
  $$
  \rho = 0.8
  $$

- **Transaction Costs**:
  $$
  L_A = L_B = 0.001 \quad (\$0.10 \text{ per \$100 traded})
  $$

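**Optimization Goal**: Determine the new holdings $w_A$ and $w_B$ that maximize the cost-adjusted P&L $\widetilde{P}$ while adhering to the constraints. The volatilities and correlation above enter only the risk term of the Sharpe ratio, which the LP below does not include.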

**Steps**:

1. **Formulate the LP**:
   - **Objective**:
     $$
     \text{Maximize} \quad 0.05 w_A + 0.04 w_B - 0.001 t_A - 0.001 t_B
     $$
     
   - **Constraints**:
     $$
     \begin{align*}
     & w_A + w_B = 1, \quad w_A, w_B \geq 0 \quad (\text{Normalization, long-only}) \\
     & t_A \geq w_A - 0.4 \\
     & t_A \geq -(w_A - 0.4) \\
     & t_A \geq 0 \\
     & t_B \geq w_B - 0.6 \\
     & t_B \geq -(w_B - 0.6) \\
     & t_B \geq 0 \\
     \end{align*}
     $$

2. **Solve the LP**:
   - Utilize an LP solver (e.g., **CPLEX**, **Gurobi**, **GLPK**) to find the optimal values of $w_A$, $w_B$, $t_A$, and $t_B$.

3. **Interpret the Results**:
   - **Optimal Holdings**: Adjustments to $w_A$ and $w_B$ reflect a balance between maximizing returns and minimizing transaction costs.
   - **Transaction Costs Impact**: Higher $L_i$ values lead to smaller adjustments from current holdings.
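For a problem this small, no LP solver is needed. The plain-Python sketch below assumes a long-only portfolio ($w_A, w_B \geq 0$, so that the normalization reads $w_A + w_B = 1$), eliminates $w_B$, and uses the fact that a concave piecewise-linear objective on an interval attains its maximum at an endpoint or at a kink. This is a shortcut for illustration, not a general LP method.

```python
R_A, R_B = 0.05, 0.04
L_A, L_B = 0.001, 0.001
w_A_star, w_B_star = 0.4, 0.6

def objective(w_A):
    """LP objective with w_B = 1 - w_A and t_i set to |w_i - w_i^*| (their optimal values)."""
    w_B = 1.0 - w_A
    t_A = abs(w_A - w_A_star)
    t_B = abs(w_B - w_B_star)
    return R_A * w_A + R_B * w_B - L_A * t_A - L_B * t_B

# Candidates: the endpoints of [0, 1] and the kink at the current holding.
candidates = [0.0, w_A_star, 1.0]
best_w_A = max(candidates, key=objective)

for w_A in candidates:
    print(f"w_A={w_A:.1f}  objective={objective(w_A):.4f}")
print("best w_A:", best_w_A)
```

With these numbers the optimum sits at the corner $w_A = 1$, $w_B = 0$: the return advantage of stock A (0.01 per unit weight) exceeds the round-trip cost of moving weight from B to A (0.002 per unit weight), and the LP has no risk term to reward diversification.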

#### Conclusion

Incorporating **linear transaction costs** into the portfolio optimization framework introduces a trade-off between **expected returns** and **costs incurred** from adjusting holdings. By reformulating the optimization problem into a **linear programming** model, we can efficiently determine the optimal portfolio that balances these factors while adhering to specified **risk management constraints**.

In subsequent subsections, we will explore more complex cost structures, such as **quadratic costs** and **market impact models**, to further refine the optimization process.



### 5.2 Optimization with Costs and Homogeneous Constraints

More generally, **transaction costs** can be modeled by some function $f(w)$ of the holding weights $w_i$, which also depends on the current holding weights $w_i^*$. The precise form of this dependence is not crucial for our discussion here. The **adjusted Portfolio Profit & Loss (P&L)** incorporating costs is given by:

$$
\widetilde{P} = \sum_{i=1}^{N} R_i w_i - f \tag{136}
$$

Here, $f$ represents the total transaction costs, which are a function of the desired holding weights $w_i$ and the current holdings $w_i^*$.

#### Breaking Down the Sharpe Ratio with Costs

The **Sharpe ratio** with transaction costs is defined as:

$$
S = \frac{\widetilde{P}}{\widetilde{V}} = \frac{\sum_{i=1}^{N} R_i w_i - f}{\sqrt{\sum_{i,j=1}^{N} C_{ij} w_i w_j}} \tag{137}
$$

**Key Point**: Unlike the cost-free scenario, the introduction of transaction costs $f(w)$ **spoils the scale invariance** of the Sharpe ratio. Specifically, the Sharpe ratio is no longer invariant under the rescaling $w_i \rightarrow \zeta w_i$ for $\zeta > 0$, **except** when $f(w)$ takes the special form:

$$
f_{\text{special}}(w) \equiv \sum_{i=1}^{N} L_i |w_i| \tag{138}
$$

Here, $L_i$ are positive constants representing **linear transaction costs** for each dollar traded in stock $i$. This special form applies when:
- **Only linear costs** are present.
- The **current holdings** are zero ($w_i^* = 0$), meaning we are establishing new positions.

For our purposes, we **assume** that $f(w)$ **does not** take this special form.

#### Sharpe Ratio Maximization with Constraints and Costs

In the presence of transaction costs, the **Sharpe ratio maximization problem** becomes more complex due to the loss of scale invariance. To address this, we introduce **constraints** and utilize **Lagrange multipliers**. The optimization problem is formulated as follows:

**Objective Function with Constraints**:
$$
\widetilde{S} \equiv S + \sum_{a=1}^{m} \sum_{i=1}^{N} w_i Y_{ia} \mu_a + \widetilde{\mu} \left( \sum_{i=1}^{N} |w_i| - 1 \right) \tag{139}
$$

**Explanation**:
- **Maximize** $\widetilde{S}$ with respect to $w_i$ and the Lagrange multipliers $\mu_a$ and $\widetilde{\mu}$.
- **Constraints**:
  - $\sum_{i=1}^{N} w_i Y_{ia} = 0$ for $a = 1, \ldots, m$ (homogeneous linear constraints).
  - $\sum_{i=1}^{N} |w_i| = 1$ (normalization condition).

**First-Order Conditions**:
The optimization yields the following set of equations:

$$
\begin{align*}
& \frac{1}{\widetilde{V}} \left[ R_i - f_i - \lambda \sum_{j=1}^{N} C_{ij} w_j \right] + \sum_{a=1}^{m} Y_{ia} \mu_a + \widetilde{\mu} \, \text{sign}(w_i) = 0 \tag{140} \\
& \sum_{i=1}^{N} w_i Y_{ia} = 0 \tag{141} \\
& \sum_{i=1}^{N} |w_i| = 1 \tag{142} \\
& \lambda \equiv \frac{\sum_{i=1}^{N} R_i w_i - f}{\sum_{i,j=1}^{N} C_{ij} w_i w_j} \tag{143} \\
& f_i \equiv \frac{\partial f}{\partial w_i} \tag{144}
\end{align*}
$$

**Footnote 17**:
> The derivative of $ |w_i| $, namely $ \text{sign}(w_i) $, is defined only for $ w_i \neq 0 $. Similarly, for linear costs the derivative $ f_i $ is defined only for $ w_i \neq w_i^* $. See Subsection 6.2 for a detailed treatment.

**Deriving $ \widetilde{\mu} $**:
By multiplying the first equation by $ w_i $ and summing over all $ i $, we obtain:

$$
\widetilde{\mu} = \frac{1}{\widetilde{V}} \left[ \sum_{i=1}^{N} f_i w_i - f \right] \tag{145}
$$
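
The intermediate steps can be spelled out as follows. Contracting (140) with $ w_i $, and using $ w_i \, \text{sign}(w_i) = |w_i| $ together with the constraints (141) and (142) and the definition (143), which gives $ \lambda \sum_{i,j} C_{ij} w_i w_j = \sum_i R_i w_i - f $:

$$
\begin{align*}
0 &= \frac{1}{\widetilde{V}} \left[ \sum_{i=1}^{N} (R_i - f_i) w_i - \lambda \sum_{i,j=1}^{N} C_{ij} w_i w_j \right] + \sum_{a=1}^{m} \mu_a \sum_{i=1}^{N} w_i Y_{ia} + \widetilde{\mu} \sum_{i=1}^{N} |w_i| \\
&= \frac{1}{\widetilde{V}} \left[ \sum_{i=1}^{N} (R_i - f_i) w_i - \left( \sum_{i=1}^{N} R_i w_i - f \right) \right] + \widetilde{\mu} \\
&= \frac{1}{\widetilde{V}} \left[ f - \sum_{i=1}^{N} f_i w_i \right] + \widetilde{\mu}
\end{align*}
$$

Solving the last line for $ \widetilde{\mu} $ reproduces (145). For the special form (138), $ f_i w_i = L_i |w_i| $, so the bracket vanishes and $ \widetilde{\mu} = 0 $.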

**Key Insight**:
- Unless $ f(w) $ has the **special form** (138), $ \widetilde{\mu} \neq 0 $.
- This indicates that the introduction of general transaction costs introduces additional complexity into the optimization problem.

#### Reformulating the Optimization Problem

Despite the loss of scale invariance, we can **formally recast** the Sharpe ratio maximization as a minimization problem by introducing an **objective function**. The objective function to be minimized is:

$$
\begin{align*}
g(w, \mu', \widetilde{\mu}', \lambda') &\equiv \frac{\lambda'}{2} \sum_{i,j=1}^{N} C_{ij} w_i w_j - \sum_{i=1}^{N} R_i w_i + f \\
&\quad - \sum_{a=1}^{m} \sum_{i=1}^{N} w_i Y_{ia} \mu_a' - \widetilde{\mu}' \left( \sum_{i=1}^{N} |w_i| - 1 \right) \tag{146} \\
& g(w, \mu', \widetilde{\mu}', \lambda') \rightarrow \min \tag{147}
\end{align*}
$$

**Minimization Equations**:
The first-order conditions for minimization lead to:

$$
\begin{align*}
& \lambda' \sum_{j=1}^{N} C_{ij} w_j - R_i + f_i - \sum_{a=1}^{m} Y_{ia} \mu_a' - \widetilde{\mu}' \, \text{sign}(w_i) = 0 \tag{148} \\
& \sum_{i=1}^{N} w_i Y_{ia} = 0 \tag{149} \\
& \sum_{i=1}^{N} |w_i| = 1 \tag{150}
\end{align*}
$$

**Deriving $ \widetilde{\mu}' $**:
Multiplying the first equation by $ w_i $ and summing over all $ i $, we obtain:

$$
\widetilde{\mu}' = \lambda' \sum_{i,j=1}^{N} C_{ij} w_i w_j - \sum_{i=1}^{N} (R_i - f_i) w_i \tag{151}
$$

**Conclusion**:
There exists a value of $ \lambda' $ for which minimizing the objective function $ g(w, \mu', \widetilde{\mu}', \lambda') $ yields the same optimal holding weights $ w_i $ as those obtained from maximizing the Sharpe ratio. Specifically, this value is $ \lambda' = \lambda $, where $ \lambda $ is defined in equation (143) for the optimal solution.

However, the **practical utility** of this reformulation is limited because:
- The value of $ \lambda $ is **unknown** unless the Sharpe ratio maximization problem is solved.
- The Sharpe ratio maximization is **highly nonlinear** and susceptible to typical **nonlinear instabilities**.

**Alternative Approach**:
Instead of directly solving the Sharpe ratio maximization problem, we can treat $ \lambda' $ as a **parameter** and perform a **one-dimensional search** to find the value of $ \lambda' $ that maximizes the Sharpe ratio. This transforms the problem into a more manageable **parameter tuning** task within the minimization framework.

#### Summary of Key Points

1. **Transaction Costs Impact**:
   - **Nonlinearity**: General transaction costs $ f(w) $ introduce nonlinearity into the optimization problem.
   - **Loss of Scale Invariance**: Unlike the cost-free case, the Sharpe ratio is no longer invariant under scaling of holding weights.

2. **Optimization Framework with Costs**:
   - **Objective Function**: Incorporate both expected returns and transaction costs.
   - **Constraints**: Maintain homogeneous linear constraints (e.g., dollar neutrality, factor neutrality).
   - **Lagrange Multipliers**: Introduce multipliers to enforce constraints and handle the normalization condition.

3. **Reformulation Challenges**:
   - **Complexity**: The presence of absolute values and general transaction costs complicates the optimization landscape.
   - **Solution Strategy**: Employ a minimization-based approach with parameter tuning for $ \lambda' $.

4. **Practical Considerations**:
   - **Numerical Stability**: Ensure that the optimization algorithm can handle the introduced nonlinearity and potential instabilities.
   - **Computational Efficiency**: Optimize the search for the appropriate $ \lambda' $ to maintain computational feasibility, especially in high-dimensional portfolios.

In the subsequent subsections, we will delve deeper into specific strategies for handling transaction costs, explore more sophisticated cost models, and discuss practical algorithms for solving the optimization problem efficiently.



#### 5.2.1 Pitfalls

In the presence of transaction costs, the **rescaling invariance** of the Sharpe ratio is lost. Consequently, the **maximum Sharpe ratio** solution is **not** obtained by simply minimizing the following objective function with respect to the holding weights $ w_i $ and Lagrange multipliers $ \mu_a'' $:

$$
\begin{align*}
\widetilde{g}\left(w, \mu'' , \lambda''\right) &\equiv \frac{\lambda''}{2} \sum_{i,j=1}^{N} C_{ij} w_i w_j - \sum_{i=1}^{N} R_i w_i + f - \sum_{a=1}^{m} \sum_{i=1}^{N} w_i Y_{ia} \mu_a'' \tag{155} \\
\widetilde{g}\left(w, \mu'', \lambda''\right) &\rightarrow \min \tag{156}
\end{align*}
$$

It is **incorrect** to assume—and this appears to be a common misstep in practical applications—that the **maximum Sharpe ratio** solution is given by the solution to this minimization condition for the value of $ \lambda'' $ such that the normalization condition (142) is satisfied.

##### Demonstrating the Incorrectness

To illustrate why this assumption is flawed, consider the **simplest case** where there are **no linear constraints** (i.e., $ m = 0 $). The optimization condition simplifies to:

$$
\lambda'' \sum_{j=1}^{N} C_{ij} w_j - R_i + f_i = 0 \tag{157}
$$

**Multiplying** both sides of this equation by $ w_i $ and **summing** over all $ i $, we obtain:

$$
\lambda'' = \frac{\sum_{i=1}^{N} (R_i - f_i) w_i}{\sum_{i,j=1}^{N} C_{ij} w_i w_j} \tag{158}
$$

For this solution to align with the earlier derived condition (148), we must satisfy:

$$
\frac{1}{\sum_{i,j=1}^{N} C_{ij} w_i w_j} \sum_{j=1}^{N} C_{ij} w_j = \text{sign}(w_i) \tag{159}
$$

However, **substituting** this back into equation (157) leads to:

$$
R_i - f_i = \gamma \, \text{sign}(w_i) \tag{160}
$$

where

$$
\gamma \equiv \sum_{i=1}^{N} (R_i - f_i) w_i \tag{161}
$$

Because $ \gamma $ is a single number, equation (160) requires $ |R_i - f_i| $ to take the same value for every stock $ i $, which is generally **impossible** to satisfy for an arbitrary cost function $ f(w) $.

##### Example with Linear Costs

Consider the case where transaction costs are **linear**:

$$
f(w) = f_{\text{linear}}(w) \equiv \sum_{i=1}^{N} L_i |w_i - w_i^*| \tag{162}
$$

Here, $ L_i $ are positive constants representing the per-dollar trading costs for each stock. The condition from equation (160) becomes:

$$
R_i - L_i \, \text{sign}(w_i - w_i^*) = \gamma \, \text{sign}(w_i) \quad \forall i = 1, \ldots, N
$$

Since $ |\text{sign}(w_i)| = 1 $ for every stock with $ w_i \neq 0 $, this implies:

$$
\left| R_i - L_i \, \text{sign}(w_i - w_i^*) \right| = |\gamma| \quad \forall i
$$

The right-hand side is the same for all stocks, while the left-hand side varies with $ R_i $ and $ L_i $. The equality therefore **cannot** hold simultaneously for all $ i $ unless the expected returns and costs are fine-tuned, which is **not** the case for general linear costs.

##### Correct Approach

Given that the **incorrect minimization** does not yield the optimal solution, the **correct approach** involves minimizing the **proper objective function** that accounts for transaction costs and maintains the relationship between the Sharpe ratio and the constraints. Specifically, the correct objective function to minimize is:

$$
\begin{align*}
g\left(w, \mu', \widetilde{\mu}', \lambda'\right) &\equiv \frac{\lambda'}{2} \sum_{i,j=1}^{N} C_{ij} w_i w_j - \sum_{i=1}^{N} R_i w_i + f \\
&\quad - \sum_{a=1}^{m} \sum_{i=1}^{N} w_i Y_{ia} \mu_a' - \widetilde{\mu}' \left( \sum_{i=1}^{N} |w_i| - 1 \right) \tag{146} \\
& g\left(w, \mu', \widetilde{\mu}', \lambda'\right) \rightarrow \min \tag{147}
\end{align*}
$$

The minimization involves treating $ \lambda' $ as a **parameter** to be determined via a **one-dimensional search** that maximizes the Sharpe ratio, rather than directly solving for it within the minimization framework.

#### 5.2.2 Global vs. Local Optima

When incorporating transaction costs, especially **linear costs**, the **convexity** properties of the optimization problem undergo significant changes. Here's a detailed examination of the implications:

##### Convexity Considerations

1. **"Wrong" Objective Function (155)**:
   
   - If the **cost function** $ f(w) $ is **convex**, $ \lambda'' > 0 $, and the covariance matrix $ C_{ij} $ is **positive-definite**, then the objective function $ \widetilde{g}(w, \mu'', \lambda'') $ is **strictly convex** with respect to $ w_i $.
   - **Convexity Implication**: A **unique local minimum** exists, which is also the **global minimum**.

2. **Correct Objective Function (146)**:
   
   - The correct objective function **does not necessarily maintain convexity**.
   - The **non-convexity** arises from the term involving the **Lagrange multiplier** $ \widetilde{\mu}' $:
     
     $$
     - \widetilde{\mu}' \left( \sum_{i=1}^{N} |w_i| - 1 \right)
     $$
   
   - **Condition for Convexity**: The contribution from $ \widetilde{\mu}' $ is **convex** if and only if $ \widetilde{\mu}' \leq 0 $.
   - **Issue**: $ \widetilde{\mu}' $ is generally **not guaranteed** to be non-positive. At the optimum, where $ \lambda' = \lambda $, equation (151) combined with the definition (143) reduces to equation (154):
     
     $$
     \widetilde{\mu}' = \sum_{i=1}^{N} f_i w_i - f \tag{154}
     $$
   
   - **Implication**: If $ \widetilde{\mu}' > 0 $, the objective function can become **non-convex**, introducing the possibility of **multiple local minima**.

##### Specific Example with Linear Costs

Consider **linear transaction costs**:

$$
f(w) = f_{\text{linear}}(w) \equiv \sum_{i=1}^{N} L_i |w_i - w_i^*| \tag{162}
$$

In this scenario, $ \widetilde{\mu}' $ is given by:

$$
\widetilde{\mu}' = \sum_{i=1}^{N} L_i w_i^* \, \text{sign}(w_i - w_i^*) \tag{163}
$$

- **Sign Function**: $ \text{sign}(w_i - w_i^*) $ can be **positive**, **negative**, or **zero** depending on whether $ w_i > w_i^* $, $ w_i < w_i^* $, or $ w_i = w_i^* $, respectively.
- **Result**: $ \widetilde{\mu}' $ is **not necessarily non-positive**, leading to potential **non-convexity** in the objective function.
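
As a quick numerical check of (163), the sketch below evaluates $ \sum_i f_i w_i - f $ directly for linear costs and compares it with the closed form $ \sum_i L_i w_i^* \, \text{sign}(w_i - w_i^*) $. The inputs and function names are hypothetical, the weights are illustrative and not normalized, and every stock is assumed to trade ($ w_i \neq w_i^* $) so that $ f_i $ is defined. In this example, adding to every existing position gives $ \widetilde{\mu}' > 0 $, while trimming every position gives $ \widetilde{\mu}' < 0 $.

```python
def sign(z):
    """Sign of z as an integer: +1, -1, or 0."""
    return (z > 0) - (z < 0)

def mu_tilde_prime_general(w, w_star, L):
    """sum_i f_i w_i - f for f = sum_i L_i |w_i - w_star_i| (needs w_i != w_star_i)."""
    f = sum(Li * abs(wi - ws) for wi, ws, Li in zip(w, w_star, L))
    f_grad = [Li * sign(wi - ws) for wi, ws, Li in zip(w, w_star, L)]
    return sum(fi * wi for fi, wi in zip(f_grad, w)) - f

def mu_tilde_prime_linear(w, w_star, L):
    """Closed form (163): sum_i L_i * w_star_i * sign(w_i - w_star_i)."""
    return sum(Li * ws * sign(wi - ws) for wi, ws, Li in zip(w, w_star, L))

# Made-up inputs for illustration.
L = [0.001, 0.002, 0.001, 0.003]
w_star = [0.30, -0.20, 0.10, -0.40]
w_add = [0.35, -0.30, 0.15, -0.45]   # every position grows in absolute size
w_cut = [0.20, -0.10, 0.05, -0.30]   # every position shrinks in absolute size

mu_add = mu_tilde_prime_linear(w_add, w_star, L)
mu_cut = mu_tilde_prime_linear(w_cut, w_star, L)
print(mu_add, mu_tilde_prime_general(w_add, w_star, L))
print(mu_cut, mu_tilde_prime_general(w_cut, w_star, L))
```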

##### Implications for Optimization

1. **Multiple Local Minima**:
   
   - **Non-convex Objective**: The presence of multiple local minima complicates the optimization process.
   - **Optimization Algorithms**: Standard convex optimization techniques are **inadequate**. Instead, **global optimization methods** or **heuristic algorithms** may be required to navigate the complex landscape.

2. **Search for $ \lambda' $**:
   
   - Since $ \lambda' $ is treated as a **parameter**, a **one-dimensional search** is necessary to identify the value that maximizes the Sharpe ratio.
   - **Interdependency**: The optimal $ \lambda' $ depends on the holding weights $ w_i $, creating an intricate relationship that must be resolved iteratively.

3. **Practical Challenges**:
   
   - **Computational Complexity**: Increased complexity due to non-convexity can lead to **longer computation times** and **higher resource consumption**.
   - **Stability**: The optimization process may become **unstable**, especially in high-dimensional portfolios with numerous assets and constraints.

##### Summary

- **Incorrect Minimization**: Minimizing the "wrong" objective function $ \widetilde{g}(w, \mu'', \lambda'') $ does **not** yield the **maximum Sharpe ratio** solution when transaction costs are present.
  
- **Correct Minimization**: The proper objective function $ g(w, \mu', \widetilde{\mu}', \lambda') $ must be minimized, treating $ \lambda' $ as a **parameter** determined through a **search algorithm** to ensure the Sharpe ratio is maximized.

- **Optimization Landscape**:
  
  - The **correct objective function** can be **non-convex**, introducing **multiple local minima**.
  
  - **Global Optimization** techniques are essential to reliably find the **global minimum**, ensuring the **optimal portfolio** is achieved.

- **Algorithm Design**:
  
  - **Robust Optimization Algorithms**: Utilize algorithms capable of handling non-convex objectives, such as **genetic algorithms**, **simulated annealing**, or **branch and bound** methods.
  
  - **Initialization Strategies**: Implement smart initialization to increase the likelihood of converging to the global minimum rather than getting trapped in local minima.

- **Implications for Portfolio Management**:
  
  - **Enhanced Realism**: Incorporating transaction costs leads to **more realistic portfolio optimization**, accounting for the practical costs associated with trading.
  
  - **Strategic Trade-Offs**: Managers must balance **expected returns** against **transaction costs**, optimizing holdings in a manner that maximizes the **net Sharpe ratio**.

In the subsequent sections, we will explore **advanced cost models**, including **quadratic costs** and **market impact models**, and discuss **strategies** to efficiently navigate the **non-convex optimization landscape** introduced by these complexities.



### 5.3 Maximizing Sharpe Ratio with Linear Costs

As discussed in the previous subsection, the introduction of **transaction costs** renders the Sharpe ratio maximization problem highly **nonlinear** and potentially **multi-modal**, even when considering simple **linear costs** (equation (162)). This complexity arises because the presence of costs disrupts the inherent **scale invariance** of the Sharpe ratio, making the optimization landscape more challenging to navigate.

#### Common Practical Approach

In practical applications, portfolio managers often adopt an **approximate** method to handle transaction costs due to the computational complexity of the exact optimization. The prevalent approach involves:

1. **Minimizing the "Wrong" Objective Function**:
   
   - **Objective Function**:
     $$
     \widetilde{g}\left(w, \mu'' , \lambda''\right) \equiv \frac{\lambda''}{2} \sum_{i,j=1}^{N} C_{ij} w_i w_j - \sum_{i=1}^{N} R_i w_i + f - \sum_{a=1}^{m} \sum_{i=1}^{N} w_i Y_{ia} \mu_a'' \tag{155}
     $$
     
   - **Minimization Goal**:
     $$
     \widetilde{g}\left(w, \mu'', \lambda''\right) \rightarrow \min \tag{156}
     $$
   
   - **Procedure**:
     - **Iterate** over values of $ \lambda'' $ to find the optimal holding weights $ w_i $ that satisfy the normalization condition (equation (55)):
       $$
       \sum_{i=1}^{N} |w_i| = 1 \tag{55}
       $$
     
   - **Footnote 18**:
     > For any $ \lambda'' > 0 $, there exists a **unique optimum** assuming the covariance matrix $ C_{ij} $ is **positive-definite** and all transaction costs $ L_i \geq 0 $.

2. **Assumptions for the Approximation**:
   
   To justify this approximation, several conditions are typically assumed:
   
   - **Uniform Transaction Costs**:
     $$
     L_i \equiv L \quad \forall i \tag{163}
     $$
     
   - **Rebalancing Trade**:
     - Current holdings $ w_i^* $ are being adjusted to new holdings $ w_i $, essentially $ w_i^* \rightarrow w_i $.
     
   - **Dollar Neutrality**:
     $$
     \sum_{i=1}^{N} w_i^* = \sum_{i=1}^{N} w_i = 0 \tag{164}
     $$
     
   - **Large Number of Stocks**:
     $$
     N \gtrsim 1000 \tag{165}
     $$
     
   - **Diversification Constraints**:
     - Holding weights $ w_i $ are constrained to be within low single-digit percentages to ensure diversification and manage risk.
     
   - **Low Correlation Between Current and Desired Trade Signs**:
     - If $ \text{sign}(w_i^*) $ and $ \text{sign}(w_i - w_i^*) $ are not highly correlated, the terms in $ \widetilde{\mu}' = L \sum_{i=1}^{N} w_i^* \, \text{sign}(w_i - w_i^*) $ largely cancel, so that:
     $$
     |\widetilde{\mu}'| \ll L \tag{166}
     $$
     
     **Footnote 19**:
     > We will discuss bounds below. Alternatively, one can **normalize** returns to achieve similar effects.
     
   - **Uniform Transaction Costs Relaxation**:
     - While uniform $ L_i $ simplifies the analysis, the approach can be extended to **non-uniform** costs with additional considerations.

   **Footnote 20**:
   > This argument also holds for **partially establishing** and **liquidating trades** with $ \xi \lesssim 1 $, i.e., $ \xi \equiv \sum_{i=1}^{N} |w_i^*| $ does not need to equal 1. The uniformity of $ L_i $ can also be relaxed, albeit with some care.

3. **Justification of the Approximation**:
   
   Under the aforementioned assumptions, particularly when:
   
   - **Uniform Transaction Costs** ($ L_i \equiv L $),
   - **Dollar Neutrality** ($ \sum w_i = 0 $),
   - **Large $ N $**,
   - **Diversification Constraints**, and
   - **Low Correlation Between Current Holdings and Desired Trades**,
   
   the term involving the Lagrange multiplier $ \widetilde{\mu}' $ in equation (148):
   
   $$
   \widetilde{\mu}' = \sum_{i=1}^{N} f_i w_i - f \tag{154}
   $$
   
   becomes **negligible** compared to the **linear transaction costs**. Specifically, when $ |\widetilde{\mu}'| \ll L $, the contribution from $ \widetilde{\mu}' $ can be **approximately ignored**, simplifying the optimization problem.

   This simplification effectively **reduces** the correct objective function (equation (146)) to the **"wrong" objective function** (equation (155)). Consequently, minimizing $ \widetilde{g}(w, \mu'', \lambda'') $ becomes a **reasonable approximation** for maximizing the Sharpe ratio under these specific conditions.

4. **Practical Implementation**

   Given the approximation's validity under the specified assumptions, the **practical steps** to maximize the Sharpe ratio with **linear costs** are as follows:
   
   1. **Formulate the "Wrong" Objective Function**:
      
      $$
      \widetilde{g}\left(w, \mu'', \lambda''\right) = \frac{\lambda''}{2} \sum_{i,j=1}^{N} C_{ij} w_i w_j - \sum_{i=1}^{N} R_i w_i + f - \sum_{a=1}^{m} \sum_{i=1}^{N} w_i Y_{ia} \mu_a'' \tag{155}
      $$
   
   2. **Minimize $ \widetilde{g}(w, \mu'', \lambda'') $**:
      
      - **Optimization Variables**:
        - **$ w_i $**: Desired holding weights.
        - **$ \mu_a'' $**: Lagrange multipliers for the constraints.
        - **$ \lambda'' $**: Scaling parameter.
      
      - **Procedure**:
        - **Iteratively adjust** $ \lambda'' $ using a **one-dimensional search** until the normalization condition $ \sum |w_i| = 1 $ is satisfied.
   
   3. **Advantages of the Approximation**:
      
      - **Convexity**: The "wrong" objective function (155) remains **convex** with respect to $ w_i $, ensuring a **unique local minimum** that is also the **global minimum**.
      
      - **Computational Efficiency**: Convex optimization techniques can be efficiently applied, avoiding the complexities associated with **non-convex** objective functions.
      
      - **Avoidance of Multiple Local Minima**: By using the convex "wrong" objective function, the issue of **multiple local optima** discussed in Subsection 5.2.2 is circumvented.

   4. **Limitations of the Approximation**:
      
      - **Not a True Sharpe Ratio Maximizer**: This approach **does not** genuinely maximize the Sharpe ratio but rather provides an **approximation** that balances expected returns against linear transaction costs.
      
      - **Dependency on Assumptions**: The validity of the approximation hinges on the **specific assumptions** outlined earlier. Deviations from these assumptions may lead to suboptimal portfolio allocations.
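
The iteration over $ \lambda'' $ can be sketched in a deliberately simplified setting that is an assumption of this example rather than part of the text: a diagonal covariance matrix $ C_{ij} = \sigma_i^2 \delta_{ij} $, no linear constraints ($ m = 0 $), no bounds, and at least one stock with $ |R_i| > L_i $. In that case the minimizer of (155) with linear costs (162) separates per stock: $ w_i $ equals $ w_i^* $ clamped to the interval $ [(R_i - L_i), (R_i + L_i)] / (\lambda'' \sigma_i^2) $, and $ \sum_i |w_i| $ is non-increasing in $ \lambda'' $, so bisection finds the normalized solution. All function names and inputs are hypothetical.

```python
def clamp(z, lo, hi):
    """Restrict z to the interval [lo, hi]."""
    return max(lo, min(z, hi))

def minimize_wrong_objective(lam, R, var, L, w_star):
    """Minimizer of (155) for diagonal C = diag(var), linear costs, m = 0.

    Per stock: w_star_i clamped to [(R_i - L_i), (R_i + L_i)] / (lam * var_i).
    """
    return [clamp(ws, (r - l) / (lam * v), (r + l) / (lam * v))
            for r, v, l, ws in zip(R, var, L, w_star)]

def solve_normalized(R, var, L, w_star, tol=1e-12, max_iter=200):
    """Bisect on lam until sum_i |w_i| = 1 (gross exposure falls as lam rises)."""
    def gross(lam):
        return sum(abs(w) for w in minimize_wrong_objective(lam, R, var, L, w_star))

    lo, hi = 1e-8, 1.0
    while gross(hi) > 1.0:           # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if gross(mid) > 1.0:
            lo = mid                 # too much exposure: increase lam
        else:
            hi = mid
        if hi - lo < tol * hi:
            break
    lam = 0.5 * (lo + hi)
    return lam, minimize_wrong_objective(lam, R, var, L, w_star)

# Made-up inputs for illustration.
R = [0.02, -0.015, 0.01, -0.005]
var = [0.04, 0.09, 0.0625, 0.16]
L = [0.001, 0.001, 0.001, 0.001]
w_star = [0.25, -0.25, 0.25, -0.25]

lam_opt, w_opt = solve_normalized(R, var, L, w_star)
print(lam_opt, w_opt, sum(abs(w) for w in w_opt))
```

With a non-diagonal covariance matrix, constraints, or bounds, the inner minimization no longer has this closed form and would need a proper convex solver; only the outer one-dimensional search carries over.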

#### Summary

While the introduction of **linear transaction costs** complicates the Sharpe ratio maximization problem, practical approximations enable portfolio managers to efficiently determine near-optimal holdings. By minimizing the "wrong" but **convex** objective function under specific conditions—such as **uniform costs**, **dollar neutrality**, and **diversification constraints**—it is possible to obtain a solution that serves as a **reasonable approximation** to the true Sharpe ratio maximizer. However, it is essential to recognize the **limitations** of this approach and ensure that the underlying assumptions are sufficiently met to justify the approximation's validity.



## 6. Optimization: Costs, Constraints & Bounds

Building upon the foundations laid in previous sections, we now delve into a more comprehensive optimization framework that accounts for **transaction costs**, **constraints**, and **holding bounds**. This section aims to integrate these critical factors into the portfolio optimization process, ensuring that the resulting strategy is both **practical** and **robust**.

### 6.1 Bounds

In real-world portfolio management, imposing **bounds** on holding weights and trading sizes is essential to control **risk**, **leverage**, and **liquidity** constraints. These bounds prevent the portfolio from taking excessively large positions in any single asset and limit the extent of trading, thereby managing **transaction costs** and **market impact**.

#### Defining Holding and Trading Bounds

We typically impose two types of bounds:

1. **Position Bounds**: These limits cap the absolute value of holding weights $ w_i $ to prevent over-concentration in any particular asset.
2. **Trading Bounds**: These limits restrict the magnitude of changes in holding weights $ w_i $ to manage trading activity and associated costs.

Mathematically, these constraints can be expressed as:

$$
\begin{align*}
& |w_i| \leq \xi \widetilde{v}_i \tag{174} \\
& |w_i - w_i^*| \leq \widetilde{\xi} \widetilde{v}_i \tag{175} \\
& \widetilde{v}_i \equiv \frac{v_i}{I} \tag{176}
\end{align*}
$$

- **$ v_i $**: Represents the **20-day average daily dollar volume** for stock $ i $, serving as a proxy for **liquidity**.
- **$ \xi $** and **$ \widetilde{\xi} $**: Positive constants specifying the **maximum allowable position** and **maximum allowable trading** as a fraction of $ \widetilde{v}_i $, respectively.
- **$ I $**: Denotes the **investment level**, a scaling factor to transition from dollar holdings $ D_i $ to dimensionless holding weights $ w_i $.

#### Translating Bounds into $ x_i $ Variables

To simplify the optimization process, we introduce a change of variables:

$$
x_i \equiv w_i - w_i^* \tag{167}
$$

This transformation shifts our focus from the absolute holdings $ w_i $ to the **changes in holdings** $ x_i $, facilitating the incorporation of trading bounds.

With this substitution, the bounds become:

$$
\begin{align*}
x_i^+ &= \min\left(\widetilde{\xi} \widetilde{v}_i, \xi \widetilde{v}_i - w_i^*\right) \geq 0 \tag{177} \\
x_i^- &= \max\left(-\widetilde{\xi} \widetilde{v}_i, -\xi \widetilde{v}_i - w_i^*\right) \leq 0 \tag{178}
\end{align*}
$$

- **$ x_i^+ $**: The **upper bound** on $ x_i $, ensuring that the desired increase in holding $ w_i $ does not exceed the lesser of the trading limit $ \widetilde{\xi} \widetilde{v}_i $ or the remaining position capacity $ \xi \widetilde{v}_i - w_i^* $.
- **$ x_i^- $**: The **lower bound** on $ x_i $, ensuring that the decrease in holding $ w_i $ does not exceed, in magnitude, the lesser of the trading limit $ \widetilde{\xi} \widetilde{v}_i $ and the remaining short position capacity $ \xi \widetilde{v}_i + w_i^* $.

**Assumption**: We assume that the **current holdings** satisfy the position bounds:

$$
|w_i^*| \leq \xi \widetilde{v}_i \quad \forall i \tag{179}
$$

This ensures feasibility, meaning that existing positions do not violate the imposed position limits.

#### Simplifying the Bounds

To reduce notational complexity and facilitate optimization, we make the following additional assumption:

$$
x_i^+ > 0 \quad \text{and} \quad x_i^- < 0 \quad \forall i \tag{180}
$$

**Implications**:

- **Positive Upper Bounds**: $ x_i^+ > 0 $ ensures that there is room to **increase** the holding in each asset.
- **Negative Lower Bounds**: $ x_i^- < 0 $ ensures that there is room to **decrease** the holding in each asset.

**Handling Special Cases**:

- **Short-Sale Restrictions**: A no-shorting constraint $ w_i \geq 0 $ corresponds to $ x_i^- = -w_i^* $, which is zero whenever no position is currently held and therefore violates (180). Rather than strictly enforcing $ x_i^- = 0 $, it is more practical to set $ x_i^- $ to a small negative number (e.g., $ x_i^- = -\epsilon $, where $ \epsilon $ is a minimal tolerance level), which keeps the assumption intact and accommodates numerical precision during optimization.

**Rationale**:

- **Flexibility**: Allowing $ x_i^- $ to be slightly negative rather than strictly zero provides flexibility in the optimization algorithm, preventing abrupt halting when a variable is exactly zero.
- **Numerical Stability**: This approach enhances numerical stability, especially in high-dimensional optimization problems where exact zeros can introduce computational challenges.

#### Practical Implementation of Bounds

When implementing these bounds in an optimization algorithm, especially in high-dimensional settings (e.g., $ N \gtrsim 1000 $), it's crucial to:

1. **Precompute Bound Values**:
   
   - Calculate $ x_i^+ $ and $ x_i^- $ for each asset $ i $ based on the current holdings $ w_i^* $ and liquidity measures $ v_i $.
   
2. **Incorporate Bounds into the Optimization Solver**:
   
   - Most optimization solvers (e.g., **CVXOPT**, **Gurobi**, **CPLEX**) allow specifying **variable bounds**. Input $ x_i^+ $ and $ x_i^- $ as the upper and lower bounds for each $ x_i $ variable, respectively.
   
3. **Ensure Feasibility**:
   
   - Verify that the initial holdings $ w_i^* $ satisfy the position bounds $ |w_i^*| \leq \xi \widetilde{v}_i $. If not, adjustments may be necessary before proceeding with optimization.
   
4. **Handle Boundary Conditions**:
   
   - For assets where $ x_i^+ $ or $ x_i^- $ are at their extreme values (e.g., $ x_i^+ = \xi \widetilde{v}_i - w_i^* $), ensure that the optimization algorithm can handle these boundaries without numerical issues.
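
The precomputation and feasibility check described above can be sketched as follows. The function name `trade_bounds` and the three-stock inputs are hypothetical; the formulas are (176), (177), and (178), and the feasibility check is the assumption (179) on current holdings.

```python
def trade_bounds(w_star, adv, investment, xi, xi_trade):
    """Return (x_minus, x_plus), the per-stock bounds on x_i = w_i - w_star_i.

    w_star     current holding weights
    adv        average daily dollar volumes v_i
    investment investment level I
    xi         position bound parameter
    xi_trade   trading bound parameter (xi with a tilde in the text)
    """
    x_minus, x_plus = [], []
    for ws, v in zip(w_star, adv):
        v_tilde = v / investment                                       # (176)
        if abs(ws) > xi * v_tilde:
            raise ValueError("current holding violates the position bound")
        x_plus.append(min(xi_trade * v_tilde, xi * v_tilde - ws))      # (177)
        x_minus.append(max(-xi_trade * v_tilde, -xi * v_tilde - ws))   # (178)
    return x_minus, x_plus

# Made-up inputs for illustration.
investment = 1_000_000.0
adv = [2_000_000.0, 500_000.0, 10_000_000.0]
w_star = [0.08, -0.02, 0.0]

x_minus, x_plus = trade_bounds(w_star, adv, investment, xi=0.05, xi_trade=0.02)
print(x_minus)  # approximately [-0.04, -0.005, -0.2]
print(x_plus)   # approximately [0.02, 0.01, 0.2]
```

In this example the first stock is close to its position cap, so its upper bound is set by the remaining position capacity rather than by the trading limit. The second stock is close to its short cap, so its lower bound is set by the remaining short capacity.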

#### Example Scenario

Consider a portfolio with the following parameters:

- **Number of Stocks**: $ N = 1000 $
- **Liquidity Measure**: $ v_i $ is the 20-day average daily dollar volume for each stock.
- **Investment Level**: $ I = \$1,000,000 $
- **Position Bound Parameter**: $ \xi = 0.05 $ (i.e., the dollar position in any single stock is at most 5% of its average daily dollar volume $ v_i $)
- **Trading Bound Parameter**: $ \widetilde{\xi} = 0.02 $ (i.e., the dollar amount traded in any single stock per trading period is at most 2% of $ v_i $)
- **Current Holdings**: $ w_i^* $ are determined based on a previously optimized portfolio satisfying $ \sum |w_i^*| = 1 $ and $ \sum w_i^* = 0 $ (dollar neutrality).

**Calculations**:

For each stock $ i $:

$$
\begin{align*}
\widetilde{v}_i &= \frac{v_i}{I} \\
x_i^+ &= \min(0.02 \times \widetilde{v}_i, 0.05 \times \widetilde{v}_i - w_i^*) \\
x_i^- &= \max(-0.02 \times \widetilde{v}_i, -0.05 \times \widetilde{v}_i - w_i^*)
\end{align*}
$$

These bounds ensure that:

- **Position Limits**: Dollar holdings $ I |w_i| $ do not exceed 5% of the average daily dollar volume $ v_i $.
- **Trading Limits**: Dollar trades $ I |x_i| $ do not exceed 2% of $ v_i $ per trading period.

**Optimization Variables**:

- **$ x_i $**: Change in holding weight for each stock.
- **$ w_i $**: Desired holding weight, where $ w_i = w_i^* + x_i $.

**Constraints**:

$$
\begin{align*}
& \sum_{i=1}^{N} |w_i| = 1 \quad (\text{Normalization}) \\
& \sum_{i=1}^{N} w_i Y_{ia} = 0 \quad \forall a = 1, \ldots, m \quad (\text{Homogeneous Linear Constraints}) \\
& x_i^- \leq x_i \leq x_i^+ \quad \forall i = 1, \ldots, N \quad (\text{Trading Bounds})
\end{align*}
$$

**Objective Function**:

$$
\widetilde{g}(x, \mu, \lambda) = \frac{\lambda}{2} \sum_{i,j=1}^{N} C_{ij} x_i x_j - \sum_{i=1}^{N} \left( \rho_i x_i - L_i |x_i| \right) - \sum_{a=1}^{m} \sum_{i=1}^{N} x_i Y_{ia} \mu_a \tag{168}
$$

Where:

$$
\begin{align*}
\rho_i &\equiv R_i - \lambda \sum_{j=1}^{N} C_{ij} w_j^* \tag{171} \\
x_i^\pm &\equiv w_i^\pm - w_i^* \tag{172}
\end{align*}
$$

**Assumptions**:

- **Current Holdings Compliance**:
  
  $$
  \sum_{i=1}^{N} w_i^* Y_{ia} = 0 \quad \forall a = 1, \ldots, m \tag{173}
  $$
  
  This ensures that the **current portfolio** already satisfies the homogeneous linear constraints, allowing for smoother transitions to the desired portfolio.

**Objective Function Minimization**:

The optimization seeks to **minimize** the objective function $ \widetilde{g}(x, \mu, \lambda) $ with respect to:

- **$ x_i $**: Changes in holding weights.
- **$ \mu_a $**: Lagrange multipliers associated with the homogeneous linear constraints.

The remaining argument, $ \lambda $, is not minimized over. It is a parameter that is adjusted (e.g., via a one-dimensional search) until the normalization condition is satisfied.

This formulation encapsulates the trade-off between **maximizing expected returns** and **minimizing transaction costs**, all while adhering to specified **constraints** and **bounds**.
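
As a consistency check on the change of variables, the sketch below evaluates the objective once in terms of $ w_i $ (equation (155) with linear costs) and once in terms of $ x_i $ (equation (168) with $ \rho_i $ from (171)), and confirms that the two differ only by an $ x $-independent constant. It assumes no linear constraints ($ m = 0 $) and a symmetric covariance matrix; the helper names and inputs are hypothetical.

```python
def quad(C, a, b):
    """Bilinear form sum_ij C_ij a_i b_j."""
    return sum(C[i][j] * a[i] * b[j] for i in range(len(a)) for j in range(len(b)))

def objective_w(w, lam, R, C, L, w_star):
    """Objective (155) with m = 0 and linear costs f = sum_i L_i |w_i - w_star_i|."""
    cost = sum(l * abs(wi - ws) for l, wi, ws in zip(L, w, w_star))
    return 0.5 * lam * quad(C, w, w) - sum(r * wi for r, wi in zip(R, w)) + cost

def objective_x(x, lam, R, C, L, w_star):
    """Objective (168) with m = 0, using rho_i = R_i - lam * sum_j C_ij w_star_j."""
    n = len(x)
    rho = [R[i] - lam * sum(C[i][j] * w_star[j] for j in range(n)) for i in range(n)]
    linear = sum(rho[i] * x[i] - L[i] * abs(x[i]) for i in range(n))
    return 0.5 * lam * quad(C, x, x) - linear

# Made-up inputs for illustration.
lam = 2.0
R = [0.02, -0.01, 0.015]
C = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
L = [0.001, 0.002, 0.001]
w_star = [0.3, -0.2, 0.1]

# The dropped constant is the objective at x = 0, where the cost term vanishes.
const = objective_w(w_star, lam, R, C, L, w_star)
gaps = []
for x in ([0.05, -0.10, 0.02], [-0.10, 0.05, -0.03]):
    w = [ws + xi for ws, xi in zip(w_star, x)]
    gaps.append(objective_w(w, lam, R, C, L, w_star) - objective_x(x, lam, R, C, L, w_star))
print(const, gaps)  # both gaps equal const up to rounding
```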

#### Summary of Key Points

1. **Variable Transformation**:
   
   - Shifting from $ w_i $ to $ x_i = w_i - w_i^* $ simplifies the incorporation of trading bounds and facilitates the optimization process.

2. **Imposing Bounds**:
   
   - **Position Bounds**: Control the **size** of holdings to manage **concentration risk**.
   - **Trading Bounds**: Limit the **extent of trading** to manage **transaction costs** and **market impact**.

3. **Normalization and Constraints**:
   
   - **Normalization**: Ensures that the total portfolio size remains consistent.
   - **Homogeneous Linear Constraints**: Enforce **risk management** objectives such as **dollar neutrality** and **factor neutrality**.

4. **Objective Function**:
   
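   - Balances **expected returns** against **transaction costs**, with the covariance matrix $ C_{ij} $ capturing the **risk**; once the objective is written in terms of the holding changes $ x_i $, the risk contribution of the current holdings $ w_i^* $ is absorbed into $ \rho_i $.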

5. **Optimization Parameters**:
   
   - **$ \lambda $**: A parameter that scales the risk term to ensure the normalization condition is met.
   - **$ \mu_a $**: Multipliers that enforce the homogeneous linear constraints.

6. **Assumptions for Practical Optimization**:
   
   - **Large Portfolio Size**: Ensures that individual position and trading bounds have a **diluted impact** on the overall portfolio.
   - **Uniform Transaction Costs**: Simplifies the optimization problem, making it more tractable.
   - **Low Correlation Between Current Holdings and Desired Trades**: Allows for the approximation where certain terms in the objective function can be neglected, facilitating the use of simplified optimization methods.

#### Next Steps

With the foundation laid in this subsection, the subsequent sections will explore:

- **6.2 Optimization: General Case**: Formulating and solving the general optimization problem with costs, constraints, and bounds.
- **6.3 Handling Nonlinear Transaction Costs**: Extending the framework to accommodate more complex cost structures beyond linear costs.
- **6.4 Practical Considerations and Extensions**: Discussing real-world challenges, such as **short-sale restrictions**, **integer constraints**, and **robust optimization** techniques to enhance portfolio resilience.

By systematically integrating **transaction costs**, **constraints**, and **bounds**, this optimization framework aims to deliver **practical** and **effective** portfolio strategies that balance **return objectives** with **operational realities**.

---

### 6.2 Optimization: General Case

This subsection formulates the **general optimization problem** with linear **transaction costs**, homogeneous linear **constraints**, and **bounds**, and derives the conditions that characterize its minimum.

#### Defining Subsets of Assets

To systematically approach the optimization problem, we categorize the assets based on their trading and holding behaviors. Specifically, we define the following subsets of the asset index $ i = 1, \ldots, N $:

$$
\begin{align*}
& x_i \neq 0, \quad i \in J \tag{179} \\
& x_i = 0, \quad i \in J' \tag{180} \\
& x_i = x_i^+ > 0, \quad i \in J^+ \subset J \tag{181} \\
& x_i = x_i^- < 0, \quad i \in J^- \subset J \tag{182} \\
& \bar{J} \equiv J^+ \cup J^- \tag{183} \\
& \widetilde{J} \equiv J \backslash \bar{J} \tag{184} \\
& \eta_i \equiv \operatorname{sign}(x_i), \quad i \in J \tag{185}
\end{align*}
$$

**Explanation of Subsets:**

- **$ J $**: Assets whose holding change $ x_i $ is **non-zero**.
- **$ J' $**: Assets whose holding change $ x_i $ is **zero**; no trading occurs.
- **$ J^+ $**: Subset of $ J $ where $ x_i $ sits at its **upper bound** $ x_i^+ > 0 $.
- **$ J^- $**: Subset of $ J $ where $ x_i $ sits at its **lower bound** $ x_i^- < 0 $.
- **$ \bar{J} $**: Union of $ J^+ $ and $ J^- $, i.e., all assets **at a bound**.
- **$ \widetilde{J} $**: Set difference $ J \backslash \bar{J} $, i.e., assets with non-zero $ x_i $ **strictly inside** the bounds.
- **$ \eta_i $**: Sign of $ x_i $, indicating the **direction** of the holding change.
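
As a minimal sketch of definitions (179)-(185), the following Python snippet sorts asset indices into the four subsets, reading $ x_i^+ $ and $ x_i^- $ in (181)-(182) as upper and lower bounds. The function names, the tolerance `tol`, and the toy numbers are illustrative assumptions, not part of the source.

```python
from typing import Dict, List


def classify_assets(x: List[float], x_lo: List[float], x_hi: List[float],
                    tol: float = 1e-12) -> Dict[str, List[int]]:
    """Split asset indices into J', J+, J- and J-tilde as in (179)-(184)."""
    sets: Dict[str, List[int]] = {"J_prime": [], "J_plus": [], "J_minus": [], "J_tilde": []}
    for i, (xi, lo, hi) in enumerate(zip(x, x_lo, x_hi)):
        if abs(xi) <= tol:
            sets["J_prime"].append(i)   # x_i = 0: untraded
        elif abs(xi - hi) <= tol:
            sets["J_plus"].append(i)    # x_i sits at its upper bound x_i^+ > 0
        elif abs(xi - lo) <= tol:
            sets["J_minus"].append(i)   # x_i sits at its lower bound x_i^- < 0
        else:
            sets["J_tilde"].append(i)   # non-zero and strictly inside the bounds
    return sets


def signs(x: List[float], tol: float = 1e-12) -> List[int]:
    """eta_i = sign(x_i) as in (185); 0 is returned for untraded assets."""
    return [0 if abs(xi) <= tol else (1 if xi > 0 else -1) for xi in x]


# Toy example (hypothetical numbers): five assets with symmetric bounds.
x = [0.0, 0.05, -0.05, 0.02, -0.01]
x_lo = [-0.05] * 5
x_hi = [0.05] * 5
print(classify_assets(x, x_lo, x_hi))
print(signs(x))
```

In the actual problem these subsets are not inputs but part of the solution, which is what makes the optimization hard.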

#### Global Minimum Condition

The optimization seeks to find the **global minimum** of the objective function $ \widetilde{g}(x, \mu, \lambda) $ under the imposed constraints and bounds. Formally, the global minimum condition is expressed as:

$$
\widetilde{g}\left(x', \mu', \lambda\right) \bigg|_{x_i' = x_i + \epsilon_i, \mu_a' = \mu_a + \epsilon_a} \geq \widetilde{g}\left(x_i, \mu_a, \lambda\right) \tag{186}
$$

**Interpretation:**

- **Perturbation**: The holding changes $ x_i $ and Lagrange multipliers $ \mu_a $ are perturbed by small amounts $ \epsilon_i $ and $ \epsilon_a $, respectively.
- **Condition**: The objective function $ \widetilde{g} $ must not decrease under any admissible perturbation, ensuring a **local (and global) minimum**.

**Constraints on Perturbations:**

$$
\begin{align*}
& \epsilon_i \leq 0, \quad i \in J^+ \tag{187} \\
& \epsilon_i \geq 0, \quad i \in J^- \tag{188}
\end{align*}
$$

**Explanation:**

- For assets in $ J^+ $ (where $ x_i = x_i^+ > 0 $), a perturbation cannot push $ x_i $ above its upper bound, so it can only **decrease** $ x_i $ (i.e., $ \epsilon_i \leq 0 $).
- For assets in $ J^- $ (where $ x_i = x_i^- < 0 $), a perturbation cannot push $ x_i $ below its lower bound, so it can only **increase** $ x_i $ (i.e., $ \epsilon_i \geq 0 $).

#### Expanding the Global Minimum Condition

Expanding the global minimum condition (equation (186)) involves analyzing how small changes $ \epsilon_i $ and $ \epsilon_a $ affect the objective function. The detailed expansion leads to:

$$
\begin{align*}
& \frac{\lambda}{2} \sum_{i,j=1}^{N} C_{ij} \epsilon_i \epsilon_j + \sum_{i \in J} L_i \left( |x_i + \epsilon_i| - |x_i| - \eta_i \epsilon_i \right) \\
& - \sum_{a=1}^{m} \sum_{i=1}^{N} \epsilon_i Y_{ia} \epsilon_a - \sum_{a=1}^{m} \sum_{i=1}^{N} x_i Y_{ia} \epsilon_a + \sum_{j=1}^{N} \left( \lambda \sum_{i \in J} C_{ij} x_i - \rho_j + L_j \eta_j - \sum_{a=1}^{m} Y_{ja} \mu_a \right) \epsilon_j \geq 0 \tag{189}
\end{align*}
$$

**Components Explained:**

1. **Quadratic Term**:
   
   $$
   \frac{\lambda}{2} \sum_{i,j=1}^{N} C_{ij} \epsilon_i \epsilon_j
   $$
   
   - Represents the **second-order** (quadratic) change in the objective function due to perturbations.
   - Since $ C_{ij} $ is **positive-definite**, this term is **strictly positive** for any non-zero $ \epsilon_i $.

2. **Absolute Value Term**:
   
   $$
   \sum_{i \in J} L_i \left( |x_i + \epsilon_i| - |x_i| - \eta_i \epsilon_i \right)
   $$
   
   - Accounts for the **linear transaction costs**.
   - The expression $ |x_i + \epsilon_i| - |x_i| - \eta_i \epsilon_i $ captures the **change in transaction costs** due to the perturbation.
   - This term is **non-negative** because the absolute value function is **convex**.

3. **Interaction Term**:
   
   $$
   - \sum_{a=1}^{m} \sum_{i=1}^{N} \epsilon_i Y_{ia} \epsilon_a
   $$
   
   - Represents the **interaction** between perturbations in holding weights $ \epsilon_i $ and the Lagrange multipliers $ \epsilon_a $.
   - This term is **quadratic** in perturbations.

4. **Linear Term**:
   
   $$
   - \sum_{a=1}^{m} \sum_{i=1}^{N} x_i Y_{ia} \epsilon_a
   $$
   
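   - Captures the **linear dependency** of the objective function on the perturbations $ \epsilon_a $ through the existing holdings $ x_i $.
   - This term is **linear** in $ \epsilon_a $, which is unrestricted in sign, so the inequality can hold only if the term vanishes. This yields the constraints (193).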

5. **Constraint-Dependent Term**:
   
   $$
   \sum_{j=1}^{N} \left( \lambda \sum_{i \in J} C_{ij} x_i - \rho_j + L_j \eta_j - \sum_{a=1}^{m} Y_{ja} \mu_a \right) \epsilon_j
   $$
   
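   - This term is **linear** in $ \epsilon_j $. The bracket is the derivative of the objective function with respect to $ x_j $ (for $ x_j \neq 0 $), including the contribution of the **constraints** through $ \mu_a $.
   - For perturbations unrestricted in sign the bracket must **vanish**; for perturbations restricted by a bound it only needs to have the appropriate **sign**.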
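
The non-negativity of the absolute value term follows from a one-line argument, sketched here for a single asset with $ x_i \neq 0 $ and $ \eta_i = \operatorname{sign}(x_i) $, using only $ |z| \geq \eta_i z $ for $ \eta_i = \pm 1 $:

$$
|x_i + \epsilon_i| \geq \eta_i (x_i + \epsilon_i) = |x_i| + \eta_i \epsilon_i \quad \Longrightarrow \quad |x_i + \epsilon_i| - |x_i| - \eta_i \epsilon_i \geq 0
$$

Equality holds whenever $ x_i + \epsilon_i $ has the same sign as $ x_i $, so for small enough perturbations this term vanishes and only the linear terms matter.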

#### Deriving Optimality Conditions

From the expanded global minimum condition (equation (189)), we derive the **first-order optimality conditions** by considering infinitesimal perturbations. Neglecting the second-order terms (quadratic in $ \epsilon_i $), we focus on the **linear terms**:

$$
\lambda \sum_{j \in J} C_{ij} x_j - \rho_i + L_i \eta_i - \sum_{a=1}^{m} Y_{ia} \mu_a = 0 \quad \forall i \in \widetilde{J} \tag{192}
$$

Additionally, the **constraints** must be satisfied:

$$
\sum_{i=1}^{N} x_i Y_{ia} = \sum_{i \in J} x_i Y_{ia} = 0 \quad \forall a = 1, \ldots, m \tag{193}
$$

For the remaining assets, i.e., untraded assets ($ J' $) and assets at their bounds ($ J^\pm $), the following **inequalities** must hold:

$$
\begin{align*}
& \forall j \in J': \quad \left| \lambda \sum_{i \in J} C_{ij} x_i - \rho_j - \sum_{a=1}^{m} Y_{ja} \mu_a \right| \leq L_j \tag{194} \\
& \forall j \in J^+: \quad \lambda \sum_{i \in J} C_{ij} x_i - \rho_j - \sum_{a=1}^{m} Y_{ja} \mu_a \leq -L_j \tag{195} \\
& \forall j \in J^-: \quad \lambda \sum_{i \in J} C_{ij} x_i - \rho_j - \sum_{a=1}^{m} Y_{ja} \mu_a \geq L_j \tag{196}
\end{align*}
$$

**Interpretation of Conditions:**

1. **Equation (192)**:
   
   - For assets in $ \widetilde{J} $ (i.e., those that are traded and strictly inside their bounds), the **first-order derivative** of the objective function with respect to $ x_i $ must be **zero**.
   - This ensures that the objective function is **stationary** with respect to permissible perturbations.

2. **Equation (193)**:
   
   - Enforces the **homogeneous linear constraints**, ensuring that the portfolio maintains desired exposures (e.g., **dollar neutrality**, **factor neutrality**).

3. **Equations (194)-(196)**:
   
   - Inequality (194) applies to **untraded** assets ($ J' $): the **marginal benefit** of trading does not **exceed the transaction cost** $ L_j $, so $ x_j = 0 $ is optimal.
   - Inequalities (195) and (196) apply to assets **at their bounds** ($ J^+ $, $ J^- $): the marginal benefit is at least the transaction cost, but the bound prevents trading further in that direction.
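
To make conditions (192)-(196) concrete, here is a small sketch that checks whether a candidate $ (x_i, \mu_a) $ satisfies them. It assumes a dense covariance matrix stored as nested lists; the function names, the tolerance, and the two-asset toy data (identity covariance, no constraints) are hypothetical choices for illustration.

```python
def gradient_part(i, x, C, rho, Y, mu, lam):
    """lambda * sum_j C_ij x_j - rho_i - sum_a Y_ia mu_a (everything except the cost term)."""
    risk = lam * sum(C[i][j] * x[j] for j in range(len(x)))
    return risk - rho[i] - sum(Y[i][a] * mu[a] for a in range(len(mu)))


def satisfies_conditions(x, C, rho, L, Y, mu, lam, x_lo, x_hi, tol=1e-9):
    """Return True if (x, mu) passes (192)-(196) within the tolerance."""
    n, m = len(x), len(mu)
    for a in range(m):                        # (193): homogeneous linear constraints
        if abs(sum(x[i] * Y[i][a] for i in range(n))) > tol:
            return False
    for i in range(n):
        g = gradient_part(i, x, C, rho, Y, mu, lam)
        if abs(x[i]) <= tol:                  # J': condition (194)
            ok = abs(g) <= L[i] + tol
        elif abs(x[i] - x_hi[i]) <= tol:      # J+: condition (195)
            ok = g <= -L[i] + tol
        elif abs(x[i] - x_lo[i]) <= tol:      # J-: condition (196)
            ok = g >= L[i] - tol
        else:                                 # J-tilde: condition (192)
            eta = 1.0 if x[i] > 0 else -1.0
            ok = abs(g + L[i] * eta) <= tol
        if not ok:
            return False
    return True


# Toy data (hypothetical): identity covariance, no linear constraints.
C = [[1.0, 0.0], [0.0, 1.0]]
rho = [0.3, 0.05]
L = [0.1, 0.1]
Y = [[], []]
mu = []
wide_lo, wide_hi = [-1.0, -1.0], [1.0, 1.0]
print(satisfies_conditions([0.2, 0.0], C, rho, L, Y, mu, 1.0, wide_lo, wide_hi))
print(satisfies_conditions([0.3, 0.0], C, rho, L, Y, mu, 1.0, wide_lo, wide_hi))
```

In this toy case the first asset is traded only up to the point where its marginal benefit equals the cost, and the second asset stays untraded because its expected return is below its cost.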

#### Challenges in Finding the Global Optimum

The optimization problem outlined is inherently **complex** due to several factors:

1. **Combinatorial Explosion**:
   
   - The subsets $ J' $, $ J^+ $, $ J^- $, and $ \widetilde{J} $ (together with the signs $ \eta_i $) are not known in advance: one must determine which assets are untraded, which sit at a bound, and which are traded inside their bounds.
   - The number of candidate assignments grows **exponentially** with $ N $, making exhaustive search **computationally infeasible** for large $ N $.

2. **Nonlinearity and Non-Smoothness**:
   
   - The **absolute values** in the linear transaction costs make the objective function **non-differentiable** at $ x_i = 0 $.
   - With positive-definite $ C_{ij} $ and $ L_i \geq 0 $ the problem remains **convex** (see Footnote 18), but the optimality conditions take a different form on each subset, so they cannot be solved as a single linear system.

3. **Dependency on Lagrange Multipliers**:
   
   - The optimal solution depends on the values of the **Lagrange multipliers** $ \mu_a $ and the parameter $ \lambda $.
   - These multipliers are interdependent and must be determined **simultaneously** with the $ x_i $, adding another layer of complexity.

4. **Numerical Stability and Efficiency**:
   
   - High-dimensional optimization (e.g., $ N \gtrsim 1000 $) demands **efficient algorithms** that can handle large-scale computations while maintaining **numerical stability**.
   - **Iterative methods** and **heuristic algorithms** may be required to navigate the optimization landscape effectively.

#### Practical Implications

Given the **theoretical** challenges, practical implementations of this optimization problem necessitate **approximations** and **algorithmic strategies** to obtain **near-optimal** solutions within reasonable computational timeframes. These strategies often involve:

- **Active-Set Methods**: Identifying and focusing on a subset of active constraints (i.e., assets at their bounds) to reduce the problem's dimensionality.
- **Factor-Model Structure**: Assuming a **factor model** form for the covariance matrix $ C_{ij} $ reduces the problem to a much lower-dimensional system (see Subsection 6.3).
- **Heuristic Algorithms**: Employing methods such as **genetic algorithms**, **simulated annealing**, or **particle swarm optimization** to explore the solution space more effectively.
- **Parallel Computing**: Utilizing **parallelization** to distribute computations across multiple processors, thereby enhancing computational speed.

#### Summary

The **general optimization problem** with **transaction costs**, **constraints**, and **bounds** presents significant **theoretical** and **practical** challenges. The optimality conditions (192)-(196) are simple once the subsets $ J' $, $ J^+ $, $ J^- $, and $ \widetilde{J} $ and the signs $ \eta_i $ are known, but these are themselves part of the solution, and the number of possibilities grows rapidly with the number of assets $ N $.

The next subsection shows how assuming a **factor model** form for the covariance matrix makes the problem tractable.

---

### 6.3 Optimization: Factor Model

This complexity can be mitigated by assuming a **factor model** form for the covariance matrix $ C_{ij} $:

$$
C_{ij} = \Theta_{ij} \equiv \xi_i^{2} \delta_{ij} + \sum_{A=1}^{K} \Omega_{iA} \Omega_{jA} \tag{198}
$$

Here, any values of $ A $ such that the corresponding column of $ \Omega_{iA} $ is a **linear combination** of the columns of $ Y_{ia} $ must be **omitted** (with the **specific risk** untouched). This omission is necessary because, in equations (192), (194), (195), and (196), $ C_{ij} $ appears only in the combination:

$$
\begin{align*}
\sum_{j \in J} C_{ij} x_j &= \xi_i^{2} x_i + \sum_{A=1}^{K} \Omega_{iA} \sum_{j \in J} x_j \Omega_{jA}, \quad i \in J \tag{199} \\
\sum_{j \in J} C_{ij} x_j &= \sum_{A=1}^{K} \Omega_{iA} \sum_{j \in J} x_j \Omega_{jA}, \quad i \in J' \tag{200}
\end{align*}
$$

If any column in $ \Omega_{iA} $ is a **linear combination** of the columns of $ Y_{ia} $, its contribution vanishes due to equation (193). Therefore, we assume that all such columns in $ \Omega_{iA} $, if any, are **omitted**.
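
A practical consequence of the factor form (198) is that the product $ \sum_j C_{ij} x_j $ in (199)-(200) never requires the full $ N \times N $ matrix. The sketch below compares the factor-based product with a dense one; the function names and numbers are hypothetical, and plain Python lists are used only to keep the example self-contained.

```python
def factor_cov_times_x(xi2, Omega, x):
    """Compute sum_j C_ij x_j with C_ij = xi_i^2 delta_ij + sum_A Omega_iA Omega_jA."""
    n, k = len(x), len(Omega[0])
    # K factor exposures of x: sum_j x_j Omega_jA
    v = [sum(x[j] * Omega[j][A] for j in range(n)) for A in range(k)]
    # Specific-risk term plus factor term, as in (199); for x_i = 0 this reduces to (200)
    return [xi2[i] * x[i] + sum(Omega[i][A] * v[A] for A in range(k)) for i in range(n)]


def dense_cov(xi2, Omega):
    """Build the full N x N matrix of (198), for comparison only."""
    n, k = len(xi2), len(Omega[0])
    return [[(xi2[i] if i == j else 0.0) + sum(Omega[i][A] * Omega[j][A] for A in range(k))
             for j in range(n)] for i in range(n)]


# Toy data (hypothetical): 3 assets, 2 risk factors; asset 2 is untraded.
xi2 = [0.04, 0.09, 0.01]
Omega = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
x = [1.0, -2.0, 0.0]

fast = factor_cov_times_x(xi2, Omega, x)
C = dense_cov(xi2, Omega)
slow = [sum(C[i][j] * x[j] for j in range(3)) for i in range(3)]
print(fast)
print(slow)
```

The factor-based product costs on the order of $ N K $ operations instead of $ N^2 $, which is what makes the reduction to a low-dimensional system worthwhile for large universes.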

The optimization problem **reduces** to solving a $ (K + m) $-dimensional system. Let:

$$
v_A \equiv \sum_{i=1}^{N} x_i \Omega_{iA} = \sum_{i \in J} x_i \Omega_{iA}, \quad A = 1, \ldots, K \tag{201}
$$

Further, define $ H \equiv \{a\} \cup \{A\} $. Let $ \widehat{\Omega}_{i\alpha} $ for $ \alpha \in H $ be the following $ N \times (K + m) $ matrix:

$$
\begin{align*}
\widehat{\Omega}_{i a} &\equiv Y_{ia} \tag{202} \\
\widehat{\Omega}_{i A} &\equiv \Omega_{iA} \tag{203}
\end{align*}
$$

Let $ u_\alpha $ be the following $ (K + m) $-vector:

$$
\begin{align*}
u_a &\equiv -\frac{1}{\lambda} \mu_a \tag{204} \\
u_A &\equiv v_A \tag{205}
\end{align*}
$$

From equations (192), (193), and (201), we have:

$$
\begin{align*}
x_i &= \frac{1}{\lambda \xi_i^{2}} \left( \rho_i - L_i \eta_i - \lambda \sum_{\alpha \in H} \widehat{\Omega}_{i\alpha} u_\alpha \right), \quad i \in \widetilde{J} \tag{206} \\
\sum_{i \in J} x_i \widehat{\Omega}_{i\alpha} &= \sum_{\beta \in H} \varphi_{\alpha\beta} u_\beta \tag{207}
\end{align*}
$$

where $ \varphi_{\alpha\beta} $ is the following symmetric $ (K + m) \times (K + m) $ matrix:

$$
\begin{align*}
\varphi_{AB} &\equiv \delta_{AB} \tag{208} \\
\varphi_{A b} &= 0 \tag{209} \\
\varphi_{a b} &= 0 \tag{210}
\end{align*}
$$

Recalling that:

$$
x_i \eta_i > 0, \quad i \in \widetilde{J} \tag{211}
$$

we obtain:

$$
\begin{align*}
\eta_i &= \operatorname{sign}\left( \rho_i - \lambda \sum_{\alpha \in H} \widehat{\Omega}_{i\alpha} u_\alpha \right), \quad i \in \widetilde{J} \tag{212} \\
\forall i \in J^+: \quad \rho_i - \lambda \sum_{\alpha \in H} \widehat{\Omega}_{i\alpha} u_\alpha &\geq L_i + \lambda \xi_i^{2} x_i^+ \equiv L_i^+ \tag{213} \\
\forall i \in J^-: \quad \rho_i - \lambda \sum_{\alpha \in H} \widehat{\Omega}_{i\alpha} u_\alpha &\leq -L_i + \lambda \xi_i^{2} x_i^- \equiv -L_i^- \tag{214} \\
\forall i \in \widetilde{J}: \quad \left| \rho_i - \lambda \sum_{\alpha \in H} \widehat{\Omega}_{i\alpha} u_\alpha \right| &> L_i \tag{215} \\
\forall i \in J': \quad \left| \rho_i - \lambda \sum_{\alpha \in H} \widehat{\Omega}_{i\alpha} u_\alpha \right| &\leq L_i \tag{216}
\end{align*}
$$

These inequalities define $ J^+ $, $ J^- $, $ \widetilde{J} $, and $ J' $ in terms of the $ (K + m) $ unknowns $ u_\alpha $. Note that $ L_i^\pm > L_i $ for $ i \in J^\pm $, and if we take $ x_i^\pm \rightarrow \pm \infty $, we obtain empty $ J^\pm $.
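
The classification in (212)-(216) can be sketched as a single pass over the assets once $ u_\alpha $ is given. In the snippet below, `omega_hat` holds the combined matrix $ \widehat{\Omega}_{i\alpha} $ as nested lists; the function name and the toy numbers are assumptions made for illustration.

```python
def classify_from_u(u, rho, L, xi2, omega_hat, x_lo, x_hi, lam):
    """Assign each asset to J+, J-, J-tilde or J' using (212)-(216)."""
    sets = {"J_plus": [], "J_minus": [], "J_tilde": [], "J_prime": []}
    eta = {}
    for i in range(len(rho)):
        # z_i = rho_i - lambda * sum_alpha Omega-hat_{i alpha} u_alpha
        z = rho[i] - lam * sum(omega_hat[i][al] * u[al] for al in range(len(u)))
        if z >= L[i] + lam * xi2[i] * x_hi[i]:       # (213): at the upper bound
            sets["J_plus"].append(i)
            eta[i] = 1
        elif z <= -L[i] + lam * xi2[i] * x_lo[i]:    # (214): at the lower bound
            sets["J_minus"].append(i)
            eta[i] = -1
        elif abs(z) > L[i]:                          # (215): traded, inside the bounds
            sets["J_tilde"].append(i)
            eta[i] = 1 if z > 0 else -1              # (212)
        else:                                        # (216): untraded
            sets["J_prime"].append(i)
    return sets, eta


# Toy data (hypothetical): 4 assets, a single column in omega_hat.
rho = [0.6, -0.5, 0.25, 0.15]
L = [0.1] * 4
xi2 = [1.0] * 4
omega_hat = [[1.0], [1.0], [1.0], [1.0]]
sets, eta = classify_from_u([0.1], rho, L, xi2, omega_hat, [-0.3] * 4, [0.3] * 4, 1.0)
print(sets)
print(eta)
```

Because $ L_i^\pm > L_i $, the four branches are mutually exclusive, so the order of the tests only matters for readability.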

Substituting equation (206) into equation (207), we derive the following system of $ (K + m) $ equations for the $ (K + m) $ unknowns $ u_\alpha $:

$$
\sum_{\beta \in H} \widehat{Q}_{\alpha\beta} u_\beta = y_\alpha \tag{217}
$$

where:

$$
\begin{align*}
\widehat{Q}_{\alpha\beta} &\equiv \varphi_{\alpha\beta} + \sum_{i \in \widetilde{J}} \frac{\widehat{\Omega}_{i\alpha} \widehat{\Omega}_{i\beta}}{\xi_i^{2}} \tag{218} \\
y_\alpha &\equiv \frac{1}{\lambda} \sum_{i \in \widetilde{J}} \frac{\widehat{\Omega}_{i\alpha}}{\xi_i^{2}} \left( \rho_i - L_i \eta_i \right) + \sum_{i \in J^+} x_i^+ \widehat{\Omega}_{i\alpha} + \sum_{i \in J^-} x_i^- \widehat{\Omega}_{i\alpha} \tag{219}
\end{align*}
$$

Thus, the solution for $ u_\alpha $ is given by:

$$
u_\alpha = \sum_{\beta \in H} \widehat{Q}_{\alpha\beta}^{-1} y_\beta \tag{220}
$$

**Note**: $ \widehat{Q}^{-1} $ denotes the inverse of the matrix $ \widehat{Q} $.

Equation (220) solves for $ u_\alpha $ given $ \eta_i $, $ J^+ $, $ J^- $, $ \widetilde{J} $, and $ J' $. Conversely, equations (212), (213), (214), (215), and (216) determine $ \eta_i $, $ J^+ $, $ J^- $, $ \widetilde{J} $, and $ J' $ in terms of $ u_\alpha $. The entire system is then solved **iteratively**, starting with an initial guess.
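
One half of this iteration, solving (217) for $ u_\alpha $ with the subsets and signs held fixed, might look as follows. This is a sketch under stated assumptions: the first `m` columns of `omega_hat` are taken to be $ Y_{ia} $ and the remaining ones $ \Omega_{iA} $, the small Gaussian-elimination helper stands in for any linear solver, and all names and numbers are hypothetical.

```python
from typing import List


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        u[r] = (M[r][n] - sum(M[r][k] * u[k] for k in range(r + 1, n))) / M[r][r]
    return u


def solve_u(rho, L, eta, xi2, omega_hat, m, J_tilde, J_plus, J_minus, x_lo, x_hi, lam):
    """Build Q-hat (218) and y (219), then solve (217) for u."""
    dim = len(omega_hat[0])                  # K + m columns; the first m are Y
    Q = [[0.0] * dim for _ in range(dim)]
    for al in range(m, dim):
        Q[al][al] = 1.0                      # phi_AB = delta_AB; phi_Ab = phi_ab = 0
    y = [0.0] * dim
    for al in range(dim):
        for i in J_tilde:
            y[al] += omega_hat[i][al] * (rho[i] - L[i] * eta[i]) / (lam * xi2[i])
            for be in range(dim):
                Q[al][be] += omega_hat[i][al] * omega_hat[i][be] / xi2[i]
        y[al] += sum(x_hi[i] * omega_hat[i][al] for i in J_plus)
        y[al] += sum(x_lo[i] * omega_hat[i][al] for i in J_minus)
    return solve_linear(Q, y)


def x_interior(i, u, rho, L, eta, xi2, omega_hat, lam):
    """Equation (206) for an asset i in J-tilde."""
    exposure = sum(omega_hat[i][al] * u[al] for al in range(len(u)))
    return (rho[i] - L[i] * eta[i] - lam * exposure) / (lam * xi2[i])


# Toy data (hypothetical): 3 assets, one constraint column and one risk factor.
rho = [0.4, -0.2, 0.3]
L = [0.1, 0.1, 0.1]
eta = [1, -1, 1]
xi2 = [1.0, 1.0, 1.0]
omega_hat = [[1.0, 1.0], [1.0, -1.0], [1.0, 0.5]]   # column 0 = Y, column 1 = Omega
J_tilde = [0, 1, 2]
u = solve_u(rho, L, eta, xi2, omega_hat, 1, J_tilde, [], [], [-1.0] * 3, [1.0] * 3, 1.0)
x = [x_interior(i, u, rho, L, eta, xi2, omega_hat, 1.0) for i in J_tilde]
print(u)
print(x)
```

With all assets in $ \widetilde{J} $, the recovered $ x_i $ satisfy the linear constraint and reproduce $ v_A = u_A $ by construction. Whether the assumed signs and subsets are consistent with the result is exactly what the other half of the iteration checks.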

#### Iterative Solution Procedure

1. **Initial Iteration**:
   
   - **Set**: $ \widetilde{J}^{(0)} = \{1, \ldots, N\} $
   - **Initial Active Sets**: $ J^{+(0)} = J^{-(0)} = \emptyset $
   - **Initial Signs**: 
     $$
     \eta_i^{(0)} = \pm 1, \quad i = 1, \ldots, N \tag{221}
     $$
     While the values of $ \eta_i^{(0)} $ can be **arbitrary**, choosing:
     $$
     \eta_i^{(0)} = \operatorname{sign}(\rho_i), \quad i = 1, \ldots, N \tag{222}
     $$
     generally **enhances** the **convergence speed**.

2. **Convergence Enhancement Trick**:
   
   - Define $ \widehat{x}_i^{(s)} $ such that:
     $$
     \begin{align*}
     & x_i^- \leq \widehat{x}_i^{(s)} \leq x_i^+ \quad \forall i \tag{223} \\
     & \sum_{i=1}^{N} \widehat{x}_i^{(s)} Y_{ia} = 0, \quad a = 1, \ldots, m \tag{224}
     \end{align*}
     $$
   
   - Let $ x_i^{(s+1)} $ be the solution obtained at the $ (s+1) $-th iteration, which **satisfies** the linear constraints but **may not** satisfy the bounds.
   
   - Define:
     $$
     \begin{align*}
     & q_i \equiv x_i^{(s+1)} - \widehat{x}_i^{(s)} \tag{225} \\
     & h_i(t) \equiv \widehat{x}_i^{(s)} + t q_i, \quad t \in [0, 1] \tag{226}
     \end{align*}
     $$
   
   - Then set:
     $$
     \widehat{x}_i^{(s+1)} \equiv h_i(t_*) = \widehat{x}_i^{(s)} + t_* q_i \tag{227}
     $$
     where $ t_* $ is the **maximal** value of $ t $ such that $ h_i(t) $ satisfies the bounds:
     $$
     \begin{align*}
     & q_i > 0: \quad p_i \equiv \min(x_i^{(s+1)}, x_i^+) \tag{228} \\
     & q_i < 0: \quad p_i \equiv \max(x_i^{(s+1)}, x_i^-) \tag{229} \\
     & t_* = \min \left( \left. \frac{p_i - \widehat{x}_i^{(s)}}{q_i} \right| q_i \neq 0, \quad i = 1, \ldots, N \right) \tag{230}
     \end{align*}
     $$
   
   - Update active sets $ J^+ $ and $ J^- $ based on the newly adjusted $ \widehat{x}_i $:
     $$
     \begin{align*}
     & \forall i \in J^+: \quad \widehat{x}_i = x_i^+ \tag{231} \\
     & \forall i \in J^-: \quad \widehat{x}_i = x_i^- \tag{232}
     \end{align*}
     $$
     This approach **adds** new elements to the sets $ J^+ $ and $ J^- $ **one (or a few) at a time**, improving the **convergence speed** compared to adding many elements simultaneously.

3. **Convergence Criteria**:
   
   The iterative process **terminates** when the following criteria are met, indicating that the solution has reached the **global optimum**:
   
   $$
   \begin{align*}
   & \widetilde{J}^{(s+1)} = \widetilde{J}^{(s)} \tag{233} \\
   & J^{+(s+1)} = J^{+(s)} \tag{234} \\
   & J^{-(s+1)} = J^{-(s)} \tag{235} \\
   & \forall i \in \widetilde{J}^{(s+1)}: \quad \eta_i^{(s+1)} = \eta_i^{(s)} \tag{236} \\
   & \forall \alpha \in H: \quad u_\alpha^{(s+1)} = u_\alpha^{(s)} \tag{237}
   \end{align*}
   $$
   
   - **First Four Criteria**: Based on **discrete quantities**, these are **unaffected** by computational (machine) precision effects.
   - **Last Criterion**: Based on **continuous quantities**, it is satisfied **within** computational (machine) precision or a **preset tolerance**.
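
The step-size rule (225)-(230) from the convergence enhancement trick can be sketched as below. The function name and toy numbers are hypothetical, and the default $ t_* = 1 $ when every $ q_i $ vanishes is an assumption, since the minimum in (230) is then taken over an empty set.

```python
def bounded_step(x_hat, x_new, x_lo, x_hi):
    """Return (t_star, next x_hat) following (225)-(230)."""
    t_star = 1.0
    for xh, xn, lo, hi in zip(x_hat, x_new, x_lo, x_hi):
        q = xn - xh                                  # (225)
        if q == 0.0:
            continue
        p = min(xn, hi) if q > 0 else max(xn, lo)    # (228)-(229)
        t_star = min(t_star, (p - xh) / q)           # (230)
    # (227): move from x_hat toward x_new, stopping where the first bound is hit
    return t_star, [xh + t_star * (xn - xh) for xh, xn in zip(x_hat, x_new)]


# Toy data (hypothetical): the unconstrained update overshoots the first bound.
x_hat = [0.0, 0.0, 0.0]
x_new = [0.4, -0.1, 0.05]
t_star, x_next = bounded_step(x_hat, x_new, [-0.2] * 3, [0.2] * 3)
print(t_star, x_next)
```

Since both $ \widehat{x}_i^{(s)} $ and $ x_i^{(s+1)} $ satisfy the homogeneous linear constraints, so does every point on the segment between them. In this toy case only the first asset reaches its bound and would join $ J^+ $.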

#### Summary

By adopting a **factor model** for the covariance matrix $ C_{ij} $, the optimization problem becomes more **tractable**: the unknowns reduce from the $ N $ holdings to the $ (K + m) $ quantities $ u_\alpha $. This allows for efficient iterative solutions even in large-scale portfolios. The **iterative procedure** updates the subsets $ J^+ $, $ J^- $, $ \widetilde{J} $, $ J' $, the signs $ \eta_i $, and the vector $ u_\alpha $ until they converge to the **global optimum**, i.e., the minimum of the objective function in the presence of transaction costs, constraints, and bounds.

---


## 7 Example: Intraday Mean-Reversion Alpha

In this section, to illustrate our discussion in Section 2, we examine an **intraday mean-reversion alpha**. The following subsections outline the setup, methodology, simulation results, risk management techniques, and practical considerations associated with this example.

---

### 7.1 Notations and Definitions

#### Stock Prices and Time-Series

- **Stock Price $ P_i $**: For each stock labeled by $ i = 1, \ldots, N $, where $ N $ is the number of stocks in our universe.
  
- **Time-Series Data $ P_{is} $**: Represents the stock price for stock $ i $ at trading date $ s = 0, 1, \ldots, M $, with $ s = 0 $ being the most recent date.

#### Price Types

- **Unadjusted Prices**:
  - $ P_{is}^O $: Open price
  - $ P_{is}^C $: Close price

- **Fully Adjusted Prices**:
  - $ P_{is}^{AO} $: Open price adjusted for splits and dividends
  - $ P_{is}^{AC} $: Close price adjusted for splits and dividends

#### Volume and Returns

- **Daily Volume $ V_{is} $**: Unadjusted daily volume in shares for stock $ i $ on date $ s $.

- **Overnight Return $ R_{is} $**:
  
  $$
  R_{is} \equiv \ln\left(\frac{P_{is}^{AO}}{P_{i, s+1}^{AC}}\right) \tag{238}
  $$
  
  *Note: Both prices are fully adjusted.*
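
For a single stock, definition (238) can be sketched as follows. Prices are stored with index $ s = 0 $ as the most recent date, so index $ s + 1 $ is the previous trading date; the function name and prices are hypothetical.

```python
import math


def overnight_returns(adj_open, adj_close):
    """R_s = ln(P^AO_s / P^AC_{s+1}) for one stock; s = 0 is the most recent date."""
    return [math.log(adj_open[s] / adj_close[s + 1]) for s in range(len(adj_open) - 1)]


# Toy data (hypothetical): three dates, most recent first.
adj_open = [101.0, 99.0, 100.0]
adj_close = [102.0, 100.0, 98.0]
R = overnight_returns(adj_open, adj_close)
print(R)
```

The oldest date has no previous close in the sample, so the returned list is one element shorter than the price series.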

---

### 7.2 Loadings Matrix and Residuals

#### Loadings Matrix $ \Lambda_{iA} $

- **Structure**: An $ N \times K $ binary matrix based on the **Bloomberg Industry Classification System (BICS)** sectors, industries, and sub-industries.

- **Clusters**: Binary clusters discussed in Subsection 2.4.

#### Cross-Sectional Regression

- **Procedure**: For each date $ s $, perform a cross-sectional regression of returns $ R_{is} $ over $ \Lambda_{iA} $ with:
  - **No Intercept**$^2$
  - **Unit Weights** (as in equation (23)).

- **Residuals $ \varepsilon_{is} $**: The residuals from the regression are used to specify desired dollar holdings.

#### Desired Dollar Holdings $ D_{is} $

$$
\begin{align*}
D_{is} &= -\varepsilon_{is} \frac{I}{\sum_{j=1}^{N} |\varepsilon_{js}|} \tag{239} \\
\sum_{i=1}^{N} |D_{is}| &= I \tag{240} \\
\sum_{i=1}^{N} D_{is} &= 0 \tag{241}
\end{align*}
$$

- **$ I $**: Intraday investment level, consistent across all dates $ s $.
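
Because each stock belongs to exactly one binary cluster, the unit-weight regression without intercept amounts to subtracting the cluster mean from each return. The sketch below uses that shortcut and then applies (239); the function names, cluster labels, and returns are hypothetical.

```python
from collections import defaultdict


def cluster_residuals(returns, cluster):
    """Residuals of a unit-weight regression on non-overlapping binary clusters."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r, c in zip(returns, cluster):
        sums[c] += r
        counts[c] += 1
    return [r - sums[c] / counts[c] for r, c in zip(returns, cluster)]


def desired_holdings(residuals, investment):
    """D_i = -eps_i * I / sum_j |eps_j|, as in (239); assumes not all residuals are zero."""
    gross = sum(abs(e) for e in residuals)
    return [-e * investment / gross for e in residuals]


# Toy data (hypothetical): five stocks in two clusters, I = 1,000,000.
returns = [0.02, -0.01, 0.00, 0.03, 0.01]
cluster = ["tech", "tech", "tech", "energy", "energy"]
eps = cluster_residuals(returns, cluster)
D = desired_holdings(eps, 1_000_000.0)
print(D)
```

The residuals sum to zero within each cluster, which gives dollar neutrality (241), and the normalization gives (240). The minus sign makes this a mean-reversion bet: stocks that outperformed their cluster overnight are shorted.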

---

### 7.3 Portfolio Establishment and Liquidation

- **Establishment**: The portfolio is established at the open ($ P_{is}^O $), assuming fills at these open prices.

- **Liquidation**: The portfolio is liquidated at the close ($ P_{is}^C $), assuming fills at these close prices.

- **Assumptions**:
  - **No Transaction Costs or Slippage**: This simplification is for illustrative purposes and does not reflect real-life trading conditions.

#### Daily Profit & Loss (P&L)

$$
\Pi_{is} = D_{is} \left[ \frac{P_{is}^C}{P_{is}^O} - 1 \right] \tag{242}
$$

- **Shares Traded $ Q_{is} $**:
  
  $$
  Q_{is} = \frac{2 |D_{is}|}{P_{is}^O}
  $$
  
  Represents the total shares bought plus sold for establishing and liquidating positions.
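
As a minimal sketch, the per-stock P&L (242) and shares traded can be computed directly from the desired holdings and the unadjusted open and close prices. The function names and numbers are hypothetical.

```python
def daily_pnl(D, p_open, p_close):
    """Pi_i = D_i * (P^C_i / P^O_i - 1), as in (242)."""
    return [d * (c / o - 1.0) for d, o, c in zip(D, p_open, p_close)]


def shares_traded(D, p_open):
    """Q_i = 2 |D_i| / P^O_i: shares to establish plus shares to liquidate."""
    return [2.0 * abs(d) / o for d, o in zip(D, p_open)]


# Toy data (hypothetical): one long and one short position of 1,000 dollars each.
D = [1000.0, -1000.0]
p_open = [50.0, 20.0]
p_close = [51.0, 19.0]
pnl = daily_pnl(D, p_open, p_close)
shares = shares_traded(D, p_open)
print(pnl, shares)
```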

---

### 7.4 Universe Selection

#### Average Daily Dollar Volume (ADDV)

$$
A_{is} \equiv \frac{1}{d} \sum_{r=1}^{d} V_{i, s+r} P_{i, s+r}^C \tag{243}
$$

- **$ d = 21 $**: Represents one month (21 trading days).
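
Definition (243) averages over dates $ s + 1, \ldots, s + d $, so the ADDV for date $ s $ uses only earlier data. The sketch below computes it for one stock and ranks a universe by it; the function names, tickers, and numbers are hypothetical.

```python
def addv(volume, close, s, d=21):
    """A_s = (1/d) * sum_{r=1..d} V_{s+r} * P^C_{s+r}; s = 0 is the most recent date."""
    return sum(volume[s + r] * close[s + r] for r in range(1, d + 1)) / d


def top_by_addv(addv_by_ticker, n):
    """Return the n tickers with the highest ADDV, highest first."""
    return sorted(addv_by_ticker, key=addv_by_ticker.get, reverse=True)[:n]


# Toy data (hypothetical): a 3-day window for one stock, then a 3-ticker universe.
volume = [100, 200, 300, 400]
close = [10.0, 11.0, 12.0, 13.0]
a = addv(volume, close, s=0, d=3)
universe = top_by_addv({"AAA": 5e6, "BBB": 9e6, "CCC": 1e6}, n=2)
print(a, universe)
```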

#### Universe Criteria

- **Top 2000 Tickers by ADDV**: Selected based on the highest average daily dollar volume.

- **Rebalancing Frequency**:
  - **Monthly Rebalancing**: The universe is rebalanced every 21 trading days, to ensure that the results are not a mere consequence of the universe selection.
  
- **Survivorship Bias Considerations**:
  - **Data Period**: 8/1/2008 through 9/5/2014.
  - **Ticker Selection**: Based on data available as of 9/6/2014 from [Yahoo Finance](http://finance.yahoo.com).
  - **Restrictions**:
    - Only U.S. listed common stocks and class shares.
    - Excludes OTCs, preferred shares, etc.
    - BICS sector, industry, and sub-industry assignments as of 9/6/2014$^6$.
  
- **Note**: Survivorship bias is not a leading effect in this setup.

---

### 7.5 Simulation Results

#### Simulation Parameters

- **Duration**: 5 years ($ M = 252 \times 5 $), with $ s = 0 $ on 9/5/2014.

#### Performance Metrics

- **Annualized Return-on-Capital (ROC)**:
  
  $$
  \text{ROC} = \left( \frac{\text{Average Daily P\&L}}{I} \right) \times 252
  $$
  
- **Annualized Sharpe Ratio (SR)**:
  
  $$
  \text{SR} = \text{Daily Sharpe Ratio} \times \sqrt{252}
  $$
  
- **Cents-Per-Share (CPS)**:
  
  $$
  \text{CPS} = \frac{\text{Total P\&L (in cents)}}{\text{Total Shares Traded}}
  $$
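
The three metrics can be sketched from a series of daily portfolio P&L and daily shares traded. The daily Sharpe ratio is taken here as the mean daily P&L divided by its sample standard deviation, which is an assumption about the convention; the function name and numbers are hypothetical, and P&L is assumed to be in dollars.

```python
import math
import statistics


def performance(daily_pnl, daily_shares, investment):
    """Return (ROC, SR, CPS) from daily P&L in dollars and daily shares traded."""
    mean = statistics.mean(daily_pnl)
    roc = mean / investment * 252
    sharpe = mean / statistics.stdev(daily_pnl) * math.sqrt(252)
    cps = 100.0 * sum(daily_pnl) / sum(daily_shares)   # dollars converted to cents
    return roc, sharpe, cps


# Toy data (hypothetical): four trading days, investment level of 100,000 dollars.
pnl_series = [100.0, -50.0, 150.0, 0.0]
share_series = [1000, 1000, 1000, 1000]
roc, sharpe, cps = performance(pnl_series, share_series, 100_000.0)
print(roc, sharpe, cps)
```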

#### Results Overview

- **Table 1**: Displays ROC, SR, and CPS for three cluster choices:
  - BICS Sectors
  - BICS Industries
  - BICS Sub-Industries

- **Figure 1**: P&L graphs corresponding to the three cases in Table 1.

---

### 7.6 Risk Management via Residual Normalization

#### Issue Identified

- **Large Residuals $ \varepsilon_{is} $**: Can lead to disproportionate loading on certain stocks, reducing portfolio diversification and lowering Sharpe ratios.

#### Proposed Solution

- **Normalization of Residuals**: Treat large residuals as outliers through a normalization process similar to Winsorization.

#### Normalization Method

1. **Assumption**:
   - $ X_i $ (here, $ \varepsilon_{is} $) follows a normal distribution with mean $ \bar{X} $ and standard deviation $ \chi $.

2. **Deformation**:
   - Transform $ X_i $ to $ \widetilde{X}_i $ such that $ \widetilde{X}_i $ conforms to the normal distribution with the same $ \bar{X} $ and $ \chi $.
   - Example: Use the `normalize()` function as described in Appendix A of Kakushadze and Liew (2014).

3. **Application**:
   - Apply the transformation to residuals $ \varepsilon_{is} $ for each date to obtain $ \widetilde{\varepsilon}_{is} $.

4. **Updated Dollar Holdings**:
   
   $$
   D_{is} = -\widetilde{\varepsilon}_{is} \frac{I}{\sum_{j=1}^{N} |\widetilde{\varepsilon}_{js}|}
   $$
   
   Maintains dollar neutrality while "squashing" outliers.
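
The `normalize()` function of Kakushadze and Liew (2014) is not reproduced here. As a stand-in, the sketch below uses a generic rank-based inverse-normal transform, which is an assumption of this example and may differ from the original: each value is replaced by the normal quantile of its rank, rescaled to keep the sample mean and standard deviation. The function name and numbers are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def rank_normalize(values):
    """Map values to normal scores by rank, keeping the sample mean and std.

    Requires at least two distinct values.
    """
    n = len(values)
    mu, sigma = mean(values), pstdev(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # mid-rank quantile in (0, 1), mapped to a standard normal score
        scores[i] = std_normal.inv_cdf((rank + 0.5) / n)
    scale = sigma / pstdev(scores)
    return [mu + scale * z for z in scores]


# Toy data (hypothetical): one residual is a clear outlier.
eps = [0.001, -0.002, 0.0005, 0.05, -0.0015]
eps_tilde = rank_normalize(eps)
print(eps_tilde)
```

The ordering of the residuals is preserved, while the outlier is pulled toward the rest of the sample. Since the mean is preserved, residuals that sum to zero still do so after the transform.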

#### Impact on Performance

- **Table 2**: Presents ROC, SR, and CPS after normalization.
  - **Increase in SR**: Demonstrates improved risk-adjusted returns.
  - **Decrease in ROC and CPS**: Trade-off for enhanced Sharpe ratios.

- **Figure 2**: P&L graphs corresponding to the three cases in Table 2.

---

### 7.7 Practical Caveats and Considerations

#### Definition of "Open"

- **Fuzzy Timing**: Stocks do not always open at 9:30:00 sharp, making simultaneous order placement at the open unrealistic$^7$.

- **Real-Life Adjustment**:
  - **Delayed Execution**: Wait until shortly after the open to compute the alpha for available stocks.
  - **Order Execution**: Send orders and obtain fills based on updated alpha computations.

#### Simulated Strategy Implementation

- **Platform**: [Vynance Portfolio](http://vynance.com/portfolio.html) offers a freely accessible intraday simulated strategy.

- **Strategy Details**:
  - **Establishing Time**: 9:31:30
  - **Liquidating Time**: 15:59:00
  - **Trading Universe**: Varies daily, typically 200-300 tickers for long and short positions.
  
- **Performance Metrics (2/18/2014 - 9/19/2014)**:
  - **ROC**: 29.19%
  - **SR**: 12.13
  - **CPS**: 1.52

- **Assumptions**:
  - **No Trading Costs or Slippage**
  - **Delay-30-Seconds Strategy**: Alpha computed based on 9:31:00 pricing data, which itself is delayed by 5–35 seconds.

- **Long-Term Performance**:
  - **Since Inception (4/14/2011)**:
    - **Annualized Daily Sharpe Ratio**: ~16
    - **Monthly ROC**: ~3%

- **Survivorship Bias**: Not a leading effect here, as Vynance Portfolio simulations are conducted daily in real time.

---

#### Footnotes

1. **Footnote 21**:
   
   > Note that stocks rarely jump (sub-)industries/sectors, so $ \Lambda_{iA} $ can be assumed to be static.

2. **Footnote 22**:
   
   > More precisely, the intercept is already subsumed in $ \Lambda_{iA} $: $ \Lambda_{iA} = 1 $ if the stock labeled by $ i $ belongs to the cluster labeled by $ A = 1, \ldots, K $; otherwise, $ \Lambda_{iA} = 0 $. Each stock belongs to one and only one cluster. This implies that $ \sum_{A=1}^{K} \Lambda_{iA} = 1 $ for each $ i $, so a linear combination of the columns of $ \Lambda_{iA} $ is the intercept.

3. **Footnote 23**:
   
   > This is a so-called "delay-0" alpha: $ P_{is}^O $ is used in the alpha, and as the establishing fill price.

4. **Footnote 24**:
   
   > I.e., to ensure that our results are not a mere consequence of the universe selection.

5. **Footnote 25**:
   
   > Note that, since the alpha is purely intraday, this "rebalancing" does not generate additional trades, it simply changes the universe that is traded for the next 21 days.

6. **Footnote 26**:
   
   > The number of such tickers in our data is 3,811. The number of BICS sectors is 10. The number of BICS industries is 48. The number of BICS sub-industries varies between 164 and 169 (due to small sub-industries, which are affected by the varying top-2000-by-ADDV universe).

7. **Footnote 27**:
   
   > In real life one would have to wait until a little after the open and compute the alpha for the stocks that are open as of that time, then send the orders and get fills.
