# Trading Costs and Liquidity in Quantitative Investing

**Author:** [Your Name]
**Date:** [Date]

---

## 🎯 Learning Objectives

By the end of this notebook, you will be able to:

* **Understand** the different sources of trading costs and the concept of liquidity.
* **Define** and calculate the "Implementation Shortfall".
* **Analyze** the "Absorption Capacity" of a stock and its relation to trading volume.
* **Calculate** the "Used Volume" to assess the feasibility of a trading strategy.
* **Implement** a momentum strategy in Python and analyze its trading costs.

In [None]:
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sns

# Set a nice plot style
sns.set_style("whitegrid")

## 1. Introduction to Trading Costs and Liquidity

So far, we have taken prices as given and discussed quantitative allocation rules that outperform a standard CAPM benchmark.

An important question is whether we would be able to trade at those prices if we tried to implement the strategy. This is where the concepts of **liquidity** and **trading costs** become crucial.



### What is Liquidity?

**Liquidity** is the ease of trading a security. This encompasses many factors, all of which impose a cost on those wishing to trade.

### Sources of Illiquidity

There are several sources of illiquidity that contribute to trading costs:

* **Exogenous transaction costs:** These are the most direct costs and include brokerage fees, order-processing costs, and transaction taxes.
* **Demand pressure:** When you need to sell quickly, a natural buyer may not be immediately available. Whoever is available will likely ask to be compensated to absorb the trade, leading to a less favorable price.
* **Inventory risk:** If you can’t find a buyer, you might sell to a market maker who will later sell the position. However, since the market maker faces the risk of future price changes, they will demand compensation for this risk.
* **Private information:** This is the concern over trading against a more informed party (e.g., an insider). You need to be compensated for taking on this risk. Private information can be about fundamentals or about future order flows.

> 🤔 **Food for thought:** A perfect answer to the question of "at what price can I trade?" requires costly experimentation. It would require trading and measuring how prices change in response to your trading behavior.

## 2. Implementation Shortfall

A useful metric to measure the total cost of trading is the **Implementation Shortfall**.

* **Wish portfolio:** This is the portfolio you would hold if trading costs were zero.
* **Performance of wish portfolio:** You can compute its returns in real time assuming you can trade at the mid-price.

<div class="alert alert-info" role="alert">
  <b>Implementation Shortfall</b> = (Return of Wish Portfolio) - (Return of Real Portfolio)
</div>

The shortfall includes both **execution costs** (the costs of trading to get you to the wish portfolio) and **opportunity costs** (the costs of deviating from the wish portfolio).
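
As a minimal sketch of the definition above, the shortfall can be computed period by period from two return series. The numbers are made up for illustration, and `wish_ret` and `real_ret` are hypothetical names, not part of the data used later in this notebook.

```python
import pandas as pd

# Hypothetical monthly returns (illustrative numbers, not real data)
wish_ret = pd.Series([0.020, -0.010, 0.015])   # paper portfolio traded at mid-prices
real_ret = pd.Series([0.017, -0.012, 0.014])   # portfolio actually held, net of costs

# Implementation shortfall per period: wish return minus realized return
shortfall = wish_ret - real_ret

# Cumulative shortfall: gap between the compounded wish and real returns
cum_shortfall = (1 + wish_ret).prod() - (1 + real_ret).prod()

print(shortfall.round(4).tolist())
print(round(cum_shortfall, 4))
```

The per-period difference does not separate execution costs from opportunity costs; that split requires knowing which trades were made and which were skipped.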

> 🤔 **Question:** If you trade quickly, which force will increase the shortfall? Which will decrease it?

### The Trade-Off

How should you trade in this world? The precise answer is highly empirical, but the overall shape of the solution is clear.



* Find the size of the tracking error that minimizes the shortfall.
* If your weights imply a tracking error below a certain cap, don't trade.
* Trade whenever the tracking error goes above this value.
* Trade just enough to keep the tracking error at the cap.
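
The rule in the list above can be sketched as follows. This is one possible reading, under stated assumptions: tracking error is measured as $\sqrt{d'\Sigma d}$, where $d$ is the deviation from the wish weights, and "trade just enough" is implemented by shrinking the deviation proportionally. The function name `rebalance_to_band`, the covariance matrix, and the cap are all hypothetical.

```python
import numpy as np

def rebalance_to_band(w_current, w_target, cov, te_cap):
    """Return post-trade weights and tracking error under a simple band rule."""
    dev = w_current - w_target                 # deviation from the wish portfolio
    te = np.sqrt(dev @ cov @ dev)              # tracking error implied by the deviation
    if te <= te_cap:
        return w_current.copy(), te            # inside the band: do not trade
    # Outside the band: shrink the deviation proportionally so that TE equals the cap
    shrink = te_cap / te
    return w_target + shrink * dev, te_cap

# Hypothetical inputs (illustrative only)
cov = np.diag([0.04, 0.09, 0.16])              # return variances, no correlation
w_target = np.array([0.40, 0.35, 0.25])        # wish portfolio
w_current = np.array([0.50, 0.30, 0.20])       # weights after drift

w_new, te_after = rebalance_to_band(w_current, w_target, cov, te_cap=0.01)
print(w_new.round(4), round(te_after, 4))
```

Proportional shrinking keeps the weights summing to one and lands exactly on the cap, but it ignores which stocks are cheap or expensive to trade. A cost-aware rule would choose the trades that reduce tracking error most per dollar of cost.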

The key to this trade-off is the **cost of trading**, which is highly dependent on who is trading. Why do you think that is?

## 3. Absorption Capacity

A practical way to think about trading costs is to measure the **absorption capacity** of a stock. The idea is to measure how much of the trading volume of each stock you would "use" to implement your strategy for a given position size.

Specifically, we can define the "Used Volume" as:

$$
UsedVolume_{i,t} = \frac{Trading_{i,t}}{Volume_{i,t}}
$$

The idea is that if you trade a small share of the volume, you are likely to be able to trade at the posted prices.
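
A small illustration of the ratio, assuming both the trade and the volume are measured in dollars over the same period. The three stocks and their numbers are hypothetical.

```python
import pandas as pd

# Hypothetical dollar trades and dollar volumes for three stocks (illustrative only)
df = pd.DataFrame(
    {"trade_usd": [2e6, 5e5, 1e6], "volume_usd": [150e6, 4e6, 900e6]},
    index=["A", "B", "C"],
)

# Used volume: share of each stock's traded volume that the strategy would consume
df["used_volume"] = df["trade_usd"].abs() / df["volume_usd"]
print(df["used_volume"].round(4))
```

Stock B has the smallest dollar trade but the highest used volume, because its volume is so low. The ratio, not the trade size, is what flags it as hard to trade.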

The graph below shows the trading costs for trades made by AQR, a large quantitative investment firm. Notice how the cost (market impact) increases as the percentage of daily volume traded increases.

![Market Impact](../../assets/plots/marketimpact.jpg)

## 4. How Much Do You Actually Need to Trade?

To compute how much you need to trade, compare the desired weights at date $t+1$ with the weights you actually hold at the end of date $t+1$ before trading. The latter are the date-$t$ desired weights after they have drifted with returns.

* Before trading, your weight at date $t+1$ is:

$$
w_{i,t+1}(\text{before trading}) = \frac{w_{i,t}^*(1+r_{i,t+1})}{(1+r_{t+1}^{\text{strategy}})}
$$

* If the desired position in the stock is $w_{i,t+1}^*$, then:

$$
\begin{align}
UsedVolume_{i,t} &= \frac{\text{position} \times \bigg(w_{i,t+1}^* - w_{i,t+1}(\text{before trading})\bigg)}{Volume_{i,t}} \\
&= \frac{\text{position}}{Volume_{i,t}}\left(w_{i,t+1}^* - \frac{w_{i,t}^*(1+r_{i,t+1})}{1+r_{t+1}^{\text{strategy}}}\right)
\end{align}
$$

* $UsedVolume_{i,t}$ is a stock-time specific statistic. Implementability will depend on how high this quantity is across time and across stocks—the lower the better.

* If it is very high (i.e., close to 1), it means that your position would require almost all the volume in a particular stock. This doesn't mean that you wouldn't be able to trade, but it is likely that prices would move against you (i.e., go up as you buy, and go down as you sell).

* A very conservative approach is to take the **maximum** of this statistic across stocks, which identifies the "weakest link" in your portfolio formation. The maximum is the right statistic if you are unwilling to deviate from your "wish portfolio". But remember that the "wish portfolio" does not take transaction costs into account.
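
The formulas in this section can be checked on a toy three-stock portfolio. All inputs are hypothetical, including the portfolio size `position`. The sketch takes the absolute value of the weight change, on the assumption that buys and sells both consume volume.

```python
import numpy as np

# Hypothetical inputs (illustrative only)
position = 1e9                                   # assumed portfolio size in dollars
w_star_t = np.array([0.50, 0.30, 0.20])          # desired weights at date t
r_next = np.array([0.10, -0.05, 0.00])           # stock returns over t+1
w_star_next = np.array([0.40, 0.35, 0.25])       # desired weights at date t+1
volume = np.array([5e9, 2e8, 1e9])               # dollar volume per stock

# Strategy return is the weighted average of stock returns
r_strategy = w_star_t @ r_next

# Weights before trading: date-t weights drifted by realized returns
w_before = w_star_t * (1 + r_next) / (1 + r_strategy)

# Dollar trade needed per stock, as a share of that stock's dollar volume
used_volume = position * np.abs(w_star_next - w_before) / volume

print(w_before.round(4))
print(used_volume.round(4))
print("max used volume:", round(float(used_volume.max()), 4))
```

The drifted weights still sum to one, and the maximum used volume comes from the second stock: its weight change is moderate, but its volume is the smallest of the three.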

> **How can portfolios take into account trading costs to reduce total costs substantially?**
> 
> **Can we change the portfolios to reduce trading costs without altering them significantly?**

A simple way to approach these questions is to look at the 95th, 75th, and 50th percentiles of the used volume distribution.

* If used volume declines steeply across these percentiles, it might make sense to avoid the 5% to 25% of stocks in your portfolio that are least liquid.
* But as you deviate from the original portfolio, you will have a tracking error relative to the original strategy.
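
A rough sketch of this idea on simulated data: compute the percentiles, drop the 5% of names with the highest used volume, and renormalize the remaining weights. The cross-section is randomly generated. The deviation measure is half the sum of absolute weight differences, used here as a crude stand-in for tracking error, which would require a covariance matrix.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical cross-section for one date (simulated, not real data)
n = 200
weights = pd.Series(rng.dirichlet(np.ones(n)))                       # wish weights, sum to 1
used_volume = pd.Series(rng.lognormal(mean=-5, sigma=1.5, size=n))   # right-skewed used volume

# Percentiles of the used-volume distribution
print(used_volume.quantile([0.50, 0.75, 0.95]).round(4))

# Drop the 5% least liquid names (highest used volume) and renormalize the rest
cutoff = used_volume.quantile(0.95)
keep = used_volume <= cutoff
w_trimmed = (weights * keep) / (weights * keep).sum()

# Crude deviation from the wish portfolio: half the sum of absolute weight differences
deviation = 0.5 * (w_trimmed - weights).abs().sum()
print("deviation:", round(float(deviation), 4))
print("max used volume after trimming:", round(float(used_volume[keep].max()), 4))
```

Renormalizing raises the weight, and hence the required trade, in every stock that is kept, so a full analysis would recompute used volume with the trimmed weights rather than reuse the original values.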

In [None]:
# Load the data
from pandas.tseries.offsets import MonthEnd  # explicit import, placed before its first use

crsp = pd.read_pickle('https://github.com/amoreira2/Fin418/blob/main/assets/data/crspm2005_2020.pkl?raw=true')

# --- Data Cleaning and Preparation ---

# Change variable format to int
crsp['permno'] = crsp['permno'].astype(int)

# Line up date to be end of month
crsp['date'] = pd.to_datetime(crsp['date']) + MonthEnd(0)

# Calculate market equity (me)
# CRSP reports a negative price when it is a bid/ask average rather than a closing trade,
# so we take the absolute value.
crsp['me'] = crsp['prc'].abs() * crsp['shrout']

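# Keep only necessary columns, drop duplicate rows, and sort the data.
# Duplicates are dropped while 'date' is still a column: drop_duplicates ignores the index,
# so indexing by date first would treat rows from different months as duplicates.
crsp = crsp[['permno', 'date', 'ret', 'me', 'vol', 'prc']].drop_duplicates().sort_values(['permno', 'date']).reset_index(drop=True)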

crsp.head()

In [None]:

def momreturns_w(data, ngroups):
    """
    This function calculates momentum returns and portfolio weights.
    """

    # Step 1: Create a temporary CRSP dataset
    # (drop duplicates before indexing by date, because drop_duplicates ignores the index)
    _tmp_crsp = data[['permno','date','ret', 'me','vol','prc']].drop_duplicates().sort_values(['permno','date']).set_index('date')
    _tmp_crsp['volume'] = _tmp_crsp['vol'] * _tmp_crsp['prc'].abs() * 100 / 1e6 # dollar volume in millions (vol is in hundreds of shares)

    # Step 2: Construct the momentum signal (cumulative return over the past 12 months)
    _tmp_crsp['grossret'] = _tmp_crsp['ret'] + 1
    _tmp_cumret = _tmp_crsp.groupby('permno')['grossret'].rolling(window=12).apply(np.prod, raw=True) - 1
    _tmp_cumret = _tmp_cumret.reset_index().rename(columns={'grossret':'cumret'})

    _tmp = pd.merge(_tmp_crsp.reset_index(), _tmp_cumret[['permno','date','cumret']], how='left', on=['permno','date'])
    # Lag the signal by 2 months
    _tmp['mom'] = _tmp.groupby('permno')['cumret'].shift(2)

    # Step 3: Rank assets by the signal
    mom = _tmp.sort_values(['date','permno']) # Sort by date and firm identifier
    mom = mom.dropna(subset=['mom'], how='any') # Drop rows with missing momentum signal
    mom['mom_group'] = mom.groupby(['date'])['mom'].transform(lambda x: pd.qcut(x, ngroups, labels=False, duplicates='drop'))

    mom = mom.dropna(subset=['mom_group'], how='any')
    mom['mom_group'] = 'm' + mom['mom_group'].astype(int).astype(str)

    mom['date'] = mom['date'] + MonthEnd(0)
    mom = mom.sort_values(['permno','date'])

    # Step 4: Form value weights within each date and momentum group.
    # Weights use market equity lagged one month, so they rely only on information
    # known before the return in the same row is realized. Each stock's first
    # available month has no lagged value and therefore gets no weight.
    mom['me_lag'] = mom.groupby('permno')['me'].shift(1)
    mom['Wght'] = mom['me_lag'] / mom.groupby(['date','mom_group'])['me_lag'].transform('sum')

    weights = mom.sort_values(['date','permno'])

    weights['mom_group_lead'] = weights.groupby('permno').mom_group.shift(-1)
    weights['Wght_lead'] = weights.groupby('permno').Wght.shift(-1)
    weights = weights.sort_values(['permno','date'])

    # Value-weighted portfolio return: sum of weight times return within each date and group
    port_vwret = (weights['ret'] * weights['Wght']).groupby([weights['date'], weights['mom_group']]).sum()
    port_vwret = port_vwret.rename('port_vwret').reset_index()

    weights = weights.merge(port_vwret, how='left', on=['date','mom_group'])

    port_vwret = port_vwret.set_index(['date','mom_group'])
    port_vwret = port_vwret.unstack(level=-1)
    port_vwret = port_vwret.port_vwret

    return port_vwret, weights

In [None]:
port_ret, wght = momreturns_w(crsp, 10)

## 5. Visualizing the Results

Now that we have the portfolio returns and weights, let's visualize them to better understand the momentum strategy.

### Cumulative Returns of Momentum Portfolios

A great way to see the performance of the strategy is to plot the cumulative returns of the different momentum portfolios.

In [None]:
# Calculate and plot cumulative returns
(port_ret + 1).cumprod().plot(figsize=(12, 8))
plt.title('Cumulative Returns of Momentum Portfolios')
plt.xlabel('Date')
plt.ylabel('Cumulative Return')
plt.legend(title='Momentum Group')
plt.show()

### Analysis of Trading Volume

Let's now analyze the trading volume required to implement this strategy. We will focus on the highest momentum portfolio ('m9').

In [None]:
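# Get the weights for the highest momentum portfolio
m9_weights = wght[wght['mom_group'] == 'm9'].copy()

# A stock that moves to another group next month has a desired m9 weight of zero;
# without this step, Wght_lead would carry its weight in the other portfolio.
leaving = m9_weights['mom_group_lead'].notna() & (m9_weights['mom_group_lead'] != 'm9')
m9_weights.loc[leaving, 'Wght_lead'] = 0

# Calculate the trading required, as a fraction of portfolio value
m9_weights['trade'] = (m9_weights['Wght_lead'] - m9_weights['Wght'] * (1 + m9_weights['ret']) / (1 + m9_weights['port_vwret'])).abs()

# Assumed portfolio size in dollars (illustrative value, not given in the text)
position = 1e9

# Calculate the used volume: dollars traded divided by dollar volume
m9_weights['used_volume'] = (m9_weights['trade'] * position) / (m9_weights['volume'] * 1e6) # 'volume' is in millions of dollars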

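# Plot the distribution of used volume
# (keep finite, strictly positive values so the log scale is well defined)
used = m9_weights['used_volume'].replace([np.inf, -np.inf], np.nan).dropna()
plt.figure(figsize=(12, 6))
sns.histplot(used[used > 0], bins=100, log_scale=True)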
plt.title('Distribution of Used Volume for the Highest Momentum Portfolio (m9)')
plt.xlabel('Used Volume (log scale)')
plt.ylabel('Frequency')
plt.show()

# Print some summary statistics
print("Summary Statistics for Used Volume:")
print(m9_weights['used_volume'].describe(percentiles=[0.5, 0.75, 0.95, 0.99]))
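
Because used volume is proportional to the size of the position, the same statistics can be turned around to ask how large the portfolio could be before a chosen percentile of used volume hits a limit. The sketch below uses simulated inputs, and the 5% limit at the 95th percentile is an arbitrary assumption, not a rule from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs for one rebalance (simulated, not real data)
trade_frac = np.abs(rng.normal(0, 0.01, size=100))       # |weight change| per stock
volume_usd = rng.lognormal(mean=19, sigma=1, size=100)   # dollar volume per stock

def used_volume(position_usd):
    """Used volume per stock for a portfolio of the given dollar size."""
    return position_usd * trade_frac / volume_usd

# Used volume is linear in position size, so the largest feasible position has a closed form
limit = 0.05                                             # assumed cap on the 95th percentile
p95_per_dollar = np.quantile(trade_frac / volume_usd, 0.95)
max_position = limit / p95_per_dollar

print(f"Largest position under the limit: ${max_position:,.0f}")
```

Replacing the 95th percentile with the maximum gives the most conservative capacity estimate, in line with the "weakest link" view of the wish portfolio.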

---

## 🏁 Final Takeaways

Congratulations on completing this notebook! Here are the key takeaways:

* **Trading costs are multifaceted:** They go beyond simple commissions and include market impact, spread costs, and opportunity costs.
* **Liquidity is a key constraint:** The ability to trade without significantly moving the price is a major factor in the success of any quantitative strategy.
* **Implementation Shortfall is a comprehensive measure:** It captures both the explicit and implicit costs of trading.
* **Absorption capacity helps in strategy design:** By understanding how much of the available volume your strategy consumes, you can design portfolios that are cheaper to trade.
* **There is a trade-off between tracking and trading:** You can reduce trading costs by allowing your portfolio to deviate from the "ideal" portfolio, but this introduces tracking error.

This notebook provides a foundational understanding of trading costs. In practice, quantitative investors use sophisticated models and algorithms to optimize their trading and minimize costs.