# How to lose money fast
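This notebook shows how to lose money fast while believing you are making money. The trick is to backtest a portfolio using today's S&P 500 constituents and today's index weights, and to apply them to the past. The points below explain why the result is misleading.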

1. **Current Market Weights**: In financial markets, the weight of a stock in a capitalization-weighted index like the S&P 500 is proportional to its market capitalization: the stock's price multiplied by the number of shares outstanding (S&P uses float-adjusted share counts). These weights change over time as prices and share counts change.
2. **Applying Current Weights Historically**: The bias occurs when one takes today's market weights of the index components and applies them retrospectively to a past period (for example, ten years ago). This is a problem because those weights encode the companies' future performance and survival, which was not known at that point in time.
3. **Survivorship Bias**: Part of this bias is specifically related to survivorship. Over time, companies can go bankrupt, get delisted, or otherwise drop out of an index. If you're using today's index components and applying their weights historically, you're only looking at the 'survivors'—the companies that managed to stay in the index until today. This ignores all the companies that failed or underperformed significantly during that period.
4. **Look-Ahead Bias**: The fundamental issue is that you're using information (current market weights) that was not available in the past. In backtesting investment strategies, this can lead to overly optimistic results. A strategy might appear successful when tested with current weights applied historically, but in reality, it wouldn't have been possible to implement that strategy with the information available at the time.
5. **Distorting Historical Performance**: This approach can significantly distort the reconstructed historical performance. Companies that grew substantially over the period start the backtest with today's large weights, so they are overrepresented; companies that declined or stayed flat are underrepresented, and companies that left the index are missing entirely.
6. **Implications in Investment Strategy**: For investors or fund managers, this bias can lead to misleading conclusions about the effectiveness of certain investment strategies. To get an accurate picture of how a strategy would have performed, use point-in-time data: the index members and their weights as they were on each historical date.
In essence, applying current market weights to historical periods in financial analyses leads to anachronistic and potentially misleading results, as it assumes knowledge of the future that wasn't actually available at the time.
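A toy example with made-up prices makes the effect concrete. It assumes three hypothetical stocks with constant share counts: one that grows eightfold, one that stays flat, and one that goes to zero and drops out of the index. A buy-and-hold portfolio started with the market caps known at the start is compared with one started with today's weights over today's survivors. `buy_and_hold_return` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Hypothetical toy universe: three stocks over four periods (prices are made up).
prices = pd.DataFrame(
    {
        "WINNER": [10.0, 20.0, 40.0, 80.0],  # grows 8x and becomes the largest company
        "STEADY": [50.0, 50.0, 50.0, 50.0],  # flat
        "LOSER": [40.0, 20.0, 5.0, 0.0],     # goes bust and leaves the index
    }
)
shares = pd.Series({"WINNER": 1.0, "STEADY": 1.0, "LOSER": 1.0})  # constant share counts

# Point-in-time weights: market caps as of the first period (knowable at the time).
cap_start = prices.iloc[0] * shares
w_pit = cap_start / cap_start.sum()

# Look-ahead weights: today's caps, restricted to today's survivors.
cap_today = prices.iloc[-1] * shares
survivors = cap_today[cap_today > 0].index
w_today = cap_today[survivors] / cap_today[survivors].sum()


def buy_and_hold_return(prices: pd.DataFrame, weights: pd.Series) -> float:
    """Total return of a buy-and-hold portfolio started with `weights`."""
    held = prices[weights.index]
    growth = held.iloc[-1] / held.iloc[0]
    return float((growth * weights).sum() - 1)


print(f"point-in-time weights: {buy_and_hold_return(prices, w_pit):.1%}")    # 30.0%
print(f"today's weights:       {buy_and_hold_return(prices, w_today):.1%}")  # 430.8%
```

The second figure is not a return anyone could have earned: it requires knowing in advance which company would win and which would disappear.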

```python
import yfinance as yf
import pandas as pd
```

## Loading the Data

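```python
def get_sp_data(start='2008-01-01', end=None):
    """Download daily prices for *today's* S&P 500 constituents and cache them to a pickle."""
    # Thanks to https://gist.github.com/paduel/32ac6f0a47f3fae67e414a73b9779e89

    # Get the current S&P 500 components from Wikipedia. These are today's members,
    # which is exactly where the survivorship bias enters.
    sp_assets = pd.read_html(
        'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
    # Wikipedia writes share classes with a dot (BRK.B), Yahoo Finance with a dash (BRK-B).
    # regex=False treats '.' literally instead of as "any character".
    assets = sp_assets['Symbol'].str.replace('.', '-', regex=False).tolist()
    # Download historical data to a multi-index DataFrame
    try:
        data = yf.download(assets, start=start, end=end, auto_adjust=True)
        filename = 'sp_components_data.pkl'
        data.to_pickle(filename)
        print('Data saved at {}'.format(filename))
    except ValueError:
        print('Failed download, try again.')
        data = None
    return data
```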
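The notebook downloads the data once and reloads the pickle in a later cell. A small helper can fold both steps together so that reruns never hit the network. This is a minimal sketch: `load_or_fetch` is a hypothetical name, and `fetch` can be any zero-argument function that returns a DataFrame.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def load_or_fetch(path: str, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the DataFrame cached at `path`; call `fetch()` and cache it only on a miss."""
    cache = Path(path)
    if cache.exists():
        return pd.read_pickle(cache)
    data = fetch()
    data.to_pickle(cache)
    return data
```

For example, `load_or_fetch('gspc.pkl', lambda: yf.download("^GSPC", start="2015-01-01", auto_adjust=True))` downloads the benchmark on the first run and reads the (hypothetical) `gspc.pkl` file afterwards.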


```python
# A few tickers may fail to download; they are simply left out of this demo.
sp500_data = get_sp_data()
```




```
7 Failed downloads:
- BMY: No data found for this date range, symbol may be delisted
- CBRE: No data found for this date range, symbol may be delisted
- EL: No data found for this date range, symbol may be delisted
- MKC: No data found for this date range, symbol may be delisted
- HST: No data found for this date range, symbol may be delisted
- PNW: No data found for this date range, symbol may be delisted
- HRL: No data found for this date range, symbol may be delisted
Data saved at sp_components_data.pkl
```


```python
sp500_data = pd.read_pickle('sp_components_data.pkl')  # reload the cached data
```

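```python
# Benchmark: the S&P 500 index level (^GSPC) from the backtest start date.
# .squeeze() turns a one-column DataFrame into a Series; newer yfinance versions
# can return multi-level columns even for a single ticker.
SP_price = yf.download("^GSPC", start="2015-01-01", end=None, auto_adjust=True)['Close'].squeeze()
```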


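```python
# Load *today's* index weights (column 'Portfolio%'), scraped from https://www.slickcharts.com/sp500.
# Applying them to a 2015 start date is the look-ahead bias described above.
market_w = pd.read_csv("sp500_tickers_extracted_decimal.csv", index_col=1)['Portfolio%']
```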
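The bias-free alternative for the weights is to use the market cap each stock had on the start date. A minimal sketch, assuming you have a date-indexed DataFrame of *unadjusted* closing prices and one of shares outstanding, both with tickers as columns; neither is produced by this notebook, and `start_date_cap_weights` is a hypothetical helper. Adjusted prices are not suitable here, because dividend and split adjustments rescale past prices. Using total rather than float-adjusted shares is a further simplification, and this fixes only the weights: the universe must also be the constituents as of the start date, including companies that later left the index.

```python
import pandas as pd


def start_date_cap_weights(prices: pd.DataFrame, shares: pd.DataFrame, start: str) -> pd.Series:
    """Market-cap weights using only data available on `start` (hypothetical inputs)."""
    px = prices.loc[start:].iloc[0]   # first unadjusted close on or after the start date
    sh = shares.loc[:start].iloc[-1]  # latest reported share count on or before the start date
    caps = (px * sh).dropna()         # aligned by ticker; drops stocks missing either input
    return caps / caps.sum()
```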

## Performance evaluation

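```python
# With auto_adjust=True, 'Close' is already adjusted for splits and dividends.
adj_close = sp500_data['Close'].ffill()  # carry the last price over missing days
adj_close.index = pd.to_datetime(adj_close.index)
```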

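```python
# Start the backtest in 2015 and keep only stocks with a price on every day since then.
# This silently drops every stock that listed after the start date.
base_prices = adj_close.loc['2015-01-01':].dropna(axis=1)
cum_performance = base_prices / base_prices.iloc[0]  # growth of 1 unit invested at the start
```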
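Why `.dropna(axis=1)` adds a second survivorship filter: after `ffill()` the only NaNs left are the ones before a stock's first price, so dropping every column that still contains a NaN removes each stock that started trading after the start date. A toy illustration with hypothetical tickers and prices:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2015-01-01", periods=4, freq="D")
# Hypothetical prices: OLD trades throughout with one gap, NEW lists on the third day.
prices = pd.DataFrame(
    {"OLD": [100.0, np.nan, 102.0, 103.0], "NEW": [np.nan, np.nan, 50.0, 55.0]},
    index=idx,
)
filled = prices.ffill()          # fills OLD's gap; NEW's leading NaNs stay NaN
kept = filled.dropna(axis=1)     # drops every column with a NaN left, so NEW is gone
print(list(kept.columns))        # ['OLD']
```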

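```python
# Today's weights for the stocks that passed the filters, rescaled to sum to 1.
# intersection() avoids a KeyError if a ticker is spelled differently in the CSV (e.g. BRK.B vs BRK-B).
tickers = cum_performance.columns.intersection(market_w.index)
df_weights = market_w.loc[tickers]
df_weights /= df_weights.sum()
```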

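```python
# Buy-and-hold portfolio started in 2015 with today's weights, as a cumulative return.
portfolio_perf = (cum_performance * df_weights).sum(axis=1) - 1
# Benchmark cumulative return over the same dates.
bmrk_perf = (SP_price / SP_price.iloc[0]) - 1
bmrk_perf = bmrk_perf.reindex(portfolio_perf.index)
```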
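`portfolio_perf` is the return of a buy-and-hold portfolio: shares are bought once with the initial weights and never rebalanced, so the weights drift with prices. The sketch below checks that identity on random, hypothetical prices by compounding daily portfolio returns with drifting weights and comparing the result with the one-line shortcut used above.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices for three assets (a random walk, illustration only).
rng = np.random.default_rng(0)
daily = 1 + rng.normal(0.0, 0.01, size=(250, 3))
prices = pd.DataFrame(100 * np.cumprod(daily, axis=0), columns=["A", "B", "C"])
w0 = pd.Series({"A": 0.5, "B": 0.3, "C": 0.2})  # initial weights, summing to 1

# Shortcut used in the notebook: weighted cumulative growth minus one.
shortcut = (prices / prices.iloc[0] * w0).sum(axis=1) - 1

# Explicit buy-and-hold: compound daily portfolio returns while the weights drift.
rets = (prices / prices.shift(1) - 1).iloc[1:]
w = w0.copy()
growth = [1.0]
for _, r in rets.iterrows():
    growth.append(growth[-1] * (1 + (w * r).sum()))  # portfolio return = w_{t-1} . r_t
    w = w * (1 + r)
    w = w / w.sum()  # no rebalancing: each weight grows with its own asset
explicit = pd.Series(growth, index=prices.index) - 1

print(np.allclose(shortcut, explicit))  # True
```

A portfolio rebalanced back to fixed weights every day would give a different path, so the shortcut is only valid for buy-and-hold.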

```python
import plotly.graph_objects as go

fig = go.Figure()

# Adding Portfolio performance area
fig.add_trace(go.Scatter(x=portfolio_perf.index, y=portfolio_perf, mode='lines',
                         name='Portfolio', line=dict(color='#0091d5', width=2.5),
                         fill='tozeroy'))

# Adding Benchmark performance area
fig.add_trace(go.Scatter(x=bmrk_perf.index, y=bmrk_perf, mode='lines',
                         name='Benchmark', line=dict(color='#ea6a47', width=2.5),
                         fill='tozeroy'))

# Updating layout for a better look
fig.update_layout(
    title='Portfolio vs. Benchmark Performance',
    xaxis_title='Date',
    yaxis_title='Performance (%)',
    legend_title='Legend',
    font=dict(
        family="Arial, sans-serif",
        size=12,
        color="Black"
    ),
    xaxis=dict(
        showline=True,
        showgrid=False,
        showticklabels=True,
        linecolor='rgb(204, 204, 204)',
        linewidth=2,
        ticks='outside',
        tickfont=dict(
            family='Arial',
            size=12,
            color='rgb(82, 82, 82)',
        ),
        nticks=12,
    ),
    yaxis=dict(
        showgrid=True,
        zeroline=True,
        showline=True,
        showticklabels=True,
        gridcolor='rgb(204, 204, 204)',
        gridwidth=0.5,
        linecolor='rgb(204, 204, 204)',
        linewidth=2,
        tickformat=',.0%',  # Adding percentage format to y-axis
        nticks=10,
    ),
    autosize=True,
    margin=dict(
        autoexpand=True,
        l=100,
        r=20,
        t=110,
    ),
    showlegend=True,
    plot_bgcolor='white'
)

# Show the plot
fig.show()
```
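The chart shows the gap visually; a numeric summary makes it easier to quote. A minimal sketch, assuming a cumulative-return series with a `DatetimeIndex` that starts at 0, like `portfolio_perf` and `bmrk_perf`; `summarize` is a hypothetical helper and counts time in 365.25-day years.

```python
import pandas as pd


def summarize(perf: pd.Series) -> dict:
    """Total and annualized (CAGR) return of a cumulative-return series starting at 0."""
    perf = perf.dropna()
    years = (perf.index[-1] - perf.index[0]).days / 365.25  # calendar years covered
    total = float(perf.iloc[-1])
    cagr = (1 + total) ** (1 / years) - 1
    return {"total_return": total, "cagr": cagr}
```

For example, `pd.DataFrame({'Portfolio': summarize(portfolio_perf), 'Benchmark': summarize(bmrk_perf)})` puts both results side by side.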
