# Cointegration Analysis

**Cointegration** describes two or more nonstationary time series that share a long-run equilibrium relationship: each series may wander on its own, but some linear combination of them is stationary, so they drift apart only temporarily. Testing for it is particularly useful for:

- **Pairs Trading**: Finding stocks that move together
- **Portfolio Construction**: Identifying stocks with stable relationships
- **Risk Management**: Understanding long-term dependencies


## Quick Cointegration Guide

### What is Cointegration?
- **Definition**: Two or more price series are cointegrated when each is nonstationary on its own but some linear combination of them (the spread) is stationary
- **Purpose**: Find stocks whose prices keep a stable long-term relationship despite short-term divergences
- **Applications**: Pairs trading, portfolio construction, risk management
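
To make the definition concrete, here is a minimal sketch on synthetic data (not real prices). It assumes a hedge ratio of 2 that is known in advance, which is never the case in practice. Both series wander, but their spread stays within a narrow band:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible
n = 500

# x is a random walk: nonstationary, it wanders without a fixed mean.
x = np.cumsum(rng.normal(size=n))

# y follows 2*x plus stationary noise, so y is also nonstationary,
# but the combination y - 2*x is stationary by construction.
y = 2.0 * x + rng.normal(size=n)

spread = y - 2.0 * x

print(f"std of x:      {x.std():.2f}")
print(f"std of y:      {y.std():.2f}")
print(f"std of spread: {spread.std():.2f}")
```

The spread's standard deviation should come out far smaller than that of either series, which is the property a cointegration test looks for.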

### How to Interpret Results:

#### Engle-Granger Test (Pairs):
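- **P-value < 0.05**: Reject the null hypothesis of no cointegration at the 5% level (evidence of cointegration)
- **P-value ≥ 0.05**: Cannot reject the null hypothesis; this sample shows no significant evidence of cointegration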
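
The Engle-Granger procedure has two steps: regress one series on the other, then check whether the residuals are stationary. The sketch below is a stripped-down illustration using only NumPy, with no lag selection and no p-value. The function name `engle_granger_tstat` and the critical value of about -3.34 (5% level, two series, constant included, taken from MacKinnon's tables) are assumptions of this example. The `coint` function used in this notebook handles those details properly.

```python
import numpy as np

def engle_granger_tstat(x, y):
    """Return (hedge_ratio, t_stat) from a bare-bones two-step Engle-Granger procedure."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: OLS of y on x (with intercept) estimates the long-run relationship.
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)

    # Step 2: Dickey-Fuller regression on the residuals (no lags, no constant):
    #   diff(resid)_t = gamma * resid_{t-1} + error_t
    lagged = resid[:-1]
    delta = np.diff(resid)
    gamma = (lagged @ delta) / (lagged @ lagged)
    errors = delta - gamma * lagged
    sigma2 = (errors @ errors) / (len(delta) - 1)
    se_gamma = np.sqrt(sigma2 / (lagged @ lagged))
    return slope, gamma / se_gamma

# Synthetic cointegrated pair: y tracks 2*x plus stationary noise.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))
y = 2.0 * x + rng.normal(size=500)

hedge_ratio, t_stat = engle_granger_tstat(x, y)

# Assumed approximate 5% critical value for two series with a constant.
CRITICAL_5PCT = -3.34
print(f"hedge ratio: {hedge_ratio:.3f}, t-stat: {t_stat:.2f}")
print("reject 'no cointegration' at 5%:", t_stat < CRITICAL_5PCT)
```

A t-statistic more negative than the critical value corresponds to a small p-value in the library test.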

#### Johansen Test (Multiple stocks):
- **Cointegration rank > 0**: At least one long-run relationship found; the rank is the number of independent cointegrating relationships
- **Cointegration rank = 0**: No long-run relationships found
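
The rank is usually read off the trace statistics sequentially: test "rank ≤ 0", then "rank ≤ 1", and so on, stopping at the first hypothesis that cannot be rejected. A minimal sketch of that rule follows. The helper name `johansen_rank` and the numbers are hypothetical, chosen only for illustration.

```python
def johansen_rank(trace_stats, critical_values):
    """Sequential trace test: count leading rejections of H0: rank <= r."""
    rank = 0
    for stat, crit in zip(trace_stats, critical_values):
        if stat > crit:
            rank += 1   # H0: rank <= r rejected, so move on to r + 1
        else:
            break       # the first non-rejection fixes the rank
    return rank

# Hypothetical trace statistics and 95% critical values for three series.
example_rank = johansen_rank([35.2, 12.1, 2.0], [29.8, 15.5, 3.8])
print(example_rank)  # → 1
```

Only the first hypothesis is rejected here, so the example reports a single cointegrating relationship. A later statistic that exceeds its critical value does not count once an earlier hypothesis has been accepted.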

### Trading Strategies:
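1. **Pairs Trading**: When the spread deviates far from its mean (for example, by more than two standard deviations), trade on its expected reversion
2. **Portfolio Rebalancing**: Use cointegrated stocks to build portfolios with stable long-run relationships
3. **Risk Management**: Treat highly correlated positions that are not cointegrated with caution, since correlation alone does not imply that prices will reconverge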
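
A common way to turn spread deviations into pairs-trading signals is to standardize the spread into a z-score. This is a rough sketch, not a complete strategy: the helper `zscore_signals` and its `entry` threshold are hypothetical, and it uses the full-sample mean and standard deviation, which introduces look-ahead bias in a real backtest.

```python
import numpy as np

def zscore_signals(spread, entry=2.0):
    """Return (z, signals): -1 = short the spread, +1 = long the spread, 0 = no position."""
    spread = np.asarray(spread, dtype=float)
    # Full-sample statistics; assumes the spread is not constant (std > 0).
    z = (spread - spread.mean()) / spread.std()
    signals = np.zeros(len(spread), dtype=int)
    signals[z > entry] = -1   # spread unusually high: bet on it falling
    signals[z < -entry] = 1   # spread unusually low: bet on it rising
    return z, signals

# Toy spread with one large upward deviation at the end.
z, signals = zscore_signals([0, 0, 0, 0, 10], entry=1.5)
print(z)
print(signals)
```

In a live setting, a rolling or expanding window would typically replace the full-sample statistics.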

### Pro Tips:
- Test different time periods (1y, 2y, 5y)
- Consider sector relationships (tech stocks, bank stocks, etc.)
- Monitor changing relationships over time
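
One simple way to monitor a changing relationship is to re-estimate the hedge ratio over a rolling window and watch whether it stays roughly constant. The sketch below uses a hypothetical helper, `rolling_hedge_ratio`, on toy data where the ratio is fixed at 3 by construction.

```python
import numpy as np

def rolling_hedge_ratio(x, y, window):
    """OLS slope of y on x, re-estimated over each rolling window."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    betas = []
    for start in range(len(x) - window + 1):
        xs = x[start:start + window]
        ys = y[start:start + window]
        slope, _ = np.polyfit(xs, ys, 1)  # degree-1 fit returns (slope, intercept)
        betas.append(slope)
    return np.array(betas)

# Toy data with an exact linear relationship y = 3x + 1.
x = np.arange(10.0)
y = 3.0 * x + 1.0
betas = rolling_hedge_ratio(x, y, window=5)
print(betas)
```

On real prices, a hedge ratio that trends or jumps across windows suggests the long-run relationship is not stable over the period tested.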

## How to Use:
1. Set your stock pairs in the cell below
2. Run the cointegration analysis
3. Interpret the results (p-value < 0.05 suggests cointegration)

In [None]:
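# Cointegration Analysis Setup
# Imports used by the cells below (harmless if already imported earlier in the notebook)
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# fetch_stock_data(symbol, period) is assumed to be defined in an earlier cell

# Change these stock pairs to test different combinations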
stock_pair_1 = 'KO'  # First stock
stock_pair_2 = 'PEP'   # Second stock

# You can also test multiple stocks at once
multi_stocks = ['AAPL', 'MSFT', 'GOOGL', 'AMZN']  # Portfolio of stocks to test

print(f"Setting up cointegration analysis for:")
print(f"Pair analysis: {stock_pair_1} vs {stock_pair_2}")
print(f"Multi-stock analysis: {multi_stocks}")

In [None]:
# Pairwise cointegration analysis
print("PAIRWISE COINTEGRATION ANALYSIS")
print("="*60)

# Fetch data for both stocks
pair_symbols = [stock_pair_1, stock_pair_2]
cointegration_data = {}

for symbol in pair_symbols:
    try:
        data_temp = fetch_stock_data(symbol, "1y")
        if not data_temp.empty:
            cointegration_data[symbol] = data_temp['Close']
        else:
            print(f"Warning: No data found for {symbol}")
    except Exception as e:
        print(f"Error fetching {symbol}: {e}")

# Perform Engle-Granger test if we have both stocks
if len(cointegration_data) >= 2:
    stock1_prices = cointegration_data[stock_pair_1]
    stock2_prices = cointegration_data[stock_pair_2]
    
    # Align the data
    aligned_data = pd.concat([stock1_prices, stock2_prices], axis=1).dropna()
    
    if aligned_data.shape[0] >= 30:
        series1 = aligned_data.iloc[:, 0]
        series2 = aligned_data.iloc[:, 1]
        
        # Engle-Granger test; coint returns the test statistic, the p-value,
        # and critical values at the 1%, 5%, and 10% levels
        score, p_value, critical_values = coint(series1, series2)
        
        # Determine conclusion from the significance level at which
        # the null hypothesis of no cointegration is rejected
        if p_value < 0.01:
            conclusion = "STRONG evidence of cointegration (significant at the 1% level)"
            trading_signal = "EXCELLENT for pairs trading"
        elif p_value < 0.05:
            conclusion = "GOOD evidence of cointegration (significant at the 5% level)"
            trading_signal = "GOOD for pairs trading"
        elif p_value < 0.10:
            conclusion = "WEAK evidence of cointegration (significant at the 10% level)"
            trading_signal = "RISKY for pairs trading"
        else:
            conclusion = 'NO significant evidence of cointegration'
            trading_signal = 'NOT suitable for pairs trading'
        
        print(f"\nCointegration Test Results:")
        print(f"Test Score: {score:.4f}")
        print(f"P-value: {p_value:.4f}")
        print(f"Conclusion: {conclusion}")
        print(f"Trading Signal: {trading_signal}")
        
        if p_value < 0.05:
            print("The test rejects 'no cointegration' at the 5% level for this sample")
        else:
            print("The test does NOT reject 'no cointegration' at the 5% level for this sample")
            
        # Plot the analysis
        fig, axes = plt.subplots(2, 2, figsize=(15, 10))
        fig.suptitle(f'Cointegration Analysis: {stock_pair_1} vs {stock_pair_2}', fontsize=16, fontweight='bold')
        
        # Plot 1: Raw prices
        axes[0, 0].plot(series1.index, series1, label=stock_pair_1, linewidth=2)
        axes[0, 0].plot(series2.index, series2, label=stock_pair_2, linewidth=2)
        axes[0, 0].set_title('Price Series')
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3)
        
        # Plot 2: Normalized prices
        norm_series1 = (series1 / series1.iloc[0]) * 100
        norm_series2 = (series2 / series2.iloc[0]) * 100
        axes[0, 1].plot(norm_series1.index, norm_series1, label=f'{stock_pair_1} (Normalized)', linewidth=2)
        axes[0, 1].plot(norm_series2.index, norm_series2, label=f'{stock_pair_2} (Normalized)', linewidth=2)
        axes[0, 1].set_title('Normalized Price Series (Base = 100)')
        axes[0, 1].legend()
        axes[0, 1].grid(True, alpha=0.3)
        
        # Plot 3: Scatter plot
        axes[1, 0].scatter(series1, series2, alpha=0.6, s=20)
        slope, intercept, r_value, p_val_reg, std_err = stats.linregress(series1, series2)
        line = slope * series1 + intercept
        axes[1, 0].plot(series1, line, 'r', linewidth=2, label=f'R² = {r_value**2:.3f}')
        axes[1, 0].set_xlabel(f'{stock_pair_1} Price ($)')
        axes[1, 0].set_ylabel(f'{stock_pair_2} Price ($)')
        axes[1, 0].set_title('Price Relationship Scatter Plot')
        axes[1, 0].legend()
        axes[1, 0].grid(True, alpha=0.3)
        
        # Plot 4: Spread
        spread = series2 - (slope * series1 + intercept)
        axes[1, 1].plot(spread.index, spread, color='purple', linewidth=1)
        axes[1, 1].axhline(y=0, color='black', linestyle='--', alpha=0.5)
        axes[1, 1].axhline(y=spread.std(), color='red', linestyle='--', alpha=0.7, label='+1 Std Dev')
        axes[1, 1].axhline(y=-spread.std(), color='red', linestyle='--', alpha=0.7, label='-1 Std Dev')
        axes[1, 1].set_title('Spread (Residuals) - Trading Signal')
        axes[1, 1].legend()
        axes[1, 1].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
    else:
        print("Error: Not enough data points for reliable cointegration test")
else:
    print("Could not fetch sufficient data for both stocks")

## Multi-Stock Cointegration

In [None]:
# Multi-stock cointegration analysis
print("Multi-stock cointegration analysis")
print("="*60)

# Fetch data for multiple stocks
multi_data = {}
for symbol in multi_stocks:
    try:
        data_temp = fetch_stock_data(symbol, "1y")
        if not data_temp.empty:
            multi_data[symbol] = data_temp['Close']
        else:
            print(f"Warning: No data found for {symbol}")
    except Exception as e:
        print(f"Error fetching {symbol}: {e}")

if len(multi_data) >= 2:
    # Convert to list for easier handling
    price_series = [multi_data[symbol] for symbol in multi_stocks if symbol in multi_data]
    available_stocks = [symbol for symbol in multi_stocks if symbol in multi_data]
    
    print(f"Analyzing {len(available_stocks)} stocks: {', '.join(available_stocks)}")
    
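    # Prepare data for Johansen test; keys= names each column by its ticker,
    # since every fetched series would otherwise share the name 'Close'
    aligned_data = pd.concat(price_series, axis=1, keys=available_stocks).dropna()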
    
    if aligned_data.shape[0] >= 50:
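        # Perform Johansen test (det_order=0: constant term, k_ar_diff=1: one lagged difference)
        johansen_result = coint_johansen(aligned_data, 0, 1)
        trace_stats = johansen_result.lr1
        critical_values_trace = johansen_result.cvt  # columns: 90%, 95%, 99%
        
        # Sequential trace test at the 95% level: keep rejecting H0 "rank <= i"
        # until the first non-rejection, which fixes the rank
        cointegration_rank = 0
        for i in range(len(trace_stats)):
            if trace_stats[i] > critical_values_trace[i, 1]:
                cointegration_rank = i + 1
            else:
                break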
        
        print(f"\nJohansen Test Results:")
        print(f"Cointegration Rank: {cointegration_rank}")
        
        # Plot multi-stock analysis
        normalized_data = aligned_data.div(aligned_data.iloc[0]) * 100
        
        fig, axes = plt.subplots(2, 2, figsize=(16, 12))
        
        # Plot 1: Raw prices
        for i, col in enumerate(aligned_data.columns):
            axes[0, 0].plot(aligned_data.index, aligned_data[col], label=available_stocks[i], linewidth=2)
        axes[0, 0].set_title('Raw Price Series')
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3)
        
        # Plot 2: Normalized prices
        for i, col in enumerate(normalized_data.columns):
            axes[0, 1].plot(normalized_data.index, normalized_data[col], label=f'{available_stocks[i]} (Norm)', linewidth=2)
        axes[0, 1].set_title('Normalized Price Series (Base = 100)')
        axes[0, 1].legend()
        axes[0, 1].grid(True, alpha=0.3)
        
        # Plot 3: Correlation matrix
        correlation_matrix = aligned_data.corr()
        im = axes[1, 0].imshow(correlation_matrix, cmap='RdYlBu', vmin=-1, vmax=1)
        fig.colorbar(im, ax=axes[1, 0])  # color scale for the correlation values
        axes[1, 0].set_xticks(range(len(available_stocks)))
        axes[1, 0].set_yticks(range(len(available_stocks)))
        axes[1, 0].set_xticklabels(available_stocks, rotation=45)
        axes[1, 0].set_yticklabels(available_stocks)
        axes[1, 0].set_title('Correlation Matrix')
        for i in range(len(available_stocks)):
            for j in range(len(available_stocks)):
                axes[1, 0].text(j, i, f'{correlation_matrix.iloc[i, j]:.2f}', 
                              ha="center", va="center", color="black", fontweight='bold')
        
        # Plot 4: Rolling correlation
        if len(available_stocks) >= 2:
            rolling_corr = aligned_data.iloc[:, 0].rolling(30).corr(aligned_data.iloc[:, 1])
            axes[1, 1].plot(rolling_corr.index, rolling_corr, linewidth=2, color='purple')
            axes[1, 1].set_title(f'30-Day Rolling Correlation\n{available_stocks[0]} vs {available_stocks[1]}')
            axes[1, 1].axhline(y=0.8, color='green', linestyle='--', alpha=0.7, label='High Correlation')
            axes[1, 1].axhline(y=0.5, color='orange', linestyle='--', alpha=0.7, label='Medium Correlation')
            axes[1, 1].legend()
            axes[1, 1].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Additional insights
        if cointegration_rank > 0:
            print(f"\nPORTFOLIO INSIGHTS:")
            print(f"{'='*40}")
            print(f"Cointegration rank: {cointegration_rank}")
            print(f"Trading opportunities:")
            print(f"   • Monitor for temporary divergences from long-run relationship")
            print(f"   • Consider mean-reversion strategies")
            print(f"   • Use portfolio approach rather than individual stock picks")
    else:
        print("Error: Not enough data points for reliable Johansen test")
else:
    print("Could not fetch sufficient data for multi-stock analysis")