<a href="https://colab.research.google.com/github/JianfengMI/MLprojects/blob/main/Engulfing_bar_strategy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Engulfing bar strategy

An engulfing bar strategy involves identifying a two-candle pattern where the second candle's body completely engulfs the first candle's body, signaling a potential trend reversal. To execute, identify the pattern at a key support/resistance level or within a prevailing trend, place a stop-loss below the low of a bullish engulfing pattern or above the high of a bearish engulfing pattern, and aim for a specific profit target, often using a risk-reward ratio of 2:1 or higher. Combining this with other confirmations, such as momentum indicators or volume, can increase reliability.
## Bullish engulfing strategy

- Pattern: A smaller, red (bearish) candle is followed by a larger, green (bullish) candle that opens at or below the first candle's low and closes above its high.
- Context: Look for this pattern at the bottom of a downtrend or at a support level, signaling a potential move to the upside.
- Entry: Enter a buy trade at the opening of the candle following the pattern's formation.
- Stop-loss: Place the stop-loss below the low of the bullish engulfing candle.
- Take-profit: Set a profit target, aiming for at least twice the amount risked.
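
The pattern rule above can be sketched with pandas. This is only a minimal illustration, assuming an OHLC DataFrame with `Open`, `High`, `Low` and `Close` columns; the function name `bullish_engulfing` and the sample prices are made up for the example.

```python
import pandas as pd

def bullish_engulfing(ohlc: pd.DataFrame) -> pd.Series:
    """Flag bars that complete the bullish engulfing pattern described above."""
    prev = ohlc.shift(1)  # previous candle, aligned to the current row
    prev_bearish = prev["Close"] < prev["Open"]
    curr_bullish = ohlc["Close"] > ohlc["Open"]
    opens_at_or_below_prev_low = ohlc["Open"] <= prev["Low"]
    closes_above_prev_high = ohlc["Close"] > prev["High"]
    return (prev_bearish & curr_bullish
            & opens_at_or_below_prev_low & closes_above_prev_high)

# Hypothetical prices: the third candle engulfs the second one.
sample = pd.DataFrame({
    "Open":  [10.0, 9.8, 9.2],
    "High":  [10.2, 9.9, 10.3],
    "Low":   [9.7, 9.3, 9.1],
    "Close": [9.8, 9.4, 10.1],
})
signals = bullish_engulfing(sample)
print(signals)
```

The bearish version would mirror every comparison (previous candle bullish, current candle opening at or above the previous high and closing below the previous low).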

## Bearish engulfing strategy

- Pattern: A smaller, green (bullish) candle is followed by a larger, red (bearish) candle that opens at or above the first candle's high and closes below its low.
- Context: Look for this pattern at the top of an uptrend or at a resistance level, signaling a potential move to the downside.
- Entry: Enter a sell trade at the opening of the candle following the pattern's formation.
- Stop-loss: Place the stop-loss above the high of the bearish engulfing candle.
- Take-profit: Set a profit target, aiming for at least twice the amount risked.
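
The stop-loss and take-profit rules for both the bullish and the bearish setup reduce to simple arithmetic. The sketch below assumes a 2:1 reward-to-risk ratio by default; `trade_levels` and the prices are hypothetical and not part of the notebook's backtest.

```python
def trade_levels(entry, stop, reward_to_risk=2.0):
    """Return (risk_per_share, take_profit) for a long or short trade.

    A stop below the entry implies a long trade, a stop above it a short trade.
    """
    risk = abs(entry - stop)
    if risk == 0:
        raise ValueError("entry and stop must differ")
    direction = 1 if entry > stop else -1
    take_profit = entry + direction * reward_to_risk * risk
    return risk, take_profit

# Long: enter at 100 with the stop below the engulfing candle's low at 98.
long_risk, long_target = trade_levels(entry=100.0, stop=98.0)
# Short: enter at 50 with the stop above the engulfing candle's high at 51.5.
short_risk, short_target = trade_levels(entry=50.0, stop=51.5)
print(long_risk, long_target, short_risk, short_target)
```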

## Enhancing the strategy

- Use higher timeframes: Look for patterns on higher timeframes like the daily or 4-hour chart to reduce market noise.
- Confirm with momentum indicators: Use indicators like the RSI or Stochastic Oscillator to confirm signals, such as oversold conditions for a bullish pattern or overbought conditions for a bearish pattern.
- Consider volume: Look for higher volume on the engulfing candle, as it can indicate stronger conviction behind the move.
- Confirm with support/resistance: The pattern is more reliable when it forms at a significant support or resistance level.
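
As one possible reading of the volume check above, a signal could be kept only when the bar's volume exceeds the average volume of the preceding bars. The 20-bar default window, the function name `volume_confirmed` and the sample numbers are assumptions made for this sketch.

```python
import pandas as pd

def volume_confirmed(volume: pd.Series, signal: pd.Series, window: int = 20) -> pd.Series:
    """Keep only signals whose volume exceeds the mean of the prior `window` bars."""
    # shift(1) keeps the current bar out of its own average
    avg_prior_volume = volume.shift(1).rolling(window).mean()
    return signal & (volume > avg_prior_volume)

volume = pd.Series([100, 110, 90, 300, 95])
signal = pd.Series([False, False, False, True, True])
confirmed = volume_confirmed(volume, signal, window=3)
print(confirmed)
```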

Here I am going to follow the simple rules below:

1. Trade entries
   - Step 1: identify an engulfing / outside bar.
   - Step 2: if its close is the lowest close of the last n bars, buy at the next open.
   - Step 3: if its close is the highest close of the last n bars, sell short at the next open.
   - Note: n = 5 or 9.

2. Trade exits
   - Step 1: after N bars in the trade, if the trade is in profit, exit at the next open.
   - Step 2: stop and reverse (optional).
   - Note: N = 3.
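
The entry rules can be sketched in vectorized form. This is an illustration under assumptions, not the loop used later in this notebook: the rolling window here includes the current bar (one interpretation of "lowest close [n]"), and `entry_signals` and the sample bars are hypothetical.

```python
import pandas as pd

def entry_signals(ohlc: pd.DataFrame, n: int = 5) -> pd.Series:
    """+1 = buy at the next open, -1 = sell short at the next open, 0 = no trade."""
    prev = ohlc.shift(1)
    # Outside bar: the current range covers the previous bar's range
    outside_bar = (ohlc["High"] >= prev["High"]) & (ohlc["Low"] <= prev["Low"])
    lowest_close = ohlc["Close"] == ohlc["Close"].rolling(n).min()
    highest_close = ohlc["Close"] == ohlc["Close"].rolling(n).max()
    signal = pd.Series(0, index=ohlc.index)
    signal[outside_bar & lowest_close] = 1
    signal[outside_bar & highest_close] = -1
    return signal

bars = pd.DataFrame({
    "High":  [10.0, 10.5, 10.4, 10.6, 11.0],
    "Low":   [9.0, 9.2, 9.6, 9.1, 9.0],
    "Close": [9.5, 10.2, 10.0, 9.3, 10.9],
})
signals = entry_signals(bars, n=3)
print(signals.tolist())
```

The signal is computed on the bar that closes; the order itself would be placed on the following bar's open.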

***Plan:***
1. Download prices for 1 or 10 stocks
2. Identify an engulfing bar
3. Check whether its close is the lowest or the highest close
4. Set the state to buying/selling and hold
5. Check the exit signal
6. Calculate returns

## 1. Import libraries and download stock prices

In [None]:
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import yfinance as yf
from datetime import datetime, timedelta
import math
from tqdm.auto import tqdm
import requests
import matplotlib.pyplot as plt

In [None]:
# set parameters
START_DATE = (datetime.today() - timedelta(days=365*5)).strftime('%Y-%m-%d') # 5y of history
END_DATE = datetime.today().strftime('%Y-%m-%d')
TICKERS = ['AAPL', 'MS', 'TSLA', 'GOOGL', 'AMZN','MSFT','QQQ','SPY','NVDA', 'BA'] # 10 tickers to start

In [None]:
# download stock prices
def download_stock_prices(tickers, start_date, end_date):
  """
  Download stock prices from Yahoo Finance in batches to avoid overloading the server.
  Returns a DataFrame indexed by date with two-level columns (ticker, field), where
  field is one of Open, High, Low, Close, Volume. Prices are adjusted (auto_adjust=True).
  """
  df = pd.DataFrame()
  batch_size = 80
  for i in tqdm(range(0, len(tickers), batch_size), desc="Downloading stock prices"):
    batch = tickers[i:i+batch_size]
    try:
      df_batch = yf.download(batch, start=start_date, end=end_date, progress=False, group_by='ticker', auto_adjust=True)

      if df_batch.empty:
        print(f'No data downloaded, skipping batch {batch}')
        continue

      # Append the new batch's columns next to those already downloaded
      if df.empty:
        df = df_batch
      else:
        df = pd.concat([df, df_batch], axis=1)

    except Exception as e:
      print(f"An error occurred during download for batch {batch}: {e}")
      continue  # move on to the next batch
  return df

In [None]:
df = download_stock_prices(TICKERS, START_DATE, END_DATE)

In [None]:
df.head()
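
Because the download is grouped by ticker, the columns form a two-level index of (ticker, field). The sketch below shows the access patterns used in the rest of the notebook on a small synthetic frame; the frame `demo` and its values are made up and only mimic that layout.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2024-01-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product(
    [["AAPL", "MSFT"], ["Open", "High", "Low", "Close", "Volume"]]
)
demo = pd.DataFrame(np.arange(30, dtype=float).reshape(3, 10),
                    index=dates, columns=columns)

aapl = demo["AAPL"]                          # all fields for one ticker
closes = demo.xs("Close", level=1, axis=1)   # one field for every ticker
tickers = demo.columns.get_level_values(0).unique().tolist()
print(tickers)
print(closes)
```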

In [None]:
data = df.copy()

In [None]:
# add columns buy, sell, and profit to the dataframe according to the engulfing bar strategy

for ticker in data.columns.levels[0]:
  close_series = data.xs('Close', level=1, axis=1)[ticker]
  # Initialize the signal columns for the current ticker:
  # buy = +1 (long entry) or -1 (short entry), sell = 1 on the exit day,
  # profit = fractional return booked on the exit day.
  data[(ticker, 'buy')] = 0
  data[(ticker, 'sell')] = 0
  data[(ticker, 'profit')] = 0.0  # float column, since returns are fractional

  # Look up the integer column positions once; .iloc needs positions, not (ticker, field) tuples
  open_col = data.columns.get_loc((ticker, 'Open'))
  high_col = data.columns.get_loc((ticker, 'High'))
  low_col = data.columns.get_loc((ticker, 'Low'))
  close_col = data.columns.get_loc((ticker, 'Close'))
  buy_col = data.columns.get_loc((ticker, 'buy'))
  sell_col = data.columns.get_loc((ticker, 'sell'))
  profit_col = data.columns.get_loc((ticker, 'profit'))

  for i in range(1, len(data)):  # start from 1 to allow comparison with i-1
    current_high = data.iloc[i, high_col]
    prev_high = data.iloc[i-1, high_col]
    current_low = data.iloc[i, low_col]
    prev_low = data.iloc[i-1, low_col]

    # Engulfing (outside) bar condition: the range covers the previous bar's range
    if (current_high >= prev_high) and (current_low <= prev_low):
      current_close = data.iloc[i, close_col]

      # Ensure enough history for the 5-day lookback for min/max close
      if i < 5:
        continue

      buy_signal_value = 0
      # Compare to the previous 5 days' closes (day i itself is excluded)
      if current_close <= close_series.iloc[i-5:i].min():
        buy_signal_value = 1  # buy signal
      elif current_close >= close_series.iloc[i-5:i].max():
        buy_signal_value = -1  # sell-short signal

      # Only act if the next day (i+1) exists in the data
      if buy_signal_value != 0 and i + 1 < len(data):
        # Enter at the next open (day i+1)
        data.iloc[i+1, buy_col] = buy_signal_value
        buy_price = data.iloc[i+1, open_col]

        # Check days i+3 to i+5 and exit at the first close that shows a profit.
        # A trade that is never in profit in this window is not booked at all,
        # so its loss does not appear in the 'profit' column.
        for j in range(3, 6):
          if i + j >= len(data):
            break  # no future data for this holding period
          holding_close = data.iloc[i+j, close_col]
          calculated_profit = buy_signal_value * (holding_close - buy_price) / buy_price

          if calculated_profit > 0:  # in profit: exit
            data.iloc[i+j, sell_col] = 1
            data.iloc[i+j, profit_col] = calculated_profit
            break
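
The loop above books a trade only when it closes in profit within the holding window. As a hedged variant, the sketch below also books a trade that never turns profitable by closing it at the last available bar of the window. The forced exit is an assumption added for illustration, not one of the rules listed earlier, and `exit_with_forced_close` is a hypothetical helper.

```python
def exit_with_forced_close(closes_after_entry, entry_price, direction, first=2, last=4):
    """Return (bar_offset, trade_return), or None if the window has no data.

    closes_after_entry[0] is the close of the entry bar, so offsets 2..4
    correspond to days i+3..i+5 in the loop above.
    direction is +1 for a long trade and -1 for a short trade.
    """
    result = None
    for offset in range(first, last + 1):
        if offset >= len(closes_after_entry):
            break  # ran out of data
        trade_return = direction * (closes_after_entry[offset] - entry_price) / entry_price
        result = (offset, trade_return)
        if trade_return > 0:
            break  # first profitable close: exit here
    # If no close was profitable, result holds the last bar checked (a booked loss)
    return result

winner = exit_with_forced_close([100, 99, 101, 102, 103], entry_price=100, direction=1)
loser = exit_with_forced_close([100, 99, 98, 97, 96], entry_price=100, direction=1)
short = exit_with_forced_close([50, 51, 49, 48, 47], entry_price=50, direction=-1)
print(winner, loser, short)
```

Booking the losing trade makes the count of unprofitable transactions and the cumulative returns reflect both outcomes.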


In [None]:
# summary statistics on the entry signals (buy = +1 for long entries, -1 for short entries)
buy_signals = data.xs('buy', level=1, axis=1)

total_transactions        = buy_signals.abs().sum()           # Total entries (long + short)
total_buys                = (buy_signals > 0).sum()           # Total long entries
total_sells               = (buy_signals < 0).sum()           # Total short entries

In [None]:
# put them in a dataframe
transaction_stats = pd.DataFrame({
    'Total Transactions': total_transactions,
    'Total Buys': total_buys,
    'Total Sells': total_sells
})

transaction_stats

In [None]:
# plot transaction_stats
transaction_stats.plot(kind='bar', figsize=(10, 5))

In [None]:
# count how many transactions were profitable
profit_stats = data.xs('profit', level=1, axis=1)

total_profitable_transactions = (profit_stats > 0).sum()
# Only profitable exits are written to the 'profit' column, so this count is always 0
total_unprofitable_transactions = (profit_stats < 0).sum()

profit_stats_df = pd.DataFrame({
    'Total profitable Transactions': total_profitable_transactions,
    'Total unprofitable Transactions': total_unprofitable_transactions
})

profit_stats_df
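
One way to relate entries to profitable exits is a per-ticker win rate. The sketch below uses small made-up `buy` and `profit` frames shaped like the ones extracted from `data`; the tickers `AAA` and `BBB` are placeholders.

```python
import pandas as pd

# Hypothetical signals: +1/-1 marks an entry, profit > 0 marks a profitable exit
buy = pd.DataFrame({"AAA": [0, 1, 0, -1, 0, 1], "BBB": [0, 0, 1, 0, 0, 0]})
profit = pd.DataFrame({"AAA": [0.0, 0.0, 0.0, 0.02, 0.0, 0.01], "BBB": [0.0] * 6})

entries = (buy != 0).sum()
wins = (profit > 0).sum()
summary = pd.DataFrame({
    "Entries": entries,
    "Profitable exits": wins,
    "Win rate": wins / entries,
})
print(summary)
```

Entries without a matching profitable exit are the trades that were never booked.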

In [None]:
# download SPY (an ETF that tracks the S&P 500) as the buy-and-hold benchmark
spy = yf.download('SPY', start=START_DATE, end=END_DATE, progress=False, group_by='ticker', auto_adjust=True)

In [None]:
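# keep the SPY close series in a plain dict so the daily returns can be stored next to it
spy = {'close': spy[('SPY', 'Close')]}

In [None]:
spy['returns'] = spy['close'].pct_change()  # the Series keeps the name ('SPY', 'Close')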

In [None]:
spy['returns']

In [None]:
# put all profit series together as a dataframe
profit_df = data.xs('profit', level=1, axis=1)

In [None]:
profit_df

In [None]:
# concat the profit_df and spy['returns']

# Ensure the indices of both DataFrames/Series are unique before concatenation
# This prevents InvalidIndexError if there are duplicate dates in the index
profit_df_unique_index = profit_df[~profit_df.index.duplicated(keep='first')]
spy_returns_unique_index = spy['returns'][~spy['returns'].index.duplicated(keep='first')]

# Perform the concatenation with the unique-indexed DataFrames/Series
profit_df = pd.concat([profit_df_unique_index, spy_returns_unique_index], axis=1)

In [None]:
profit_df.head()

In [None]:
profit_df.dropna(inplace=True)

In [None]:
# convert to monthly returns by summing the daily returns within each month
monthly_profit = profit_df.resample('M').sum()
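
Summing daily returns is an approximation of the monthly return; compounding them gives the exact figure. The sketch below compares the two on made-up numbers and groups by calendar month with `to_period("M")` rather than `resample`, purely to keep the example independent of the resample frequency alias.

```python
import pandas as pd

daily = pd.Series(
    [0.10, 0.10, -0.05],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-02-01"]),
)
month = daily.index.to_period("M")

summed = daily.groupby(month).sum()                 # approach used in the cell above
compounded = (1 + daily).groupby(month).prod() - 1  # exact compounding
print(pd.DataFrame({"summed": summed, "compounded": compounded}))
```

For small daily returns the two are close; the gap grows with the size and number of returns in a month.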

In [None]:
monthly_profit.columns

In [None]:
# change column ('SPY', 'Close') to SPY_HOLD
monthly_profit.columns = [
    'SPY_HOLD' if col == ('SPY', 'Close') else col
    for col in monthly_profit.columns
]

In [None]:
# calculate cumulative returns
monthly_profit = (1 + monthly_profit).cumprod() - 1
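
As a small worked example of the cumulative-return formula above, three hypothetical monthly returns of +10%, -5% and +2% compound as follows.

```python
import pandas as pd

monthly = pd.Series([0.10, -0.05, 0.02])
cumulative = (1 + monthly).cumprod() - 1
# 1.10 - 1 = 0.10; 1.10 * 0.95 - 1 = 0.045; 1.10 * 0.95 * 1.02 - 1 = 0.0659
print(cumulative.round(4).tolist())
```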

In [None]:
monthly_profit.head()

In [None]:
# plot each column vs column 'SPY_HOLD'

# Define cols to iterate through all ticker columns except 'SPY_HOLD'
cols = [col for col in monthly_profit.columns if col != 'SPY_HOLD']

# Create a figure with a two-column grid of axes.
# The number of rows is calculated from the number of columns to plot.
num_plots = len(cols)
num_rows = math.ceil(num_plots / 2)
fig, axes = plt.subplots(nrows=num_rows, ncols=2, figsize=(10, 4 * num_rows))

# Flatten the axes array for easier iteration if there's more than one row
axes = axes.flatten()

# Iterate through columns and plot on the corresponding axes
for i, col in enumerate(cols):
    ax = axes[i]

    ax.plot(monthly_profit[col], label=col)
    ax.plot(monthly_profit['SPY_HOLD'], label='SPY_HOLD')
    ax.set_title(f'Monthly Profit for {col} vs SPY_HOLD')
    ax.set_xlabel('Date')
    ax.set_ylabel('Cumulative Returns')
    ax.legend()

# Hide any unused subplots if num_plots is odd
for j in range(num_plots, len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout()
plt.show()

## Conclusion:
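Based on the results and plots, this strategy works well on most individual stocks (AMZN and MS are the exceptions), but not on index ETFs such as SPY and QQQ, where buy-and-hold performs much better. Note that the backtest only books trades that exit in profit, so losing trades count as zero and the strategy returns shown here are optimistic.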

In [None]:
final_returns = monthly_profit.iloc[-1]

In [None]:
# check the relationship between the number of profitable transactions and the final return
returns_per_transaction_df = pd.DataFrame({
    'Total profitable Transactions': total_profitable_transactions,
    'Final Returns': final_returns
}).dropna()  # SPY_HOLD has no transaction count, so drop it

# Capture the axes object returned by plot()
ax = returns_per_transaction_df.plot(kind='scatter',
                                x='Total profitable Transactions',
                                y='Final Returns',
                                s=returns_per_transaction_df['Final Returns'].abs() * 200,  # marker sizes must be non-negative
                                alpha=0.5,
                                figsize=(10, 5))
# Annotate each data point with its ticker
for i, row in returns_per_transaction_df.iterrows():
    ax.annotate(row.name, (row['Total profitable Transactions'], row['Final Returns']), textcoords="offset points", xytext=(0,5), ha='center')
plt.xlabel('Number of profitable transactions')
plt.ylabel('Final Returns')
plt.title('Relationship between profitable transactions and final returns')
plt.show()