# 03 - Feature Engineering & Visualization

**Objective:**  
Prepare ML-ready features for our Forex trading strategy using multi-timeframe analysis.

**Steps:**  
1. Load **4H, 1D, and 1W processed OHLCV data**  
2. Generate features:  
   - Trend detection (EMA slope)  
   - Candlestick patterns (Hammer, Doji, Engulfing)  
   - Pivot-based support & resistance  
3. Merge features into a **4H ML-ready dataset**  
4. Label trade signals (1 = Long, -1 = Short, 0 = No Trade)  
5. Visualize signals with an **annotated candlestick chart including pivots**  
6. Save datasets for **Notebook 4: Model Training**

---

# Change working directory

We need to change the working directory from the notebook's folder to its parent folder.
* `os.getcwd()` returns the current directory

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspaces/forex-mtf-strategy-predictor/jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* `os.path.dirname()` gets the parent directory
* `os.chdir()` sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'/workspaces/forex-mtf-strategy-predictor'
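
The same step can be written so that re-running the cells does not climb one level too far. This is a minimal sketch, assuming the notebooks live in a folder named `jupyter_notebooks` (as in the path printed above); the helper name `project_root_from` is hypothetical.

```python
from pathlib import Path


def project_root_from(cwd, notebooks_dirname="jupyter_notebooks"):
    """Return the parent of the notebooks folder, or `cwd` itself if already outside it."""
    cwd = Path(cwd)
    return cwd.parent if cwd.name == notebooks_dirname else cwd


# In the notebook this would be used as: os.chdir(project_root_from(os.getcwd()))
print(project_root_from("/workspaces/forex-mtf-strategy-predictor/jupyter_notebooks"))
print(project_root_from("/workspaces/forex-mtf-strategy-predictor"))
```

Both calls resolve to the same folder, so the step is safe to repeat.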

## Import Libraries and Set Up Project Paths

Here we:  
- Import Pandas, NumPy, and Plotly for data handling and visualization.  
- Import feature functions from `src/`.  
- Define paths for `processed` data and `features` output.  
- Define the currency pairs we are processing.


In [5]:
from pathlib import Path
import pandas as pd
import numpy as np
import plotly.graph_objects as go

# Import feature functions
from src.trend_analysis import calculate_trend
from src.candlestick_patterns import detect_patterns
from src.support_resistance import calculate_support_resistance, merge_multi_timeframe_features

# Paths
processed_path = Path("data/processed")
features_path = Path("data/features")
features_path.mkdir(parents=True, exist_ok=True)

# Currency pairs
pairs = ["EUR_USD", "USD_JPY", "GBP_USD", "USD_CHF", "AUD_USD", "USD_CAD", "NZD_USD"]
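
Before loading, it can help to confirm that every expected input file exists. The sketch below assumes the `<pair>_<timeframe>.csv` naming used by the loading code in this notebook; the helper name `missing_inputs` is hypothetical.

```python
from pathlib import Path


def missing_inputs(processed_path, pairs, timeframes=("4H", "1D", "1W")):
    """Return the expected `<pair>_<timeframe>.csv` paths that are not on disk."""
    processed_path = Path(processed_path)
    expected = [processed_path / f"{pair}_{tf}.csv" for pair in pairs for tf in timeframes]
    return [path for path in expected if not path.exists()]


# Example: report gaps for two pairs before starting feature engineering.
for path in missing_inputs("data/processed", ["EUR_USD", "USD_JPY"]):
    print(f"Missing input: {path}")
```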

---

## Load and Feature-Engineer All Timeframes (Single Pair Demo)

We start with **EUR/USD** to demonstrate feature engineering before batch processing all pairs:

1. Load **4H, 1D, 1W processed OHLCV data**.  
2. Generate features for each timeframe:
   - Trend signal (EMA slope)  
   - Candlestick patterns (Hammer, Doji, Engulfing)  
   - Support/Resistance using pivot points
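
As a rough illustration of the first feature, an EMA-slope trend signal can be sketched as below. This is not the code in `src/trend_analysis.py`: the function name `ema_slope_trend` and the `span` value are assumptions, and only the output column names (`ema`, `ema_diff`, `trend_signal`) are taken from the merged dataset previewed later in this notebook.

```python
import numpy as np
import pandas as pd


def ema_slope_trend(df, span=20):
    """Add `ema`, `ema_diff` and `trend_signal` (+1 rising, -1 falling, 0 flat/unknown)."""
    out = df.copy()
    out["ema"] = out["close"].ewm(span=span, adjust=False).mean()
    out["ema_diff"] = out["ema"].diff()  # slope proxy: bar-to-bar change of the EMA
    out["trend_signal"] = np.sign(out["ema_diff"]).fillna(0).astype(int)
    return out


prices = pd.DataFrame({"close": [1.10, 1.11, 1.12, 1.11, 1.09]})
print(ema_slope_trend(prices, span=3)[["close", "ema_diff", "trend_signal"]])
```

The first row has no previous EMA value, so its signal is set to 0 here; the project code may treat that case differently.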


In [6]:
pair = "EUR_USD"

# Load processed OHLC data
df_4h = pd.read_csv(processed_path / f"{pair}_4H.csv", parse_dates=["timestamp"])
df_1d = pd.read_csv(processed_path / f"{pair}_1D.csv", parse_dates=["timestamp"])
df_1w = pd.read_csv(processed_path / f"{pair}_1W.csv", parse_dates=["timestamp"])

# Trend
df_4h = calculate_trend(df_4h)
df_1d = calculate_trend(df_1d)
df_1w = calculate_trend(df_1w)

# Candlestick Patterns
df_4h = detect_patterns(df_4h)
df_1d = detect_patterns(df_1d)
df_1w = detect_patterns(df_1w)

# Pivot-based Support/Resistance
df_4h = calculate_support_resistance(df_4h, freq='W', tolerance=0.001)
df_1d = calculate_support_resistance(df_1d, freq='M', tolerance=0.001)
df_1w = calculate_support_resistance(df_1w, freq='M', tolerance=0.002)
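
For readers without access to `src/candlestick_patterns.py`, the three patterns can be approximated from body and shadow sizes. The sketch below is illustrative only: the function name `simple_patterns`, the output column names, and both thresholds (`doji_ratio`, `hammer_ratio`) are assumptions, not the project's definitions.

```python
import numpy as np
import pandas as pd


def simple_patterns(df, doji_ratio=0.1, hammer_ratio=2.0):
    """Flag Doji, Hammer and Engulfing candles using illustrative thresholds."""
    out = df.copy()
    body = (out["close"] - out["open"]).abs()
    candle_range = (out["high"] - out["low"]).replace(0, np.nan)
    upper_shadow = out["high"] - out[["open", "close"]].max(axis=1)
    lower_shadow = out[["open", "close"]].min(axis=1) - out["low"]

    # Doji: body is tiny relative to the full range.
    out["doji"] = (body / candle_range) < doji_ratio
    # Hammer: long lower shadow, short upper shadow, and not a Doji.
    out["hammer"] = (lower_shadow > hammer_ratio * body) & (upper_shadow < body) & ~out["doji"]

    # Engulfing: current body covers the previous, opposite-coloured body.
    prev_open, prev_close = out["open"].shift(1), out["close"].shift(1)
    out["bullish_engulfing"] = (
        (prev_close < prev_open) & (out["close"] > out["open"])
        & (out["open"] <= prev_close) & (out["close"] >= prev_open)
    )
    out["bearish_engulfing"] = (
        (prev_close > prev_open) & (out["close"] < out["open"])
        & (out["open"] >= prev_close) & (out["close"] <= prev_open)
    )
    return out


candles = pd.DataFrame({
    "open":  [1.100, 1.089, 1.1000, 1.1000],
    "high":  [1.101, 1.102, 1.1050, 1.1025],
    "low":   [1.089, 1.088, 1.0950, 1.0900],
    "close": [1.090, 1.101, 1.1002, 1.1020],
})
print(simple_patterns(candles)[["doji", "hammer", "bullish_engulfing"]])
```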




---

## Merge Multi-Timeframe Features into 4H Dataset

We merge the features from:
- 4H (entry timeframe)
- 1D (swing confirmation)
- 1W (macro trend / major pivots)

**Logic:**
1. Resample 1D and 1W features to 4H frequency with forward-fill.  
2. Merge them into the 4H dataframe for ML.
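
The two steps above can be sketched as follows. This is not the body of `merge_multi_timeframe_features`: the helper name `attach_higher_timeframe` is hypothetical, and the `_1D` suffix simply mirrors column names such as `trend_signal_1D` used in this notebook. A `pd.Timedelta` is passed as the resample rule to avoid depending on a particular frequency alias.

```python
import pandas as pd


def attach_higher_timeframe(df_4h, df_htf, suffix):
    """Forward-fill higher-timeframe features onto a 4-hour grid and join them to the 4H rows."""
    htf = (
        df_htf.set_index("timestamp")
        .resample(pd.Timedelta(hours=4))
        .ffill()
        .add_suffix(suffix)
    )
    return df_4h.join(htf, on="timestamp", how="left")


df_4h_demo = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=8, freq=pd.Timedelta(hours=4)),
    "close": range(8),
})
df_1d_demo = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "trend_signal": [1, -1],
})
merged_demo = attach_higher_timeframe(df_4h_demo, df_1d_demo, "_1D")
print(merged_demo)
```

4H rows that fall after the last higher-timeframe bar get `NaN` and would need to be dropped or filled. If a daily or weekly timestamp marks the bar's open, shifting the higher-timeframe features by one bar before merging keeps each 4H row from seeing values that were not yet known at that time.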


In [7]:
df_multi = merge_multi_timeframe_features(df_4h, df_1d, df_1w)
print(f"Clean dataset: {len(df_multi)} rows, {df_multi.isna().sum().sum()} NaNs left")
df_multi.head()

Clean dataset: 11254 rows, 0 NaNs left




| | timestamp | open | high | low | close | volume | ema | ema_diff | trend_signal | pattern_signal | ... | lower_shadow_1W | P_1W | S1_1W | S2_1W | S3_1W | R1_1W | R2_1W | R3_1W | at_support_1W | at_resistance_1W |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 72 | 2018-08-05 20:00:00+00:00 | 1.15688 | 1.1572 | 1.15602 | 1.15614 | 1892 | 1.161312 | -0.000544 | -1 | 0 | ... | 0.00018 | 1.166177 | 1.157213 | 1.148517 | 1.139553 | 1.174873 | 1.183837 | 1.192533 | 0.0 | 0.0 |
| 73 | 2018-08-06 00:00:00+00:00 | 1.15614 | 1.15685 | 1.15571 | 1.15638 | 3384 | 1.160843 | -0.00047 | -1 | 0 | ... | 0.00018 | 1.166177 | 1.157213 | 1.148517 | 1.139553 | 1.174873 | 1.183837 | 1.192533 | 0.0 | 0.0 |
| 74 | 2018-08-06 04:00:00+00:00 | 1.15638 | 1.15694 | 1.15496 | 1.15646 | 6204 | 1.160425 | -0.000417 | -1 | -1 | ... | 0.00018 | 1.166177 | 1.157213 | 1.148517 | 1.139553 | 1.174873 | 1.183837 | 1.192533 | 0.0 | 0.0 |
| 75 | 2018-08-06 08:00:00+00:00 | 1.15641 | 1.15673 | 1.15298 | 1.1542 | 7350 | 1.159832 | -0.000593 | -1 | 0 | ... | 0.00018 | 1.166177 | 1.157213 | 1.148517 | 1.139553 | 1.174873 | 1.183837 | 1.192533 | 0.0 | 0.0 |
| 76 | 2018-08-06 12:00:00+00:00 | 1.15419 | 1.15707 | 1.15364 | 1.15636 | 8453 | 1.159502 | -0.000331 | -1 | 0 | ... | 0.00018 | 1.166177 | 1.157213 | 1.148517 | 1.139553 | 1.174873 | 1.183837 | 1.192533 | 0.0 | 0.0 |
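
The weekly pivot columns in the preview above (`P_1W`, `S1_1W`, ...) are numerically consistent with the classic floor-trader pivot formulas. A minimal sketch of those formulas follows. The function name `classic_pivots` is hypothetical, the high, low and close in the example are back-solved from the `P_1W`, `S1_1W` and `R1_1W` values above rather than read from the data, and how `calculate_support_resistance` uses `tolerance` to set `at_support` / `at_resistance` is not shown in this notebook.

```python
def classic_pivots(high, low, close):
    """Classic floor-trader pivot levels for one period's high, low and close."""
    p = (high + low + close) / 3
    rng = high - low
    return {
        "P": p,
        "S1": 2 * p - high,
        "R1": 2 * p - low,
        "S2": p - rng,
        "R2": p + rng,
        "S3": low - 2 * (high - p),
        "R3": high + 2 * (p - low),
    }


# Inputs back-solved from the preview row; they reproduce its S2/S3/R2/R3 values.
levels = classic_pivots(high=1.175141, low=1.157481, close=1.165909)
print({name: round(value, 6) for name, value in levels.items()})
```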


---

## Label Trade Signals

- **Long (1)**: 4H trend up + 1D trend up + bullish 4H pattern + price at support (`at_support == 1`)
- **Short (-1)**: 4H trend down + 1D trend down + bearish 4H pattern + price at resistance (`at_resistance == 1`)
- **No Trade (0)**: All other cases


In [8]:
def label_trade_signals(df):
    conditions = [
        (df['trend_signal'] == 1) & (df['trend_signal_1D'] == 1) & (df['pattern_signal'] == 1) & (df['at_support'] == 1),
        (df['trend_signal'] == -1) & (df['trend_signal_1D'] == -1) & (df['pattern_signal'] == -1) & (df['at_resistance'] == 1)
    ]
    choices = [1, -1]
    df['signal'] = np.select(conditions, choices, default=0)
    return df

df_multi = label_trade_signals(df_multi)
df_multi[['timestamp','close','trend_signal','trend_signal_1D','pattern_signal','signal']].head(20)

| | timestamp | close | trend_signal | trend_signal_1D | pattern_signal | signal |
|---|---|---|---|---|---|---|
| 72 | 2018-08-05 20:00:00+00:00 | 1.15614 | -1 | -1.0 | 0 | 0 |
| 73 | 2018-08-06 00:00:00+00:00 | 1.15638 | -1 | -1.0 | 0 | 0 |
| 74 | 2018-08-06 04:00:00+00:00 | 1.15646 | -1 | -1.0 | -1 | 0 |
| 75 | 2018-08-06 08:00:00+00:00 | 1.1542 | -1 | -1.0 | 0 | 0 |
| 76 | 2018-08-06 12:00:00+00:00 | 1.15636 | -1 | -1.0 | 0 | 0 |
| 77 | 2018-08-06 16:00:00+00:00 | 1.15542 | -1 | -1.0 | 0 | 0 |
| 78 | 2018-08-06 20:00:00+00:00 | 1.15582 | -1 | -1.0 | 0 | 0 |
| 79 | 2018-08-07 00:00:00+00:00 | 1.15556 | -1 | -1.0 | -1 | 0 |
| 80 | 2018-08-07 04:00:00+00:00 | 1.15773 | -1 | -1.0 | 1 | 0 |
| 81 | 2018-08-07 08:00:00+00:00 | 1.16019 | 1 | -1.0 | 0 | 0 |
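
Because a signal requires all four conditions to hold on the same bar, most rows are likely to be labelled 0 (every row in the preview above is). A quick way to check the class balance before training, shown on a toy frame; the helper name `signal_balance` is hypothetical.

```python
import pandas as pd


def signal_balance(df, column="signal"):
    """Return the count and share of each label (-1, 0, 1), including labels that never occur."""
    counts = df[column].value_counts().reindex([-1, 0, 1], fill_value=0)
    return pd.DataFrame({"count": counts, "share": counts / len(df)})


toy = pd.DataFrame({"signal": [0, 0, 0, 1, 0, -1, 0, 0]})
print(signal_balance(toy))
```

In this notebook the call would be `signal_balance(df_multi)`.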


---

## Visualize 4H Candlestick Chart with Signals & Weekly Pivots
- Green markers = Long signals
- Red markers = Short signals
- Blue dotted line = Pivot (P)
- Green dotted line = Support (S1)
- Red dotted line = Resistance (R1)


In [9]:
fig = go.Figure(data=[go.Candlestick(
    x=df_multi['timestamp'],
    open=df_multi['open'],
    high=df_multi['high'],
    low=df_multi['low'],
    close=df_multi['close'],
    name='4H Candlestick'
)])

# Long signals
longs = df_multi[df_multi['signal'] == 1]
fig.add_trace(go.Scatter(
    x=longs['timestamp'], y=longs['close'],
    mode='markers', marker=dict(color='green', size=8),
    name='Long Signal'
))

# Short signals
shorts = df_multi[df_multi['signal'] == -1]
fig.add_trace(go.Scatter(
    x=shorts['timestamp'], y=shorts['close'],
    mode='markers', marker=dict(color='red', size=8),
    name='Short Signal'
))

# Weekly Pivots
fig.add_trace(go.Scatter(
    x=df_multi['timestamp'], y=df_multi['P'],
    mode='lines', line=dict(color='blue', width=1, dash='dot'),
    name='Pivot (P)'
))
fig.add_trace(go.Scatter(
    x=df_multi['timestamp'], y=df_multi['S1'],
    mode='lines', line=dict(color='green', width=1, dash='dot'),
    name='Support (S1)'
))
fig.add_trace(go.Scatter(
    x=df_multi['timestamp'], y=df_multi['R1'],
    mode='lines', line=dict(color='red', width=1, dash='dot'),
    name='Resistance (R1)'
))

fig.update_layout(
    title=f"{pair} 4H Multi-Timeframe Signals with Weekly Pivots",
    xaxis_rangeslider_visible=False,
    yaxis_title='Price'
)
fig.show()

---

## Save Feature-Engineered Dataset

We now save the multi-timeframe feature dataset to `data/features/`  
for use in Notebook 4 (Model Training).

In [10]:
df_multi.to_csv(features_path / f"{pair}_multi_tf_features.csv", index=False)
print(f"Saved multi-timeframe features for {pair}: {len(df_multi)} rows")

Saved multi-timeframe features for EUR_USD: 11254 rows
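
CSV does not preserve dtypes, so whichever notebook reads these files back needs to parse `timestamp` again. A minimal round-trip sketch using a temporary file; the toy frame and the file name are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

toy_features = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-08-05 20:00", "2018-08-06 00:00"], utc=True),
    "close": [1.15614, 1.15638],
    "signal": [0, 0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "EXAMPLE_multi_tf_features.csv"
    toy_features.to_csv(path, index=False)
    # Without parse_dates, the timestamp column would come back as plain strings.
    reloaded = pd.read_csv(path, parse_dates=["timestamp"])

print(reloaded.dtypes)
```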


---

## Batch Process All Currency Pairs
Loop through all pairs:
1. Load 4H, 1D, 1W
2. Generate features
3. Merge & clean
4. Label signals
5. Save ML-ready feature dataset


In [11]:
for pair in pairs:
    # Load
    df_4h = pd.read_csv(processed_path / f"{pair}_4H.csv", parse_dates=["timestamp"])
    df_1d = pd.read_csv(processed_path / f"{pair}_1D.csv", parse_dates=["timestamp"])
    df_1w = pd.read_csv(processed_path / f"{pair}_1W.csv", parse_dates=["timestamp"])

    # Feature Engineering
    df_4h = calculate_trend(df_4h)
    df_1d = calculate_trend(df_1d)
    df_1w = calculate_trend(df_1w)

    df_4h = detect_patterns(df_4h)
    df_1d = detect_patterns(df_1d)
    df_1w = detect_patterns(df_1w)

    df_4h = calculate_support_resistance(df_4h, freq='W', tolerance=0.001)
    df_1d = calculate_support_resistance(df_1d, freq='M', tolerance=0.001)
    df_1w = calculate_support_resistance(df_1w, freq='M', tolerance=0.002)

    # Merge & Label
    df_multi = merge_multi_timeframe_features(df_4h, df_1d, df_1w)
    df_multi = label_trade_signals(df_multi)

    # Save
    df_multi.to_csv(features_path / f"{pair}_multi_tf_features.csv", index=False)
    print(f"Saved multi-timeframe features for {pair}: {len(df_multi)} rows")


'M' is deprecated and will be removed in a future version, please use 'ME' instead.

'H' is deprecated and will be removed in a future version, please use 'h' instead.

Saved multi-timeframe features for EUR_USD: 11254 rows
Saved multi-timeframe features for USD_JPY: 11254 rows
Saved multi-timeframe features for GBP_USD: 11254 rows
Saved multi-timeframe features for USD_CHF: 11254 rows
Saved multi-timeframe features for AUD_USD: 11254 rows
Saved multi-timeframe features for USD_CAD: 11254 rows
Saved multi-timeframe features for NZD_USD: 11258 rows
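
The deprecation warnings in the output above come from the `'M'` and `'H'` frequency aliases, which newer pandas versions replace with `'ME'` and `'h'`. A version-tolerant sketch for the month-end case picks whichever alias the installed pandas accepts; the helper name `month_end_alias` is hypothetical.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def month_end_alias():
    """Return 'ME' where the installed pandas accepts it, otherwise the older 'M'."""
    try:
        to_offset("ME")
        return "ME"
    except ValueError:
        return "M"


idx = pd.date_range("2024-01-01", periods=90, freq="D")
series = pd.Series(range(90), index=idx)
monthly_max = series.resample(month_end_alias()).max()
print(monthly_max)
```

In this notebook the alias could be passed as `freq=month_end_alias()` to `calculate_support_resistance`. The `'H'` warning originates from the `resample('4H')` calls inside the merge function, so that change would have to be made in `src/`.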


# Push files to Repo

### 1. Check current git status

In [12]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   jupyter_notebooks/03_feature_engineering.ipynb
	deleted:    src/signal_generator.py
	modified:   src/support_resistance.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	data/features/

no changes added to commit (use "git add" and/or "git commit -a")


### 2. Stage all new/updated files

In [13]:
!git add .

### 3. Commit with a descriptive message

In [14]:
!git commit -m "Features engineered by added support resistance via pivots, identify candle patterns and analyse trend"

[main 803fbdf] Features engineered by added support resistance via pivots, identify candle patterns and analyse trend
 10 files changed, 125824 insertions(+), 59 deletions(-)
 create mode 100644 data/features/AUD_USD_multi_tf_features.csv
 create mode 100644 data/features/EUR_USD_multi_tf_features.csv
 create mode 100644 data/features/GBP_USD_multi_tf_features.csv
 create mode 100644 data/features/NZD_USD_multi_tf_features.csv
 create mode 100644 data/features/USD_CAD_multi_tf_features.csv
 create mode 100644 data/features/USD_CHF_multi_tf_features.csv
 create mode 100644 data/features/USD_JPY_multi_tf_features.csv
 delete mode 100644 src/signal_generator.py
