## Creating Initial Balance (IB) Lines

The **Initial Balance (IB)** is a very simple but useful concept for intraday trading.

For **each trading day** we do the following:

1. Look only at the **first 60 minutes** of the regular U.S. stock market session.
2. This time window is **09:30–10:30 U.S. Eastern Time (ET)**.
3. Inside this window, we find:
   - the **highest price** → this is the **IB high**
   - the **lowest price** → this is the **IB low**

So, for **every single day**, the Initial Balance is just:
> **A price range defined by the first hour’s high and low.**


### Why is the First Hour Important?

Many intraday traders focus on the **first hour** because:

- A big part of **institutional trading** (funds, banks, large players) happens early.
- This often creates:
  - **High volume**
  - **Strong price moves**
- The **IB high** and **IB low** can act as:
  - **Support** (a floor where price may bounce)
  - **Resistance** (a ceiling where price may reverse)

In simple terms:

- The first hour shows the **initial battle** between buyers and sellers.
- The Initial Balance gives a **compact summary** of this battle.
- We later use these IB levels to detect:
  - **Strong bursts**
  - **Possible intraday direction**


### What Exactly Do We Compute?

For **each trading day**, between **09:30 and 10:30 ET**, we compute:

- **IB high** → the **maximum price** in that first hour  
- **IB low** → the **minimum price** in that first hour  

So, the result for a single day is:

- **One pair:**  
  - **(IB high, IB low)**


### How Many Initial Balance Pairs Do We Have?

- Our dataset has **55 trading days**.
- For each trading day, we calculate **one IB high** and **one IB low**.
- Therefore, we have:

> **55 Initial Balance pairs**  
> (one pair of **IB high** and **IB low** for each of the 55 days).


In [108]:
from pathlib import Path
import pandas as pd

In [109]:
# Now I can use our cleaned data which we made in "00_data_setup.ipynb"

PROJECT_ROOT = Path("..").resolve()

DATA_RAW = PROJECT_ROOT / "data" / "raw"
DATA_CLEAN = PROJECT_ROOT / "data" / "clean"

CLEAN_FILE = DATA_CLEAN / "spy_1min_et_clean.csv"

df_clean = pd.read_csv(CLEAN_FILE, parse_dates=['datetime'])

df_clean.head()

Unnamed: 0,datetime,high,low,close,Volume
0,2025-09-08 09:30:00,648.86,648.24,648.26,141588
1,2025-09-08 09:31:00,648.45,648.15,648.27,42118
2,2025-09-08 09:32:00,648.46,648.1,648.26,37143
3,2025-09-08 09:33:00,648.47,648.23,648.4,42231
4,2025-09-08 09:34:00,648.68,648.32,648.665,23659


In [110]:
# We must define initial hour candles from all data

# We are creating filter variable for all cleaned data to just looking at the time frames between 09:30 - 10:30
is_ib = (df_clean["datetime"].dt.hour == 9) | (df_clean["datetime"].dt.hour == 10) & (df_clean["datetime"].dt.minute <= 30)

# After creating variable, we define this filter in our cleaned data
df_ib = df_clean[is_ib]

df_ib.head(184)



Unnamed: 0,datetime,high,low,close,Volume
0,2025-09-08 09:30:00,648.86,648.24,648.260,141588
1,2025-09-08 09:31:00,648.45,648.15,648.270,42118
2,2025-09-08 09:32:00,648.46,648.10,648.260,37143
3,2025-09-08 09:33:00,648.47,648.23,648.400,42231
4,2025-09-08 09:34:00,648.68,648.32,648.665,23659
...,...,...,...,...,...
837,2025-09-10 10:27:00,654.29,654.13,654.140,26221
838,2025-09-10 10:28:00,654.25,653.92,653.960,27867
839,2025-09-10 10:29:00,654.02,653.91,653.940,22403
840,2025-09-10 10:30:00,654.03,653.82,653.930,31068


## 1) Summarizing the Initial Balance

First, we took our full intraday dataset and **filtered out only the 1-minute candles that belong to the Initial Balance (IB) window**.

- The IB window is the **first 60 minutes** of the regular session  
  → **09:30–10:30 U.S. Eastern Time**

So now, this special dataframe contains:

- Only **1-minute bars**
- Only for **09:30–10:30**
- For **each trading day**


### Why do we summarize the Initial Balance?

The next step is to turn each day’s Initial Balance into **four simple numbers**:

- **ib_high**
- **ib_low**
- **ib_mid**
- **ib_width**

Instead of looking at **60 separate one-minute bars** for each day, we compress this noisy data into a **clean summary**.  
This summary acts like a **simple intraday “map”** of the first hour.

Later, when we add **AVWAP lines** and **directional signals**, we will interpret them **relative to these IB levels**.  
This helps us:

- Classify **day types** (trend day, range day, etc.)
- Spot **continuation setups** (trend continues)
- Spot **reversal setups** (trend may flip)


### The four IB features

#### **ib_high**

- Definition:  
  **The highest price reached during the IB window (09:30–10:30).**

- Intuition:  
  This level is the **upper edge** of the early auction between buyers and sellers.

- Why it matters:  
  Later in the day, price often **reacts** to this level:
  - **Rejection** → price hits ib_high and fails → possible resistance  
  - **Breakout** → price breaks above ib_high and holds → possible trend continuation


#### **ib_low**

- Definition:  
  **The lowest price reached during the IB window.**

- Intuition:  
  This level is the **lower edge** of the early auction.

- Why it matters:  
  It often acts as a **natural support level**:
  - If price bounces from ib_low → buyers defend the level  
  - If price breaks below ib_low → possible downside expansion or trend day down


#### **ib_mid**

- Definition:  
  **The midpoint between ib_high and ib_low:**

  $\text{ib\_mid} = \frac{\text{ib\_high} + \text{ib\_low}}{2}$

- Intuition:  
  This is like a **“fair price”** for the early session.

- How we use it:

  - If **current price > ib_mid** → the market has a **bullish tilt**
  - If **current price < ib_mid** → the market has a **bearish tilt**

So ib_mid helps us quickly see **which side controls the early battle** (buyers or sellers).


#### **ib_width**

- Definition:  
  **The distance between ib_high and ib_low:**

  $\text{ib\_width} = \text{ib\_high} - \text{ib\_low}$

- Intuition:  
  This tells us **how wide or narrow** the first-hour range is → a simple measure of **early volatility**.

- Interpretation:

  - **Narrow ib_width**:
    - The first hour was **quiet and tight**
    - The market may have **stored potential energy**
    - There is often room for a **range expansion** later in the day

  - **Wide ib_width**:
    - The first hour was already **very active**
    - A big move may have already happened
    - There might be **less room** for further large moves, or later action may be more **two-sided** (back and forth)


In [111]:
#The logic is very simple when finding ib_high and ib_low which are 'high'est highs and 'low'est lows in our filtered dataframe (df_ib)
ib_stats = df_ib.groupby(df_ib["datetime"].dt.date).agg(
    ib_high=("high", "max"),
    ib_low=("low", "min")
).reset_index()

#ib_mid is the middle line between ib_high and ib_low
ib_stats["ib_mid"] = 0.5 * (ib_stats["ib_high"] + ib_stats["ib_low"])

#ib_width represents the distance
ib_stats["ib_width"] = ib_stats["ib_high"] - ib_stats["ib_low"]

ib_stats.head()


Unnamed: 0,datetime,ib_high,ib_low,ib_mid,ib_width
0,2025-09-08,649.06,647.75,648.405,1.31
1,2025-09-09,649.72,648.43,649.075,1.29
2,2025-09-10,654.55,652.7,653.625,1.85
3,2025-09-11,656.25,653.59,654.92,2.66
4,2025-09-12,658.19,657.14,657.665,1.05


In [112]:
# We need to define whether ib_width is large or small relative to our data
import numpy as np

median_width = ib_stats["ib_width"].median()

ib_stats["ib_width_type"] = np.where(
    ib_stats["ib_width"] <= median_width, "narrow", "wide"
)

ib_stats.head()

# IMPORTANT REMINDER: THIS CLASSIFICATION IS JUST FOR OUR DATA. IN MORE LARGER (MORE PAST) DATA SAMPLES OUR "WIDE" IB_WIDTHS CAN BE 
# CONSIDERED AS NARROW, BECAUSE WE ARE JUST CONSIDERING 55 DISTINCT TRADING DAYS.  

Unnamed: 0,datetime,ib_high,ib_low,ib_mid,ib_width,ib_width_type
0,2025-09-08,649.06,647.75,648.405,1.31,narrow
1,2025-09-09,649.72,648.43,649.075,1.29,narrow
2,2025-09-10,654.55,652.7,653.625,1.85,narrow
3,2025-09-11,656.25,653.59,654.92,2.66,wide
4,2025-09-12,658.19,657.14,657.665,1.05,narrow


## 2) Adding end-of-day close and gap information

So far, our **IB summary (ib_stats)** tells us only about the **start of the day**  
→ what happened in the **first hour (09:30–10:30 ET)**.

But a trading day is a **full story**:

- It **starts** with the Initial Balance (IB)
- It **ends** with the **daily close price**

To really understand the day, we need **both**.



### Why do we add the daily close?

We add the **end-of-day close price** to the IB table because:

- The IB shows us **how the day begins**
- The **close** shows us **how the day finishes**

For our hypotheses, we care about:

- Did the market **continue** in the same direction as the IB?
- Did it **reverse** later in the day?
- Did it just **mean-revert** back toward the middle?

To answer these questions, we must know:

- **Where the day closes** (final price of the session)
- **How big the “gap” was** from the **previous day’s close** to **today’s open/IB**



### What is gap in this context?

- **Gap** = the difference between **today’s opening zone** and **yesterday’s close**  
- It tells us if the day started:
  - **Above** yesterday’s close → **gap up**
  - **Below** yesterday’s close → **gap down**
  - Very close → **almost no gap**

Big gaps can signal:

- Strong **overnight sentiment** (news, macro, earnings, etc.)
- Potential for:
  - **Trend days** (if the gap holds and extends)
  - **Fade/mean-reversion** trades (if the gap gets filled)



### What does merging `day_close` into `ib_stats` give us?

We take:

- **`ib_stats`** → contains:
  - ib_high, ib_low, ib_mid, ib_width (IB structure)
- **`day_close`** → contains:
  - The **final close price** for each day  
  - (and we can also derive the **gap** using previous day’s close)

Then we **merge `day_close` into `ib_stats`** so that **each row (each day)** now includes:

- **IB information at the start:**
  - **ib_high**
  - **ib_low**
  - **ib_mid**
  - **ib_width**
- **End-of-day information:**
  - **day_close**
  - **gap** (from previous close to today’s session)



### Why is this powerful for our tests?

Once each day is a **complete unit**:

> **“How the day starts (IB) + how the day ends (close + gap)”**

we can test questions like:

- Do **certain IB shapes** (wide/narrow, skewed up/down) lead to:
  - More **trend days**?
  - More **reversal days**?
- Do **large gaps** combined with specific IB behavior:
  - Continue in the same direction?
  - Get **faded** (mean-revert back toward the prior close)?
- Does closing **above ib_high** or **below ib_low** correlate with:
  - Strong directional conviction?


In [113]:
#Adding day_close to our ib_stats dataframe

day_close = df_clean.groupby(df_ib["datetime"].dt.date)["close"].last().reset_index().rename(
    columns={"close": "day_close"}
)
ib_stats = ib_stats.merge(day_close, on="datetime")

ib_stats.head(20)


Unnamed: 0,datetime,ib_high,ib_low,ib_mid,ib_width,ib_width_type,day_close
0,2025-09-08,649.06,647.75,648.405,1.31,narrow,649.0
1,2025-09-09,649.72,648.43,649.075,1.29,narrow,648.64
2,2025-09-10,654.55,652.7,653.625,1.85,narrow,653.93
3,2025-09-11,656.25,653.59,654.92,2.66,wide,656.02
4,2025-09-12,658.19,657.14,657.665,1.05,narrow,657.855
5,2025-09-15,660.75,659.475,660.1125,1.275,narrow,660.74
6,2025-09-16,661.78,659.63,660.705,2.15,narrow,659.92
7,2025-09-17,660.65,659.07,659.86,1.58,narrow,659.17
8,2025-09-18,664.67,660.27,662.47,4.4,wide,664.67
9,2025-09-19,662.7,661.31,662.005,1.39,narrow,661.9


In [114]:
#Adding previous day logic, so we can directly see previous day's close and understand the gap between today's close
# At the first row there is no previous day, so it returns NAN

# We get previous day from datetime
ib_stats["prev_day"] = ib_stats["datetime"].shift(1)

# We get previous day's close from day_close column
ib_stats["prev_close"] = ib_stats["day_close"].shift(1)

# Creating new column named gap shows us: day_close - prev_close value
ib_stats["gap"] = ib_stats["day_close"] - ib_stats["prev_close"]

#Make NA's to 0(zero)
ib_stats["gap_dir"] = np.sign(ib_stats["gap"]).fillna(0)
#if gap is negative --> gap_dir -1.0
#if gap is 0(zero) or NaN --> gap_dir 0.0 (zero)
#if gap is positive --> gap_dir +1.0

#gap_dir is directly shows us the direction of price, bullish/bearish/range

ib_stats.head(20)


Unnamed: 0,datetime,ib_high,ib_low,ib_mid,ib_width,ib_width_type,day_close,prev_day,prev_close,gap,gap_dir
0,2025-09-08,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
1,2025-09-09,649.72,648.43,649.075,1.29,narrow,648.64,2025-09-08,649.0,-0.36,-1.0
2,2025-09-10,654.55,652.7,653.625,1.85,narrow,653.93,2025-09-09,648.64,5.29,1.0
3,2025-09-11,656.25,653.59,654.92,2.66,wide,656.02,2025-09-10,653.93,2.09,1.0
4,2025-09-12,658.19,657.14,657.665,1.05,narrow,657.855,2025-09-11,656.02,1.835,1.0
5,2025-09-15,660.75,659.475,660.1125,1.275,narrow,660.74,2025-09-12,657.855,2.885,1.0
6,2025-09-16,661.78,659.63,660.705,2.15,narrow,659.92,2025-09-15,660.74,-0.82,-1.0
7,2025-09-17,660.65,659.07,659.86,1.58,narrow,659.17,2025-09-16,659.92,-0.75,-1.0
8,2025-09-18,664.67,660.27,662.47,4.4,wide,664.67,2025-09-17,659.17,5.5,1.0
9,2025-09-19,662.7,661.31,662.005,1.39,narrow,661.9,2025-09-18,664.67,-2.77,-1.0


## 3) Merging `ib_stats` into our main `spy_1min_et_clean` dataframe

Now we have two important pieces:

- **`spy_1min_et_clean`** → our main dataframe  
  - Contains **every 1-minute candle** for all trading days  
- **`ib_stats`** → our Initial Balance summary  
  - One row **per day** with:
    - **ib_high**
    - **ib_low**
    - **ib_mid**
    - **ib_width**
    - **ib_width_type**
    - **day_close**
    - **prev_day**
    - **prev_close**
    - **gap**
    - **gap_dir**

The goal of this step is simple:

> **Attach the IB information of each day to every 1-minute candle of that same day.**



### Why do we merge these two dataframes?

For future analysis, we want to be able to do things like:

- “At this minute, was the price above or below **ib_mid**?”
- “Did this breakout happen above **ib_high** or below **ib_low**?”
- “How did the price behave on days with **narrow vs wide ib_width**?”

To answer these kinds of questions easily, it’s much better if:

- Every row (each 1-minute bar) in `spy_1min_et_clean`  
  **already has the IB statistics of its own day**.

So instead of jumping between two dataframes, we just work with **one enriched dataframe**.



### What exactly happens in the merge?

We:

1. Use a **common key** (for example, the **date** of the trading day).
2. Merge **`ib_stats`** into **`spy_1min_et_clean`** on that key.

After the merge:

- We still have the same number of rows as `spy_1min_et_clean`  
  → one row **per 1-minute candle**
- But now each row also includes that day’s IB values



### Final result

We end up with:

> The same `spy_1min_et_clean` dataframe,  
> **plus** extra columns from `ib_stats` for each row.

That means:

- **Every 1-minute candle** now “knows” its day’s:
  - Initial Balance high and low  
  - Midpoint and width  
  - And other IB-related features  


In [115]:
ib_stats["datetime"] = pd.to_datetime(ib_stats["datetime"])


df_clean = (
    df_clean
    .merge(
        ib_stats,
        how="left",
        left_on=df_clean["datetime"].dt.normalize(),
        right_on=ib_stats["datetime"],
        suffixes=("", "_ib")
    )
    .drop(columns=["key_0", "datetime_ib"])
)

df_clean.head(20)


Unnamed: 0,datetime,high,low,close,Volume,ib_high,ib_low,ib_mid,ib_width,ib_width_type,day_close,prev_day,prev_close,gap,gap_dir
0,2025-09-08 09:30:00,648.86,648.24,648.26,141588,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
1,2025-09-08 09:31:00,648.45,648.15,648.27,42118,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
2,2025-09-08 09:32:00,648.46,648.1,648.26,37143,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
3,2025-09-08 09:33:00,648.47,648.23,648.4,42231,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
4,2025-09-08 09:34:00,648.68,648.32,648.665,23659,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
5,2025-09-08 09:35:00,648.88,648.62,648.78,38252,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
6,2025-09-08 09:36:00,648.92,648.78,648.79,36436,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
7,2025-09-08 09:37:00,649.06,648.8,648.87,35151,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
8,2025-09-08 09:38:00,648.91,648.23,648.23,52975,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0
9,2025-09-08 09:39:00,648.35,648.06,648.11,58512,649.06,647.75,648.405,1.31,narrow,649.0,,,,0.0


In [116]:
#checking 'datetime' is okey for time-series data

df_clean["datetime"].is_monotonic_increasing

True

In [117]:
# checking if I duplicated any timeframe

df_clean["datetime"].duplicated().any()

np.False_

In [118]:
df_clean.info()
df_clean.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21450 entries, 0 to 21449
Data columns (total 15 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   datetime       21450 non-null  datetime64[ns]
 1   high           21450 non-null  float64       
 2   low            21450 non-null  float64       
 3   close          21450 non-null  float64       
 4   Volume         21450 non-null  int64         
 5   ib_high        21450 non-null  float64       
 6   ib_low         21450 non-null  float64       
 7   ib_mid         21450 non-null  float64       
 8   ib_width       21450 non-null  float64       
 9   ib_width_type  21450 non-null  object        
 10  day_close      21450 non-null  float64       
 11  prev_day       21060 non-null  object        
 12  prev_close     21060 non-null  float64       
 13  gap            21060 non-null  float64       
 14  gap_dir        21450 non-null  float64       
dtypes: datetime64[ns](1

Unnamed: 0,datetime,high,low,close,Volume,ib_high,ib_low,ib_mid,ib_width,day_close,prev_close,gap,gap_dir
count,21450,21450.0,21450.0,21450.0,21450.0,21450.0,21450.0,21450.0,21450.0,21450.0,21060.0,21060.0,21450.0
mean,2025-10-15 12:44:30,668.153902,667.867178,668.010579,31363.66,669.798091,666.728364,668.263227,3.069727,668.486727,668.765556,0.082037,0.018182
min,2025-09-08 09:30:00,647.51,647.22,647.31,1314.0,649.06,647.75,648.405,0.98,648.64,648.64,-21.47,-1.0
25%,2025-09-25 14:22:15,661.33,660.91225,661.1225,14615.5,663.23,659.63,660.975,1.85,660.74,661.9,-2.77,-1.0
50%,2025-10-15 12:44:30,666.97,666.66,666.82,22888.0,670.23,666.36,668.14,2.645,669.29,669.29,0.255,0.0
75%,2025-11-04 11:06:45,673.12,672.87,673.0,36983.0,677.38,672.52,674.95,4.13,674.9,674.9,2.885,1.0
max,2025-11-21 15:59:00,689.7,689.52,689.59,1362579.0,689.7,688.15,688.925,7.45,688.97,688.97,12.7,1.0
std,,9.467176,9.486371,9.477293,35760.77,9.540465,9.566758,9.518547,1.635759,9.387359,9.245447,5.765459,0.981504


## 4) Saving the new dataframe as a `.csv` file

At this point, we have:

- Started from our cleaned main dataframe: **`spy_1min_et_clean`**
- Added all the **Initial Balance (IB) statistics** to it  
  (ib_high, ib_low, ib_mid, ib_width, day_close, gap, etc.)

So now we have **one enriched dataframe** that contains:

- Every **1-minute candle**
- Plus all the **IB-related features** for that candle’s day

This is a good place to **freeze** the result and save it for later use.



### Why do we save it as a `.csv` file?

We want:

- A **reusable file** that other notebooks or scripts can easily load
- A **stable snapshot** of this processing step, so we don’t need to:
  - Re-run all the IB calculations every time
  - Rebuild merges from scratch

The most convenient format for this is a **`.csv` file**, because:

- It is **simple** and **widely supported**
- It can be opened in:
  - Python (pandas)
  - Excel
  - Other tools, if needed



### Where do we store this file?

We save the new dataframe as a `.csv` inside the folder:

- **`data/cache/`**

This folder is used for:

- Intermediate or **processed data**
- Files that are **ready to be reused** by multiple notebooks

So the flow is:

1. Raw data → **`data/raw/`**  
2. Cleaned base data → **`data/clean/`**  
3. Enriched / feature-added data (like IB-augmented 1-min data) → **`data/cache/`**



### Final step

We **export** our enriched dataframe (the IB-augmented `spy_1min_et_clean`) as a `.csv` file into:

> **`data/cache/`**

From now on, any **later notebook** can simply:

- **Load this `.csv`**
- Immediately work with:
  - 1-minute SPY bars
  - Plus all Initial Balance features  
  **→ without repeating the IB calculation pipeline**

In [120]:
from pathlib import Path

# 1) Define project root which is the main branch in our repository
PROJECT_ROOT = Path("..").resolve()

# 2) We need to go to data/cache folder so define that pathway
DATA_CACHE = PROJECT_ROOT / "data" / "cache"
DATA_CACHE.mkdir(parents=True, exist_ok=True)  # yoksa oluştur

# 3) Kaydedilecek CSV dosyasının adı
clean_csv_path = DATA_CACHE / "spy_1min_et_clean_with_IBlevels.csv"

# 4) df_raw şu anda en son işlenmiş (temiz) halinse, onu kaydediyoruz
df_clean.to_csv(clean_csv_path, index=False)

print("Saved CSV to:", clean_csv_path)

Saved CSV to: /Users/canka/Dev/python/DSA210-Project-Can-Karadogan/data/cache/spy_1min_et_clean_with_IBlevels.csv
