# Creating Initial Balance lines

The initial balance is the first hour of the regular trading session. In U.S. Eastern Time this is 09:30–10:30. Many traders consider this first hour special because it often sets the day’s early high and low and defines the early character of the day. It also summarizes the first hour of trading, when a large part of institutional positioning happens. We use the initial balance to locate strong bursts that may shape intraday direction.

So, for each U.S. stock market trading day, we need the highest and lowest prices reached during the first hour.

- We have 55 trading days, so we expect 55 Initial Balance high/low pairs.

In [28]:
from pathlib import Path
import pandas as pd

In [29]:
# Load the cleaned data that we produced in "00_data_setup.ipynb"

PROJECT_ROOT = Path("..").resolve()

DATA_RAW = PROJECT_ROOT / "data" / "raw"
DATA_CLEAN = PROJECT_ROOT / "data" / "clean"

CLEAN_FILE = DATA_CLEAN / "spy_1min_et_clean.csv"

df_clean = pd.read_csv(CLEAN_FILE, parse_dates=['datetime'])

df_clean.head()

Unnamed: 0,datetime,high,low,close,Volume
0,2025-09-08 09:30:00,648.86,648.24,648.26,141588
1,2025-09-08 09:31:00,648.45,648.15,648.27,42118
2,2025-09-08 09:32:00,648.46,648.1,648.26,37143
3,2025-09-08 09:33:00,648.47,648.23,648.4,42231
4,2025-09-08 09:34:00,648.68,648.32,648.665,23659


In [30]:
# Select the initial-hour candles from the full dataset

# Build a boolean mask for bars stamped 09:30 through 10:30 (both ends inclusive).
# Explicit parentheses matter because & binds tighter than |, and requiring
# minute >= 30 within hour 9 keeps any 09:00-09:29 bars out of the window.
hour = df_clean["datetime"].dt.hour
minute = df_clean["datetime"].dt.minute
is_ib = ((hour == 9) & (minute >= 30)) | ((hour == 10) & (minute <= 30))

# Apply the mask to the cleaned data
df_ib = df_clean[is_ib]

df_ib.head(184)



Unnamed: 0,datetime,high,low,close,Volume
0,2025-09-08 09:30:00,648.86,648.24,648.260,141588
1,2025-09-08 09:31:00,648.45,648.15,648.270,42118
2,2025-09-08 09:32:00,648.46,648.10,648.260,37143
3,2025-09-08 09:33:00,648.47,648.23,648.400,42231
4,2025-09-08 09:34:00,648.68,648.32,648.665,23659
...,...,...,...,...,...
837,2025-09-10 10:27:00,654.29,654.13,654.140,26221
838,2025-09-10 10:28:00,654.25,653.92,653.960,27867
839,2025-09-10 10:29:00,654.02,653.91,653.940,22403
840,2025-09-10 10:30:00,654.03,653.82,653.930,31068
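The same window can also be selected with `DataFrame.between_time`, which requires a `DatetimeIndex`. The sketch below is only an illustration: the frame `demo` and its timestamps are made up, and it deliberately includes pre-market minutes to show why a mask that accepts every minute of hour 9 would be too broad if the data ever contained bars before 09:30.

```python
import pandas as pd

# Synthetic 1-minute timestamps from 09:00 to 11:00 on one day (illustrative only)
idx = pd.date_range("2025-09-08 09:00", "2025-09-08 11:00", freq="min")
demo = pd.DataFrame({"datetime": idx, "close": range(len(idx))})

hour = demo["datetime"].dt.hour
minute = demo["datetime"].dt.minute

# Explicit mask: 09:30-09:59 plus 10:00-10:30
mask_explicit = ((hour == 9) & (minute >= 30)) | ((hour == 10) & (minute <= 30))
ib_mask = demo[mask_explicit]

# Same selection via between_time (both endpoints are included by default)
ib_between = (
    demo.set_index("datetime")
        .between_time("09:30", "10:30")
        .reset_index()
)

# A looser mask that accepts all of hour 9 also keeps 09:00-09:29
mask_loose = (hour == 9) | ((hour == 10) & (minute <= 30))

print(len(ib_mask), len(ib_between), int(mask_loose.sum()))
```

On data that starts at 09:30, as the cleaned file here appears to, the loose and explicit masks select the same rows; the difference only shows up when earlier bars are present.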


## 1) We now have a separate dataframe with only the Initial Balance 1-minute candles

The next step is to define each trading day's initial balance **high/low/mid/width**. By compressing the first hour into four numbers (high, low, mid, width), we turn noisy one-minute bars into a simple intraday “map”. Later, AVWAPs and directional signals can be interpreted relative to these IB levels, which helps us classify the day type and spot potential continuation or reversal setups.

- **ib_high:** The highest price reached during the Initial Balance (IB) window. It marks the upper edge of the early auction. Later in the day, price often reacts to this level (rejection or breakout).

- **ib_low:** The lowest price reached during the IB window. It marks the lower edge of the early auction and serves as a natural support level and a reference for downside breaks.

- **ib_mid:** The midpoint between ib_high and ib_low, which acts as a fair price for the early session. We can compare the current price with this midpoint to detect bias (above = bullish tilt, below = bearish tilt).

- **ib_width:** The distance between ib_high and ib_low (the IB range), which measures how volatile the first hour was. A narrow width suggests potential for expansion; a wide width suggests the market has already moved a lot.
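As a small worked example of the four levels above, the sketch below computes `ib_mid` and `ib_width` for one day and labels a price relative to the midpoint. The helper `ib_bias`, its `"neutral"` label for an exact tie, and the two test prices are hypothetical and only for illustration; the high and low are the 2025-09-08 values from this notebook's summary table.

```python
def ib_bias(price: float, ib_mid: float) -> str:
    """Return a coarse bias label for a price relative to the IB midpoint."""
    if price > ib_mid:
        return "bullish"
    if price < ib_mid:
        return "bearish"
    return "neutral"  # assumption: an exact tie gets its own label


# IB high/low for 2025-09-08 (from the summary table in this notebook)
ib_high, ib_low = 649.06, 647.75

ib_mid = 0.5 * (ib_high + ib_low)   # midpoint of the first-hour range
ib_width = ib_high - ib_low         # size of the first-hour range

print(round(ib_mid, 3), round(ib_width, 2))

# Made-up prices later in the session, one above and one below the midpoint
print(ib_bias(648.90, ib_mid), ib_bias(648.00, ib_mid))
```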

In [32]:
# ib_high is the highest 'high' and ib_low is the lowest 'low' per trading day in the filtered dataframe (df_ib)
ib_stats = df_ib.groupby(df_ib["datetime"].dt.date).agg(
    ib_high=("high", "max"),
    ib_low=("low", "min")
).reset_index()

#ib_mid is the middle line between ib_high and ib_low
ib_stats["ib_mid"] = 0.5 * (ib_stats["ib_high"] + ib_stats["ib_low"])

# ib_width is the distance between ib_high and ib_low
ib_stats["ib_width"] = ib_stats["ib_high"] - ib_stats["ib_low"]

ib_stats.head()


Unnamed: 0,datetime,ib_high,ib_low,ib_mid,ib_width
0,2025-09-08,649.06,647.75,648.405,1.31
1,2025-09-09,649.72,648.43,649.075,1.29
2,2025-09-10,654.55,652.7,653.625,1.85
3,2025-09-11,656.25,653.59,654.92,2.66
4,2025-09-12,658.19,657.14,657.665,1.05
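Since the sample covers 55 trading days, the summary should contain exactly one row per day, and each `ib_high` should be at or above its `ib_low`. A minimal sketch of such a sanity check follows; it uses a small made-up frame `bars` in place of `df_ib` so that it runs on its own, and the same two assertions could be pointed at the real `ib_stats`.

```python
import pandas as pd

# Hypothetical IB bars for two trading days (values are made up)
bars = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2025-09-08 09:30", "2025-09-08 09:31", "2025-09-08 10:30",
        "2025-09-09 09:30", "2025-09-09 09:31", "2025-09-09 10:30",
    ]),
    "high": [648.86, 649.06, 648.90, 649.10, 649.72, 649.50],
    "low":  [648.24, 647.75, 648.40, 648.43, 648.90, 649.00],
})

# Same aggregation pattern as in the notebook: one row per calendar date
stats = (
    bars.groupby(bars["datetime"].dt.date)
        .agg(ib_high=("high", "max"), ib_low=("low", "min"))
        .reset_index()
)

# One IB row per trading day, and the high never sits below the low
n_days = bars["datetime"].dt.date.nunique()
assert len(stats) == n_days, "expected exactly one IB row per trading day"
assert (stats["ib_high"] >= stats["ib_low"]).all(), "ib_high must not be below ib_low"

print(stats)
```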


In [None]:
# Classify each ib_width as narrow or wide relative to the median width in our sample
import numpy as np

median_width = ib_stats["ib_width"].median()

ib_stats["ib_width_type"] = np.where(
    ib_stats["ib_width"] <= median_width, "narrow", "wide"
)

ib_stats.head()

# IMPORTANT REMINDER: this classification is relative to our sample only. In a larger sample with more history,
# widths labeled "wide" here could count as narrow, because we are only looking at 55 trading days.

Unnamed: 0,datetime,ib_high,ib_low,ib_mid,ib_width,ib_width_type
0,2025-09-08,649.06,647.75,648.405,1.31,narrow
1,2025-09-09,649.72,648.43,649.075,1.29,narrow
2,2025-09-10,654.55,652.7,653.625,1.85,narrow
3,2025-09-11,656.25,653.59,654.92,2.66,wide
4,2025-09-12,658.19,657.14,657.665,1.05,narrow


## 2) Adding end-of-day close and gap info

We add the daily close price to the IB summary because the IB only describes the start of the session.

- For our hypotheses, we also need to know how the story ends: where the day actually closes and how large the gap from the previous close was.

- By merging **day_close** into **ib_stats**, each day becomes a complete unit: **IB structure at the open + final close at the end**. This allows us to test whether certain IB patterns or gap conditions are linked to stronger trend days, reversals, or mean-reversion behavior.
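A minimal sketch of that merge is shown here on made-up data. It assumes the day's close is the `close` of the last 1-minute bar of each date and that the IB summary is keyed by calendar date in a `datetime` column, as `ib_stats` is. The frames `bars` and `ib` are hypothetical stand-ins. The sketch stops at `prev_close`: how the gap itself is measured depends on which opening price is used, so that step is left out.

```python
import pandas as pd

# Hypothetical 1-minute bars: first and last bar of two sessions (values are made up)
bars = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2025-09-08 09:30", "2025-09-08 15:59",
        "2025-09-09 09:30", "2025-09-09 15:59",
    ]),
    "close": [648.26, 648.80, 649.10, 650.30],
})

# Hypothetical IB summary keyed by calendar date, shaped like ib_stats
ib = pd.DataFrame({
    "datetime": [pd.Timestamp("2025-09-08").date(), pd.Timestamp("2025-09-09").date()],
    "ib_high": [649.06, 649.72],
    "ib_low": [647.75, 648.43],
})

# Assumption: the day's close is the close of the last bar of that date
day_close = (
    bars.sort_values("datetime")
        .groupby(bars["datetime"].dt.date)["close"]
        .last()
        .rename("day_close")
        .reset_index()
)

# Left merge keeps every IB row even if a close were missing for some date
merged = ib.merge(day_close, on="datetime", how="left")

# Previous session's close, the reference level for any gap measure
merged["prev_close"] = merged["day_close"].shift(1)

print(merged)
```

The first row has no `prev_close` because there is no earlier session in the sample, so any gap-based test would start from the second trading day.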