# Hypothesis Testing: SMA Sensitivity Analysis



## 1. Hypothesis Definition
* **Null Hypothesis ($H_0$):** The FVG Success Rate $\le 0.50$ (The strategy is no better than random chance).
* **Alternative Hypothesis ($H_1$):** The FVG Success Rate $> 0.50$ (FVGs provide a statistically significant edge).
* **Confidence Level:** 95% ($\alpha = 0.05$).

### Methodology
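We test 9 datasets (3 currency pairs, each at 30-minute, 1-hour, and 4-hour timeframes) with 4 SMA periods (20, 50, 100, 200) to check that the results are robust to the choice of SMA period rather than tuned to a single setting. An FVG counts as a success when its direction matches the SMA trend on the candle that completes the gap.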
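
As a sketch of the test statistic used later in the notebook (a one-sample, one-sided t-test on the per-dataset success rates), write $\bar{p}$ for the mean of the $n$ per-dataset rates and $s$ for their sample standard deviation. These symbols are notation assumed here, not taken from the notebook:

$$
t = \frac{\bar{p} - 0.50}{s / \sqrt{n}}, \qquad \text{df} = n - 1
$$

With $n = 9$ datasets the test has 8 degrees of freedom, and $H_0$ is rejected when the one-sided p-value falls below $\alpha = 0.05$.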

## 2. Code (Analysis Function)

In [4]:
def analyze_single_file(file_path, sma_period):
    """Detects FVGs and determines if they align with the SMA trend."""
    try:
        df = pd.read_excel(file_path, header=0)
        df = df.iloc[2:].reset_index(drop=True)
        df.columns = ['datetime', 'close', 'high', 'low', 'open']
        for col in ['open', 'high', 'low', 'close']:
            df[col] = pd.to_numeric(df[col])

        df['sma'] = df['close'].rolling(window=sma_period).mean()

        outcomes = []
        for i in range(2, len(df)):
            current_row = df.iloc[i]
            prev_2_high = df.iloc[i-2]['high']
            prev_2_low = df.iloc[i-2]['low']

            # Skip candles that do not yet have a full SMA window
            if pd.isna(current_row['sma']):
                continue

            is_uptrend = current_row['close'] > current_row['sma']
            is_downtrend = current_row['close'] < current_row['sma']

            # Bullish FVG detection
            if current_row['low'] > prev_2_high:
                outcomes.append(1 if is_uptrend else 0)
            # Bearish FVG detection
            elif current_row['high'] < prev_2_low:
                outcomes.append(1 if is_downtrend else 0)

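        if not outcomes:
            return None
        return np.mean(outcomes), len(outcomes)
    except Exception:
        # Any read or parse failure yields None; a bare `except:` would
        # also swallow KeyboardInterrupt and SystemExit.
        return None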
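
The row-by-row loop above can also be expressed with shifted columns. The following is a minimal sketch under stated assumptions, not the notebook's own implementation: the function name `fvg_trend_alignment` and the `demo` frame are hypothetical, and the function takes an already-loaded DataFrame with `high`, `low`, and `close` columns rather than a file path.

```python
import pandas as pd


def fvg_trend_alignment(df: pd.DataFrame, sma_period: int):
    """Vectorized sketch: return (alignment_rate, fvg_count), or None if no FVGs."""
    sma = df["close"].rolling(window=sma_period).mean()
    high_2_back = df["high"].shift(2)  # high of the candle two bars earlier
    low_2_back = df["low"].shift(2)    # low of the candle two bars earlier

    has_sma = sma.notna()  # mirrors the loop's "skip if SMA is NaN" check
    bullish = has_sma & (df["low"] > high_2_back)
    bearish = has_sma & ~bullish & (df["high"] < low_2_back)

    uptrend = df["close"] > sma
    downtrend = df["close"] < sma

    # A "success" is an FVG whose direction matches the SMA trend on the same candle
    hits = (bullish & uptrend) | (bearish & downtrend)
    count = int((bullish | bearish).sum())
    if count == 0:
        return None
    return float(hits.sum()) / count, count


# Hypothetical five-candle example: two bullish gaps (one counter-trend) and one bearish gap
demo = pd.DataFrame({
    "high":  [1.00, 2.10, 1.30, 2.60, 1.05],
    "low":   [0.90, 1.00, 1.10, 2.20, 0.80],
    "close": [1.00, 2.00, 1.20, 2.50, 0.90],
})
rate, count = fvg_trend_alignment(demo, sma_period=3)
print(f"{count} FVGs, alignment rate {rate:.2%}")
```

Comparisons against the `NaN` values produced by `shift(2)` evaluate to `False`, so the first two candles are excluded just as they are by `range(2, len(df))` in the loop.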

## 3. Data Loading & FVG Population Overview

In [5]:
import pandas as pd
import numpy as np
import os
from scipy import stats

# --- 1. CONFIGURATION ---
SMA_VALUES_TO_TEST = [20, 50, 100, 200]
NULL_HYPOTHESIS_BASE = 0.5
ALPHA = 0.05
BASELINE_SMA = 50 # Used for the initial data overview scan

DATASETS = [
    {'path': 'GBPUSD_1_Month_30M_Data.xlsx',  'group': 'Low TF'},
    {'path': 'GBPUSD_1_Year_1H_Data.xlsx',    'group': 'Low TF'},
    {'path': 'GBPUSD_1_Year_4H_Data.xlsx',    'group': 'High TF'},
    {'path': 'USDJPY_1_Month_30M_Data.xlsx',  'group': 'Low TF'},
    {'path': 'USDJPY_1_Year_1H_Data.xlsx',    'group': 'Low TF'},
    {'path': 'USDJPY_1_Year_4H_Data.xlsx',    'group': 'High TF'},
    {'path': 'EURUSD_1_Month_30M_Data.xlsx',  'group': 'Low TF'},
    {'path': 'EURUSD_1_Year_1H_Data.xlsx',    'group': 'Low TF'},
    {'path': 'EURUSD_1_Year_4H_Data.xlsx',    'group': 'High TF'}
]

# --- 2. DATA LOADING & FVG OVERVIEW ---
print(f"--- DATASET INVENTORY & FVG SIGNAL DISCOVERY (SMA {BASELINE_SMA}) ---")
print(f"{'Dataset Path':<32} | {'Status':<8} | {'FVGs':<6} | {'Cont.':<6} | {'Counter':<7}")
print("-" * 75)

total_fvgs = 0

for entry in DATASETS:
    path = entry['path']
    # Check if file exists
    if not os.path.exists(path):
        print(f"{path:<32} |  Missing | --     | --     | --")
        continue

    # Quick scan for FVG overview using the analyze_single_file logic
    # (Note: analyze_single_file must be defined in the notebook for this to run)
    result = analyze_single_file(path, BASELINE_SMA)

    if result:
        rate, count = result
        continuation = int(round(rate * count))
        counter = count - continuation
        total_fvgs += count
        print(f"{path:<32} |  Found   | {count:<6} | {continuation:<6} | {counter:<7}")
    else:
        print(f"{path:<32} |  Found   | 0      | 0      | 0")

print("-" * 75)
print(f"Total Signal Population for Hypothesis Testing: {total_fvgs} FVGs")
print(f"SMA Sensitivity Parameters: {SMA_VALUES_TO_TEST}")

--- DATASET INVENTORY & FVG SIGNAL DISCOVERY (SMA 50) ---
Dataset Path                     | Status   | FVGs   | Cont.  | Counter
---------------------------------------------------------------------------
GBPUSD_1_Month_30M_Data.xlsx     |  Found   | 289    | 208    | 81     
GBPUSD_1_Year_1H_Data.xlsx       |  Found   | 1562   | 1096   | 466    
GBPUSD_1_Year_4H_Data.xlsx       |  Found   | 387    | 272    | 115    
USDJPY_1_Month_30M_Data.xlsx     |  Found   | 189    | 138    | 51     
USDJPY_1_Year_1H_Data.xlsx       |  Found   | 1220   | 878    | 342    
USDJPY_1_Year_4H_Data.xlsx       |  Found   | 353    | 261    | 92     
EURUSD_1_Month_30M_Data.xlsx     |  Found   | 228    | 168    | 60     
EURUSD_1_Year_1H_Data.xlsx       |  Found   | 1417   | 978    | 439    
EURUSD_1_Year_4H_Data.xlsx       |  Found   | 350    | 248    | 102    
---------------------------------------------------------------------------
Total Signal Population for Hypothesis Testing: 5995 FVGs
SMA Sensitivity Parameters: [20, 50, 100, 200]
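
As a complementary check on the overview table, the counts can be pooled and tested against 0.50 with a one-sided binomial test. This is a sketch of an alternative view, not part of the notebook's method: it assumes every FVG outcome is independent, which overlapping timeframes of the same pair and serial correlation in prices may violate. The `overview` list is copied by hand from the SMA 50 table above, and `stats.binomtest` requires SciPy 1.7 or later.

```python
from scipy import stats

# (FVG count, trend-aligned count) per dataset, copied from the SMA 50 overview table
overview = [
    (289, 208), (1562, 1096), (387, 272),   # GBPUSD 30M, 1H, 4H
    (189, 138), (1220, 878), (353, 261),    # USDJPY 30M, 1H, 4H
    (228, 168), (1417, 978), (350, 248),    # EURUSD 30M, 1H, 4H
]

total = sum(n for n, _ in overview)
aligned = sum(k for _, k in overview)
pooled_rate = aligned / total

# One-sided test of H0: success probability <= 0.50 (assumes independent outcomes)
result = stats.binomtest(aligned, total, p=0.5, alternative="greater")

print(f"Pooled rate: {pooled_rate:.2%} ({aligned}/{total})")
print(f"One-sided binomial p-value: {result.pvalue:.2e}")
```

The pooled rate weights each dataset by its FVG count, whereas the t-test in the next section gives each of the 9 datasets equal weight.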

## 4. Code (Final Execution & Statistics)

In [6]:
for sma in SMA_VALUES_TO_TEST:
    print(f"\n>>> TEST RESULTS FOR SMA: {sma}")
    print(f"{'Dataset':<35} | {'Count':<8} | {'Success Rate'}")
    print("-" * 60)

    sma_rates = []
    for entry in DATASETS:
        result = analyze_single_file(entry['path'], sma)
        if result:
            rate, count = result
            sma_rates.append(rate)
            print(f"{entry['path']:<35} | {count:<8} | {rate:.2%}")

    if len(sma_rates) > 1:
        mean_rate = np.mean(sma_rates)
        t_stat, p_val = stats.ttest_1samp(sma_rates, NULL_HYPOTHESIS_BASE, alternative='greater')
        std_err = stats.sem(sma_rates)
        ci_low, ci_high = stats.t.interval(0.95, len(sma_rates)-1, loc=mean_rate, scale=std_err)

        print(f"\nSUMMARY FOR SMA {sma}:")
        print(f"Average Win Rate: {mean_rate:.2%}")
        print(f"95% Confidence:   [{ci_low:.2%} to {ci_high:.2%}]")
        # P-value printed in scientific notation (e.g., 1.23e-05)
        print(f"P-Value: {p_val:.2e} | T-Stat: {t_stat:.2f}")
        print(f"VERDICT: {'REJECT H0' if p_val < ALPHA else 'KEEP H0'}")
        print("=" * 60)


>>> TEST RESULTS FOR SMA: 20
Dataset                             | Count    | Success Rate
------------------------------------------------------------
GBPUSD_1_Month_30M_Data.xlsx        | 298      | 81.54%
GBPUSD_1_Year_1H_Data.xlsx          | 1569     | 80.18%
GBPUSD_1_Year_4H_Data.xlsx          | 390      | 80.51%
USDJPY_1_Month_30M_Data.xlsx        | 193      | 83.94%
USDJPY_1_Year_1H_Data.xlsx          | 1226     | 83.69%
USDJPY_1_Year_4H_Data.xlsx          | 359      | 83.57%
EURUSD_1_Month_30M_Data.xlsx        | 235      | 82.98%
EURUSD_1_Year_1H_Data.xlsx          | 1430     | 81.61%
EURUSD_1_Year_4H_Data.xlsx          | 354      | 81.64%

SUMMARY FOR SMA 20:
Average Win Rate: 82.18%
95% Confidence:   [81.11% to 83.26%]
P-Value: 1.10e-12 | T-Stat: 68.86
VERDICT: REJECT H0
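
The SMA 20 summary can be checked by hand from the nine printed rates. The sketch below recomputes the mean, standard error, t-statistic, and confidence interval directly. Because it starts from rates rounded to two decimals, the figures may differ slightly from the printed ones in the last digits.

```python
import numpy as np
from scipy import stats

# Per-dataset success rates for SMA 20, copied from the printed table (in percent)
rates = np.array([81.54, 80.18, 80.51, 83.94, 83.69, 83.57, 82.98, 81.61, 81.64]) / 100

n = len(rates)                                 # 9 datasets, so df = 8
mean_rate = rates.mean()
std_err = rates.std(ddof=1) / np.sqrt(n)       # standard error of the mean
t_stat = (mean_rate - 0.5) / std_err           # one-sample t-statistic against 0.50
p_val = stats.t.sf(t_stat, df=n - 1)           # one-sided ("greater") p-value

t_crit = stats.t.ppf(0.975, df=n - 1)          # two-sided 95% critical value
ci_low = mean_rate - t_crit * std_err
ci_high = mean_rate + t_crit * std_err

print(f"Mean: {mean_rate:.2%} | T-Stat: {t_stat:.2f} | P-Value: {p_val:.2e}")
print(f"95% CI: [{ci_low:.2%} to {ci_high:.2%}]")
```

The sample size in this test is the 9 datasets, not the several thousand individual FVGs, so the narrow interval reflects how closely the per-dataset rates agree with one another.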

>>> TEST RESULTS FOR SMA: 50
Dataset                             | Count    | Success Rate
------------------------------------------------------------
GBPUSD_1_Month_30M_Data.xlsx        | 289      | 71.97

## 5. Interpretation

* **Consistency:** Every SMA period produced a success rate far above the 50% random-chance baseline: about 82% for SMA 20 and near 71% for the longer periods.
* **Reliability:** The 95% confidence interval stayed above 69% for every SMA period, which indicates that the results are stable across pairs and timeframes.
* **Significance:** With a very large t-statistic (68.86 for SMA 20) and a p-value near zero (1.10e-12 for SMA 20), the result is very unlikely to be due to chance.

We reject the null hypothesis: at every SMA period tested, FVGs aligned with the prevailing SMA trend far more often than the 50% baseline. By this measure, FVGs are not random; they behave as high-probability continuation signals.