# DE40 Trend and Range Analysis

## Objective
Analyze the trend characteristics and range behavior of the DAX Index (DE40) to understand market structure, volatility patterns, and mean-reversion vs. momentum tendencies.

## Key Metrics
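- **Hurst Exponent**: Long-memory measure of returns; H > 0.5 suggests trending (persistent) behavior, H < 0.5 mean reversion (anti-persistence), H ≈ 0.5 a random walk
- **Autocorrelation**: Serial correlation of hourly log returns at selected lags (momentum vs. short-term reversal)
- **Range Metrics**: Intraday high-low range and volatility clustering
- **Gap Analysis**: Missing candles relative to the expected trading schedule (data quality)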

## 1. Environment Setup

In [10]:
import sys
sys.path.insert(0, '../../')

from shared.database_connector import fetch_ohlcv, get_date_range
from shared.data_module import process_data
from shared.config import SYMBOLS
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (14, 6)

print('[OK] Environment setup complete')

[OK] Environment setup complete


## 2. Data Fetching

In [11]:
date_range = get_date_range('deuidxeur', 'h1')
end_date = date_range['end']
start_date = end_date - timedelta(days=365)
print(f'Analysis period: {start_date.date()} to {end_date.date()}')

2025-11-20 21:49:17,212 - shared.database_connector - INFO - [OK] Date range for deuidxeur h1: 2023-01-08 23:00:00+00:00 to 2025-09-16 21:00:00+00:00


Analysis period: 2024-09-16 to 2025-09-16


In [12]:
df_raw = fetch_ohlcv(
    symbol='deuidxeur',
    timeframe='h1',
    start_date=start_date,
    end_date=end_date
)
print(f'Raw data shape: {df_raw.shape}')
print(f'First 5 rows (UTC):')
print(df_raw.head())

2025-11-20 21:49:17,220 - shared.database_connector - INFO - fetch_ohlcv(): symbol=deuidxeur, timeframe=h1, start=2024-09-16 21:00:00+00:00, end=2025-09-16 21:00:00+00:00
2025-11-20 21:49:17,688 - shared.database_connector - INFO - [OK] Fetched 5905 candles (2024-09-16 21:00:00+00:00 to 2025-09-16 21:00:00+00:00)


Raw data shape: (5905, 5)
First 5 rows (UTC):
                                open       high        low      close  volume
timestamp                                                                    
2024-09-16 21:00:00+00:00  18714.099  18719.099  18702.166  18704.799    0.01
2024-09-16 22:00:00+00:00  18702.999  18711.399  18680.077  18681.566    0.03
2024-09-16 23:00:00+00:00  18680.055  18695.077  18669.066  18691.088    0.03
2024-09-17 00:00:00+00:00  18691.577  18692.099  18679.555  18690.099    0.02
2024-09-17 01:00:00+00:00  18690.555  18697.077  18680.555  18688.055    0.03


In [13]:
df_clean = process_data(
    df=df_raw,
    symbol='deuidxeur',
    timeframe='h1',
    local_time=True,
    exclude_news=False
)
print(f'Cleaned data shape: {df_clean.shape}')
print(f'Timezone: {df_clean.index.tz}')
print(f'First 5 rows (local time, market hours):')
print(df_clean.head())

2025-11-20 21:49:17,697 - shared.data_module - INFO - process_data(): symbol=deuidxeur, timeframe=h1, local_time=True, exclude_news=False
2025-11-20 21:49:17,698 - shared.data_module - INFO - Converted to local timezone: Europe/Berlin
2025-11-20 21:49:17,725 - shared.data_module - INFO - [OK] Filtered to market hours: removed 3682 candles (2223 remaining)
2025-11-20 21:49:17,726 - shared.data_module - INFO - [OK] OHLC validation complete
2025-11-20 21:49:17,727 - shared.data_module - INFO - Running diagnostics for deuidxeur h1...
2025-11-20 21:49:17,727 - shared.data_module - INFO - Gap Analysis (deuidxeur h1):
2025-11-20 21:49:17,728 - shared.data_module - INFO -   Date range (local): 2024-09-17 09:00:00+02:00 to 2025-09-16 17:00:00+02:00
2025-11-20 21:49:17,729 - shared.data_module - INFO -   Theoretical candles (continuous): 8745
2025-11-20 21:49:17,729 - shared.data_module - INFO -   Actual candles: 2223
2025-11-20 21:49:17,729 - shared.data_module - INFO -   Missing: 6522 (74.6%)


Cleaned data shape: (2223, 5)
Timezone: Europe/Berlin
First 5 rows (local time, market hours):
                                open       high        low      close  volume
timestamp                                                                    
2024-09-17 09:00:00+02:00  18708.299  18768.899  18702.755  18751.777    0.22
2024-09-17 10:00:00+02:00  18752.899  18801.777  18752.755  18764.777    0.19
2024-09-17 11:00:00+02:00  18765.777  18776.799  18723.766  18771.899    0.20
2024-09-17 12:00:00+02:00  18770.766  18804.788  18749.288  18801.899    0.29
2024-09-17 13:00:00+02:00  18802.766  18806.899  18732.266  18735.899    0.42


## 3. Gap Analysis

In [14]:
print('='*80)
print('Gap Analysis - RAW DATA (all hours, nights, weekends)')
print('='*80)
expected_raw = (df_raw.index[-1] - df_raw.index[0]).total_seconds() / 3600
actual_raw = len(df_raw)
gap_raw = ((expected_raw - actual_raw) / expected_raw * 100)

print(f'Expected candles (continuous): {expected_raw:.0f}')
print(f'Actual candles: {actual_raw}')
print(f'Gap: {gap_raw:.2f}%')
print('Note: the raw feed includes overnight candles; most of the gap is weekend closures')

print('\n' + '='*80)
print('Gap Analysis - CLEAN DATA (market hours: 09:00-17:30)')
print('='*80)
actual_clean = len(df_clean)
print(f'Candles after filtering: {actual_clean}')
print(f'Date range: {df_clean.index.min()} to {df_clean.index.max()}')
# Placeholder: the clean-data gap is NOT measured here. 0.0 assumes every
# weekday session hour is present; measuring it requires comparing against
# an expected session grid (minus exchange holidays).
gap_clean = 0.0
print(f'Gap: {gap_clean:.2f}% (assumed, not measured)')
print(f'Data quality: {100 - gap_clean:.1f}% (assumed)')
print('='*80)

Gap Analysis - RAW DATA (all hours, nights, weekends)
Expected candles (continuous): 8760
Actual candles: 5905
Gap: 32.59%
Note: the raw feed includes overnight candles; most of the gap is weekend closures

Gap Analysis - CLEAN DATA (market hours: 09:00-17:30)
Candles after filtering: 2223
Date range: 2024-09-17 09:00:00+02:00 to 2025-09-16 17:00:00+02:00
Gap: 0.00% (assumed, not measured)
Data quality: 100.0% (assumed)
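
The clean-data gap in this cell is hardcoded to 0.0 rather than measured. A minimal sketch for measuring it, assuming one hourly bar per weekday from 09:00 to 17:00 local time (the first and last timestamps in the cleaned output). The helpers `expected_session_index` and `session_gap_pct` are hypothetical, not part of `shared`, and exchange holidays are not removed, so even a complete feed would show a small nonzero gap.

```python
import pandas as pd


def expected_session_index(start, end, first_hour=9, last_hour=17):
    """Hourly weekday timestamps between start and end, inside session hours.

    Assumption: one bar per hour from first_hour to last_hour inclusive.
    Exchange holidays are NOT excluded.
    """
    grid = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
    in_session = (
        (grid.dayofweek < 5) & (grid.hour >= first_hour) & (grid.hour <= last_hour)
    )
    return grid[in_session]


def session_gap_pct(index, first_hour=9, last_hour=17):
    """Percent of expected session bars missing from `index`, plus the missing stamps."""
    expected = expected_session_index(index.min(), index.max(), first_hour, last_hour)
    missing = expected.difference(index)
    return 100.0 * len(missing) / len(expected), missing


# Synthetic check: one full week (Mon 2024-09-16 to Fri 2024-09-20), 3 bars removed
start = pd.Timestamp('2024-09-16 09:00', tz='Europe/Berlin')
end = pd.Timestamp('2024-09-20 17:00', tz='Europe/Berlin')
full_week = expected_session_index(start, end)
observed = full_week.delete([5, 10, 20])
gap_pct, missing = session_gap_pct(observed)
print(len(full_week), round(gap_pct, 2), len(missing))  # 45 6.67 3
```

On the notebook data this would be called as `session_gap_pct(df_clean.index)`; its result is not computed here. Subtracting a holiday calendar from `expected` would be the next refinement.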


## 4. Hurst Exponent

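For reference, the classic rescaled-range (R/S) definition that the cell below approximates. For a window of $n$ log returns $r_1, \dots, r_n$ with mean $\bar{r}$:

$$
\begin{aligned}
Y_k &= \sum_{i=1}^{k} (r_i - \bar{r}), \qquad k = 1, \dots, n \\
R(n) &= \max_{1 \le k \le n} Y_k - \min_{1 \le k \le n} Y_k \\
S(n) &= \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (r_i - \bar{r})^2} \\
\mathbb{E}\!\left[\frac{R(n)}{S(n)}\right] &\propto n^{H}
\quad\Longrightarrow\quad
\log \frac{R(n)}{S(n)} \approx H \log n + c
\end{aligned}
$$

$H$ is then the slope of a straight-line fit of $\log(R/S)$ against $\log n$. In the notebook's function, $\bar{r}$ is the full-sample mean and only the first $n$ returns are used for each window size $n$.
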
In [15]:
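def calculate_hurst_exponent(price_series, max_lag=1000):
    """Single-window R/S Hurst estimate.

    For each lag, R/S is computed on the first `lag` log returns only, with
    deviations taken from the full-sample mean, so the estimate is noisy.
    """
    # Log returns do not depend on the lag, so compute them once
    returns = np.log(price_series / price_series.shift(1)).dropna().to_numpy()
    mean_adjusted = returns - returns.mean()
    used_lags, tau = [], []
    for lag in range(10, max_lag, 10):
        cumsum = np.cumsum(mean_adjusted[:lag])
        range_val = cumsum.max() - cumsum.min()   # R: range of cumulative deviations
        std = np.std(returns[:lag], ddof=1)       # S: std of the same window
        if std > 0:
            used_lags.append(lag)                 # keep lags aligned with tau
            tau.append(range_val / std)
    lags = np.array(used_lags)
    tau = np.array(tau)
    # H is the slope of log(R/S) against log(lag)
    poly = np.polyfit(np.log(lags), np.log(tau), 1)
    return poly[0], lags, tau

hurst, lags, tau = calculate_hurst_exponent(df_clean['close'], max_lag=500)
print(f'Hurst Exponent: {hurst:.4f}')
# H < 0.5 suggests anti-persistence; a value this close to 0.5 is weak evidence
if hurst < 0.5:
    print('Interpretation: Mean-reverting market')
else:
    print('Interpretation: Trending market')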

Hurst Exponent: 0.4754
Interpretation: Mean-reverting market
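
The cell above evaluates R/S on a single window per lag. The classic R/S estimator instead averages R/S over all non-overlapping windows of each size, taking the mean inside each window. A minimal sketch of that variant; the name `hurst_rs` and the log-spaced window grid are assumptions, not part of `shared`. R/S is known to be biased upward for short windows, so uncorrelated noise tends to score somewhat above 0.5, which matters when reading a value like 0.4754.

```python
import numpy as np


def hurst_rs(returns, min_window=16, num_sizes=10):
    """Classic rescaled-range (R/S) Hurst estimate over non-overlapping windows.

    Window sizes run from min_window to a quarter of the sample, log-spaced.
    """
    x = np.asarray(returns, dtype=float)
    x = x[~np.isnan(x)]
    n = len(x)
    sizes = np.unique(
        np.round(np.logspace(np.log10(min_window), np.log10(n // 4), num_sizes)).astype(int)
    )
    used_sizes, rs_means = [], []
    for size in sizes:
        n_chunks = n // size
        chunks = x[: n_chunks * size].reshape(n_chunks, size)
        # Deviations from each window's own mean, accumulated within the window
        dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        r = dev.max(axis=1) - dev.min(axis=1)   # R(n) per window
        s = chunks.std(axis=1, ddof=1)          # S(n) per window
        valid = s > 0
        if valid.any():
            used_sizes.append(size)
            rs_means.append(np.mean(r[valid] / s[valid]))
    # H is the slope of log(mean R/S) against log(window size)
    slope, _ = np.polyfit(np.log(used_sizes), np.log(rs_means), 1)
    return slope


rng = np.random.default_rng(0)
h_white = hurst_rs(rng.standard_normal(4096))          # uncorrelated noise
h_anti = hurst_rs(np.diff(rng.standard_normal(4097)))  # strongly anti-persistent
print(f'white noise: {h_white:.3f}, differenced noise: {h_anti:.3f}')
```

Applied to the notebook data it would be `hurst_rs(np.log(df_clean['close']).diff().dropna())`; that value is not computed here and need not match 0.4754. Comparing it against the same estimator run on shuffled returns gives a rough baseline for the small-sample bias.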


## 5. Autocorrelation

In [16]:
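returns = np.log(df_clean['close'] / df_clean['close'].shift(1)).dropna()
# Consecutive closes span the overnight break once per session, so one
# return per day is an overnight return rather than an intraday one.

print('Return Statistics:')
print(f'Mean: {returns.mean() * 100:.4f}%')
print(f'Std: {returns.std() * 100:.4f}%')

# The cleaned data has 9 hourly bars per session (09:00-17:00), so lag 9 is
# roughly one session; lags 24 and 48 span about 2.7 and 5.3 sessions.
lags_to_check = [1, 2, 4, 8, 24, 48]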
autocorr_values = [returns.autocorr(lag=lag) for lag in lags_to_check]

print(f'\nAutocorrelation at Specific Lags:')
for lag, ac in zip(lags_to_check, autocorr_values):
    print(f'  Lag {lag:2d}: {ac:7.4f}')

Return Statistics:
Mean: 0.0100%
Std: 0.3936%

Autocorrelation at Specific Lags:
  Lag  1: -0.0608
  Lag  2:  0.0156
  Lag  4:  0.0414
  Lag  8: -0.0345
  Lag 24:  0.0249
  Lag 48: -0.0232
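
To judge whether these values differ from zero, a rough reference is the ±1.96/√N band for sample autocorrelations of i.i.d. data. The sketch below applies it to the printed values with N = 2222 returns (2223 candles minus one). It assumes i.i.d. returns; volatility clustering makes the true band wider, so borderline values deserve caution. statsmodels' `plot_acf` draws confidence bands for a visual version of the same check.

```python
import math


def acf_band(n_obs, z=1.96):
    """Approximate two-sided band for sample autocorrelations under an i.i.d. null."""
    return z / math.sqrt(n_obs)


# Autocorrelations printed by the cell above; N = 2222 hourly log returns
observed = {1: -0.0608, 2: 0.0156, 4: 0.0414, 8: -0.0345, 24: 0.0249, 48: -0.0232}
band = acf_band(2222)
outside = {lag: ac for lag, ac in observed.items() if abs(ac) > band}

print(f'95% band: +/-{band:.4f}')       # 95% band: +/-0.0416
print(f'Lags outside the band: {outside}')  # Lags outside the band: {1: -0.0608}
```

Only lag 1 clears the band, consistent with a mild short-term reversal; lag 4 sits just inside it.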


## 6. Range Analysis

In [17]:
df_clean['range'] = df_clean['high'] - df_clean['low']
df_clean['range_pct'] = (df_clean['range'] / df_clean['open'] * 100)

print(f'Range Statistics:')
print(f'Mean: {df_clean["range"].mean():.2f} points')
print(f'Median: {df_clean["range"].median():.2f} points')
print(f'Std: {df_clean["range"].std():.2f} points')
print(f'Mean %: {df_clean["range_pct"].mean():.4f}%')

Range Statistics:
Mean: 75.32 points
Median: 60.10 points
Std: 64.92 points
Mean %: 0.3452%
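
Volatility clustering is listed under Key Metrics but not measured above. A common check is the autocorrelation of absolute log returns: persistently positive values mean large moves tend to follow large moves. A minimal sketch, demonstrated on simulated data; `abs_return_autocorr` and `simulate_garch` are hypothetical helpers, and the GARCH(1,1) parameters are illustrative, not fitted to DE40. Lag 9 is included because the cleaned data has 9 hourly bars (09:00 to 17:00) per session.

```python
import numpy as np
import pandas as pd


def abs_return_autocorr(close, lags=(1, 2, 4, 9)):
    """Autocorrelation of absolute log returns at the given lags."""
    abs_ret = np.log(close).diff().dropna().abs()
    return {lag: abs_ret.autocorr(lag=lag) for lag in lags}


def simulate_garch(n, omega=1e-6, alpha=0.1, beta=0.85, seed=0):
    """Simulate GARCH(1,1) returns (illustrative parameters, not fitted)."""
    rng = np.random.default_rng(seed)
    returns = np.empty(n)
    var = omega / (1 - alpha - beta)  # start at the unconditional variance
    for t in range(n):
        returns[t] = np.sqrt(var) * rng.standard_normal()
        var = omega + alpha * returns[t] ** 2 + beta * var
    return returns


n = 10_000
garch_close = pd.Series(100 * np.exp(np.cumsum(simulate_garch(n))))
iid_close = pd.Series(100 * np.exp(np.cumsum(np.random.default_rng(1).normal(0, 0.004, n))))

garch_ac = abs_return_autocorr(garch_close)
iid_ac = abs_return_autocorr(iid_close)
print('GARCH: ', {k: round(v, 3) for k, v in garch_ac.items()})
print('i.i.d.:', {k: round(v, 3) for k, v in iid_ac.items()})
```

On the notebook data this would be `abs_return_autocorr(df_clean['close'])`; note that the overnight return included once per session can inflate the absolute values at the session boundary.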


## 7. Summary

In [18]:
print('='*80)
print('DE40 ANALYSIS SUMMARY')
print('='*80)
print(f'Period: {df_clean.index[0].date()} to {df_clean.index[-1].date()}')
print(f'Candles: {len(df_clean)} (market hours only)')
print(f'Data Quality: {100 - gap_clean:.1f}% (assumed)')
print(f'Hurst Exponent: {hurst:.4f}')
print(f'Mean Range: {df_clean["range"].mean():.2f} points')
print('='*80)

DE40 ANALYSIS SUMMARY
Period: 2024-09-17 to 2025-09-16
Candles: 2223 (market hours only)
Data Quality: 100.0% (assumed)
Hurst Exponent: 0.4754
Mean Range: 75.32 points
