# Step 1: Data Loading and Normalization

**Data already included!** 🎉

- Gas: `../data/logret_gas.dat`
- Electricity: `../data/logret_electricity.dat`

**Period**: 2019-2023 (1825 daily observations)

## Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import skew, kurtosis

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)

print('✅ Libraries loaded')

## Load Log-Returns

The data are already **log-returns** calculated with LOESS preprocessing.
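
For readers starting from prices rather than from the provided files, the sketch below shows the plain log-return definition $r_t = \ln P_t - \ln P_{t-1}$. It is only an illustration: the helper `log_returns` and the price values are hypothetical, and the LOESS preprocessing mentioned above is not reproduced here, so the result will not match the shipped `.dat` files.

```python
import numpy as np

def log_returns(prices):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for a 1-D array of positive prices."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

# Hypothetical prices, for illustration only (not taken from the dataset)
example_prices = np.array([100.0, 110.0, 99.0, 99.0])
example_returns = log_returns(example_prices)

# One fewer return than prices; an unchanged price gives a zero return
print(len(example_prices), len(example_returns))
print(example_returns)
```

Because log-returns are additive, their sum equals the log of the ratio between the last and first price, which is a convenient sanity check.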

In [None]:
# Load log-returns
gas_returns = np.loadtxt('../data/logret_gas.dat')
el_returns = np.loadtxt('../data/logret_electricity.dat')

print(f'✅ Data loaded: {len(gas_returns)} observations')
print('   Period: 2019-01-02 to 2023-12-31')
print(f'   Gas range: [{gas_returns.min():.4f}, {gas_returns.max():.4f}]')
print(f'   Electricity range: [{el_returns.min():.4f}, {el_returns.max():.4f}]')

## Descriptive Statistics (Table 1 in Paper)

We calculate the **first four moments** (mean, standard deviation, skewness, and kurtosis), as in Table 1 of the paper.
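
As a minimal sketch of what these moments mean, the block below computes them by hand with NumPy and compares the result with `scipy.stats`. The function `four_moments` and the synthetic `sample` are hypothetical names used for illustration only; the data are random draws, not the gas or electricity series. Kurtosis follows the Pearson convention (`fisher=False`), under which a normal distribution has kurtosis 3.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def four_moments(x):
    """Mean, population std (ddof=0), skewness, and Pearson kurtosis."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()              # NumPy default: ddof=0
    z = (x - mean) / std       # standardized values
    return mean, std, np.mean(z**3), np.mean(z**4)

# Synthetic heavy-tailed sample (assumption: Student-t, for illustration)
rng = np.random.default_rng(0)
sample = rng.standard_t(df=5, size=2000)

mean, std, skewness, kurt = four_moments(sample)

# The hand-rolled values agree with SciPy's default (biased) estimators
print(np.isclose(skewness, skew(sample)))
print(np.isclose(kurt, kurtosis(sample, fisher=False)))
```

Values of Pearson kurtosis above 3 indicate heavier tails than a normal distribution, which is how the kurtosis column of the table is read.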

In [None]:
# Calculate statistics
stats = pd.DataFrame({
    'Market': ['Natural Gas', 'Electricity'],
    'Mean': [gas_returns.mean(), el_returns.mean()],
    'Std': [gas_returns.std(), el_returns.std()],
    'Skewness': [skew(gas_returns), skew(el_returns)],
    'Kurtosis': [kurtosis(gas_returns, fisher=False), kurtosis(el_returns, fisher=False)]
})

print('\n' + '='*70)
print('📊 TABLE 1 - Descriptive Statistics (from paper)')
print('='*70)
print(stats.to_string(index=False))
print('='*70)

# Correlation
corr = np.corrcoef(gas_returns, el_returns)[0,1]
print(f'\n🔗 Correlation: ρ = {corr:.4f}')

# Key observations
print('\n💡 Key Observations:')
print(f'   • Electricity volatility / gas: {el_returns.std()/gas_returns.std():.2f}x')
print(f'   • Gas kurtosis / electricity: {kurtosis(gas_returns, fisher=False)/kurtosis(el_returns, fisher=False):.2f}x')
print(f'   • Electricity is {el_returns.std()/gas_returns.std():.1f}x more volatile')
print(f'   • Gas has heavier tails (kurtosis {kurtosis(gas_returns, fisher=False):.1f} vs {kurtosis(el_returns, fisher=False):.1f})')

## Visualize Log-Returns

In [None]:
# Create time index
dates = pd.date_range('2019-01-02', periods=len(gas_returns), freq='D')

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8))

# Gas
ax1.plot(dates, gas_returns, alpha=0.7, linewidth=0.8, label='Natural Gas')
ax1.axhline(0, color='red', linestyle='--', alpha=0.3)
ax1.set_title('Natural Gas - Log-Returns', fontsize=12, fontweight='bold')
ax1.set_ylabel('Log-return')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Electricity
ax2.plot(dates, el_returns, alpha=0.7, linewidth=0.8, color='orange', label='Electricity')
ax2.axhline(0, color='red', linestyle='--', alpha=0.3)
ax2.set_title('Electricity - Log-Returns', fontsize=12, fontweight='bold')
ax2.set_ylabel('Log-return')
ax2.set_xlabel('Date')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../figures/01_logreturn_series.png', dpi=150, bbox_inches='tight')
plt.show()

print('✅ Time series visualized')

## Distributions

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Gas
ax1.hist(gas_returns, bins=50, edgecolor='black', alpha=0.7, density=True)
ax1.axvline(gas_returns.mean(), color='red', linestyle='--', linewidth=2, 
            label=f'Mean = {gas_returns.mean():.4f}')
ax1.set_title('Natural Gas - Distribution')
ax1.set_xlabel('Log-return')
ax1.set_ylabel('Density')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Electricity
ax2.hist(el_returns, bins=50, edgecolor='black', alpha=0.7, color='orange', density=True)
ax2.axvline(el_returns.mean(), color='red', linestyle='--', linewidth=2,
            label=f'Mean = {el_returns.mean():.4f}')
ax2.set_title('Electricity - Distribution')
ax2.set_xlabel('Log-return')
ax2.set_ylabel('Density')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../figures/01_distributions.png', dpi=150, bbox_inches='tight')
plt.show()

print('✅ Distributions visualized')

## Min-Max Normalization

Formula: $\bar{r} = \frac{r - r_{\min}}{r_{\max} - r_{\min}}$

This maps all values to the interval **[0, 1]**.
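
A small worked example may help. With the hypothetical returns $-0.2, 0.0, 0.1, 0.3$ (not taken from the dataset), $r_{\min} = -0.2$ and $r_{\max} = 0.3$, so the values map to $0, 0.4, 0.6, 1$. Since min-max scaling is a shift followed by division by a positive constant, skewness and kurtosis should be unchanged, which the sketch also checks.

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Hypothetical log-returns, for illustration only
r = np.array([-0.2, 0.0, 0.1, 0.3])

# Min-max normalization: (r - r_min) / (r_max - r_min)
r_norm = (r - r.min()) / (r.max() - r.min())
print(r_norm)

# Shape statistics are invariant under a positive affine transformation
print(np.isclose(skew(r), skew(r_norm)))
print(np.isclose(kurtosis(r, fisher=False), kurtosis(r_norm, fisher=False)))
```

The mean and standard deviation do change under this scaling; only the standardized moments are preserved.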

In [None]:
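def normalize_minmax(data):
    """
    Min-max normalization to [0, 1].

    Raises ValueError for a constant series, where max - min is zero.
    """
    span = data.max() - data.min()
    if span == 0:
        raise ValueError('Cannot min-max normalize a constant series')
    return (data - data.min()) / span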

# Normalize
gas_norm = normalize_minmax(gas_returns)
el_norm = normalize_minmax(el_returns)

print('✅ Normalization completed')
print('\n📊 Normalized range:')
print(f'   Gas: [{gas_norm.min():.3f}, {gas_norm.max():.3f}]')
print(f'   Electricity: [{el_norm.min():.3f}, {el_norm.max():.3f}]')

# Verify: skewness and kurtosis do NOT change under min-max scaling (a positive affine map)
print(f'\n✓ Verification: Gas skewness = {skew(gas_norm):.4f} (raw: {skew(gas_returns):.4f})')
print(f'✓ Verification: Gas kurtosis = {kurtosis(gas_norm, fisher=False):.4f} (raw: {kurtosis(gas_returns, fisher=False):.4f})')

## Visualize Normalized Data

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Normalized time series
ax1.plot(dates, gas_norm, alpha=0.6, label='Gas', linewidth=0.8)
ax1.plot(dates, el_norm, alpha=0.6, label='Electricity', linewidth=0.8, color='orange')
ax1.set_title('Normalized Log-Returns [0,1]')
ax1.set_ylabel('Normalized value')
ax1.set_xlabel('Date')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Normalized distributions
ax2.hist(gas_norm, bins=50, alpha=0.6, label='Gas', density=True)
ax2.hist(el_norm, bins=50, alpha=0.6, label='Electricity', color='orange', density=True)
ax2.set_title('Normalized Distributions')
ax2.set_xlabel('Normalized value [0,1]')
ax2.set_ylabel('Density')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../figures/01_normalized.png', dpi=150, bbox_inches='tight')
plt.show()

print('✅ Normalized data visualized')

## Save Processed Data

In [None]:
# Create DataFrame
df = pd.DataFrame({
    'Date': dates,
    'gas_return': gas_returns,
    'el_return': el_returns,
    'gas_norm': gas_norm,
    'el_norm': el_norm
})

# Save as CSV
df.to_csv('../data/preprocessed_data.csv', index=False)

# Save as NumPy arrays (faster loading)
np.save('../data/gas_normalized.npy', gas_norm)
np.save('../data/electricity_normalized.npy', el_norm)

print('✅ Data saved:')
print('   • ../data/preprocessed_data.csv (readable format)')
print('   • ../data/gas_normalized.npy (fast format)')
print('   • ../data/electricity_normalized.npy')
print(f'\n   Shape: {len(gas_norm)} observations')
print('\n🎯 Next step: 02_visibility_graphs.ipynb')

---

## Summary

### Data Loaded
- ✅ 1825 observations (2019-2023)
- ✅ Natural gas and electricity
- ✅ Period includes COVID, Ukraine war, energy crisis

### Key Statistics (Table 1 in Paper)
- Mean gas: ~2.47e-05, electricity: ~2.49e-04
- Std gas: ~0.0705, electricity: ~0.1467
- Skewness: ~0.40 (gas), ~0.32 (electricity)
- Kurtosis: ~14.0 (gas), ~5.4 (electricity)
- **Correlation ρ = 0.46**

### Observations
1. **Electricity is about 2x as volatile** as gas (std ~0.1467 vs ~0.0705)
2. **Gas has heavier tails** (high kurtosis → extreme events)
3. **Moderate correlation** (0.46) → the markets are connected, but not perfectly

### Output
- ✅ Normalized data [0,1] ready for Step 2
- ✅ Figures saved in `figures/`
- ✅ CSV and NumPy saved in `data/`
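
The saved files can be read back in a later step. The sketch below shows one possible round trip using the same file names and column names as this notebook, but it writes to a temporary directory with a few hypothetical values so that it runs on its own; in practice the paths would point at the `data/` folder.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical normalized values, for illustration only
values = np.array([0.0, 0.25, 1.0])
dates = pd.date_range('2019-01-02', periods=len(values), freq='D')

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Write in both formats, mirroring the save cell
    np.save(tmp / 'gas_normalized.npy', values)
    pd.DataFrame({'Date': dates, 'gas_norm': values}).to_csv(
        tmp / 'preprocessed_data.csv', index=False)

    # Read back: .npy returns the array directly, CSV needs date parsing
    loaded_npy = np.load(tmp / 'gas_normalized.npy')
    loaded_csv = pd.read_csv(tmp / 'preprocessed_data.csv', parse_dates=['Date'])

print(loaded_npy)
print(loaded_csv.dtypes)
```

The `.npy` format preserves the array's dtype and values exactly, while the CSV stores text, so dates have to be parsed again on load.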

---

## Next Step

**Notebook 02**: Transform these log-returns into **Natural Visibility Graphs**!

Open: `02_visibility_graphs.ipynb`