# Step 2: Natural Visibility Graphs

**Goal**: Transform time series into graphs

**Input**: Normalized log-returns  
**Output**: Graphs representing temporal relationships

## Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx
from ts2vg import NaturalVG
import pickle

plt.style.use('seaborn-v0_8-whitegrid')
print('✅ Libraries loaded')

## Load Preprocessed Data

In [None]:
# Load normalized log-returns
gas_norm = np.load('../data/gas_normalized.npy')
el_norm = np.load('../data/electricity_normalized.npy')

print(f'✅ Data loaded: {len(gas_norm)} observations')
print(f'   Gas range: [{gas_norm.min():.3f}, {gas_norm.max():.3f}]')
print(f'   Electricity range: [{el_norm.min():.3f}, {el_norm.max():.3f}]')

## What is a Visibility Graph?

**Idea**: Imagine each point in the series as a vertical bar.

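Two points are **connected** if they can "see" each other: no bar between them rises to or above the straight line joining their tops.

**NVG criterion**: Two nodes $i < j$ with values $x_i$ and $x_j$ are connected if every intermediate point lies strictly below that line: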

$$x_k < x_j + \frac{j-k}{j-i}(x_i - x_j) \quad \forall k: i < k < j$$

**Advantages**:
- ✅ No parameters to tune
- ✅ Preserves temporal order
- ✅ Captures complex dependencies
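
To make the criterion concrete, here is a minimal brute-force sketch in pure Python. It is for illustration only: the notebook itself relies on `ts2vg`, and the helper name `natural_visibility_edges` and the sample series are assumptions introduced here.

```python
def natural_visibility_edges(x):
    """Return the NVG edge list of sequence x by brute force.

    Illustrative helper (not part of ts2vg); cubic time in the worst case,
    so only suitable for short series.
    """
    n = len(x)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            # (i, j) is an edge if every intermediate bar lies strictly
            # below the straight line joining (i, x[i]) and (j, x[j]).
            visible = all(
                x[k] < x[j] + (j - k) / (j - i) * (x[i] - x[j])
                for k in range(i + 1, j)
            )
            if visible:
                edges.append((i, j))
    return edges


# Hypothetical toy series: two peaks (indices 1 and 3) with a dip between them.
series = [1.0, 3.0, 2.0, 4.0, 1.0]
edges = natural_visibility_edges(series)
print(edges)  # [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
```

Adjacent points are always connected because there is nothing between them to block the view. The only longer edge here is (1, 3): the two peaks see each other over the dip at index 2.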

## Build Visibility Graphs

In [None]:
print('🔄 Building Natural Visibility Graphs...')
print('   (may take 1-2 minutes)\n')

# Natural Gas
print('[1/2] Natural gas...')
nvg_gas = NaturalVG()
nvg_gas.build(gas_norm)
G_gas = nvg_gas.as_networkx()
print(f'   ✅ Nodes: {G_gas.number_of_nodes()}, Edges: {G_gas.number_of_edges()}')

# Electricity
print('[2/2] Electricity...')
nvg_el = NaturalVG()
nvg_el.build(el_norm)
G_el = nvg_el.as_networkx()
print(f'   ✅ Nodes: {G_el.number_of_nodes()}, Edges: {G_el.number_of_edges()}')

print('\n✅ Visibility graphs built!')

## Graph Statistics

In [None]:
def graph_stats(G, name):
    """Calculate and print graph statistics"""
    print(f'\n📈 {name}')
    print('='*50)
    print(f'Nodes: {G.number_of_nodes()}')
    print(f'Edges: {G.number_of_edges()}')
    print(f'Density: {nx.density(G):.4f}')
    
    degrees = [d for n, d in G.degree()]
    print(f'Average degree: {np.mean(degrees):.2f}')
    print(f'Degree min/max: {min(degrees)} / {max(degrees)}')
    print(f'Clustering coefficient: {nx.average_clustering(G):.4f}')
    
    return degrees

deg_gas = graph_stats(G_gas, 'Natural Gas')
deg_el = graph_stats(G_el, 'Electricity')

## Degree Distribution

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))

# Gas
ax1.hist(deg_gas, bins=30, edgecolor='black', alpha=0.7)
ax1.axvline(np.mean(deg_gas), color='red', linestyle='--', 
            label=f'Mean = {np.mean(deg_gas):.1f}')
ax1.set_title('Natural Gas - Degree Distribution')
ax1.set_xlabel('Degree')
ax1.set_ylabel('Frequency')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Electricity
ax2.hist(deg_el, bins=30, edgecolor='black', alpha=0.7, color='orange')
ax2.axvline(np.mean(deg_el), color='red', linestyle='--',
            label=f'Mean = {np.mean(deg_el):.1f}')
ax2.set_title('Electricity - Degree Distribution')
ax2.set_xlabel('Degree')
ax2.set_ylabel('Frequency')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../figures/02_degree_distributions.png', dpi=150, bbox_inches='tight')
plt.show()

print('✅ Degree distributions visualized')

## Visualize Graph (Subsample)

Only the first 100 nodes are drawn so that individual edges stay legible.

In [None]:
def plot_graph_sample(G, values, title, n_nodes=100):
    """
    Visualize a subsample of the graph preserving temporal order.
    """
    # Subgraph (clamp the sample size so a short series does not raise IndexError)
    n_nodes = min(n_nodes, len(values))
    nodes = list(range(n_nodes))
    G_sub = G.subgraph(nodes)
    
    # Positions: x=time, y=value
    pos = {i: (i, values[i]) for i in nodes}
    
    fig, ax = plt.subplots(figsize=(14, 6))
    
    # Draw edges (connections)
    nx.draw_networkx_edges(G_sub, pos, alpha=0.3, width=0.5, ax=ax)
    
    # Draw time series with nodes
    ax.plot(nodes, values[:n_nodes], 'o-', markersize=4, linewidth=1.5, alpha=0.8)
    
    ax.set_title(title, fontsize=14, fontweight='bold')
    ax.set_xlabel('Time (days)', fontsize=12)
    ax.set_ylabel('Normalized log-return', fontsize=12)
    ax.grid(True, alpha=0.3)
    
    return fig

# Visualize Gas
print('Natural Gas graph (first 100 days)...')
plot_graph_sample(G_gas, gas_norm, 'Natural Gas - Visibility Graph')
plt.savefig('../figures/02_graph_gas.png', dpi=150, bbox_inches='tight')
plt.show()

# Visualize Electricity
print('Electricity graph (first 100 days)...')
plot_graph_sample(G_el, el_norm, 'Electricity - Visibility Graph')
plt.savefig('../figures/02_graph_electricity.png', dpi=150, bbox_inches='tight')
plt.show()

print('✅ Graphs visualized')

## Interpretation

**What do we see in the graphs?**

1. **Edges = Visibility**: If two points are connected, they can "see" each other
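2. **High degree**: A point visible from many others → typically a local maximum, such as a large positive return
3. **Low degree**: A point hidden between higher neighbors → typically a quiet period or a local minimum
4. **Clusters**: Groups of densely connected points → recurring patterns

**Observations**:
- Peaks (upward spikes) have high degree
- Valleys have low degree, so a large negative return does not become a hub the way a large positive one does
- Graph structure reflects volatility: isolated spikes become hubs with long-range edges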
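
These observations can be checked on a toy series. The sketch below is pure Python rather than `ts2vg`, and both the helper `visibility_degrees` and the sample values are hypothetical, chosen only to place one upward spike among small fluctuations.

```python
def visibility_degrees(x):
    """Degree of each node in the natural visibility graph of x.

    Illustrative brute-force helper (an assumption, not a ts2vg function).
    """
    n = len(x)
    degree = [0] * n
    for i in range(n - 1):
        for j in range(i + 1, n):
            # i and j are linked when every point between them lies strictly
            # below the straight line joining (i, x[i]) and (j, x[j]).
            if all(
                x[k] < x[j] + (j - k) / (j - i) * (x[i] - x[j])
                for k in range(i + 1, j)
            ):
                degree[i] += 1
                degree[j] += 1
    return degree


# Hypothetical normalized returns with a single spike at index 3.
returns = [0.1, -0.2, 0.0, 2.5, -0.1, 0.2, -0.3, 0.1]
degrees = visibility_degrees(returns)
print(degrees)  # [3, 3, 3, 7, 2, 4, 3, 3]

hub = degrees.index(max(degrees))
print(f'Highest degree: node {hub} (value {returns[hub]})')
```

The spike at index 3 sees every other point (degree 7), while the dip at index 4, wedged between the spike and a higher neighbor, has the lowest degree (2). The global minimum at index 6 still reaches degree 3, so "valleys have low degree" is a tendency rather than a strict rule.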

## Save Graphs

In [None]:
# Save as pickle files
with open('../data/graph_gas.pkl', 'wb') as f:
    pickle.dump(G_gas, f)

with open('../data/graph_electricity.pkl', 'wb') as f:
    pickle.dump(G_el, f)

print('✅ Graphs saved:')
print('   • ../data/graph_gas.pkl')
print('   • ../data/graph_electricity.pkl')
print('\n🎯 Next step: 03_embeddings.ipynb')

---

## Summary

1. ✅ Built Natural Visibility Graphs
2. ✅ Analyzed graph statistics (nodes, edges, degree, clustering)
3. ✅ Visualized degree distribution
4. ✅ Visualized graphs (subsample)
5. ✅ Saved graphs

**Output**: NetworkX graphs saved as `.pkl`

**Next**: `03_embeddings.ipynb` - Transform graphs into 128-dimensional vectors!