# Steel Casting Sensor Correlation Analysis Demo

This notebook demonstrates the correlation analysis capabilities for steel casting sensor data, including:

1. **Cross-Sensor Correlation Heatmaps** - Visualize correlations between all sensor pairs
2. **Defect-Specific Correlations** - Compare correlation patterns in good vs defective casts
3. **Time-Lagged Correlations** - Analyze delayed relationships between sensors
4. **Feature Importance Indicators** - Identify which sensor combinations are most predictive
5. **Rolling Correlations** - Track how a sensor-pair correlation evolves during a cast


In [None]:
# Import required libraries
import sys
sys.path.append('..')  # make the project root importable so `src.*` modules resolve

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.io as pio
from pathlib import Path

# Import our modules
from src.data.data_generator import SteelCastingDataGenerator
from src.features.correlation_analyzer import SensorCorrelationAnalyzer
from src.visualization.plotting_utils import PlottingUtils

# Configure plotly for notebook display
pio.renderers.default = "notebook"

print("Libraries imported successfully!")

## 1. Generate Sample Data

First, let's generate ten synthetic casts. The generator labels each cast as good or defective, and the good-vs-defective comparison in section 4 needs at least one cast of each kind.

In [None]:
# Initialize data generator
generator = SteelCastingDataGenerator('../configs/data_generation.yaml')

# Generate 10 sample casts for demonstration
cast_data_list = []
print("Generating sample casts...")

for i in range(10):
    cast_id = f"demo_cast_{i+1:03d}"
    time_series, metadata = generator.generate_cast_sequence(cast_id)
    cast_data_list.append((time_series, metadata))
    
    status = "DEFECTIVE" if metadata['defect_label'] else "GOOD"
    print(f"Cast {cast_id}: {status} - Triggers: {metadata['defect_trigger_events']}")

print(f"\nGenerated {len(cast_data_list)} casts for analysis")
print(f"Data shape per cast: {cast_data_list[0][0].shape}")
print(f"Sensor columns: {cast_data_list[0][0].columns.tolist()}")

## 2. Initialize Analysis Tools

In [None]:
# Initialize correlation analyzer and plotting utilities
analyzer = SensorCorrelationAnalyzer()
plotter = PlottingUtils()

print(f"Analyzing sensors: {analyzer.sensor_columns}")

## 3. Cross-Sensor Correlation Heatmaps

Let's start by analyzing the overall correlations between sensors using data from a sample cast.

In [None]:
# Use the first cast as an example
sample_cast_data = cast_data_list[0][0]
sample_metadata = cast_data_list[0][1]

print(f"Analyzing cast: {sample_metadata['cast_id']}")
print(f"Status: {'DEFECTIVE' if sample_metadata['defect_label'] else 'GOOD'}")
print(f"Duration: {sample_metadata['duration_minutes']} minutes")

# Compute correlation matrix
correlation_matrix = analyzer.compute_cross_sensor_correlations(sample_cast_data)
print("\nCorrelation Matrix:")
print(correlation_matrix.round(3))
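The analyzer's internals aren't shown in this notebook. If `compute_cross_sensor_correlations` returns a pairwise Pearson matrix (an assumption), it should match pandas' built-in `DataFrame.corr`. The sketch below builds a small synthetic cast at an assumed 1 Hz; `casting_speed` and `mold_temperature` are sensor names used in this notebook, while `water_flow` and all the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 1 Hz cast: 600 samples = 10 minutes of synthetic readings
rng = np.random.default_rng(0)
n = 600
casting_speed = 1.2 + 0.05 * rng.standard_normal(n)
# Mold temperature loosely tracks casting speed, plus independent noise
mold_temperature = 1500 + 40 * (casting_speed - 1.2) + rng.standard_normal(n)
# Hypothetical sensor that is independent of the other two
water_flow = 200 + rng.standard_normal(n)

cast = pd.DataFrame({
    "casting_speed": casting_speed,
    "mold_temperature": mold_temperature,
    "water_flow": water_flow,
})

# Pairwise Pearson correlation between all sensor columns
corr = cast.corr(method="pearson")
print(corr.round(3))
```

The result is symmetric with ones on the diagonal, which is why the next cell blanks the diagonal before searching for the strongest pairs.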

In [None]:
# Create correlation heatmap
fig_heatmap = plotter.plot_correlation_heatmap(
    sample_cast_data,
    title=f"Sensor Correlations - {sample_metadata['cast_id']}"
)
fig_heatmap.show()

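# Identify strongest positive and negative correlations.
# Work on a copy: blanking the diagonal (self-correlations) on `.values` could modify
# correlation_matrix in place, or fail on a read-only view under pandas Copy-on-Write.
corr_values = correlation_matrix.to_numpy(copy=True)
np.fill_diagonal(corr_values, np.nan)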

# Find max and min correlations
max_corr_idx = np.nanargmax(corr_values)
min_corr_idx = np.nanargmin(corr_values)

max_row, max_col = divmod(max_corr_idx, corr_values.shape[1])
min_row, min_col = divmod(min_corr_idx, corr_values.shape[1])

max_sensors = (correlation_matrix.index[max_row], correlation_matrix.columns[max_col])
min_sensors = (correlation_matrix.index[min_row], correlation_matrix.columns[min_col])

print(f"\nStrongest positive correlation: {max_sensors[0]} ↔ {max_sensors[1]} ({corr_values[max_row, max_col]:.3f})")
print(f"Strongest negative correlation: {min_sensors[0]} ↔ {min_sensors[1]} ({corr_values[min_row, min_col]:.3f})")
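A correlation matrix is symmetric, so each pair appears twice and the diagonal is always 1. An alternative to blanking the diagonal is to read only the strict upper triangle with `np.triu_indices`, which lists each pair once and ranks them by magnitude. `top_pairs` is a hypothetical helper, not part of `SensorCorrelationAnalyzer`.

```python
import numpy as np
import pandas as pd

def top_pairs(corr: pd.DataFrame, k: int = 3) -> pd.DataFrame:
    """Rank sensor pairs by absolute correlation, listing each pair once."""
    # Strict upper triangle (k=1 excludes the diagonal)
    rows, cols = np.triu_indices(len(corr), k=1)
    values = corr.to_numpy()[rows, cols]
    pairs = pd.DataFrame({
        "sensor_a": corr.index[rows],
        "sensor_b": corr.columns[cols],
        "correlation": values,
    })
    # Sort by magnitude so strong negative correlations rank alongside positive ones
    order = np.argsort(-np.abs(values), kind="stable")
    return pairs.iloc[order].head(k).reset_index(drop=True)

# Example with a hand-written 3x3 matrix
corr = pd.DataFrame(
    [[1.0, 0.9, -0.95], [0.9, 1.0, 0.1], [-0.95, 0.1, 1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)
print(top_pairs(corr, k=2))
```

Because it reads from the matrix without writing to it, the original correlation matrix is left untouched.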

## 4. Defect-Specific Correlation Analysis

Now let's compare correlation patterns between good and defective casts to identify differences that might indicate defect precursors.

In [None]:
# Analyze defect-specific correlations
defect_analysis = analyzer.compute_defect_specific_correlations(cast_data_list)

print("Defect-specific correlation analysis:")
for key in defect_analysis.keys():
    print(f"- {key}: {defect_analysis[key].shape}")

# Count good vs defective casts
good_count = sum(1 for _, metadata in cast_data_list if metadata['defect_label'] == 0)
defect_count = sum(1 for _, metadata in cast_data_list if metadata['defect_label'] == 1)
print(f"\nDataset composition: {good_count} good casts, {defect_count} defective casts")

In [None]:
# Create comparison visualization
fig_comparison = plotter.plot_defect_correlation_comparison(
    defect_analysis['good_casts'],
    defect_analysis['defect_casts'],
    defect_analysis['difference']
)
fig_comparison.show()

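# Identify the largest difference, ignoring the diagonal.
# Copy first so the stored defect_analysis['difference'] is not modified in place.
diff_matrix = defect_analysis['difference']
diff_values = diff_matrix.to_numpy(copy=True)
np.fill_diagonal(diff_values, 0)

# Find the single largest absolute difference (either sign)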
max_diff_idx = np.argmax(np.abs(diff_values))
max_diff_row, max_diff_col = divmod(max_diff_idx, diff_values.shape[1])
max_diff_value = diff_values[max_diff_row, max_diff_col]
max_diff_sensors = (diff_matrix.index[max_diff_row], diff_matrix.columns[max_diff_col])

print("\nLargest correlation difference between good and defective casts:")
print(f"{max_diff_sensors[0]} ↔ {max_diff_sensors[1]}: {max_diff_value:.3f}")
print("(Positive = higher, i.e. more positive, correlation in defective casts)")
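The notebook doesn't show how `compute_defect_specific_correlations` combines casts. One plausible approach, sketched here as an assumption, is to average the per-cast correlation matrices within each group and subtract them (`defect - good`). A signed difference cannot distinguish a correlation that got stronger from one that became less negative, so the sketch also returns the change in magnitude. `group_correlation_difference` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def group_correlation_difference(casts):
    """casts: list of (DataFrame, label) pairs, label 1 = defective, 0 = good."""
    good = [df.corr() for df, label in casts if label == 0]
    defect = [df.corr() for df, label in casts if label == 1]
    if not good or not defect:
        raise ValueError("need at least one good and one defective cast")
    # Element-wise mean of the per-cast correlation matrices in each group
    good_mean = sum(good) / len(good)
    defect_mean = sum(defect) / len(defect)
    # Positive: correlation is higher (more positive) in defective casts
    signed_diff = defect_mean - good_mean
    # Positive: |r| is larger in defective casts, regardless of sign
    magnitude_diff = defect_mean.abs() - good_mean.abs()
    return good_mean, defect_mean, signed_diff, magnitude_diff

# Toy example: a and b are strongly anti-correlated in the good cast only
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
good_cast = pd.DataFrame({"a": x, "b": -x + 0.1 * rng.standard_normal(500)})
defect_cast = pd.DataFrame({"a": x, "b": rng.standard_normal(500)})
_, _, signed, magnitude = group_correlation_difference([(good_cast, 0), (defect_cast, 1)])
print(signed.loc["a", "b"], magnitude.loc["a", "b"])
```

In this toy example the signed difference is large and positive even though the relationship actually weakened in the defective cast; the magnitude difference is what reveals that.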

## 5. Time-Lagged Correlation Analysis

Let's analyze delayed relationships between sensors to identify leading indicators.

In [None]:
# Compute time-lagged correlations for key sensor
lagged_correlations = analyzer.compute_time_lagged_correlations(
    sample_cast_data,
    max_lag=60,  # 60 seconds max lag
    target_sensor='mold_temperature'  # Use mold temperature as target
)

print("Time-lagged correlation analysis for mold_temperature:")
for pair, lag_data in lagged_correlations.items():
    # Find the lag with the largest absolute correlation (the sign is kept for reporting)
    peak_idx = lag_data['correlation'].abs().idxmax()
    peak_lag = lag_data.loc[peak_idx, 'lag']
    peak_corr = lag_data.loc[peak_idx, 'correlation']
    
    print(f"  {pair}: strongest correlation {peak_corr:.3f} at lag {peak_lag}s")
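To see what a lag profile like `lagged_correlations` contains, here is a minimal sketch that shifts one series against the other and takes the Pearson correlation at each shift. It assumes 1 Hz sampling, so a lag of one sample is one second, and uses the convention that a positive lag means the driver series leads the target. The analyzer's own sign convention may differ, and `lagged_correlation` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def lagged_correlation(driver: pd.Series, target: pd.Series, max_lag: int) -> pd.DataFrame:
    """Correlate target(t) with driver(t - lag) for lag in [-max_lag, max_lag]."""
    rows = []
    for lag in range(-max_lag, max_lag + 1):
        # shift(lag) delays the driver by `lag` samples; Series.corr skips NaN pairs
        r = target.corr(driver.shift(lag))
        rows.append({"lag": lag, "correlation": r})
    return pd.DataFrame(rows)

# Hypothetical data: temperature responds to speed 5 samples later, plus noise
rng = np.random.default_rng(2)
speed = pd.Series(rng.standard_normal(1000))
temperature = speed.shift(5) + 0.3 * pd.Series(rng.standard_normal(1000))

profile = lagged_correlation(speed, temperature, max_lag=10)
best = profile.loc[profile["correlation"].abs().idxmax()]
print(best)
```

The peak lands at lag 5, recovering the built-in delay; a peak at a negative lag would instead suggest the "target" sensor leads.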

In [None]:
# Visualize time-lagged correlations
fig_lagged = plotter.plot_time_lagged_correlations(
    lagged_correlations
)
fig_lagged.show()

# Analyze specific pair in detail
pair_to_analyze = list(lagged_correlations.keys())[0]
lag_data = lagged_correlations[pair_to_analyze]

print(f"\nDetailed analysis for {pair_to_analyze}:")
print(f"Correlation at lag 0: {lag_data[lag_data['lag'] == 0]['correlation'].iloc[0]:.3f}")

# Find optimal positive and negative lags
positive_lags = lag_data[lag_data['lag'] > 0]
negative_lags = lag_data[lag_data['lag'] < 0]

if not positive_lags.empty:
    best_positive = positive_lags.loc[positive_lags['correlation'].abs().idxmax()]
    print(f"Best positive lag: {best_positive['lag']}s (correlation: {best_positive['correlation']:.3f})")

if not negative_lags.empty:
    best_negative = negative_lags.loc[negative_lags['correlation'].abs().idxmax()]
    print(f"Best negative lag: {best_negative['lag']}s (correlation: {best_negative['correlation']:.3f})")

## 6. Feature Importance for Defect Prediction

Let's identify which sensor combinations are most predictive of defects.

In [None]:
# Identify predictive sensor combinations
importance_df = analyzer.identify_predictive_sensor_combinations(
    cast_data_list,
    top_k=15
)

print("Top predictive features for defect detection:")
print(importance_df.to_string(index=False))

# Categorize features
statistical_features = importance_df[importance_df['feature'].str.contains('_mean|_std|_min|_max')]
correlation_features = importance_df[importance_df['feature'].str.contains('corr_')]

print(f"\nFeature breakdown:")
print(f"Statistical features: {len(statistical_features)}")
print(f"Correlation features: {len(correlation_features)}")
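The notebook doesn't show how `identify_predictive_sensor_combinations` scores features. As a simple, model-free baseline (a swapped-in technique, not the analyzer's method), one can build one feature row per cast, with per-sensor statistics and pairwise correlations named in the `_mean`/`_std`/`_min`/`_max`/`corr_` style used above, then rank each feature by its absolute correlation with the defect label. `cast_features` and `rank_features` are hypothetical helpers, and the toy data below is synthetic.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def cast_features(df: pd.DataFrame) -> dict:
    """One feature row per cast: per-sensor statistics plus pairwise correlations."""
    feats = {}
    for col in df.columns:
        s = df[col]
        feats[f"{col}_mean"] = s.mean()
        feats[f"{col}_std"] = s.std()
        feats[f"{col}_min"] = s.min()
        feats[f"{col}_max"] = s.max()
    corr = df.corr()
    for a, b in combinations(df.columns, 2):
        feats[f"corr_{a}_{b}"] = corr.loc[a, b]
    return feats

def rank_features(casts, top_k=5):
    """casts: list of (DataFrame, label). Rank features by |Pearson r| with the label."""
    X = pd.DataFrame([cast_features(df) for df, _ in casts])
    y = pd.Series([label for _, label in casts], dtype=float)
    scores = X.apply(lambda col: abs(col.corr(y)))  # one score per feature column
    ranked = scores.sort_values(ascending=False).head(top_k)
    return ranked.rename_axis("feature").reset_index(name="importance")

# Toy data: defective casts (label 1) run 5 degrees hotter on average
rng = np.random.default_rng(3)
casts = []
for i in range(20):
    label = i % 2
    speed = 1.2 + 0.05 * rng.standard_normal(300)
    temp = 1500 + 5 * label + rng.standard_normal(300)
    casts.append((pd.DataFrame({"casting_speed": speed, "mold_temperature": temp}), label))

print(rank_features(casts, top_k=3))
```

A univariate ranking like this ignores interactions between features; a tree-based model's feature importances would capture those, at the cost of needing more casts than the ten used here.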

In [None]:
# Visualize feature importance
fig_importance = plotter.plot_feature_importance_ranking(
    importance_df,
    title="Sensor Feature Importance for Defect Prediction"
)
fig_importance.show()

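# Analyze top correlation features
top_corr_features = correlation_features.head(5)
if not top_corr_features.empty:
    print("\nTop correlation-based predictive features:")
    for _, row in top_corr_features.iterrows():
        feature_name = row['feature']
        importance = row['importance']
        # Sensor names contain underscores (e.g. 'casting_speed'), so splitting on '_'
        # would break them apart. Assuming names of the form 'corr_<sensorA>_<sensorB>',
        # match the prefix against the known sensor columns instead.
        pair_str = feature_name.removeprefix('corr_')
        for sensor_a in analyzer.sensor_columns:
            sensor_b = pair_str.removeprefix(f"{sensor_a}_")
            if sensor_b != pair_str and sensor_b in analyzer.sensor_columns:
                print(f"  {sensor_a} ↔ {sensor_b}: {importance:.4f}")
                break
        else:
            # Name did not match the expected pattern; print it unparsed
            print(f"  {feature_name}: {importance:.4f}")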

## 7. Rolling Correlation Analysis

Let's analyze how correlations change over time during a cast.

In [None]:
# Compute rolling correlations for the sample cast
rolling_correlations = analyzer.compute_rolling_correlations(
    sample_cast_data,
    window_size=300,  # 5-minute rolling window
    sensor_pair=('casting_speed', 'mold_temperature')
)

print(f"Rolling correlation analysis:")
print(f"Window size: 300 seconds (5 minutes)")
print(f"Data shape: {rolling_correlations.shape}")

# Plot rolling correlation
fig_rolling = go.Figure()

fig_rolling.add_trace(go.Scatter(
    x=rolling_correlations.index,
    y=rolling_correlations.iloc[:, 0],
    mode='lines',
    name='Rolling Correlation',
    line=dict(width=2)
))

fig_rolling.add_hline(y=0, line_dash="dash", line_color="gray", opacity=0.5)

fig_rolling.update_layout(
    title="Rolling Correlation: Casting Speed ↔ Mold Temperature",
    xaxis_title="Time",
    yaxis_title="Correlation Coefficient",
    height=400,
    width=800
)

fig_rolling.show()

# Analyze correlation stability over the cast
rolling_values = rolling_correlations.iloc[:, 0].dropna()
print("\nRolling correlation statistics:")
print(f"Mean: {rolling_values.mean():.3f}")
print(f"Std: {rolling_values.std():.3f}")
print(f"Min: {rolling_values.min():.3f}")
print(f"Max: {rolling_values.max():.3f}")
print(f"Range: {rolling_values.max() - rolling_values.min():.3f}")
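If `compute_rolling_correlations` is a trailing-window Pearson correlation (an assumption), pandas can compute the same thing directly with `Series.rolling(window).corr(other)`. Note that `window_size=300` means 5 minutes only if the data is sampled at 1 Hz; the sketch makes that conversion explicit. The regime change in the synthetic data is hypothetical.

```python
import numpy as np
import pandas as pd

SAMPLE_RATE_HZ = 1.0  # assumption: one reading per second
window_seconds = 300
window = int(window_seconds * SAMPLE_RATE_HZ)  # 300 samples at 1 Hz

# Hypothetical regime change: temperature tracks speed only in the first half
rng = np.random.default_rng(4)
n = 1200
speed = pd.Series(rng.standard_normal(n))
coupling = np.where(np.arange(n) < n // 2, 1.0, 0.0)
temperature = coupling * speed + 0.5 * pd.Series(rng.standard_normal(n))

# Trailing-window Pearson correlation; the first window - 1 values are NaN
rolling_r = speed.rolling(window).corr(temperature)
print(rolling_r.dropna().describe())
```

The correlation starts near 0.9 and falls toward zero once the window slides entirely into the uncoupled half, which is the kind of drift the rolling statistics above summarize.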

## 8. Comprehensive Analysis Export

Finally, let's export a comprehensive correlation analysis for future use.

In [None]:
# Create the output directory (and any missing parents, such as ../data)
output_dir = Path('../data/correlation_analysis')
output_dir.mkdir(parents=True, exist_ok=True)

# Export comprehensive analysis
analysis_file = output_dir / 'comprehensive_correlation_analysis.json'
analyzer.export_correlation_analysis(cast_data_list, str(analysis_file))

# Save key visualizations as standalone HTML files
print("\nSaving visualizations...")

figures_to_save = {
    'correlation_heatmap.html': fig_heatmap,
    'defect_correlation_comparison.html': fig_comparison,
    'time_lagged_correlations.html': fig_lagged,
    'feature_importance_ranking.html': fig_importance,
}
for filename, fig in figures_to_save.items():
    plotter.save_plot(fig, str(output_dir / filename), format='html')

print(f"Analysis complete! Results saved to {output_dir}")
print("Files in output directory:")
for file in sorted(output_dir.glob('*')):
    print(f"  - {file.name}")
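The format written by `export_correlation_analysis` isn't shown. If you need a portable copy of a single correlation matrix, one option (assumed here, not taken from the analyzer) is to store it as plain JSON using `DataFrame.to_dict(orient="split")`, which keeps the row and column labels alongside the values. `save_corr_json` and `load_corr_json` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

def save_corr_json(corr: pd.DataFrame, path: Path) -> None:
    """Write a labeled correlation matrix as JSON with index, columns, and data keys."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(corr.to_dict(orient="split"), indent=2))

def load_corr_json(path: Path) -> pd.DataFrame:
    """Rebuild the labeled DataFrame written by save_corr_json."""
    d = json.loads(path.read_text())
    return pd.DataFrame(d["data"], index=d["index"], columns=d["columns"])

# Illustrative 2x2 matrix (values are made up)
sensors = ["casting_speed", "mold_temperature"]
corr = pd.DataFrame([[1.0, -0.42], [-0.42, 1.0]], index=sensors, columns=sensors)

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "nested" / "sample_corr.json"
    save_corr_json(corr, out)
    restored = load_corr_json(out)

print(restored.equals(corr))
```

Plain JSON keeps the file readable by non-Python tooling; for many casts, a binary format such as Parquet would be more compact.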

## Summary

This notebook walked through the correlation analysis workflow for steel casting sensor data:

### What We Covered:

1. **Cross-Sensor Correlations**: We identified the strongest positive and negative correlations between sensor pairs
2. **Defect-Specific Patterns**: We compared correlation patterns between good and defective casts to identify differences
3. **Time-Lagged Relationships**: We analyzed delayed relationships to identify leading indicators
4. **Predictive Features**: We ranked sensor combinations by their predictive power for defect detection
5. **Temporal Dynamics**: We analyzed how correlations change over time during casting

### Applications:

- **Process Monitoring**: Use correlation changes as early warning indicators
- **Sensor Validation**: Identify when sensor relationships deviate from normal patterns
- **Defect Prevention**: Monitor critical sensor combinations with high predictive power
- **Process Optimization**: Understand sensor interdependencies for better control strategies
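The first application, using correlation changes as early warning indicators, could be prototyped as below: compute a rolling correlation, take its median over a period assumed to be known-good as the baseline, and flag samples that drift more than a tolerance away from it. The window, baseline length, and 0.3 tolerance are illustrative values rather than tuned thresholds, and `correlation_drift_alarm` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def correlation_drift_alarm(a: pd.Series, b: pd.Series, window: int,
                            baseline_end: int, tolerance: float) -> pd.Series:
    """True where rolling corr(a, b) departs from its baseline by more than tolerance."""
    rolling_r = a.rolling(window).corr(b)
    # Baseline: median rolling correlation over the assumed known-good period (NaNs skipped)
    baseline = rolling_r.iloc[:baseline_end].median()
    # NaN comparisons evaluate to False, so warm-up samples never alarm
    return (rolling_r - baseline).abs() > tolerance

# Hypothetical fault: after sample 600 the temperature stops tracking speed
rng = np.random.default_rng(5)
n = 1000
speed = pd.Series(rng.standard_normal(n))
coupling = np.where(np.arange(n) < 600, 1.0, 0.0)
temperature = coupling * speed + 0.5 * pd.Series(rng.standard_normal(n))

alarm = correlation_drift_alarm(speed, temperature, window=100,
                                baseline_end=500, tolerance=0.3)
first_alarm = alarm.idxmax() if alarm.any() else None
print(first_alarm)
```

In this synthetic run the alarm fires shortly after sample 600, once enough decoupled samples have entered the 100-sample window.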

The analysis results and key plots were exported to `../data/correlation_analysis` for reuse in downstream monitoring work.