# EDA: Polycarbonate Prices

This notebook performs exploratory data analysis on polycarbonate (PC) prices from Asian and European suppliers.

## Objectives
1. **Understand temporal patterns**: trends, seasonality, cyclicality
2. **Compare regional differences**: Asia vs Europe pricing dynamics
3. **Analyze supplier variations**: price dispersion and best-price strategies
4. **Identify anomalies**: outliers, structural breaks, unusual patterns
5. **Prepare for modeling**: stationarity checks, transformation needs

In [None]:
# Import libraries
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import seaborn as sns

warnings.filterwarnings("ignore")

# Set style
sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (12, 6)

# Configure paths
from config.paths import RAW_DATA_DIR  # noqa: E402

## 1. Data Loading

In [None]:
# Load PC price data
pc_price_dir = RAW_DATA_DIR / "pc_price"

df_pc_asia = pd.read_csv(pc_price_dir / "pc_price_asia.csv")
df_pc_eu = pd.read_csv(pc_price_dir / "pc_price_eu.csv")

# Convert dates to datetime
df_pc_asia["date"] = pd.to_datetime(df_pc_asia["date"])
df_pc_eu["date"] = pd.to_datetime(df_pc_eu["date"])

print("Asian PC Prices:")
print(f"  Shape: {df_pc_asia.shape}")
date_min = df_pc_asia["date"].dt.date.min()
date_max = df_pc_asia["date"].dt.date.max()
print(f"  Date range: {date_min} to {date_max}")
print(f"  Suppliers: {df_pc_asia.shape[1] - 1}")

print("\nEuropean PC Prices:")
print(f"  Shape: {df_pc_eu.shape}")
eu_date_min = df_pc_eu["date"].dt.date.min()
eu_date_max = df_pc_eu["date"].dt.date.max()
print(f"  Date range: {eu_date_min} to {eu_date_max}")
print(f"  Suppliers: {df_pc_eu.shape[1] - 1}")

In [None]:
# Preview data
print("Asia - First few rows:")
display(df_pc_asia.head())

print("\nEurope - First few rows:")
display(df_pc_eu.head())

## 2. Data Quality Assessment

In [None]:
# Check for missing values
def analyze_missing_data(df, region_name):
    """Analyze and display missing data in the dataframe.

    Args:
        df: pandas DataFrame to analyze
        region_name: string name of the region for display purposes

    Returns:
        DataFrame with missing value counts and percentages
    """
    print(f"\n{'=' * 60}")
    print(f"{region_name} - Missing Data Analysis")
    print(f"{'=' * 60}")

    missing = df.isnull().sum()
    missing_pct = (missing / len(df)) * 100

    missing_df = pd.DataFrame(
        {"Missing_Count": missing, "Missing_Percentage": missing_pct}
    )
    missing_df = missing_df[missing_df["Missing_Count"] > 0].sort_values(
        "Missing_Count", ascending=False
    )

    if len(missing_df) > 0:
        print("\nColumns with missing values:")
        display(missing_df)
    else:
        print("\n✓ No missing values found!")

    return missing_df


missing_asia = analyze_missing_data(df_pc_asia, "ASIA")
missing_eu = analyze_missing_data(df_pc_eu, "EUROPE")
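
Missing values are only one aspect of data quality. A minimal sketch of a date-column check is given here, assuming the observations are monthly (this frequency is an assumption, as are the helper name `check_date_index` and the demo frame). It reports duplicated months, whether the dates are sorted, and which calendar months are absent.

```python
import pandas as pd


def check_date_index(df: pd.DataFrame, date_col: str = "date") -> dict:
    """Report duplicated months, sort order, and missing months (hypothetical helper)."""
    dates = pd.to_datetime(df[date_col])
    # Assumption: one observation per calendar month
    months = dates.dt.to_period("M")
    expected = pd.period_range(months.min(), months.max(), freq="M")
    missing = expected.difference(pd.PeriodIndex(months.unique()))
    return {
        "n_duplicates": int(months.duplicated().sum()),
        "is_sorted": bool(dates.is_monotonic_increasing),
        "missing_months": [str(p) for p in missing],
    }


# Illustrative data only: March is absent and April appears twice
demo = pd.DataFrame(
    {
        "date": ["2021-01-01", "2021-02-01", "2021-04-01", "2021-04-01"],
        "supplier_a": [2.1, 2.2, 2.4, 2.4],
    }
)
report = check_date_index(demo)
print(report)
```

In the notebook this could be called as `check_date_index(df_pc_asia)` and `check_date_index(df_pc_eu)`. Gaps matter later because `seasonal_decompose` with `period=12` treats consecutive rows as consecutive months.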

In [None]:
# Summary statistics
def summary_stats(df, region_name):
    """Calculate and display summary statistics for price data.

    Args:
        df: pandas DataFrame containing price data
        region_name: string name of the region for display purposes

    Returns:
        DataFrame with summary statistics including coefficient of variation
    """
    print(f"\n{'=' * 60}")
    print(f"{region_name} - Summary Statistics")
    print(f"{'=' * 60}")

    price_cols = [col for col in df.columns if col != "date"]
    stats = df[price_cols].describe().T
    stats["cv"] = stats["std"] / stats["mean"]  # Coefficient of variation

    display(
        stats.style.format("{:.2f}").background_gradient(
            cmap="YlOrRd", subset=["mean", "std"]
        )
    )

    return stats


stats_asia = summary_stats(df_pc_asia, "ASIA")
stats_eu = summary_stats(df_pc_eu, "EUROPE")
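
The summary statistics above do not flag individual outliers. One common rule of thumb is Tukey's IQR fence; the sketch below uses it on a made-up series. The helper name `iqr_outliers`, the multiplier `k=1.5`, and the demo values are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    mask = (s < q1 - k * iqr) | (s > q3 + k * iqr)
    return s[mask]


# Illustrative prices with one obvious spike at position 6
prices = pd.Series([2.0, 2.1, 2.2, 2.1, 2.0, 2.3, 5.0, 2.2])
outliers = iqr_outliers(prices)
print(outliers)
```

On a trending price series the fence applied to levels tends to flag whole regimes rather than isolated points, so applying it to period-over-period changes may be more informative.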

## 3. Temporal Trends Analysis

### 3.1 All Suppliers Over Time

In [None]:
# Plot all suppliers - Europe
fig_eu = go.Figure()

for col in df_pc_eu.columns:
    if col != "date":
        fig_eu.add_trace(
            go.Scatter(
                x=df_pc_eu["date"],
                y=df_pc_eu[col],
                mode="lines",
                name=col,
                line=dict(width=2),
            )
        )

fig_eu.update_layout(
    title="European PC Prices - All Suppliers",
    xaxis_title="Date",
    yaxis_title="Price (EUR/kg)",
    hovermode="x unified",
    height=500,
    template="plotly_white",
)

fig_eu.show()

In [None]:
# Plot all suppliers - Asia
fig_asia = go.Figure()

for col in df_pc_asia.columns:
    if col != "date":
        fig_asia.add_trace(
            go.Scatter(
                x=df_pc_asia["date"],
                y=df_pc_asia[col],
                mode="lines",
                name=col,
                line=dict(width=2),
            )
        )

fig_asia.update_layout(
    title="Asian PC Prices - All Suppliers",
    xaxis_title="Date",
    yaxis_title="Price (USD/kg)",
    hovermode="x unified",
    height=500,
    template="plotly_white",
)

fig_asia.show()

### 3.2 Best Price Analysis (Minimum Across Suppliers)

In [None]:
# Calculate best prices (minimum across suppliers)
price_cols_eu = [col for col in df_pc_eu.columns if col != "date"]
price_cols_asia = [col for col in df_pc_asia.columns if col != "date"]

df_pc_eu["best_price"] = df_pc_eu[price_cols_eu].min(axis=1)
df_pc_asia["best_price"] = df_pc_asia[price_cols_asia].min(axis=1)

# Also calculate mean and max for comparison
df_pc_eu["mean_price"] = df_pc_eu[price_cols_eu].mean(axis=1)
df_pc_eu["max_price"] = df_pc_eu[price_cols_eu].max(axis=1)

df_pc_asia["mean_price"] = df_pc_asia[price_cols_asia].mean(axis=1)
df_pc_asia["max_price"] = df_pc_asia[price_cols_asia].max(axis=1)

print("✓ Best price metrics calculated")
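
Beyond the minimum, mean, and maximum, the supplier-variation objective also calls for a dispersion measure and a view of who actually offers the best price. A minimal sketch on a made-up frame follows; the supplier column names and values are hypothetical.

```python
import pandas as pd

# Illustrative data only: three hypothetical suppliers over three months
demo = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
        "supplier_a": [2.0, 2.2, 2.5],
        "supplier_b": [2.1, 2.1, 2.4],
        "supplier_c": [2.3, 2.0, 2.6],
    }
)
supplier_cols = [c for c in demo.columns if c != "date"]

# Spread between the highest and lowest quote on each date, absolute and in %
row_min = demo[supplier_cols].min(axis=1)
spread = demo[supplier_cols].max(axis=1) - row_min
spread_pct = spread / row_min * 100

# Supplier holding the minimum on each date (ties resolve to the first column)
best_supplier = demo[supplier_cols].idxmin(axis=1)
best_share = best_supplier.value_counts(normalize=True)

print(spread_pct.round(1))
print(best_share)
```

With the notebook's data, `price_cols_eu` and `price_cols_asia` would take the place of `supplier_cols`. Rows where every supplier is missing would need to be dropped before calling `idxmin`.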

In [None]:
# Plot best price with the min-max supplier range as a shaded band
fig = make_subplots(
    rows=2,
    cols=1,
    subplot_titles=("European PC Prices", "Asian PC Prices"),
    vertical_spacing=0.12,
)

# Europe
fig.add_trace(
    go.Scatter(
        x=df_pc_eu["date"],
        y=df_pc_eu["max_price"],
        mode="lines",
        name="Max Price (EU)",
        line=dict(width=0),
        showlegend=False,
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=df_pc_eu["date"],
        y=df_pc_eu["best_price"],
        mode="lines",
        name="Price Range",
        fill="tonexty",
        fillcolor="rgba(231, 76, 60, 0.2)",
        line=dict(width=0),
        showlegend=True,
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=df_pc_eu["date"],
        y=df_pc_eu["mean_price"],
        mode="lines",
        name="Mean Price (EU)",
        line=dict(color="blue", width=2, dash="dot"),
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=df_pc_eu["date"],
        y=df_pc_eu["best_price"],
        mode="lines+markers",
        name="Best Price (EU)",
        line=dict(color="green", width=3),
        marker=dict(size=4),
    ),
    row=1,
    col=1,
)

# Asia
fig.add_trace(
    go.Scatter(
        x=df_pc_asia["date"],
        y=df_pc_asia["max_price"],
        mode="lines",
        name="Max Price (Asia)",
        line=dict(width=0),
        showlegend=False,
    ),
    row=2,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=df_pc_asia["date"],
        y=df_pc_asia["best_price"],
        mode="lines",
        name="Price Range",
        fill="tonexty",
        fillcolor="rgba(231, 76, 60, 0.2)",
        line=dict(width=0),
        showlegend=False,
    ),
    row=2,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=df_pc_asia["date"],
        y=df_pc_asia["mean_price"],
        mode="lines",
        name="Mean Price (Asia)",
        line=dict(color="blue", width=2, dash="dot"),
    ),
    row=2,
    col=1,
)

fig.add_trace(
    go.Scatter(
        x=df_pc_asia["date"],
        y=df_pc_asia["best_price"],
        mode="lines+markers",
        name="Best Price (Asia)",
        line=dict(color="green", width=3),
        marker=dict(size=4),
    ),
    row=2,
    col=1,
)

fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_yaxes(title_text="Price (EUR/kg)", row=1, col=1)
fig.update_yaxes(title_text="Price (USD/kg)", row=2, col=1)

fig.update_layout(
    height=800, hovermode="x unified", template="plotly_white", showlegend=True
)

fig.show()
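
Because the European series is quoted in EUR/kg and the Asian series in USD/kg, the two panels are not directly comparable in level. One simple way to compare their dynamics without a currency conversion is to rebase each series to 100 at its first observation. The sketch below shows the idea on made-up numbers; the helper name `rebase_to_100` is hypothetical.

```python
import pandas as pd


def rebase_to_100(series: pd.Series) -> pd.Series:
    """Scale a series so that its first non-missing value equals 100."""
    s = series.dropna()
    return s / s.iloc[0] * 100


# Illustrative data only
dates = pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"])
eu = pd.Series([2.0, 2.2, 2.4], index=dates)
asia = pd.Series([1.5, 1.8, 1.65], index=dates)

indexed = pd.DataFrame({"eu": rebase_to_100(eu), "asia": rebase_to_100(asia)})
print(indexed.round(1))
```

In the notebook the inputs could be `df_pc_eu.set_index("date")["best_price"]` and the Asian equivalent. Rebasing removes the unit difference but not exchange-rate effects, so it describes relative movement only.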

### 3.3 Month-over-Month Changes

In [None]:
# Calculate percentage changes
df_pc_eu["pct_change"] = df_pc_eu["best_price"].pct_change() * 100
df_pc_asia["pct_change"] = df_pc_asia["best_price"].pct_change() * 100

# Plot percentage changes
fig = make_subplots(
    rows=2,
    cols=1,
    subplot_titles=("Europe - Monthly % Change", "Asia - Monthly % Change"),
    vertical_spacing=0.12,
)

fig.add_trace(
    go.Bar(
        x=df_pc_eu["date"],
        y=df_pc_eu["pct_change"],
        name="EU % Change",
        marker_color=np.where(df_pc_eu["pct_change"] >= 0, "green", "red"),
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Bar(
        x=df_pc_asia["date"],
        y=df_pc_asia["pct_change"],
        name="Asia % Change",
        marker_color=np.where(df_pc_asia["pct_change"] >= 0, "green", "red"),
    ),
    row=2,
    col=1,
)

fig.update_xaxes(title_text="Date", row=2, col=1)
fig.update_yaxes(title_text="% Change", row=1, col=1)
fig.update_yaxes(title_text="% Change", row=2, col=1)

fig.update_layout(height=700, showlegend=False, template="plotly_white")

fig.show()

# Summary statistics of changes
print("Europe - % Change Statistics:")
print(df_pc_eu["pct_change"].describe())
eu_max_date = df_pc_eu.loc[df_pc_eu["pct_change"].idxmax(), "date"]
print(f"\nLargest increase: {df_pc_eu['pct_change'].max():.2f}% on {eu_max_date}")
eu_min_date = df_pc_eu.loc[df_pc_eu["pct_change"].idxmin(), "date"]
print(f"Largest decrease: {df_pc_eu['pct_change'].min():.2f}% on {eu_min_date}")

print("\n" + "=" * 60)
print("Asia - % Change Statistics:")
print(df_pc_asia["pct_change"].describe())
asia_max_date = df_pc_asia.loc[df_pc_asia["pct_change"].idxmax(), "date"]
print(f"\nLargest increase: {df_pc_asia['pct_change'].max():.2f}% on {asia_max_date}")
asia_min_date = df_pc_asia.loc[df_pc_asia["pct_change"].idxmin(), "date"]
print(f"Largest decrease: {df_pc_asia['pct_change'].min():.2f}% on {asia_min_date}")
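
`pct_change()` with its default arguments compares each observation with the previous one. Assuming the data is monthly (an assumption suggested by the plot titles above), a year-over-year change can be obtained with a 12-period lag. The series below is made up for illustration.

```python
import pandas as pd

# Illustrative data only: flat for a year, then one rise and one fall
dates = pd.date_range("2020-01-01", periods=14, freq="MS")
best_price = pd.Series([2.0] * 12 + [2.2, 1.8], index=dates)

# Change versus the previous month
mom = best_price.pct_change() * 100

# Change versus the same month one year earlier (first 12 values are NaN)
yoy = best_price.pct_change(periods=12) * 100

print(yoy.dropna().round(2))
```

Year-over-year changes remove any fixed seasonal pattern by construction, which makes them a useful complement to the period-over-period view.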

## 4. Distribution Analysis

In [None]:
# Distribution of best prices
fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=("Europe Price Distribution", "Asia Price Distribution"),
    specs=[[{"type": "histogram"}, {"type": "histogram"}]],
)

fig.add_trace(
    go.Histogram(
        x=df_pc_eu["best_price"],
        name="EU",
        nbinsx=30,
        marker_color="steelblue",
        opacity=0.7,
    ),
    row=1,
    col=1,
)

fig.add_trace(
    go.Histogram(
        x=df_pc_asia["best_price"],
        name="Asia",
        nbinsx=30,
        marker_color="coral",
        opacity=0.7,
    ),
    row=1,
    col=2,
)

fig.update_xaxes(title_text="Price (EUR/kg)", row=1, col=1)
fig.update_xaxes(title_text="Price (USD/kg)", row=1, col=2)
fig.update_yaxes(title_text="Frequency", row=1, col=1)

fig.update_layout(height=400, showlegend=False, template="plotly_white")

fig.show()
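
The histograms give a visual impression of skew. To decide whether a log transformation is worth considering, the sample skewness before and after the transform can be compared. This is a minimal sketch on made-up, right-skewed values.

```python
import numpy as np
import pandas as pd

# Illustrative data only: a right-skewed price sample
prices = pd.Series([1.0, 1.1, 1.2, 1.3, 1.5, 2.0, 4.0])

skew_raw = prices.skew()
skew_log = np.log(prices).skew()  # log requires strictly positive prices

print(f"Skewness (raw): {skew_raw:.2f}")
print(f"Skewness (log): {skew_log:.2f}")
```

In the notebook the same two lines could be applied to `df_pc_eu["best_price"]` and `df_pc_asia["best_price"]`. A markedly smaller skewness after the transform would be one argument for modeling log prices.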

In [None]:
# Box plots for supplier comparison
fig = make_subplots(
    rows=2,
    cols=1,
    subplot_titles=("European Suppliers", "Asian Suppliers"),
    vertical_spacing=0.15,
)

# Europe
for col in price_cols_eu:
    fig.add_trace(go.Box(y=df_pc_eu[col], name=col, boxmean="sd"), row=1, col=1)

# Asia
for col in price_cols_asia:
    fig.add_trace(go.Box(y=df_pc_asia[col], name=col, boxmean="sd"), row=2, col=1)

fig.update_yaxes(title_text="Price (EUR/kg)", row=1, col=1)
fig.update_yaxes(title_text="Price (USD/kg)", row=2, col=1)

fig.update_layout(height=800, showlegend=False, template="plotly_white")

fig.show()

## 5. Supplier Correlation Analysis

In [None]:
# Correlation heatmap - Europe
corr_eu = df_pc_eu[price_cols_eu].corr()

fig_corr_eu = go.Figure(
    data=go.Heatmap(
        z=corr_eu.values,
        x=corr_eu.columns,
        y=corr_eu.columns,
        colorscale="RdBu",
        zmid=0,
        text=corr_eu.values.round(2),
        texttemplate="%{text}",
        textfont={"size": 10},
        colorbar=dict(title="Correlation"),
    )
)

fig_corr_eu.update_layout(
    title="European Suppliers - Price Correlation Matrix",
    height=600,
    template="plotly_white",
)

fig_corr_eu.show()

In [None]:
# Correlation heatmap - Asia
corr_asia = df_pc_asia[price_cols_asia].corr()

fig_corr_asia = go.Figure(
    data=go.Heatmap(
        z=corr_asia.values,
        x=corr_asia.columns,
        y=corr_asia.columns,
        colorscale="RdBu",
        zmid=0,
        text=corr_asia.values.round(2),
        texttemplate="%{text}",
        textfont={"size": 10},
        colorbar=dict(title="Correlation"),
    )
)

fig_corr_asia.update_layout(
    title="Asian Suppliers - Price Correlation Matrix",
    height=600,
    template="plotly_white",
)

fig_corr_asia.show()
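
Correlations computed on price levels can be inflated when the series share a common trend. A common cross-check is to correlate percentage changes instead. The sketch below illustrates the effect on synthetic data: two hypothetical suppliers follow the same trend with independent noise.

```python
import numpy as np
import pandas as pd

# Synthetic data only: shared linear trend plus independent noise per supplier
rng = np.random.default_rng(0)
trend = np.linspace(2.0, 3.0, 60)
levels = pd.DataFrame(
    {
        "supplier_a": trend + rng.normal(0, 0.02, 60),
        "supplier_b": trend + rng.normal(0, 0.02, 60),
    }
)

# Correlation of levels is dominated by the shared trend
corr_levels = levels.corr()

# Correlation of period-over-period changes reflects co-movement only
corr_returns = levels.pct_change().dropna().corr()

print(corr_levels.round(2))
print(corr_returns.round(2))
```

Applied to the notebook, `df_pc_eu[price_cols_eu].pct_change().corr()` would give the change-based matrix for Europe, and the same call on the Asian frame would cover Asia.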

## 6. Seasonality & Decomposition Analysis

In [None]:
# Monthly patterns
df_pc_eu["month"] = df_pc_eu["date"].dt.month
df_pc_asia["month"] = df_pc_asia["date"].dt.month

# Box plot by month
fig = make_subplots(
    rows=1,
    cols=2,
    subplot_titles=("Europe - Monthly Patterns", "Asia - Monthly Patterns"),
)

fig.add_trace(
    go.Box(x=df_pc_eu["month"], y=df_pc_eu["best_price"], name="EU"), row=1, col=1
)

fig.add_trace(
    go.Box(x=df_pc_asia["month"], y=df_pc_asia["best_price"], name="Asia"), row=1, col=2
)

fig.update_xaxes(title_text="Month", row=1, col=1)
fig.update_xaxes(title_text="Month", row=1, col=2)
fig.update_yaxes(title_text="Price (EUR/kg)", row=1, col=1)
fig.update_yaxes(title_text="Price (USD/kg)", row=1, col=2)

fig.update_layout(height=400, showlegend=False, template="plotly_white")

fig.show()

In [None]:
# Time series decomposition
from statsmodels.tsa.seasonal import seasonal_decompose

# Europe decomposition
df_pc_eu_ts = df_pc_eu.set_index("date")[["best_price"]].dropna()
decomposition_eu = seasonal_decompose(
    df_pc_eu_ts["best_price"], model="additive", period=12
)

fig = make_subplots(
    rows=4,
    cols=1,
    subplot_titles=("Original", "Trend", "Seasonal", "Residual"),
    vertical_spacing=0.08,
    row_heights=[0.25, 0.25, 0.25, 0.25],
)

fig.add_trace(
    go.Scatter(
        x=df_pc_eu_ts.index,
        y=decomposition_eu.observed,
        name="Original",
        line=dict(color="blue"),
    ),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(
        x=df_pc_eu_ts.index,
        y=decomposition_eu.trend,
        name="Trend",
        line=dict(color="orange"),
    ),
    row=2,
    col=1,
)
fig.add_trace(
    go.Scatter(
        x=df_pc_eu_ts.index,
        y=decomposition_eu.seasonal,
        name="Seasonal",
        line=dict(color="green"),
    ),
    row=3,
    col=1,
)
fig.add_trace(
    go.Scatter(
        x=df_pc_eu_ts.index,
        y=decomposition_eu.resid,
        name="Residual",
        mode="markers",
        marker=dict(color="red", size=3),
    ),
    row=4,
    col=1,
)

fig.update_layout(
    height=900,
    title_text="Europe - Time Series Decomposition",
    showlegend=False,
    template="plotly_white",
)

fig.show()

In [None]:
# Asia decomposition
df_pc_asia_ts = df_pc_asia.set_index("date")[["best_price"]].dropna()
decomposition_asia = seasonal_decompose(
    df_pc_asia_ts["best_price"], model="additive", period=12
)

fig = make_subplots(
    rows=4,
    cols=1,
    subplot_titles=("Original", "Trend", "Seasonal", "Residual"),
    vertical_spacing=0.08,
    row_heights=[0.25, 0.25, 0.25, 0.25],
)

fig.add_trace(
    go.Scatter(
        x=df_pc_asia_ts.index,
        y=decomposition_asia.observed,
        name="Original",
        line=dict(color="blue"),
    ),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(
        x=df_pc_asia_ts.index,
        y=decomposition_asia.trend,
        name="Trend",
        line=dict(color="orange"),
    ),
    row=2,
    col=1,
)
fig.add_trace(
    go.Scatter(
        x=df_pc_asia_ts.index,
        y=decomposition_asia.seasonal,
        name="Seasonal",
        line=dict(color="green"),
    ),
    row=3,
    col=1,
)
fig.add_trace(
    go.Scatter(
        x=df_pc_asia_ts.index,
        y=decomposition_asia.resid,
        name="Residual",
        mode="markers",
        marker=dict(color="red", size=3),
    ),
    row=4,
    col=1,
)

fig.update_layout(
    height=900,
    title_text="Asia - Time Series Decomposition",
    showlegend=False,
    template="plotly_white",
)

fig.show()
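
The decomposition plots can be summarized with a single number. One widely used measure of seasonal strength is max(0, 1 - Var(R) / Var(S + R)), where S is the seasonal component and R the residual. The sketch below implements that formula on synthetic components; the helper name `seasonal_strength` is hypothetical.

```python
import numpy as np


def seasonal_strength(seasonal, resid) -> float:
    """Return max(0, 1 - Var(resid) / Var(seasonal + resid)), ignoring NaNs."""
    seasonal = np.asarray(seasonal, dtype=float)
    resid = np.asarray(resid, dtype=float)
    # Moving-average decompositions leave NaNs at both ends of the residual
    keep = ~np.isnan(seasonal) & ~np.isnan(resid)
    s, r = seasonal[keep], resid[keep]
    denom = np.var(s + r)
    if denom == 0:
        return 0.0
    return float(max(0.0, 1.0 - np.var(r) / denom))


# Synthetic components only: a clear 12-month cycle with small noise
rng = np.random.default_rng(0)
seasonal = np.tile(0.1 * np.sin(2 * np.pi * np.arange(12) / 12), 4)
resid = rng.normal(0, 0.005, seasonal.size)
resid[:6] = np.nan
resid[-6:] = np.nan

strong = seasonal_strength(seasonal, resid)
none = seasonal_strength(np.zeros_like(resid), resid)
print(f"Strong seasonality: {strong:.3f}, no seasonality: {none:.3f}")
```

With the objects defined above, the call would be `seasonal_strength(decomposition_eu.seasonal, decomposition_eu.resid)`, and likewise for `decomposition_asia`. Values near 0 suggest little seasonality; values near 1 suggest a strong seasonal pattern.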

## 7. Stationarity Tests

In [None]:
# Stationarity tests: Augmented Dickey-Fuller (ADF) and KPSS
from statsmodels.tsa.stattools import adfuller, kpss


def test_stationarity(timeseries, title):
    """Perform ADF and KPSS tests for stationarity."""
    print(f"\n{'=' * 60}")
    print(f"{title}")
    print(f"{'=' * 60}")

    # ADF Test
    adf_result = adfuller(timeseries.dropna())
    print("\n📊 Augmented Dickey-Fuller Test:")
    print(f"   ADF Statistic: {adf_result[0]:.4f}")
    print(f"   p-value: {adf_result[1]:.4f}")
    print("   Critical Values:")
    for key, value in adf_result[4].items():
        print(f"      {key}: {value:.4f}")

    if adf_result[1] <= 0.05:
        print("   ✓ Series is STATIONARY (reject H0)")
    else:
        print("   ✗ Series is NON-STATIONARY (fail to reject H0)")

    # KPSS Test (regression="ct": H0 is stationarity around a deterministic trend)
    # Note: statsmodels reports KPSS p-values only within [0.01, 0.10]
    kpss_result = kpss(timeseries.dropna(), regression="ct")
    print("\n📊 KPSS Test:")
    print(f"   KPSS Statistic: {kpss_result[0]:.4f}")
    print(f"   p-value: {kpss_result[1]:.4f}")
    print("   Critical Values:")
    for key, value in kpss_result[3].items():
        print(f"      {key}: {value:.4f}")

    if kpss_result[1] >= 0.05:
        print("   ✓ Series is TREND-STATIONARY (fail to reject H0)")
    else:
        print("   ✗ Series is NOT TREND-STATIONARY (reject H0)")


test_stationarity(df_pc_eu_ts["best_price"], "EUROPE - Best Price Stationarity")
test_stationarity(df_pc_asia_ts["best_price"], "ASIA - Best Price Stationarity")
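
ADF and KPSS have opposite null hypotheses, so their results are usually read together. The sketch below encodes the four conventional readings as a small helper, using the same 0.05 threshold as the function above. The helper name `interpret_stationarity` and the label strings are hypothetical, and the labels are rules of thumb rather than formal conclusions.

```python
def interpret_stationarity(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values into one of four conventional readings."""
    adf_stationary = adf_pvalue <= alpha    # ADF H0 (unit root) rejected
    kpss_stationary = kpss_pvalue >= alpha  # KPSS H0 (stationarity) not rejected

    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    if kpss_stationary:
        # Only KPSS points to stationarity
        return "trend-stationary (consider detrending)"
    # Only ADF points to stationarity
    return "difference-stationary (consider differencing)"


# Illustrative p-values only
for adf_p, kpss_p in [(0.01, 0.10), (0.40, 0.01), (0.40, 0.10), (0.01, 0.01)]:
    print(f"ADF p={adf_p:.2f}, KPSS p={kpss_p:.2f} -> "
          f"{interpret_stationarity(adf_p, kpss_p)}")
```

The p-values printed by `test_stationarity` can be passed in by hand, or the function could be extended to return them.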

## 8. Key Findings & Summary

### Summary Statistics

This section will be populated after running the analysis. Key points to look for:

**Data Quality:**
- Missing value patterns
- Data consistency across suppliers
- Outlier detection

**Temporal Patterns:**
- Overall trend direction (increasing/decreasing)
- Volatility levels
- Structural breaks or regime changes

**Regional Differences:**
- Europe vs Asia price levels
- Synchronization of price movements
- Regional-specific patterns

**Supplier Dynamics:**
- Price dispersion (best vs worst)
- Supplier correlations (market integration)
- Consistent vs volatile suppliers

**Seasonality:**
- Monthly/quarterly patterns
- Predictable cycles
- Seasonal strength

**Modeling Implications:**
- Stationarity (need for differencing?)
- Transformation needs (log, etc.)
- Forecast horizon feasibility
- Feature engineering opportunities

---

### Next Steps

1. **Supply Chain EDA**: Analyze phenol/acetone shutdowns and capacity losses
2. **Energy Cost EDA**: Explore electricity price relationships
3. **Demand EDA**: Investigate automotive industry trends
4. **Cross-Dataset Analysis**: Identify lead-lag relationships between datasets
5. **Feature Engineering**: Create predictive features from all data sources
6. **Modeling**: Develop forecasting models for 3-, 6-, and 9-month horizons