# OMIE Silver Layer Data Processing for Microsoft Fabric

This notebook transforms Bronze layer OMIE (Iberian electricity market operator) data into clean, enriched Silver layer datasets following the Medallion architecture.

## Silver Layer Objectives
- **Clean**: Remove duplicates, handle nulls, fix anomalies
- **Standardize**: Consistent formats, units, and schemas  
- **Enrich**: Add calculated fields and classifications
- **Validate**: Apply business rules and quality checks
- **Integrate**: Combine data sources into unified models
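
The Clean, Standardize and Validate objectives can be illustrated with a small pandas sketch. It assumes a Bronze price table that, after column-name normalization, has `date`, `hour` and `price_eur_mwh` columns. The function name `clean_bronze_prices`, the column names and the default price bounds are hypothetical placeholders, not the notebook's actual schema or OMIE's official market limits.

```python
import pandas as pd


def clean_bronze_prices(df: pd.DataFrame,
                        key_cols=("date", "hour"),
                        price_col="price_eur_mwh",
                        min_price=-500.0,   # illustrative bound (assumption)
                        max_price=4000.0    # illustrative bound (assumption)
                        ) -> pd.DataFrame:
    """Hypothetical Silver cleaning step; names and bounds are assumptions."""
    out = df.copy()
    # Standardize: trimmed, lower-case, snake_case column names
    out.columns = [str(c).strip().lower().replace(" ", "_") for c in out.columns]
    # Clean: rows without a key or a price are unusable downstream
    out = out.dropna(subset=[*key_cols, price_col])
    # Clean: keep one record per key (the last, assuming later rows are revisions)
    out = out.drop_duplicates(subset=list(key_cols), keep="last")
    # Validate: flag (rather than drop) prices outside the illustrative bounds
    out["price_out_of_range"] = ~out[price_col].between(min_price, max_price)
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "Date": ["2024-01-01"] * 4,
    "Hour": [1, 1, 2, 3],
    "Price EUR MWh": [65.0, 65.0, None, 5000.0],
})
clean = clean_bronze_prices(raw)
print(clean)  # hours 1 and 3 remain; hour 3 is flagged as out of range
```

Flagging out-of-range prices instead of deleting them keeps anomalies visible for later quality checks and for auditing.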

## Data Flow
**Bronze** → **Silver** → Gold
- Bronze: Raw OMIE files organized by year (2023, 2024, 2025)
- Silver: Cleaned and enriched datasets
- Gold: Aggregated tables for Power BI
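
A minimal sketch of walking the year-partitioned Bronze layer, assuming a hypothetical `<root>/bronze/omie/<year>/` folder structure. `list_bronze_files`, `silver_path` and the folder names are illustrative only, not the lakehouse's actual paths.

```python
import tempfile
from pathlib import Path

# Years present in the Bronze layer, per the data-flow description
BRONZE_YEARS = (2023, 2024, 2025)


def list_bronze_files(root, years=BRONZE_YEARS, pattern="*"):
    """Return {year: sorted files} for each hypothetical <root>/bronze/omie/<year>/ folder."""
    found = {}
    for year in years:
        year_dir = Path(root) / "bronze" / "omie" / str(year)
        if year_dir.is_dir():  # skip years that have not landed yet
            found[year] = sorted(p for p in year_dir.glob(pattern) if p.is_file())
    return found


def silver_path(root, table):
    """Hypothetical destination for a cleaned Silver table."""
    return Path(root) / "silver" / "omie" / table


# Demo on a throwaway directory containing a single year folder
with tempfile.TemporaryDirectory() as tmp:
    year_dir = Path(tmp) / "bronze" / "omie" / "2023"
    year_dir.mkdir(parents=True)
    for name in ("b.csv", "a.csv"):
        (year_dir / name).write_text("placeholder")
    names = {y: [p.name for p in files] for y, files in list_bronze_files(tmp).items()}

print(names)  # {2023: ['a.csv', 'b.csv']}
```

Missing year folders are skipped rather than treated as errors, so the same code works while a year is only partially loaded.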

## Key Transformations
1. **Data Quality**: Anomaly detection, null handling
2. **Energy Classifications**: Renewable vs conventional technology mapping
3. **Temporal Features**: Hour/day/season enrichment
4. **Carbon Metrics**: CO2 emissions calculation
5. **Market Indicators**: Price volatility, renewable penetration
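
Steps 2 to 5 can be sketched in pandas under several assumptions: a long-format generation table with `timestamp`, `technology` and `energy_mwh` columns, a hypothetical set of technology labels, and placeholder emission factors that must be replaced with values from an authoritative source. Meteorological Northern Hemisphere seasons are just one possible season definition, and all function names here are illustrative.

```python
import pandas as pd

# Hypothetical technology labels; map the real Bronze labels onto these
RENEWABLE_TECH = {"wind", "solar_pv", "solar_thermal", "hydro"}

# PLACEHOLDER emission factors in tCO2/MWh; replace with values from your source
EMISSION_FACTORS = {"combined_cycle": 0.37, "coal": 0.95}

# Meteorological seasons for the Northern Hemisphere, keyed by month number
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}


def enrich_generation(df: pd.DataFrame) -> pd.DataFrame:
    """Add temporal, classification and carbon columns (assumed schema)."""
    out = df.copy()
    ts = pd.to_datetime(out["timestamp"])
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek  # Monday = 0
    out["season"] = ts.dt.month.map(SEASON_BY_MONTH)
    out["is_renewable"] = out["technology"].isin(RENEWABLE_TECH)
    # Technologies without a factor are counted as zero-emission here
    factor = out["technology"].map(EMISSION_FACTORS).fillna(0.0)
    out["co2_t"] = out["energy_mwh"] * factor
    return out


def renewable_penetration(enriched: pd.DataFrame) -> pd.Series:
    """Share of energy per timestamp that comes from renewable technologies."""
    total = enriched.groupby("timestamp")["energy_mwh"].sum()
    renewable = (enriched[enriched["is_renewable"]]
                 .groupby("timestamp")["energy_mwh"].sum())
    return renewable.reindex(total.index, fill_value=0.0) / total


def rolling_price_volatility(prices: pd.Series, window: int = 24) -> pd.Series:
    """Rolling sample standard deviation over `window` consecutive prices."""
    return prices.rolling(window, min_periods=window).std()


gen = pd.DataFrame({
    "timestamp": ["2024-07-01 10:00"] * 3 + ["2024-07-01 11:00"] * 2,
    "technology": ["wind", "solar_pv", "combined_cycle", "wind", "coal"],
    "energy_mwh": [100.0, 300.0, 100.0, 50.0, 150.0],
})
enriched = enrich_generation(gen)
share = renewable_penetration(enriched)  # 10:00 -> 0.8, 11:00 -> 0.25
vol = rolling_price_volatility(pd.Series([50.0, 60.0, 55.0, 70.0]), window=3)
```

Renewable penetration is computed here as renewable energy divided by total energy per timestamp, and volatility as a rolling sample standard deviation. Both are common definitions, but the Gold layer may use different ones.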

In [None]:
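# Install required packages if they are not already available
import sys
import subprocess
import importlib
import importlib.util  # `import importlib` alone does not guarantee importlib.util is loaded

# pip package names; for these packages they match the import names
reqs = ['pandas', 'numpy', 'pyarrow', 'matplotlib', 'seaborn']
missing = [p for p in reqs if importlib.util.find_spec(p) is None]
if missing:
    print('Installing missing packages:', missing)
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', *missing])
    importlib.invalidate_caches()  # let the running kernel find newly installed packages
    print('Installed packages successfully')
else:
    print('All required packages are already installed')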

In [None]:
# Import required libraries
import sys
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
# Note: this hides every warning, including pandas deprecation notices
warnings.filterwarnings('ignore')

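# PySpark for the Fabric environment (Fabric notebooks already provide `spark`;
# getOrCreate() returns that existing session)
try:
    from pyspark.sql import SparkSession
    # Star imports shadow Python builtins such as sum, min, max, abs and round;
    # use `import builtins` (e.g. builtins.sum) if a later cell needs the originals
    from pyspark.sql.functions import *
    from pyspark.sql.types import *
    spark = SparkSession.builder.getOrCreate()
    print('✅ SparkSession available - using Spark')
    USE_SPARK = True
except Exception as e:
    # ImportError if PySpark is missing; other errors if no session can be started
    print(f'⚠️ PySpark not available ({type(e).__name__}) - using pandas for local processing')
    USE_SPARK = False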

# Configuration
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
plt.style.use('default')
sns.set_palette('husl')

print(f'🐍 Python version: {sys.version.split()[0]}')
print(f'🐼 Pandas version: {pd.__version__}')
print(f'📊 Processing mode: {"Spark" if USE_SPARK else "Pandas"}')
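
The `USE_SPARK` flag lets later cells choose a processing engine. One way to use it is a small reader that dispatches to Spark or pandas. `read_bronze_table` is a hypothetical helper, and it assumes delimited text files with a header row and a `;` separator, which should be checked against the real Bronze files.

```python
import io

import pandas as pd


def read_bronze_table(path, use_spark=False, spark=None, sep=";"):
    """Hypothetical reader: Spark when requested, pandas otherwise.

    Assumes delimited text files with a header row.
    """
    if use_spark:
        if spark is None:
            raise ValueError("use_spark=True requires an active SparkSession")
        # Returns a pyspark.sql.DataFrame (lazy, distributed)
        return (spark.read
                .option("header", True)
                .option("sep", sep)
                .csv(str(path)))
    # Returns an in-memory pandas.DataFrame
    return pd.read_csv(path, sep=sep)


# Local (pandas) usage; in the notebook, when USE_SPARK is True, call
# read_bronze_table(path, use_spark=True, spark=spark)
sample = io.StringIO("date;hour;price\n2024-01-01;1;65.0\n")
df = read_bronze_table(sample)
print(df)
```

The two branches return different DataFrame types, so downstream code must either branch on the flag as well or convert explicitly, for example with the Spark DataFrame's `toPandas()` for small results.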