# Wind Farm Data Preprocessing

This notebook uses Sphinx AI to clean and prepare wind farm data for the wake steering optimization algorithm.

## Objectives:
1. Load raw wind data
2. Clean and validate data
3. Extract relevant features (wind speed, direction, turbine positions)
4. Export cleaned data for use in optimization scripts

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Set style for visualizations
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("Libraries imported successfully!")

## 1. Load Raw Data

Load your raw wind farm data here. This might include:
- Wind speed measurements
- Wind direction data
- Turbine performance data
- SCADA data from wind farm sensors

In [None]:
# Example: Load data from CSV file
# raw_data = pd.read_csv('data/raw_wind_data.csv')
# raw_data.head()

# Placeholder until the raw data file is available
# TODO: Replace with actual data loading
print("Ready to load raw data...")
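Below is a minimal loading sketch for when the raw file is available. It assumes a CSV with a `timestamp` column plus the `wind_speed`, `wind_direction` and `turbulence_intensity` columns used later in this notebook. `load_raw_wind_data` is a hypothetical helper, not part of any library.

```python
import io

import pandas as pd


def load_raw_wind_data(source, timestamp_col="timestamp"):
    """Read raw wind data from a CSV path or file-like object.

    Hypothetical helper: assumes a timestamp column that pandas can parse.
    """
    df = pd.read_csv(source, parse_dates=[timestamp_col])
    # Sort chronologically and drop repeated timestamps (e.g. from overlapping exports)
    df = df.sort_values(timestamp_col).drop_duplicates(subset=timestamp_col)
    return df.set_index(timestamp_col)


# Usage with a small in-memory sample; swap in 'data/raw_wind_data.csv' for real data
sample_csv = io.StringIO(
    "timestamp,wind_speed,wind_direction,turbulence_intensity\n"
    "2023-01-01 00:10,8.1,268,0.07\n"
    "2023-01-01 00:00,7.9,271,0.06\n"
    "2023-01-01 00:10,8.1,268,0.07\n"
)
raw_data = load_raw_wind_data(sample_csv)
print(raw_data)
```

Using a sorted timestamp index makes later steps such as resampling or gap detection straightforward.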

## 1.5 Determine Optimal Yaw Range (Sphinx AI Analysis)

**Ask Sphinx AI to analyze:**

Based on the wind data, determine the optimal yaw angle range for wake steering optimization:

### Factors to consider:
1. **Wind Speed Distribution**
   - Higher wind speeds → smaller yaw angles work better (each degree of yaw sacrifices more absolute power)
   - Lower wind speeds → larger yaw angles are acceptable (less power to sacrifice)

2. **Wind Direction Variability**
   - If wind direction is highly variable, a wider yaw range may be needed
   - Steady winds → a narrower range is sufficient

3. **Turbulence Intensity**
   - High turbulence → smaller yaw angles (the wake already disperses quickly)
   - Low turbulence → larger yaw angles are beneficial (the wake persists longer)

4. **Power Loss vs. Gain Trade-off**
   - A yawed turbine's power scales approximately with cos³(yaw_angle), so the fractional loss is 1 - cos³(yaw_angle)
   - At 10° yaw, you lose ~4.5% power
   - At 15° yaw, you lose ~10% power
   - At 20° yaw, you lose ~17% power

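The cos³ loss figures can be checked numerically. This sketch treats the cos³ exponent as a modelling assumption. Measured turbines are often fitted with a somewhat smaller exponent, which gives smaller losses.

```python
import numpy as np

# Power of a yawed turbine modelled as P(yaw) = P(0) * cos^3(yaw),
# so the fractional power loss is 1 - cos^3(yaw)
yaw_angles_deg = np.array([5, 10, 15, 20, 25])
power_loss_pct = (1 - np.cos(np.radians(yaw_angles_deg)) ** 3) * 100

for angle, loss in zip(yaw_angles_deg, power_loss_pct):
    print(f"{angle:>2}°: {loss:4.1f}% power loss")
```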
### Typical Ranges:
- **Conservative**: ±5° (11 options at 1° steps, 14,641 combos for 4 turbines)
- **Moderate**: ±10° (21 options, 194,481 combos for 4 turbines)
- **Aggressive**: ±15° (31 options, 923,521 combos for 4 turbines)

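The option and combination counts follow from a 1° grid. A ±N° range has 2N + 1 settings per turbine, and an exhaustive search over T turbines tests (2N + 1)^T combinations. The small sketch below reproduces the table; the function name is hypothetical.

```python
def count_yaw_combinations(yaw_range_deg, n_turbines, step_deg=1):
    """Options per turbine and total grid size for an exhaustive yaw search.

    Each turbine can take every angle from -yaw_range_deg to +yaw_range_deg
    in increments of step_deg (1° assumed, matching the option counts above).
    """
    n_options = 2 * (yaw_range_deg // step_deg) + 1
    return n_options, n_options ** n_turbines


for label, yaw_range in [("Conservative", 5), ("Moderate", 10), ("Aggressive", 15)]:
    n_options, n_combos = count_yaw_combinations(yaw_range, n_turbines=4)
    print(f"{label:<12} ±{yaw_range}°: {n_options} options, {n_combos:,} combos")
```

The count grows exponentially with the number of turbines. Adding a fifth turbine at ±10° raises the grid from 194,481 to 4,084,101 combinations.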
### Questions for Sphinx AI:
1. What is the average wind speed in the dataset?
2. What is the standard deviation of wind speed?
3. What percentage of time is the wind speed between 6 and 10 m/s (the range treated here as most useful for wake steering)?
4. What is the average turbulence intensity?
5. Based on these factors, what yaw range would maximize gains while keeping computation reasonable?
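Questions 1 to 4 reduce to a few pandas aggregations. The sketch below assumes a DataFrame with `wind_speed` (m/s) and `turbulence_intensity` (decimal) columns. `wind_summary` is a hypothetical helper, and the demo values are synthetic, not measured data. Question 5 remains a judgment call built on these numbers.

```python
import pandas as pd


def wind_summary(df, low=6.0, high=10.0):
    """Answer questions 1-4 for a DataFrame with 'wind_speed' (m/s)
    and 'turbulence_intensity' (decimal) columns."""
    speed = df["wind_speed"].dropna()
    return {
        "mean_speed": speed.mean(),
        "std_speed": speed.std(),
        # Share of samples with low <= speed <= high, in percent
        "pct_in_band": speed.between(low, high).mean() * 100,
        "mean_ti": df["turbulence_intensity"].mean(),
    }


# Usage with synthetic values; replace with the loaded wind data
demo = pd.DataFrame({
    "wind_speed": [5.0, 7.0, 8.0, 9.0, 12.0],
    "turbulence_intensity": [0.10, 0.08, 0.07, 0.06, 0.05],
})
stats = wind_summary(demo)
print(stats)
```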

In [None]:
# Example: Use helper functions to get yaw range recommendation
# (yaw_range_helper is a project module that is not shown in this notebook)
from yaw_range_helper import recommend_yaw_range, print_recommendation_summary, plot_yaw_range_analysis

# Compute the wind statistics the helper expects.
# Assumes `wind_data` has been loaded with 'wind_speed' and 'turbulence_intensity' columns.
wind_stats = wind_data['wind_speed'].agg(['mean', 'std'])
ti_stats = wind_data['turbulence_intensity'].agg(['mean'])

# Get recommendation based on wind statistics
recommendation = recommend_yaw_range(
    wind_speed_mean=wind_stats['mean'],
    wind_speed_std=wind_stats['std'],
    turbulence_intensity_mean=ti_stats['mean'],
    max_power_loss_pct=5.0,       # Maximum acceptable power loss (%)
    max_computation_minutes=5.0   # Maximum computation time (minutes)
)

# Print detailed recommendation
print_recommendation_summary(recommendation)

# Visualize trade-offs (create the output folder first so savefig does not fail)
Path('figures').mkdir(exist_ok=True)
fig = plot_yaw_range_analysis(recommendation['all_options'])
plt.savefig('figures/yaw_range_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

# Update config.py with recommended range
print("\n📝 UPDATE config.py with:")
print(f"YAW_ANGLE_MIN = -{recommendation['recommended_yaw_range']}")
print(f"YAW_ANGLE_MAX = {recommendation['recommended_yaw_range']}")


In [None]:
# Analyze wind data with Sphinx AI to determine optimal yaw range
# 
# Instructions for Sphinx AI:
# 1. Calculate wind speed statistics (mean, std, percentiles)
# 2. Analyze turbulence intensity distribution
# 3. Check wind direction variability
# 4. Recommend yaw range based on:
#    - Power loss tolerance (e.g., max 5% loss per turbine)
#    - Computational budget (prefer <100k combinations)
#    - Wind speed distribution (most frequent speeds)
#
# Example analysis code:

# Assuming you have wind data loaded as 'wind_data' with columns:
# - 'wind_speed' (m/s)
# - 'wind_direction' (degrees)
# - 'turbulence_intensity' (decimal)

# wind_speed_mean = wind_data['wind_speed'].mean()
# wind_speed_std = wind_data['wind_speed'].std()
# ti_mean = wind_data['turbulence_intensity'].mean()

# # Power loss calculation for different yaw angles
# yaw_angles = np.array([5, 10, 15, 20, 25])
# power_loss_pct = (1 - np.cos(np.radians(yaw_angles))**3) * 100

# print("Power Loss by Yaw Angle:")
# for angle, loss in zip(yaw_angles, power_loss_pct):
#     print(f"  {angle}°: {loss:.1f}% power loss")

# # Recommendation based on wind speed
# if wind_speed_mean > 10:
#     recommended_range = 5
#     print(f"\nRecommended: ±{recommended_range}° (high wind speeds)")
# elif wind_speed_mean > 8:
#     recommended_range = 8
#     print(f"\nRecommended: ±{recommended_range}° (moderate wind speeds)")
# else:
#     recommended_range = 10
#     print(f"\nRecommended: ±{recommended_range}° (lower wind speeds)")

# # Calculate number of combinations
# n_turbines = 4
# n_angles = 2 * recommended_range + 1
# n_combinations = n_angles ** n_turbines
# print(f"Total combinations to test: {n_combinations:,}")

print("Ready for Sphinx AI analysis...")
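The commented heuristic in this cell only looks at mean wind speed. The runnable sketch below also enforces the two budgets listed in the instructions, a maximum 5% power loss per turbine and fewer than 100k combinations. `heuristic_yaw_range` is a hypothetical name. Shrinking the range 1° at a time is an assumed strategy, not the `yaw_range_helper` implementation.

```python
import math


def heuristic_yaw_range(wind_speed_mean, n_turbines=4,
                        max_power_loss_pct=5.0, max_combinations=100_000):
    """Pick a symmetric yaw range (degrees, 1° steps) from mean wind speed,
    then shrink it until both budgets are met."""
    # Starting points copied from the commented thresholds in this cell
    if wind_speed_mean > 10:
        yaw_range = 5
    elif wind_speed_mean > 8:
        yaw_range = 8
    else:
        yaw_range = 10

    while yaw_range > 0:
        # cos^3 power-loss model evaluated at the largest allowed yaw angle
        loss_pct = (1 - math.cos(math.radians(yaw_range)) ** 3) * 100
        n_combinations = (2 * yaw_range + 1) ** n_turbines
        if loss_pct <= max_power_loss_pct and n_combinations <= max_combinations:
            break
        yaw_range -= 1  # tighten by one degree and re-check
    return yaw_range


for speed in (7.0, 9.0, 11.0):
    r = heuristic_yaw_range(speed)
    print(f"mean {speed} m/s -> ±{r}° ({(2 * r + 1) ** 4:,} combinations)")
```

With the default budget, the low-wind branch (±10°, 194,481 combinations) is trimmed to ±8° (83,521 combinations), because ±9° still gives 130,321.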

## 2. Data Exploration

Use Sphinx AI here to explore the data:
- Check for missing values
- Identify outliers
- Understand data distributions
- Visualize key variables

In [None]:
# Data exploration goes here
# Ask Sphinx AI to help with:
# - data.info()
# - data.describe()
# - Missing value analysis
# - Distribution plots

pass
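`data.info()` and `data.describe()` give an overview. A compact per-column report of missing values and outlier counts also helps decide what the cleaning step must handle. This sketch uses Tukey's 1.5 × IQR rule, one common outlier heuristic chosen here for illustration. `exploration_report` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def exploration_report(df):
    """Per-column missing-value counts and IQR-based outlier counts."""
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    # Tukey's rule: values more than 1.5 * IQR beyond the quartiles are flagged
    outliers = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return pd.DataFrame({
        "missing": df.isna().sum(),
        "outliers_iqr": outliers.sum(),
    })


# Usage with synthetic values, including one spike and two gaps
demo = pd.DataFrame({
    "wind_speed": [7.9, 8.0, 8.2, np.nan, 8.1, 45.0],
    "wind_direction": [265, 270, 268, 272, np.nan, 269],
})
report = exploration_report(demo)
print(report)
```

IQR flags are less meaningful for wind direction, which is circular (359° and 1° are neighbors), so treat those counts with care.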

## 3. Data Cleaning

Clean the data by:
- Removing or imputing missing values
- Filtering out invalid measurements
- Standardizing units
- Handling outliers

In [None]:
# Data cleaning steps
# Use Sphinx AI to help clean the data

pass
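Below is a minimal cleaning sketch covering the first three bullets. The 40 m/s upper limit is a hypothetical plausibility bound, not a value from this project; set it from the sensor specification. `clean_wind_data` is also a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Hypothetical upper bound for valid readings (m/s); adjust to the sensor spec
MAX_VALID_SPEED = 40.0


def clean_wind_data(df, max_speed=MAX_VALID_SPEED):
    """Drop incomplete or physically invalid rows and standardize direction."""
    out = df.dropna(subset=["wind_speed", "wind_direction"])
    # Remove negative speeds and implausibly high readings
    out = out[out["wind_speed"].between(0, max_speed)].copy()
    # Wrap directions into [0, 360) so that 360° and 0° map to the same value
    out["wind_direction"] = out["wind_direction"] % 360
    return out.reset_index(drop=True)


demo = pd.DataFrame({
    "wind_speed": [8.0, -1.0, np.nan, 55.0, 9.5],
    "wind_direction": [360.0, 270.0, 265.0, 268.0, -10.0],
})
cleaned = clean_wind_data(demo)
print(cleaned)
```

Dropping rows is the simplest option. If continuity matters for the optimization, short gaps in a time series could be interpolated instead.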

## 4. Feature Extraction

Extract the specific features needed for FLORIS simulation:
- Wind speed (m/s)
- Wind direction (degrees)
- Turbine positions (x, y coordinates)
- Any other relevant parameters

In [None]:
# Feature extraction
# Create cleaned dataset with only the features needed for optimization

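# Example structure: per-timestamp features go in one table; turbine
# positions are static, so keep the layout in a separate table
cleaned_data = {
    'wind_speed': [],       # m/s
    'wind_direction': [],   # degrees
}
turbine_positions = []      # [(x1, y1), (x2, y2), ...] in m

# Convert to DataFrames
# cleaned_df = pd.DataFrame(cleaned_data)
# layout_df = pd.DataFrame(turbine_positions, columns=['x', 'y'])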

pass
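The yaw-range analysis expects a `turbulence_intensity` column. If the SCADA export only provides the mean and standard deviation of wind speed for each averaging interval, TI can be derived as their ratio, σ_U / U. The sketch below assumes a hypothetical `wind_speed_std` column. The 3 m/s cut-off below which TI is masked is also an assumption, because the ratio becomes unstable as U approaches zero.

```python
import pandas as pd


def add_turbulence_intensity(df, mean_col="wind_speed", std_col="wind_speed_std",
                             min_speed=3.0):
    """Add TI = std / mean per interval; intervals below min_speed get NaN."""
    out = df.copy()
    ti = out[std_col] / out[mean_col]
    # Mask low-speed intervals where the ratio is not meaningful
    out["turbulence_intensity"] = ti.where(out[mean_col] >= min_speed)
    return out


demo = pd.DataFrame({
    "wind_speed": [8.0, 10.0, 2.0],
    "wind_speed_std": [0.56, 0.9, 0.4],
})
with_ti = add_turbulence_intensity(demo)
print(with_ti)
```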

## 5. Data Validation

Validate the cleaned data:
- Check value ranges are reasonable
- Ensure no missing values remain
- Verify data types are correct

In [None]:
# Validation checks
# Example:
# assert cleaned_df['wind_speed'].min() >= 0, "Wind speed cannot be negative"
# assert cleaned_df['wind_direction'].between(0, 360).all(), "Wind direction must be 0-360 degrees"

pass
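The `assert` lines stop at the first failure. The alternative sketch below collects every problem into one report. It checks direction against [0, 360) rather than the inclusive 0 to 360, because 0° and 360° describe the same direction. `validate_cleaned` is a hypothetical helper.

```python
import pandas as pd


def validate_cleaned(df, required=("wind_speed", "wind_direction")):
    """Return a list of validation failures (empty if the data passes)."""
    missing_cols = [c for c in required if c not in df.columns]
    if missing_cols:
        return [f"missing columns: {missing_cols}"]
    # Stop early on wrong dtypes, since the range checks below need numbers
    non_numeric = [c for c in required if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        return [f"non-numeric columns: {non_numeric}"]

    problems = []
    if df[list(required)].isna().any().any():
        problems.append("NaN values remain in required columns")
    if (df["wind_speed"] < 0).any():
        problems.append("negative wind speeds")
    direction = df["wind_direction"].dropna()
    if not direction.between(0, 360, inclusive="left").all():
        problems.append("wind direction outside [0, 360)")
    return problems


good = pd.DataFrame({"wind_speed": [8.0, 9.1], "wind_direction": [270.0, 0.0]})
bad = pd.DataFrame({"wind_speed": [-1.0, None], "wind_direction": [360.0, 90.0]})
print(validate_cleaned(good))
print(validate_cleaned(bad))
```

`assert not validate_cleaned(cleaned_df)` still works as a hard stop when a single pass/fail is enough.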

## 6. Export Cleaned Data

Save the cleaned data for use in the optimization scripts

In [None]:
# Export cleaned data
# Create data directory if it doesn't exist
Path('data/processed').mkdir(parents=True, exist_ok=True)

# Save to CSV
# cleaned_df.to_csv('data/processed/cleaned_wind_data.csv', index=False)

# Or save as pickle for Python objects
# cleaned_df.to_pickle('data/processed/cleaned_wind_data.pkl')

print("Data preprocessing complete!")
print("Cleaned data saved to: data/processed/")
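Before relying on the exported file, it can help to read it back and compare it with the in-memory frame. The sketch below does this with `pandas.testing.assert_frame_equal`; the helper name is hypothetical. Datetime columns come back from CSV as strings unless re-parsed, so this check will flag them as a dtype mismatch.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_and_verify(df, out_dir, stem="cleaned_wind_data"):
    """Write df to CSV, read it back, and confirm the round trip preserved it."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}.csv"
    df.to_csv(csv_path, index=False)
    reloaded = pd.read_csv(csv_path)
    # Raises AssertionError if columns, dtypes or values differ (floats within tolerance)
    pd.testing.assert_frame_equal(df.reset_index(drop=True), reloaded, check_exact=False)
    return csv_path


demo = pd.DataFrame({"wind_speed": [8.1, 7.9], "wind_direction": [268.0, 271.0]})
with tempfile.TemporaryDirectory() as tmp:
    path = export_and_verify(demo, tmp)
    print(f"Verified export: {path.name}")
```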

## Summary Statistics

Display final summary of cleaned data

In [None]:
# Display summary statistics
# cleaned_df.describe()

pass
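`describe()` summarizes each column on its own. A joint frequency table of direction sectors and speed bins also shows which conditions the optimizer will see most often. The sketch below assumes cleaned `wind_speed` and `wind_direction` columns with no missing values. The 30° sector width and 2 m/s bin width are arbitrary choices, and `wind_rose_table` is a hypothetical name.

```python
import pandas as pd


def wind_rose_table(df, sector_deg=30, speed_bin=2.0):
    """Joint frequency of direction sectors and speed bins; all cells sum to 1."""
    # Center sectors on multiples of sector_deg, e.g. the 0° sector covers [345°, 15°)
    shifted = (df["wind_direction"] + sector_deg / 2) % 360
    sector = (shifted // sector_deg * sector_deg).astype(int).rename("direction_sector")
    # Label each speed bin by its lower edge, e.g. 8.0 means [8, 10) m/s
    speed = (df["wind_speed"] // speed_bin * speed_bin).rename("speed_bin_start")
    return pd.crosstab(sector, speed, normalize="all")


demo = pd.DataFrame({
    "wind_direction": [350.0, 5.0, 95.0, 270.0, 275.0],
    "wind_speed": [8.0, 9.0, 5.0, 8.5, 11.0],
})
table = wind_rose_table(demo)
print(table)
```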

In [1]:
# Load historical NREL wind data (example file and columns) - adjust if actual file/column names differ
import pandas as pd
from pathlib import Path

# Assume possible data file path
nrel_data_path = Path('data/NREL_historical_wind_data.csv')

if nrel_data_path.exists():
    nrel_data = pd.read_csv(nrel_data_path)
    print('Loaded NREL data. Columns:', nrel_data.columns.tolist())
    print(nrel_data.head())
    # Filter for conditions similar to the target case (8.5 ± 1 m/s, 270 ± 10°, TI 0.07 ± 0.02)
    similar = nrel_data[
        (nrel_data['wind_speed'].between(7.5, 9.5)) &
        (nrel_data['wind_direction'].between(260, 280)) &
        (nrel_data['turbulence_intensity'].between(0.05, 0.09))
    ]
    print(f"Found {len(similar)} rows with similar historical conditions.")
    display(similar.head())
else:
    print("NREL data file not found. Please provide the historical wind data file at data/NREL_historical_wind_data.csv.")

NREL data file not found. Please provide the historical wind data file at data/NREL_historical_wind_data.csv.
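The `between(260, 280)` direction filter works for this westerly target, but a plain range fails for targets near north (e.g. 350° to 10°). The sketch below measures direction differences on the circle instead. `filter_similar` and `angular_difference` are hypothetical helpers, and the default tolerances reproduce the bounds used in the cell above.

```python
import pandas as pd


def angular_difference(a_deg, b_deg):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a_deg - b_deg + 180) % 360 - 180


def filter_similar(df, speed, direction, ti, speed_tol=1.0, dir_tol=10.0, ti_tol=0.02):
    """Rows within the given tolerances of a target (speed, direction, TI)."""
    return df[
        df["wind_speed"].between(speed - speed_tol, speed + speed_tol)
        & (angular_difference(df["wind_direction"], direction).abs() <= dir_tol)
        & df["turbulence_intensity"].between(ti - ti_tol, ti + ti_tol)
    ]


demo = pd.DataFrame({
    "wind_speed": [8.0, 8.2, 9.0, 8.4],
    "wind_direction": [270.0, 3.0, 350.0, 180.0],
    "turbulence_intensity": [0.07, 0.06, 0.08, 0.07],
})
near_west = filter_similar(demo, speed=8.5, direction=270, ti=0.07)
near_north = filter_similar(demo, speed=8.5, direction=355, ti=0.07)
print(near_west, near_north, sep="\n\n")
```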
