# Wind Farm Data Preprocessing

This notebook uses Sphinx AI to clean and prepare wind farm data for the wake steering optimization algorithm.

## Objectives
1. Load raw wind data
2. Clean and validate data
3. Extract relevant features (wind speed, direction, turbine positions)
4. Export cleaned data for use in optimization scripts

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Set style for visualizations
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("Libraries imported successfully!")

## 1. Load Raw Data

Load your raw wind farm data here. This might include:
- Wind speed measurements
- Wind direction data
- Turbine performance data
- SCADA data from wind farm sensors
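
Until real measurements are available, a small synthetic table can stand in for them so the later steps have something to run against. The sketch below is an illustration only: the column names (`timestamp`, `wind_speed`, `wind_direction`, `power_kw`), the 10-minute interval, the Weibull parameters, and the toy power curve are all assumptions, not properties of any real SCADA export.

```python
import numpy as np
import pandas as pd


def make_sample_data(n_rows=144, seed=0):
    """Build a synthetic SCADA-like table (hypothetical schema)."""
    rng = np.random.default_rng(seed)
    timestamps = pd.date_range("2024-01-01", periods=n_rows, freq="10min")
    # Weibull-distributed speeds are a common stand-in for wind;
    # the shape (2.0) and scale (8.0 m/s) are arbitrary choices.
    wind_speed = 8.0 * rng.weibull(2.0, size=n_rows)
    # Directions scattered around 270 degrees, wrapped into [0, 360)
    wind_direction = rng.normal(270.0, 20.0, size=n_rows) % 360.0
    # Toy cubic power curve capped at 2000 kW (not a real turbine model)
    power_kw = np.clip(4.0 * wind_speed**3, 0.0, 2000.0)
    return pd.DataFrame({
        "timestamp": timestamps,
        "wind_speed": wind_speed,
        "wind_direction": wind_direction,
        "power_kw": power_kw,
    })


raw_data = make_sample_data()
```

Fixing the seed keeps the sample reproducible between runs; replace the call with `pd.read_csv(...)` once the real file is in place.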

In [None]:
# Example: load data from a CSV file
# raw_data = pd.read_csv('data/raw_wind_data.csv')
# raw_data.head()

# Placeholder until the real data source is wired in
# TODO: Replace with actual data loading
print("Ready to load raw data...")

## 2. Data Exploration

Use Sphinx AI here to explore the data:
- Check for missing values
- Identify outliers
- Understand data distributions
- Visualize key variables
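
The first two checks above can be sketched with plain pandas. This is a minimal example, not a prescribed method: the helper names `missing_value_report` and `iqr_outlier_mask` are hypothetical, the column names are assumed, and Tukey's 1.5 × IQR rule is just one common way to flag outliers.

```python
import numpy as np
import pandas as pd


def missing_value_report(df):
    """Count and percentage of missing values per column."""
    counts = df.isna().sum()
    return pd.DataFrame({
        "n_missing": counts,
        "pct_missing": 100.0 * counts / len(df),
    })


def iqr_outlier_mask(series, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    # Comparisons with NaN are False, so missing values are never flagged
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Tiny illustrative frame with one gap per column and one extreme speed
demo = pd.DataFrame({
    "wind_speed": [5.0, 6.0, np.nan, 7.0, 6.5, 55.0],
    "wind_direction": [270.0, 265.0, 280.0, np.nan, 275.0, 272.0],
})
report = missing_value_report(demo)
outliers = iqr_outlier_mask(demo["wind_speed"])
```

Here only the 55.0 m/s reading is flagged. For distributions, `demo.hist()` or `sns.histplot(demo["wind_speed"])` gives a quick visual check.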

In [None]:
# Data exploration goes here
# Ask Sphinx AI to help with:
# - raw_data.info()
# - raw_data.describe()
# - Missing value analysis
# - Distribution plots

pass

## 3. Data Cleaning

Clean the data by:
- Removing or imputing missing values
- Filtering out invalid measurements
- Standardizing units
- Handling outliers
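
One possible shape for these steps is sketched below. It assumes columns named `wind_speed` (already in m/s) and `wind_direction` (degrees); the 40 m/s ceiling and the two-sample gap limit are arbitrary placeholders, and unit conversion is left out because the source units are not known here.

```python
import numpy as np
import pandas as pd


def clean_wind_data(df, max_speed=40.0, max_gap=2):
    """Mask implausible speeds, wrap directions, and fill short gaps."""
    out = df.copy()
    # Treat negative or implausibly high speeds as missing
    bad_speed = (out["wind_speed"] < 0) | (out["wind_speed"] > max_speed)
    out.loc[bad_speed, "wind_speed"] = np.nan
    # Wrap directions into [0, 360)
    out["wind_direction"] = out["wind_direction"] % 360.0
    # Fill at most `max_gap` consecutive missing speeds by linear
    # interpolation. Direction is not interpolated because a linear fill
    # is wrong across the 0/360 wrap.
    out["wind_speed"] = out["wind_speed"].interpolate(
        limit=max_gap, limit_area="inside"
    )
    # Drop whatever is still missing
    return out.dropna(subset=["wind_speed", "wind_direction"]).reset_index(drop=True)


demo = pd.DataFrame({
    "wind_speed": [5.0, -1.0, 7.0, 120.0, 9.0],
    "wind_direction": [350.0, 365.0, -10.0, 180.0, np.nan],
})
cleaned = clean_wind_data(demo)
```

Masking invalid readings before interpolating keeps bad values from leaking into the filled gaps; the last row is dropped because its direction is missing.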

In [None]:
# Data cleaning steps
# Use Sphinx AI to help clean the data

pass

## 4. Feature Extraction

Extract the specific features needed for the FLORIS simulation:
- Wind speed (m/s)
- Wind direction (degrees)
- Turbine positions (x, y coordinates)
- Any other relevant parameters
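
Wind speed and direction vary per time step, while turbine positions are fixed for the farm, so it can help to keep them in two separate tables. The sketch below assumes that split; `extract_features`, the column names, and the three-turbine layout are hypothetical and not tied to any FLORIS input format.

```python
import pandas as pd


def extract_features(df, turbine_positions):
    """Return (per-time-step conditions, static turbine layout)."""
    # Keep only the columns the optimization needs
    conditions = df[["wind_speed", "wind_direction"]].reset_index(drop=True)
    # One row per turbine, indexed by turbine number
    layout = pd.DataFrame(turbine_positions, columns=["x", "y"])
    layout.index.name = "turbine"
    return conditions, layout


demo = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=3, freq="10min"),
    "wind_speed": [6.0, 7.5, 8.2],
    "wind_direction": [268.0, 271.0, 275.0],
    "power_kw": [410.0, 800.0, 1050.0],
})
# Hypothetical layout: three turbines in a row, 500 m apart
positions = [(0.0, 0.0), (500.0, 0.0), (1000.0, 0.0)]
conditions, layout = extract_features(demo, positions)
```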

In [None]:
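# Feature extraction
# Create cleaned dataset with only the features needed for optimization

# Example structure: one entry per time step
cleaned_data = {
    'wind_speed': [],       # m/s
    'wind_direction': [],   # degrees
}

# Turbine positions are fixed for the farm rather than per time step.
# Keep them separate: pd.DataFrame raises ValueError when the lists in
# a dict have different lengths.
turbine_positions = []      # [(x1, y1), (x2, y2), ...]

# Convert to DataFrame
# cleaned_df = pd.DataFrame(cleaned_data)

pass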

## 5. Data Validation

Validate the cleaned data:
- Check value ranges are reasonable
- Ensure no missing values remain
- Verify data types are correct
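
These checks can be collected into a single function that reports every problem at once instead of stopping at the first failed assertion. The following is a sketch under assumptions: the function name, the column names, and the 40 m/s upper bound are illustrative choices.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def validate_cleaned(df, max_speed=40.0):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for col in ("wind_speed", "wind_direction"):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif not is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
        elif df[col].isna().any():
            problems.append(f"{col} contains missing values")
    # Range checks only make sense once the basic checks pass
    if not problems:
        if not df["wind_speed"].between(0.0, max_speed).all():
            problems.append("wind_speed outside [0, max_speed]")
        if not df["wind_direction"].between(0.0, 360.0).all():
            problems.append("wind_direction outside [0, 360]")
    return problems


good = pd.DataFrame({"wind_speed": [5.0, 7.0], "wind_direction": [270.0, 10.0]})
bad = pd.DataFrame({"wind_speed": [-1.0, 7.0], "wind_direction": [270.0, 400.0]})
good_problems = validate_cleaned(good)
bad_problems = validate_cleaned(bad)
```

A caller can then fail loudly with `assert not problems, problems` or log the list and continue, depending on how strict the pipeline should be.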

In [None]:
# Validation checks
# Example:
# assert cleaned_df['wind_speed'].min() >= 0, "Wind speed cannot be negative"
# assert cleaned_df['wind_direction'].between(0, 360).all(), "Wind direction must be 0-360 degrees"
# assert cleaned_df.notna().all().all(), "Missing values remain"

pass

## 6. Export Cleaned Data

Save the cleaned data for use in the optimization scripts.
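
A quick way to confirm that an export worked is to read the file back and compare it with what was written. The sketch below does that in a temporary directory so it leaves nothing behind; `export_csv` is a hypothetical helper and the two-row frame is made-up sample data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_csv(df, output_dir, filename="cleaned_wind_data.csv"):
    """Write df to output_dir/filename and return the resulting path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / filename
    df.to_csv(path, index=False)
    return path


demo = pd.DataFrame({"wind_speed": [6.0, 7.5], "wind_direction": [268.0, 271.0]})
with tempfile.TemporaryDirectory() as tmp:
    saved_path = export_csv(demo, Path(tmp) / "processed")
    reloaded = pd.read_csv(saved_path)

# True when the reloaded table matches the original in values and dtypes
round_trip_ok = reloaded.equals(demo)
```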

In [None]:
# Export cleaned data
# Create the output directory if it doesn't exist
output_dir = Path('data/processed')
output_dir.mkdir(parents=True, exist_ok=True)

# Save to CSV
# cleaned_df.to_csv(output_dir / 'cleaned_wind_data.csv', index=False)

# Or save as pickle to preserve Python objects and dtypes
# cleaned_df.to_pickle(output_dir / 'cleaned_wind_data.pkl')

# Nothing is written until one of the save lines above is uncommented
print("Data preprocessing complete!")
print(f"Output directory ready: {output_dir}/")

## Summary Statistics

Display a final summary of the cleaned data.
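
One caveat when summarizing: `describe()` reports an arithmetic mean, which is misleading for wind direction because 350° and 10° average to 180° instead of north. A circular mean avoids this. The sketch below is a minimal illustration; `circular_mean_deg` is a hypothetical helper and the four-row frame is made-up sample data.

```python
import numpy as np
import pandas as pd


def circular_mean_deg(directions):
    """Mean of angles in degrees, computed on the unit circle."""
    radians = np.deg2rad(np.asarray(directions, dtype=float))
    # Average the unit vectors, then convert back to an angle
    mean_angle = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    return float(np.rad2deg(mean_angle) % 360.0)


demo = pd.DataFrame({
    "wind_speed": [6.0, 7.0, 8.0, 9.0],
    "wind_direction": [350.0, 355.0, 5.0, 10.0],
})
summary = demo.describe()
arithmetic_mean = demo["wind_direction"].mean()       # 180 degrees (south)
mean_direction = circular_mean_deg(demo["wind_direction"])  # near north
```

Because of floating-point rounding, the circular mean for this sample may come out just above 0° or just below 360°; both describe the same direction.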

In [None]:
# Display summary statistics
# cleaned_df.describe()

pass