# Library Usage Demo - Replacing Notebook Code with Library Functions

This notebook demonstrates how to replace the manual data processing code in your experiment notebooks with the new library functions.


## Before: Manual Data Processing (from your original notebooks)

Your original notebooks had code like this:

```python
# Manual API call
import requests
import pandas as pd

LAT, LON = -33.8678, 151.2073
TZ = "Australia/Sydney"
START, END = "2016-01-01", "2024-12-31"

DAILY_VARS = [
    "weather_code", "temperature_2m_max", "temperature_2m_min",
    "precipitation_sum", "rain_sum", "wind_speed_10m_max", ...
]

url = f"https://archive-api.open-meteo.com/v1/archive?latitude={LAT}&longitude={LON}&start_date={START}&end_date={END}&daily={','.join(DAILY_VARS)}&timezone={TZ}"

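response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors instead of parsing an error payload
data_rain = pd.DataFrame(response.json()["daily"])
# Keep "time" as datetime64: converting to .dt.date would break the .dt accessor used below
data_rain["time"] = pd.to_datetime(data_rain["time"])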

# Manual target creation
data_rain["precip_3day_next"] = (
    data_rain["precipitation_sum"].shift(-1) + 
    data_rain["precipitation_sum"].shift(-2) + 
    data_rain["precipitation_sum"].shift(-3)
)

# Manual feature engineering
data_rain["month"] = data_rain["time"].dt.month
data_rain["year"] = data_rain["time"].dt.year
# ... many more manual steps
```
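
The manual target above is the sum of the next three days of precipitation, excluding the current day. As a minimal sketch of that same calculation on toy data (the helper name `next_n_day_sum` is hypothetical and not part of the library), it can be generalized to any horizon:

```python
import pandas as pd


def next_n_day_sum(series: pd.Series, n: int = 3) -> pd.Series:
    """Sum of the next n values, excluding the current row.

    Hypothetical helper for illustration; rows whose look-ahead window
    runs past the end of the series come out as NaN.
    """
    return sum(series.shift(-k) for k in range(1, n + 1))


# Toy daily precipitation values (mm), not real weather data
precip = pd.Series([1.0, 0.0, 2.0, 5.0, 0.0, 3.0], name="precipitation_sum")
target = next_n_day_sum(precip, n=3)
print(target.tolist())
```

The last `n` rows have no complete look-ahead window, so they are NaN and typically need to be dropped before training.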


## After: Using the Library (Clean and Simple)

Now you can replace all that manual code with just a few lines:


```python
# Import the local development version
import sys
import os
sys.path.insert(0, os.path.abspath('../src'))

# Import directly from the module to avoid conflicts with installed package
from brayam_pineda_ml.weather_data_processor import WeatherDataProcessor
import pandas as pd
import numpy as np
```
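
Because an installed copy of the package may shadow the local one, it can help to check which file Python actually resolves for a module name. The following is a small sketch using only the standard library; the helper name `module_origin` is hypothetical, and it is demonstrated on `json` so that it runs anywhere:

```python
import importlib.util


def module_origin(name: str):
    """Return the file path Python would load for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Demonstrated on a standard-library package
print(module_origin("json"))
```

In this notebook, calling `module_origin("brayam_pineda_ml")` after the `sys.path.insert` line should return a path under `../src` if the local development version takes precedence.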


```python
# Initialize processor
processor = WeatherDataProcessor(
    lat=-33.8678,
    lon=151.2073,
    timezone="Australia/Sydney"
)

# Run complete pipeline for regression
data = processor.process_full_pipeline(
    start_date="2016-01-01",
    end_date="2024-12-31",
    task_type="regression"
)

print("✅ Data processing complete!")
print(f"Training set: {data['X_train'].shape}")
print(f"Features: {len(data['feature_names'])}")
print(f"Target: {data['target_name']}")
```


## Step-by-Step Processing (Alternative Approach)

You can also run the pipeline step by step for more control:


```python
# Step 1: Fetch data
df = processor.fetch_weather_data("2016-01-01", "2024-12-31")
print(f"Raw data shape: {df.shape}")

# Step 2: Create target
df = processor.create_regression_target(df, "precip_3day_next")
print(f"After target creation: {df.shape}")

# Step 3: Add temporal features
df = processor.add_temporal_features(df)
print(f"After temporal features: {df.shape}")

# Step 4: Create lag features
df = processor.create_lag_features(df, "precip_3day_next", [1, 2])
print(f"After lag features: {df.shape}")

# Step 5: Create rolling features
df = processor.create_rolling_features(df, [3, 7, 14, 30])
print(f"After rolling features: {df.shape}")

# Step 6: Create advanced features
df = processor.create_advanced_features(df)
print(f"After advanced features: {df.shape}")

# Step 7: Encode categorical features
df = processor.encode_categorical_features(df, ["weather_code", "season"])
print(f"After encoding: {df.shape}")

# Step 8: Split data
X_train, X_val, X_test, y_train, y_val, y_test = processor.split_time_series_data(df, "precip_3day_next")
print(f"Split complete - Train: {X_train.shape}, Val: {X_val.shape}, Test: {X_test.shape}")

# Step 9: Impute missing values
X_train, X_val, X_test = processor.impute_missing_values(X_train, X_val, X_test)
print("Missing values imputed")

# Step 10: Scale features
X_train, X_val, X_test = processor.scale_features(X_train, X_val, X_test)
print("Features scaled")
```
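
Steps 8 to 10 matter for time-series data because the split should respect time order, and imputation and scaling statistics should come from the training portion only. The sketch below illustrates that idea in plain pandas on toy data. The function names, the median imputation, and the standardization are assumptions made for illustration; the library's `split_time_series_data`, `impute_missing_values`, and `scale_features` may work differently.

```python
import numpy as np
import pandas as pd


def chronological_split(df, target, n_val, n_test):
    """Split rows in their existing (time) order, without shuffling."""
    X, y = df.drop(columns=[target]), df[target]
    train_end = len(df) - n_val - n_test
    val_end = len(df) - n_test
    return (X.iloc[:train_end], X.iloc[train_end:val_end], X.iloc[val_end:],
            y.iloc[:train_end], y.iloc[train_end:val_end], y.iloc[val_end:])


def impute_and_scale(X_train, *others):
    """Fill gaps with training medians, then standardize with training mean/std."""
    medians = X_train.median()
    filled_train = X_train.fillna(medians)
    mean = filled_train.mean()
    std = filled_train.std(ddof=0).replace(0, 1.0)  # guard against constant columns

    def transform(X):
        return (X.fillna(medians) - mean) / std

    return [transform(X) for X in (X_train, *others)]


# Toy frame standing in for the engineered weather features (not real data)
toy = pd.DataFrame({
    "temp_max": [20.0, 21.0, np.nan, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0],
    "wind_max": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0],
    "precip_3day_next": [0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 0.0, 4.0, 0.0, 5.0],
})

Xtr, Xva, Xte, ytr, yva, yte = chronological_split(toy, "precip_3day_next", n_val=2, n_test=2)
Xtr_s, Xva_s, Xte_s = impute_and_scale(Xtr, Xva, Xte)
print(len(Xtr_s), len(Xva_s), len(Xte_s))
```

Computing the medians, means, and standard deviations on the training rows alone keeps information from the validation and test periods out of the fitted transformation.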


## Classification Example


```python
# Option 1: build the classification target manually on freshly fetched data
df_class = processor.fetch_weather_data("2016-01-01", "2024-12-31")
df_class = processor.create_classification_target(df_class, "target_rain", threshold=0.1, horizon_days=7)

# Option 2: run the full pipeline for classification.
# This call takes the date range itself and does not use df_class from above.
data_class = processor.process_full_pipeline(
    start_date="2016-01-01",
    end_date="2024-12-31",
    task_type="classification"
)

print("Classification data ready!")
print(f"Training set: {data_class['X_train'].shape}")
print(f"Class distribution: {data_class['y_train'].value_counts().to_dict()}")
```
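
To make the `threshold` and `horizon_days` parameters concrete, here is a minimal sketch of one plausible way to build such a binary target. It assumes the label is 1 when total precipitation over the next `horizon_days` days exceeds `threshold`; that definition and the helper name `rain_within_horizon` are assumptions, and the library's `create_classification_target` may define the label differently.

```python
import pandas as pd


def rain_within_horizon(precip: pd.Series, threshold: float = 0.1, horizon_days: int = 7) -> pd.Series:
    """Label 1.0 if the summed precipitation over the next horizon_days exceeds threshold.

    Hypothetical helper; rows without a complete look-ahead window stay NaN.
    """
    future_total = sum(precip.shift(-k) for k in range(1, horizon_days + 1))
    label = (future_total > threshold).astype("float")
    return label.where(future_total.notna())


# Toy daily precipitation values (mm), with a short horizon so the result is easy to check
precip = pd.Series([0.0, 0.0, 0.0, 5.0, 0.0, 0.0])
labels = rain_within_horizon(precip, threshold=0.1, horizon_days=2)
print(labels.tolist())
```

Keeping the incomplete rows as NaN, rather than labeling them 0, avoids treating "unknown future" as "no rain".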


## Benefits of Using the Library

1. **Code Reduction**: Replace 100+ lines of manual processing with 5-10 lines
2. **Consistency**: Same processing logic across all experiments
3. **Maintainability**: Changes to processing logic only need to be made in one place
4. **Reusability**: Easy to use the same processing for new experiments
5. **Documentation**: Well-documented functions with clear parameters
6. **Error Handling**: Built-in validation and error checking
7. **Flexibility**: Can use full pipeline or individual steps as needed
