# Bayesian Rainfall Analysis - Eugene, OR

This notebook demonstrates a simplified Bayesian rainfall model for Eugene, Oregon using historical weather data from 2019-2024.

## Recent Updates

This notebook has been updated to reflect recent simplifications to the codebase:
- Removed complex posterior predictive sampling with `pm.set_data()`
- Simplified function signatures (no longer require `model` parameter)
- Direct evaluation approach for all predictions
- Same analytical capabilities with cleaner, more efficient code

The model predicts both the probability of rain and the amount of rainfall for any day of the year based on seasonal patterns.


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pymc as pm
import numpy as np
import pandas as pd

In [3]:
import bayesian_rainfall as br

In [4]:
data = br.model.load_data("../data/noaa_historical_weather_eugene_or_2019-2024.csv")
data.head()

Unnamed: 0,DATE,PRCP,day_of_year
0,2019-01-01,0.0,1
1,2019-01-02,0.0,2
2,2019-01-03,0.0,3
3,2019-01-04,0.3,4
4,2019-01-05,4.6,5


In [5]:
model = br.model.create_rainfall_model(data)
model

         a_rain ~ Normal(0, 1)
         b_rain ~ Normal(0, 1)
         c_rain ~ Normal(0, 1)
       a_amount ~ Normal(0, 1)
       b_amount ~ Normal(0, 1)
       c_amount ~ Normal(1, 1)
   alpha_amount ~ Gamma(2, f())
         p_rain ~ Deterministic(f(c_rain, b_rain, a_rain))
 rain_indicator ~ Bernoulli(p_rain)
rainfall_amount ~ Gamma(alpha_amount, f(alpha_amount, c_amount, b_amount, a_amount))

In [None]:
%%time
trace = br.model.sample_model(model)
trace


Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [a_rain, b_rain, c_rain, a_amount, b_amount, c_amount, alpha_amount]



In [None]:
trace.posterior.a_rain.values.flatten().shape

In [None]:
trace.posterior.a_amount.shape

## Simplified Model Approach

With the recent simplifications, posterior predictive sampling through `pm.set_data()` is no longer needed. The helper functions evaluate the fitted model directly from the posterior samples for any day of the year, so no PyMC model has to be rebuilt or re-sampled for each prediction. This works because the data cover every day of the year.
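
To make "direct evaluation" concrete, the sketch below applies it to synthetic posterior draws. It is not the package's implementation: the functional form (one annual harmonic, a logistic link for the rain probability, a log link for the mean amount) is an assumption, since the model printout only shows `f(...)`, and `evaluate_day` and `sample_day` are hypothetical names.

```python
import numpy as np


def evaluate_day(a_rain, b_rain, c_rain, a_amount, b_amount, c_amount, day_of_year):
    """Return per-draw rain probability and mean wet-day amount for one day.

    ASSUMPTION: one annual harmonic, logistic link for the probability and
    log link for the mean amount. The real model's f(...) may differ.
    """
    theta = 2.0 * np.pi * day_of_year / 365.25
    logit_p = c_rain + a_rain * np.sin(theta) + b_rain * np.cos(theta)
    p_rain = 1.0 / (1.0 + np.exp(-logit_p))
    mean_amount = np.exp(c_amount + a_amount * np.sin(theta) + b_amount * np.cos(theta))
    return p_rain, mean_amount


def sample_day(p_rain, mean_amount, alpha, rng):
    """Draw one rain indicator and one amount (mm) per posterior draw."""
    rain = rng.random(p_rain.shape) < p_rain
    # A Gamma with shape alpha and scale mean/alpha has expectation mean_amount.
    amount = rng.gamma(shape=alpha, scale=mean_amount / alpha)
    # Dry days get an amount of exactly zero.
    return rain.astype(int), np.where(rain, amount, 0.0)


# Synthetic stand-ins for flattened posterior draws (not real results).
rng = np.random.default_rng(0)
n_draws = 1000
names = ["a_rain", "b_rain", "c_rain", "a_amount", "b_amount", "c_amount"]
draws = {name: rng.normal(0.0, 0.1, n_draws) for name in names}
alpha = rng.uniform(0.5, 2.0, n_draws)

p_rain, mean_amount = evaluate_day(day_of_year=15, **draws)
rain, amount = sample_day(p_rain, mean_amount, alpha, rng)
print(f"mean P(rain) = {p_rain.mean():.3f}, mean amount = {amount.mean():.3f} mm")
```

Because every step is a vectorized array operation over the posterior draws, evaluating another day only means changing `day_of_year`.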


In [None]:
# Test posterior predictive sampling using our simplified function
# This demonstrates how to get predictions for specific days
rain_indicators, rainfall_amounts = br.analysis.sample_posterior_predictive_for_day(trace, 15, n_samples=100)
print(f"Rain indicators shape: {rain_indicators.shape}")
print(f"Rainfall amounts shape: {rainfall_amounts.shape}")
print(f"Sample rain frequency: {rain_indicators.mean():.3f}")
print(f"Sample mean rainfall: {rainfall_amounts.mean():.3f} mm")

In [None]:
br.analysis.print_model_summary(trace, data)

In [None]:
trace.posterior

In [None]:
# Trace plots to check MCMC sampling
br.visualizations.plot_trace(trace)


In [None]:
# Combined visualization: Rain probability and amount predictions with both CIs
# Note: The model parameter is no longer needed with our simplified approach
br.visualizations.plot_combined_predictions(trace, data)


In [None]:
# Posterior predictive checks
# Note: The model parameter is no longer needed with our simplified approach
br.visualizations.plot_posterior_predictive_checks(trace, data)


In [None]:
# Plot observed vs predicted rainfall distributions for specific days
br.visualizations.plot_specific_days_comparison(trace, data)

In [None]:
# Seasonal comparison: Observed vs Predicted distributions by season
br.visualizations.plot_seasonal_summaries(trace, data)


# Single Day Analysis Examples

The following examples demonstrate the single-day analysis functions, which summarize the model's predictions for a specific day of the year.


In [None]:
# Example 1: Comprehensive analysis for a specific day
# You can specify the date in multiple ways:
# - Day of year: 15 (January 15)
# - String format: "01/15" 
# - Tuple format: (1, 15)

# Using day of year
results_jan15 = br.analysis.analyze_single_day(
    trace=trace, 
    data=data, 
    date_input=15,  # Day 15 (January 15)
    show_plots=True
)
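
The three accepted date formats can all be reduced to a day of year. The helper below is a hypothetical sketch of that normalization, not the package's own parser; the name `to_day_of_year` and the non-leap reference year are assumptions chosen so that day 300 corresponds to October 27.

```python
from datetime import date


def to_day_of_year(date_input, year=2023):
    """Convert an int, "MM/DD" string, or (month, day) tuple to a day of year.

    Hypothetical helper. `year` defaults to a non-leap year, so the
    mapping matches the day numbers used in this notebook.
    """
    if isinstance(date_input, int):
        if not 1 <= date_input <= 366:
            raise ValueError(f"day of year out of range: {date_input}")
        return date_input
    if isinstance(date_input, str):
        month, day = (int(part) for part in date_input.split("/"))
    else:
        month, day = date_input
    # date() validates the month/day combination and raises ValueError if invalid.
    return date(year, month, day).timetuple().tm_yday


for value in (15, "04/10", (7, 19)):
    print(value, "->", to_day_of_year(value))
```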


In [None]:
# Example 2: Analysis using string format (July 15)
# Summer days typically have lower rain probability
results_jul15 = br.analysis.analyze_single_day(
    trace=trace, 
    data=data, 
    date_input="07/15",  # July 15
    show_plots=True
)


In [None]:
import matplotlib.pyplot as plt

## Probability Analysis Examples

These functions allow you to calculate specific probabilities for rainfall events on given days.
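
For intuition, an interval probability such as P(lo < rainfall <= hi) can be approximated by simulation. The sketch below assumes the two-part structure shown in the model summary (a Bernoulli rain indicator followed by a Gamma amount on wet days) and uses made-up parameter values for a single posterior draw. `interval_probability` and its arguments are hypothetical names, not the package's API.

```python
import random


def interval_probability(p_rain, alpha, mean_amount, lo=0.0, hi=float("inf"),
                         n_sims=20_000, seed=0):
    """Monte Carlo estimate of P(lo < rainfall <= hi) for one day.

    Dry days contribute 0 mm, so with the default bounds the result
    estimates the probability of any rain at all.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if rng.random() < p_rain:  # wet day
            # gammavariate takes (shape, scale); shape * scale equals mean_amount.
            amount = rng.gammavariate(alpha, mean_amount / alpha)
            if lo < amount <= hi:
                hits += 1
    return hits / n_sims


# Made-up parameter values for one posterior draw (not model output).
p_any = interval_probability(p_rain=0.3, alpha=1.5, mean_amount=4.0)
p_over_1mm = interval_probability(p_rain=0.3, alpha=1.5, mean_amount=4.0, lo=1.0)
p_1_to_3mm = interval_probability(p_rain=0.3, alpha=1.5, mean_amount=4.0, lo=1.0, hi=3.0)
print(p_any, p_over_1mm, p_1_to_3mm)
```

Repeating the calculation for each posterior draw and averaging would give a posterior mean probability along with its spread.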


In [None]:
# Example 3: Calculate probability of any rain on a given day
# This is useful for planning outdoor activities
# Using tuple format (month, day)
any_rain_prob = br.analysis.print_any_rain_probability(
    trace=trace, 
    date_input=(9, 14)  # September 14
)


In [None]:
# Example 4: Calculate probability of rainfall within various intervals
# This is useful for agricultural planning or flood risk assessment

# Rainfall of at least 1.0 mm: only the lower bound is set, so the interval is open above.
# Uncomment interval_max below to restrict it to 1.0-3.0 mm.
interval_prob = br.analysis.print_rainfall_interval_probability(
    trace=trace, 
    date_input="09/14",
    interval_min=1.0,
    # interval_max=3.0,
)


In [None]:
# Print the simple daily rainfall analysis for September 14
br.analysis.print_simple_daily_rainfall_analysis(trace, date_input="09/14")


In [None]:
# Example 5: Compare different days of the year
# Let's analyze a few different days to see seasonal patterns
# Mix of different input formats to show flexibility

days_to_analyze = [
    15,           # day of year (January 15)
    "04/10",      # string format (April 10)
    (7, 19),      # tuple format (July 19)
    300           # day of year (October 27)
]

print("SEASONAL COMPARISON OF RAIN PROBABILITIES")
print("=" * 60)

for date_input in days_to_analyze:
    rain_prob = br.analysis.calculate_any_rain_probability(trace, date_input)
    print(f"{rain_prob['day_name']:25} | P(rain) = {rain_prob['mean_probability']:.3f} ± {rain_prob['std_probability']:.3f}")


# Weekly Rain Probability Analysis

This section shows the chance of any rain each week throughout the year, providing a more granular view of seasonal patterns than monthly analysis.
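
One possible reading of "chance of any rain in a week" is the probability that at least one of its days is wet. The sketch below computes that from daily probabilities under the strong assumption that days are independent. The package may aggregate differently (for example by averaging daily probabilities), and both the function name and the daily values are made up for illustration.

```python
import math


def weekly_any_rain_probability(daily_probs):
    """Chance of at least one rainy day in each consecutive 7-day block.

    ASSUMPTION: days are independent, so P(dry week) is the product of the
    daily dry probabilities. Real weather is autocorrelated, which makes
    this an upper-leaning approximation.
    """
    weekly = []
    for start in range(0, len(daily_probs), 7):
        week = daily_probs[start:start + 7]
        p_dry_week = math.prod(1.0 - p for p in week)
        weekly.append(1.0 - p_dry_week)
    return weekly


# Made-up daily probabilities with a winter-wet, summer-dry shape.
daily = [0.35 + 0.25 * math.cos(2 * math.pi * d / 365) for d in range(365)]
weekly = weekly_any_rain_probability(daily)
print(len(weekly), round(weekly[0], 3), round(weekly[26], 3))
```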


## Key Simplifications Made

The model has been simplified to remove unnecessary complexity around posterior predictive sampling:

1. **No more `pm.set_data()`**: We can directly evaluate the model for any day since we have data for every day of the year
2. **Simplified function signatures**: Functions like `plot_combined_predictions()` no longer require the `model` parameter
3. **Direct evaluation**: All predictions use direct evaluation of the posterior samples rather than complex PyMC posterior predictive sampling
4. **Same functionality**: All analysis capabilities are preserved but with cleaner, more efficient code

This makes the code easier to understand and maintain while providing the same analytical capabilities.


In [None]:
# Demonstration of simplified prediction approach
# We can now easily get predictions for any day without complex PyMC setup

# Get predictions for multiple days
from datetime import datetime, timedelta

days_to_test = [15, 100, 200, 300]  # January 15, April 10, July 19, October 27 (non-leap year)

print("SIMPLIFIED PREDICTION APPROACH")
print("=" * 50)
for day in days_to_test:
    # Get rain probability and expected amount directly
    rain_probs, expected_amounts, alpha_amounts = br.analysis._evaluate_model_for_day(trace, day)
    
    # Calculate statistics
    mean_rain_prob = rain_probs.mean()
    mean_expected_amount = expected_amounts.mean()
    
    # Get day name; 2023 is a non-leap reference year, so day 100 maps to April 10
    date_obj = datetime(2023, 1, 1) + timedelta(days=day - 1)
    day_name = date_obj.strftime("%B %d")
    
    print(f"{day_name:12} (Day {day:3d}): P(rain) = {mean_rain_prob:.3f}, Expected amount = {mean_expected_amount:.2f} mm")

print("\nThis approach is much simpler and more efficient than the previous method!")


In [None]:
# Plot weekly rain probability throughout the year
weekly_results = br.visualizations.plot_weekly_rain_probability(trace, data)
