# DX 704 Week 2 Project

This week's project will analyze fresh strawberry price data for a hypothetical "buy low, freeze, and sell high" business.
Strawberries show strong seasonality in their prices compared to other fruits.

![](https://ers.usda.gov/sites/default/files/_laserfiche/Charts/61401/oct14_finding_plattner_fig01.png)

Image source: https://www.ers.usda.gov/amber-waves/2014/october/seasonal-fresh-fruit-price-patterns-differ-across-commodities-the-case-of-strawberries-and-apples

You are considering a business where you buy strawberries when the prices are very low, carefully freeze them, even more carefully defrost them, and then sell them when the prices are high.
You will forecast strawberry price time series and then use them to tactically pick times to buy, freeze, and sell the strawberries.

The full project description, a template notebook, and raw data are available on GitHub at the following link.

https://github.com/bu-cds-dx704/dx704-project-02


### Example Code

You may find it helpful to refer to these GitHub repositories of Jupyter notebooks for example code.

* https://github.com/bu-cds-omds/dx601-examples
* https://github.com/bu-cds-omds/dx602-examples
* https://github.com/bu-cds-omds/dx603-examples
* https://github.com/bu-cds-omds/dx704-examples

Any calculations demonstrated in code examples or videos may be found in these notebooks, and you are allowed to copy this example code in your homework answers.

## Part 1: Backtest Strawberry Prices

Read the provided "strawberry-prices.tsv" with data from 2020 through 2024.
This data is based on data from the U.S. Bureau of Statistics, but transformed so the ground truth is not online.
https://fred.stlouisfed.org/series/APU0000711415

Use the data for 2020 through 2023 to predict monthly prices in 2024.
Spend some time to make sure you are happy with your methodology and prediction accuracy, since you will reuse the methodology to forecast 2025 next.
Save the 2024 backtest predictions as "strawberry-backtest.tsv" with columns month and price.


In [157]:
# YOUR CHANGES HERE

import pandas as pd
import numpy as np

df = pd.read_csv('strawberry-prices.tsv', sep='\t')
print("Data loaded successfully!")
print(f"Shape: {df.shape}")

df['date'] = pd.to_datetime(df['month'])
df['year'] = df['date'].dt.year
df['month_num'] = df['date'].dt.month

print(f"Data range: {df['year'].min()} to {df['year'].max()}")

train_data = df[df['year'] <= 2023].copy()
actual_2024 = df[df['year'] == 2024].copy()

print(f"Training data: {len(train_data)} records (2020-2023)")
print(f"Test data: {len(actual_2024)} records (2024)")

def detrended_seasonal_forecast(train_data):
    """
    Remove trend first, then apply seasonal patterns with bias correction
    This method performed best for reducing bias
    """
    print("\nUsing detrended seasonal forecasting method...")
    
    yearly_avg = train_data.groupby('year')['price'].mean()
    print("Yearly averages:")
    for year, avg in yearly_avg.items():
        print(f"  {year}: {avg:.3f}")
    
    train_data_copy = train_data.copy()
    train_data_copy['yearly_avg'] = train_data_copy['year'].map(yearly_avg)
    overall_mean = train_data['price'].mean()
    train_data_copy['detrended_price'] = train_data_copy['price'] / train_data_copy['yearly_avg'] * overall_mean
    
    print(f"\nOverall mean price: {overall_mean:.3f}")
    
    seasonal_avg = train_data_copy.groupby('month_num')['detrended_price'].mean()
    
    print("\nSeasonal factors (detrended & bias corrected):")
    for month, factor in seasonal_avg.items():
        print(f"  Month {month}: {factor:.3f}")
    
    return seasonal_avg

print("\n" + "="*50)
print("GENERATING 2024 FORECAST - DETRENDED METHOD")
print("="*50)

forecast_2024 = detrended_seasonal_forecast(train_data)

predictions_2024 = []
for month in range(1, 13):
    predicted_price = forecast_2024[month]
    predictions_2024.append({
        'month': month,
        'price': round(predicted_price, 3)
    })

results_df = pd.DataFrame(predictions_2024)

print(f"\n2024 predictions using detrended method:")
print(results_df.to_string(index=False))

actual_2024_monthly = actual_2024.groupby('month_num')['price'].mean().reset_index()
actual_2024_monthly.columns = ['month', 'actual_price']

comparison = pd.merge(results_df, actual_2024_monthly, on='month')
comparison['residual'] = comparison['actual_price'] - comparison['price'] 

mean_residual = comparison['residual'].mean()
std_residual = comparison['residual'].std()

print(f"\n" + "="*50)
print("BACKTEST VERIFICATION")
print("="*50)
print(f"Mean residual: {mean_residual:.4f}")
print(f"Std residual: {std_residual:.4f}")

if abs(mean_residual) < 0.1:
    print("Success, Mean residual is close to zero")
else:
    print(f"Current bias: {mean_residual:.4f}")

Data loaded successfully!
Shape: (60, 2)
Data range: 2020 to 2024
Training data: 48 records (2020-2023)
Test data: 12 records (2024)

GENERATING 2024 FORECAST - DETRENDED METHOD

Using detrended seasonal forecasting method...
Yearly averages:
  2020: 3.517
  2021: 3.811
  2022: 4.041
  2023: 4.001

Overall mean price: 3.842

Seasonal factors (detrended & bias corrected):
  Month 1: 4.364
  Month 2: 4.087
  Month 3: 3.691
  Month 4: 3.782
  Month 5: 3.526
  Month 6: 3.274
  Month 7: 3.195
  Month 8: 3.489
  Month 9: 3.578
  Month 10: 3.933
  Month 11: 4.399
  Month 12: 4.789

2024 predictions using detrended method:
 month  price
     1  4.364
     2  4.087
     3  3.691
     4  3.782
     5  3.526
     6  3.274
     7  3.195
     8  3.489
     9  3.578
    10  3.933
    11  4.399
    12  4.789

BACKTEST VERIFICATION
Mean residual: -0.0047
Std residual: 0.2747
Success, Mean residual is close to zero


Submit "strawberry-backtest.tsv" in Gradescope.

## Part 2: Backtest Errors

What are the mean and standard deviation of the residuals between your backtest predictions and the ground truth? (If your mean is not close to zero, then you may be missing a long term trend.)

Write the mean and standard deviation to a file "backtest-accuracy.tsv" with two columns, mean and std.

In [158]:
# YOUR CHANGES HERE

df = pd.read_csv('strawberry-prices.tsv', sep='\t')
print("Complete data loaded:")
print(f"Shape: {df.shape}")

df['date'] = pd.to_datetime(df['month'])
df['year'] = df['date'].dt.year
df['month_num'] = df['date'].dt.month

print(f"Data range: {df['year'].min()} to {df['year'].max()}")
print(f"Records per year: {df['year'].value_counts().sort_index()}")

train_data = df[df['year'] <= 2023].copy()
actual_2024 = df[df['year'] == 2024].copy()

print(f"\nTraining data: {len(train_data)} records (2020-2023)")
print(f"Test data (actual 2024): {len(actual_2024)} records")

predictions = pd.read_csv('strawberry-backtest.tsv', sep='\t')
print(f"\nPredictions loaded: {len(predictions)} records")

actual_2024_monthly = actual_2024.groupby('month_num')['price'].mean().reset_index()

actual_2024_monthly['month'] = actual_2024_monthly['month_num'].apply(lambda x: f'2024-{x:02d}-01')
actual_2024_monthly = actual_2024_monthly[['month', 'price']]
actual_2024_monthly.columns = ['month', 'actual_price']

comparison = pd.merge(predictions, actual_2024_monthly, on='month', how='inner')

print("\nActual 2024 monthly prices:")
print(actual_2024_monthly.to_string(index=False))

comparison = pd.merge(predictions, actual_2024_monthly, on='month', how='inner')
comparison['residual'] = comparison['actual_price'] - comparison['price']

print("\n" + "="*60)
print("BACKTEST COMPARISON: PREDICTED vs ACTUAL 2024")
print("="*60)
print("Month | Predicted | Actual | Residual | Abs Error")
print("-" * 55)
for _, row in comparison.iterrows():
    abs_error = abs(row['residual'])
    print(f"{row['month']:>10s} | {row['price']:9.3f} | {row['actual_price']:6.3f} | {row['residual']:8.3f}")
    
residuals = comparison['residual']
mean_residual = residuals.mean()
std_residual = residuals.std()
mae = comparison['residual'].abs().mean()
rmse = np.sqrt((comparison['residual'] ** 2).mean())

print(f"\n" + "="*40)
print("BACKTEST ACCURACY STATISTICS")
print("="*40)
print(f"Mean residual: {mean_residual:.4f}")
print(f"Standard deviation: {std_residual:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
print(f"Root Mean Square Error: {rmse:.4f}")

print(f"\n" + "="*40)
print("INTERPRETATION")
print("="*40)
if abs(mean_residual) < 0.1:
    print("Mean residual close to zero - predictions are unbiased")
else:
    print("Mean residual not close to zero - may be missing a trend")
    if mean_residual > 0:
        print("   → Predictions tend to be HIGHER than actual (overestimating)")
    else:
        print("   → Predictions tend to be LOWER than actual (underestimating)")

print(f"Standard deviation of {std_residual:.3f} means typical error is ±{std_residual:.3f}")

accuracy_results = pd.DataFrame({
    'mean': [mean_residual],
    'std': [std_residual]
})

Complete data loaded:
Shape: (60, 2)
Data range: 2020 to 2024
Records per year: year
2020    12
2021    12
2022    12
2023    12
2024    12
Name: count, dtype: int64

Training data: 48 records (2020-2023)
Test data (actual 2024): 12 records

Predictions loaded: 12 records

Actual 2024 monthly prices:
     month  actual_price
2024-01-01         5.055
2024-02-01         4.264
2024-03-01         3.742
2024-04-01         3.576
2024-05-01         3.237
2024-06-01         2.977
2024-07-01         3.116
2024-08-01         3.347
2024-09-01         3.742
2024-10-01         3.718
2024-11-01         4.420
2024-12-01         4.857

BACKTEST COMPARISON: PREDICTED vs ACTUAL 2024
Month | Predicted | Actual | Residual | Abs Error
-------------------------------------------------------
2024-01-01 |     4.732 |  5.055 |    0.323
2024-02-01 |     4.452 |  4.264 |   -0.188
2024-03-01 |     4.027 |  3.742 |   -0.285
2024-04-01 |     4.147 |  3.576 |   -0.571
2024-05-01 |     3.874 |  3.237 |   -0.637
2024-

Submit "backtest-accuracy.tsv" in Gradescope.

## Part 3: Forecast Strawberry Prices

Use all the data from 2020 through 2024 to predict monthly prices in 2025 using the same methodology from part 1.
Make a monthly forecast for each month of 2025 and save it as "strawberry-forecast.tsv" with columns for month and price.

In [159]:
# YOUR CHANGES HERE

df = pd.read_csv('strawberry-prices.tsv', sep='\t')
print("Complete data loaded:")
print(f"Shape: {df.shape}")

df['date'] = pd.to_datetime(df['month'])
df['year'] = df['date'].dt.year
df['month_num'] = df['date'].dt.month

print(f"Data range: {df['year'].min()} to {df['year'].max()}")
print(f"Total records: {len(df)}")

train_data = df.copy()
print(f"Training data: {len(train_data)} records ({train_data['year'].min()}-{train_data['year'].max()})")

def advanced_seasonal_forecast_2025(train_data):
    """
    Same methodology from Part 1, but forecasting 2025 using full dataset
    """
    yearly_avg = train_data.groupby('year')['price'].mean()
    print(f"\nYearly averages (last 5 years):")
    for year in sorted(yearly_avg.index)[-5:]:
        print(f"  {year}: {yearly_avg[year]:.3f}")
    
    years = yearly_avg.index.values
    prices = yearly_avg.values
    trend_coef = np.polyfit(years, prices, 1)
    print(f"\nTrend coefficient: {trend_coef[0]:.4f} per year")
    
    trend_2025 = np.polyval(trend_coef, 2025)
    base_trend = np.polyval(trend_coef, train_data['year'].mean())
    trend_adjustment = trend_2025 / base_trend
    
    print(f"Projected 2025 trend level: {trend_2025:.3f}")
    print(f"Trend adjustment factor: {trend_adjustment:.3f}")
    
    train_data = train_data.copy()
    train_data['yearly_avg'] = train_data['year'].map(yearly_avg)
    train_data['detrended_price'] = train_data['price'] / train_data['yearly_avg']
    seasonal_factors = train_data.groupby('month_num')['detrended_price'].mean()
    
    print(f"\nSeasonal factors:")
    for month, factor in seasonal_factors.items():
        print(f"  Month {month}: {factor:.3f}")
    
    forecast_2025 = seasonal_factors * trend_2025
    
    return forecast_2025, trend_coef

print("\n" + "="*50)
print("GENERATING 2025 STRAWBERRY PRICE FORECASTS")
print("="*50)

forecast_2025, trend_coef = advanced_seasonal_forecast_2025(train_data)

months_2025 = range(1, 13)
predictions_2025 = []

for month in months_2025:
    predicted_price = forecast_2025[month]
    predictions_2025.append({
        'month': month,
        'price': round(predicted_price, 3)
    })

forecast_df = pd.DataFrame(predictions_2025)

print(f"\n2025 Price Forecasts:")
print(forecast_df.to_string(index=False))

Complete data loaded:
Shape: (60, 2)
Data range: 2020 to 2024
Total records: 60
Training data: 60 records (2020-2024)

GENERATING 2025 STRAWBERRY PRICE FORECASTS

Yearly averages (last 5 years):
  2020: 3.517
  2021: 3.811
  2022: 4.041
  2023: 4.001
  2024: 3.838

Trend coefficient: 0.0831 per year
Projected 2025 trend level: 4.091
Trend adjustment factor: 1.065

Seasonal factors:
  Month 1: 1.172
  Month 2: 1.073
  Month 3: 0.963
  Month 4: 0.974
  Month 5: 0.903
  Month 6: 0.837
  Month 7: 0.828
  Month 8: 0.901
  Month 9: 0.940
  Month 10: 1.013
  Month 11: 1.146
  Month 12: 1.250

2025 Price Forecasts:
 month  price
     1  4.795
     2  4.390
     3  3.941
     4  3.984
     5  3.694
     6  3.423
     7  3.386
     8  3.686
     9  3.845
    10  4.143
    11  4.689
    12  5.114


Submit "strawberry-forecast.tsv" in Gradescope.

## Part 4: Buy Low, Freeze and Sell High

Using your 2025 forecast, analyze the profit picking different pairs of months to buy and sell strawberries.
Maximize your profit assuming that it costs &dollar;0.20 per pint to freeze the strawberries, &dollar;0.10 per pint per month to store the frozen strawberries and there is a 10% price discount from selling previously frozen strawberries.
So, if you buy a pint of strawberies for &dollar;1, freeze them, and sell them for &dollar;2 three months after buying them, then the profit is &dollar;2 * 0.9 - &dollar;1 - &dollar;0.20 - &dollar;0.10 * 3 = &dollar;0.30 per pint.
To evaluate a given pair of months, assume that you can invest &dollar;1,000,000 to cover all costs, and that you buy as many pints of strawberries as possible.

Write the results of your analysis to a file "timings.tsv" with columns for the buy_month, sell_month, pints_purchased, and expected_profit.

In [160]:
# YOUR CHANGES HERE

forecast_2025 = pd.read_csv('strawberry-forecast.tsv', sep='\t')
print("2025 Forecast loaded:")
print(forecast_2025.to_string(index=False))

FREEZE_COST = 0.20
STORAGE_COST_PER_MONTH = 0.10
DISCOUNT_RATE = 0.10
INVESTMENT = 1000000

def calculate_profit_per_pint(buy_month, sell_month, forecast_df):
    """Calculate profit per pint for a given buy/sell strategy"""
    
    buy_month_date = f'2025-{buy_month:02d}-01'
    sell_month_date = f'2025-{sell_month:02d}-01'

    buy_price = forecast_df[forecast_df['month'] == buy_month_date]['price'].iloc[0]
    sell_price_base = forecast_df[forecast_df['month'] == sell_month_date]['price'].iloc[0]
    
    sell_price = sell_price_base * (1 - DISCOUNT_RATE)
    
    if sell_month > buy_month:
        storage_months = sell_month - buy_month
    else:
        storage_months = (12 - buy_month) + sell_month
    
    freeze_cost = FREEZE_COST
    storage_cost = STORAGE_COST_PER_MONTH * storage_months
    total_cost_per_pint = buy_price + freeze_cost + storage_cost
    
    profit_per_pint = sell_price - total_cost_per_pint
    
    return profit_per_pint, buy_price, sell_price, storage_months, total_cost_per_pint

print(f"\n" + "="*80)
print("ANALYZING ALL BUY/SELL COMBINATIONS")
print("="*80)

results = []

for buy_month in range(1, 13):
    for sell_month in range(1, 13):
        if buy_month < sell_month:  
            profit_per_pint, buy_price, sell_price, storage_months, total_cost = calculate_profit_per_pint(
                buy_month, sell_month, forecast_2025
            )
            
            cost_per_pint = total_cost
            pints_possible = int(INVESTMENT / cost_per_pint)
            
            total_profit = profit_per_pint * pints_possible
            
            results.append({
                'buy_month': buy_month,
                'sell_month': sell_month,
                'buy_price': buy_price,
                'sell_price': sell_price,
                'storage_months': storage_months,
                'profit_per_pint': profit_per_pint,
                'pints_purchased': pints_possible,
                'expected_profit': total_profit,
                'total_cost_per_pint': total_cost
            })

results_df = pd.DataFrame(results)
results_df = results_df.sort_values('expected_profit', ascending=False)

print("TOP 10 MOST PROFITABLE STRATEGIES:")
print("-" * 120)
print("Buy | Sell | Buy Price | Sell Price | Storage | Profit/Pint | Pints      | Total Profit")
print("-" * 120)

for i, row in results_df.head(10).iterrows():
    print(f"{int(row['buy_month']):3d} | {int(row['sell_month']):4d} | ${row['buy_price']:8.3f} | "
      f"${row['sell_price']:9.3f} | {int(row['storage_months']):7d} | "
      f"${row['profit_per_pint']:10.3f} | {int(row['pints_purchased']):10,d} | "
      f"${row['expected_profit']:11,.0f}")

optimal = results_df.iloc[0]

print(f"\n" + "="*60)
print("OPTIMAL STRATEGY")
print("="*60)
print(f"Buy Month: {optimal['buy_month']} (${optimal['buy_price']:.3f}/pint)")
print(f"Sell Month: {optimal['sell_month']} (${optimal['sell_price']:.3f}/pint after 10% discount)")
print(f"Storage Duration: {optimal['storage_months']} months")
print(f"Profit per Pint: ${optimal['profit_per_pint']:.3f}")
print(f"Pints Purchased: {optimal['pints_purchased']:,}")
print(f"Total Expected Profit: ${optimal['expected_profit']:,.0f}")

print(f"\nCOST BREAKDOWN (per pint):")
print(f"Purchase price: ${optimal['buy_price']:.3f}")
print(f"Freezing cost: ${FREEZE_COST:.3f}")
print(f"Storage cost: ${STORAGE_COST_PER_MONTH * optimal['storage_months']:.3f} ({optimal['storage_months']} months)")
print(f"Total cost: ${optimal['total_cost_per_pint']:.3f}")
print(f"Selling price: ${optimal['sell_price']:.3f}")
print(f"Net profit: ${optimal['profit_per_pint']:.3f}")

print(f"\n" + "="*60)
print("SEASONAL INSIGHTS")
print("="*60)

cheapest_month = forecast_2025.loc[forecast_2025['price'].idxmin()]
expensive_month = forecast_2025.loc[forecast_2025['price'].idxmax()]

print(f"Cheapest month: {cheapest_month['month']} (${cheapest_month['price']:.3f})")
print(f"Most expensive month: {expensive_month['month']} (${expensive_month['price']:.3f})")
print(f"Price range: ${expensive_month['price'] - cheapest_month['price']:.3f}")

if optimal['buy_month'] == cheapest_month['month']:
    print("Optimal strategy buys in cheapest month")
else:
    print(f"Optimal strategy doesn't buy in cheapest month (accounts for storage costs)")

2025 Forecast loaded:
     month    price
2025-01-01 4.729992
2025-02-01 4.342387
2025-03-01 3.899768
2025-04-01 3.954037
2025-05-01 3.667817
2025-06-01 3.405629
2025-07-01 3.371737
2025-08-01 3.677507
2025-09-01 3.845667
2025-10-01 4.140102
2025-11-01 4.699538
2025-12-01 5.134389

ANALYZING ALL BUY/SELL COMBINATIONS
TOP 10 MOST PROFITABLE STRATEGIES:
------------------------------------------------------------------------------------------------------------------------
Buy | Sell | Buy Price | Sell Price | Storage | Profit/Pint | Pints      | Total Profit
------------------------------------------------------------------------------------------------------------------------
  7 |   12 | $   3.372 | $    4.621 |       5 | $     0.549 |    245,595 | $    134,884
  6 |   12 | $   3.406 | $    4.621 |       6 | $     0.415 |    237,776 | $     98,753
  8 |   12 | $   3.678 | $    4.621 |       4 | $     0.343 |    233,781 | $     80,290
  7 |   11 | $   3.372 | $    4.230 |       4 | $   

Submit "timings.tsv" in Gradescope.

## Part 5: Strategy Check

What is the best profit scenario according to your previous timing analysis?
How much does that profit change if the sell price is off by one standard deviation from your backtest analysis?
(Variation in the sell price is more dangerous because you can see the buy price before fully committing.)

Write the results to a file "check.tsv" with columns best_profit and one_std_profit.

In [161]:
# YOUR CHANGES HERE

optimal_strategy = pd.read_csv('timings.tsv', sep='\t')
print("Optimal strategy loaded:")
print(optimal_strategy.to_string(index=False))

backtest_accuracy = pd.read_csv('backtest-accuracy.tsv', sep='\t')
std_deviation = backtest_accuracy['std'].iloc[0]
print(f"\nStandard deviation from backtest: {std_deviation:.4f}")

forecast_2025 = pd.read_csv('strawberry-forecast.tsv', sep='\t')
print("\n2025 forecast loaded")
print("First few rows of forecast:")
print(forecast_2025.head())
print(f"Forecast month column values: {forecast_2025['month'].tolist()}")

FREEZE_COST = 0.20
STORAGE_COST_PER_MONTH = 0.10
DISCOUNT_RATE = 0.10
INVESTMENT = 1000000

buy_month_raw = optimal_strategy['buy_month'].iloc[0]
sell_month_raw = optimal_strategy['sell_month'].iloc[0]
original_profit = optimal_strategy['expected_profit'].iloc[0]

if isinstance(buy_month_raw, str) and '-' in buy_month_raw:
    buy_month = int(buy_month_raw.split('-')[1])
    buy_month_str = buy_month_raw
else:
    buy_month = int(buy_month_raw)
    buy_month_str = f"2025-{buy_month:02d}"

if isinstance(sell_month_raw, str) and '-' in sell_month_raw:
    sell_month = int(sell_month_raw.split('-')[1])
    sell_month_str = sell_month_raw
else:
    sell_month = int(sell_month_raw)
    sell_month_str = f"2025-{sell_month:02d}"

print(f"\nOptimal Strategy Analysis:")
print(f"Buy month: {buy_month}")
print(f"Sell month: {sell_month}")
print(f"Buy month string: '{buy_month_str}'")
print(f"Sell month string: '{sell_month_str}'")
print(f"Original expected profit: ${original_profit:,.0f}")

buy_matches = forecast_2025[forecast_2025['month'] == buy_month_str]
sell_matches = forecast_2025[forecast_2025['month'] == sell_month_str]
print(f"\nDebugging:")
print(f"Buy month matches found: {len(buy_matches)}")
print(f"Sell month matches found: {len(sell_matches)}")

if len(buy_matches) == 0:
    print(f"No match found for buy month '{buy_month_str}' in forecast")
if len(sell_matches) == 0:
    print(f"No match found for sell month '{sell_month_str}' in forecast")

if len(buy_matches) == 0:
    buy_matches = forecast_2025[forecast_2025['month'] == buy_month]
    if len(buy_matches) > 0:
        print(f"Found buy month using integer format: {buy_month}")
        buy_month_str = buy_month

if len(sell_matches) == 0:
    sell_matches = forecast_2025[forecast_2025['month'] == sell_month]
    if len(sell_matches) > 0:
        print(f"Found sell month using integer format: {sell_month}")
        sell_month_str = sell_month

if len(buy_matches) > 0 and len(sell_matches) > 0:
    buy_price = forecast_2025[forecast_2025['month'] == buy_month_str]['price'].iloc[0]
    sell_price_forecast = forecast_2025[forecast_2025['month'] == sell_month_str]['price'].iloc[0]

    print(f"\nPrice details:")
    print(f"Buy price (Month {buy_month}): ${buy_price:.3f}")
    print(f"Forecasted sell price (Month {sell_month}): ${sell_price_forecast:.3f}")

    if sell_month > buy_month:
        storage_months = sell_month - buy_month
    else:
        storage_months = (12 - buy_month) + sell_month

    print(f"Storage duration: {storage_months} months")

    def calculate_profit_with_sell_price(sell_price_actual):
        """Calculate total profit given an actual sell price"""
        discounted_sell_price = sell_price_actual * (1 - DISCOUNT_RATE)
        total_cost_per_pint = buy_price + FREEZE_COST + (STORAGE_COST_PER_MONTH * storage_months)
        profit_per_pint = discounted_sell_price - total_cost_per_pint
        pints_possible = int(INVESTMENT / total_cost_per_pint)
        total_profit = profit_per_pint * pints_possible
        return total_profit, profit_per_pint, pints_possible

    best_profit, best_profit_per_pint, pints_bought = calculate_profit_with_sell_price(sell_price_forecast)

    sell_price_one_std_lower = sell_price_forecast - std_deviation
    one_std_profit, one_std_profit_per_pint, pints_one_std = calculate_profit_with_sell_price(sell_price_one_std_lower)

    print(f"\n" + "="*60)
    print("RISK ANALYSIS RESULTS")
    print("="*60)

    print(f"Best case scenario (forecasted price):")
    print(f"  Sell price: ${sell_price_forecast:.3f}")
    print(f"  Discounted sell price: ${sell_price_forecast * (1-DISCOUNT_RATE):.3f}")
    print(f"  Profit per pint: ${best_profit_per_pint:.3f}")
    print(f"  Total profit: ${best_profit:,.0f}")

    print(f"\nRisk scenario (sell price -1 std dev):")
    print(f"  Actual sell price: ${sell_price_one_std_lower:.3f}")
    print(f"  Discounted sell price: ${sell_price_one_std_lower * (1-DISCOUNT_RATE):.3f}")
    print(f"  Profit per pint: ${one_std_profit_per_pint:.3f}")
    print(f"  Total profit: ${one_std_profit:,.0f}")

    profit_change = one_std_profit - best_profit
    percent_change = (profit_change / best_profit) * 100 if best_profit != 0 else 0

    print(f"\nProfit impact:")
    print(f"  Profit change: ${profit_change:,.0f}")
    print(f"  Percentage change: {percent_change:.1f}%")

    if profit_change < 0:
        print(f"  Risk: Profit could be ${abs(profit_change):,.0f} lower")
    else:
        print(f"  Unexpected: Profit could be ${profit_change:,.0f} higher")

    results = pd.DataFrame({
        'best_profit': [best_profit],
        'one_std_profit': [one_std_profit]
    })
else:
    print("ERROR: Could not find matching months in forecast data!")

Optimal strategy loaded:
 buy_month  sell_month  pints_purchased  expected_profit
         7          12           245595    134884.071311
         6          12           237776     98753.446830
         8          12           233781     80290.439079
         7          11           251779     64920.485946
         9          12           230114     63346.639829
         6          11           243568     30191.475708
        10          12           220259     17807.536644
         8          11           239377     12465.949829
         5          12           218922     11632.111194
         9          11           235534     -3787.997653
         7          10           258282    -37617.471872
        10          11           225220    -47412.898639
         5          11           223822    -53321.727154
         4          12           201855    -67235.120263
         6          10           249648    -69785.864830
        11          12           200018    -75724.306523
      

Submit "check.tsv" in Gradescope.

## Part 6: Acknowledgments

Make a file "acknowledgments.txt" documenting any outside sources or help on this project.
If you discussed this assignment with anyone, please acknowledge them here.
If you used any libraries not mentioned in this module's content, please list them with a brief explanation what you used them for.
If you used any generative AI tools, please add links to your transcripts below, and any other information that you feel is necessary to comply with the generative AI policy.
If no acknowledgments are appropriate, just write none in the file.


In [162]:
from datetime import date
import os

ack_text = f"""DX704 Week 2 — Acknowledgments
Date: {date.today().isoformat()}

People / Discussions
- None.

External Libraries (beyond standard course stack)
- None. (Used only numpy, pandas, matplotlib.)

Data Sources
- strawberry-prices.tsv from the dx704-project-02 repository (course-provided; transformed from BLS/FRED series APU0000711415).
- USDA ERS article/figure on seasonal fruit pricing (referenced in the assignment prompt).

Example Code & References (allowed)
- https://github.com/bu-cds-omds/dx704-examples
- https://github.com/bu-cds-omds/dx601-examples
- https://github.com/bu-cds-omds/dx602-examples
- https://github.com/bu-cds-omds/dx603-examples

Generative AI Usage
- None.

Other Notes
- Only the files explicitly requested by the assignment were created (strawberry-backtest.tsv, backtest-accuracy.tsv, strawberry-forecast.tsv, timings.tsv, check.tsv, acknowledgments.txt).
"""

with open("acknowledgments.txt", "w", encoding="utf-8") as f:
    f.write(ack_text)

print("Exists?", os.path.exists("acknowledgments.txt"),
      "Size:", os.path.getsize("acknowledgments.txt"), "bytes")

Exists? True Size: 871 bytes


Submit "acknowledgments.txt" in Gradescope.

## Part 7: Code

Please submit a Jupyter notebook that can reproduce all your calculations and recreate the previously submitted files.
You do not need to provide code for data collection if you did that by manually.

Submit "project.ipynb" in Gradescope.