# Strawberry Price Prediction - Exploratory Data Analysis

This notebook explores the strawberry price prediction dataset, focusing on:
1. Data overview and missing values
2. Price distribution and trends
3. Weather features analysis
4. Seasonal patterns

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
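# Project-local helpers; explicit names make it clear where each function comes from
from utils import (
    load_data,
    plot_missing_values,
    split_train_test,
    plot_price_distribution,
    plot_weather_correlations,
    plot_seasonal_patterns,
)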

%matplotlib inline
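# The bare 'seaborn' style was deprecated in Matplotlib 3.6 and removed in 3.8;
# 'seaborn-v0_8' is the renamed equivalent
plt.style.use('seaborn-v0_8')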

## 1. Load and Examine Data

In [None]:
# Load data
df = load_data('../data/raw/senior_ds_test.csv')

# Display basic information
print("Dataset Shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nData Types:\n", df.dtypes)
print("\nSample Data:\n", df.head())
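The notebook does not show what `utils.load_data` does. The sketch below is a guess at a reasonable loader. It reads the CSV, parses `start_date` as a datetime (the column is used for date ranges later on), and sorts rows chronologically. The function name `load_data_sketch` is hypothetical, and the toy CSV values are invented for illustration.

```python
import io

import pandas as pd


def load_data_sketch(source) -> pd.DataFrame:
    """Hypothetical stand-in for utils.load_data: read CSV, parse dates, sort by time."""
    # Assumes a 'start_date' column exists, as used elsewhere in this notebook
    df = pd.read_csv(source, parse_dates=["start_date"])
    # Chronological order simplifies date-range checks and time-based splits
    return df.sort_values("start_date").reset_index(drop=True)


# Tiny in-memory CSV with toy values (not the real dataset)
sample_csv = io.StringIO(
    "start_date,price,temp\n"
    "2021-03-08,2.5,14.1\n"
    "2021-03-01,2.9,12.3\n"
)
sample = load_data_sketch(sample_csv)
print(sample.dtypes)
```

Parsing dates at load time means `.min()` and `.max()` on `start_date` compare real timestamps rather than strings.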

## 2. Missing Values Analysis

In [None]:
# Check missing values
print("Missing Values Count:\n")
print(df.isnull().sum())

# Plot missing values heatmap
plot_missing_values(df)
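Raw counts are harder to compare across columns than percentages. The following sketch (the helper name `missing_summary` is hypothetical) reports both, keeps only columns with gaps, and sorts them from worst to best. It runs on a toy frame.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": 100 * counts / len(df),
    })
    # Keep only columns that actually have gaps, sorted by severity
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)


toy = pd.DataFrame({
    "price": [2.5, np.nan, 3.1, 2.8],
    "temp": [12.0, 13.5, np.nan, np.nan],
    "precip": [0.0, 1.2, 0.4, 0.0],
})
result = missing_summary(toy)
print(result)
```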

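## 3. Split Train/Test Before Analysis

Splitting first keeps the test period out of every exploratory decision that follows, so statistics from it cannot leak into preprocessing or feature choices.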

In [None]:
# Split data
train_df, test_df = split_train_test(df)

print("Training set shape:", train_df.shape)
print("Testing set shape:", test_df.shape)

# Display date ranges
print("\nTraining data date range:")
print(f"Start: {train_df['start_date'].min()}, End: {train_df['start_date'].max()}")
print("\nTesting data date range:")
print(f"Start: {test_df['start_date'].min()}, End: {test_df['start_date'].max()}")
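The date-range printout suggests that `split_train_test` splits by time, but its implementation is not shown. A minimal time-ordered split might look like the sketch below, which assumes a `start_date` column and uses a hypothetical `test_frac` parameter. The most recent rows become the test set, which mirrors forecasting future prices from past data.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, date_col: str = "start_date", test_frac: float = 0.2):
    """Hypothetical time-ordered split: the most recent rows form the test set."""
    ordered = df.sort_values(date_col).reset_index(drop=True)
    n_test = int(round(len(ordered) * test_frac))
    cutoff = len(ordered) - n_test
    return ordered.iloc[:cutoff].copy(), ordered.iloc[cutoff:].copy()


toy = pd.DataFrame({
    "start_date": pd.date_range("2021-01-04", periods=10, freq="W-MON"),
    "price": range(10),
})
train, test = chronological_split(toy, test_frac=0.3)
print(len(train), len(test))
# No overlap in time: every training date precedes every test date
print(train["start_date"].max() < test["start_date"].min())
```

If several rows can share a `start_date`, a cutoff chosen by date rather than by row count would avoid splitting one date across both sets.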

## 4. Price Analysis

In [None]:
# Plot price distribution and time series
plot_price_distribution(train_df)

# Basic statistics
print("\nPrice Statistics (Training Set):")
print(train_df['price'].describe())
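`describe()` shows the spread but does not flag individual extreme prices. One common check is Tukey's fences, which flag values more than `k` interquartile ranges outside the quartiles. The sketch below (the helper name `iqr_outliers` is hypothetical) uses made-up prices. The `k=1.5` default is the conventional choice, not a value taken from this project.

```python
import pandas as pd


def iqr_outliers(prices: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = prices.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (prices < q1 - k * iqr) | (prices > q3 + k * iqr)


toy_prices = pd.Series([2.1, 2.3, 2.2, 2.4, 2.5, 2.3, 9.0])
mask = iqr_outliers(toy_prices)
print(toy_prices[mask])
print("skewness:", round(toy_prices.skew(), 2))
```

A strongly positive skew together with a few flagged highs would point toward modelling a log-transformed price.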

## 5. Weather Features Analysis

In [None]:
# Plot correlations
plot_weather_correlations(train_df)

# Weather features statistics
weather_cols = ['windspeed', 'temp', 'cloudcover', 'precip', 'solarradiation']
print("\nWeather Features Statistics (Training Set):")
print(train_df[weather_cols].describe())
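Pearson correlation only captures linear relationships. Weather effects on price may be monotonic but curved, so it can help to compare Pearson with Spearman rank correlation. The sketch below computes Spearman as the Pearson correlation of ranks, which avoids a SciPy dependency. The helper name and the synthetic data, where price decays exponentially with temperature, are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def weather_price_correlations(df, weather_cols, target="price"):
    """Pearson (linear) and Spearman (rank/monotonic) correlation with the target."""
    pearson = df[weather_cols].corrwith(df[target])
    # Spearman = Pearson correlation computed on ranks (average ranks for ties)
    spearman = df[weather_cols].rank().corrwith(df[target].rank())
    return pd.DataFrame({"pearson": pearson, "spearman": spearman})


rng = np.random.default_rng(0)
temp = rng.uniform(5, 30, size=200)
toy = pd.DataFrame({
    "temp": temp,
    "precip": rng.uniform(0, 10, size=200),
    # Synthetic price that falls monotonically (but nonlinearly) with temperature
    "price": np.exp(-temp / 10),
})
corr = weather_price_correlations(toy, ["temp", "precip"])
print(corr.round(2))
```

A Spearman value noticeably larger in magnitude than the Pearson value suggests a nonlinear but monotonic relationship worth transforming or modelling nonlinearly.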

## 6. Seasonal Patterns

In [None]:
# Plot seasonal patterns
plot_seasonal_patterns(train_df)

# Additional seasonal analysis
print("\nAverage Price by Year:")
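# Derive the year from start_date so this does not depend on a precomputed 'year' column
print(train_df.groupby(pd.to_datetime(train_df['start_date']).dt.year)['price'].mean())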
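Yearly averages mix trend with season. Averaging by calendar month across years isolates the within-year cycle instead. The sketch below derives the month from `start_date`. The helper name `seasonal_profile` and the synthetic cosine-shaped price are illustrative only.

```python
import numpy as np
import pandas as pd


def seasonal_profile(df, date_col="start_date", target="price"):
    """Mean, median and count of the target per calendar month, pooled across years."""
    months = pd.to_datetime(df[date_col]).dt.month.rename("month")
    return df.groupby(months)[target].agg(["mean", "median", "count"])


toy = pd.DataFrame({
    "start_date": pd.date_range("2020-01-01", "2021-12-31", freq="D"),
})
# Synthetic price with a yearly cycle (cheapest mid-year), for illustration only
day_of_year = toy["start_date"].dt.dayofyear
toy["price"] = 3 + np.cos(2 * np.pi * day_of_year / 365.25)
profile = seasonal_profile(toy)
print(profile.round(2))
```

The `count` column shows how many observations back each monthly mean, which matters if some months are sparsely covered in the real data.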

## 7. Key Findings and Next Steps

1. Missing Values Pattern:
   - [To be filled after analysis]

2. Price Trends:
   - [To be filled after analysis]

3. Weather Correlations:
   - [To be filled after analysis]

4. Seasonal Effects:
   - [To be filled after analysis]

Next Steps:
1. Handle missing values without leakage: fit any imputation statistics on the training set only, then apply them unchanged to the test set
2. Feature engineering based on observed patterns
3. Model selection considering the seasonal nature of the data
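For the first next step, a minimal sketch of leakage-free imputation looks like this. Per-column medians are learned from the training frame and then reused for the test frame. Median imputation and the helper names are assumptions for illustration. For weather series, time-based interpolation within each split may be a better fit.

```python
import numpy as np
import pandas as pd


def fit_median_imputer(train: pd.DataFrame, cols):
    """Learn per-column medians from the training set only."""
    return train[cols].median()


def apply_imputer(df: pd.DataFrame, medians: pd.Series) -> pd.DataFrame:
    """Fill gaps with the training medians; the input frame is left unchanged."""
    out = df.copy()
    out[medians.index] = out[medians.index].fillna(medians)
    return out


train = pd.DataFrame({"temp": [10.0, 12.0, np.nan, 14.0], "precip": [0.0, np.nan, 2.0, 4.0]})
test = pd.DataFrame({"temp": [np.nan, 20.0], "precip": [np.nan, 1.0]})

medians = fit_median_imputer(train, ["temp", "precip"])
train_filled = apply_imputer(train, medians)
test_filled = apply_imputer(test, medians)
print(test_filled)
```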