Analysis: 

We analyzed organic versus conventionally raised eggs to see whether recent changes in egg prices affected both environment types equally or differently. First, we loaded the data from a CSV provided by the USDA. The data was then filtered to include only shell eggs with an environment of either Conventional or USDA Organic, along with their average price for each weekly reporting period. The average monthly price per environment was then calculated from that dataset.

Analysis showed that organic eggs maintained a relatively stable price over the course of 2024. The highest price for organic eggs was $5.74 in December 2024, and the lowest was $4.48 in April 2024. This means organic egg prices fluctuated by only $1.26 in 2024.

The highest price for conventional eggs was $5.15 in November 2024, and the lowest was $1.93 in May 2024. This indicates a much larger fluctuation than in organic eggs: conventional egg prices varied by $3.22 in 2024.

The average cost of organic eggs in 2024 was $5.01; the average cost of conventional eggs in 2024 was $2.74.

The month with the greatest difference between organic and conventional eggs was January 2024, with organic eggs being $3.29 higher than conventional. The month with the smallest difference was November 2024, with organic eggs being only $0.37 higher.

The data show that the price of organic eggs fluctuated less than the price of conventional eggs over the course of 2024. The range of conventional egg prices ($3.22) was roughly 2.6 times the range of organic egg prices ($1.26). This suggests that whatever external factor caused the large change in conventional egg prices had much less effect on organic eggs. The month most influenced by that factor appears to be November 2024, when conventional prices peaked and the gap between the two environments was smallest.
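
The range comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the highs and lows reported in this analysis; the variable names are chosen here for illustration and do not appear in the notebook cells.

```python
# Highs and lows as reported in the analysis text (USD).
organic_high, organic_low = 5.74, 4.48
conventional_high, conventional_low = 5.15, 1.93

# Range = highest monthly average minus lowest monthly average.
organic_range = round(organic_high - organic_low, 2)
conventional_range = round(conventional_high - conventional_low, 2)

# How many times larger the conventional range is than the organic range.
ratio = conventional_range / organic_range

print(f"Organic range: ${organic_range:.2f}")
print(f"Conventional range: ${conventional_range:.2f}")
print(f"Conventional range is {ratio:.2f}x the organic range")
```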

In [None]:
#Import Dependencies
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy.stats import linregress

In [None]:
# Read in CSV file and create a DataFrame
raw_egg_df = pd.read_csv("Resources/USDA_Eggs_2024.csv")

# Remove unnecessary columns
raw_egg_df.drop(['office_name', 'office_code', 'office_city', 'office_state', 'published_date', 'commodity', 'slug_id', 'slug_name',
                'report_title', 'final_ind', 'report_date', 'community', 'condition', 'price_unit', 'price_min', 'price_max'], axis=1, inplace=True)
raw_egg_df

In [None]:
# Filter for shell eggs
filtered_df = raw_egg_df.loc[raw_egg_df["section"] == "Shell Egg"]

# Build a store-count-weighted price column; .copy() avoids SettingWithCopyWarning
weighted_df = filtered_df[["report_begin_date", "report_end_date", "region", "section", "type", "quality_grade", "package_size", "environment", "price_avg", "store_count"]].copy()
weighted_df["weighted"] = weighted_df["price_avg"] * weighted_df["store_count"]
# Divide by the total store count so each row holds its share of the overall weighted mean
weighted_df["weighted"] = weighted_df["weighted"] / weighted_df["store_count"].sum()
weighted_df
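
The cell above divides by the store count summed over every row, so the `weighted` column holds each row's contribution to one overall weighted mean, and the cells shown here go on to average `price_avg` directly. If a store-count-weighted average per week and environment is what is wanted, one possible approach is sketched below. The rows in `sample_df` are hypothetical and are not taken from the USDA file; only the column names match the notebook.

```python
import pandas as pd

# Hypothetical rows for illustration only; these are not values from the USDA file.
sample_df = pd.DataFrame({
    "report_begin_date": ["2024-01-05", "2024-01-05", "2024-01-05", "2024-01-05"],
    "environment": ["Conventional", "Conventional", "USDA Organic", "USDA Organic"],
    "price_avg": [2.00, 3.00, 5.00, 6.00],
    "store_count": [300, 100, 50, 150],
})

# Weight each price by the number of stores reporting it.
sample_df["price_x_stores"] = sample_df["price_avg"] * sample_df["store_count"]

# Sum within each week and environment, then divide by that group's own store total.
grouped = sample_df.groupby(["report_begin_date", "environment"], as_index=False)[
    ["price_x_stores", "store_count"]
].sum()
grouped["weighted_avg"] = grouped["price_x_stores"] / grouped["store_count"]

print(grouped[["report_begin_date", "environment", "weighted_avg"]])
```

Dividing by each group's store total, rather than the total across all rows, keeps the result on the same dollar scale as `price_avg`.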

In [None]:
# Select only relevant columns
environment_df = weighted_df[['report_begin_date', 'environment', 'price_avg']]
environment_df

In [None]:
# Filter so only USDA Organic and Conventional rows remain, then rename columns
filtered_env_df = environment_df[environment_df['environment'].isin(['USDA Organic', 'Conventional'])]
pretty_df = filtered_env_df.rename(columns={
    'report_begin_date': 'Date',
    'environment': 'Environment',
    'price_avg': 'Avg. Price'
})
pretty_df

In [None]:
#Finding average price for date and environment that are the same
weekly_df = pretty_df.groupby(['Date', 'Environment'], as_index=False)['Avg. Price'].mean()
weekly_df

In [None]:
#Get average per month instead of by week
weekly_df['Date'] = pd.to_datetime(weekly_df['Date'])
weekly_df['Month'] = weekly_df['Date'].dt.to_period('M')
monthly_df = weekly_df.groupby(['Month', 'Environment'], as_index=False)['Avg. Price'].mean()
monthly_df

In [None]:
organic_df = monthly_df[monthly_df['Environment'] == 'USDA Organic']
conventional_df = monthly_df[monthly_df['Environment'] == 'Conventional']

x_axis = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
tick_locations = [value for value in x_axis]
plt.figure(figsize=(10, 5))
plt.plot(x_axis, organic_df['Avg. Price'], color='green', marker='o', linestyle='-', label='Organic')
plt.plot(x_axis, conventional_df['Avg. Price'], color='blue', marker='o', linestyle='-', label='Conventional')

# Adding titles and labels
plt.title('Average Price of Eggs by Environment in 2024')
plt.xlabel('Month')
plt.ylabel('Average Price ($)')
plt.xticks(rotation=45)  # Rotate month labels for better readability
plt.legend()  # Show legend
plt.grid()
plt.ylim(bottom=0)

# Save the figure before showing it; saving after plt.show() can write a blank image
plt.tight_layout()
plt.savefig("Project 1/Environment.png")

# Show the plot
plt.show()

In [None]:
#Find summary stats
max_organic = round(organic_df['Avg. Price'].max(), 2)
min_organic = round(organic_df['Avg. Price'].min(), 2)
max_conv = round(conventional_df['Avg. Price'].max(), 2)
min_conv = round(conventional_df['Avg. Price'].min(), 2)
mean_organic = round(organic_df['Avg. Price'].mean(), 2)
mean_conv = round(conventional_df['Avg. Price'].mean(), 2)

print(f"The maximum price of organic eggs in 2024 was {max_organic}.")
print(f"The minimum price of organic eggs in 2024 was {min_organic}.")
print(f"The maximum price of conventional eggs in 2024 was {max_conv}.")
print(f"The minimum price of conventional eggs in 2024 was {min_conv}.")
print(f"The mean price of organic eggs in 2024 was {mean_organic}.")
print(f"The mean price of conventional eggs in 2024 was {mean_conv}.")
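
The analysis also reports the months with the largest and smallest gap between organic and conventional prices, which the summary statistics above do not compute. A minimal sketch of one way to find them follows. It assumes a frame shaped like `monthly_df`, with `Month`, `Environment`, and `Avg. Price` columns. The values in `sample_monthly_df` are hypothetical placeholders, not the USDA results, and `wide_df` is a name introduced here.

```python
import pandas as pd

# Hypothetical monthly averages for illustration only; not the USDA results.
sample_monthly_df = pd.DataFrame({
    "Month": pd.PeriodIndex(
        ["2024-01", "2024-01", "2024-02", "2024-02", "2024-03", "2024-03"], freq="M"
    ),
    "Environment": ["Conventional", "USDA Organic"] * 3,
    "Avg. Price": [2.00, 5.00, 3.00, 5.20, 4.50, 5.10],
})

# Reshape to one row per month and one column per environment.
wide_df = sample_monthly_df.pivot(index="Month", columns="Environment", values="Avg. Price")

# Organic premium over conventional for each month.
wide_df["Difference"] = wide_df["USDA Organic"] - wide_df["Conventional"]

# Months where the premium is largest and smallest.
largest_gap_month = wide_df["Difference"].idxmax()
smallest_gap_month = wide_df["Difference"].idxmin()

print(f"Largest gap: {largest_gap_month} ({wide_df['Difference'].max():.2f})")
print(f"Smallest gap: {smallest_gap_month} ({wide_df['Difference'].min():.2f})")
```

Passing `monthly_df` in place of `sample_monthly_df` would apply the same steps to the real data, provided every month has exactly one row per environment.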