<a href="https://www.kaggle.com/code/adel1235/avocado-analysis-with-shiny-web-application?scriptVersionId=180150905" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

---
title: "Avocado"
author: "Adel"
date: "2024-05-13"
output:
  html_document: default
  pdf_document: default
---
<style>
    body {
        margin: 20px;
        padding: 20px;

    }
</style>

<h1>Big Picture: Forecasting Avocado Average Price with Shiny Web Application</h1>

<ul>
  <li><a href="#executive-summary">Executive Summary</a></li>
  <li><a href="#introduction">Introduction</a></li>
  <li><a href="#methodology">Methodology</a>
    <ul>
      <li><a href="#perform-data-collection">Perform Data Collection</a></li>
      <li><a href="#perform-data-wrangling">Perform Data Wrangling</a></li>
      <li><a href="#perform-exploratory-data-analysis">Perform Exploratory Data Analysis</a></li>
      <li><a href="#perform-predictive-analysis-using-regression-models">Perform Predictive Analysis Using Regression Models</a></li>
      <li><a href="#build-an-r-shiny-dashboard-app">Build an R Shiny Dashboard App</a></li>
    </ul>
  </li>
  <li><a href="#results">Results</a></li>
  <li><a href="#ConClusion">Conclusion</a></li>
</ul>

&nbsp;

<h2 id="executive-summary">Executive Summary</h2>
<ul>
  <li><strong>Objective:</strong> The focus of this analysis is to predict the average selling price of avocados across various regions in the United States.</li>
  <li><strong>Goal:</strong> Our aim is to give both sellers and consumers clear, data-driven insight into average avocado selling prices across US regions.</li>
  <li><strong>Approach:</strong> We have applied advanced forecasting techniques based on historical data to predict future avocado prices.</li>
  <li><strong>Influencing Factors:</strong> The analysis takes into account influencing factors such as the total volume sold, the type of avocado, and the kind of bags sold.</li>
</ul>

<h2 id="introduction">Introduction</h2>

<h3>Context:</h3>
<p>Our analysis revolves around the average selling price of avocados in various regions. We aim to delve into the factors influencing these prices and provide insights that can aid stakeholders in the avocado market.</p>

<h3>How?</h3>
<p>We adopt a multi-faceted approach to analyze the average selling price of avocados, employing various types of analysis:</p>
<ol>
  <li><strong>Descriptive Analysis:</strong> This type of analysis helps us understand the basic characteristics of the data, such as mean, median, and distribution of avocado prices.</li>
  <li><strong>Time Series Analysis:</strong> By examining historical avocado price data over time, we aim to identify trends, seasonality, and patterns that can guide future price predictions.</li>
  <li><strong>Correlation Analysis:</strong> We explore the relationships between avocado prices and other relevant variables, such as volume sold, avocado type, and packaging type, to uncover potential factors influencing price fluctuations.</li>
  <li><strong>Predictive Analysis:</strong> Leveraging advanced forecasting techniques, we endeavor to predict future avocado prices based on historical trends and patterns.</li>
</ol>
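<p>As a rough illustration of the descriptive and correlation steps listed above, the following sketch runs them on a small toy data frame. The data frame <code>toy</code> and all of its values are made up for illustration and are not taken from the avocado data set; only base R functions are used.</p>

<details>
  <summary>Show Code</summary>

```r
# Toy data standing in for the avocado data set (all values are invented)
toy <- data.frame(
  AveragePrice = c(1.10, 1.35, 0.98, 1.60, 1.42, 1.05),
  TotalVolume  = c(52000, 41000, 64000, 30000, 36000, 58000),
  type         = c("conventional", "organic", "conventional",
                   "organic", "organic", "conventional")
)

# Descriptive analysis: central tendency of the price
price_mean   <- mean(toy$AveragePrice)
price_median <- median(toy$AveragePrice)

# Group comparison: mean price per avocado type
price_by_type <- tapply(toy$AveragePrice, toy$type, mean)

# Correlation analysis: Pearson correlation between price and volume
price_volume_cor <- cor(toy$AveragePrice, toy$TotalVolume)

print(price_by_type)
print(price_volume_cor)
```
</details>

<p>In this toy example the correlation is negative (higher prices go with lower volumes), but that is a property of the invented numbers, not a result of the analysis.</p>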
<hr />

<h2 id="methodology">Methodology</h2>
<hr />
<hr />

<h3 id="perform-data-collection">Perform Data Collection</h3>
<p>We gather relevant data on avocado prices, sales volumes, types, and regions from reputable sources to ensure the accuracy and reliability of our analysis.</p>

<details>
  <summary>Show Code</summary>
  
  ```r
  # Install required packages
  install.packages("ggmap")
  install.packages("plotly")
  install.packages("reshape2")
  install.packages("caret")
  install.packages("zoo")

  # Load necessary libraries
  library(zoo)
  library(lubridate)  # provides month(), used in the exploratory analysis
  library(tidymodels)
  library(stringr)
  library(plotly)
  library(httr)
  library(glmnet)
  library(tidyverse)
  library(rvest)
  library(dplyr)
  library(gridExtra)
  library(patchwork)
  library(scales)
  library(reshape2)
  library(broom)
  library(car)
  library(caret)
  library(forecast)
  library(ggthemes)
  library(viridis)

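  # Read the dataset (adjust the path to wherever avocado.csv is stored)
  df2 <- read.csv("C:\\Users\\DLECU\\Downloads\\avocado.csv")
  avocado <- data.frame(df2)
  view(avocado)  # opens the data viewer in an interactive session
  # Drop the first column (the row index carried over from the CSV)
  avocado <- avocado[-1]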
  ```
</details>
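
<p>The absolute Windows path in the loading code only resolves on one machine. The sketch below shows one possible way to load a CSV more defensively and to drop the index column by name instead of by position. The file it reads is a tiny stand-in written to a temporary directory, and the names <code>csv_path</code>, <code>raw</code>, and <code>avocado_demo</code> are hypothetical; it assumes the index column was written without a header, which <code>read.csv</code> then names <code>X</code>.</p>

<details>
  <summary>Show Code</summary>

```r
# Write a tiny stand-in file so the sketch runs anywhere (values are invented).
# write.csv() writes row names by default, which produces an unnamed index column.
csv_path <- file.path(tempdir(), "avocado.csv")
write.csv(
  data.frame(Date = c("2015-12-27", "2015-12-20"),
             AveragePrice = c(1.33, 1.35),
             region = c("Albany", "Albany")),
  csv_path
)

# Fail early with a clear error if the file is missing
stopifnot(file.exists(csv_path))

raw <- read.csv(csv_path, stringsAsFactors = FALSE)

# Drop the index column by name ("X") rather than by position
avocado_demo <- raw[, setdiff(names(raw), "X"), drop = FALSE]
str(avocado_demo)
```
</details>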

<hr />
<hr />

<h3 id="perform-data-wrangling">Perform Data Wrangling</h3>
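<p>We preprocess the collected data by renaming columns, checking the numeric columns for stray non-numeric characters, converting the date and year columns to proper types, and removing duplicate rows. This step ensures that our data is clean, organized, and ready for analysis.</p>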
<details>
  <summary>Show Code</summary>
  
  ```r
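# Set clearer names for the three PLU volume columns
# (note: in the Hass Avocado Board data, PLU 4046, 4225, and 4770 are size grades of Hass avocados;
# the variety labels below are kept because the rest of the code relies on them)
new_names <- c("HASS.4046", "FUERTE.4225", "BACON.4770")
colnames(avocado)[4:6] <- new_names[1:3]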
# Replace any white space separators by underscores, using the str_replace_all function
colnames(avocado) <- str_replace_all(colnames(avocado), " ", "_")
# Convert all column names to uppercase
colnames(avocado) <- toupper(colnames(avocado))
glimpse(avocado)
# Helper: TRUE for values containing anything other than digits or a decimal point
find_character <- function(strings) {
  grepl("[^0-9.]", strings)
}
# Look for non-numeric characters in each numeric column,
# printing up to 10 offending rows per column
numeric_cols <- c("AVERAGEPRICE", "TOTAL.VOLUME", "HASS.4046", "FUERTE.4225",
                  "BACON.4770", "TOTAL.BAGS", "SMALL.BAGS", "LARGE.BAGS", "XLARGE.BAGS")
for (col in numeric_cols) {
  avocado %>%
    select(all_of(col)) %>%
    filter(find_character(.data[[col]])) %>%
    slice_head(n = 10) %>%
    print()
}
#Check data type
sapply(avocado, typeof)
# Convert the DATE column to Date format and store YEAR as an integer
avocado <- avocado %>%
  mutate(DATE = as.Date(DATE),
         YEAR = as.integer(YEAR))
# Check column classes (typeof() would report a Date column as "double")
sapply(avocado, class)
# Remove duplicate rows
avocado <- avocado %>%
  distinct()
# Check the structure of the data frame to confirm the change
str(avocado)
# Get distinct city names from the 'REGION' column
distinct_cities <- unique(avocado$REGION)
distinct_cities
# Save the dataset 
write.csv(avocado, "E:/R-Project/Avocado/avocado.csv", row.names=FALSE)
view(avocado)
  ```
</details>
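
<p>The wrangling code checks types and duplicates; a quick check for missing values and outliers could look like the following sketch. It runs on a made-up data frame (<code>toy</code>), and the helper <code>flag_outliers</code> is a hypothetical name, not part of the original analysis. The 1.5 × IQR rule used here is one common convention, not necessarily the criterion the analysis relied on.</p>

<details>
  <summary>Show Code</summary>

```r
# Toy data with one missing value per column and one extreme price (values are invented)
toy <- data.frame(
  AVERAGEPRICE = c(1.10, 1.25, NA, 1.30, 1.18, 3.90),
  TOTAL.VOLUME = c(52000, 48000, 51000, NA, 50000, 49500)
)

# Count missing values in each column
na_counts <- colSums(is.na(toy))

# Flag values outside the 1.5 * IQR fences; missing values are never flagged
flag_outliers <- function(x) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}
price_outliers <- flag_outliers(toy$AVERAGEPRICE)

print(na_counts)
print(toy[price_outliers, ])
```
</details>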

<hr />
<hr />

<h3 id="perform-exploratory-data-analysis">Perform Exploratory Data Analysis</h3>
<p>We explore the data to understand its underlying patterns, trends, and relationships. This step includes visualizations and statistical summaries to gain insights.</p>
<details>
  <summary>Show Code</summary>
  
  ```r
# Create a new column for MONTH (month() comes from lubridate)
avocado <- avocado %>%
  mutate(MONTH = lubridate::month(DATE))

# Summarize data by month, year, and type
monthly_summary <- avocado %>%
  group_by(YEAR, MONTH, TYPE) %>%
  summarise(AVG_AVERAGEPRICE = mean(AVERAGEPRICE),
            TOTAL_VOLUME = sum(TOTAL.VOLUME),
            TOTAL_HASS_4046 = sum(HASS.4046),
            TOTAL_FUERTE_4225 = sum(FUERTE.4225),
            TOTAL_BACON_4770 = sum(BACON.4770),
            TOTAL_TOTAL_BAGS = sum(TOTAL.BAGS),
            TOTAL_SMALL_BAGS = sum(SMALL.BAGS),
            TOTAL_LARGE_BAGS = sum(LARGE.BAGS),
            TOTAL_XLARGE_BAGS = sum(XLARGE.BAGS),
            .groups = "drop")  # return an ungrouped result so MONTH can be modified later

# Create a pseudo-date variable combining YEAR and MONTH
monthly_summary$Date <- as.Date(paste(monthly_summary$YEAR, monthly_summary$MONTH, "01", sep = "-"))

# Plot using ggplot for AVG_AVERAGEPRICE
ggplot(monthly_summary, aes(x = Date, y = AVG_AVERAGEPRICE, color = TYPE, fill = TYPE)) + 
  geom_area(alpha = 0.3, position = position_dodge(0.8)) + 
  theme_minimal() +  
  scale_color_manual(values = c("conventional" = "#ED7921", "organic" = "#62BE51")) + 
  scale_fill_manual(values = c("conventional" = "#FD833E", "organic" = "#B8FC5F")) +
  labs(title = "Average Avocado Price by Date", x = "Date", y = "Average Price") +  
  theme(axis.title.x = element_text(face = "bold", size = 12),  
        axis.title.y = element_text(face = "bold", size = 12),  
        axis.text.x = element_text(size = 10),  
        axis.text.y = element_text(size = 10))
        
        
        
# Plot total volume over time by type
ggplot(monthly_summary, aes(x = Date, y = TOTAL_VOLUME, color = TYPE, fill = TYPE)) + 
  geom_area(alpha = 0.3, position = position_dodge(0.8)) + 
  theme_minimal() +  
  scale_color_manual(values = c("conventional" = "#ED7921", "organic" = "#62BE51")) + 
  scale_fill_manual(values = c("conventional" = "#FD833E", "organic" = "#B8FC5F")) +
  labs(title = "Total Avocado Volume by Date", x = "Date", y = "Total Volume") +  # Adding titles to the plot
  theme(axis.title.x = element_text(face = "bold", size = 12),  # Styling x-axis title
        axis.title.y = element_text(face = "bold", size = 12),  # Styling y-axis title
        axis.text.x = element_text(size = 10),  # Styling x-axis labels
        axis.text.y = element_text(size = 10)) +  # Styling y-axis labels
  scale_y_continuous(labels = scales::comma)  # Formatting y-axis labels as comma-separated values

# Plot total bags over time by type
ggplot(monthly_summary, aes(x = Date, y = TOTAL_TOTAL_BAGS, color = TYPE, fill = TYPE)) + 
  geom_area(alpha = 0.3, position = position_dodge(0.8)) + 
  theme_minimal() +  
  scale_color_manual(values = c("conventional" = "#ED7921", "organic" = "#62BE51")) + 
  scale_fill_manual(values = c("conventional" = "#FD833E", "organic" = "#B8FC5F")) +
  scale_y_continuous(labels = scales::comma) +  # Adjusting y-axis scale to display in comma-separated format
  labs(title = "Total Bags by Date", x = "Date", y = "Total Bags") +  # Adding titles to the plot
  theme(axis.title.x = element_text(face = "bold", size = 12),  # Styling x-axis title
        axis.title.y = element_text(face = "bold", size = 12),  # Styling y-axis title
        axis.text.x = element_text(size = 10),  # Styling x-axis labels
        axis.text.y = element_text(size = 10))  # Styling y-axis labels


# Define the order of months
month_order <- c("January", "February", "March", "April", "May", "June",
                 "July", "August", "September", "October", "November", "December")

# Convert numeric MONTH (1-12) to a factor with month-name labels in calendar order
monthly_summary$MONTH <- factor(monthly_summary$MONTH, levels = 1:12, labels = month_order)

# Plotting for conventional avocados
conv_pat_yearly <- monthly_summary %>%
  filter(TYPE == "conventional", YEAR %in% c(2015, 2016, 2017)) %>%
  ggplot(aes(x = MONTH, y = AVG_AVERAGEPRICE, group = YEAR)) +
  geom_point(color = "#5D6D7E") +
  geom_line(group = 1, color = "blue") +
  facet_wrap(~ as.factor(YEAR)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5,face = "bold"),
        plot.background = element_rect(fill = "#DCFCE6"),
        axis.text.x = element_text(angle = 90)) +
  labs(title = "Seasonal Fluctuations \n Conventional Avocados",
       x = "Month",
       y = "Average Price",
       caption = "Data Source: Hass Avocado Board")
conv_pat_yearly

# Plotting for organic avocados
org_pat_yearly <- monthly_summary %>%
  filter(TYPE == "organic", YEAR %in% c(2015, 2016, 2017)) %>%
  ggplot(aes(x = MONTH, y = AVG_AVERAGEPRICE, group = YEAR)) +
  geom_point(color = "#5D6D7E") +
  geom_line(group = 1, color = "#E74C3C") +
  facet_wrap(~ as.factor(YEAR)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5,face = "bold"),
        plot.background = element_rect(fill = "#dddac3"),
        axis.text.x = element_text(angle = 90)) +
  labs(title = "Seasonal Fluctuations \n Organic Avocados",
       x = "Month",
       y = "Average Price",
       caption = "Data Source: Hass Avocado Board")
org_pat_yearly
# Stack the two plots vertically
grid.arrange(conv_pat_yearly, org_pat_yearly, nrow = 2)

# Plotting for conventional avocados
conv_pat_yearly <- monthly_summary %>%
  filter(TYPE == "conventional", YEAR %in% c(2015, 2016, 2017)) %>%
  ggplot(aes(x = MONTH, y = TOTAL_VOLUME, group = YEAR)) +
  geom_point(color = "#5D6D7E") +
  geom_line(group = 1, color = "blue") +
  facet_wrap(~ as.factor(YEAR)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.background = element_rect(fill = "#F4F6F7"),
        axis.text.x = element_text(angle = 90)) +
  labs(title = "Seasonal Fluctuations \n Conventional Avocados",
       x = "Month",
       y = "Total Volume",
       caption = "Data Source: Hass Avocado Board") +
  scale_y_continuous(labels = comma_format())
conv_pat_yearly
# Plotting for organic avocados
org_pat_yearly <- monthly_summary %>%
  filter(TYPE == "organic", YEAR %in% c(2015, 2016, 2017)) %>%
  ggplot(aes(x = MONTH, y = TOTAL_VOLUME, group = YEAR)) +
  geom_point(color = "#5D6D7E") +
  geom_line(group = 1, color = "#E74C3C") +
  facet_wrap(~ as.factor(YEAR)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.background = element_rect(fill = "#F4F6F7"),
        axis.text.x = element_text(angle = 90)) +
  labs(title = "Seasonal Fluctuations \n Organic Avocados",
       x = "Month",
       y = "Total Volume",
       caption = "Data Source: Hass Avocado Board") +
  scale_y_continuous(labels = comma_format())

# Stack the two plots vertically
grid.arrange(conv_pat_yearly, org_pat_yearly, nrow = 2)

# Monthly summary by year and month, with both avocado types combined
# (the same aggregation as above, without the "TYPE" grouping)
monthly_summary_without_type <-avocado %>%
  group_by(YEAR, MONTH) %>%
  summarise(AVG_AVERAGEPRICE = mean(AVERAGEPRICE),
            TOTAL_VOLUME = sum(TOTAL.VOLUME),
            TOTAL_HASS_4046 = sum(HASS.4046),
            TOTAL_FUERTE_4225 = sum(FUERTE.4225),
            TOTAL_BACON_4770 = sum(BACON.4770),
            TOTAL_TOTAL_BAGS = sum(TOTAL.BAGS),
            TOTAL_SMALL_BAGS = sum(SMALL.BAGS),
            TOTAL_LARGE_BAGS = sum(LARGE.BAGS),
            TOTAL_XLARGE_BAGS = sum(XLARGE.BAGS)) 
view(monthly_summary_without_type)

# Define the order of months
month_order1 <- c("January", "February", "March", "April", "May", "June",
                 "July", "August", "September", "October", "November", "December")

# Convert numeric MONTH (1-12) to a factor with month-name labels in calendar order
monthly_summary_without_type$MONTH <- factor(monthly_summary_without_type$MONTH, levels = 1:12, labels = month_order1)

# Monthly total volume with both avocado types combined
# (this summary has no TYPE column, so a single plot covers all avocados)
all_vol_yearly <- monthly_summary_without_type %>%
  filter(YEAR %in% c(2015, 2016, 2017)) %>%
  ggplot(aes(x = MONTH, y = TOTAL_VOLUME, group = YEAR)) +
  geom_point(color = "#5D6D7E") +
  geom_line(group = 1, color = "blue") +
  facet_wrap(~ as.factor(YEAR)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.background = element_rect(fill = "#F4F6F7"),
        axis.text.x = element_text(angle = 90)) +
  labs(title = "Seasonal Fluctuations \n All Avocados",
       x = "Month",
       y = "Total Volume",
       caption = "Data Source: Hass Avocado Board") +
  scale_y_continuous(labels = comma_format())
all_vol_yearly


# Monthly average price with both avocado types combined
# (this summary has no TYPE column, so a single plot covers all avocados)
all_price_yearly <- monthly_summary_without_type %>%
  filter(YEAR %in% c(2015, 2016, 2017)) %>%
  ggplot(aes(x = MONTH, y = AVG_AVERAGEPRICE, group = YEAR)) +
  geom_point(color = "#5D6D7E") +
  geom_line(group = 1, color = "#E74C3C") +
  facet_wrap(~ as.factor(YEAR)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.background = element_rect(fill = "#F4F6F7"),
        axis.text.x = element_text(angle = 90)) +
  labs(title = "Seasonal Fluctuations \n All Avocados",
       x = "Month",
       y = "Average Price",
       caption = "Data Source: Hass Avocado Board") +
  scale_y_continuous(labels = comma_format())
all_price_yearly

# Quarterly summary by year, quarter, and type
# Convert the MONTH factor back to its month number (level codes 1-12);
# ungroup first, since monthly_summary may still be grouped by YEAR and MONTH after summarise()
monthly_summary <- monthly_summary %>%
  ungroup() %>%
  mutate(MONTH = as.numeric(MONTH))
view(monthly_summary)
# Add an identifier for each 3-month period (quarter 1-4)
monthly_summary <- monthly_summary %>%
  mutate(Quarter = ceiling(MONTH / 3))
# Group the data by the quarterly identifier and create a new variable for x-axis labels
quarterly_summary <- monthly_summary %>%
  group_by(YEAR, Quarter, TYPE) %>%
  summarise(AVG_AVERAGEPRICE = mean(AVG_AVERAGEPRICE),
            TOTAL_VOLUME = sum(TOTAL_VOLUME),
            TOTAL_HASS_4046 = sum(TOTAL_HASS_4046),
            TOTAL_FUERTE_4225 = sum(TOTAL_FUERTE_4225),
            TOTAL_BACON_4770 = sum(TOTAL_BACON_4770),
            TOTAL_TOTAL_BAGS = sum(TOTAL_TOTAL_BAGS),
            TOTAL_SMALL_BAGS = sum(TOTAL_SMALL_BAGS),
            TOTAL_LARGE_BAGS = sum(TOTAL_LARGE_BAGS),
            TOTAL_XLARGE_BAGS = sum(TOTAL_XLARGE_BAGS)) %>%
  mutate(Quarter_Year = paste(Quarter, "-", YEAR))

# Convert Quarter_Year to a factor with levels sorted as desired
quarterly_summary$Quarter_Year <- factor(quarterly_summary$Quarter_Year, 
                                         levels = unique(quarterly_summary$Quarter_Year))

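# Shared theme and color palette for the plots below, defined before their first use
my_theme <- theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
  )
my_colors <- c("#FF5733", "#3498DB")

# Plot average price per quarter, with one line per avocado type
ggplot(quarterly_summary, aes(x = Quarter_Year, y = AVG_AVERAGEPRICE, color = TYPE, group = TYPE)) +
  geom_point(size = 4) +
  geom_line() +
  scale_color_manual(values = c("#F35D5D", "#7FB3D5")) +
  my_theme +
  labs(
    title = "Average Price by Quarter and Type",
    x = "Quarter - Year",
    y = "Average Price"
  ) +
  theme(plot.background = element_rect(fill = alpha("#F9E79F", 0.3)))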



# Plot the data with Total Volume and a lighter background color
ggplot(quarterly_summary, aes(x = Quarter_Year, y = TOTAL_VOLUME/1e6, color = TYPE, group = TYPE)) +
  geom_point(size = 4) +
  geom_line() +  # one line per type, via the group aesthetic
  scale_color_manual(values = c("#F35D5D", "#7FB3D5")) +
  my_theme +
  labs(
    title = "Total Volume by Quarter and Type",
    x = "Quarter - Year",
    y = "Total Volume (Millions)"
  ) +
  theme(plot.background = element_rect(fill = alpha("#FFA07A", 0.1)))



# Average price by quarter: one line per type and year, labeled with the year
ggplot(quarterly_summary, aes(x = Quarter, y = AVG_AVERAGEPRICE, color = TYPE, linetype = TYPE, label = YEAR)) +
  geom_line(size = 1.5, aes(group = interaction(TYPE, YEAR))) +
  geom_point(size = 3, aes(shape = TYPE)) +
  scale_color_manual(values = my_colors) +
  scale_linetype_manual(values = c("solid", "dashed")) +
  labs(title = "Average Price by Quarter and Type",
       x = "Quarter",
       y = "Average Price",
       color = "Type",
       linetype = "Type") +
  my_theme +
  geom_text(size = 3, position = position_nudge(x = 0.2), show.legend = FALSE)  # Add text annotations for years



# Total volume by quarter: one line per type and year, labeled with the year
ggplot(quarterly_summary, aes(x = Quarter, y = TOTAL_VOLUME, color = TYPE, linetype = TYPE, label = YEAR)) +
  geom_line(size = 1.5, aes(group = interaction(TYPE, YEAR))) +
  geom_point(size = 3, aes(shape = TYPE)) +
  scale_color_manual(values = my_colors) +
  scale_linetype_manual(values = c("solid", "dashed")) +
  labs(title = "Total Volume Sold by Quarter and Type",
       x = "Quarter",
       y = "Total Volume",
       color = "Type",
       linetype = "Type") +
  my_theme +
  geom_text(size = 3, position = position_nudge(x = 0.2), show.legend = FALSE) +  # Add text annotations for years
  scale_y_continuous(labels = scales::comma)  # Format y-axis labels with commas

# Quarterly summary with both avocado types combined
quarterly_summary_all <-monthly_summary %>%
  group_by(YEAR, Quarter) %>%
  summarise(AVG_AVERAGEPRICE = mean(AVG_AVERAGEPRICE),
            TOTAL_VOLUME = sum(TOTAL_VOLUME),
            TOTAL_HASS_4046 = sum(TOTAL_HASS_4046),
            TOTAL_FUERTE_4225 = sum(TOTAL_FUERTE_4225),
            TOTAL_BACON_4770 = sum(TOTAL_BACON_4770),
            TOTAL_TOTAL_BAGS = sum(TOTAL_TOTAL_BAGS),
            TOTAL_SMALL_BAGS = sum(TOTAL_SMALL_BAGS),
            TOTAL_LARGE_BAGS = sum(TOTAL_LARGE_BAGS),
            TOTAL_XLARGE_BAGS = sum(TOTAL_XLARGE_BAGS)) %>%
  mutate(Quarter_Year = paste(Quarter, "-", YEAR))
view(quarterly_summary_all)



# Convert Quarter_Year to a factor with levels sorted as desired
quarterly_summary_all$Quarter_Year <- factor(quarterly_summary_all$Quarter_Year, 
                                            levels = unique(quarterly_summary_all$Quarter_Year))
# Plot the data with the custom x-axis labels and no differentiation between types
ggplot(quarterly_summary_all, aes(x = Quarter_Year, y = AVG_AVERAGEPRICE)) +
  geom_point(size = 4) +
  geom_line(group = 1) +
  my_theme +
  labs(
    title = "Average Price by Quarter and Year",
    x = "Quarter - Year",
    y = "Average Price"
  ) +
  theme(plot.background = element_rect(fill = alpha("#F9E79F", 0.3)))


# Plot the data with the custom x-axis labels and no differentiation between types
ggplot(quarterly_summary_all, aes(x = Quarter_Year, y = TOTAL_VOLUME)) +
  geom_point(size = 4) +
  geom_line(group = 1) +
  my_theme +
  labs(
    title = "Total Volume Sold by Quarter and Year",
    x = "Quarter - Year",
    y = "Total Volume"
  ) +
  theme(plot.background = element_rect(fill = alpha("#FFA07A", 0.3)))+
  scale_y_continuous(labels = scales::comma)
# Create a season column from the numeric MONTH
avocado <- avocado %>%
  mutate(season = case_when(
    MONTH %in% 3:5         ~ "Spring",
    MONTH %in% 6:8         ~ "Summer",
    MONTH %in% 9:11        ~ "Fall",
    MONTH %in% c(12, 1, 2) ~ "Winter",
    TRUE                   ~ "Unknown"
  ))
# Group by season, year, and type, excluding the year 2018
seasonal_summary <- avocado %>%
  filter(YEAR != 2018) %>%
  group_by(season, YEAR, TYPE) %>%
  summarise(AVG_AVERAGEPRICE = mean(AVERAGEPRICE),
            TOTAL_VOLUME = sum(TOTAL.VOLUME),
            TOTAL_HASS_4046 = sum(HASS.4046),
            TOTAL_FUERTE_4225 = sum(FUERTE.4225),
            TOTAL_BACON_4770 = sum(BACON.4770),
            TOTAL_TOTAL_BAGS = sum(TOTAL.BAGS),
            TOTAL_SMALL_BAGS = sum(SMALL.BAGS),
            TOTAL_LARGE_BAGS = sum(LARGE.BAGS),
            TOTAL_XLARGE_BAGS = sum(XLARGE.BAGS))

print(seasonal_summary)

# Create a new column containing the combination of season and year
seasonal_summary <- seasonal_summary %>%
  mutate(season_year = paste(season, YEAR, sep = "-"))

# Reorder the levels of the season variable
seasonal_summary$season <- factor(seasonal_summary$season, levels = c("Winter", "Spring", "Summer", "Fall"))

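# Average price by season: one line per type and year
# (with a discrete x-axis, geom_line() needs an explicit group to connect points)
ggplot(seasonal_summary, aes(x = season, y = AVG_AVERAGEPRICE, color = TYPE, linetype = TYPE, label = YEAR)) +
  geom_line(size = 1.5, aes(group = interaction(TYPE, YEAR))) +
  geom_point(size = 3) +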
  scale_color_manual(values = my_colors) +
  scale_linetype_manual(values = c("solid", "dashed")) +
  labs(title = "Average Price by Season and Type",
       x = "Season",
       y = "Average Price",
       color = "Type",
       linetype = "Type") +
  my_theme +
  geom_text(size = 3, position = position_nudge(x = 0.2), show.legend = FALSE)  # Add text annotations for years




#--------------------------------------------------------------
# Modify the theme and color palette
my_theme <- theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8)
  )

# Color palette
my_colors <- c("#FF5733", "#3498DB")  # You can change these colors as desired

# Filter seasonal_summary for organic type only
seasonal_summary_organic <- seasonal_summary %>%
  filter(TYPE == "organic")

# Plot organic volume by season, with one line per year
plot_organic <- ggplot(seasonal_summary_organic, aes(x = season, y = TOTAL_VOLUME, label = YEAR, group = YEAR)) +
  geom_line(size = 1.5, color = my_colors[1]) +
  geom_point(size = 3, color = my_colors[1]) +
  labs(title = "Organic Avocado Volume by Season",
       x = "Season",
       y = "Total Volume",
       label = "Year") +
  my_theme +
  geom_text(size = 3, position = position_nudge(x = 0.2)) +
  scale_y_continuous(labels = scales::comma)
plot_organic




# Filter seasonal_summary for conventional type only
seasonal_summary_conventional <- seasonal_summary %>%
  filter(TYPE == "conventional")
# Plot conventional volume by season, with one line per year
plot_conventional <- ggplot(seasonal_summary_conventional, aes(x = season, y = TOTAL_VOLUME, label = YEAR, group = YEAR)) +
  geom_line(size = 1.5, color = my_colors[2]) +
  geom_point(size = 3, color = my_colors[2]) +
  labs(title = "Conventional Avocado Volume by Season",
       x = "Season",
       y = "Total Volume",
       label = "Year") +
  my_theme +
  geom_text(size = 3, position = position_nudge(x = 0.2)) +
  scale_y_continuous(labels = scales::comma)

# Stack both plots; grid.arrange() draws the result itself, so no print() call is needed
combined_plot <- grid.arrange(plot_organic, plot_conventional, ncol = 1)

# Does the average price relate to the bag sizes sold?
ggplot(seasonal_summary, aes(x = AVG_AVERAGEPRICE)) +
  geom_line(aes(y = TOTAL_SMALL_BAGS, color = "Small Bags")) +
  geom_line(aes(y = TOTAL_LARGE_BAGS, color = "Large Bags")) +
  geom_line(aes(y = TOTAL_XLARGE_BAGS, color = "XLarge Bags")) +
  labs(title = "Total Bags by Average Price",
       x = "Average Price",
       y = "Total Bags",
       color = "Bags Type") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 14),
        axis.title = element_text(face = "bold", size = 12),
        legend.title = element_text(face = "bold", size = 10),
        legend.text = element_text(size = 8)) +
  scale_color_manual(values = c("Small Bags" = "red",
                                "Large Bags" = "green",
                                "XLarge Bags" = "purple"),
                     name = "Bags Type") +
  guides(color = guide_legend(override.aes = list(size = 3))) +  # Adjust legend size
  scale_y_continuous(labels = scales::comma)  # Format y-axis labels
view(seasonal_summary)





# Ensure YEAR is numeric
seasonal_summary$YEAR <- as.numeric(seasonal_summary$YEAR)

# Create season_year variable combining YEAR and season
seasonal_summary$season_year <- paste(seasonal_summary$season, seasonal_summary$YEAR, sep = "-")

# Sort data by season, then by year
seasonal_summary <- seasonal_summary[order(seasonal_summary$season, seasonal_summary$YEAR), ]


# Filter data for conventional type only
conventional_summary <- seasonal_summary[seasonal_summary$TYPE == "conventional", ]

# Create plot for Total Small Bags
plot_small <- ggplot(conventional_summary, aes(x = season, y = TOTAL_SMALL_BAGS, color = as.factor(YEAR), shape = as.factor(YEAR), group = YEAR)) +
  geom_line(size = 1.5) +
  geom_point(size = 3) +
  labs(title = "Total Small Bags by Season-Year",
       x = "Season",
       y = "Total Small Bags",
       color = "Year",
       shape = "Year") +
  scale_y_continuous(labels = scales::comma) +
  scale_shape_manual(values = c(15, 16, 17), name = "Year") + # Assign shapes for years
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # Rotate x-axis labels
  guides(color = "none")  # Hide legend for color

# Create plot for Total Large Bags
plot_large <- ggplot(conventional_summary, aes(x = season, y = TOTAL_LARGE_BAGS, color = as.factor(YEAR), shape = as.factor(YEAR), group = YEAR)) +
  geom_line(size = 1.5) +
  geom_point(size = 3) +
  labs(title = "Total Large Bags by Season-Year",
       x = "Season",
       y = "Total Large Bags",
       color = "Year",
       shape = "Year") +
  scale_y_continuous(labels = scales::comma) +
  scale_shape_manual(values = c(15, 16, 17), name = "Year") + # Assign shapes for years
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # Rotate x-axis labels
  guides(color = "none")  # Hide legend for color

# Create plot for Total XLarge Bags
plot_xlarge <- ggplot(conventional_summary, aes(x = season, y = TOTAL_XLARGE_BAGS, color = as.factor(YEAR), shape = as.factor(YEAR), group = YEAR)) +
  geom_line(size = 1.5) +
  geom_point(size = 3) +
  labs(title = "Total XLarge Bags by Season-Year",
       x = "Season",
       y = "Total XLarge Bags",
       color = "Year",
       shape = "Year") +
  scale_y_continuous(labels = scales::comma) +
  scale_shape_manual(values = c(15, 16, 17), name = "Year") + # Assign shapes for years
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # Rotate x-axis labels
  guides(color = "none")  # Hide legend for color

# Combine the three plots; grid.arrange() draws the result itself, so no print() call is needed
combined_plot <- grid.arrange(plot_small, plot_large, plot_xlarge, ncol = 1, 
                              top = "Total Bags by Season-Year")
#------------------------------------------------------
# Summarize the data grouped by region
region_summary <- avocado %>%
  group_by(REGION) %>%
  summarise(AVG_AVERAGEPRICE = mean(AVERAGEPRICE),
            TOTAL_VOLUME = sum(TOTAL.VOLUME),
            TOTAL_HASS_4046 = sum(HASS.4046),
            TOTAL_FUERTE_4225 = sum(FUERTE.4225),
            TOTAL_BACON_4770 = sum(BACON.4770),
            TOTAL_TOTAL_BAGS = sum(TOTAL.BAGS),
            TOTAL_SMALL_BAGS = sum(SMALL.BAGS),
            TOTAL_LARGE_BAGS = sum(LARGE.BAGS),
            TOTAL_XLARGE_BAGS = sum(XLARGE.BAGS))

view(region_summary)


options(repr.plot.width = 10, repr.plot.height = 6)

# A box plot needs the full price distribution per region, so plot the raw
# avocado data rather than region_summary (which has one row per region)
ggplot(avocado, aes(x = reorder(REGION, AVERAGEPRICE), y = AVERAGEPRICE)) +
  geom_boxplot(outlier.color = "#FF0000", outlier.shape = 16, fill = "#FFD700", color = "#FF0000") +
  scale_y_continuous(labels = scales::dollar) +
  theme_minimal() +
  labs(title = "Average Price of Avocado: Regional Markets",
       x = "Region",
       y = "Average Price",
       caption = "Data: Hass Avocado Board") +
  theme(panel.grid.minor = element_blank(),
        plot.title = element_text(size = 20, hjust = 0.5, color = "#006400", face = "bold"),
        legend.title = element_text(size = 12, color = "#006400"),
        legend.text = element_text(size = 10, color = "#006400"),
        legend.position = "top",
        axis.text.x = element_text(angle = 45, hjust = 1, color = "#000000"),
        axis.text.y = element_text(color = "#000000"),
        axis.title.x = element_text(color = "#006400", size = 14),
        axis.title.y = element_text(color = "#006400", size = 14),
        plot.caption = element_text(color = "#FF0000", size = 10))


# Sort the region_summary data frame by AVG_AVERAGEPRICE in ascending order
region_summary1 <- region_summary %>%
  filter(REGION != "TotalUS") %>%
  arrange(AVG_AVERAGEPRICE)

# Convert REGION to a factor with levels sorted by AVG_AVERAGEPRICE
region_summary1$REGION <- factor(region_summary1$REGION, levels = region_summary1$REGION)

# Create the heatmap
heatmap <- ggplot(region_summary1, aes(x = 1, y = REGION, fill = AVG_AVERAGEPRICE)) +
  geom_tile() +
  geom_text(aes(label = REGION), color = "black", size = 4) +  # Add text labels
  scale_fill_gradient(low = "#FFFFFF", high = "#1E90FF") +  # Set the color gradient
  labs(title = "Average Price Heatmap by Region",
       x = "",
       y = "",
       fill = "Average Price") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_blank(),  # Remove axis text
        axis.ticks = element_blank(),  # Remove axis ticks
        panel.grid = element_blank())  # Remove grid lines

# Print the heatmap
print(heatmap)



# Sort the region_summary data frame by TOTAL_VOLUME in ascending order
region_summary2 <- region_summary %>%
  filter(REGION != "TotalUS") %>%
  arrange(TOTAL_VOLUME)
# Convert REGION to a factor with levels sorted by TOTAL_VOLUME
region_summary2$REGION <- factor(region_summary2$REGION, levels = region_summary2$REGION)

# Create the heatmap for total volume
heatmap_total_volume <- ggplot(region_summary2, aes(x = 1, y = REGION, fill = TOTAL_VOLUME)) +
  geom_tile() +
  geom_text(aes(label = REGION), color = "white", size = 4) +  # Add text labels
  scale_fill_gradient(low = "red", high = "yellow",
                      labels = scales::comma) +  # One fill scale: color gradient plus labels without scientific notation
  labs(title = "Total Volume Heatmap by Region",
       x = "",
       y = "",
       fill = "Total Volume") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_blank(),  # Remove axis text
        axis.ticks = element_blank(),  # Remove axis ticks
        panel.grid = element_blank())  # Remove grid lines

# Print the heatmap for total volume
print(heatmap_total_volume)

# Combine the heatmaps side by side; grid.arrange() draws the figure itself,
# so no separate print() call is needed
combined_heatmaps <- grid.arrange(heatmap, heatmap_total_volume, ncol = 2)

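view(region_summary)
# Fit the model of average price on total volume (one row per region).
# TOTAL_VOLUME is numeric, so this F-test is equivalent to a simple linear regression.
# Note: region_summary still contains the aggregate "TotalUS" row.
aov_model <- aov(AVG_AVERAGEPRICE ~ TOTAL_VOLUME, data = region_summary)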

# Perform ANOVA test
anova_result <- anova(aov_model)

# Print ANOVA table
print(anova_result)


#------------------------------------------------------------
# DF6: does the type of avocado affect total volume sold?
avocado_kind <- monthly_summary %>%
  filter(TYPE %in% c("organic", "conventional")) %>%
  # Keep MONTH (assumed to be a column of monthly_summary) so SEASON can be derived below
  select(MONTH, AVG_AVERAGEPRICE, TOTAL_VOLUME, TOTAL_HASS_4046, TOTAL_FUERTE_4225, TOTAL_BACON_4770, TOTAL_TOTAL_BAGS, TOTAL_SMALL_BAGS, TOTAL_LARGE_BAGS, TOTAL_XLARGE_BAGS)


# Define the conditions for each season
avocado_kind <- avocado_kind %>%
  mutate(SEASON = case_when(
    MONTH %in% c(12, 1, 2) ~ 1,
    MONTH %in% c(3, 4, 5) ~ 2,
    MONTH %in% c(6, 7, 8) ~ 3,
    MONTH %in% c(9, 10, 11) ~ 4
  ))
# Check the correlation between average price and other elements
correlation_matrix <- cor(avocado_kind[, c("AVG_AVERAGEPRICE", "TOTAL_VOLUME", "TOTAL_HASS_4046", "TOTAL_FUERTE_4225", "TOTAL_BACON_4770", "TOTAL_TOTAL_BAGS", "TOTAL_SMALL_BAGS", "TOTAL_LARGE_BAGS", "TOTAL_XLARGE_BAGS","SEASON")])

# Print correlation matrix
print(correlation_matrix)

# Plot correlation matrix
ggplot(data = melt(correlation_matrix), aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2)), color = "black", size = 5) +  # Add text labels
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0, limits = c(-1, 1), space = "Lab", name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 10, hjust = 1)) +
  coord_fixed()
```
</details>
<hr />

<h2 style="font-size: 19px;">1- Is the Average price affected over time?</h2>

<p>Let's explore the trend of the average price over time.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot1.png" alt="Avg_price Over Time">


##### Observations:
1. **Organic Price Premium:** Organic avocados consistently command higher prices compared to conventional ones.
2. **Seasonal Trends:** Both organic and conventional avocados exhibit similar seasonal trends, with peak prices in the second half of each year and lower prices in the first half.
3. **Year 2017:** 2017 shows both the lowest average price for organic avocados and the highest average price for both types, which may point to a market shift or an external influence.
   
##### Conclusion:
The plot suggests that avocado prices are influenced by seasonal factors and external market dynamics, with organic avocados maintaining a premium position throughout the observed period.
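
The organic premium noted above can be put into numbers by comparing the two types year by year. The sketch below uses base R and a small invented data frame; the `prices` values and column names are illustrative assumptions, not figures from the avocado data.

```r
# Invented average prices by year and type, for illustration only.
prices <- data.frame(
  year  = rep(2015:2017, each = 2),
  type  = rep(c("conventional", "organic"), times = 3),
  price = c(1.08, 1.67, 1.10, 1.57, 1.29, 1.74)
)

# Reshape to one row per year with one price column per type.
wide <- reshape(prices, idvar = "year", timevar = "type", direction = "wide")

# Absolute and relative organic premium over conventional.
wide$premium     <- wide$price.organic - wide$price.conventional
wide$premium_pct <- 100 * wide$premium / wide$price.conventional

print(wide)
```

A positive `premium` in every row would support the claim that organic avocados stay more expensive throughout the period.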

<hr />

<h2 style="font-size: 19px;">2- Is the Total Volume sold affected over time?</h2>

<p>Let's explore the trend of the total volume sold over time.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot2.png" alt="Total Volume sold Over Time">


##### Observations:

1. There is a strong **negative correlation** between average price and total volume sold. Total volume peaks in the first half of each year, when the average price is lower, and dips in the second half, when average prices for both conventional and organic avocados are higher.

2. **Total volume sold** is higher for conventional avocados than for organic ones, as expected.

##### Conclusion:

The plot reveals a clear inverse relationship between average price and total volume sold. When average prices are lower, there is a corresponding increase in the total volume sold, and vice versa. Additionally, the observation confirms the expectation that conventional avocados have higher sales volume compared to organic avocados.
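
A correlation test is one way to check this inverse relationship numerically rather than by eye. The following is a minimal sketch on invented monthly aggregates; the `toy` data frame and its values are assumptions made for illustration only.

```r
# Illustrative monthly aggregates; the numbers are invented for this sketch.
toy <- data.frame(
  month     = 1:8,
  avg_price = c(1.05, 1.10, 1.20, 1.30, 1.45, 1.60, 1.55, 1.40),
  volume    = c(980, 950, 900, 860, 780, 700, 720, 800)
)

# Pearson correlation between price and volume; a value near -1 means
# volume tends to fall as price rises.
price_volume_cor <- cor(toy$avg_price, toy$volume)

# cor.test() reports a p-value and confidence interval for the same estimate.
price_volume_test <- cor.test(toy$avg_price, toy$volume)

print(round(price_volume_cor, 3))
print(price_volume_test$p.value)
```

A negative correlation describes co-movement only; it does not by itself show that price changes cause the volume changes.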

<hr />

<h2 style="font-size: 19px;">3- Is the Total bags sold affected over time?</h2>

<p>Let's explore the trend of the total bags sold over time.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot3.png" alt="Total Bags sold Over Time">


##### Observations:

1. **Higher total volume sold** in each year corresponds to an increase in the total bags sold, indicating a growing consumption trend over time.

2. **Average price** affects total bags sold, as shown by the inverse relationship between the two. In the first half of every year, when the average price is lower, the total bags sold increase. Conversely, during the second half of each year, when the average selling price rises, the total bags sold decrease.

##### Conclusion:

The analysis reveals a clear relationship between average price, total volume sold, and total bags sold over time. As average prices fluctuate, they directly influence consumer behavior, impacting the total volume and bags sold. Understanding these trends can help stakeholders make informed decisions regarding pricing strategies and market demand.



<hr />

<h2 style="font-size: 22px;">4- Understanding monthly Fluctuations in Avocado Prices</h2>

<p>Let's explore the monthly price fluctuations for both types of avocados from 2015 to 2017.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot4.png" alt="Monthly Average Price Fluctuations">

##### Observations:

1. **Organic Price Premium**: Organic avocados consistently command higher prices compared to conventional ones.
2. **Seasonal Trends**: Both organic and conventional avocados exhibit similar seasonal trends, with peak prices in the second half of each year, especially in September and October.
3. **Year 2017**: The year 2017 stands out with the lowest average price for organic avocados in February and the highest average price for both types in September, indicating a significant market shift or external influence.

##### Conclusion:

The plot highlights the premium price of organic avocados compared to conventional ones and reveals consistent seasonal trends in avocado pricing, with peak prices typically occurring in the second half of each year. The notable fluctuations in prices in 2017 suggest potential market dynamics or external factors influencing avocado pricing during that period. Understanding these trends can inform market strategies and decision-making processes for stakeholders in the avocado industry.


<hr />

<h2 style="font-size: 22px;">5- Understanding Seasonal Fluctuations in Avocado Volume sold:</h2>

<p>Let's explore the seasonal fluctuations in volume sold for both types of avocados from 2015 to 2017.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot5.png" alt="Seasonal Fluctuations in Volume Sold">

##### Observations:

1. **Organic Price Premium**: Organic avocados consistently command higher prices compared to conventional ones.

2. **Seasonal Trends**: Both organic and conventional avocados follow similar seasonal trends in 2015 and 2016, with volume peaking in May for both types.

3. **Year 2017**: 2017 stands out with an unexpected drop in total volume sold from April to May.

##### Conclusion:

The plot highlights the organic price premium, seasonal trends, and the anomaly observed in the year 2017. Understanding these fluctuations can aid in market forecasting and strategic decision-making for stakeholders.




<hr />

<h2 style="font-size: 22px;">6- Quarterly Analysis of Avocado Volume Sold:</h2>

<p>To further understand the seasonal trends, let's analyze the avocado volume sold by each quarter from 2015 to 2017.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot6.png" alt="Quarterly Average Price">
<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot7.png" alt="Quarterly Avocado Volume Sold">

##### Observations:

1. **Quarterly Peaks**: Both types of avocados show significant volume changes at the end of each quarter.
2. **Price Sensitivity**: Lower average prices in the first quarter correlate with higher sales volumes, indicating price sensitivity among consumers.
3. **Anomalies**: Specific quarters, such as Q2 2017, show unusual volume drops which warrant further investigation.

##### Conclusion:

Quarterly analysis reveals distinct sales patterns and consumer behavior trends influenced by price fluctuations. These insights are crucial for optimizing supply chain strategies and market planning.
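
The quarterly figures discussed here come from rolling weekly records up to one row per year and quarter. Here is a minimal base R sketch of that aggregation; the `weekly` data frame and its values are invented for illustration and do not come from the avocado data.

```r
# Invented weekly records for illustration only.
weekly <- data.frame(
  date   = as.Date(c("2015-01-04", "2015-02-15", "2015-04-12",
                     "2015-05-24", "2015-08-02", "2015-11-08")),
  price  = c(1.00, 1.10, 1.20, 1.30, 1.50, 1.20),
  volume = c(100, 120, 150, 170, 110, 90)
)

# Derive year and calendar quarter (1-4) from the date.
weekly$year    <- as.integer(format(weekly$date, "%Y"))
weekly$quarter <- (as.integer(format(weekly$date, "%m")) - 1) %/% 3 + 1

# Average price and total volume per year and quarter.
avg_price <- aggregate(price ~ year + quarter, data = weekly, FUN = mean)
total_vol <- aggregate(volume ~ year + quarter, data = weekly, FUN = sum)
quarterly <- merge(avg_price, total_vol, by = c("year", "quarter"))
quarterly <- quarterly[order(quarterly$year, quarterly$quarter), ]

print(quarterly)
```

Prices are averaged while volumes are summed, matching the way the summary tables in this analysis treat the two kinds of column.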

<hr />

<h2 style="font-size: 22px;">7- Detailed Quarterly Summary Analysis:</h2>

<p>Based on the detailed quarterly summary analysis from 2015 to 2018, we observe the following:</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot9.png" alt="Detailed Quarterly Summary Analysis">

##### Observations:

1. **Steady Increase in Total Volume Sold**: Organic avocados show a steady increase in total volume sold each year.
2. **Consistent Seasonal Peaks**: Seasonal peaks for both organic and conventional avocados are consistent across 2015 and 2016, particularly in May.
3. **2017 Anomaly**: A noticeable anomaly in 2017 with a drop in total volume sold from April to May for both types.
4. **Volume and Price Relationship**: Lower average prices in Q1 correlate with higher sales volumes, while higher average prices in Q2 and Q3 correlate with lower sales volumes.

##### Conclusion:

The detailed quarterly analysis underscores the importance of monitoring seasonal trends and price fluctuations to better understand market dynamics. The data reveals consistent seasonal patterns and highlights anomalies that may impact strategic planning for avocado sales.




<hr />

<h2 style="font-size: 22px;">8- Quarterly Analysis of Avocado Prices and Volumes:</h2>

<p>The provided data summarizes the average prices and total volumes of different types of avocados across various quarters from 2015 to 2018. The following observations and conclusions are derived from this data.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot10.png" alt="Quarterly Analysis Plot">

##### Observations:

1. **Average Prices**:
   - The average price of avocados shows a general upward trend from 2015 to 2017, peaking in Q3 2017 at about 1.70.
   - Average prices then decline in Q4 2017 (about 1.54) and Q1 2018 (about 1.35).

2. **Total Volume Sold**:
   - The total volume sold also fluctuates across quarters, with notable peaks in Q2 2015 (1,209,755,184) and Q2 2016 (1,373,391,436).
   - The highest total volume sold was in Q1 2018 (1,382,738,340), indicating a potential increase in avocado consumption over the years.

3. **Yearly Trends**:
   - Each year shows distinct seasonal fluctuations, with some quarters exhibiting higher volumes and average prices than others.
   - The year 2017 stands out with the highest average prices in Q3 (about 1.70) and significant total volumes in Q2 (1,325,955,722) and Q4 (1,146,630,125).

4. **Volume Distribution by Type**:
   - The total volumes of specific types (Hass, Fuerte, Bacon) and bags (small, large, x-large) are also provided, indicating detailed distribution patterns.
   - For example, Q2 2017 has high volumes of Hass (445,262,206) and a significant number of small bags (349,493,676).

##### Conclusion:

The quarterly data analysis reveals several key insights into the avocado market between 2015 and 2018:

- **Price Trends**: There is a clear upward trend in average avocado prices, with the highest prices observed in 2017. This indicates increasing demand or possible supply constraints during this period.
- **Volume Fluctuations**: The total volume sold fluctuates significantly across quarters, with peaks often seen in the second quarter of each year. This suggests a seasonal pattern in avocado consumption.
- **Type and Bag Distribution**: Different types of avocados and bag sizes show varying trends, which can help in understanding consumer preferences and market segmentation.
- **Market Dynamics**: The data highlights important market dynamics, such as the highest volume and price occurring in different years and quarters. Stakeholders can use these insights for better inventory management and pricing strategies.

By closely monitoring these trends, producers, retailers, and marketers can make more informed decisions to optimize their operations and meet consumer demand effectively.




<hr />

<h2 style="font-size: 22px;">9- Understanding Seasonal Fluctuations in Avocado Prices</h2>

<p>Let's explore the seasonal fluctuations for both types of avocados from 2015 to 2017.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot11.png" alt="">
<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot12.png" alt="">

##### Observations:

1. **Fall Season**:
    - In 2015, the average price for conventional avocados was 1.05, while for organic avocados, it was 1.72.
    - By 2017, the average price for conventional avocados increased to 1.49, and for organic avocados to 1.92.
    - There is a noticeable increase in the total volume sold for both types from 2015 to 2017.

2. **Spring Season**:
    - The average price for conventional avocados was lower in 2016 (0.96) compared to 2015 (1.10), and it increased again in 2017 to 1.29.
    - Organic avocado prices showed a consistent increase from 1.62 in 2015 to 1.67 in 2017.
    - The total volume sold saw significant fluctuations, with a peak in 2016 for conventional avocados.

3. **Summer Season**:
    - The average price for conventional avocados showed an upward trend from 1.10 in 2015 to 1.35 in 2017.
    - Organic avocados followed a similar trend, with prices rising from 1.73 in 2015 to 1.85 in 2017.
    - The total volume sold increased over the years, with a substantial rise in 2017 for both types.

4. **Winter Season**:
    - The average price for conventional avocados slightly decreased from 1.05 in 2015 to 1.00 in 2016, then increased to 1.06 in 2017.
    - Organic avocado prices were relatively stable, decreasing slightly from 1.61 in 2015 to 1.52 in 2016 and staying at about 1.52 in 2017.
    - There was a marked increase in the total volume sold from 2015 to 2017 for both types.

##### Conclusion:

The seasonal analysis of avocado prices and volumes from 2015 to 2017 reveals several trends:
- Organic avocados consistently command higher prices compared to conventional ones across all seasons.
- There are noticeable seasonal trends, with prices generally increasing over the years.
- The total volume of avocados sold has also increased significantly, indicating a growing demand.
- Understanding these seasonal fluctuations can help stakeholders in the avocado market to make informed decisions regarding pricing, supply chain management, and marketing strategies.
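
The seasonal averages above depend on how months are assigned to seasons. The sketch below shows one such mapping in base R, following the December to February grouping used in the analysis code; the function name `month_to_season`, the season labels, and the `monthly` prices are assumptions made for illustration.

```r
# Map a month number (1-12) to a season label.
# December, January and February are grouped together as Winter.
month_to_season <- function(month) {
  labels <- c("Winter", "Winter", "Spring", "Spring", "Spring",
              "Summer", "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
  factor(labels[month], levels = c("Winter", "Spring", "Summer", "Fall"))
}

# Invented monthly prices for illustration only.
monthly <- data.frame(
  month = 1:12,
  price = c(1.0, 1.0, 1.1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.6, 1.4, 1.1)
)
monthly$season <- month_to_season(monthly$month)

# Average price per season.
seasonal_avg <- aggregate(price ~ season, data = monthly, FUN = mean)
print(seasonal_avg)
```

Because December is grouped with the following January and February, a calendar-year summary mixes the start and end of the same year into one Winter value, which is worth keeping in mind when comparing winters across years.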




<hr />

<h2 style="font-size: 22px;">10- Is Average Price Affecting Customer Consumption?</h2>

<p>Let's see how customers respond to seasonal price fluctuations for both types of avocados from 2015 to 2017.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot13.png" alt="">
<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot14.png" alt="">


##### Observations:
1. **Small packs** sell the most, followed by large and then x-large packs, as expected.
2. **When the average price is lower**, sales of both small and large packs increase; when the average price is high, demand for small and large packs decreases.
3. **X-large pack** sales rise when the price is lower, stay near their average at mid-range prices, and fall when the price is highest. The same trend is observed for large packs.
   
##### Conclusion:
The analysis suggests that average price indeed affects customer consumption preferences for different pack sizes of avocados. When the average price is lower, customers tend to buy more small and large packs, while they decrease their purchases of x-large packs. Conversely, when the average price is higher, customers shift their preferences towards smaller packs. This understanding can help avocado producers and marketers adjust their pricing strategies to align with customer preferences and maximize sales.
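
Whether preferences actually shift between pack sizes is easier to judge from shares than from raw counts, since all sizes can rise or fall together. The sketch below computes the share of each bag size per price period; the `bags` data frame and its values are invented for illustration.

```r
# Invented bag totals for illustration only.
bags <- data.frame(
  period = c("low price", "mid price", "high price"),
  small  = c(900, 800, 650),
  large  = c(300, 260, 210),
  xlarge = c(20, 18, 12)
)

size_cols <- c("small", "large", "xlarge")

# Convert counts to a matrix so each row can be normalised to sum to 1.
counts <- as.matrix(bags[size_cols])
rownames(counts) <- bags$period

# Share of each bag size within a period (margin = 1 normalises by row).
shares <- prop.table(counts, margin = 1)
print(round(shares, 3))
```

If the shares stay roughly constant while the totals move with price, customers are buying more or less overall rather than switching between pack sizes.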




<hr />

<h2 style="font-size: 22px;">11- Is Region Affected by Average Price?</h2>

<p>Let's see if regions are influenced by seasonal fluctuations in average avocado prices from 2015 to 2017.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot15.png" alt="">


<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot18.png" alt="">


##### Observations:
1. Avocado consumption patterns vary across regions in response to changes in average price.
2. Certain regions exhibit more sensitivity to price fluctuations compared to others.
3. Regions with higher average prices may experience changes in total volume and the distribution of avocado pack sizes.
4. Some regions consistently maintain higher average prices, while others show more price volatility over time.

##### Conclusion:
The analysis suggests that average prices do indeed have an impact on avocado consumption patterns across various regions. Regions with higher average prices may experience shifts in total volume sold and preferences for different pack sizes. Understanding these regional differences can help avocado producers and marketers tailor their strategies to better meet the demands of each market and optimize sales.



<hr />

<h2 style="font-size: 22px;">12- Correlation between Average Price and Other Factors Based on Monthly Reports</h2>

<p>Let's examine the correlation between average price and other factors based on monthly reports.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot16.png" alt="">

##### Observations:
1. There is a strong negative correlation between average price and total volume sold, indicating that as the average price increases, the total volume tends to decrease.
2. Similar strong negative correlations are observed between average price and the total volume of Hass avocados (4046), Fuerte avocados (4225), and Bacon avocados (4770).
3. The total number of bags sold (small, large, and extra-large) also shows negative correlations with average price, albeit slightly weaker than the correlations with total volume.
4. Among the bag sizes, the small bags exhibit the strongest negative correlation with average price, followed by large bags and extra-large bags.

##### Conclusion:
The analysis reveals significant negative correlations between average price and various factors such as total volume sold and bag sizes. This suggests that consumers may be more price-sensitive, leading to changes in purchasing behavior in response to fluctuations in avocado prices. Understanding these correlations can help avocado producers and retailers adjust their pricing strategies and inventory management to optimize sales and maximize profitability.
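
To compare how strongly each factor moves with price, the price row of the correlation matrix can be extracted and sorted. The following is a minimal base R sketch on invented data; the `monthly` data frame, its column names, and its values are assumptions and do not reproduce the correlations reported above.

```r
# Invented monthly totals for illustration only.
monthly <- data.frame(
  avg_price    = c(1.00, 1.10, 1.25, 1.40, 1.55, 1.30),
  total_volume = c(1000, 940, 860, 760, 690, 820),
  small_bags   = c(300, 285, 250, 230, 200, 245),
  xlarge_bags  = c(12, 10, 11, 9, 10, 11)
)

# Pairwise Pearson correlations between all numeric columns.
cor_mat <- cor(monthly)

# Correlation of every other column with average price,
# sorted so the strongest negative correlation comes first.
price_cor <- sort(cor_mat["avg_price", colnames(cor_mat) != "avg_price"])
print(round(price_cor, 2))
```

Sorting the price row this way gives a direct ranking of the factors, which is harder to read off a full heatmap.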

<hr />



<h3 id="perform-predictive-analysis-using-regression-models">Perform Predictive Analysis Using Regression Models</h3>
<p>For our predictive analysis, we fit regression models that forecast the average avocado price from factors such as sales volume, avocado type, and region. We compare four models (simple linear, multiple linear, polynomial, and a multiple linear model with interaction terms) on a held-out testing set using RMSE, R-squared, and MAPE.</p>
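
The evaluation workflow can be sketched end to end on synthetic data: split the rows, fit on the training part, and score on the held-out part. The metric helpers below use the same formulas as the analysis code (RMSE, squared correlation as R-squared, and MAPE). The 80/20 split, the helper names, and the `toy` data are assumptions made for illustration, not the split used in this analysis.

```r
# Evaluation metrics used to compare regression models.
rmse      <- function(actual, predicted) sqrt(mean((predicted - actual)^2))
mape      <- function(actual, predicted) mean(abs((actual - predicted) / actual)) * 100
r_squared <- function(actual, predicted) cor(predicted, actual)^2

# Synthetic data: price falls roughly linearly with volume, plus noise.
set.seed(42)
n <- 200
toy <- data.frame(volume = runif(n, 100, 1000))
toy$price <- 2 - 0.001 * toy$volume + rnorm(n, sd = 0.05)

# Random 80/20 split into training and testing rows.
train_idx <- sample(seq_len(n), size = 0.8 * n)
training  <- toy[train_idx, ]
testing   <- toy[-train_idx, ]

# Fit on the training rows only, then predict the held-out rows.
fit  <- lm(price ~ volume, data = training)
pred <- predict(fit, newdata = testing)

metrics <- c(RMSE      = rmse(testing$price, pred),
             R_squared = r_squared(testing$price, pred),
             MAPE      = mape(testing$price, pred))
print(round(metrics, 3))
```

Note that the squared correlation measures how well predictions track the actual values but ignores systematic bias, so it is best read alongside RMSE and MAPE rather than alone.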

 
 <details>
  <summary>Show Code</summary>
  
  ```r
# Simple linear regression on AVERAGEPRICE and TOTAL.VOLUME
simple_lm <- lm(AVERAGEPRICE ~ TOTAL.VOLUME, data = models_training)
# Predictions using simple linear regression model
simple_lm_pred <- predict(simple_lm, newdata = models_testing)

# Calculate RMSE for simple linear regression model
rmse_simple_lm <- sqrt(mean((simple_lm_pred - actual_values)^2))

# Calculate R-squared for simple linear regression model
r_squared_simple_lm <- cor(simple_lm_pred, actual_values)^2

# Calculate MAPE (Mean Absolute Percentage Error)
mape_simple_lm <- mean(abs((actual_values - simple_lm_pred) / actual_values)) * 100



# Predict on the training and testing datasets for each model
training_results_simple_lm <- data.frame(truth = models_training$AVERAGEPRICE, .pred = predict(simple_lm))
testing_results_simple_lm <- data.frame(truth = models_testing$AVERAGEPRICE, .pred = predict(simple_lm, newdata = models_testing))

# Add a column indicating the dataset type (training or testing)
training_results_simple_lm$train <- "training"
testing_results_simple_lm$train <- "testing"

# Combine the datasets
combined_results_simple_lm <- bind_rows(training_results_simple_lm, testing_results_simple_lm)

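# Define custom colors for the real-vs-predicted comparison plots
real_color <- "purple"
predicted_color <- "green"

# Plot predicted against true prices for the simple linear regression model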
ggplot(combined_results_simple_lm, aes(truth, .pred)) +
  geom_abline(lty = 2, color = "orange", size = 1.5) +
  geom_point(color = "#006EA1", alpha = 0.5) +
  facet_wrap(~ train) +
  labs(x = "Truth", y = "Predicted") +
  ggtitle("Evaluation of Simple Linear Regression Model") +
  theme(plot.background = element_rect(fill = "#a2fef3"),
        plot.title = element_text(face = "bold", color = "black", hjust = 0.5))  # Centering the title


testing_results_simple_lm <- data.frame(truth = models_testing$AVERAGEPRICE, predicted1 = predict(simple_lm, newdata = models_testing))

# Plot the results for the simple linear regression model on the testing dataset.
# Real values lie on the dashed identity line; predicted values scatter around it.
ggplot(testing_results_simple_lm, aes(x = truth)) +
  geom_abline(lty = 2, color = "gray40", size = 1.2) +
  geom_point(aes(y = truth, color = "Real Values"), alpha = 0.5) +
  geom_point(aes(y = predicted1, color = "Predicted Values"), alpha = 0.5) +
  labs(x = "Real Values (Testing)", y = "Real / Predicted Values", color = NULL) +
  scale_color_manual(values = c("Real Values" = real_color, "Predicted Values" = predicted_color)) +
  ggtitle("Comparison of Real and Predicted Values on Testing Dataset") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#fef3a2"),plot.title = element_text(face = "bold", color = "black",hjust = 0.5))




# Multiple linear regression on AVERAGEPRICE with all other variables
multiple_lm <- lm(AVERAGEPRICE ~ ., data = models_training)

# Predictions using multiple linear regression model
multiple_lm_pred <- predict(multiple_lm, newdata = models_testing)

# Calculate RMSE for the multiple linear regression model
rmse_multiple_lm <- sqrt(mean((multiple_lm_pred - actual_values)^2))

# Calculate R-squared for the multiple linear regression model
r_squared_multiple_lm <- cor(multiple_lm_pred, actual_values)^2

# Calculate MAPE for the multiple linear regression model
mape_multiple_lm <- mean(abs((actual_values - multiple_lm_pred) / actual_values)) * 100


# Predict on the training and testing datasets for each model
training_results_multiple_lm <- data.frame(truth = models_training$AVERAGEPRICE, .pred = predict(multiple_lm))
testing_results_multiple_lm <- data.frame(truth = models_testing$AVERAGEPRICE, .pred = predict(multiple_lm, newdata = models_testing))

# Add a column indicating the dataset type (training or testing)
training_results_multiple_lm$train <- "training"
testing_results_multiple_lm$train <- "testing"

# Combine the datasets
combined_results_multiple_lm <- bind_rows(training_results_multiple_lm, testing_results_multiple_lm)

# Plot the results for the multiple linear regression model
ggplot(combined_results_multiple_lm, aes(truth, .pred)) +
  geom_abline(lty = 2, color = "orange", size = 1.5) +
  geom_point(color = "#006EA1", alpha = 0.5) +
  facet_wrap(~ train) +
  labs(x = "Truth", y = "Predicted") +
  ggtitle("Evaluation of Multiple Linear Regression Model") +
  theme(plot.background = element_rect(fill  = "#a2fef3"),plot.title = element_text(face = "bold", color = "black",hjust = 0.5))



# Predictions using multiple linear regression model on testing dataset
testing_results_multiple_lm <- data.frame(truth = models_testing$AVERAGEPRICE, 
                                          predicted = predict(multiple_lm, newdata = models_testing))


# Plot the results for the multiple linear regression model on the testing dataset.
# Real values lie on the dashed identity line; predicted values scatter around it.
ggplot(testing_results_multiple_lm, aes(x = truth)) +
  geom_abline(lty = 2, color = "gray40", size = 1.2) +
  geom_point(aes(y = truth, color = "Real Values"), alpha = 0.5) +
  geom_point(aes(y = predicted, color = "Predicted Values"), alpha = 0.5) +
  labs(x = "Real Values (Testing)", y = "Real / Predicted Values", color = NULL) +
  scale_color_manual(values = c("Real Values" = real_color, "Predicted Values" = predicted_color)) +
  ggtitle("Comparison of Real and Predicted Values on Testing Dataset") +
  theme_minimal() +
  theme(plot.background = element_rect(fill = "#fef3a2"),plot.title = element_text(face = "bold", color = "black",hjust = 0.5))



# Run polynomial regression model (second-degree terms for every numeric predictor)
poly_model <- lm(AVERAGEPRICE ~ poly(TOTAL.VOLUME, 2) + poly(HASS.4046, 2) +
                   poly(FUERTE.4225, 2) + poly(BACON.4770, 2) +
                   poly(TOTAL.BAGS, 2) + poly(SMALL.BAGS, 2) +
                   poly(LARGE.BAGS, 2) + poly(XLARGE.BAGS, 2) +
                   poly(quarter, 2),
                 data = models_training)

view(models_training)
# Predictions using polynomial regression model
poly_model_pred <- predict(poly_model, newdata = models_testing)

# Calculate RMSE for polynomial regression model
rmse_poly_model <- sqrt(mean((poly_model_pred - actual_values)^2))

# Calculate R-squared for polynomial regression model
r_squared_poly_model <- cor(poly_model_pred, actual_values)^2

# Calculate MAPE (Mean Absolute Percentage Error)
mape_poly_model <- mean(abs((actual_values - poly_model_pred) / actual_values)) * 100



# Run multiple linear regression model with significant predictors
# Convert TYPE into dummy variables (one-hot encoding)
models_training_b <- models_training %>%
  mutate(TYPE_organic = ifelse(TYPE == "organic", 1, 0),
         TYPE_conventional = ifelse(TYPE == "conventional", 1, 0))
models_testing_b <- models_testing %>%
  mutate(TYPE_organic = ifelse(TYPE == "organic", 1, 0),
         TYPE_conventional = ifelse(TYPE == "conventional", 1, 0))

# Fit multiple linear regression model with interactions including TYPE.
# Only TYPE_organic enters the formula: when TYPE has just the two levels above,
# TYPE_conventional equals 1 - TYPE_organic, so including both adds only aliased terms.
mlm_model <- lm(AVERAGEPRICE ~ TOTAL.VOLUME * HASS.4046 * FUERTE.4225 * BACON.4770 * TOTAL.BAGS * SMALL.BAGS * quarter * region_id * TYPE_organic, data = models_training_b)

# Make predictions on the testing data
mlm_model_pred <- predict(mlm_model, newdata = models_testing_b)

# Calculate RMSE
rmse_mlm_model <- sqrt(mean((mlm_model_pred - actual_values2)^2))

# Calculate R-squared
r_squared_mlm_model <- cor(mlm_model_pred, actual_values2)^2

# Calculate MAPE
mape_mlm_model <- mean(abs((actual_values2 - mlm_model_pred) / actual_values2)) * 100



# Summary of each fitted model
summary(simple_lm)
summary(multiple_lm)
summary(poly_model)
summary(mlm_model)

# Print the evaluation metrics
cat("Simple Linear Regression RMSE:", rmse_simple_lm, "\n")
cat("Simple Linear Regression R-squared:", r_squared_simple_lm, "\n")
cat("Simple Linear Regression MAPE:", mape_simple_lm, "%\n")


# Print the results
cat("Multiple Linear Regression RMSE:", rmse_multiple_lm, "\n")
cat("Multiple Linear Regression R-squared:", r_squared_multiple_lm, "\n")
cat("Multiple Linear Regression MAPE:", mape_multiple_lm, "%\n")

# Print the evaluation metrics
cat("Polynomial Regression RMSE:", rmse_poly_model, "\n")
cat("Polynomial Regression R-squared:", r_squared_poly_model, "\n")
cat("Polynomial Regression MAPE:", mape_poly_model, "%\n")

# Print results
cat("Multiple Linear Regression with Significant Predictors RMSE:", rmse_mlm_model, "\n")
cat("Multiple Linear Regression with Significant Predictors R-squared:", r_squared_mlm_model, "\n")
cat("Multiple Linear Regression with Significant Predictors MAPE:", mape_mlm_model, "%\n")




# Create a dataframe
evaluation_df <- data.frame(
  Model = c("Simple Linear Regression", "Multiple Linear Regression", "Polynomial Regression", "Multiple Linear Regression with Significant Predictors"),
  RMSE = c(rmse_simple_lm, rmse_multiple_lm, rmse_poly_model, rmse_mlm_model),
  R_squared = c(r_squared_simple_lm, r_squared_multiple_lm, r_squared_poly_model, r_squared_mlm_model),
  MAPE = c(mape_simple_lm, mape_multiple_lm, mape_poly_model, mape_mlm_model)
)

# Print the dataframe
view(evaluation_df)
# Summarise one avocado type by region and month.
# avocado_data is assumed to be the original, un-aggregated data frame.
summarise_type <- function(data, type) {
  data %>%
    filter(TYPE == type) %>%
    group_by(REGION, MONTH_YEAR, TYPE) %>%
    summarise(
      Avg_price = mean(AVERAGEPRICE),
      Total_VOLUME = sum(TOTAL.VOLUME),
      Total_HASS_4046 = sum(HASS.4046),
      Total_FUERTE_4225 = sum(FUERTE.4225),
      Total_BACON_4770 = sum(BACON.4770),
      Total_TOTAL_BAGS = sum(TOTAL.BAGS),
      Total_SMALL_BAGS = sum(SMALL.BAGS),
      Total_LARGE_BAGS = sum(LARGE.BAGS),
      Total_XLARGE_BAGS = sum(XLARGE.BAGS),
      .groups = "drop_last"  # Same grouping as the summarise() default, without the message
    )
}

organic_data <- summarise_type(avocado_data, "organic")

view(organic_data)

conventional_data <- summarise_type(avocado_data, "conventional")

view(conventional_data)

# Convert MONTH_YEAR to character format
monthly_region$MONTH_YEAR <- as.character(monthly_region$MONTH_YEAR)

# Convert MONTH_YEAR to Date format
monthly_region$MONTH_YEAR <- as.Date(paste0(monthly_region$MONTH_YEAR, "-01"))


# Convert MONTH_YEAR to character format
conventional_data$MONTH_YEAR <- as.character(conventional_data$MONTH_YEAR)

# Convert MONTH_YEAR to Date format
conventional_data$MONTH_YEAR <- as.Date(paste0(conventional_data$MONTH_YEAR, "-01"))


# Convert MONTH_YEAR to character format
organic_data$MONTH_YEAR <- as.character(organic_data$MONTH_YEAR)

# Convert MONTH_YEAR to Date format
organic_data$MONTH_YEAR <- as.Date(paste0(organic_data$MONTH_YEAR, "-01"))

# Preview the first few rows of the resulting data frame
head(organic_data)

# Save the summarized data frames to RDS files in the current working directory
saveRDS(monthly_region, "monthly_region.RDS")
saveRDS(organic_data, "organic_data.RDS")
saveRDS(conventional_data, "conventional_data.RDS")

# Reload the summarized data frames
monthly_region <- readRDS("monthly_region.RDS")
organic_data <- readRDS("organic_data.RDS")
conventional_data <- readRDS("conventional_data.RDS")



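# Build a quarterly time series of average price, starting in Q1 of the first year
ts_data <- ts(models$AVERAGEPRICE, start = c(min(models$YEAR), 1), frequency = 4)

# Fit an ARIMA model with automatically selected orders
arima_model <- auto.arima(ts_data)

# Forecast the next four quarters
forecast_values <- forecast(arima_model, h = 4)

print(forecast_values)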

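# Take the last four observed quarters as comparison values.
# Note: these quarters are part of the data the model was fitted on, while the
# forecast covers the four quarters after the series ends, so the metrics below
# are a rough sanity check rather than an out-of-sample evaluation.
actual_year <- max(models$YEAR)
actual_values <- models$AVERAGEPRICE[
  models$YEAR == actual_year | models$YEAR == (actual_year + 1)
]

# Keep only the required number of values (4 quarters)
actual_values <- tail(actual_values, 4)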

# Ensure actual_values has the same length as forecast_values$mean
if(length(actual_values) != length(forecast_values$mean)) {
  stop("Length of actual_values does not match the length of forecasted values.")
}

# Calculate evaluation metrics
calculate_metrics <- function(actual, predicted) {
  me <- mean(predicted - actual)
  rmse <- sqrt(mean((predicted - actual)^2))
  mae <- mean(abs(predicted - actual))
  mpe <- mean((predicted - actual) / actual) * 100
  mape <- mean(abs((predicted - actual) / actual)) * 100
  # Scaled by the MAE of a one-step naive forecast on `actual`
  # (the standard MASE scales by the naive forecast error on the training series instead)
  mase <- mae / mean(abs(diff(actual)))
  list(ME = me, RMSE = rmse, MAE = mae, MPE = mpe, MAPE = mape, MASE = mase)
}

arima_metrics <- calculate_metrics(actual_values, forecast_values$mean)

# Print the evaluation metrics for ARIMA
cat("ARIMA ME:", arima_metrics$ME, "\n")
cat("ARIMA RMSE:", arima_metrics$RMSE, "\n")
cat("ARIMA MAE:", arima_metrics$MAE, "\n")
cat("ARIMA MPE:", arima_metrics$MPE, "%\n")
cat("ARIMA MAPE:", arima_metrics$MAPE, "%\n")
cat("ARIMA MASE:", arima_metrics$MASE, "\n")

# Plot residuals
plot(residuals(arima_model), main = "ARIMA Residuals")

# ACF and PACF of residuals
acf(residuals(arima_model), main = "ACF of ARIMA Residuals")
pacf(residuals(arima_model), main = "PACF of ARIMA Residuals")

# Ljung-Box test for autocorrelation in residuals
print(Box.test(residuals(arima_model), type = "Ljung-Box"))

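# Seasonal ARIMA model
# Note: seasonal = TRUE is already the default of auto.arima, so this call selects the same model as arima_model above
sarima_model <- auto.arima(ts_data, seasonal = TRUE)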

# Forecasting
sarima_forecast <- forecast(sarima_model, h = length(actual_values))

sarima_metrics <- calculate_metrics(actual_values, sarima_forecast$mean)

# Print the evaluation metrics for SARIMA
cat("SARIMA ME:", sarima_metrics$ME, "\n")
cat("SARIMA RMSE:", sarima_metrics$RMSE, "\n")
cat("SARIMA MAE:", sarima_metrics$MAE, "\n")
cat("SARIMA MPE:", sarima_metrics$MPE, "%\n")
cat("SARIMA MAPE:", sarima_metrics$MAPE, "%\n")
cat("SARIMA MASE:", sarima_metrics$MASE, "\n")
  ```
</details>
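
<p>The ARIMA metrics above are computed against quarters the model has already seen. A minimal sketch of a holdout evaluation follows, assuming a synthetic quarterly series and a fixed ARIMA(1,0,0) order fitted with base R's <code>arima</code>. The series, the order, and the variable names are illustrative assumptions, not the analysis's actual data or the model chosen by <code>auto.arima</code>.</p>

```r
# Synthetic quarterly price series for illustration only (not the avocado data)
set.seed(42)
prices <- 1.4 + 0.1 * sin(2 * pi * (1:24) / 4) + rnorm(24, sd = 0.05)
price_ts <- ts(prices, start = c(2013, 1), frequency = 4)

# Hold out the last four quarters so the model never sees them
h <- 4
train <- window(price_ts, end = c(2017, 4))
test <- window(price_ts, start = c(2018, 1))

# Fit a fixed-order ARIMA(1,0,0); the order is an assumption made for this sketch
fit <- arima(train, order = c(1, 0, 0), method = "ML")
pred <- predict(fit, n.ahead = h)$pred

# Out-of-sample error metrics on the held-out quarters
holdout_rmse <- sqrt(mean((pred - test)^2))
holdout_mape <- mean(abs((test - pred) / test)) * 100

# MASE scaled by the one-step naive forecast error on the training series
holdout_mase <- mean(abs(pred - test)) / mean(abs(diff(train)))

cat("Holdout RMSE:", holdout_rmse, "\n")
cat("Holdout MAPE:", holdout_mape, "%\n")
cat("Holdout MASE:", holdout_mase, "\n")
```

<p>Scaling MASE by the training series makes a value below 1 mean the model beats a naive "repeat the last quarter" forecast on data it was not fitted on.</p>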


<hr />
<hr />
<h3 style="font-size: 19px;">I- Comparison between predicted and real data using Simple linear regression</h3>

<p>The plot below compares predicted and actual values for the simple linear regression model, showing Avg_price over time.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot21.png" alt="Avg_price Over Time">

<hr />
<hr />
<h3 style="font-size: 19px;">II- Comparison between predicted and real data using Multiple linear regression</h3>

<p>The plot below compares predicted and actual values for the multiple linear regression model, showing Avg_price over time.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot22.png" alt="Avg_price Over Time">


<hr />
<hr />
<h3 style="font-size: 19px;">III- Comparison between models evaluation</h3>

<p>Let's compare the different models using RMSE, R-squared, and MAPE.</p>

<img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot23.png" alt=""> <img src="file:///C:/Users/DLECU/OneDrive/Desktop/plot24.png" alt="">




<p>After conducting a thorough evaluation of various regression models, including:</p>
<ul>
  <li>Simple Linear Regression</li>
  <li>Multiple Linear Regression</li>
  <li>Polynomial Regression</li>
  <li>Multiple Linear Regression with Significant Predictors</li>
  <li>ARIMA Model</li>
</ul>
<p>It is evident that:</p>
<ul>
  <li>The Simple Linear Regression model exhibits a relatively high Mean Absolute Percentage Error (MAPE) of 25.85% and a low R-squared value of 0.034, indicating limited predictive accuracy and explanatory power, respectively.</li>
  <li>On the other hand, the Multiple Linear Regression model outperforms all others with a significantly lower MAPE of 14.55% and a substantially higher R-squared value of 0.612.</li>
</ul>
<p>This indicates that the Multiple Linear Regression model provides:</p>
<ul>
  <li>More accurate predictions</li>
  <li>Better explanation of the variance in avocado prices compared to the other models evaluated.</li>
</ul>
<p>Therefore, for accurate and reliable predictions of avocado prices, the Multiple Linear Regression model with its comprehensive set of predictors stands out as the preferred choice.</p>
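
<p>The three metrics used in this comparison can be gathered in one helper. The sketch below is illustrative: the function name <code>regression_metrics</code> and the toy values are assumptions, not part of the original analysis. It also reports the coefficient of determination next to the squared correlation used above, because the two can differ on held-out data.</p>

```r
# Hypothetical helper; the name regression_metrics is not from the original analysis
regression_metrics <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted), all(actual != 0))
  rmse <- sqrt(mean((predicted - actual)^2))
  # Squared Pearson correlation, the R-squared definition used in this analysis
  r_squared_cor <- cor(predicted, actual)^2
  # Coefficient of determination, 1 - SSE / SST
  r_squared_det <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
  mape <- mean(abs((actual - predicted) / actual)) * 100
  data.frame(
    RMSE = rmse,
    R_squared_cor = r_squared_cor,
    R_squared_det = r_squared_det,
    MAPE = mape
  )
}

# Toy values for illustration only
actual <- c(1.0, 2.0, 4.0)
predicted <- c(1.0, 2.0, 2.0)
metrics <- regression_metrics(actual, predicted)
print(metrics)
```

<p>The squared correlation ignores systematic bias in the predictions, so it can look better than the coefficient of determination when a model consistently over- or under-predicts.</p>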

<hr />
<hr />



<h3 id="build-an-r-shiny-dashboard-app">Build an R Shiny Dashboard App</h3>
<p>We will build a Shiny web application with five plots: the observed average avocado price over time, total volume over time, total bags over time, the average price forecast by the ARIMA model, and the total volume predicted by a multiple linear regression from the current average price and total bags for the selected avocado type.</p>

<p>You can access the application at the following URL: <a href="https://n0gl91-adel-alaa.shinyapps.io/avocado/">https://n0gl91-adel-alaa.shinyapps.io/avocado/</a></p>

<details>
  <summary>Show Code</summary>
  
  ```r
  library(shiny)
library(ggplot2)
library(dplyr)
library(scales)
library(forecast)
library(zoo)

# Load your summarized data frames
monthly_region <- readRDS("monthly_region.RDS")
organic_data <- readRDS("organic_data.RDS")
conventional_data <- readRDS("conventional_data.RDS")

# Load US region data
us_regions <- read.csv("Us-Region.csv")

# Define UI
ui <- fluidPage(
  tags$head(
    tags$style(HTML("
      /* Add your custom CSS styles here */
      #sidebar-content {
        animation: slideInLeft 0.5s forwards;
      }
      .plot-title {
        color: #333;
        text-align: center;
        margin-bottom: 20px;
      }
      .plot-container {
        background-color: #f5f5f5;
        border-radius: 10px;
        box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
        padding: 20px;
        margin-bottom: 20px;
      }
    "))
  ),
  titlePanel("Avocado Sales Analysis"),
  
  sidebarLayout(
    sidebarPanel(
      id = "sidebar-content",
      selectInput("region", "Choose a Region:", choices = unique(monthly_region$REGION)),
      selectInput("type", "Choose Type:", choices = c("All", "Organic", "Conventional"))
    ),
    
    mainPanel(
      div(class = "plot-container",
          plotOutput("avgPricePlot"),
          h3("Average Price Over Time", class = "plot-title")),
      
      div(class = "plot-container",
          plotOutput("totalVolumePlot"),
          h3("Total Volume Over Time", class = "plot-title")),
      
      div(class = "plot-container",
          plotOutput("totalBagsPlot"),
          h3("Total Bags Over Time", class = "plot-title")),
      
      div(class = "plot-container",
          plotOutput("predictedAvgPricePlot"),
          h3("Predicted Average Price for Next 4 Quarters", class = "plot-title")),
      
      div(class = "plot-container",
          plotOutput("predictedTotalVolumePlot"),
          h3("Predicted Total Volume for Next Month (Multiple Linear Regression)", class = "plot-title"))
    )
  )
)

# Define server logic
server <- function(input, output) {
  # Reactive function to filter data based on selected region and type
  selectedData <- reactive({
    if (input$type == "All") {
      filter(monthly_region, REGION == input$region)
    } else if (input$type == "Organic") {
      filter(organic_data, REGION == input$region)
    } else if (input$type == "Conventional") {
      filter(conventional_data, REGION == input$region)
    }
  })
  
  # Model Training (Ensure it's within a reactive expression)
  model_lm <- reactive({
    lm(Total_VOLUME ~ Avg_price + Total_TOTAL_BAGS, data = selectedData())
  })
  
  # Render average price plot
  output$avgPricePlot <- renderPlot({
    ggplot(selectedData(), aes(x = MONTH_YEAR, y = Avg_price)) +
      geom_line(color = "blue") +
      labs(x = "Month", y = "Average Price") +
      theme_minimal()
  })
  
  # Render total volume plot
  output$totalVolumePlot <- renderPlot({
    ggplot(selectedData(), aes(x = MONTH_YEAR, y = Total_VOLUME)) +
      geom_line(color = "green") +
      labs(x = "Month", y = "Total Volume") +
      theme_minimal()
  })
  
  # Render total bags plot
  output$totalBagsPlot <- renderPlot({
    ggplot(selectedData(), aes(x = MONTH_YEAR, y = Total_TOTAL_BAGS)) +
      geom_line(color = "orange") +
      labs(x = "Month", y = "Total Bags") +
      theme_minimal()
  })
  
  # Render predicted average price plot
  output$predictedAvgPricePlot <- renderPlot({
    # Convert data to time series
    ts_data <- ts(selectedData()$Avg_price, start = c(min(selectedData()$MONTH_YEAR), 1), frequency = 4)
    # Apply ARIMA model
    arima_model <- auto.arima(ts_data)
    # Generate forecasts for the next 4 quarters
    forecast_values <- forecast(arima_model, h = 4)
    # Plot forecasts
    autoplot(forecast_values) + 
      labs(y = "Predicted Avg Price") +
      theme_minimal() +
      scale_x_yearqtr(format = "%Y Q%q")
  })
  
  # Render predicted total volume plot using multiple linear regression model
  output$predictedTotalVolumePlot <- renderPlot({
    req(selectedData())  # Ensure selectedData is available
    model <- model_lm()  # Retrieve the reactive model
    # Use the predictor values from the latest available month
    latest_month_data <- tail(selectedData(), 1)
    # Predict total volume from those values (the latest month's predictors stand in for next month's)
    predicted_total_volume <- predict(model, newdata = latest_month_data)
    # Plot predicted total volume
    ggplot() +
      geom_point(aes(x = 1, y = predicted_total_volume), color = "red", size = 3) +
      labs(y = "Predicted Total Volume") +
      theme_minimal()
  })
}

# Run the application
shinyApp(ui = ui, server = server)

```

</details>
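
<p>The last panel of the app feeds the latest observed month back into the regression, so a genuine next-month prediction needs predictor values for that month. The sketch below shows one way to do this with <code>predict</code> and a prediction interval. The data is synthetic and the next-month values are hypothetical assumptions; only the model formula is taken from the app.</p>

```r
# Synthetic monthly data with the same column names as the app's data frames
set.seed(1)
monthly <- data.frame(
  Avg_price = runif(24, min = 1.0, max = 2.0),
  Total_TOTAL_BAGS = runif(24, min = 1e5, max = 2e5)
)
monthly$Total_VOLUME <- 5e5 - 1.5e5 * monthly$Avg_price +
  1.2 * monthly$Total_TOTAL_BAGS + rnorm(24, sd = 1e4)

# Same model formula as in the app
model <- lm(Total_VOLUME ~ Avg_price + Total_TOTAL_BAGS, data = monthly)

# Assumed predictor values for next month (hypothetical; in practice these
# could come from a price forecast and an estimate of bags sold)
next_month <- data.frame(Avg_price = 1.5, Total_TOTAL_BAGS = 1.5e5)

# Point prediction with a 95% prediction interval
next_volume <- predict(model, newdata = next_month,
                       interval = "prediction", level = 0.95)
print(next_volume)
```

<p>Plotting the interval alongside the point estimate would show how uncertain the single red point in the app is.</p>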


<hr />
<hr />

<h2 id="results">Results</h2>

1. **Average Price Over Time**:
   - **Organic Price Premium**: Organic avocados consistently command higher prices compared to conventional ones.
   - **Seasonal Trends**: Both organic and conventional avocados exhibit similar seasonal trends, with peak prices in the second half of each year and lower prices in the first half.
   - **Year 2017 Anomaly**: The year 2017 stands out with the lowest average price for organic avocados and the highest average price for both types, indicating a significant market shift or external influence.

2. **Total Volume Sold Over Time**:
   - There is a strong negative correlation between average price and total volume sold, with peaks in total volume sold in the first half of each year when prices are lower.
   - Total volume sold for conventional avocados is higher than for organic avocados.

3. **Total Bags Sold Over Time**:
   - Higher total volume sold corresponds to an increase in total bags sold, indicating a growing consumption trend over time.
   - There is an inverse relationship between average price and total bags sold, with higher sales volumes in the first half of each year when prices are lower.

4. **Monthly Fluctuations in Avocado Prices**:
   - Organic avocados consistently command higher prices.
   - Both organic and conventional avocados exhibit peak prices in the second half of each year, particularly in September and October.

5. **Seasonal Fluctuations in Avocado Volume Sold**:
   - Organic avocados consistently command higher prices compared to conventional ones.
   - Both types show peak volumes sold in May for 2015 and 2016, with an anomaly in 2017.

6. **Quarterly Analysis of Avocado Volume Sold**:
   - Significant volume changes are observed at the end of each quarter.
   - Lower prices in the first quarter correlate with higher sales volumes, indicating price sensitivity among consumers.

7. **Detailed Quarterly Summary Analysis**:
   - Steady increase in total volume sold for organic avocados each year.
   - Seasonal peaks are consistent across 2015 and 2016, particularly in May.
   - Anomaly in 2017 with a drop in total volume sold from April to May for both types.

8. **Quarterly Analysis of Avocado Prices and Volumes**:
   - There is an upward trend in average avocado prices, peaking in Q3 2017.
   - Total volume sold fluctuates, with peaks in the second quarter of each year.

9. **Seasonal Fluctuations in Avocado Prices**:
   - Organic avocados consistently command higher prices.
   - Prices generally increase over the years, with the total volume sold also increasing significantly.

10. **Impact of Average Price on Customer Consumption**:
    - When the average price is lower, sales of both small and large bags increase, while demand for x-large bags decreases.

11. **Regional Influence on Average Price**:
    - Variation in avocado consumption patterns across regions in response to price changes.
    - Some regions maintain higher average prices while others show more price volatility.

12. **Correlation Between Average Price and Other Factors**:
    - Strong negative correlation between average price and total volume sold.
    - Negative correlations also observed between average price and the total volume of different avocado types and bag sizes.

13. **Predictive Analysis Using Regression Models**:
    - **Simple Linear Regression**: High MAPE (25.85%) and low R-squared value (0.034), indicating limited predictive accuracy.
    - **Multiple Linear Regression**: Lowest MAPE (14.55%) and highest R-squared value (0.612), providing more accurate predictions and better explanation of variance in avocado prices.
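
The negative price-volume correlation reported in the results can be checked with `cor` and `cor.test`. The sketch below uses synthetic data, and the variable names `avg_price` and `total_volume` are illustrative assumptions rather than columns from the analysis.

```r
# Synthetic data for illustration; the real check would use the avocado data frame
set.seed(7)
avg_price <- runif(50, min = 0.8, max = 2.2)
total_volume <- 1e6 - 3e5 * avg_price + rnorm(50, sd = 5e4)

# Pearson correlation and its significance test
price_volume_cor <- cor(avg_price, total_volume)
price_volume_test <- cor.test(avg_price, total_volume)

cat("Correlation:", round(price_volume_cor, 3), "\n")
cat("p-value:", signif(price_volume_test$p.value, 3), "\n")
```

A strong correlation alone does not establish price sensitivity, since seasonality may drive both price and volume.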

<hr />
<hr />


<h2 id="ConClusion">Conclusion</h2>

In this analysis, we aimed to predict the average selling price of avocados across various regions in the United States. We employed a comprehensive approach, including data collection, wrangling, exploratory data analysis, and predictive modeling using various regression models.

Key findings include:
- **Seasonal Trends**: Avocado prices and volumes exhibit strong seasonal trends, with prices peaking in the second half of each year and volumes sold peaking in the first half.
- **Organic Price Premium**: Organic avocados consistently command higher prices compared to conventional ones.
- **Volume-Price Relationship**: There is a strong inverse relationship between average price and total volume sold, indicating price sensitivity among consumers.
- **Regional Variations**: Different regions exhibit varying consumption patterns and price sensitivities.
- **Model Comparison**: The Multiple Linear Regression model outperformed other models, providing more accurate predictions and better explanatory power for avocado prices.

By understanding these trends and correlations, stakeholders in the avocado market can make informed decisions regarding pricing strategies, inventory management, and market planning. The insights gained from this analysis are integrated into a Shiny web application, providing a user-friendly interface for visualizing and predicting avocado prices and volumes.

You can access the application at the following URL: [https://n0gl91-adel-alaa.shinyapps.io/avocado/](https://n0gl91-adel-alaa.shinyapps.io/avocado/).