# Fruit and Vegetable Prices


## Data set selection

> In this section, you will need to provide the following information about the selected data set:

### Source (with link) 
The dataset comes from the USDA Economic Research Service (ERS) Fruit and Vegetable Prices database.  
Link: https://www.ers.usda.gov/data-products/fruit-and-vegetable-prices/

---

### Fields

- **`Fruit`** – Name of the fruit  
- **`Form`** – The form in which the fruit is sold (Fresh, Canned, Juice, Frozen, etc.)  
- **`RetailPrice`** – Price per unit as sold (e.g., per pound, per pint)  
- **`RetailPriceUnit`** – Unit of pricing  
- **`Yield`** – Edible yield percentage after removing waste (peel, core, etc.)  
- **`CupEquivalentSize`** – How much product is needed to make 1 edible cup  
- **`CupEquivalentUnit`** – Unit used for the cup conversion  
- **`CupEquivalentPrice`** – Cost to obtain one edible cup of the fruit  
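
The fields above appear to be related: for the rows previewed later in this notebook, `CupEquivalentPrice` is approximately `RetailPrice × CupEquivalentSize ÷ Yield`, once `CupEquivalentSize` is converted to the retail unit (8 fluid ounces is half a US pint). The sketch below checks that relationship on three of those rows. The formula is inferred from the sample values rather than quoted from USDA documentation, and the helper name `cup_equivalent_price` is hypothetical.

```python
# Hypothetical helper: the formula is inferred from the sample rows,
# not taken from USDA documentation.
def cup_equivalent_price(retail_price, edible_yield, cup_size_in_retail_units):
    """Estimate the cost of one edible cup of fruit."""
    return retail_price * cup_size_in_retail_units / edible_yield


FLUID_OUNCES_PER_PINT = 16  # US liquid pint

rows = [
    # (label, retail price, yield, cup size in retail units, reported cup price)
    ("Apples, fresh", 1.8541, 0.90, 0.2425, 0.4996),
    ("Apricots, fresh", 3.6162, 0.93, 0.3638, 1.4145),
    ("Apples, ready-to-drink juice", 0.8699, 1.00, 8.0 / FLUID_OUNCES_PER_PINT, 0.4349),
]

for label, price, edible_yield, cup_size, reported in rows:
    estimate = cup_equivalent_price(price, edible_yield, cup_size)
    print(f"{label}: estimated {estimate:.4f}, reported {reported:.4f}")
```

The same relationship suggests why yield matters: at a fixed retail price and cup size, a lower yield raises the cost per edible cup.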

---

### License
Creative Commons CCZero  
Link: http://www.opendefinition.org/licenses/cc-zero

### Data set selection rationale

I selected this dataset because it provides real-world economic and nutrition-related data that can support health insights. It includes multiple variables suitable for statistical analysis, and it is clean overall, so I did not have to preprocess it. By analyzing it, I can examine fruit affordability and identify which fruits offer the most value based on price and yield.

### Questions to be answered

> Using statistical analysis and visualization, what questions would you like to be able to answer about this dataset?
> This could include questions such as:

 - What is the average retail price for different fresh fruits?  
 This lets us compare how retail prices vary across fresh fruits.
 - How does the form of the fruit affect the retail price?  
 This helps determine whether processing tends to make a given fruit more or less expensive.
 - What is the relationship between retail price and cup equivalent price?  
 This helps show how much edible fruit consumers actually get for the price they pay.
 - How does yield impact the cost of edible fruit?  
 A fruit with a low yield may be cheap per pound but more expensive per serving.
 - What are the top 5 most affordable fruits based on cup equivalent price?  
 This identifies the 5 most affordable fruits, which are the best candidates for budget planning.  
 - What is the distribution of fruit forms?  
 This shows which types of fruit products dominate the dataset. For stakeholders, it helps show which products consumers have access to.
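
As a minimal sketch of how the top-5 question could be answered, the code below ranks rows by `CupEquivalentPrice` with pandas `nsmallest`. It uses only the five rows shown in the notebook's data preview, so it asks for the three cheapest; on the full dataset `n=5` would apply. The helper name `cheapest_per_cup` is hypothetical.

```python
import pandas as pd

# Five rows copied from the notebook's data preview; the full file has 62 rows.
sample = pd.DataFrame({
    "Fruit": [
        "Apples",
        "Apples, applesauce",
        "Apples, ready-to-drink",
        "Apples, frozen concentrate",
        "Apricots",
    ],
    "Form": ["Fresh", "Canned", "Juice", "Juice", "Fresh"],
    "CupEquivalentPrice": [0.4996, 0.6323, 0.4349, 0.3043, 1.4145],
})


def cheapest_per_cup(frame, n=5):
    """Return the n rows with the lowest cup-equivalent price, cheapest first."""
    columns = ["Fruit", "Form", "CupEquivalentPrice"]
    return frame.nsmallest(n, "CupEquivalentPrice")[columns]


top3 = cheapest_per_cup(sample, n=3)
print(top3.to_string(index=False))
```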

### Visualization ideas

> Provide a few examples of what you plan to visualize to answer the questions you posed in the previous section. In this project, you will be producing 6-8 visualizations. You will also be producing an interactive chart using Plotly.
1. Interactive bar chart using Plotly for the average retail prices. This would show how fresh fruit prices vary across the dataset. Plotly's interactivity keeps the chart readable even with many fruit labels on the axis.  
2. Grouped bar chart for retail price vs form. This would show whether fresh, canned, frozen, or juice products tend to be cheaper or more expensive.  
3. Scatter plot for retail price vs cup equivalent price. This would help identify the correlation between retail price and cup equivalent price so that stakeholders can see which fruits offer good value. I would use Plotly for this chart because its hover labels let stakeholders see exact price values and make correlation patterns clearer.  
4. Scatter plot for the impact of yield on cup equivalent price. This would show how edible yield affects how much consumers pay. A low-yield fruit may seem cheap per pound but be expensive per cup.  
5. Pie chart for the top 5 most affordable fruits. This would help identify the cheapest fruits based on cup equivalent price and could be used for budget planning.  
6. Pie chart for the distribution of fruit forms. This would help show which types of fruit products dominate the dataset.
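
As a rough illustration of idea 6, the sketch below counts forms with pandas `value_counts` and draws a Matplotlib pie chart. It uses only the `Form` values from the five previewed rows, so the proportions are not those of the full dataset; the variable names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Form values from the five previewed rows; the real chart would use df["Form"].
forms = pd.Series(["Fresh", "Canned", "Juice", "Juice", "Fresh"], name="Form")

# Count how many rows fall under each form.
form_counts = forms.value_counts()

fig, ax = plt.subplots(figsize=(5, 5))
ax.pie(
    form_counts.values,
    labels=form_counts.index.tolist(),
    autopct="%1.0f%%",  # show each slice's share as a whole percentage
    startangle=90,
)
ax.set_title("Distribution of fruit forms (five-row sample)")
# In Jupyter the figure renders at the end of the cell; in a script, call plt.show().
```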



In [45]:
# 🚀 Importing some libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

try:
    import plotly.express as px
    import plotly.graph_objects as go
    PLOTLY_AVAILABLE = True
except ImportError:
    PLOTLY_AVAILABLE = False
    print("Note: Plotly not available. Interactive visualizations will be skipped.")

print("✅ Libraries loaded successfully!")

✅ Libraries loaded successfully!


### Loading the Data
Load the 2022 fruit prices file from the Fruit and Vegetable Prices dataset.

In [None]:
# Load the 2022 fruit prices CSV
df = pd.read_csv('data/Fruit-Prices-2022.csv')

# Display basic information about the dataset
print('Dataset Shape:', df.shape)

display(df.head())

Dataset Shape: (62, 8)


| | Fruit | Form | RetailPrice | RetailPriceUnit | Yield | CupEquivalentSize | CupEquivalentUnit | CupEquivalentPrice |
|---|---|---|---|---|---|---|---|---|
| 0 | Apples | Fresh | 1.8541 | per pound | 0.9 | 0.2425 | pounds | 0.4996 |
| 1 | Apples, applesauce | Canned | 1.1705 | per pound | 1.0 | 0.5401 | pounds | 0.6323 |
| 2 | Apples, ready-to-drink | Juice | 0.8699 | per pint | 1.0 | 8.0 | fluid ounces | 0.4349 |
| 3 | Apples, frozen concentrate | Juice | 0.6086 | per pint | 1.0 | 8.0 | fluid ounces | 0.3043 |
| 4 | Apricots | Fresh | 3.6162 | per pound | 0.93 | 0.3638 | pounds | 1.4145 |


## 📊 Visualization Idea 1: Average Retail Price of Fresh Fruits

In [None]:
# Retrieve retail price for fresh fruits
retail_price = df.loc[df['Form'] == 'Fresh', ['Fruit', 'Form', 'RetailPrice']]

# Interactive Plotly bar chart; histfunc='avg' averages RetailPrice per fruit.
# Skipped when the Plotly import failed in the first cell.
if PLOTLY_AVAILABLE:
    fig = px.histogram(
        retail_price, x='Fruit', y='RetailPrice',
        histfunc='avg',
        title='Interactive Bar Chart: Average Retail Price of Fresh Fruits',
        labels={'RetailPrice': 'Retail Price'}
    )
    fig.show()

In [49]:
# 📊 Visualization Idea 2: Retail Price vs Form (list the distinct forms first)
display(df['Form'].unique())

array(['Fresh', 'Canned', 'Juice', 'Dried', 'Frozen'], dtype=object)
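
With the distinct forms known, one possible next step for the form comparison is a per-form average. In the previewed rows `RetailPrice` mixes units (per pound and per pint), so this sketch averages `CupEquivalentPrice` instead, which is expressed per edible cup on every row; that substitution is an assumption about what makes a fair comparison, not something the notebook prescribes. The data is limited to the five previewed rows, and the variable names are hypothetical.

```python
import pandas as pd

# Five rows copied from the data preview; the full file has 62 rows.
sample = pd.DataFrame({
    "Form": ["Fresh", "Canned", "Juice", "Juice", "Fresh"],
    "RetailPriceUnit": ["per pound", "per pound", "per pint", "per pint", "per pound"],
    "CupEquivalentPrice": [0.4996, 0.6323, 0.4349, 0.3043, 1.4145],
})

# Mean cup-equivalent price and row count per form, cheapest form first.
mean_by_form = (
    sample.groupby("Form")["CupEquivalentPrice"]
    .agg(["mean", "count"])
    .sort_values("mean")
)
print(mean_by_form.round(4))
```

The `count` column is kept alongside the mean because a form represented by only one or two rows gives a weak basis for comparison.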