# Data Challenge: Creating Interactive Plotly Visuals

## Targeted KSBs (Knowledge, Skills, and Behaviors)

- **S6** – Demonstrates mastery in creating dynamic visualizations using Python (Plotly)
- **K10** – Applies chart selection principles based on data types, variables, and audience needs
- **S12** – Performs comprehensive data exploration to uncover patterns and relationships

---

## Dataset Description:

This dataset contains information about various Indian dishes, including their ingredients, preparation time, cooking time, and flavor profile. You can read more about the data [here](https://www.kaggle.com/datasets/nehaprabhavalkar/indian-food-101?select=indian_food.csv).

---

## Task 1: Plot a Scatter Plot of Prep Time vs Cook Time
### Objective:
Create a scatter plot showing the relationship between **prep time** and **cook time** for each food item. Use Plotly to visualize these two variables and identify any patterns.

### Instructions:
1. **Load the dataset** using `pandas`.
2. Use **Plotly Express** to create a **scatterplot**.
3. Set **prep_time** on the x-axis and **cook_time** on the y-axis.
4. **Label** the axes appropriately and add a **title** for clarity.

```python
# Run this cell without changes
import pandas as pd
import plotly.express as px
```

```python
# Read in the data (data/indian_food.csv). Hint: use pandas to read in the CSV.
df = pd.read_csv("data/indian_food.csv")
```

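```python
# Create a scatterplot of prep_time on the x-axis and cook_time on the y-axis
fig = px.scatter(
    df,
    x='prep_time',
    y='cook_time',
    title='Cook Time vs. Prep Time',
    labels={'prep_time': 'Prep time (minutes)', 'cook_time': 'Cook time (minutes)'},
)

# Show the plot
fig.show()
```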

### What insights did you get from Task 1?

There doesn't appear to be a clear relationship between prep time and cook time: dishes with longer prep times don't consistently take longer to cook. Cook time stays below a certain threshold, with one outlier above it. Without more context about how the data was collected, I can't explain these patterns.
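One way to put a number on "no clear relationship" is a correlation coefficient. This sketch assumes -1 marks missing times, masks those values, and returns Pearson's r (values near 0 suggest no linear relationship); `prep_cook_correlation` is a hypothetical helper. Passing `method='spearman'` gives a rank-based alternative that is less sensitive to the outlier, though with `Series.corr` that option relies on SciPy being installed.

```python
import pandas as pd


def prep_cook_correlation(df, method='pearson'):
    """Correlation between prep_time and cook_time, ignoring -1 placeholders.

    Assumption: -1 means "missing" in both columns.
    """
    # mask() turns every -1 into NaN; Series.corr then skips rows containing NaN
    prep = df['prep_time'].mask(df['prep_time'] == -1)
    cook = df['cook_time'].mask(df['cook_time'] == -1)
    return prep.corr(cook, method=method)


# Usage in the notebook: print(prep_cook_correlation(df))
```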

## Task 2: Bar Chart of Cook Time by Region

### Objective:
Create a bar chart that shows the average cook time for each region. This will help us compare typical cooking times across regions. **There is a "weird" bar in the chart. Why is that the case?**


### Instructions:
- Group the data by region. (Hint: you may need the `df.groupby()` method here!)

- Calculate the average cook time for each region.

- Create a bar chart using Plotly to show this average cook time for each region.

- Label the axes and title the chart.

```python
# Task 2: Create a bar chart showing the average cook time by region
# Group by region and calculate the average cook time
df_region_avg = df.groupby('region')['cook_time'].mean().reset_index()

# Create the bar chart with labeled axes
fig = px.bar(df_region_avg, x='region', y='cook_time',
             title="Average Cook Time by Region",
             labels={'region': 'Region', 'cook_time': 'Average cook time (minutes)'})

# Show the plot
fig.show()
```


### What insights did you get from Task 2?
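The "weird" bar is labeled -1. This dataset uses -1 as a placeholder for missing values, so that bar groups the dishes with no recorded region rather than an actual region. The Central region has the longest average cook time.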
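To confirm the -1 reading of the "weird" bar, a quick check is to count how often the placeholder appears in every column. This sketch assumes the placeholder may be stored either as the number -1 (numeric columns) or as the text '-1' (text columns such as `region`); `count_placeholders` is a hypothetical name.

```python
import pandas as pd


def count_placeholders(df, placeholder=-1):
    """Count cells equal to the placeholder, as a number or as text, per column."""
    matches = df.isin([placeholder, str(placeholder)])
    counts = matches.sum()
    # Keep only the columns that actually contain the placeholder
    return counts[counts > 0]


# Usage in the notebook: print(count_placeholders(df))
```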

## Task 3: Pie Chart of Flavor Profile Distribution

### Objective:
Create a pie chart showing the distribution of flavor profiles (e.g., spicy, sweet) across the dataset.

### Instructions:
- Use Plotly Express to create a pie chart.

- Plot the flavor_profile column, which will show the distribution of flavor types.

- Ensure the chart is labeled clearly.

```python
# Get the count of each flavor profile
flavor_counts = df['flavor_profile'].value_counts().reset_index()
flavor_counts.columns = ['flavor_profile', 'count']

# Create the pie chart
fig = px.pie(flavor_counts,
             values='count',
             names='flavor_profile',
             title="Flavor Profile Distribution")

# Show the plot
fig.show()
```

### What insights did you get from Task 3?
More than 50% of the dishes have a spicy flavor profile. Sweet is the second most common at 34.5%, and sour is the least common at 0.392%.
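The percentages above can be reproduced without reading them off the chart: `value_counts(normalize=True)` returns each category's share of the non-missing rows. Like the pie, this counts a -1 placeholder category toward the total if one is present. `flavor_percentages` is a hypothetical helper.

```python
import pandas as pd


def flavor_percentages(df, decimals=3):
    """Share of dishes per flavor profile, as percentages rounded to `decimals` places."""
    shares = df['flavor_profile'].value_counts(normalize=True)  # fractions that sum to 1
    return (shares * 100).round(decimals)


# Usage in the notebook: print(flavor_percentages(df))
```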