# How to tell a story with your data (beginner friendly)


You can see the full step-by-step video here: 

Please don’t forget to like and subscribe for more beginner-friendly Python projects. 

First, define the question: "What do you want to know from this data?"
We will answer: "What influences purchase behavior across age groups and countries?"

#### Step 1: Acquire
Let’s import the dataset into Jupyter Notebook and preview it.

In [None]:
import pandas as pd
# Load the data
df = pd.read_csv("cartoon_customer_data.csv")
df.head()

#### Step 2: Clean
Now we’ll clean the data: check for nulls, remove duplicates, and group ages into ranges.

In [None]:
# Check for missing values
df.isnull().sum()

In [None]:
# Drop duplicates
df.drop_duplicates(inplace=True)
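
Before dropping duplicates, it can help to see how many there are. Here is a minimal sketch on a tiny made-up table (the `toy` rows and the `CustomerID` column are assumptions for illustration, not part of the tutorial's dataset):

```python
import pandas as pd

# Made-up rows for illustration only; not the tutorial's dataset.
toy = pd.DataFrame({
    "CustomerID": [1, 2, 2, 3],
    "Age": [17, 34, 34, 52],
    "PurchaseAmount": [20.0, 55.5, 55.5, 80.0],
})

# duplicated() marks rows that exactly repeat an earlier row.
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() returns a new DataFrame and keeps the first copy of each row.
deduped = toy.drop_duplicates()

print(f"Duplicate rows found: {n_duplicates}")
print(f"Rows before: {len(toy)}, rows after: {len(deduped)}")
```

Comparing the row counts before and after is a quick way to confirm the cleanup did what you expected.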

In [None]:
# Create age group
bins = [0, 18, 35, 50, 70]
labels = ["Teen", "Young Adult", "Adult", "Senior"]
df["Age Group"] = pd.cut(df["Age"], bins=bins, labels=labels)

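This creates the age ranges (also called bins).
The bin edges split ages into four groups. By default, `pd.cut` includes the right edge of each range but not the left:
- Above 0 up to 18 = "Teen"
- Above 18 up to 35 = "Young Adult"
- Above 35 up to 50 = "Adult"
- Above 50 up to 70 = "Senior"

So an 18-year-old lands in "Teen", and any age above 70 is left without a group (it becomes NaN).
Think of this as drawing boundary lines to group ages into buckets.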
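
To see where the boundaries fall, here is a small sketch that runs the same bins and labels on a few made-up ages chosen to sit on or near the edges (the `ages` values are assumptions for illustration):

```python
import pandas as pd

bins = [0, 18, 35, 50, 70]
labels = ["Teen", "Young Adult", "Adult", "Senior"]

# Made-up ages chosen to sit on or near the bin edges.
ages = pd.Series([18, 19, 35, 36, 70, 71], name="Age")

# Same call as in the tutorial: each bin includes its right edge by default.
groups = pd.cut(ages, bins=bins, labels=labels)
print(pd.DataFrame({"Age": ages, "Age Group": groups}))

# Ages that fall outside every bin come back as NaN.
n_unassigned = int(groups.isna().sum())
print(f"Ages outside the bins: {n_unassigned}")
```

If your real data has customers older than 70, you may want to widen the last bin so they are not dropped from the age-group charts.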

#### Step 3: Visualize

Let’s bring this story to life with some visuals.

In [None]:
# First, import matplotlib, the plotting library used for these visualizations
import matplotlib.pyplot as plt

In [None]:
# a. Bar Chart: Total Purchase Amount by Country
country_sales = df.groupby("Country")["PurchaseAmount"].sum().sort_values()

plt.figure(figsize=(10, 6))
country_sales.plot(kind='barh', color='skyblue')
plt.title("Total Purchase Amount by Country")
plt.xlabel("Total Sales ($)")
plt.ylabel("Country")
plt.tight_layout()
plt.show()

Bar charts like this help us quickly compare totals and spot who’s contributing the most.
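
If you also want the numbers behind the bars, one option is to turn each country's total into a share of overall sales. This is a rough sketch on made-up rows (the country names and amounts are assumptions, not values from the tutorial's dataset):

```python
import pandas as pd

# Made-up rows for illustration only; not the tutorial's dataset.
toy = pd.DataFrame({
    "Country": ["USA", "Canada", "USA", "Mexico", "Canada"],
    "PurchaseAmount": [100.0, 40.0, 150.0, 50.0, 60.0],
})

# Total sales per country, largest first.
country_sales = (
    toy.groupby("Country")["PurchaseAmount"].sum().sort_values(ascending=False)
)

# Each country's share of all sales, as a percentage.
country_share = (country_sales / country_sales.sum() * 100).round(1)

print(pd.DataFrame({"Total": country_sales, "Share (%)": country_share}))
```

A table like this pairs well with the chart: the bars show the ranking at a glance, and the shares give you exact figures to quote.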

In [None]:
# b. Bar Chart: Average Purchase by Age Group
# observed=False keeps every age group in the result, even one with no customers
age_avg = df.groupby("Age Group", observed=False)["PurchaseAmount"].mean()

plt.figure(figsize=(8, 5))
age_avg.plot(kind='bar', color='lightgreen')
plt.title("Average Purchase Amount by Age Group")
plt.ylabel("Average Purchase ($)")
plt.xlabel("Age Group")
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

If the younger age groups show the highest averages here, that suggests younger customers might be more impulsive or trend-driven, and businesses targeting them could benefit from upselling.
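
An average can be misleading when a group has only a handful of customers, so it is worth checking group sizes next to the means. Here is a minimal sketch on made-up rows (the ages and amounts are assumptions for illustration):

```python
import pandas as pd

# Made-up rows for illustration only; not the tutorial's dataset.
toy = pd.DataFrame({
    "Age": [16, 22, 25, 30, 41, 45, 63],
    "PurchaseAmount": [90.0, 40.0, 50.0, 60.0, 70.0, 30.0, 20.0],
})

bins = [0, 18, 35, 50, 70]
labels = ["Teen", "Young Adult", "Adult", "Senior"]
toy["Age Group"] = pd.cut(toy["Age"], bins=bins, labels=labels)

# Show how many customers are in each group alongside the average purchase.
summary = (
    toy.groupby("Age Group", observed=False)["PurchaseAmount"]
    .agg(["count", "mean"])
)
print(summary)
```

In this toy table the "Teen" average comes from a single customer, which is exactly the kind of detail a bar chart alone would hide.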

In [None]:
# c. Pie Chart: Customer Breakdown by Gender

gender_counts = df["Gender"].value_counts()

plt.figure(figsize=(6, 6))
plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%', colors=['#66b3ff','#ff9999','#99ff99'])
plt.title("Customer Gender Breakdown")
plt.axis('equal')  # Keeps the pie chart circular
plt.show()


This pie chart shows how the customer base splits by gender, giving a clear, quick snapshot of your audience. In this dataset, it shows 64.2% male and 35.8% female. This helps us understand who we’re talking to, and it can guide marketing strategies, ad visuals, or even product offerings.
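
The percentages on a pie chart can also be computed directly, which is handy when you want to quote them in text. A small sketch, assuming a made-up `Gender` column (these four rows are for illustration and will not reproduce the figures above):

```python
import pandas as pd

# Made-up rows for illustration only; not the tutorial's dataset.
toy = pd.DataFrame({"Gender": ["Male", "Female", "Male", "Male"]})

# Raw counts per category, as used for the pie chart.
gender_counts = toy["Gender"].value_counts()

# normalize=True returns fractions instead of counts; multiply by 100 for percent.
gender_pct = (toy["Gender"].value_counts(normalize=True) * 100).round(1)

print(pd.DataFrame({"Customers": gender_counts, "Share (%)": gender_pct}))
```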

Now that we’ve cleaned and explored the data, we can say that age and country both play a role in spending behavior.
This is how we go from messy data to story-driven insights. 
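
To look at age and country together rather than one chart at a time, one possible approach is a pivot table of average purchase by both. This is a sketch on made-up rows (the countries, groups, and amounts are assumptions for illustration):

```python
import pandas as pd

# Made-up rows for illustration only; not the tutorial's dataset.
toy = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Canada"],
    "Age Group": ["Teen", "Teen", "Adult", "Teen", "Adult", "Adult"],
    "PurchaseAmount": [80.0, 100.0, 40.0, 60.0, 30.0, 50.0],
})

# Rows are age groups, columns are countries, cells are average purchase.
pivot = toy.pivot_table(
    values="PurchaseAmount",
    index="Age Group",
    columns="Country",
    aggfunc="mean",
)
print(pivot)
```

Reading across a row shows how one age group differs by country; reading down a column shows how age groups differ within a country.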

Now let’s save the cleaned dataset for later projects.

In [None]:
df.to_csv("cleaned_cartoon_customer_data.csv", index=False) 

You can also see the full step-by-step video here: 
