# Lab | Data Aggregation and Filtering

In this challenge, we will continue to work with customer data from an insurance company. We will use the dataset called marketing_customer_analysis.csv, which can be found at the following link:

https://raw.githubusercontent.com/data-bootcamp-v4/data/main/marketing_customer_analysis.csv

This dataset contains information such as customer demographics, policy details, vehicle information, and the customer's response to the last marketing campaign. Our goal is to explore and analyze this data by first performing data cleaning, formatting, and structuring.
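The task list below names columns in snake_case (for example `total_claim_amount`), while the CSV headers use title case with spaces (for example `Total Claim Amount`). As a minimal sketch of the formatting step, assuming it is enough to lowercase the headers and replace spaces with underscores, something like the following could be used. The helper name `normalize_columns` and the toy rows are hypothetical and only serve as illustration.

```python
import pandas as pd


def normalize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column names are lowercase and underscore-separated.

    Hypothetical helper for illustration; it is not part of the lab's code.
    """
    renamed = frame.copy()
    renamed.columns = (
        renamed.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
    )
    return renamed


# Toy rows with made-up values (not taken from the real dataset)
toy = pd.DataFrame({
    "Total Claim Amount": [250.0, 1200.0],
    "Policy Type": ["Personal Auto", "Corporate Auto"],
    "Response": ["Yes", "No"],
})

clean = normalize_columns(toy)
print(list(clean.columns))
```

Returning a copy leaves the original headers intact, so code that still refers to `'Total Claim Amount'` keeps working on the original frame.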

1. Create a new DataFrame that only includes customers who:
   - have a **low total_claim_amount** (e.g., below $1,000),
   - responded "Yes" to the last marketing campaign.

2. Using the original DataFrame:
   - analyze the average `monthly_premium` and/or customer lifetime value by `policy_type` and `gender` for customers who responded "Yes", and
   - compare these insights to `total_claim_amount` patterns, and discuss which segments appear most profitable or low-risk for the company.

3. Analyze the total number of customers who have policies in each state, and then filter the results to only include states where there are more than 500 customers.

4. Find the maximum, minimum, and median customer lifetime value by education level and gender. Write your conclusions.
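To make the difference between tasks 1 and 2 concrete, here is a small sketch on toy data. It assumes the CSV's original column names, and every value in the frame is made up. Task 1 combines two conditions, whereas task 2 starts again from the full DataFrame and keeps every "Yes" responder regardless of claim amount. The threshold of 2 in the last step stands in for the lab's 500 only because the toy frame is tiny.

```python
import pandas as pd

# Toy data with illustrative values only (not from the real dataset)
toy = pd.DataFrame({
    "State": ["California", "California", "Oregon", "California", "Oregon", "Nevada"],
    "Response": ["Yes", "No", "Yes", "Yes", "No", "Yes"],
    "Gender": ["F", "M", "F", "F", "M", "M"],
    "Policy Type": ["Personal Auto"] * 3 + ["Corporate Auto"] * 3,
    "Monthly Premium Auto": [100.0, 80.0, 60.0, 90.0, 70.0, 110.0],
    "Total Claim Amount": [300.0, 1500.0, 200.0, 1200.0, 400.0, 500.0],
})

# Task 1: both conditions must hold, so combine the boolean masks with &
low_claim_yes = toy[(toy["Total Claim Amount"] < 1000) & (toy["Response"] == "Yes")]

# Task 2: start from the full frame and keep every "Yes" responder,
# regardless of claim amount, before grouping
responders = toy[toy["Response"] == "Yes"]
avg_premium = responders.groupby(["Policy Type", "Gender"])["Monthly Premium Auto"].mean()

# Task 3: count rows per state, then keep only states above a threshold
# (2 here because the toy frame is tiny; the lab asks for 500)
state_counts = toy["State"].value_counts()
busy_states = state_counts[state_counts > 2]

print(len(low_claim_yes), len(responders))
print(avg_premium)
print(busy_states)
```

In this toy frame, one "Yes" responder has a claim above 1000, so the task 1 subset is smaller than the responder group used for task 2.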

import pandas as pd

# Load the dataset
file_path = "marketing_customer_analysis.csv"  # Update path if needed
df = pd.read_csv(file_path)

# Step 1: Filter for customers with Total Claim Amount < 1000 and Response == "Yes"
low_claim_responders = df[(df['Total Claim Amount'] < 1000) & (df['Response'] == 'Yes')]

# Step 2: Average 'Monthly Premium Auto' and 'Customer Lifetime Value' by 'Policy Type' and 'Gender'
# (computed on the Step 1 subset: "Yes" responders with Total Claim Amount < 1000)
premium_clv_analysis = pd.pivot_table(
    low_claim_responders,
    values=['Monthly Premium Auto', 'Customer Lifetime Value'],
    index=['Policy Type', 'Gender'],
    aggfunc='mean'
).round(2)

# Step 3: Average 'Total Claim Amount' for the same groups, for comparison
claim_amount_analysis = pd.pivot_table(
    low_claim_responders,
    values='Total Claim Amount',
    index=['Policy Type', 'Gender'],
    aggfunc='mean'
).round(2)

# Step 4: Count the number of customers in each state
customers_by_state = df['State'].value_counts()

# Step 5: Filter for states with more than 500 customers
states_over_500 = customers_by_state[customers_by_state > 500]

# Step 6: Get max, min, and median CLV by 'Gender' and 'Education'
clv_stats = df.groupby(['Gender', 'Education'])['Customer Lifetime Value'].agg(['max', 'min', 'median']).round(2)

# Print results
print("=== Avg Monthly Premium & CLV by Policy Type and Gender ===")
print(premium_clv_analysis)
print("\n=== Avg Total Claim Amount by Policy Type and Gender ===")
print(claim_amount_analysis)
print("\n=== States with More than 500 Customers ===")
print(states_over_500)
print("\n=== CLV Stats by Gender and Education ===")
print(clv_stats)


=== Avg Monthly Premium & CLV by Policy Type and Gender ===
                       Customer Lifetime Value  Monthly Premium Auto
Policy Type    Gender                                               
Corporate Auto F                       7334.77                 89.04
               M                       7920.40                 88.55
Personal Auto  F                       7966.93                 90.62
               M                       7481.82                 87.43
Special Auto   F                       7594.01                 86.71
               M                       8348.23                 80.00

=== Avg Total Claim Amount by Policy Type and Gender ===
                       Total Claim Amount
Policy Type    Gender                    
Corporate Auto F                   407.80
               M                   388.25
Personal Auto  F                   404.97
               M                   427.87
Special Auto   F                   426.66
               M                   3