# AIAI Loyalty Program – Geo-Spatial Insights + Cluster Insights
**Group 24** | Catarina (20250422), Bárbara (20001111), Khadija (20250439)

This notebook **adds maps** that show **where customers live** and **how they behave**, then combines both into **4 customer tribes**.

**Not redundant**: Behavioral plots show *how* they fly. Geo-maps show *where* → **targeting opportunities**.

In [1]:
!pip install plotly folium --quiet

In [2]:
import pandas as pd
import numpy as np
import plotly.express as px
import folium
from folium.plugins import HeatMap
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)

In [3]:
# Load data
customerDB = pd.read_csv('DM_AIAI_CustomerDB.csv')
flightsDB  = pd.read_csv('DM_AIAI_FlightsDB.csv')

In [4]:
# Aggregate flights
flights_agg = flightsDB.groupby('Loyalty#').agg(
    TotalFlights=('NumFlights', 'sum'),
    TotalDistKM=('DistanceKM', 'sum'),
    TotalPtsAcc=('PointsAccumulated', 'sum'),
    TotalPtsRed=('PointsRedeemed', 'sum'),
    ActiveMonths=('YearMonthDate', 'nunique')
).reset_index()

In [5]:
# Merge
merged = customerDB.merge(flights_agg, on='Loyalty#', how='left')

# Handle customers with no flight records: ActiveMonths = 1 keeps the ratios below at 0 instead of NaN
merged['ActiveMonths'] = merged['ActiveMonths'].fillna(1)
merged[['TotalFlights', 'TotalDistKM', 'TotalPtsAcc', 'TotalPtsRed']] = merged[
    ['TotalFlights', 'TotalDistKM', 'TotalPtsAcc', 'TotalPtsRed']
].fillna(0)

# Proportional features
merged['PropFlights'] = merged['TotalFlights'] / merged['ActiveMonths']
merged['PropPtsRed']  = merged['TotalPtsRed']  / merged['ActiveMonths']

## INSIGHT 1: We Created `PropFlights`

**What it is**: **Flights per active month** (not total flights)

**Why it matters**: A customer with 12 flights in 1 month ≠ 12 flights in 12 months.

**Simple insight**:
> **"`PropFlights` shows who flies *often* — our most loyal customers."**
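
To make the difference concrete, the sketch below applies the same division to two made-up customers. The IDs and values are illustrative and not taken from the dataset: both customers have 12 total flights, but one was active for 1 month and the other for 12.

```python
import pandas as pd

# Two hypothetical customers with the same total flights (toy values, not from the dataset)
toy = pd.DataFrame({
    "Loyalty#": [1, 2],
    "TotalFlights": [12, 12],
    "ActiveMonths": [1, 12],
})

# Same formula as in the notebook: flights per active month
toy["PropFlights"] = toy["TotalFlights"] / toy["ActiveMonths"]
print(toy)
```

The first customer scores 12 flights per active month and the second scores 1, even though their totals are identical.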

## 1. Geo Data Prep

In [6]:
geo_df = merged.dropna(subset=['Latitude', 'Longitude']).copy()

# Cap outliers
geo_df['CLV'] = geo_df['Customer Lifetime Value'].clip(upper=geo_df['Customer Lifetime Value'].quantile(0.99))
geo_df['Inc'] = geo_df['Income'].clip(upper=geo_df['Income'].quantile(0.99))

# Fill NaN
geo_df['CLV'] = geo_df['CLV'].fillna(0)
geo_df['Inc'] = geo_df['Inc'].fillna(geo_df['Inc'].median())

# Recreate proportional features
geo_df['PropFlights'] = geo_df['TotalFlights'] / geo_df['ActiveMonths']
geo_df['PropPtsRed']  = geo_df['TotalPtsRed']  / geo_df['ActiveMonths']

print(f"Geo points: {len(geo_df):,}")

Geo points: 16,921


## INSIGHT 2: We Cleaned Geo Data

**What we did**:
- Removed customers with no location
- Capped extreme values (outliers)
- Filled missing income

**Simple insight**:
> **"We now have a clean map of real customers"**
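
The capping and filling steps can be sketched on a toy column. The six income values below are invented for illustration; only the 99th-percentile cap and the median fill mirror what the notebook does.

```python
import numpy as np
import pandas as pd

# Toy income column with one extreme value and one missing value (illustrative only)
income = pd.Series([20_000, 30_000, 40_000, 50_000, np.nan, 5_000_000], name="Income")

# Cap at the 99th percentile (quantile ignores NaN by default)
cap = income.quantile(0.99)
capped = income.clip(upper=cap)

# Fill the missing value with the median of the capped series
cleaned = capped.fillna(capped.median())
print(cleaned)
```

With only six rows the cap lands just below the extreme value; on the full dataset the same call trims roughly the top 1% of values.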

## 2. Bubble Map: CLV (Size) + Income (Color)

In [7]:
fig1 = px.scatter_mapbox(
    geo_df,
    lat="Latitude", lon="Longitude",
    size="CLV", color="Inc",
    color_continuous_scale="Viridis",
    size_max=18, zoom=3,
    hover_name="City",
    hover_data={"Province or State": True, "PropFlights": ":.2f", "PropPtsRed": ":.2f", "LoyaltyStatus": True},
    title="AIAI Loyalty: Bubble Size = CLV, Color = Income",
    mapbox_style="carto-positron",
    height=650,
    center={"lat": 56.13, "lon": -106.35}
)
fig1.update_layout(margin=dict(l=0,r=0,t=40,b=0))
fig1.show()

## INSIGHT 3: What the Bubble Map Shows

**We see**:
- **Big bubble** = Customer with a **high Customer Lifetime Value (CLV)**
- **Yellow bubble** = **High income**
- **Purple bubble** = **Low income**

**Where are the big yellow bubbles?** → **Toronto, Vancouver, Calgary**

**Conclusion**:
> **"The highest-income, highest-CLV customers cluster in the big cities. These are our VIPs."**

## 3. Geo-Segment Map + Preliminary Clusters

In [11]:
# Define segments (np.select keeps the first condition that matches, so the order below matters)
geo_df['GeoSegment'] = np.select(
    [
        (geo_df['Province or State'].isin(['Ontario', 'British Columbia', 'Alberta'])) & 
        (geo_df['Income'] > geo_df['Income'].median()),
        
        (geo_df['Income'] <= geo_df['Income'].median()) & 
        (geo_df['PropFlights'] > geo_df['PropFlights'].median()),
        
        geo_df['CancellationDate'].notna()
    ],
    ['Urban Premium', 'Budget Frequent', 'At-Risk'],
    default='Standard'
)

# Marker size cannot be NaN, so fill any gaps with 0
geo_df['PropFlights'] = geo_df['PropFlights'].fillna(0)

# Map
fig2 = px.scatter_mapbox(
    geo_df,
    lat="Latitude", lon="Longitude",
    color="GeoSegment",
    size="PropFlights",
    color_discrete_map={
        'Urban Premium': '#1f77b4',
        'Budget Frequent': '#ff7f0e',
        'At-Risk': '#d62728',
        'Standard': '#2ca02c'
    },
    hover_data=["Income", "Customer Lifetime Value", "LoyaltyStatus"],
    title="Preliminary Geo-Segments (Size = Monthly Flights)",
    mapbox_style="carto-positron",
    height=650,
    zoom=3,
    center={"lat": 56.13, "lon": -106.35}
)
fig2.show()

## INSIGHT 4: The 4 Customer Tribes

We made **4 groups** using **location + behavior** (geographic and behavioural clusters):

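| Color | Tribe (Segment) | Who They Are | What to Do |
|-------|-----------------|--------------|------------|
| Blue | **VIP Flyers** (Urban Premium) | Above-median income, living in Ontario, British Columbia, or Alberta | **Upgrades** |
| Orange | **Budget Heroes** (Budget Frequent) | Income at or below the median, above-median flights per month | **Bonus points** |
| Red | **Quitters** (At-Risk) | Cancelled, and not already in the two groups above | **Win-back email** |
| Green | **Normal** (Standard) | Everyone else | **Keep happy** |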

**Simple insight**:
> **"Orange customers fly 6x/month but earn about $12k/year: loyal but low-income. Give them more points."**
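
One detail of the segment definition is worth making explicit: `np.select` assigns the label of the *first* condition that is true, so a cancelled customer who also meets the Urban Premium or Budget Frequent rule keeps that label and is not counted as At-Risk. The sketch below shows this on three made-up customers. The thresholds (`income_cut`, `flights_cut`) are hypothetical stand-ins for the medians used in the notebook.

```python
import numpy as np
import pandas as pd

# Three hypothetical customers (toy values, not from the dataset)
toy = pd.DataFrame({
    "Province or State": ["Ontario", "Quebec", "Quebec"],
    "Income": [90_000, 10_000, 60_000],
    "PropFlights": [2.0, 5.0, 1.0],
    "CancellationDate": ["2021-03-01", None, "2020-07-01"],
})

# Hypothetical cut-offs standing in for the dataset medians
income_cut, flights_cut = 50_000, 3.0

conditions = [
    toy["Province or State"].isin(["Ontario", "British Columbia", "Alberta"])
    & (toy["Income"] > income_cut),
    (toy["Income"] <= income_cut) & (toy["PropFlights"] > flights_cut),
    toy["CancellationDate"].notna(),
]
labels = ["Urban Premium", "Budget Frequent", "At-Risk"]

# First matching condition wins; rows matching none get the default
toy["GeoSegment"] = np.select(conditions, labels, default="Standard")
print(toy[["Income", "CancellationDate", "GeoSegment"]])
```

The first customer has cancelled but is still labelled Urban Premium, because that condition is checked before the cancellation rule. Only the third customer, who matches neither of the first two rules, ends up as At-Risk.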

## 4. Cluster Summary Table

In [13]:
cluster_summary = geo_df.groupby('GeoSegment').agg(
    Count=('Loyalty#', 'count'),
    Avg_CLV=('Customer Lifetime Value', 'mean'),
    Avg_Income=('Income', 'mean'),
    Avg_Flights_Mo=('PropFlights', 'mean'),
    Churn_Rate=('CancellationDate', lambda x: np.round(100 * x.notna().mean(), 1))
).round(0)  # rounds every aggregated column, including Churn_Rate, to whole numbers

cluster_summary['Pct_Total'] = (cluster_summary['Count'] / len(geo_df) * 100).round(1)
cluster_summary = cluster_summary[['Count', 'Pct_Total', 'Avg_CLV', 'Avg_Income', 'Avg_Flights_Mo', 'Churn_Rate']]

cluster_summary = cluster_summary.astype({
    'Avg_CLV': 'int',
    'Avg_Income': 'int',
    'Avg_Flights_Mo': 'float'
})

cluster_summary

| GeoSegment | Count | Pct_Total | Avg_CLV | Avg_Income | Avg_Flights_Mo | Churn_Rate |
|------------|-------|-----------|---------|------------|----------------|------------|
| At-Risk | 1514 | 8.9 | 8061 | 26554 | 1.0 | 100.0 |
| Budget Frequent | 4237 | 25.0 | 7905 | 11856 | 6.0 | 2.0 |
| Standard | 5710 | 33.7 | 7892 | 35130 | 3.0 | 0.0 |
| Urban Premium | 5460 | 32.3 | 8140 | 63673 | 4.0 | 13.0 |


## INSIGHT 5: What the Numbers Say

| Tribe | % of Customers | Avg CLV | Avg Income | Flights/Mo | Churn |
|-------|---|-----|--------|------------|-------|
| **At-Risk** | 9% | $8k | $27k | 1 | **100%** |
| **Budget Frequent** | 25% | $8k | **$12k** | **6** | 2% |
| **Standard** | 34% | $8k | $35k | 3 | 0% |
| **Urban Premium** | 32% | $8k | **$64k** | 4 | 13% |

**Simple insight**:
> **"Budget Frequent customers fly 6x/month but earn only $12k. Urban Premium customers earn $64k and fly 4x/month: our goldmine, though 13% of them have already cancelled."**
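
As a quick check on where customer value sits, the sketch below estimates each segment's share of total CLV as Count × Avg_CLV, using the rounded figures from the cluster summary table. Because the averages are rounded, the shares are approximate.

```python
# (Count, Avg_CLV) per segment, copied from the cluster summary table
summary = {
    "At-Risk": (1514, 8061),
    "Budget Frequent": (4237, 7905),
    "Standard": (5710, 7892),
    "Urban Premium": (5460, 8140),
}

# Approximate total CLV per segment = number of customers x average CLV
segment_clv = {name: count * avg for name, (count, avg) in summary.items()}
total_clv = sum(segment_clv.values())

# Share of total CLV, in percent
clv_share = {name: round(100 * value / total_clv, 1) for name, value in segment_clv.items()}

for name, share in clv_share.items():
    print(f"{name:<16} {share:>5.1f}%")
```

Each segment's share of CLV tracks its share of customers closely, because average CLV is nearly flat across segments (about $7.9k to $8.1k). Urban Premium therefore accounts for roughly a third of total CLV.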

In [None]:
fig1.write_html("AIAI_Geo_CLV_Income.html")
fig2.write_html("AIAI_Geo_Segments.html")
print("Maps saved!")

Maps saved!
