# AIAI Loyalty Program – Geo-Spatial Insights
**Student IDs: 20250422, 20250388, 20250439**

---

**Deliverable 1 – Bonus Option 2: Geo-Spatial Insights**

We use **cleaned data** (`customers_clean.csv`) with **pre-engineered features**:
- `PropNrFlights`, `CLV_Category`, `Income_Category`, `Recency_Category`, `ChurnStatus`


**Goal**: Show **preliminary customer clusters** that combine **location and flight behavior**.

---

In [11]:
!pip install -q plotly folium

import pandas as pd
import numpy as np
import plotly.express as px
import folium
from folium.plugins import HeatMap
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns', None)

In [12]:

customers = pd.read_csv('customers_clean.csv')
flights = pd.read_csv('flights_new.csv')

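# Keep only customers with valid coordinates
geo_df = customers.dropna(subset=['Latitude', 'Longitude']).copy()

# Fill missing feature values with default categories so every customer gets a cluster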
geo_df['PropNrFlights'] = geo_df['PropNrFlights'].fillna(0)
geo_df['ChurnStatus'] = geo_df['ChurnStatus'].fillna('Active')
geo_df['CLV_Category'] = geo_df['CLV_Category'].fillna('Low')
geo_df['Income_Category'] = geo_df['Income_Category'].fillna('Low')
geo_df['Recency_Category'] = geo_df['Recency_Category'].fillna('No Flights')

print(f"Total customers with geo: {len(geo_df):,}")
print("Features used:", ['PropNrFlights', 'CLV_Category', 'Income_Category', 'ChurnStatus', 'Recency_Category'])

Total customers with geo: 16,574
Features used: ['PropNrFlights', 'CLV_Category', 'Income_Category', 'ChurnStatus', 'Recency_Category']
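
Because the fill step above assigns defaults such as `'Active'` and `'Low'`, it can help to know how many rows each default overwrites before trusting the cluster counts. The following is a minimal sketch on an invented toy frame (the values and the `'Medium'` label are assumptions for illustration, not taken from `customers_clean.csv`); it counts missing values per feature before filling.

```python
import numpy as np
import pandas as pd

# Toy stand-in for geo_df; all values are invented for illustration only.
toy = pd.DataFrame({
    "PropNrFlights": [0.4, np.nan, 1.2, np.nan],
    "ChurnStatus": ["Active", None, "Cancelled", "Active"],
    "CLV_Category": ["High", "Low", None, "Medium"],
})

# Defaults mirroring the fill values used in the notebook cell.
defaults = {"PropNrFlights": 0, "ChurnStatus": "Active", "CLV_Category": "Low"}

# Count how many rows each default would overwrite, before filling.
n_missing = toy[list(defaults)].isna().sum()
pct_missing = (n_missing / len(toy) * 100).round(1)
report = pd.DataFrame({"n_missing": n_missing, "pct_missing": pct_missing})
print(report)

# Apply all defaults in one call; a dict maps column name to fill value.
filled = toy.fillna(value=defaults)
```

If a feature shows a large missing share, the default category may dominate the corresponding cluster, which would be worth stating next to the results.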


## 3. Define Preliminary Clusters (Using the Pre-Engineered Features)

In [13]:
# np.select assigns the first condition that matches, so the list order sets cluster priority
geo_df['GeoCluster'] = np.select(
    [
        # 1. Urban High-Value
        (geo_df['Province or State'].isin(['Ontario', 'British Columbia', 'Alberta'])) &
        (geo_df['CLV_Category'] == 'High') &
        (geo_df['Income_Category'] == 'High'),
        
        # 2. Budget Frequent
        (geo_df['PropNrFlights_Category'] == 'High') &
        (geo_df['Income_Category'] == 'Low'),
        
        # 3. At-Risk
        (geo_df['ChurnStatus'] == 'Cancelled') |
        (geo_df['Recency_Category'] == 'Low')
    ],
    ['Urban High-Value', 'Budget Frequent', 'At-Risk'],
    default='Standard'
)

print("Cluster distribution:")
print(geo_df['GeoCluster'].value_counts())

Cluster distribution:
GeoCluster
Standard            10279
Urban High-Value     3018
At-Risk              1822
Budget Frequent      1455
Name: count, dtype: int64
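
The rules above are evaluated in order, and `np.select` keeps the first match. A cancelled customer who also meets the high-value rule is therefore labelled "Urban High-Value", not "At-Risk", which would be consistent with a non-zero churn rate inside that cluster. The sketch below uses three invented customers and only two of the rules to show how swapping the order changes the label; it is an illustration, not the notebook's full rule set.

```python
import numpy as np
import pandas as pd

# Three invented customers; row 0 is both high-value and cancelled.
toy = pd.DataFrame({
    "Province or State": ["Ontario", "Quebec", "Ontario"],
    "CLV_Category": ["High", "Low", "Low"],
    "Income_Category": ["High", "Low", "High"],
    "ChurnStatus": ["Cancelled", "Cancelled", "Active"],
})

is_high_value = (
    toy["Province or State"].isin(["Ontario", "British Columbia", "Alberta"])
    & (toy["CLV_Category"] == "High")
    & (toy["Income_Category"] == "High")
)
is_at_risk = toy["ChurnStatus"] == "Cancelled"

# Same conditions, different priority order.
high_value_first = np.select(
    [is_high_value, is_at_risk], ["Urban High-Value", "At-Risk"], default="Standard"
)
at_risk_first = np.select(
    [is_at_risk, is_high_value], ["At-Risk", "Urban High-Value"], default="Standard"
)

print(list(high_value_first))
print(list(at_risk_first))
```

Only row 0 changes between the two orderings. If win-back targeting matters more than premium targeting, placing the At-Risk rule first would be one way to express that.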


## MAP: CLV Bubbles (Size = CLV, Color = Income)


In [19]:
fig_clv = px.scatter_mapbox(
    geo_df,
    lat="Latitude", lon="Longitude",
    size="Customer Lifetime Value",
    color="Income",
    color_continuous_scale="Viridis",
    size_max=18,
    opacity=0.7,
    hover_name="City",
    hover_data={
        "Province or State": True,
        "PropNrFlights": ":.2f",
        "LoyaltyStatus": True,
        "ChurnStatus": True,
        "CLV_Category": True
    },
    title="AIAI Loyalty: Bubble Size = CLV, Color = Income",
    mapbox_style="carto-positron",
    height=650,
    zoom=3,
    center={"lat": 56.13, "lon": -106.35}
)

fig_clv.update_layout(margin=dict(l=0, r=0, t=50, b=0))
fig_clv.show()

## 4. Interactive Geo-Cluster Map (Jittered)

In [14]:
np.random.seed(42)
jitter = 0.02
geo_df['lat_j'] = geo_df['Latitude'] + np.random.uniform(-jitter, jitter, len(geo_df))
geo_df['lon_j'] = geo_df['Longitude'] + np.random.uniform(-jitter, jitter, len(geo_df))

# MAP
fig = px.scatter_mapbox(
    geo_df,
    lat="lat_j", lon="lon_j",
    color="GeoCluster",
    size="PropNrFlights",
    size_max=15,
    opacity=0.7,
    color_discrete_map={
        'Urban High-Value': '#1f77b4',
        'Budget Frequent': '#ff7f0e',
        'At-Risk': '#d62728',
        'Standard': '#2ca02c'
    },
    hover_data={
        "City": True,
        "Province or State": True,
        "CLV_Category": True,
        "Income_Category": True,
        "ChurnStatus": True,
        "Recency_Months": ":.0f",
        "PropNrFlights": ":.2f"
    },
    title="Preliminary Customer Clusters (Geo + Behavior)",
    mapbox_style="carto-positron",
    height=700,
    zoom=3,
    center={"lat": 56.13, "lon": -106.35}
)

fig.update_layout(margin=dict(l=0, r=0, t=50, b=0))
fig.show()
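
The jitter step presumably exists because many customers share identical coordinates (for example a city centroid), so their markers would otherwise stack on one point. The sketch below, on invented coordinates, shows that a uniform offset of up to 0.02 degrees separates co-located points while moving each one by at most roughly 2.2 km north or south. The 111.32 km per degree of latitude figure is an approximation, and a degree of longitude covers less distance at Canadian latitudes.

```python
import numpy as np
import pandas as pd

# Three invented customers geocoded to the same point.
toy = pd.DataFrame({"Latitude": [43.65] * 3, "Longitude": [-79.38] * 3})

rng = np.random.default_rng(42)  # seeded generator for reproducible offsets
jitter = 0.02                    # maximum offset in degrees, as in the notebook

toy["lat_j"] = toy["Latitude"] + rng.uniform(-jitter, jitter, len(toy))
toy["lon_j"] = toy["Longitude"] + rng.uniform(-jitter, jitter, len(toy))

# Largest latitude displacement actually applied, in degrees.
max_shift_deg = (toy["lat_j"] - toy["Latitude"]).abs().max()

# Upper bound on the north-south displacement in kilometres (approximate).
max_shift_km = jitter * 111.32

n_overlapping = toy[["lat_j", "lon_j"]].duplicated().sum()
print(f"max shift: {max_shift_deg:.4f} deg, bound: {max_shift_km:.2f} km")
```

The jittered columns are for display only; any distance or density calculation should keep using the original `Latitude` and `Longitude`.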

## 5. Cluster Summary Table

In [15]:
def mode_or_default(values, default='No Flights'):
    """Return the most frequent non-null value, or `default` if there is none."""
    modes = values.mode()  # mode() ignores NaN, so it can be empty
    return modes.iloc[0] if not modes.empty else default

# Most frequent flight season per customer
season_agg = flights.groupby('Loyalty#')['Season'].agg(mode_or_default).reset_index()
season_agg.columns = ['Loyalty#', 'MostSeason']
geo_df = geo_df.merge(season_agg, on='Loyalty#', how='left')

summary = geo_df.groupby('GeoCluster').agg(
    Count=('Loyalty#', 'count'),
    Avg_CLV=('Customer Lifetime Value', 'mean'),
    Avg_Income=('Income', 'mean'),
    Avg_Flights_Mo=('PropNrFlights', 'mean'),
    Churn_Rate=('ChurnStatus', lambda x: round(100 * (x == 'Cancelled').mean(), 1)),
    Avg_Recency=('Recency_Months', 'mean'),
    Most_Season=('MostSeason', mode_or_default)
).round(1)

summary['%_Total'] = (summary['Count'] / len(geo_df) * 100).round(1)
summary = summary[['Count', '%_Total', 'Avg_CLV', 'Avg_Income', 'Avg_Flights_Mo', 'Churn_Rate', 'Avg_Recency', 'Most_Season']]
summary

| GeoCluster | Count | %_Total | Avg_CLV | Avg_Income | Avg_Flights_Mo | Churn_Rate | Avg_Recency | Most_Season |
|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| At-Risk | 1822 | 11.0 | 7536.3 | 37578.3 | 2.2 | 98.1 | 5.2 | Autumn |
| Budget Frequent | 1455 | 8.8 | 3254.0 | 35564.0 | 6.6 | 5.2 | 0.7 | Autumn |
| Standard | 10279 | 62.0 | 7608.8 | 37398.7 | 5.1 | 0.0 | 0.4 | Autumn |
| Urban High-Value | 3018 | 18.2 | 11826.1 | 40048.9 | 4.9 | 12.7 | 1.0 | Autumn |
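
Customers with no rows in the flights table get a missing `MostSeason` after the left merge, and `Series.mode()` skips missing values, so a group made up only of such customers yields an empty result and indexing it with `[0]` would fail. The sketch below illustrates a guard on toy data; the helper name `mode_or_default`, the cluster labels "A" and "B", and all values are illustrative assumptions.

```python
import pandas as pd

def mode_or_default(values, default="No Flights"):
    """Most frequent non-null value, or `default` when every value is missing."""
    modes = values.mode()  # NaN is ignored, so this can be empty
    return modes.iloc[0] if not modes.empty else default

# Invented data: customer 3 has no flights at all.
customers = pd.DataFrame({"Loyalty#": [1, 2, 3], "GeoCluster": ["A", "A", "B"]})
flights = pd.DataFrame({
    "Loyalty#": [1, 1, 2],
    "Season": ["Autumn", "Autumn", "Winter"],
})

# Most frequent season per customer.
season = (
    flights.groupby("Loyalty#")["Season"]
    .agg(mode_or_default)
    .rename("MostSeason")
    .reset_index()
)

# Left merge keeps customer 3, with a missing MostSeason.
merged = customers.merge(season, on="Loyalty#", how="left")

# Cluster "B" contains only missing values and falls back to the default.
by_cluster = merged.groupby("GeoCluster")["MostSeason"].agg(mode_or_default)
print(by_cluster)
```

Note that `mode()` returns tied values in sorted order, so a tie between seasons resolves alphabetically, as it does for cluster "A" here.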


## 6. Density Heatmap (Optional)

In [16]:
m = folium.Map(location=[56.13, -106.35], zoom_start=3, tiles="CartoDB positron")
heat_data = geo_df[['Latitude', 'Longitude']].values.tolist()
HeatMap(heat_data, radius=12, blur=15).add_to(m)
m

## 7. Business Insights (Deliverable 1 Answer)

| Cluster | Insight | Action |
|--------|---------|--------|
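| **Urban High-Value** | Highest average CLV (11,826) and income (40,049); located in Ontario, British Columbia, or Alberta; 12.7% churn | **Target with premium upgrades** |
| **Budget Frequent** | Highest `PropNrFlights` (6.6 on average) with the lowest average income and CLV | **Loyalty points for budget flyers** |
| **At-Risk** | 98.1% churn and the highest average recency (5.2 months) | **Win-back campaigns** |
| **Standard** | Baseline: 62% of customers, no cancellations | **General retention** |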

In [17]:
# Export the cluster map as a standalone HTML file
fig.write_html("AIAI_Geo_Clusters.html")
print("Map saved as HTML!")

Map saved as HTML!
