# Costco Membership Churn
Churn refers to a customer’s decision to stop doing business with a company. Understanding why members cancel their memberships helps inform retention strategies and, in turn, protect revenue.

This project simulates a realistic dataset of Costco members, including demographics, spending behavior, and membership details. The goal is to analyze factors influencing churn and ultimately create a compelling Tableau dashboard to visualize these insights.

## Installs

This notebook uses several Python libraries for data generation and analysis. I use `pandas` for data manipulation, `numpy` for numerical operations and random draws from statistical distributions, and `faker` to create realistic synthetic signup dates. The standard-library `random` module simulates the categorical variables. Together, these tools produce the dataset used for churn analysis.

In [10]:
!pip install Faker

Collecting Faker
  Downloading faker-37.4.2-py3-none-any.whl.metadata (15 kB)
Downloading faker-37.4.2-py3-none-any.whl (1.9 MB)
Installing collected packages: Faker
Successfully installed Faker-37.4.2


In [11]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

from faker import Faker # simulating the data
import random
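
Because the data is drawn at random, each run produces a different dataset unless the generators are seeded. The sketch below shows the idea with the standard-library `random` module only; the seed value and the helper name `draw_membership_types` are assumptions made for illustration. `numpy` and `faker` keep their own random state, so in this notebook `np.random.seed(...)` and `Faker.seed(...)` would presumably be needed as well.

```python
import random

SEED = 42  # arbitrary value chosen for illustration


def draw_membership_types(n, seed):
    """Seed the generator, then draw n membership types."""
    random.seed(seed)
    return random.choices(['gold', 'executive', 'business'],
                          weights=[0.5, 0.3, 0.2], k=n)


first = draw_membership_types(5, SEED)
second = draw_membership_types(5, SEED)
print(first == second)  # same seed, same draws
```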

## Generating the Data

In this section, I generate a synthetic dataset representing 5,000 Costco members. Each record includes customer demographics, membership type, spending habits, visit frequency, and home store location with latitude and longitude coordinates. I also define a churn flag using a behavior-based scoring system that considers auto-renew status, average spend, visit frequency, and promotional usage. This dataset serves as the foundation for further exploratory analysis and dashboard creation.
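
The generation cell relies on several names defined in cells that are not shown here: `N`, `fake`, `stores`, `store_locations`, and `churn_reasons`. To run the loop on its own, a setup along the following lines would be needed. Only `N = 5000` is grounded in the notebook (the export reports 5,000 rows); the store names, coordinates, and churn reasons are hypothetical placeholders, not the values used to build the original dataset.

```python
N = 5000  # matches the row count reported at export

# Hypothetical placeholder stores (not the notebook's actual list).
stores = ['Seattle', 'Portland', 'Denver']

# Keys are lowercased store names, because the loop looks up home_store.lower().
# Coordinates are approximate city centers, used purely for illustration.
store_locations = {
    'seattle': (47.61, -122.33),
    'portland': (45.52, -122.68),
    'denver': (39.74, -104.99),
}

# Hypothetical churn reasons.
churn_reasons = ['price', 'moved away', 'low usage']

# The loop also calls fake.date_between(...), so a Faker instance is assumed:
# fake = Faker()
```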

In [21]:
membership_types = ['gold', 'executive', 'business']  # membership tiers

# N, fake, stores, store_locations, and churn_reasons are defined in earlier cells (not shown)
memberships = []
for i in range(N):
    customer_id = f"C{i+1:05d}"
    signup_date = fake.date_between(start_date="-5y", end_date="-30d")
    membership_type = random.choices(membership_types, weights=[0.5, 0.3, 0.2])[0]
    auto_renew = random.choice([True, False])
    avg_spend = round(np.random.normal(loc=120, scale=40), 2)
    avg_spend = max(avg_spend, 10)  # floor spend at 10, which also rules out negative values
    visits = np.random.poisson(12)
    home_store = random.choice(stores)
    email_opt_in = random.choice([True, False])
    promo_used = random.choices([True, False], weights=[0.3, 0.7])[0]  # about 30% of members use promotions
    age = int(np.clip(np.random.normal(loc=40, scale=15), 18, 85))
    household_size = max(1, int(np.random.poisson(2.5)))
    has_cc = random.choice([True, False])  # note: generated but not included in the record below
    store_key = home_store.lower()
    lat, lon = store_locations[store_key]

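    # Behavior-based churn score: a member is flagged as churned at a score of 2 or more.
    # Note: the other weights are whole numbers, so the 0.5 promo weight alone
    # never moves a score across the threshold.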
    score = 0
    if not auto_renew:
        score += 1
    if avg_spend < 80:
        score += 1
    if visits < 6:
        score += 1
    if not promo_used:
        score += 0.5
    churned = score >= 2

    churn_reason = random.choice(churn_reasons) if churned else None

    memberships.append([
        customer_id, signup_date, membership_type, auto_renew,
        avg_spend, visits, home_store, email_opt_in, promo_used,
        age, household_size, churned, churn_reason, lat, lon
    ])

columns = [
    'customer_id',
    'signup_date',
    'membership_type',
    'auto_renew',
    'avg_monthly_spend',
    'total_visits_last_year',
    'home_store',
    'email_opt_in',
    'promo_used',
    'age',
    'household_size',
    'churned',
    'churn_reason',
    'latitude',
    'longitude'
]
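
The churn rule inside the loop can also be pulled out into a small function, which makes it easier to check individual cases. This is a minimal sketch; `is_churned` is a hypothetical helper name, and the thresholds are copied from the loop above.

```python
def is_churned(auto_renew, avg_spend, visits, promo_used):
    """Return True when the behavior-based score reaches the threshold of 2."""
    score = 0
    if not auto_renew:
        score += 1
    if avg_spend < 80:
        score += 1
    if visits < 6:
        score += 1
    if not promo_used:
        score += 0.5
    return score >= 2


# Low spend plus no promo use gives 1.5, which stays below the threshold.
print(is_churned(auto_renew=True, avg_spend=60, visits=12, promo_used=False))
# No auto-renew plus low spend gives 2, which reaches it.
print(is_churned(auto_renew=False, avg_spend=60, visits=12, promo_used=True))
```

With these weights, a member is flagged as churned exactly when at least two of the three one-point signals (no auto-renew, low spend, few visits) fire; the half point for unused promotions does not change the outcome on its own.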

## Exporting the Data

Finally, I convert the records to a DataFrame and export them to a CSV file for further analysis.

In [24]:
# create dataframe
memberships = pd.DataFrame(memberships, columns=columns)

# export to csv
memberships.to_csv('costco_membership_data.csv', index=False)
print('costco_membership_data.csv created with', len(memberships), 'rows')

costco_membership_data.csv created with 5000 rows
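
As a rough plausibility check on the exported file, the share of churned rows can be compared with what the scoring rule implies. The sketch below estimates that share analytically with the standard library, assuming the draws are independent and distributed as in the generation loop: auto-renew off with probability 0.5, spend from a normal distribution with mean 120 and standard deviation 40, and visits from a Poisson distribution with mean 12. The function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution."""
    return math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(k + 1))


p_no_renew = 0.5                        # random.choice([True, False])
p_low_spend = normal_cdf(80, 120, 40)   # avg_spend < 80
p_few_visits = poisson_cdf(5, 12)       # visits < 6, i.e. visits <= 5

# A score of 2 or more requires at least two of the three one-point signals;
# the 0.5 promo weight cannot bridge the gap from 1 to 2.
p = [p_no_renew, p_low_spend, p_few_visits]
p_all = p[0] * p[1] * p[2]
p_pairs = p[0] * p[1] + p[0] * p[2] + p[1] * p[2]
expected_churn = p_pairs - 2 * p_all  # P(at least two of three)

print(f"expected churn rate: {expected_churn:.3f}")
```

Under these assumptions the estimate comes out at roughly 9%, so an observed share from `memberships['churned'].mean()` that is far from that figure would suggest the generation logic differs from what is described here.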
