# BA820 Project M2 - Exit Pathways in Alone


Focus Question (Q2):
Are there identifiable exit pathways—such as early voluntary withdrawal versus later medical evacuation—that differ meaningfully in timing and survival duration?

Method (Unsupervised):
Association rule mining (Apriori) to discover interpretable co-occurrence patterns in exit timing, exit type, and outcome bins.

Note: This notebook builds on M1 data understanding but all analyses below are newly conducted for M2.


### Imports

In [None]:
import pandas as pd
import numpy as np

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)


### Load Survivalists


In [None]:
BASE_URL = "https://raw.githubusercontent.com/Marcusshi/BA820-A1-08/main/data/alone_tv_show/"
survivalists = pd.read_csv(BASE_URL + "survivalists.csv")

print("Survivalists:", survivalists.shape)
survivalists.head()

Survivalists: (94, 16)


Unnamed: 0,season,name,age,gender,city,state,country,result,days_lasted,medically_evacuated,reason_tapped_out,reason_category,team,day_linked_up,profession,url
0,1,Alan Kay,40,Male,Blairsville,Georgia,United States,1,56,False,,,,,Corrections Officer,alan-kay
1,1,Sam Larson,22,Male,Lincoln,Nebraska,United States,2,55,False,Lost the mind game,Family / personal,,,Outdoor Gear Retailer,sam-larson
2,1,Mitch Mitchell,34,Male,Bellingham,Massachusetts,United States,3,43,False,Realized he should actually be around for his ...,Family / personal,,,Butcher,mitch-mitchell
3,1,Lucas Miller,32,Male,Quasqueton,Iowa,United States,4,39,False,Felt content with what he had done,Family / personal,,,Survivalist and Wildlife Therapist/Natural Hea...,lucas-miller
4,1,Dustin Feher,37,Male,Pittsburgh,Pennsylvania,United States,5,8,False,Fear of storm,Family / personal,,,Carpenter,dustin-feher


### Minimal Cleaning + Standardize Column Names

In the M1 notebook, we implemented a `clean_cols()` function to standardize column names. I reuse the same logic here for consistency; all analysis below is newly conducted for M2.

In [None]:
def clean_cols(df):
    """Standardize column names in place: trim, lowercase, snake_case."""
    df.columns = (
        df.columns.str.strip()
                 .str.lower()
                 .str.replace(" ", "_", regex=False)
                 .str.replace("-", "_", regex=False)
    )
    return df

survivalists = clean_cols(survivalists)
survivalists.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 94 entries, 0 to 93
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   season               94 non-null     int64  
 1   name                 94 non-null     object 
 2   age                  94 non-null     int64  
 3   gender               94 non-null     object 
 4   city                 94 non-null     object 
 5   state                93 non-null     object 
 6   country              94 non-null     object 
 7   result               94 non-null     int64  
 8   days_lasted          94 non-null     int64  
 9   medically_evacuated  94 non-null     bool   
 10  reason_tapped_out    84 non-null     object 
 11  reason_category      84 non-null     object 
 12  team                 14 non-null     object 
 13  day_linked_up        8 non-null      float64
 14  profession           94 non-null     object 
 15  url                  94 non-null     object 

## EDA & Preprocessing Updates (Relative to M1)

Building on the exploratory analysis conducted in M1, additional preprocessing steps were required to support association rule mining in this milestone.

Because association rules operate on discrete item co-occurrence rather than continuous variables, survival duration was discretized into coarse time bins (0–7, 8–21, 22–45, 46–70, 71+ days). This binning was motivated by clear right-skew observed in M1 and by the need to balance interpretability with sufficient item support.

Exit outcomes were decomposed into multiple attributes—exit timing (early vs. not early), medical evacuation status, and exit category—rather than treated as a single label. Early experiments using a single exit label produced trivial or sparse rules, motivating a more granular transaction design.

Finally, item frequency inspection was conducted to verify that key attributes (timing, medical status, exit category) appeared frequently enough to justify a minimum support threshold of 5%. No additional data cleaning beyond M1 was required.


## Transaction design
* Each participant is treated as a transaction (a “basket” of exit-related attributes).
* I convert timing, exit type, and outcome into discrete tokens (e.g., TIMING_EARLY, MEDICAL_YES, EXIT_FAMILY_/_PERSONAL, DAYS_BIN_0_7); each basket also carries a SEASON_n token.
* Apriori then identifies frequent co-occurrence patterns and generates interpretable rules such as: `{TIMING_EARLY, EXIT_FAMILY_/_PERSONAL} → {DAYS_BIN_0_7}`

These rules help characterize exit pathways without imposing predefined labels.

In [None]:
s = survivalists.copy()

# Ensure numeric
s["days_lasted"] = pd.to_numeric(s["days_lasted"], errors="coerce")

# Winners have NA for tap-out fields; keep that meaning explicit
s["reason_category"] = s["reason_category"].fillna("Winner/NoTap")

# Timing tokens
s["timing_flag"] = np.where(s["days_lasted"] <= 7, "TIMING_EARLY", "TIMING_NOT_EARLY")

# Medical tokens
s["medical_flag"] = np.where(s["medically_evacuated"], "MEDICAL_YES", "MEDICAL_NO")

# Survival duration bins (adjustable)
bins = [-np.inf, 7, 21, 45, 70, np.inf]
labels = ["0_7", "8_21", "22_45", "46_70", "71_plus"]
s["days_bin"] = pd.cut(s["days_lasted"], bins=bins, labels=labels)

# Exit category token (normalize string)
s["exit_cat"] = (
    s["reason_category"].astype(str)
      .str.strip()
      .str.upper()
      .str.replace(r"\s+", "_", regex=True)
)
s["exit_token"] = "EXIT_" + s["exit_cat"]

s[["season","days_lasted","days_bin","timing_flag","medical_flag","exit_token"]].head(10)


Unnamed: 0,season,days_lasted,days_bin,timing_flag,medical_flag,exit_token
0,1,56,46_70,TIMING_NOT_EARLY,MEDICAL_NO,EXIT_WINNER/NOTAP
1,1,55,46_70,TIMING_NOT_EARLY,MEDICAL_NO,EXIT_FAMILY_/_PERSONAL
2,1,43,22_45,TIMING_NOT_EARLY,MEDICAL_NO,EXIT_FAMILY_/_PERSONAL
3,1,39,22_45,TIMING_NOT_EARLY,MEDICAL_NO,EXIT_FAMILY_/_PERSONAL
4,1,8,8_21,TIMING_NOT_EARLY,MEDICAL_NO,EXIT_FAMILY_/_PERSONAL
5,1,6,0_7,TIMING_EARLY,MEDICAL_NO,EXIT_MEDICAL_/_HEALTH
6,1,4,0_7,TIMING_EARLY,MEDICAL_NO,EXIT_FAMILY_/_PERSONAL
7,1,4,0_7,TIMING_EARLY,MEDICAL_NO,EXIT_LOSS_OF_INVENTORY
8,1,1,0_7,TIMING_EARLY,MEDICAL_NO,EXIT_FAMILY_/_PERSONAL
9,1,0,0_7,TIMING_EARLY,MEDICAL_NO,EXIT_FAMILY_/_PERSONAL


In [None]:
transactions = []
for _, r in s.iterrows():
    basket = [
        f"SEASON_{int(r['season'])}",
        f"DAYS_BIN_{r['days_bin']}",
        r["timing_flag"],
        r["medical_flag"],
        r["exit_token"]
    ]
    # remove any missing tokens safely
    basket = [x for x in basket if pd.notna(x) and x != "DAYS_BIN_nan"]
    transactions.append(basket)

# quick sanity check
transactions[:5]


[['SEASON_1',
  'DAYS_BIN_46_70',
  'TIMING_NOT_EARLY',
  'MEDICAL_NO',
  'EXIT_WINNER/NOTAP'],
 ['SEASON_1',
  'DAYS_BIN_46_70',
  'TIMING_NOT_EARLY',
  'MEDICAL_NO',
  'EXIT_FAMILY_/_PERSONAL'],
 ['SEASON_1',
  'DAYS_BIN_22_45',
  'TIMING_NOT_EARLY',
  'MEDICAL_NO',
  'EXIT_FAMILY_/_PERSONAL'],
 ['SEASON_1',
  'DAYS_BIN_22_45',
  'TIMING_NOT_EARLY',
  'MEDICAL_NO',
  'EXIT_FAMILY_/_PERSONAL'],
 ['SEASON_1',
  'DAYS_BIN_8_21',
  'TIMING_NOT_EARLY',
  'MEDICAL_NO',
  'EXIT_FAMILY_/_PERSONAL']]

In [None]:
s["exit_token"].value_counts().head(10)

exit_token,count
EXIT_MEDICAL_/_HEALTH,45
EXIT_FAMILY_/_PERSONAL,36
EXIT_WINNER/NOTAP,10
EXIT_LOSS_OF_INVENTORY,3


## Encoding Transactions for Association Rule Mining

Transactions are converted into a one-hot encoded matrix using a TransactionEncoder.
Each column represents the presence or absence of a specific exit-related attribute.
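
As a rough illustration of what the encoder produces, the sketch below one-hot encodes two hand-written baskets in plain Python. The `one_hot` helper is a hypothetical stand-in introduced here, not part of mlxtend; it only mirrors the idea of one boolean column per distinct token, with columns in sorted order.

```python
def one_hot(baskets):
    """Return (columns, rows): one boolean per distinct token, per basket."""
    # Hypothetical helper for illustration; not the mlxtend implementation
    columns = sorted({token for basket in baskets for token in basket})
    rows = [[col in basket for col in columns] for basket in baskets]
    return columns, rows

# Two illustrative baskets, copied from the first two transactions printed above
baskets = [
    ["SEASON_1", "DAYS_BIN_46_70", "TIMING_NOT_EARLY", "MEDICAL_NO", "EXIT_WINNER/NOTAP"],
    ["SEASON_1", "DAYS_BIN_46_70", "TIMING_NOT_EARLY", "MEDICAL_NO", "EXIT_FAMILY_/_PERSONAL"],
]

columns, rows = one_hot(baskets)
for col, flags in zip(columns, zip(*rows)):
    print(f"{col:24s} {flags}")
```

The two baskets differ only in their exit token, so only the two `EXIT_` columns differ between the encoded rows.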


In [None]:
te = TransactionEncoder()
te_array = te.fit(transactions).transform(transactions)

df_encoded = pd.DataFrame(te_array, columns=te.columns_)
df_encoded.head()


Unnamed: 0,DAYS_BIN_0_7,DAYS_BIN_22_45,DAYS_BIN_46_70,DAYS_BIN_71_plus,DAYS_BIN_8_21,EXIT_FAMILY_/_PERSONAL,EXIT_LOSS_OF_INVENTORY,EXIT_MEDICAL_/_HEALTH,EXIT_WINNER/NOTAP,MEDICAL_NO,...,SEASON_2,SEASON_3,SEASON_4,SEASON_5,SEASON_6,SEASON_7,SEASON_8,SEASON_9,TIMING_EARLY,TIMING_NOT_EARLY
0,False,False,True,False,False,False,False,False,True,True,...,False,False,False,False,False,False,False,False,False,True
1,False,False,True,False,False,True,False,False,False,True,...,False,False,False,False,False,False,False,False,False,True
2,False,True,False,False,False,True,False,False,False,True,...,False,False,False,False,False,False,False,False,False,True
3,False,True,False,False,False,True,False,False,False,True,...,False,False,False,False,False,False,False,False,False,True
4,False,False,False,False,True,True,False,False,False,True,...,False,False,False,False,False,False,False,False,False,True


## Item Frequency Inspection

Before mining rules, I examine how frequently individual items appear.
This helps set reasonable support thresholds and avoid overly sparse rules.


In [None]:
item_support = df_encoded.mean().sort_values(ascending=False)

item_support.head(15)

item,support
TIMING_NOT_EARLY,0.808511
MEDICAL_NO,0.734043
EXIT_MEDICAL_/_HEALTH,0.478723
EXIT_FAMILY_/_PERSONAL,0.382979
MEDICAL_YES,0.265957
DAYS_BIN_46_70,0.223404
DAYS_BIN_22_45,0.212766
DAYS_BIN_71_plus,0.202128
TIMING_EARLY,0.191489
DAYS_BIN_0_7,0.191489


The items listed above each appear in at least 19% of transactions, comfortably above the 5% minimum support used in the next section; rarer items and finer-grained combinations are filtered out by that threshold during mining.
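
To make the 5% threshold concrete, the short sketch below converts it into a participant count and checks it against the exit-token counts printed earlier. The numbers are copied from this notebook's outputs; the variable names are illustrative only.

```python
import math

n_transactions = 94   # rows in survivalists, from the shape printed above
min_support = 0.05    # threshold used for Apriori in this notebook

# Smallest participant count whose share reaches the threshold
min_count = math.ceil(min_support * n_transactions)

# Exit-token counts copied from the value_counts() output above
exit_counts = {
    "EXIT_MEDICAL_/_HEALTH": 45,
    "EXIT_FAMILY_/_PERSONAL": 36,
    "EXIT_WINNER/NOTAP": 10,
    "EXIT_LOSS_OF_INVENTORY": 3,
}
below_threshold = [
    token for token, count in exit_counts.items()
    if count / n_transactions < min_support
]
print(min_count, below_threshold)
```

Under these numbers an itemset needs at least 5 participants to be kept, so `EXIT_LOSS_OF_INVENTORY` (3 participants) cannot appear in any frequent itemset or rule.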

## Frequent Itemset Mining (Apriori)

I begin with a minimum support of 5% to identify commonly occurring
combinations of exit-related attributes.


In [None]:
frequent_itemsets = apriori(
    df_encoded,
    min_support=0.05,
    use_colnames=True
)

frequent_itemsets.sort_values("support", ascending=False).head(10)


Unnamed: 0,support,itemsets
20,0.808511,(TIMING_NOT_EARLY)
8,0.734043,(MEDICAL_NO)
76,0.617021,"(TIMING_NOT_EARLY, MEDICAL_NO)"
6,0.478723,(EXIT_MEDICAL_/_HEALTH)
5,0.382979,(EXIT_FAMILY_/_PERSONAL)
48,0.382979,"(MEDICAL_NO, EXIT_FAMILY_/_PERSONAL)"
63,0.37234,"(TIMING_NOT_EARLY, EXIT_MEDICAL_/_HEALTH)"
54,0.308511,"(TIMING_NOT_EARLY, EXIT_FAMILY_/_PERSONAL)"
130,0.308511,"(TIMING_NOT_EARLY, MEDICAL_NO, EXIT_FAMILY_/_P..."
56,0.265957,"(MEDICAL_YES, EXIT_MEDICAL_/_HEALTH)"


## Association Rule Generation

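Rules are generated from the frequent itemsets with a minimum confidence of 0.6 and then ranked by lift, so that the top of the list highlights co-occurrences that are stronger than the items' base rates would suggest rather than trivial ones.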
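
For readers less familiar with these metrics, here is a minimal plain-Python sketch of support, confidence, and lift for a single rule. It uses only the ten season 1 rows previewed earlier (season token omitted), so the values are illustrative and differ from the full-data results; `support` and `rule_metrics` are hypothetical helper names, not mlxtend functions.

```python
# Ten baskets transcribed from the season 1 preview table earlier in this notebook
rows = [
    ("46_70", "TIMING_NOT_EARLY", "MEDICAL_NO", "EXIT_WINNER/NOTAP"),
    ("46_70", "TIMING_NOT_EARLY", "MEDICAL_NO", "EXIT_FAMILY_/_PERSONAL"),
    ("22_45", "TIMING_NOT_EARLY", "MEDICAL_NO", "EXIT_FAMILY_/_PERSONAL"),
    ("22_45", "TIMING_NOT_EARLY", "MEDICAL_NO", "EXIT_FAMILY_/_PERSONAL"),
    ("8_21", "TIMING_NOT_EARLY", "MEDICAL_NO", "EXIT_FAMILY_/_PERSONAL"),
    ("0_7", "TIMING_EARLY", "MEDICAL_NO", "EXIT_MEDICAL_/_HEALTH"),
    ("0_7", "TIMING_EARLY", "MEDICAL_NO", "EXIT_FAMILY_/_PERSONAL"),
    ("0_7", "TIMING_EARLY", "MEDICAL_NO", "EXIT_LOSS_OF_INVENTORY"),
    ("0_7", "TIMING_EARLY", "MEDICAL_NO", "EXIT_FAMILY_/_PERSONAL"),
    ("0_7", "TIMING_EARLY", "MEDICAL_NO", "EXIT_FAMILY_/_PERSONAL"),
]
baskets = [{f"DAYS_BIN_{b}", t, m, e} for b, t, m, e in rows]

def support(itemset):
    """Share of baskets that contain every token in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for antecedent -> consequent."""
    s_both = support(antecedent | consequent)
    confidence = s_both / support(antecedent)
    lift = confidence / support(consequent)
    return s_both, confidence, lift

s_rule, conf_rule, lift_rule = rule_metrics(
    {"TIMING_EARLY", "EXIT_FAMILY_/_PERSONAL"}, {"DAYS_BIN_0_7"}
)
print(s_rule, conf_rule, lift_rule)
```

Lift divides the rule's confidence by the consequent's base rate, which is why ranking by lift favors rules whose consequent is rare overall.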


In [None]:
rules = association_rules(
    frequent_itemsets,
    metric="confidence",
    min_threshold=0.6
)

rules = rules.sort_values("lift", ascending=False)
rules.head(10)


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
181,"(MEDICAL_YES, TIMING_EARLY)","(DAYS_BIN_0_7, EXIT_MEDICAL_/_HEALTH)",0.074468,0.106383,0.074468,1.0,9.4,1.0,0.066546,inf,0.965517,0.7,1.0,0.85
184,"(DAYS_BIN_0_7, EXIT_MEDICAL_/_HEALTH)","(MEDICAL_YES, TIMING_EARLY)",0.106383,0.074468,0.074468,0.7,9.4,1.0,0.066546,3.085106,1.0,0.7,0.675862,0.85
183,"(DAYS_BIN_0_7, MEDICAL_YES)","(TIMING_EARLY, EXIT_MEDICAL_/_HEALTH)",0.074468,0.106383,0.074468,1.0,9.4,1.0,0.066546,inf,0.965517,0.7,1.0,0.85
182,"(TIMING_EARLY, EXIT_MEDICAL_/_HEALTH)","(DAYS_BIN_0_7, MEDICAL_YES)",0.106383,0.074468,0.074468,0.7,9.4,1.0,0.066546,3.085106,1.0,0.7,0.675862,0.85
173,"(MEDICAL_NO, DAYS_BIN_0_7)","(TIMING_EARLY, EXIT_FAMILY_/_PERSONAL)",0.117021,0.074468,0.074468,0.636364,8.545455,1.0,0.065754,2.545213,1.0,0.636364,0.607106,0.818182
176,"(TIMING_EARLY, EXIT_FAMILY_/_PERSONAL)","(MEDICAL_NO, DAYS_BIN_0_7)",0.074468,0.117021,0.074468,1.0,8.545455,1.0,0.065754,inf,0.954023,0.636364,1.0,0.818182
175,"(DAYS_BIN_0_7, EXIT_FAMILY_/_PERSONAL)","(MEDICAL_NO, TIMING_EARLY)",0.074468,0.117021,0.074468,1.0,8.545455,1.0,0.065754,inf,0.954023,0.636364,1.0,0.818182
174,"(MEDICAL_NO, TIMING_EARLY)","(DAYS_BIN_0_7, EXIT_FAMILY_/_PERSONAL)",0.117021,0.074468,0.074468,0.636364,8.545455,1.0,0.065754,2.545213,1.0,0.636364,0.607106,0.818182
188,"(SEASON_1, DAYS_BIN_0_7)","(MEDICAL_NO, TIMING_EARLY)",0.053191,0.117021,0.053191,1.0,8.545455,1.0,0.046967,inf,0.932584,0.454545,1.0,0.727273
189,"(SEASON_1, TIMING_EARLY)","(MEDICAL_NO, DAYS_BIN_0_7)",0.053191,0.117021,0.053191,1.0,8.545455,1.0,0.046967,inf,0.932584,0.454545,1.0,0.727273


In [None]:
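# Keep rules whose consequent mentions a duration bin, medical status, or exit category
# (this drops rules whose consequent contains only season or timing tokens)
exit_rules = rules[
    rules["consequents"].astype(str).str.contains("DAYS_BIN|MEDICAL|EXIT")
]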

exit_rules[[
    "antecedents",
    "consequents",
    "support",
    "confidence",
    "lift"
]].head(10)


Unnamed: 0,antecedents,consequents,support,confidence,lift
181,"(MEDICAL_YES, TIMING_EARLY)","(DAYS_BIN_0_7, EXIT_MEDICAL_/_HEALTH)",0.074468,1.0,9.4
184,"(DAYS_BIN_0_7, EXIT_MEDICAL_/_HEALTH)","(MEDICAL_YES, TIMING_EARLY)",0.074468,0.7,9.4
183,"(DAYS_BIN_0_7, MEDICAL_YES)","(TIMING_EARLY, EXIT_MEDICAL_/_HEALTH)",0.074468,1.0,9.4
182,"(TIMING_EARLY, EXIT_MEDICAL_/_HEALTH)","(DAYS_BIN_0_7, MEDICAL_YES)",0.074468,0.7,9.4
173,"(MEDICAL_NO, DAYS_BIN_0_7)","(TIMING_EARLY, EXIT_FAMILY_/_PERSONAL)",0.074468,0.636364,8.545455
176,"(TIMING_EARLY, EXIT_FAMILY_/_PERSONAL)","(MEDICAL_NO, DAYS_BIN_0_7)",0.074468,1.0,8.545455
175,"(DAYS_BIN_0_7, EXIT_FAMILY_/_PERSONAL)","(MEDICAL_NO, TIMING_EARLY)",0.074468,1.0,8.545455
174,"(MEDICAL_NO, TIMING_EARLY)","(DAYS_BIN_0_7, EXIT_FAMILY_/_PERSONAL)",0.074468,0.636364,8.545455
188,"(SEASON_1, DAYS_BIN_0_7)","(MEDICAL_NO, TIMING_EARLY)",0.053191,1.0,8.545455
189,"(SEASON_1, TIMING_EARLY)","(MEDICAL_NO, DAYS_BIN_0_7)",0.053191,1.0,8.545455
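
As a sanity check on the top-ranked rule, the sketch below converts its reported supports back into participant counts and recomputes confidence and lift from those counts. The supports are copied from the table above; the rounding step assumes they are simple fractions of the 94 participants.

```python
from fractions import Fraction

n = 94  # participants

# Participant counts implied by the reported supports (support * 94, rounded)
n_antecedent = round(0.074468 * n)  # {MEDICAL_YES, TIMING_EARLY}
n_consequent = round(0.106383 * n)  # {DAYS_BIN_0_7, EXIT_MEDICAL_/_HEALTH}
n_both = round(0.074468 * n)        # all four tokens together

confidence = Fraction(n_both, n_antecedent)
lift = confidence / Fraction(n_consequent, n)
print(n_antecedent, n_consequent, float(confidence), float(lift))
```

The recomputed lift matches the reported 9.4, and the counts show that this rule rests on 7 participants.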


## Interpreting Exit Pathways

The strongest rules reveal structured exit pathways rather than random exits.

For example:
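- Early exits frequently co-occur with specific exit categories: every early medical evacuation falls in the medical/health category (confidence 1.0, lift 9.4), and every early family/personal exit is a non-evacuation (confidence 1.0, lift about 8.5)
- Most medical evacuations occur after the first week (MEDICAL_YES has support 0.266, but only 0.074 together with TIMING_EARLY), although the top-ranked rules above all describe early exits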
- Certain exit reasons rarely appear in early time bins

These patterns support the existence of distinct exit pathways
characterized by timing, exit cause, and medical involvement.

Taken together, these rules suggest that exits on Alone follow a small number of structured pathways rather than occurring randomly. In particular, early exits fall into two main groups, medical evacuations in the medical/health category and non-evacuated family/personal withdrawals, while most medical evacuations overall occur after the first week, suggesting qualitatively different exit mechanisms.


## Limitations and Sensitivity

- Small sample size (94 participants) limits rule complexity; the top-ranked rules each rest on only 5 to 7 participants
- Results are sensitive to binning choices for survival duration
- Rare exit categories generate sparse or unstable rules
- Association rules describe co-occurrence, not causality
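
One further sensitivity is worth checking. In the transaction design above, TIMING_EARLY and DAYS_BIN_0_7 are both defined by `days_lasted <= 7`, so they appear in exactly the same baskets (both have support 0.191489 in the item frequency table). Rules that place one of these tokens in the antecedent and the other in the consequent therefore owe part of their confidence and lift to the token definitions. A minimal sketch of how such rules could be flagged, assuming each rule is available as a pair of frozensets; `REDUNDANT_PAIRS` and `is_redundant` are hypothetical names introduced here:

```python
# Token pairs that are equivalent by definition in this notebook's design
REDUNDANT_PAIRS = [frozenset({"TIMING_EARLY", "DAYS_BIN_0_7"})]

def is_redundant(antecedent, consequent):
    """True if a definitional pair is split across the two sides of a rule."""
    for pair in REDUNDANT_PAIRS:
        if (pair & antecedent) and (pair & consequent) and pair <= (antecedent | consequent):
            return True
    return False

# Top-ranked rule from the table above
top_rule = (
    frozenset({"MEDICAL_YES", "TIMING_EARLY"}),
    frozenset({"DAYS_BIN_0_7", "EXIT_MEDICAL_/_HEALTH"}),
)
# A pair taken from the frequent itemsets table, with no timing or duration token
other_rule = (
    frozenset({"MEDICAL_YES"}),
    frozenset({"EXIT_MEDICAL_/_HEALTH"}),
)

flag_top = is_redundant(*top_rule)
flag_other = is_redundant(*other_rule)
print(flag_top, flag_other)
```

With the mlxtend output, the same check could be applied row-wise, for example `rules[~rules.apply(lambda r: is_redundant(r["antecedents"], r["consequents"]), axis=1)]`, or one of the two tokens could be left out of the baskets before mining.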


## What Surprised Me

One unexpected finding was how strongly early exit timing and medical evacuation co-occurred in the association rules. I initially expected early exits to be dominated by voluntary or psychological reasons, but roughly 39% of early exits (support 0.074 out of 0.191) involved medical evacuation, compared with about 27% of all participants, suggesting that early exits are not solely driven by personal choice.

Another notable result was that most medical-related exits still fall in later survival bins. About 78% of EXIT_MEDICAL_/_HEALTH transactions are TIMING_NOT_EARLY (support 0.372 out of 0.479), close to the 81% rate across all participants, indicating that medical exits are not confined to early weakness and often represent a late-stage failure after prolonged endurance.

These patterns shifted my understanding of “exit pathways” from a simple early-versus-late distinction to a more structured progression involving timing, exit cause, and medical involvement. Rather than random exits, the data suggests that participants tend to follow a small number of recurring exit trajectories.

## What Did Not Work / Iterations

Early attempts to treat exit outcomes as a single categorical label—rather than a combination of timing, medical status, and survival duration bins—produced rules that were either trivial or uninformative. Additionally, using finer-grained duration bins resulted in very sparse itemsets due to the small sample size.

These iterations motivated the final transaction design, which balances interpretability with sufficient support.