**Churn Prediction Project**

Students: Hae In Keum, Julius Enderwitz

In [85]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from skrub import GapEncoder

In [86]:
train = pd.read_parquet("train.parquet")
test = pd.read_parquet("test.parquet")

train.shape, test.shape

((17499636, 19), (4393179, 19))

First, inspect the data structure and print out all column names. Check the datatypes.

In [87]:
train.head()

Unnamed: 0,status,gender,firstName,level,lastName,userId,ts,auth,page,sessionId,location,itemInSession,userAgent,method,length,song,artist,time,registration
0,200,M,Shlok,paid,Johnson,1749042,1538352001000,Logged In,NextSong,22683,"Dallas-Fort Worth-Arlington, TX",278,"""Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebK...",PUT,524.32934,Ich mache einen Spiegel - Dream Part 4,Popol Vuh,2018-10-01 00:00:01,2018-08-08 13:22:21
992,200,M,Shlok,paid,Johnson,1749042,1538352525000,Logged In,NextSong,22683,"Dallas-Fort Worth-Arlington, TX",279,"""Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebK...",PUT,178.02404,Monster (Album Version),Skillet,2018-10-01 00:08:45,2018-08-08 13:22:21
1360,200,M,Shlok,paid,Johnson,1749042,1538352703000,Logged In,NextSong,22683,"Dallas-Fort Worth-Arlington, TX",280,"""Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebK...",PUT,232.61995,Seven Nation Army,The White Stripes,2018-10-01 00:11:43,2018-08-08 13:22:21
1825,200,M,Shlok,paid,Johnson,1749042,1538352935000,Logged In,NextSong,22683,"Dallas-Fort Worth-Arlington, TX",281,"""Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebK...",PUT,265.50812,Under The Bridge (Album Version),Red Hot Chili Peppers,2018-10-01 00:15:35,2018-08-08 13:22:21
2366,200,M,Shlok,paid,Johnson,1749042,1538353200000,Logged In,NextSong,22683,"Dallas-Fort Worth-Arlington, TX",282,"""Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebK...",PUT,471.69261,Circlesong 6,Bobby McFerrin,2018-10-01 00:20:00,2018-08-08 13:22:21


In [88]:
train.columns

Index(['status', 'gender', 'firstName', 'level', 'lastName', 'userId', 'ts',
       'auth', 'page', 'sessionId', 'location', 'itemInSession', 'userAgent',
       'method', 'length', 'song', 'artist', 'time', 'registration'],
      dtype='object')

In [89]:
train.dtypes

status                    int64
gender                   object
firstName                object
level                    object
lastName                 object
userId                   object
ts                        int64
auth                     object
page                     object
sessionId                 int64
location                 object
itemInSession             int64
userAgent                object
method                   object
length                  float64
song                     object
artist                   object
time             datetime64[us]
registration     datetime64[us]
dtype: object

In this project, we have approximately seven weeks of user log data. We use the training set to construct a model predicting the binary variable churn yes/no in the next seven days based on user-level data from the days before the prediction window gathered from the raw log data. 

First, we see that both the training data and the test data reach from October 1, 2018 until November 20, 2018.

In [90]:
train['time'].min(), train['time'].max()


(Timestamp('2018-10-01 00:00:01'), Timestamp('2018-11-20 00:00:00'))

In [91]:
test['time'].min(), test['time'].max()

(Timestamp('2018-10-01 00:00:06'), Timestamp('2018-11-20 00:00:00'))

In [92]:
# Window configuration
OBS_DAYS = 14  # observation window length
PRED_DAYS = 10  # prediction window length
SUB_OBS_DAYS = 7  # short window inside observation

First, we create a copy of train to avoid editing the original training dataset. We compute "global timestamps" that are then used to define the observation and prediction windows. For each training subset, we define an observation mask and a prediction mask to flag users that appear in the observation window and that churn in the prediction window.

We define a function that computes our desired features given the raw training data "df", and the start and end of our observation window. It returns a DataFrame called features with all the features for this observation window. In our training process, we will have three observation windows and thus a concatenation of the three feature objects computed in this function. 

In [93]:
def create_features(df, observation_start, observation_end):

    # filter observation window
    obs_mask = (df['time'] >= observation_start) & (df['time'] < observation_end)
    df_obs = df[obs_mask]
    
    user_ids = df_obs['userId'].unique()
    features = pd.DataFrame({'userId': user_ids})
    
    # --- total_events: total number of events per user ---
    total_events = (
        df_obs.groupby('userId')
        .size()
        .reset_index(name='total_events')
    )
    features = features.merge(total_events, on='userId', how='left')
    
    # --- page-based features: page visit counts ---
    page_features = {
        'songs_played': 'NextSong',
        'roll_advert': 'Roll Advert',
        'settings': 'Settings',
        'add_playlist': 'Add to Playlist',
        'thumbs_up': 'Thumbs Up',
        'home': 'Home',
        'help': 'Help',
        'logout': 'Logout',
        'about': 'About',
        'upgrade': 'Upgrade',
        'submit_upgrade': 'Submit Upgrade',
        'thumbs_down': 'Thumbs Down',
        'downgrade': 'Downgrade',
        'add_friend': 'Add Friend',
        'error': 'Error',
        'save_settings': 'Save Settings',
        'submit_downgrade': 'Submit Downgrade',
        'login': 'Login',
        'register': 'Register',
        'submit_registration': 'Submit Registration',
    }
    
    for col_name, page_name in page_features.items():
        page_counts = (
            df_obs[df_obs['page'] == page_name]
            .groupby('userId')
            .size()
            .reset_index(name=col_name)
        )
        features = features.merge(page_counts, on='userId', how='left')
    
    # ------------------------------------------------------------------
    # NEW FEATURE SECTION (sessions, ratios, tenure 등)  ← 기존 코드
    # ------------------------------------------------------------------

    # (A) Count events per session
    session_events = (
        df_obs.groupby(['userId', 'sessionId'])
        .size()
        .reset_index(name='events_in_session')
    )
    
    num_sessions = (
        session_events.groupby('userId')['sessionId']
        .nunique()
        .reset_index(name='num_sessions')
    )
    features = features.merge(num_sessions, on='userId', how='left')
    
    events_per_session = (
        session_events.groupby('userId')['events_in_session']
        .mean()
        .reset_index(name='events_per_session')
    )
    features = features.merge(events_per_session, on='userId', how='left')
    
    # (B) songs per session + song stats
    nextsong_obs = df_obs[df_obs['page'] == 'NextSong']
    
    if not nextsong_obs.empty:
        session_songs = (
            nextsong_obs.groupby(['userId', 'sessionId'])
            .size()
            .reset_index(name='songs_in_session')
        )
        
        songs_per_session = (
            session_songs.groupby('userId')['songs_in_session']
            .mean()
            .reset_index(name='songs_per_session')
        )
        features = features.merge(songs_per_session, on='userId', how='left')
        
        song_stats = (
            nextsong_obs.groupby('userId')
            .agg(
                total_song_length=('length', 'sum'),
                avg_song_length=('length', 'mean'),
                median_song_length=('length', 'median'),
            )
            .reset_index()
        )
        features = features.merge(song_stats, on='userId', how='left')
        
    # (C) Ratios
    features['song_share'] = (
        features['songs_played'] / features['total_events'].replace(0, np.nan)
    )
    features['ad_per_song'] = (
        features['roll_advert'] / features['songs_played'].replace(0, np.nan)
    )
    features['thumbs_up_rate'] = (
        features['thumbs_up'] / features['songs_played'].replace(0, np.nan)
    )
    features['thumbs_down_rate'] = (
        features['thumbs_down'] / features['songs_played'].replace(0, np.nan)
    )
    
    # (D) Page diversity + active days
    unique_pages = (
        df_obs.groupby('userId')['page']
        .nunique()
        .reset_index(name='unique_pages')
    )
    features = features.merge(unique_pages, on='userId', how='left')
    
    active_days = (
        df_obs.assign(date=df_obs['time'].dt.date)
        .groupby('userId')['date']
        .nunique()
        .reset_index(name='active_days')
    )
    features = features.merge(active_days, on='userId', how='left')
    
    # (E) tenure_days
    if 'registration' in df.columns:
        reg = df[['userId', 'registration']].drop_duplicates('userId').copy()
        
        if not np.issubdtype(reg['registration'].dtype, np.datetime64):
            reg['registration_dt'] = pd.to_datetime(reg['registration'], unit='ms')
        else:
            reg['registration_dt'] = reg['registration']
        
        reg['tenure_days'] = (observation_end - reg['registration_dt']).dt.days
        reg = reg[['userId', 'tenure_days']]
        features = features.merge(reg, on='userId', how='left')
    
    # ------------------------------------------------------------------
    # (F) userAgent cleaning and mode extraction
    # ------------------------------------------------------------------
    if 'userAgent' in df.columns:
        ua_df = df_obs[['userId', 'userAgent']].copy()
        ua_df['userAgent'] = ua_df['userAgent'].fillna("Unknown")
        ua_df['ua_clean'] = (
            ua_df['userAgent']
            .str.lower()
            .str.replace('[^a-z0-9 ]', ' ', regex=True)
            .str.replace('\s+', ' ', regex=True)
            .str.strip()
        )
        
        ua_counts = (
            ua_df.groupby(['userId', 'ua_clean'])
            .size()
            .reset_index(name='cnt')
        )
        ua_mode = (
            ua_counts
            .sort_values(['userId', 'cnt'], ascending=[True, False])
            .drop_duplicates('userId')
            [['userId', 'ua_clean']]
        )
        ua_mode = ua_mode.rename(columns={'ua_clean': 'ua_clean_mode'})
        
        features = features.merge(ua_mode, on='userId', how='left')

    # ------------------------------------------------------------------
    # (G) NEW: recency_days
    # ------------------------------------------------------------------
    last_event_time = (
        df_obs.groupby('userId')['time']
        .max()
        .reset_index(name='last_event_time')
    )
    features = features.merge(last_event_time, on='userId', how='left')

    features['recency_days'] = (
        (observation_end - features['last_event_time'])
        .dt.total_seconds() / (60 * 60 * 24)
    )

    features.drop(columns=['last_event_time'], inplace=True)

    # ------------------------------------------------------------------
    # (H) NEW: activity_volatility
    # ------------------------------------------------------------------
    df_daily = df_obs.copy()
    df_daily['date'] = df_daily['time'].dt.floor('D')

    daily_counts = (
        df_daily.groupby(['userId', 'date'])
        .size()
        .reset_index(name='events_per_day')
    )

    volatility = (
        daily_counts.groupby('userId')['events_per_day']
        .std()
        .reset_index(name='activity_volatility')
    )

    features = features.merge(volatility, on='userId', how='left')
    # for users who only used for 1 day: std NaN → 0
    features['activity_volatility'] = features['activity_volatility'].fillna(0.0)
    
    # ------------------------------------------------------------------
    # done: numeric fill with NA
    # ------------------------------------------------------------------
    num_cols = features.select_dtypes(include=[np.number]).columns
    features[num_cols] = features[num_cols].fillna(0)
    
    return features


  .str.replace('\s+', ' ', regex=True)


The function defined above computes our features while taking the data and the bounds of the time interval as inputs. Now, we need to use this function within another function to create the user-level dataset with the userId, all the features and the binary churn target variable for one window. Since we have 7 weeks of training data and only an observation window that lasts 14 days, we fit three observation windows in the training data. 

This feature-computing function takes as inputs the raw data, the end of the observation window, i.e., the beginning of the prediction window, the duration of the observation and prediction windows and the duration of the subwindow within the observation window that captures dynamics within the observation window by computing ratios and features taking into acocunt only events that are closer to the prediction window.

We drop the observations of the users churning within the observation window since they would be flagged as "non-churners in the prediction window" despite obviously having strong churner characteristics. Thus, leaving them in would distort the model.

Concretely, in this function, we first define the time bounds of the observation and prediction windows given the input info, we then sort the training set by time and compute the (first) churn time of each user. Then, we select the users that appear in the observation window and apply the aforementioned "alive filter": We keep only the users that have not churned before the beginnning of the prediction window. Hence, we have a set of users that appear in the observation window and are still alive at the end of it.

This set alive_users containing the desired userId is now used as a mask to select the training set events of the exact users we want in this window, this subset is called df_obs_users. Then, we select all the events of our desired users that take place during the prediction window and check whether a churn occurs. The IDs of the desired users that churn during the prediction window are stored in the churned_users object.

Based on that, we construct the user-level table only on alive users. The first obvious columns are the userId and the churn flag. Then, we compute the features using the function defined above to compute features for the entire 14-day period and for the most recent 7 days before the beginning of the prediction window. To capture the dynamics, we compute certain ratios, to compare the number of events, songs played and minutes listened to music of the entire 14 days relative to the most recent 7 days. 

If no events are detected in teh observation window, the DataFrame currently displays a missing value, however it is sensible to impute those mossing values with 0 since no, i.e., zero events happened during the window. 

In [94]:
def feature_engineering_single_window(
    df,
    observation_end,
    obs_days=OBS_DAYS,
    pred_days=PRED_DAYS,
    sub_obs_days=SUB_OBS_DAYS,
    first_churn_time=None,  #  NEW: precomputed series (optional)
):
    """
    Create training rows for a single observation window ending at `observation_end`:
      -> observe [observation_end - obs_days, observation_end)
      -> predict churn in the next pred_days days

    We only include users who are STILL ALIVE at observation_end:
    i.e., users whose first churn time is >= observation_end or have never churned.

    column names still use '14d' / '7d' suffix historically,
    but the actual durations are controlled by obs_days / sub_obs_days.
    """
    delta_obs = pd.Timedelta(days=obs_days)
    delta_pred = pd.Timedelta(days=pred_days)
    delta_sub = pd.Timedelta(days=sub_obs_days)

    observation_start = observation_end - delta_obs
    pred_start = observation_end
    pred_end = observation_end + delta_pred

    df = df.sort_values('time')

    # 0) if first_churn_time not provided, compute inside (slower, but safe default)
    if first_churn_time is None:
        churn_events = df[df['page'] == 'Cancellation Confirmation']
        first_churn_time = (
            churn_events.groupby('userId')['time']
            .min()
        )  # Series: index=userId

    # 1) users in observation window (raw)
    obs_mask = (df['time'] >= observation_start) & (df['time'] < observation_end)
    users_in_obs_window = df.loc[obs_mask, 'userId'].unique()
    if len(users_in_obs_window) == 0:
        return None

    # 1.5) ALIVE FILTER
    # Keep only users who have NOT churned before observation_end.
    # (Either no churn at all, or first churn_time >= observation_end)
    alive_users = []
    for uid in users_in_obs_window:
        fct = first_churn_time.get(uid, pd.NaT)
        if pd.isna(fct) or (fct >= observation_end):
            alive_users.append(uid)

    if len(alive_users) == 0:
        return None

    alive_users = np.array(alive_users)
    df_obs_users = df[df['userId'].isin(alive_users)]

    # 2) churn in prediction window (only alive users)
    pred_mask = (df['time'] >= pred_start) & (df['time'] < pred_end)
    df_pred = df[pred_mask & df['userId'].isin(alive_users)]
    churn_in_pred_window = df_pred[df_pred['page'] == 'Cancellation Confirmation']
    churned_users = churn_in_pred_window['userId'].unique()

    # 3) user-level label (only alive users)
    user_data = pd.DataFrame({'userId': alive_users})
    user_data['churned'] = np.where(
        user_data['userId'].isin(churned_users), 1, 0
    )

    # 4) full observation-window features
    features_14 = create_features(df_obs_users, observation_start, observation_end)
    user_data = user_data.merge(features_14, on='userId', how='left')

    # 5) last sub_obs_days features inside the observation window
    last_sub_start = observation_end - delta_sub
    features_7 = create_features(df_obs_users, last_sub_start, observation_end)
    user_data = user_data.merge(
        features_7,
        on='userId',
        how='left',
        suffixes=('', '_7d')  # keep suffix for compatibility
    )

    # 6) ratios full-window vs sub-window
    ratio_specs = [
        ('total_events',      'total_events_7d',      'event_ratio_14d_7d'),
        ('songs_played',      'songs_played_7d',      'song_ratio_14d_7d'),
        ('total_song_length', 'total_song_length_7d', 'song_length_ratio_14d_7d'),
    ]
    for col_14, col_7, col_ratio in ratio_specs:
        if col_14 in user_data.columns and col_7 in user_data.columns:
            user_data[col_ratio] = user_data[col_14] / (user_data[col_7] + 1)

    # 7) fill numeric NaNs
    num_cols = user_data.select_dtypes(include=[np.number]).columns
    user_data[num_cols] = user_data[num_cols].fillna(0)

    # keep which window this row came from (for debugging)
    user_data['obs_end'] = observation_end

    return user_data

In this last important function, we compute the backward-sliding observation windows. Using the two functions defined above, the features are computed given the raw training data and the time bounds of the observation window. 

The first observation window obviously must end early enough such that one entire prediction window still fits in the training data. The earliest possible observation window can start from the first observatio in the training data. Given those bounds, we start with the latest possible observation window, compute the user-level features and then step back by one length of the observation window to compute the second one. Using the training data, three complete observation windows can be constructed without overlap.

To avoid bugs, we included ValueErrors if there is not enogh training data in order to do the backward sliding. Though, we know for our case that there is enough data since one step contains two weeks of observations and the data spans more than seven weeks of events.

Using print commands, we designed user-friendly updates of the current progress of computing features. For each step, the userl-level data is appended into the all_frames list which is then concatenated into one big object containing the features for all three windows.

In [95]:
def feature_engineering_multi_windows(
    df,
    n_windows=3,
    obs_days=OBS_DAYS,
    pred_days=PRED_DAYS,
    sub_obs_days=SUB_OBS_DAYS,
    step_days=OBS_DAYS,
):
    """
    Build training data using multiple backward-sliding observation windows.

    Why backward sliding?
    
    In the test scenario, we predict churn at the end of the dataset:
        -> observe the last `obs_days`
        -> predict churn in the next `pred_days`

    To mimic this scenario during training, each training window should also:
        -> end close to the dataset’s end (latest region)
        -> slide backward by fixed steps

    
    Window definition:
    
        Observation window:
            [obs_end - obs_days, obs_end)
        
        Prediction window:
            [obs_end, obs_end + pred_days)
        
        obs_end values are chosen as:
            latest_obs_end,
            latest_obs_end - step_days,
            latest_obs_end - 2*step_days, ...
    
    Parameters
    
    df : DataFrame
        Full event-level dataset, i.e. train dataset
    n_windows : 
        Number of backward windows to generate, three in our case
    obs_days : 
        Length of the observation window, we use two weeks, i.e. 14 days
    pred_days : 
        Length of the prediction horizon, we use seven days
    sub_obs_days : 
        Length of the short-term observation subwindow (passed through), seven days in our case
    step_days : 
        How many days to slide backward when generating each window.
    
    This returns a DataFrame with stacked training rows from all backward windows.
    """

    df = df.sort_values('time').copy()

    # Basic timestamps
    min_ts = df['time'].min()
    max_ts = df['time'].max()

    delta_obs = pd.Timedelta(days=obs_days)
    delta_pred = pd.Timedelta(days=pred_days)
    delta_step = pd.Timedelta(days=step_days)

    # The observation_end must satisfy BOTH:
    #   1) enough history → obs_end >= min_ts + obs_days
    #   2) enough future for labeling → obs_end + pred_days <= max_ts
    earliest_obs_end = min_ts + delta_obs
    latest_obs_end   = max_ts - delta_pred

    if latest_obs_end <= earliest_obs_end:
        print("Not enough time range for multiple windows; using a single one.")
        return feature_engineering_single_window(
            df,
            earliest_obs_end,
            obs_days=obs_days,
            pred_days=pred_days,
            sub_obs_days=sub_obs_days,
        )

    # -----------------------------------------------------------
    # NEW: Backward sliding selection of observation_end values
    # -----------------------------------------------------------
    obs_ends = []
    current_obs_end = latest_obs_end

    for _ in range(n_windows):
        if current_obs_end < earliest_obs_end:
            break
        obs_ends.append(current_obs_end)
        current_obs_end = current_obs_end - delta_step

    if not obs_ends:
        raise ValueError("No valid observation_end values generated.")

    all_frames = []

    # -----------------------------------------------------------
    # Build training rows for each backward window
    # -----------------------------------------------------------
    for i, obs_end in enumerate(obs_ends):
        obs_start = obs_end - delta_obs
        print(
            f"Window {i+1}/{len(obs_ends)} (backward): "
            f"observe [{obs_start.date()} – {obs_end.date()}) "
            f"({obs_days} days) → predict next {pred_days} days"
        )

        window_df = feature_engineering_single_window(
            df,
            obs_end,
            obs_days=obs_days,
            pred_days=pred_days,
            sub_obs_days=sub_obs_days,
        )

        if window_df is None or window_df.empty:
            continue

        window_df['window_id'] = i
        all_frames.append(window_df)

    if not all_frames:
        raise ValueError("All windows were empty in backward sliding.")

    # Final stacked training dataset
    user_data_all = pd.concat(all_frames, ignore_index=True)
    return user_data_all


In [96]:
user_data_train = feature_engineering_multi_windows(train, n_windows=3)

Window 1/2 (backward): observe [2018-10-27 – 2018-11-10) (14 days) → predict next 10 days
Window 2/2 (backward): observe [2018-10-13 – 2018-10-27) (14 days) → predict next 10 days


Finally, we use the user agent string variable since the operating system or browser used might influence the user's experience and thus his inclination to churn. This is done using the GapEncoder that was also part of the Labs. The string is transformed into thre numerical components that are then used as numerical predictors in our churn user-level data.

In [97]:
# === !!!! NEW !!! UA GapEncoder: train fit + transform ===
ua_train = (
    user_data_train['ua_clean_mode']
    .fillna("unknown")
    .astype(str)
)

ge = GapEncoder(n_components=3, random_state=42)
ua_emb_train = ge.fit_transform(ua_train)

ua_emb_train = np.asarray(ua_emb_train)

user_data_train['ua_gap_0'] = ua_emb_train[:, 0]
user_data_train['ua_gap_1'] = ua_emb_train[:, 1]
user_data_train['ua_gap_2'] = ua_emb_train[:, 2]

user_data_train.head()


Unnamed: 0,userId,churned,total_events,songs_played,roll_advert,settings,add_playlist,thumbs_up,home,help,...,recency_days_7d,activity_volatility_7d,event_ratio_14d_7d,song_ratio_14d_7d,song_length_ratio_14d_7d,obs_end,window_id,ua_gap_0,ua_gap_1,ua_gap_2
0,1105959,0,341,286.0,1.0,0.0,14.0,9.0,15.0,2.0,...,0.0,0.0,0.0,0.0,0.0,2018-11-10,0,0.000875,0.000738,0.993279
1,1372655,0,1182,983.0,1.0,6.0,28.0,47.0,45.0,10.0,...,0.004861,53.96295,2.006791,1.993915,2.003268,2018-11-10,0,0.001007,0.000675,1.055979
2,1454596,0,485,387.0,1.0,2.0,12.0,50.0,14.0,5.0,...,0.899178,52.829916,1.447761,1.407273,1.405386,2018-11-10,0,0.69046,0.00047,0.000577
3,1494979,1,542,431.0,29.0,1.0,11.0,11.0,14.0,3.0,...,1.848565,94.045202,2.203252,2.033019,2.074129,2018-11-10,0,0.000529,0.582131,0.590078
4,1732044,0,343,285.0,0.0,4.0,9.0,16.0,11.0,3.0,...,0.548877,0.0,2.766129,2.794118,2.86473,2018-11-10,0,0.001012,0.000684,1.055966


In [98]:
user_data_train.shape

(26899, 84)

In [None]:
# drop columns ua_clean_mode and ua_clean_mode_7d
user_data_train = user_data_train.drop(
    columns=['ua_clean_mode', 'ua_clean_mode_7d', 'obs_end', 'window_id']
)

In [196]:
# export as csv
user_data_train.to_csv("user_data_train_enhanced.csv", index=False)

In [250]:
# read in csv of user_data_train
user_data_train = pd.read_csv("user_data_train_enhanced.csv")
print(user_data_train.shape)
user_data_train.head()

(26899, 80)


Unnamed: 0,userId,churned,total_events,songs_played,roll_advert,settings,add_playlist,thumbs_up,home,help,...,active_days_7d,tenure_days_7d,recency_days_7d,activity_volatility_7d,event_ratio_14d_7d,song_ratio_14d_7d,song_length_ratio_14d_7d,ua_gap_0,ua_gap_1,ua_gap_2
0,1105959,0,341,286.0,1.0,0.0,14.0,9.0,15.0,2.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000875,0.000738,0.993279
1,1372655,0,1182,983.0,1.0,6.0,28.0,47.0,45.0,10.0,...,7.0,41.0,0.004861,53.96295,2.006791,1.993915,2.003268,0.001007,0.000675,1.055979
2,1454596,0,485,387.0,1.0,2.0,12.0,50.0,14.0,5.0,...,4.0,213.0,0.899178,52.829916,1.447761,1.407273,1.405386,0.69046,0.00047,0.000577
3,1494979,1,542,431.0,29.0,1.0,11.0,11.0,14.0,3.0,...,2.0,67.0,1.848565,94.045202,2.203252,2.033019,2.074129,0.000529,0.582131,0.590078
4,1732044,0,343,285.0,0.0,4.0,9.0,16.0,11.0,3.0,...,1.0,75.0,0.548877,0.0,2.766129,2.794118,2.86473,0.001012,0.000684,1.055966


After the feature engineering, we print a list of all the features. Though the amount of features might scream overfitting, omitting the features with a low feature importance accoring to the respective sklearn feature importance measure, led to a decrease in performance on unseen data. Thus, we decided to keep the features in our model.

Givwn our user-level features and the binary churn target variable, we can train our model. To this end, we make a train-validation split with the validation set size 0.2 and define the pipeline:

We have four families of classifiers as candidates: Logistic Regression as baseline, Random Forest, Gradient Boosting, LightGBM and AdaBoost. For each classifier, we define a parameter grid of various potential hyperparameters and configure a times series split within our Randomized Search CV. Then, we run 50 different model specifications and maximize the f1 score to get the best_model.

Using the best_model, we can tune the threshold for predicting churn given the predicted probability. We use 37 candidate thresholds between 0.05 and 0.95 and compare the belanced accuracy for each threshold. It turns out that - based on validation set performance - the threshold 0.4 leads to the best balanced accuracy. This threshold will consequently be used in our test set predictions. The balanced accuracy on the validation set with the optimal threshold is 0.6523 and serves thus as a proxy of the performance on unseen data though there is some leakage due to the threshold tuning making it a bit optimistic. We see that the default threshold of 0.5 would have led to a lower balanced accuracy in the validation set.  

In [268]:
import numpy as np
import pandas as pd

from sklearn.model_selection import GroupShuffleSplit, StratifiedKFold, RandomizedSearchCV
from sklearn.model_selection import StratifiedGroupKFold
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

from lightgbm import LGBMClassifier

# -----------------------
# 0) INPUTS (adapt to your variable names)
# -----------------------
# df: user-level table, one row per user
# columns: ['userId', 'churned', ...84 feature columns...]
# All features numeric (if not, encode before this step)
df = user_data_train.copy()

target_col = "churned"
group_col = "userId"

# Define features: all numeric columns except target and group
features = df.drop(columns=[target_col, group_col]).select_dtypes(include=[np.number]).columns.tolist()

y = df[target_col].astype(int)
groups = df[group_col]
X = df.loc[:, features].copy()


# -----------------------
# 1) HELD-OUT VALIDATION SPLIT (user-level, no leakage)
# -----------------------
gss = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=42)
train_idx, val_idx = next(gss.split(X, y, groups=groups))

X_train, y_train, g_train = X.iloc[train_idx], y.iloc[train_idx], groups.iloc[train_idx]
X_val,   y_val,   g_val   = X.iloc[val_idx],   y.iloc[val_idx],   groups.iloc[val_idx]

# -----------------------
# 2) PIPELINE: IMPUTE -> FEATURE SELECTION -> LIGHTGBM
#    Everything inside the pipeline => feature selection is fit only on CV-train folds
# -----------------------
base_lgbm = LGBMClassifier(
    objective="binary",
    boosting_type="gbdt",
    random_state=42,
    n_jobs=-1,
    # Important: use a large number of trees; we will rely on early stopping later
    n_estimators=5000
)

pipe = Pipeline(steps=[
    ("select", SelectKBest(score_func=mutual_info_classif)),
    ("clf", base_lgbm)
])

# -----------------------
# 3) GROUP-AWARE CV INSIDE RANDOM SEARCH
#    Prevents same user in both train and CV fold
# -----------------------
cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)

# -----------------------
# 4) PARAMETER DISTRIBUTIONS (anti-overfit leaning)
#    Includes feature subset size k as a hyperparameter
# -----------------------
param_distributions = {
    # feature selection: try different subset sizes out of 84
    "select__k": [15, 20, 25, 30],

    # conservative tree complexity
    "clf__max_depth": [2, 3, 4, 5],
    "clf__num_leaves": [7, 15, 31],   # keep small; leaf-wise growth can overfit
    "clf__min_child_samples": [50, 100, 200],
    "clf__min_split_gain": [0.0, 0.05, 0.1],

    # stochastic regularization
    "clf__subsample": [0.6, 0.8],
    "clf__subsample_freq": [1],       # ensures subsampling is actually used
    "clf__colsample_bytree": [0.6, 0.8],

    # shrink leaf values (often strong against overfitting)
    "clf__reg_alpha": [0.0, 0.1, 1.0],
    "clf__reg_lambda": [1.0, 5.0, 10.0],

    # learning dynamics
    "clf__learning_rate": [0.01, 0.03],
}

# Choose a robust metric for churn (you can switch to "roc_auc" too)
search = RandomizedSearchCV(
    estimator=pipe,
    param_distributions=param_distributions,
    n_iter=50,
    scoring="balanced_accuracy",
    cv=cv,
    n_jobs=-1,
    verbose=1,
    random_state=42,
    refit=True
)

# -----------------------
# 5) FIT RANDOM SEARCH ON TRAIN ONLY (validation is untouched)
# -----------------------
search.fit(X_train, y_train, groups=g_train)

print("Best CV score:", search.best_score_)
print("Best params:", search.best_params_)

# -----------------------
# 6) FINAL FIT WITH EARLY STOPPING USING HELD-OUT VALIDATION
#    This combats overfitting by stopping at optimal #trees for generalization to val.
# -----------------------
best_pipe = search.best_estimator_

best_pipe.named_steps["clf"].set_params(
    n_estimators=5000  # keep large; let early stopping find best iteration
)




Fitting 5 folds for each of 50 candidates, totalling 250 fits
[LightGBM] [Info] Number of positive: 1039, number of negative: 20483
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002482 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3725
[LightGBM] [Info] Number of data points in the train set: 21522, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.048276 -> initscore=-2.981337
[LightGBM] [Info] Start training from score -2.981337
Best CV score: 0.505894475750316
Best params: {'select__k': 30, 'clf__subsample_freq': 1, 'clf__subsample': 0.8, 'clf__reg_lambda': 10.0, 'clf__reg_alpha': 0.0, 'clf__num_leaves': 31, 'clf__min_split_gain': 0.0, 'clf__min_child_samples': 100, 'clf__max_depth': 5, 'clf__learning_rate': 0.03, 'clf__colsample_bytree': 0.6}


0,1,2
,boosting_type,'gbdt'
,num_leaves,31
,max_depth,5
,learning_rate,0.03
,n_estimators,5000
,subsample_for_bin,200000
,objective,'binary'
,class_weight,
,min_split_gain,0.0
,min_child_weight,0.001


In [269]:
from lightgbm import early_stopping

preprocessor = best_pipe[:-1]

X_train_tr = preprocessor.transform(X_train)
X_val_tr   = preprocessor.transform(X_val)

clf = best_pipe.named_steps["clf"]

clf.fit(
    X_train_tr, y_train,
    eval_set=[(X_val_tr, y_val)],
    eval_metric="balanced_accuray",
    callbacks=[early_stopping(stopping_rounds=100, verbose=False)]
)

# -----------------------
# 7) EVALUATE ON HELD-OUT VALIDATION
# -----------------------
val_proba = best_pipe.predict_proba(X_val)[:, 1]
val_pred = (val_proba >= 0.5).astype(int)

print("Validation balanced accuracy:", balanced_accuracy_score(y_val, val_pred))
print("Validation ROC AUC:", roc_auc_score(y_val, val_proba))

# Optional: inspect how many features were selected
k_selected = best_pipe.named_steps["select"].get_params()["k"]
print("Selected k features:", k_selected)

[LightGBM] [Info] Number of positive: 1039, number of negative: 20483
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001880 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3725
[LightGBM] [Info] Number of data points in the train set: 21522, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.048276 -> initscore=-2.981337
[LightGBM] [Info] Start training from score -2.981337
Validation balanced accuracy: 0.5
Validation ROC AUC: 0.720217027190763
Selected k features: 30




In [270]:
# print selected feature names
selected_mask = best_pipe.named_steps["select"].get_support()
selected_features = X.columns[selected_mask]
print("Selected feature names:", selected_features.tolist())

Selected feature names: ['total_events', 'songs_played', 'settings', 'add_playlist', 'home', 'help', 'logout', 'thumbs_down', 'downgrade', 'add_friend', 'num_sessions', 'total_song_length', 'thumbs_up_rate', 'thumbs_down_rate', 'unique_pages', 'active_days', 'recency_days', 'activity_volatility', 'songs_played_7d', 'thumbs_up_7d', 'home_7d', 'logout_7d', 'thumbs_down_7d', 'num_sessions_7d', 'events_per_session_7d', 'avg_song_length_7d', 'unique_pages_7d', 'active_days_7d', 'recency_days_7d', 'event_ratio_14d_7d']


For a better overview, we plot again the classification report on the validation set with the tuned threshold 0.4. We recall 61% of churners in the validation set whil ekeeping a precision of 98% for non-churners. As the highest balanced accuracy is associated with the prediction of more churners than there are actually, i.e., a rather prudent forecast, only 7% of the flagged users actually churned. 

***Additional Metrics:***

Given the computed probs_val, we compute the churn predictions for different thresholds and check for the best threshold in terms of different performance measures. 

First, we maximize the F1 score, i.e., the harmonic mean of precision and recall. The best F1 score is actually obtained by choosing a threshold greater than the default value of 0.5, 0.689 as threshold maximizes the F1 score (F1 = 0.1672). However, the balanced accuracy suffers under this specification and we decided not to move forward with this measure.

In [271]:
from sklearn.metrics import f1_score

candidate_thresholds = np.linspace(0.01, 0.95, 37)

best_thr_f1 = 0.5
best_f1 = -1.0

for thr in candidate_thresholds:
    y_pred_thr = (val_proba >= thr).astype(int)
    f1 = f1_score(y_val, y_pred_thr)
    if f1 > best_f1:
        best_f1 = f1
        best_thr_f1 = thr

print(f"\n[Threshold tuning by F1]")
print(f"Best threshold: {best_thr_f1:.3f}")
print(f"Best F1: {best_f1:.4f}")
print(classification_report(y_val, (val_proba >= best_thr_f1).astype(int)))


[Threshold tuning by F1]
Best threshold: 0.088
Best F1: 0.2024
              precision    recall  f1-score   support

           0       0.96      0.89      0.92      5102
           1       0.14      0.34      0.20       275

    accuracy                           0.86      5377
   macro avg       0.55      0.62      0.56      5377
weighted avg       0.92      0.86      0.89      5377



In [272]:
from sklearn.metrics import classification_report

candidate_thresholds = np.linspace(0.01, 0.95, 37)

best_thr_bacc = 0.5
best_bacc = -1.0

for thr in candidate_thresholds:
    y_pred_thr = (val_proba >= thr).astype(int)
    bacc = balanced_accuracy_score(y_val, y_pred_thr)
    if bacc > best_bacc:
        best_bacc = bacc
        best_thr_bacc = thr

print(f"\n[Threshold tuning by Balanced Accuracy]")
print(f"Best threshold: {best_thr_bacc:.3f}")
print(f"Best Balanced Accuracy: {best_bacc:.4f}")
print(classification_report(y_val, (val_proba >= best_thr_bacc).astype(int)))


[Threshold tuning by Balanced Accuracy]
Best threshold: 0.062
Best Balanced Accuracy: 0.6594
              precision    recall  f1-score   support

           0       0.97      0.77      0.86      5102
           1       0.11      0.55      0.19       275

    accuracy                           0.76      5377
   macro avg       0.54      0.66      0.53      5377
weighted avg       0.93      0.76      0.83      5377



Additionally, we might be interested to maximize the precision, i.e., the proportion of churn predictions that turned out to flag actual churners. While optimizing the threshold to obtain the highest precision, we impose the constraint that the recall must be at least 50%, i.e., we mus t identify at least 50% of the churners and at the same time achieve a high precision. However, we deemed a high precision not to be the most important measure in this case especially combined with this constraint. If we punish "over-prudent churn predictions" of people that did not actually churn, while imposing a recall of more than 50%, the model has to deal with two objectives pointing in opposite directions: We want to detect most churners, which is hard enough and at the same time be cautious by avoiding false alerts. It is thus not surprising that even the best precision is below 10% in this scenario.

In [273]:
from sklearn.metrics import precision_score, recall_score

candidate_thresholds = np.linspace(0.01, 0.95, 37)

target_recall = 0.5
best_thr_prec = None
best_prec = -1.0

for thr in candidate_thresholds:
    y_pred_thr = (val_proba >= thr).astype(int)
    rec = recall_score(y_val, y_pred_thr)
    prec = precision_score(y_val, y_pred_thr, zero_division=0)
    
    if rec >= target_recall and prec > best_prec:
        best_prec = prec
        best_thr_prec = thr

print(f"\n[Threshold tuning: max precision | recall >= {target_recall}]")
print(f"Best threshold: {best_thr_prec:.3f}")
print(f"Precision(1): {best_prec:.4f}")
print(classification_report(y_val, (val_proba >= best_thr_prec).astype(int)))


[Threshold tuning: max precision | recall >= 0.5]
Best threshold: 0.062
Precision(1): 0.1149
              precision    recall  f1-score   support

           0       0.97      0.77      0.86      5102
           1       0.11      0.55      0.19       275

    accuracy                           0.76      5377
   macro avg       0.54      0.66      0.53      5377
weighted avg       0.93      0.76      0.83      5377



***End of presentation of other scores***


Now, we assess the best model that we have obtained: It is a LGBMClassifier with learning rate 0.05, balanced class weights and a subsample hyperparameter of  0.6.

In [274]:
best_pipe

0,1,2
,steps,"[('select', ...), ('clf', ...)]"
,transform_input,
,memory,
,verbose,False

0,1,2
,score_func,<function mut...001F69031CEA0>
,k,30

0,1,2
,boosting_type,'gbdt'
,num_leaves,31
,max_depth,5
,learning_rate,0.03
,n_estimators,5000
,subsample_for_bin,200000
,objective,'binary'
,class_weight,
,min_split_gain,0.0
,min_child_weight,0.001


Finally, we use the raw test set to compute the same features that will then be used to predict churn. We can use the functions defined above. 

The feature_engineering_test function takes as inputs the raw test data, the length of the entire observation window and the length of the most recent subwindow to capture dynamics within the 14-day observation window. Those lengths are kept the same and already defined in the kernel.

Again, we defien the global timestamps, determine the bounds of the window and the subwindow, select from the entire test set the users that appear in the observation window and compute the features for those users once for the complete 14-day window and once for the 7-days subwindow. Those features are merged in one user-level test dataframe and we compute the same ratios as in the training set to capture daynamics within the observation window. Again, missing values can be imputed with zeros since missing events in the observation window correspond to zero events in this context. 

In [119]:
def feature_engineering_test(
    df,
    obs_days=OBS_DAYS,
    sub_obs_days=SUB_OBS_DAYS
):
    # global timestamps
    min_ts = df['time'].min()
    max_ts = df['time'].max()

    delta_obs = pd.Timedelta(days=obs_days)
    delta_sub = pd.Timedelta(days=sub_obs_days)

    # observation window for test: last obs_days of the dataset
    observation_start = max_ts - delta_obs
    observation_end = max_ts

    # mask for observation window
    obs_mask_all = (df['time'] >= observation_start) & (df['time'] < observation_end)

    # users that appear in the observation window
    users_in_obs_window = df.loc[obs_mask_all, 'userId'].unique()

    # restrict df to those users
    df_obs = df[df['userId'].isin(users_in_obs_window)]

    # base user table for test users
    user_data_obs = pd.DataFrame({'userId': users_in_obs_window})

    # --------------------------------------------------------------
    # full-window features (same as train, using the same create_features)
    # --------------------------------------------------------------
    features_14 = create_features(df_obs, observation_start, observation_end)
    user_data_obs = user_data_obs.merge(features_14, on='userId', how='left')

    # --------------------------------------------------------------
    # sub-window features for test (last sub_obs_days of observation window)
    # --------------------------------------------------------------
    last_sub_start = observation_end - delta_sub
    features_7 = create_features(df_obs, last_sub_start, observation_end)

    # merge sub-window features; overlapping columns get suffix "_7d"
    user_data_obs = user_data_obs.merge(
        features_7,
        on='userId',
        how='left',
        suffixes=('', '_7d')
    )

    # --------------------------------------------------------------
    # same full vs sub ratio features as in train
    # --------------------------------------------------------------
    ratio_specs = [
        ('total_events',        'total_events_7d',        'event_ratio_14d_7d'),
        ('songs_played',        'songs_played_7d',        'song_ratio_14d_7d'),
        ('total_song_length',   'total_song_length_7d',   'song_length_ratio_14d_7d'),
    ]

    for col_14, col_7, col_ratio in ratio_specs:
        if col_14 in user_data_obs.columns and col_7 in user_data_obs.columns:
            user_data_obs[col_ratio] = user_data_obs[col_14] / (user_data_obs[col_7] + 1)

    num_cols = user_data_obs.select_dtypes(include=[np.number]).columns
    user_data_obs[num_cols] = user_data_obs[num_cols].fillna(0)

    # --------------------------------------------------------------
    # keep all userIds from the original test, merge with obs users
    # --------------------------------------------------------------
    all_users = df['userId'].unique()
    user_base_test = pd.DataFrame({'userId': all_users})
    user_base_test = user_base_test.merge(user_data_obs, on='userId', how='left')

    return user_base_test


Just like in the training set, we encode the user agent using the same GapEncoder with three components. To avoid data leakage, it is important to use the same GapEncoder as in the training set. Now, we have created our full user-level test set that contains the same features as the user-level train set. This dataset can now be used to predict the churn.

In [120]:
user_data_test = feature_engineering_test(test, obs_days=OBS_DAYS, sub_obs_days=SUB_OBS_DAYS)

# === NEW !!!!! UA GapEncoder ===
ua_test = (
    user_data_test['ua_clean_mode']
    .fillna("unknown")
    .astype(str)
)

ua_emb_test = ge.transform(ua_test)
ua_emb_test = np.asarray(ua_emb_test)

user_data_test['ua_gap_0'] = ua_emb_test[:, 0]
user_data_test['ua_gap_1'] = ua_emb_test[:, 1]
user_data_test['ua_gap_2'] = ua_emb_test[:, 2]

# ====================================================================
user_data_test.fillna(0, inplace=True)
user_data_test.head()

Unnamed: 0,userId,total_events,songs_played,roll_advert,settings,add_playlist,thumbs_up,home,help,logout,...,tenure_days_7d,ua_clean_mode_7d,recency_days_7d,activity_volatility_7d,event_ratio_14d_7d,song_ratio_14d_7d,song_length_ratio_14d_7d,ua_gap_0,ua_gap_1,ua_gap_2
0,1465194,426.0,351.0,7.0,1.0,10.0,19.0,12.0,2.0,3.0,...,53.0,mozilla 5 0 windows nt 6 1 wow64 applewebkit 5...,1.920856,82.672043,1.56044,1.56,1.56111,0.001012,0.000684,1.055966
1,1261737,166871.0,0.0,0.0,0.0,0.0,0.0,87522.0,5651.0,0.0,...,49.0,unknown,2.3e-05,2814.260285,2.027841,0.0,0.0,0.007145,0.000349,0.000527
2,1527155,351.0,301.0,0.0,3.0,5.0,13.0,8.0,3.0,4.0,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0.001155,0.000836,1.013825
3,1507202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0.007145,0.000349,0.000527
4,1429412,2288.0,1923.0,0.0,10.0,53.0,108.0,77.0,12.0,19.0,...,76.0,mozilla 5 0 iphone cpu iphone os 7 1 2 like ma...,5.8e-05,84.736567,1.416718,1.416053,1.417149,0.000567,1.349201,0.000816


In [121]:
# export as test user data csv
user_data_test.to_csv("user_data_test_enhanced.csv", index=False)

In [275]:
# read in test user data csv
user_data_test = pd.read_csv("user_data_test_enhanced.csv")

In [258]:
user_data_test.shape

(2904, 81)

To predict churn, we select all the features from the dataset, initialize a prediction vector of the according shape and replace the entreis of the vector with the prediction of the churn. To this end, we use the threshold of 0.4 which was determined by tuning the hyperparameter based on the validation set performance. 

To export a suitable prediction file, we create a pandas DataFrame containing "id" and "target", verfy the correct length of 2904 and export it as a csv file which can then be submitted on kaggle.

In [281]:
df_test = user_data_test.copy()

# add missing required features (fill with 0 or NaN; choose what matches your feature meaning)
missing = [c for c in features if c not in df_test.columns]
if missing:
    df_test = df_test.assign(**{c: 0 for c in missing})  # or np.nan
    print("Added missing columns:", missing)

# drop any extra columns not in features (except userId)
extra = [c for c in df_test.columns if c not in features + ["userId"]]
df_test = df_test.drop(columns=extra)
print("Dropped extra columns:", extra)

X_test = df_test.loc[:, features].copy()

assert list(X_test.columns) == features

# Use the full pipeline directly
test_proba = best_pipe.predict_proba(X_test)[:, 1]

# apply threshold from balanced accuracy tuning
test_pred = (test_proba >= best_thr_bacc).astype(int)

# build submission df
predictions_df = pd.DataFrame({
    "id": user_data_test[group_col],  # Use original user_data_test for userId
    "target": test_pred
})

predictions_df.head()

Dropped extra columns: ['ua_clean_mode', 'ua_clean_mode_7d']




Unnamed: 0,id,target
0,1465194,1
1,1261737,0
2,1527155,0
3,1507202,0
4,1429412,1


In [264]:
# Debug: Check what columns are in features vs user_data_test
print("Features expected:", len(features))
print("Columns in X_test:", len(X_test.columns))

# Check which features are missing from test
missing_in_test = set(features) - set(user_data_test.columns)
if missing_in_test:
    print("\nMissing in test data:", missing_in_test)

# Check which columns in test are not in features
extra_in_test = set(user_data_test.columns) - set(features) - {'userId'}
if extra_in_test:
    print("\nExtra in test data (not in features):", extra_in_test)

# drop any extra columns in test that are not in features


Features expected: 78
Columns in X_test: 78

Extra in test data (not in features): {'ua_clean_mode_7d', 'ua_clean_mode'}


In [277]:
predictions_df['target'].value_counts()

target
0    2036
1     868
Name: count, dtype: int64

In [230]:
X_test.shape, y_pred.shape

((2904, 78), (2904,))

In [278]:
# export as csv
predictions_df.to_csv("churn_predictions.csv", index=False)

In [249]:
predictions_df.shape

(2904, 2)

We see that the proportion of churn predictions is 39% though the actual proportion of churners is lower, as the performance measures on the validation set indicate. This is due to the prudent nature of our model that reaches a high balanced accuracy by aiming for a high recall of churners, willingly accepting a high rate of fals positives. 

In [279]:
# count values of predictions_df target
predictions_df['target'].mean()

np.float64(0.2988980716253444)

***FINISH***