## Online Activity Feature Engineering

In this section, we processed the raw `online_activity` dataset to create numeric features per customer suitable for churn modeling. The goal is to capture **digital engagement and recency of activity**.

### Original Columns
- `customer_id` – Unique identifier for each customer
- `last_login_date` – Date of the customer's last login
- `login_frequency` – Login count recorded for the customer
- `service_usage` – Platform used (Mobile App, Website, Online Banking)

### Feature Engineering Steps

1. **Drop the platform column**
   - Dropped the `service_usage` column (Mobile App, Website, Online Banking), since the platform was considered less predictive of churn.
   - Kept `login_frequency` and `last_login_date` as the engagement signals.

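2. **Compute recency**
   - `last_login_date` – Most recent login per customer, parsed to datetime
   - `days_since_last_login` – Number of days between the run date and the last login; `last_login_date` is dropped once this is computed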
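
The sample shown in this notebook appears to hold one row per customer, so no aggregation is applied. If the raw table instead contained several login rows per customer, the "most recent login per customer" could be obtained with a group-by first. This is a minimal sketch on toy data; the `logins` frame and its repeated customers are assumptions for illustration, not the real file.

```python
import pandas as pd

# Toy data with repeated customers (assumed shape, not the real dataset)
logins = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "last_login_date": pd.to_datetime(
        ["2023-10-01", "2023-10-21", "2023-12-05", "2023-11-30", "2023-11-15"]
    ),
})

# Keep only the most recent login date for each customer
latest = logins.groupby("customer_id", as_index=False)["last_login_date"].max()
```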

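```python
# Days between the run date (midnight) and each customer's last login
online_activity['days_since_last_login'] = (
    pd.Timestamp.today().normalize() - online_activity['last_login_date']
).dt.days
```
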
In [14]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt



In [8]:
online_activity = pd.read_csv('/Users/mac/PycharmProjects/Customer_Churn/Datasets/csv_files/crm_online_activity.csv')
online_activity.head(100)

Unnamed: 0,customer_id,last_login_date,login_frequency,service_usage
0,1,2023-10-21,34,Mobile App
1,2,2023-12-05,5,Website
2,3,2023-11-15,3,Website
3,4,2023-08-25,2,Website
4,5,2023-10-27,41,Website
...,...,...,...,...
95,96,2023-02-27,36,Mobile App
96,97,2023-12-20,38,Website
97,98,2023-05-23,9,Mobile App
98,99,2023-02-10,14,Online Banking


In [9]:
online_activity.drop('service_usage', axis=1, inplace=True)
online_activity.head()


Unnamed: 0,customer_id,last_login_date,login_frequency
0,1,2023-10-21,34
1,2,2023-12-05,5
2,3,2023-11-15,3
3,4,2023-08-25,2
4,5,2023-10-27,41
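
The cell above drops `service_usage` for every row. If the intent were instead to keep only customers on certain platforms, the rows could be filtered before the column is removed. This is a minimal sketch on a toy frame; the `keep_platforms` list is an assumption, not a choice made in this notebook.

```python
import pandas as pd

# Toy frame using a few rows from the sample output
activity = pd.DataFrame({
    "customer_id": [1, 2, 3, 99],
    "login_frequency": [34, 5, 3, 14],
    "service_usage": ["Mobile App", "Website", "Website", "Online Banking"],
})

# Hypothetical choice of platforms to keep (assumption)
keep_platforms = ["Mobile App", "Online Banking"]

# Filter rows by platform, then drop the now-redundant column
filtered = (
    activity[activity["service_usage"].isin(keep_platforms)]
    .drop(columns="service_usage")
    .reset_index(drop=True)
)
```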


In [10]:
online_activity['last_login_date'] = pd.to_datetime(online_activity['last_login_date'])
today = pd.Timestamp.today().normalize()
online_activity['days_since_last_login'] = (today - online_activity['last_login_date']).dt.days
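
Because `pd.Timestamp.today()` moves with every run, the resulting `days_since_last_login` values change each time the notebook is executed. One way to make the feature reproducible is to measure recency against a fixed reference date. This is a minimal sketch on toy rows; `snapshot_date` and its value are assumptions, not something defined in this notebook.

```python
import pandas as pd

# Toy frame mirroring the first rows of the sample output
activity = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "last_login_date": ["2023-10-21", "2023-12-05", "2023-11-15"],
})
activity["last_login_date"] = pd.to_datetime(activity["last_login_date"])

# Hypothetical fixed reference date (assumption); pick one after the last login in the data
snapshot_date = pd.Timestamp("2024-01-01")

# Recency relative to the snapshot, stable across runs
activity["days_since_last_login"] = (
    snapshot_date - activity["last_login_date"]
).dt.days
```

With a fixed snapshot, the differences between customers stay the same as with `today()`; only the shared offset is frozen.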

In [12]:
online_activity.drop('last_login_date', axis=1, inplace=True)
online_activity.head()

Unnamed: 0,customer_id,login_frequency,days_since_last_login
0,1,34,699
1,2,5,654
2,3,3,674
3,4,2,756
4,5,41,693


In [13]:
online_activity.to_csv('online_activity.csv', index=False)
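
Before the features are written out, a few sanity checks can catch problems such as duplicate customers or future-dated logins. This is a minimal sketch; `validate_features` is a hypothetical helper, not part of the notebook, and the toy frame reuses values from the output above.

```python
import pandas as pd

def validate_features(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if df["customer_id"].duplicated().any():
        problems.append("duplicate customer_id values")
    if df["days_since_last_login"].isna().any():
        problems.append("missing days_since_last_login")
    if (df["days_since_last_login"] < 0).any():
        problems.append("negative days_since_last_login")
    return problems

# Toy frame with the engineered columns
features = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "login_frequency": [34, 5, 3],
    "days_since_last_login": [699, 654, 674],
})
problems = validate_features(features)
```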