### Engineering features from the Amazon Prime user data

In [1]:
# Import necessary libraries
import pandas as pd
from datetime import datetime

In [3]:
# Load the cleaned data
df = pd.read_csv("./Data/cleaned_amazon_prime_users.csv")
df

Unnamed: 0,User ID,Name,Email Address,Username,Date of Birth,Gender,Location,Membership Start Date,Membership End Date,Subscription Plan,Payment Information,Renewal Status,Usage Frequency,Purchase History,Favorite Genres,Devices Used,Engagement Metrics,Feedback/Ratings,Customer Support Interactions
0,1,Ronald Murphy,williamholland@example.com,williamholland,1953-06-03,Male,Rebeccachester,2024-01-15,2025-01-14,Annual,Mastercard,Manual,Regular,Electronics,['Documentary'],['Smart TV'],Medium,3.6,3
1,2,Scott Allen,scott22@example.org,scott22,1978-07-08,Male,Mcphersonview,2024-01-07,2025-01-06,Monthly,Visa,Manual,Regular,Electronics,['Horror'],['Smartphone'],Medium,3.8,7
2,3,Jonathan Parrish,brooke16@example.org,brooke16,1994-12-06,Female,Youngfort,2024-04-13,2025-04-13,Monthly,Mastercard,Manual,Regular,Books,['Comedy'],['Smart TV'],Low,3.3,8
3,4,Megan Williams,elizabeth31@example.net,elizabeth31,1964-12-22,Female,Feliciashire,2024-01-24,2025-01-23,Monthly,Amex,Auto-renew,Regular,Electronics,['Documentary'],['Smart TV'],High,3.3,7
4,5,Kathryn Brown,pattersonalexandra@example.org,pattersonalexandra,1961-06-04,Male,Port Deborah,2024-02-14,2025-02-13,Annual,Visa,Auto-renew,Frequent,Clothing,['Drama'],['Smart TV'],Low,4.3,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2495,2496,Michael Lopez,williamsroberto@example.org,williamsroberto,1967-08-19,Male,Smithport,2024-01-25,2025-01-24,Annual,Visa,Auto-renew,Frequent,Electronics,['Comedy'],['Smartphone'],Medium,4.9,2
2496,2497,Matthew Woodard,lkaiser@example.com,lkaiser,1980-10-23,Male,Ethanport,2024-03-03,2025-03-03,Annual,Amex,Manual,Frequent,Books,['Comedy'],['Smart TV'],Medium,4.0,0
2497,2498,Morgan Barnes,erikaholland@example.net,erikaholland,1972-03-31,Female,Alexandraborough,2024-02-09,2025-02-08,Annual,Visa,Manual,Frequent,Electronics,['Documentary'],['Tablet'],Low,4.9,8
2498,2499,Gina Castaneda,reedcourtney@example.net,reedcourtney,1965-08-02,Female,Williammouth,2024-02-18,2025-02-17,Monthly,Visa,Manual,Regular,Clothing,['Comedy'],['Smartphone'],High,3.4,7


1. Temporal Features:
    - Membership Duration (Days): Calculates the duration of membership in days.
    - Age: Approximates the user's age as the current year minus the birth year.
    - Days Since Membership Start: Calculates the number of days since the membership started.

In [4]:
# Dates load from the CSV as strings; convert them before doing date arithmetic
df['Membership Start Date'] = pd.to_datetime(df['Membership Start Date'])
df['Membership End Date'] = pd.to_datetime(df['Membership End Date'])
df['Membership Duration (Days)'] = (df['Membership End Date'] - df['Membership Start Date']).dt.days
df['Age'] = datetime.now().year - pd.to_datetime(df['Date of Birth']).dt.year
df['Days Since Membership Start'] = (datetime.now() - df['Membership Start Date']).dt.days
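The year difference used for Age overstates the age by one for anyone whose birthday has not yet occurred in the current year. A minimal sketch of a birthday-aware calculation follows; the helper name `age_in_years` and the fixed reference date are assumptions made for illustration, not part of the notebook.

```python
import pandas as pd

def age_in_years(dob: pd.Series, as_of: pd.Timestamp) -> pd.Series:
    """Whole years between each date of birth and `as_of` (hypothetical helper)."""
    dob = pd.to_datetime(dob)
    # True where the birthday has already occurred in the `as_of` year
    had_birthday = (dob.dt.month < as_of.month) | (
        (dob.dt.month == as_of.month) & (dob.dt.day <= as_of.day)
    )
    # Subtract one year where the birthday is still to come
    return as_of.year - dob.dt.year - (~had_birthday).astype(int)

# Dates of birth copied from the first rows of the table above
sample = pd.Series(["1953-06-03", "1978-07-08", "1994-12-06"])
# Assumed fixed reference date so the result does not depend on when this runs
ages = age_in_years(sample, pd.Timestamp("2024-07-01"))
print(ages.tolist())
```

Passing an explicit reference date instead of calling `datetime.now()` also keeps the feature reproducible between runs.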

2. Categorical Features:
    - Uses pd.get_dummies() for one-hot encoding of Gender and Subscription Plan.

In [5]:
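# get_dummies joins prefix and category with '_' already, so the prefixes carry no trailing underscore
df = pd.get_dummies(df, columns=['Gender', 'Subscription Plan'], prefix=['Gender', 'Plan'])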
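To make the resulting column names concrete, here is a small sketch on toy data (the three rows are invented for illustration). `pd.get_dummies` joins each prefix to the category value with `prefix_sep`, which defaults to `"_"`, so a prefix that itself ends in an underscore would yield names such as `Gender__Male`.

```python
import pandas as pd

# Toy frame with the two columns being encoded (values are illustrative)
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Subscription Plan": ["Annual", "Monthly", "Monthly"],
})

encoded = pd.get_dummies(
    toy,
    columns=["Gender", "Subscription Plan"],
    prefix=["Gender", "Plan"],  # joined to each category with prefix_sep ("_" by default)
    dtype=int,                  # 0/1 integers instead of booleans
)
encoded_columns = list(encoded.columns)
print(encoded_columns)
```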

3. Devices Used:
    - Creates binary features for each device type.

In [6]:
# Device names as they appear in the sample rows above ('Smartphone', not 'Mobile')
for device in ['Smartphone', 'Desktop', 'Tablet', 'Smart TV']:
    df[f'Uses_{device}'] = df['Devices Used'].apply(lambda x: 1 if device in x else 0)
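Because the column comes from a CSV, each entry is a string such as `"['Smart TV']"` rather than a list, so `device in x` is a substring test. A hedged alternative is to parse the strings with `ast.literal_eval` and derive the device names from the data instead of hard-coding them. The toy rows below are copied from the table above; the variable names are illustrative.

```python
import ast
import pandas as pd

# Toy rows mimicking how the CSV stores the lists: as string literals
toy = pd.DataFrame({"Devices Used": ["['Smart TV']", "['Smartphone']", "['Tablet']"]})

# Parse each string literal into a real Python list
devices = toy["Devices Used"].apply(ast.literal_eval)

# Collect the device names that actually occur in the data
device_names = sorted(set(devices.explode()))

for device in device_names:
    # d=device binds the current name; membership is now tested on a list
    toy[f"Uses_{device}"] = devices.apply(lambda used, d=device: int(d in used))

print(device_names)
```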

4. Favorite Genres:
    - Creates binary features for each genre.

In [8]:
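import ast  # the column stores list literals as strings, e.g. "['Comedy']"
genres = df['Favorite Genres'].apply(ast.literal_eval)
for genre in sorted(genres.explode().dropna().unique()):
    df[f'Genre_{genre}'] = genres.apply(lambda x: int(genre in x))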
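The same genre indicators can be built without an explicit loop. This sketch assumes the column holds list literals stored as strings, parses them, explodes to one genre per row, one-hot encodes, and collapses back to one row per user. The two-genre row is hypothetical, added only to show that multi-genre users are handled.

```python
import ast
import pandas as pd

toy = pd.DataFrame({
    "Favorite Genres": [
        "['Documentary']",
        "['Horror']",
        "['Comedy']",
        "['Comedy', 'Drama']",  # hypothetical multi-genre user
    ]
})

# Turn the string literals into real lists
genres = toy["Favorite Genres"].apply(ast.literal_eval)

# One row per (user, genre), one-hot encode, then take the max per original row
indicators = (
    pd.get_dummies(genres.explode(), prefix="Genre", dtype=int)
    .groupby(level=0)
    .max()
)
result = toy.join(indicators)
print(result.filter(like="Genre_"))
```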

5. Churn Prediction:
    - Creates a binary target variable (Churned) based on whether the Membership End Date has passed.

In [9]:
# Parse the end dates first; comparing raw strings with a datetime raises a TypeError
df['Churned'] = (pd.to_datetime(df['Membership End Date']) < datetime.now()).astype(int)
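The comparison needs the end dates as datetimes rather than the strings read from the CSV. Since it is made against the current time, the label also changes depending on when the notebook is run. A minimal sketch using a fixed reference date follows; the cutoff `2025-02-01` is an arbitrary assumption for illustration, and the end dates are copied from the table above.

```python
import pandas as pd

# End dates copied from the first rows of the table above (stored as strings)
toy = pd.DataFrame({"Membership End Date": ["2025-01-14", "2025-04-13", "2025-02-13"]})

# Assumed fixed cutoff so the label is reproducible between runs
as_of = pd.Timestamp("2025-02-01")

# Convert to datetimes before comparing, then cast the boolean mask to 0/1
end_dates = pd.to_datetime(toy["Membership End Date"])
toy["Churned"] = (end_dates < as_of).astype(int)
print(toy)
```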