<a href="https://colab.research.google.com/github/WiseCat-Git/GoogleColab_ChatGPT_Data_Analysis/blob/main/Ecommerce_Case_study_part2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I have a dataset, "events.csv", taken from
https://www.kaggle.com/datasets/mkechinov/ecommerce-events-history-in-electronics-store.


The next step toward the final goal, predicting churn, is feature engineering.
Let's start by defining which features may help us predict churn.


Recency of Activity: Measures how recently a user interacted with the store.
Users who have interacted recently are expected to be less likely to churn.
Success Probability: 75%

Frequency of Activity: Represents the overall engagement of a user
by counting the number of events per user. More frequent interaction
likely indicates lower churn. Success Probability: 80%

Monetary Value: Measures the financial value of a user based on their purchases.
Users who have contributed more revenue may be less likely to churn.
Success Probability: 80%

Number of Sessions: More sessions might indicate a more engaged user, thus,
less likely to churn. Requires defining what constitutes a "session".
Success Probability: 75%

Cart Abandonment Rate: A higher cart abandonment rate might indicate
dissatisfaction, thus a higher risk of churn. Success Probability: 70%
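To make these five definitions concrete, here is a minimal sketch that computes them for every user at once with a pandas `groupby`. The toy `events` table and its values are made up for illustration; only the column names (`event_time`, `event_type`, `price`, `user_id`, `user_session`) follow the ones used in this notebook. Taking the latest timestamp in the log as the reference date for recency is an assumption of the sketch.

```python
import pandas as pd

# Toy event log with the columns used in this notebook (values are made up).
events = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 10:05", "2021-01-03 09:00",
        "2021-01-02 12:00", "2021-01-02 12:10", "2021-01-10 08:00",
    ]),
    "event_type": ["view", "cart", "purchase", "view", "cart", "view"],
    "price": [100.0, 100.0, 100.0, 50.0, 50.0, 20.0],
    "user_id": [1, 1, 1, 2, 2, 2],
    "user_session": ["a", "a", "b", "c", "c", "d"],
})

# Assumption: recency is measured against the latest timestamp in the whole log.
most_recent_date = events["event_time"].max()

is_cart = events["event_type"].eq("cart")
is_purchase = events["event_type"].eq("purchase")

per_user = events.assign(
    is_cart=is_cart,
    is_purchase=is_purchase,
    # Keep the price only on purchase rows so the sum is revenue, not browsing.
    purchase_price=events["price"].where(is_purchase, 0.0),
).groupby("user_id").agg(
    last_event=("event_time", "max"),
    frequency=("event_time", "size"),
    monetary=("purchase_price", "sum"),
    num_sessions=("user_session", "nunique"),
    num_cart=("is_cart", "sum"),
    num_purchase=("is_purchase", "sum"),
)

# Whole days between the user's last event and the reference date.
per_user["recency_days"] = (most_recent_date - per_user["last_event"]).dt.days

# Share of cart additions not followed by a purchase; 0 when there were no carts.
per_user["cart_abandon_rate"] = (
    (per_user["num_cart"] - per_user["num_purchase"])
    .div(per_user["num_cart"].where(per_user["num_cart"] > 0))
    .fillna(0.0)
)

print(per_user)
```

In this toy log, user 1 bought what they added to the cart, while user 2 added an item and never purchased, so the abandonment rate separates the two.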

Here are the top six additional feature ideas for predicting churn, rated by probability of success and confidence:

Total Quantity Purchased: This feature represents the total quantity of items a user has purchased. It's a direct measure of user engagement and satisfaction. It's fairly easy to compute and has a success probability of 75% and a confidence level of 80%.

Average Time Between Purchases: This indicates the user's purchasing frequency and habits. Although it might require careful tracking of purchase timings, it could be a strong indicator of churn, with a success probability of 70% and a confidence level of 70%.

Ratio of Cart Additions to Purchases: Reflects the user's conversion rate from adding to cart to actual purchase. Although certain user behaviors may skew it, this feature is expected to be a good churn indicator, with a success probability of 70% and a confidence level of 70%.

Average Session Duration: Measures the average time a user spends in a session, indicating their level of engagement. It requires session identification and duration calculation, and is expected to be beneficial, with a success probability of 65% and a confidence level of 70%.

Ratio of Views to Purchases: Reflects the user's decision-making process and satisfaction level. It's easy to calculate and has a success probability of 65% and a confidence level of 70%.

Average Price of Items Purchased: Reflects the user's buying power and preference. It's simple to compute and expected to be a useful indicator with a success probability of 65% and a confidence level of 70%.
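The six features above could be computed per user along the following lines. This is a sketch under stated assumptions, not a definitive implementation: the toy `events` table is made up, the helper name `extra_features` is hypothetical, each purchase event is counted as one item, and ratios are left as NaN (rather than 0) for users with no purchases, which is a design choice of this sketch.

```python
import pandas as pd

# Toy event log (values are made up); column names follow this notebook.
events = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 10:10", "2021-01-01 10:30",
        "2021-01-05 10:00", "2021-01-05 10:30",
        "2021-01-02 09:00",
    ]),
    "event_type": ["view", "cart", "purchase", "view", "purchase", "view"],
    "price": [100.0, 100.0, 100.0, 50.0, 50.0, 20.0],
    "user_id": [1, 1, 1, 1, 1, 2],
    "user_session": ["a", "a", "a", "b", "b", "c"],
})


def extra_features(user_events: pd.DataFrame) -> pd.Series:
    """Compute the six additional features for a single user's events."""
    purchases = user_events[user_events["event_type"] == "purchase"]
    n_purchase = len(purchases)
    n_cart = (user_events["event_type"] == "cart").sum()
    n_view = (user_events["event_type"] == "view").sum()

    # Sort first so diff() yields gaps between consecutive purchases.
    gaps = purchases["event_time"].sort_values().diff()

    # Session duration = last event time minus first event time per session.
    bounds = user_events.groupby("user_session")["event_time"].agg(["min", "max"])
    durations = bounds["max"] - bounds["min"]

    return pd.Series({
        # Assumption: one purchase event counts as one item.
        "total_qty_purchased": n_purchase,
        "avg_time_between_purchases": gaps.mean(),
        "ratio_cart_to_purchase": n_cart / n_purchase if n_purchase else float("nan"),
        "avg_session_duration": durations.mean(),
        "ratio_views_to_purchase": n_view / n_purchase if n_purchase else float("nan"),
        "avg_price_purchased": purchases["price"].mean(),
    }, dtype=object)


# One row per user, one column per feature.
extra = pd.DataFrame(
    {uid: extra_features(group) for uid, group in events.groupby("user_id")}
).T

print(extra)
```

Users with fewer than two purchases get a missing value for the average time between purchases, since there is no gap to measure.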



In [None]:

import numpy as np

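# df, features_df, and most_recent_date are assumed to be defined in earlier cells

# Pick 3 distinct random users (replace=False avoids drawing the same user twice)
random_user_ids = np.random.choice(df['user_id'].unique(), 3, replace=False)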

for user_id in random_user_ids:
    print(f"User ID: {user_id}")

    # Subset the DataFrame for the current user
    user_df = df[df['user_id'] == user_id]

    # Recency of Activity
    print("Calculating Recency of Activity...")
    user_most_recent_date = user_df['event_time'].max()
    recency = (most_recent_date - user_most_recent_date).days
    print(f"Most recent date for user: {user_most_recent_date}")
    print(f"Recency of Activity: {recency}")

    # Frequency of Activity
    print("Calculating Frequency of Activity...")
    frequency = user_df.shape[0]
    print(f"Number of records for user: {frequency}")

    # Monetary Value
    print("Calculating Monetary Value...")
    purchases_df = user_df[user_df['event_type'] == 'purchase']
    monetary = purchases_df['price'].sum()
    print(f"Sum of purchase prices for user: {monetary}")

    # Number of Sessions
    print("Calculating Number of Sessions...")
    num_sessions = user_df['user_session'].nunique()
    print(f"Unique sessions for user: {num_sessions}")

    # Cart Abandonment Rate
    print("Calculating Cart Abandonment Rate...")
    num_cart = (user_df['event_type'] == 'cart').sum()
    num_purchase = (user_df['event_type'] == 'purchase').sum()
    cart_abandon_rate = (num_cart - num_purchase) / num_cart if num_cart > 0 else 0
    print(f"Number of cart actions: {num_cart}")
    print(f"Number of purchase actions: {num_purchase}")
    print(f"Cart Abandonment Rate: {cart_abandon_rate}")

    # Total Quantity Purchased (approximated by the number of purchase events)
    print("Calculating Total Quantity Purchased...")
    total_qty_purchased = num_purchase
    print(f"Total Quantity Purchased: {total_qty_purchased}")

    # Average Time Between Purchases
    print("Calculating Average Time Between Purchases...")
    # Sort so that diff() measures gaps between consecutive purchases
    purchase_times = purchases_df['event_time'].sort_values()
    avg_time_between_purchases = purchase_times.diff().mean()
    print(f"Purchase times for user: {purchase_times}")
    print(f"Average Time Between Purchases: {avg_time_between_purchases}")

    # Ratio of Cart Additions to Purchases
    print("Calculating Ratio of Cart Additions to Purchases...")
    ratio_cart_to_purchase = num_cart / num_purchase if num_purchase > 0 else 0
    print(f"Ratio of Cart Additions to Purchases: {ratio_cart_to_purchase}")

    # Average Session Duration
    print("Calculating Average Session Duration...")
    session_duration = user_df.groupby('user_session')['event_time'].apply(lambda x: x.max() - x.min())
    avg_session_duration = session_duration.mean()
    print(f"Session durations for user: {session_duration}")
    print(f"Average Session Duration: {avg_session_duration}")

    # Ratio of Views to Purchases
    print("Calculating Ratio of Views to Purchases...")
    num_views = (user_df['event_type'] == 'view').sum()
    ratio_views_to_purchase = num_views / num_purchase if num_purchase > 0 else 0
    print(f"Number of view actions: {num_views}")
    print(f"Ratio of Views to Purchases: {ratio_views_to_purchase}")

    # Average Price of Items Purchased
    print("Calculating Average Price of Items Purchased...")
    avg_price_purchased = purchases_df['price'].mean()
    print(f"Average Price of Items Purchased: {avg_price_purchased}")

    print("\n" + "="*50 + "\n")

    # Compare with features_df
    print("From features_df:\n")
    print(features_df.loc[user_id])

    print("\n" + "="*50 + "\n")
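
The loop above prints the manual values next to the `features_df` row so they can be compared by eye. A minimal sketch of automating that check for numeric features follows. The helper `mismatched_features`, the stand-in `features_df`, and its column names `frequency` and `monetary` are all hypothetical, since the real `features_df` schema is not shown here. Timedelta features such as the average session duration would need a separate comparison.

```python
import numpy as np
import pandas as pd


def mismatched_features(manual: dict, stored: pd.Series, rtol: float = 1e-9) -> list:
    """Return the names of numeric features whose manual value differs from the stored one."""
    mismatches = []
    for name, value in manual.items():
        if name not in stored.index:
            mismatches.append(name)  # feature missing from the stored row
        elif not np.isclose(value, stored[name], rtol=rtol, equal_nan=True):
            mismatches.append(name)  # values disagree beyond the tolerance
    return mismatches


# Hypothetical stand-in for features_df, indexed by user_id.
features_df = pd.DataFrame(
    {"frequency": [3, 5], "monetary": [100.0, 0.0]},
    index=pd.Index([1, 2], name="user_id"),
)

# Manually computed values for user 1; monetary is deliberately off by 1.
manual_values = {"frequency": 3, "monetary": 99.0}
print(mismatched_features(manual_values, features_df.loc[1]))
```

An empty list means every manually computed feature agrees with the stored row within the tolerance.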


