# Feature Engineering

### My Goal: Create new variables to support analysis.
Tasks:

- Calculate KPIs/metrics: CTR (Clicks/Impressions), CR (Conversions/Clicks), CPC, CPM, ROI

- Time-based features (e.g., campaign month, quarter, year)

- Encode categorical variables if needed

- Normalization or scaling (if ML is planned later)

- Save engineered dataset > ../data/processed/marketing_campaign_2024_2025_processed.csv

My goal here is to create new, meaningful columns that will help with insights and modeling.
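
For the encoding and scaling tasks listed above, a minimal sketch on a toy frame might look like the following. The values are made up for illustration, the `spend_usd_scaled` column name is hypothetical, and min-max scaling is only one of several possible choices.

```python
import pandas as pd

# Toy frame with hypothetical values; column names follow this notebook's dataset.
toy = pd.DataFrame({
    "channel": ["Display", "Print", "Search"],
    "device": ["Desktop", "Tablet", "Mobile"],
    "spend_usd": [100.0, 200.0, 300.0],
})

# One-hot encode the categorical columns (one indicator column per category).
encoded = pd.get_dummies(toy, columns=["channel", "device"], dtype=int)

# Min-max scale a numeric column to the [0, 1] range.
spend = encoded["spend_usd"]
encoded["spend_usd_scaled"] = (spend - spend.min()) / (spend.max() - spend.min())

print(encoded)
```

One-hot encoding suits low-cardinality columns such as `channel` or `device`; identifiers like `campaign_id` are usually left out of encoding.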

# Metrics:

#### -ctr: Click-through rate (Clicks ÷ Impressions)

#### -conversion_rate: Conversion rate (Conversions ÷ Clicks)

#### -roi: Return on investment ((revenue_usd - spend_usd) ÷ spend_usd)

---
- clicks: Number of clicks received

- impressions: Number of times the campaign was shown

- conversions: Number of successful conversions (e.g., purchases, signups)

- revenue_usd: Revenue generated from the campaign (USD)

- spend_usd: Total cost of the campaign (USD)

- start_date: Date when the campaign started

- end_date: Date when the campaign ended

---
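
The metric definitions above can be sketched on a toy frame as follows. The numbers are invented for illustration, and `safe_ratio` is a hypothetical helper (not part of this notebook) that returns NaN instead of dividing by zero. CPC and CPM use the usual definitions: spend per click and spend per 1,000 impressions.

```python
import numpy as np
import pandas as pd

def safe_ratio(numerator, denominator):
    """Divide two Series, returning NaN wherever the denominator is 0."""
    return numerator / denominator.replace(0, np.nan)

# Hypothetical campaigns, including rows with zero clicks, impressions, and spend.
toy = pd.DataFrame({
    "impressions": [1000, 2000, 0],
    "clicks": [50, 0, 0],
    "conversions": [5, 0, 0],
    "spend_usd": [100.0, 40.0, 0.0],
    "revenue_usd": [150.0, 0.0, 0.0],
})

toy["ctr"] = safe_ratio(toy["clicks"], toy["impressions"])
toy["conversion_rate"] = safe_ratio(toy["conversions"], toy["clicks"])
toy["roi"] = safe_ratio(toy["revenue_usd"] - toy["spend_usd"], toy["spend_usd"])
toy["cpc"] = safe_ratio(toy["spend_usd"], toy["clicks"])              # cost per click
toy["cpm"] = safe_ratio(toy["spend_usd"], toy["impressions"]) * 1000  # cost per 1,000 impressions

print(toy)
```

Returning NaN for undefined ratios keeps such rows visible in later null checks, rather than letting `inf` values flow into aggregates.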


## Import Libraries

In [53]:
#Libraries

import pandas as pd
import os

## Ingest Data as DF

In [54]:
#INGEST DATA

df = pd.read_csv("../data/processed/marketing_campaign_all_clean.csv")
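
Since the later cells rely on specific columns, a quick check right after loading may catch a wrong or outdated file early. The sketch below uses an in-memory CSV so it runs on its own. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the sample row is invented.

```python
import io
import pandas as pd

# Columns the feature-engineering cells in this notebook read from.
REQUIRED_COLUMNS = [
    "impressions", "clicks", "conversions",
    "spend_usd", "revenue_usd", "start_date", "end_date",
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Invented sample that deliberately lacks revenue_usd.
sample_csv = io.StringIO(
    "campaign_id,start_date,end_date,impressions,clicks,conversions,spend_usd\n"
    "X_0001,2025-01-01,2025-02-01,1000,50,5,100.0\n"
)
sample = pd.read_csv(sample_csv)

print(missing_columns(sample))  # ['revenue_usd']
```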


## Data Check

In [55]:
#DATA CHECK

print(df.shape)
print(df.info())

df.head()

(500, 16)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   campaign_id       500 non-null    object 
 1   campaign_name     500 non-null    object 
 2   start_date        500 non-null    object 
 3   end_date          500 non-null    object 
 4   channel           500 non-null    object 
 5   region            500 non-null    object 
 6   impressions       500 non-null    int64  
 7   clicks            500 non-null    int64  
 8   conversions       500 non-null    int64  
 9   spend_usd         500 non-null    float64
 10  revenue_usd       500 non-null    float64
 11  target_audience   500 non-null    object 
 12  product_category  500 non-null    object 
 13  device            500 non-null    object 
 14  year              500 non-null    int64  
 15  dataset_year      500 non-null    int64  
dtypes: float64(2), int64(5), object(9)

|  | campaign_id | campaign_name | start_date | end_date | channel | region | impressions | clicks | conversions | spend_usd | revenue_usd | target_audience | product_category | device | year | dataset_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025_0001 | Campaign_2025_0001 | 2025-03-04 | 2025-07-20 | Display | North America | 7696 | 3587 | 15144 | 11046.09 | 117455.53 | Seniors | Services | Desktop | 2025 | 2025 |
| 1 | 2025_0002 | Campaign_2025_0002 | 2025-04-05 | 2025-07-25 | Print | Asia | 79664 | 30373 | 75743 | 16419.25 | 92144.07 | Youth | Home | Tablet | 2025 | 2025 |
| 2 | 2025_0003 | Campaign_2025_0003 | 2025-03-18 | 2025-12-06 | Print | North America | 33324 | 89728 | 79251 | 33333.21 | 24070.68 | Youth | Clothing | Desktop | 2025 | 2025 |
| 3 | 2025_0004 | Campaign_2025_0004 | 2025-03-09 | 2025-11-02 | Search | Asia | 32528 | 33793 | 4948 | 14340.82 | 13570.48 | Adults | Travel | Mobile | 2025 | 2025 |
| 4 | 2025_0005 | Campaign_2025_0005 | 2025-05-21 | 2025-09-28 | Email | Africa | 80785 | 35905 | 36563 | 37133.14 | 122995.42 | Seniors | Services | Tablet | 2025 | 2025 |
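
In the five rows shown above, some campaigns report more clicks than impressions or more conversions than clicks, which would push `ctr` or `conversion_rate` above 1. Whether that reflects how this dataset defines those counts or a data quality issue is not stated here, so the sketch below only flags such rows. It rebuilds the five displayed rows by hand, and the flag column names are hypothetical.

```python
import pandas as pd

# The five rows from the preview above (funnel columns only).
head = pd.DataFrame({
    "campaign_id": ["2025_0001", "2025_0002", "2025_0003", "2025_0004", "2025_0005"],
    "impressions": [7696, 79664, 33324, 32528, 80785],
    "clicks": [3587, 30373, 89728, 33793, 35905],
    "conversions": [15144, 75743, 79251, 4948, 36563],
})

# Flag rows where a later funnel stage exceeds the earlier one.
head["clicks_gt_impressions"] = head["clicks"] > head["impressions"]
head["conversions_gt_clicks"] = head["conversions"] > head["clicks"]

print(head[["campaign_id", "clicks_gt_impressions", "conversions_gt_clicks"]])
print("Rows with any flag:", int((head["clicks_gt_impressions"] | head["conversions_gt_clicks"]).sum()))
```

Running the same two comparisons on the full `df` would show how widespread the pattern is before the ratios are interpreted.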


## Convert Data types - Dates, Float & Numeric

In [56]:
#FIX THE DATA TYPES:
# campaign_id stays as object (string): it is an identifier, not a numeric feature,
# so it is not converted with pd.to_numeric.
df['start_date'] = pd.to_datetime(df['start_date'], errors='coerce')
df['end_date'] = pd.to_datetime(df['end_date'], errors='coerce')
df['spend_usd'] = pd.to_numeric(df['spend_usd'], errors='coerce')
df['revenue_usd'] = pd.to_numeric(df['revenue_usd'], errors='coerce')
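
Because `errors='coerce'` silently turns unparseable values into `NaT`/`NaN`, it can be useful to count how many values were lost in the conversion. A minimal sketch on invented data (the `raw` frame and its bad values are assumptions for illustration):

```python
import pandas as pd

# Invented raw values, with one bad date and one bad number.
raw = pd.DataFrame({
    "start_date": ["2025-03-04", "not a date", "2025-03-18"],
    "spend_usd": ["11046.09", "16419.25", "n/a"],
})

parsed_dates = pd.to_datetime(raw["start_date"], errors="coerce")
parsed_spend = pd.to_numeric(raw["spend_usd"], errors="coerce")

# A value that was present before conversion but is missing afterwards was coerced.
coerced_dates = int((parsed_dates.isna() & raw["start_date"].notna()).sum())
coerced_spend = int((parsed_spend.isna() & raw["spend_usd"].notna()).sum())

print(f"Coerced dates: {coerced_dates}, coerced spend values: {coerced_spend}")
```

In this notebook the `df.info()` output reports 500 non-null values for both date columns after conversion, so no dates appear to have been coerced.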

## Compute derived metrics

In [57]:
#ADD CUSTOM COLUMNS: 
#ctr
#conversion_rate 
#roi 
#campaign_duration_days


df["ctr"] = df["clicks"] / df["impressions"]  # Click-through rate
df["conversion_rate"] = df["conversions"] / df["clicks"]  # Conversions per click
df["roi"] = (df["revenue_usd"] - df["spend_usd"]) / df["spend_usd"]  # Return on investment
df["campaign_duration_days"] = (df["end_date"] - df["start_date"]).dt.days  # Campaign length in days

#updated df

print(df.shape)
print(df.columns)
print(df.info())

df.head()

(500, 20)
Index(['campaign_id', 'campaign_name', 'start_date', 'end_date', 'channel',
       'region', 'impressions', 'clicks', 'conversions', 'spend_usd',
       'revenue_usd', 'target_audience', 'product_category', 'device', 'year',
       'dataset_year', 'ctr', 'conversion_rate', 'roi',
       'campaign_duration_days'],
      dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   campaign_id             500 non-null    object        
 1   campaign_name           500 non-null    object        
 2   start_date              500 non-null    datetime64[ns]
 3   end_date                500 non-null    datetime64[ns]
 4   channel                 500 non-null    object        
 5   region                  500 non-null    object        
 6   impressions             500 non-null    int64         
 7   clicks                  500 non-null    int64         
 ...

|  | campaign_id | campaign_name | start_date | end_date | channel | region | impressions | clicks | conversions | spend_usd | revenue_usd | target_audience | product_category | device | year | dataset_year | ctr | conversion_rate | roi | campaign_duration_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025_0001 | Campaign_2025_0001 | 2025-03-04 | 2025-07-20 | Display | North America | 7696 | 3587 | 15144 | 11046.09 | 117455.53 | Seniors | Services | Desktop | 2025 | 2025 | 0.466086 | 4.221912 | 9.633222 | 138 |
| 1 | 2025_0002 | Campaign_2025_0002 | 2025-04-05 | 2025-07-25 | Print | Asia | 79664 | 30373 | 75743 | 16419.25 | 92144.07 | Youth | Home | Tablet | 2025 | 2025 | 0.381264 | 2.493761 | 4.611954 | 111 |
| 2 | 2025_0003 | Campaign_2025_0003 | 2025-03-18 | 2025-12-06 | Print | North America | 33324 | 89728 | 79251 | 33333.21 | 24070.68 | Youth | Clothing | Desktop | 2025 | 2025 | 2.692594 | 0.883236 | -0.277877 | 263 |
| 3 | 2025_0004 | Campaign_2025_0004 | 2025-03-09 | 2025-11-02 | Search | Asia | 32528 | 33793 | 4948 | 14340.82 | 13570.48 | Adults | Travel | Mobile | 2025 | 2025 | 1.03889 | 0.146421 | -0.053717 | 238 |
| 4 | 2025_0005 | Campaign_2025_0005 | 2025-05-21 | 2025-09-28 | Email | Africa | 80785 | 35905 | 36563 | 37133.14 | 122995.42 | Seniors | Services | Tablet | 2025 | 2025 | 0.444451 | 1.018326 | 2.312282 | 130 |
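
Alongside `campaign_duration_days`, the time-based features from the task list (month, quarter, year) could be derived from `start_date` with the pandas `.dt` accessor. A minimal sketch on a toy frame; the `start_month`, `start_quarter`, and `start_year` column names are hypothetical:

```python
import pandas as pd

# Toy frame with a datetime column, mirroring the converted start_date dtype.
toy = pd.DataFrame({
    "start_date": pd.to_datetime(["2025-03-04", "2025-04-05", "2025-12-06"]),
})

# Calendar features taken from the campaign start date.
toy["start_month"] = toy["start_date"].dt.month
toy["start_quarter"] = toy["start_date"].dt.quarter
toy["start_year"] = toy["start_date"].dt.year

print(toy)
```

Using the start date is an assumption; campaigns that span several months or quarters may call for a different convention, such as the end date or the midpoint.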


## Save point

In [58]:
#Save processed dataset
#File: marketing_campaign_2024_2025_processed


processed_path = "../data/processed/marketing_campaign_2024_2025_processed.csv"
df.to_csv(processed_path, index=False)

print(f"Processed dataset saved to: {processed_path}")


Processed dataset saved to: ../data/processed/marketing_campaign_2024_2025_processed.csv
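
The save step assumes the target folder already exists, and CSV files do not store dtypes, so the date columns come back as plain strings unless they are parsed again on reload. The sketch below illustrates both points in a temporary directory. `save_csv` is a hypothetical helper and the toy row is invented.

```python
import os
import tempfile
import pandas as pd

def save_csv(frame, path):
    """Create the parent directory if needed, then write the frame without its index."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    frame.to_csv(path, index=False)
    return path

toy = pd.DataFrame({
    "campaign_id": ["X_0001"],
    "start_date": pd.to_datetime(["2025-03-04"]),
    "ctr": [0.05],
})

with tempfile.TemporaryDirectory() as tmp:
    # The "processed" subfolder does not exist yet; save_csv creates it.
    out_path = save_csv(toy, os.path.join(tmp, "processed", "toy_features.csv"))
    # Re-parse the date column on reload, since CSV keeps it as text.
    reloaded = pd.read_csv(out_path, parse_dates=["start_date"])

print(reloaded.dtypes)
```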
