# Feature Engineering

### My Goal: Create new variables to support analysis.
Tasks:

- Calculate KPIs/metrics: CTR (Clicks/Impressions), CR (Conversions/Clicks), CPC, CPM, ROI
- Add time-based features (e.g., campaign month, quarter, year)
- Encode categorical variables if needed
- Normalize or scale features (if ML is planned later)
- Save the engineered dataset to /data/processed/marketing_campaigns_features.csv

The goal here is to create new, meaningful columns that will help with insights and modeling.

## Metrics

- `ctr`: Click-through rate (clicks ÷ impressions)
- `conversion_rate`: Conversion rate (conversions ÷ clicks)
- `roi`: Return on investment ((revenue_usd - spend_usd) ÷ spend_usd)

---
- `clicks`: Number of clicks received
- `impressions`: Number of times the campaign was shown
- `conversions`: Number of successful conversions (e.g., purchases, signups)
- `revenue_usd`: Revenue generated from the campaign
- `spend_usd` (cost): Total cost of the campaign
- `start_date`: Date when the campaign started
- `end_date`: Date when the campaign ended

---
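
The task list names CPC and CPM without defining them. A minimal sketch, assuming the usual definitions (CPC = spend ÷ clicks, CPM = spend ÷ impressions × 1000) and using the values of the first campaign printed later in this notebook. The `kpis` helper is hypothetical and not part of the original notebook:

```python
def kpis(impressions, clicks, conversions, spend_usd, revenue_usd):
    """Return campaign KPIs as a dict (assumes all denominators are non-zero)."""
    return {
        "ctr": clicks / impressions,                   # click-through rate
        "conversion_rate": conversions / clicks,       # conversions per click
        "roi": (revenue_usd - spend_usd) / spend_usd,  # return on investment
        "cpc": spend_usd / clicks,                     # cost per click (assumed definition)
        "cpm": spend_usd / impressions * 1000,         # cost per 1,000 impressions (assumed definition)
    }


# Values of campaign 2024_0001 (first row of the dataset)
example = kpis(
    impressions=28252,
    clicks=5609,
    conversions=65466,
    spend_usd=39193.43,
    revenue_usd=79017.74,
)

for name, value in example.items():
    print(f"{name}: {value:.6f}")
```

The `ctr`, `conversion_rate`, and `roi` values should agree with the corresponding columns computed for that row further down.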


## Import Libraries

In [38]:
#Libraries

import pandas as pd
import os

## Ingest Data as DF

In [39]:
#INGEST DATA

df = pd.read_csv("../data/processed/marketing_campaign_all_clean.csv")


## Data Check

In [40]:
#DATA CHECK

print(df.shape)
print(df.info())

df.head()

(1000, 16)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   campaign_id       1000 non-null   object 
 1   campaign_name     1000 non-null   object 
 2   start_date        1000 non-null   object 
 3   end_date          1000 non-null   object 
 4   channel           1000 non-null   object 
 5   region            1000 non-null   object 
 6   impressions       1000 non-null   int64  
 7   clicks            1000 non-null   int64  
 8   conversions       1000 non-null   int64  
 9   spend_usd         1000 non-null   float64
 10  revenue_usd       1000 non-null   float64
 11  target_audience   1000 non-null   object 
 12  product_category  1000 non-null   object 
 13  device            1000 non-null   object 
 14  year              1000 non-null   int64  
 15  dataset_year      1000 non-null   int64  
dtypes: float64(2), int64(5), object(

| | campaign_id | campaign_name | start_date | end_date | channel | region | impressions | clicks | conversions | spend_usd | revenue_usd | target_audience | product_category | device | year | dataset_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024_0001 | Campaign_2024_0001 | 2024-05-16 | 2024-08-16 | Search | South America | 28252 | 5609 | 65466 | 39193.43 | 79017.74 | Youth | Electronics | Desktop | 2024 | 2024 |
| 1 | 2024_0002 | Campaign_2024_0002 | 2024-04-06 | 2024-10-13 | Search | Asia | 89608 | 83584 | 26865 | 17291.53 | 49868.54 | Adults | Home | Mobile | 2024 | 2024 |
| 2 | 2024_0003 | Campaign_2024_0003 | 2024-05-08 | 2024-11-27 | Social | Europe | 37853 | 62661 | 43662 | 6729.63 | 63021.28 | Seniors | Electronics | Desktop | 2024 | 2024 |
| 3 | 2024_0004 | Campaign_2024_0004 | 2024-01-28 | 2024-08-03 | Display | Africa | 10577 | 41421 | 75023 | 15077.58 | 133106.71 | Seniors | Clothing | Desktop | 2024 | 2024 |
| 4 | 2024_0005 | Campaign_2024_0005 | 2024-02-06 | 2024-08-23 | Social | Asia | 84039 | 56010 | 11283 | 16877.69 | 144736.99 | Adults | Home | Mobile | 2024 | 2024 |


## Convert Data types - Dates, Float & Numeric

In [43]:
#FIX THE DATA TYPES:
# campaign_id is kept as object: it is an identifier, not a numeric feature,
# and pandas works fine with string IDs.
#df['campaign_id'] = pd.to_numeric(df['campaign_id'], errors='coerce')
df['start_date'] = pd.to_datetime(df['start_date'], errors='coerce')    # unparseable dates become NaT
df['end_date'] = pd.to_datetime(df['end_date'], errors='coerce')
df['spend_usd'] = pd.to_numeric(df['spend_usd'], errors='coerce')       # unparseable numbers become NaN
df['revenue_usd'] = pd.to_numeric(df['revenue_usd'], errors='coerce')
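
Because `errors='coerce'` silently turns unparseable values into `NaT`/`NaN`, it can be worth counting how many values were lost in the conversion. A minimal sketch on a toy frame; the sample values and the `coercion_report` helper are illustrative assumptions, not taken from the dataset:

```python
import pandas as pd


def coercion_report(before: pd.DataFrame, after: pd.DataFrame, columns) -> pd.Series:
    """Count values that were present before conversion but are null after it."""
    lost = {}
    for col in columns:
        lost[col] = int((before[col].notna() & after[col].isna()).sum())
    return pd.Series(lost, name="values_lost_to_coercion")


# Toy input with one bad date and one bad number (made-up values)
raw = pd.DataFrame({
    "start_date": ["2024-05-16", "not a date", "2024-05-08"],
    "spend_usd": ["39193.43", "17291.53", "n/a"],
})

converted = raw.copy()
converted["start_date"] = pd.to_datetime(converted["start_date"], errors="coerce")
converted["spend_usd"] = pd.to_numeric(converted["spend_usd"], errors="coerce")

report = coercion_report(raw, converted, ["start_date", "spend_usd"])
print(report)
```

A non-zero count for any column suggests inspecting the raw values before relying on the converted column.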

## Compute derived metrics

In [50]:
#ADD CUSTOM COLUMNS: ctr, conversion_rate, roi, campaign_duration_days

df["ctr"] = df["clicks"] / df["impressions"]  # Click-through rate
df["conversion_rate"] = df["conversions"] / df["clicks"]  # Conversions per click
df["roi"] = (df["revenue_usd"] - df["spend_usd"]) / df["spend_usd"]  # Return on investment
df["campaign_duration_days"] = (df["end_date"] - df["start_date"]).dt.days  # Campaign length in whole days

#updated df

print(df.shape)
print(df.columns)
print(df.info())

df.head()

(1000, 20)
Index(['campaign_id', 'campaign_name', 'start_date', 'end_date', 'channel',
       'region', 'impressions', 'clicks', 'conversions', 'spend_usd',
       'revenue_usd', 'target_audience', 'product_category', 'device', 'year',
       'dataset_year', 'ctr', 'conversion_rate', 'roi',
       'campaign_duration_days'],
      dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   campaign_id             1000 non-null   object        
 1   campaign_name           1000 non-null   object        
 2   start_date              1000 non-null   datetime64[ns]
 3   end_date                1000 non-null   datetime64[ns]
 4   channel                 1000 non-null   object        
 5   region                  1000 non-null   object        
 6   impressions             1000 non-null   int64         
 7   c

| | campaign_id | campaign_name | start_date | end_date | channel | region | impressions | clicks | conversions | spend_usd | revenue_usd | target_audience | product_category | device | year | dataset_year | ctr | conversion_rate | roi | campaign_duration_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024_0001 | Campaign_2024_0001 | 2024-05-16 | 2024-08-16 | Search | South America | 28252 | 5609 | 65466 | 39193.43 | 79017.74 | Youth | Electronics | Desktop | 2024 | 2024 | 0.198535 | 11.671599 | 1.016097 | 92 |
| 1 | 2024_0002 | Campaign_2024_0002 | 2024-04-06 | 2024-10-13 | Search | Asia | 89608 | 83584 | 26865 | 17291.53 | 49868.54 | Adults | Home | Mobile | 2024 | 2024 | 0.932774 | 0.321413 | 1.883987 | 190 |
| 2 | 2024_0003 | Campaign_2024_0003 | 2024-05-08 | 2024-11-27 | Social | Europe | 37853 | 62661 | 43662 | 6729.63 | 63021.28 | Seniors | Electronics | Desktop | 2024 | 2024 | 1.655377 | 0.696797 | 8.364747 | 203 |
| 3 | 2024_0004 | Campaign_2024_0004 | 2024-01-28 | 2024-08-03 | Display | Africa | 10577 | 41421 | 75023 | 15077.58 | 133106.71 | Seniors | Clothing | Desktop | 2024 | 2024 | 3.916139 | 1.811231 | 7.828122 | 188 |
| 4 | 2024_0005 | Campaign_2024_0005 | 2024-02-06 | 2024-08-23 | Social | Asia | 84039 | 56010 | 11283 | 16877.69 | 144736.99 | Adults | Home | Mobile | 2024 | 2024 | 0.666476 | 0.201446 | 7.57564 | 199 |
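
Two things are worth guarding against in these ratios. A zero denominator yields `inf` (or `NaN` for 0/0) in pandas, and the sample rows show `ctr` and `conversion_rate` values above 1 (clicks exceeding impressions, conversions exceeding clicks), which may point to a data-quality issue in the source data. A minimal sketch on a toy frame, assuming a hypothetical `safe_ratio` helper and flag column names that are not part of the original notebook:

```python
import numpy as np
import pandas as pd


def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two columns, returning NaN wherever the denominator is zero."""
    return numerator / denominator.replace(0, np.nan)


# First two rows mirror campaigns 2024_0001 and 2024_0003; the third is a made-up all-zero row
toy = pd.DataFrame({
    "impressions": [28252, 37853, 0],
    "clicks": [5609, 62661, 0],
    "conversions": [65466, 43662, 0],
})

toy["ctr"] = safe_ratio(toy["clicks"], toy["impressions"])
toy["conversion_rate"] = safe_ratio(toy["conversions"], toy["clicks"])

# Flag rows whose rates exceed 1 so they can be reviewed rather than silently used
toy["ctr_gt_1"] = toy["ctr"] > 1
toy["conversion_rate_gt_1"] = toy["conversion_rate"] > 1

print(toy)
```

Whether flagged rows should be dropped, capped, or kept depends on how the source data was generated, so the sketch only marks them.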

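
The task list also calls for time-based features, categorical encoding, scaling, and saving the result. One possible sketch on a toy frame, assuming the time features are derived from `start_date`, `channel` is one-hot encoded with `pd.get_dummies`, and `spend_usd` is min-max scaled. These choices, the new column names, and the temporary output directory are illustrative assumptions:

```python
import os
import tempfile

import pandas as pd

# Toy frame with a few columns from the dataset
toy = pd.DataFrame({
    "start_date": pd.to_datetime(["2024-05-16", "2024-04-06", "2024-01-28"]),
    "channel": ["Search", "Search", "Display"],
    "spend_usd": [39193.43, 17291.53, 15077.58],
})

# Time-based features taken from the campaign start date
toy["start_month"] = toy["start_date"].dt.month
toy["start_quarter"] = toy["start_date"].dt.quarter
toy["start_year"] = toy["start_date"].dt.year

# Min-max scaling of spend to the [0, 1] range
spend = toy["spend_usd"]
toy["spend_usd_scaled"] = (spend - spend.min()) / (spend.max() - spend.min())

# One-hot encode a categorical column
features = pd.get_dummies(toy, columns=["channel"], prefix="channel")

# Save the engineered frame; a temporary directory stands in for the project path
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "marketing_campaigns_features.csv")
features.to_csv(out_path, index=False)

print(features.head())
```

In the notebook itself, the same steps would run on `df` and write to the processed-data path named in the task list.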