#### **Purpose of this notebook:**
1. Create meaningful features using domain knowledge
2. Answer the question "why did we create it?" for every feature
3. Write modular, reusable code
4. Prepare for feature selection in 04_Model_Comparison

#### **Feature Engineering Philosophy:**
- "More is NOT better" → Quality > Quantity
- Every feature tests a hypothesis
- Feature selection is done in notebook 04

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler

In [2]:
# Seed for reproducibility
np.random.seed(42)

In [3]:
df = pd.read_csv('../data/marketing_analytics_cleaned.csv')

print("📊 Dataset Shape:", df.shape)
print("\n✅ Target Variable Distribution:")
print(df['Conversion'].value_counts())
print(f"\nConversion Rate: {df['Conversion'].mean() * 100:.2f}%")


📊 Dataset Shape: (48000, 20)

✅ Target Variable Distribution:
Conversion
0    47393
1      607
Name: count, dtype: int64

Conversion Rate: 1.26%


#### **FEATURE ENGINEERING - ROI & COST METRICS**
**WHY THESE FEATURES?**

The most important metric in marketing is ROI (Return on Investment):
"How efficiently is the money we spend on this customer working?"

_HYPOTHESIS_:
- Channels with a low CPA (Cost Per Acquisition) perform better
- Customers with a high ROI proxy tend to convert


In [4]:
print("\n" + "="*70)
print("FEATURE GROUP 1: ROI & COST METRICS")
print("="*70)

# Feature 1: Cost Per Acquisition (CPA) Proxy
# Formula: AdSpend / (Conversion + 1)
# Why +1? Zero-division protection (for customers who have not converted yet)
# NOTE: this formula uses the target column `Conversion`, so the feature leaks
# the label; exclude it from model inputs or recompute it without the target.
df['CPA_Proxy'] = df['AdSpend'] / (df['Conversion'] + 1)

print("\n✅ CPA_Proxy created")
print(f"   Mean: ${df['CPA_Proxy'].mean():.2f}")
print("   Interpretation: Average cost-per-customer proxy")


FEATURE GROUP 1: ROI & COST METRICS

✅ CPA_Proxy created
   Mean: $2189.35
   Interpretation: Average cost-per-customer proxy
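
Because `CPA_Proxy` divides by `Conversion + 1`, the target can be read back from the feature. The following is a minimal sketch on toy data (the values are hypothetical; only the column names come from the notebook) that shows how the ratio `CPA_Proxy / AdSpend` takes exactly two values, one per class:

```python
import pandas as pd

# Toy data with hypothetical values; column names match the notebook
toy = pd.DataFrame({
    "AdSpend": [100.0, 250.0, 400.0, 800.0],
    "Conversion": [0, 1, 0, 1],
})

# Same formula as Feature 1
toy["CPA_Proxy"] = toy["AdSpend"] / (toy["Conversion"] + 1)

# The ratio is 1.0 for non-converters and 0.5 for converters
ratio = toy["CPA_Proxy"] / toy["AdSpend"]

# Any threshold between 0.5 and 1.0 recovers the label exactly
recovered_label = (ratio < 0.75).astype(int)
leaks_target = bool((recovered_label == toy["Conversion"]).all())

print(ratio.tolist())  # [1.0, 0.5, 1.0, 0.5]
print(leaks_target)    # True
```

A model given both `AdSpend` and `CPA_Proxy` could therefore reconstruct `Conversion` directly, which is worth keeping in mind during feature selection.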


In [5]:
# Feature 2: ROI Proxy
# Formula: (ConversionRate * Income) / (AdSpend + 1)
# Rationale: potential value / money spent (+1 guards against AdSpend = 0)
# High income + high conversion rate + low AdSpend = good ROI
df['ROI_Proxy'] = (df['ConversionRate'] * df['Income']) / (df['AdSpend'] + 1)

print("\n✅ ROI_Proxy created")
print(f"   Mean: {df['ROI_Proxy'].mean():.2f}")
print("   Interpretation: Potential customer value / Marketing cost")


✅ ROI_Proxy created
   Mean: 1.11
   Interpretation: Potential customer value / Marketing cost


In [7]:
# Feature 3: Spend Efficiency
# Formula: ClickThroughRate / (AdSpend + 1)
# Rationale: click efficiency per unit of ad spend (+1 guards against AdSpend = 0)
df['Spend_Efficiency'] = df['ClickThroughRate'] / (df['AdSpend'] + 1)

print("\n✅ Spend_Efficiency created")
print(f"   Mean: {df['Spend_Efficiency'].mean():.6f}")
print("   Interpretation: AdSpend efficiency (CTR per dollar)")


✅ Spend_Efficiency created
   Mean: 0.000073
   Interpretation: AdSpend efficiency (CTR per dollar)
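
The ratio features in this group all follow the pattern `numerator / (denominator + offset)`. In the spirit of the notebook's goal of modular, reusable code, that pattern could be wrapped in a small helper. This is only a sketch: `add_ratio_feature` and its `offset` parameter are hypothetical names, not part of the original notebook, and the toy values are made up.

```python
import pandas as pd

def add_ratio_feature(df, name, numerator, denominator, offset=1.0):
    """Add df[name] = df[numerator] / (df[denominator] + offset) and return df.

    The offset protects against division by zero, as in the cells above.
    Note that the DataFrame is modified in place.
    """
    df[name] = df[numerator] / (df[denominator] + offset)
    return df

# Toy data with hypothetical values
toy = pd.DataFrame({"ClickThroughRate": [0.10, 0.25], "AdSpend": [0.0, 99.0]})
toy = add_ratio_feature(toy, "Spend_Efficiency", "ClickThroughRate", "AdSpend")

print(toy["Spend_Efficiency"].tolist())  # [0.1, 0.0025]
```

With such a helper, each feature cell reduces to one call plus its rationale comment, and the choice of offset stays visible in a single place.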


#### **FEATURE ENGINEERING - ENGAGEMENT METRICS**

**WHY THESE FEATURES?**

User engagement is one of the strongest indicators of conversion:
"How engaged is the customer?"

**HYPOTHESIS:**
- More time spent on the site → more interested → Conversion ↑
- More pages visited → more research → Conversion ↑
- Opens emails but does not click → interested but undecided
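
One simple way to probe hypotheses like these is to compare a feature's mean between converters and non-converters. The sketch below uses toy data with hypothetical values (only the column names `TimeOnSite` and `Conversion` come from the notebook); on the real data, the same `groupby` would run on `df`.

```python
import pandas as pd

# Toy data with hypothetical values
toy = pd.DataFrame({
    "TimeOnSite": [1.0, 2.0, 8.0, 3.0, 9.0, 2.0],
    "Conversion": [0, 0, 1, 0, 1, 0],
})

# Mean time on site for each outcome class
mean_by_outcome = toy.groupby("Conversion")["TimeOnSite"].mean()

# Ratio of the converter mean to the non-converter mean
lift = mean_by_outcome.loc[1] / mean_by_outcome.loc[0]

print(mean_by_outcome.to_dict())  # {0: 2.0, 1: 8.5}
print(lift)                       # 4.25
```

A ratio well above 1 would be consistent with the hypothesis, though with a conversion rate near 1% the converter group is small, so a difference in means alone should be read with caution.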

In [8]:
print("\n" + "="*70)
print("FEATURE GROUP 2: ENGAGEMENT METRICS")
print("="*70)


FEATURE GROUP 2: ENGAGEMENT METRICS


In [9]:

# Feature 4: Site Engagement Score
# Formula: TimeOnSite * PagesPerVisit (missing PagesPerVisit treated as 1)
# Rationale: in-depth engagement (both duration and depth)
df['Site_Engagement'] = df['TimeOnSite'] * df['PagesPerVisit'].fillna(1)

print("\n✅ Site_Engagement created")
print(f"   Mean: {df['Site_Engagement'].mean():.2f}")
print("   Interpretation: Engagement depth (time × pages)")


✅ Site_Engagement created
   Mean: 13.77
   Interpretation: Engagement depth (time × pages)


In [10]:
# Feature 5: Average Time Per Page
# Formula: TimeOnSite / PagesPerVisit (missing PagesPerVisit treated as 1)
# Rationale: how much time is spent on each page? (Bounce rate proxy)
# Note: only NaN is guarded here; a PagesPerVisit of 0 would still produce inf.
df['Avg_Time_Per_Page'] = df['TimeOnSite'] / (df['PagesPerVisit'].fillna(1))

print("\n✅ Avg_Time_Per_Page created")
print(f"   Mean: {df['Avg_Time_Per_Page'].mean():.2f}")
print("   Interpretation: Time per page (bounce rate inverse)")



✅ Avg_Time_Per_Page created
   Mean: 1.13
   Interpretation: Time per page (bounce rate inverse)
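
Unlike the `+ 1` features, this ratio divides by `PagesPerVisit` directly, so `fillna(1)` handles missing values but not zeros. A minimal sketch of the difference, on toy data with hypothetical values, and one possible guard (mapping zeros to NaN before dividing, which is an assumption and not what the notebook does):

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical values: one normal row, one zero, one missing
toy = pd.DataFrame({
    "TimeOnSite": [6.0, 4.0, 5.0],
    "PagesPerVisit": [3.0, 0.0, np.nan],
})

pages = toy["PagesPerVisit"].fillna(1)

# Notebook formula: the zero row becomes inf
raw = toy["TimeOnSite"] / pages

# Possible guard: treat zero pages as missing so the result is NaN, not inf
safe = toy["TimeOnSite"] / pages.replace(0, np.nan)

print(raw.tolist())   # [2.0, inf, 5.0]
print(safe.tolist())  # [2.0, nan, 5.0]
```

Whether NaN, a sentinel value, or dropping the row is the right treatment depends on what a zero page count means in the dataset.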


In [11]:
# Feature 6: CTR to Conversion Ratio
# Formula: ConversionRate / (ClickThroughRate + 0.0001)
# Rationale: efficiency of turning clicks into conversions
# (the small constant guards against ClickThroughRate = 0)
df['CTR_to_Conversion'] = df['ConversionRate'] / (df['ClickThroughRate'] + 0.0001)

print("\n✅ CTR_to_Conversion created")
print(f"   Mean: {df['CTR_to_Conversion'].mean():.2f}")
print("   Interpretation: Click → Conversion efficiency")



✅ CTR_to_Conversion created
   Mean: 16.71
   Interpretation: Click → Conversion efficiency
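
With an offset as small as `0.0001`, rows whose `ClickThroughRate` is near zero produce very large ratios, which can dominate the mean. The sketch below illustrates this on toy data (values are hypothetical) and shows one optional mitigation, capping the feature with `Series.clip`. The cap value `CAP` is an arbitrary assumption for illustration, not a recommendation from the notebook.

```python
import pandas as pd

# Toy data with hypothetical values: same ConversionRate, shrinking CTR
toy = pd.DataFrame({
    "ConversionRate": [0.05, 0.05, 0.05],
    "ClickThroughRate": [0.20, 0.01, 0.0],
})

# Same formula as Feature 6
toy["CTR_to_Conversion"] = toy["ConversionRate"] / (toy["ClickThroughRate"] + 0.0001)

# Optional mitigation: cap extreme values (CAP is a hypothetical choice)
CAP = 10.0
toy["CTR_to_Conversion_capped"] = toy["CTR_to_Conversion"].clip(upper=CAP)

# The last row is roughly 500 before capping and 10 after
print(toy[["CTR_to_Conversion", "CTR_to_Conversion_capped"]].round(2))
```

Tree-based models are largely insensitive to such outliers, while linear models and scalers such as `StandardScaler` are not, so whether capping helps depends on the models compared later.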


In [12]:
# Feature 7: Email Engagement
# Formula: EmailClicks / (EmailOpens + 1)
# Rationale: what share of email opens lead to a click?
df['Email_Click_Rate'] = df['EmailClicks'] / (df['EmailOpens'] + 1)

print("\n✅ Email_Click_Rate created")
print(f"   Mean: {df['Email_Click_Rate'].mean():.3f}")
print("   Interpretation: Email engagement quality")


✅ Email_Click_Rate created
   Mean: 0.056
   Interpretation: Email engagement quality


In [13]:
# Feature 8: Social Virality
# Formula: SocialShares / (WebsiteVisits + 1)
# Rationale: propensity to share per visit
df['Social_Virality'] = df['SocialShares'] / (df['WebsiteVisits'] + 1)

print("\n✅ Social_Virality created")
print(f"   Mean: {df['Social_Virality'].mean():.3f}")
print("   Interpretation: Share propensity per visit")


✅ Social_Virality created
   Mean: 0.857
   Interpretation: Share propensity per visit
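
Since every feature so far is a ratio or a product, a quick check for NaN and infinite values before moving on can catch division problems early. This is a minimal sketch: `check_features` is a hypothetical helper name, and the toy values are made up to trigger both cases.

```python
import numpy as np
import pandas as pd

def check_features(df, columns):
    """Return per-column counts of NaN and infinite values."""
    values = df[columns].astype(float)
    return pd.DataFrame({
        "n_nan": values.isna().sum(),
        "n_inf": np.isinf(values).sum(),
    })

# Toy data with hypothetical values: one NaN and one inf
toy = pd.DataFrame({
    "Email_Click_Rate": [0.1, np.nan, 0.3],
    "Social_Virality": [0.5, np.inf, 2.0],
})

report = check_features(toy, ["Email_Click_Rate", "Social_Virality"])
print(report)
```

On the notebook's data, the same call could be made with `df` and the list of engineered column names; any non-zero count points back to the formula that needs a guard.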
