# 🧪 2. Viral Feature Engineering

Here we go beyond the basics and build features that capture who traveled together and how those groups fared.

## 👨‍👩‍👧‍👦 The 'Family Survival' Factor
On the Titanic, passengers whose family members survived had a better chance of surviving themselves. We will quantify this.

In [1]:
import pandas as pd
import numpy as np

# Load Clean Data
df = pd.read_csv('titanic_clean.csv')

# Extract Surnames
df['Surname'] = df['Name'].apply(lambda x: x.split(',')[0].strip())

print(f"Found {df['Surname'].nunique()} unique surnames.")

Found 875 unique surnames.
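The surname extraction above relies on names following a "Surname, Title. Given names" layout. As a minimal sketch on a few hand-written names (the `names` Series below is illustrative toy data, not loaded from `titanic_clean.csv`), the same step can be written with pandas' vectorized string methods:

```python
import pandas as pd

# Toy names in the "Surname, Title. Given names" layout (illustrative only)
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Palsson, Master. Gosta Leonard",
    "Palsson, Miss. Torborg Danira",
])

# Vectorized equivalent of the apply/lambda version: split on the first comma,
# keep the part before it, and trim surrounding whitespace
surnames = names.str.split(",", n=1).str[0].str.strip()

print(surnames.tolist())   # ['Braund', 'Cumings', 'Palsson', 'Palsson']
print(surnames.nunique())  # 3
```

Both forms give the same result; the `.str` version just avoids a Python-level lambda call per row.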


## 🧬 Survival Rate by Family/Ticket
We identify travel groups in two ways: by shared `Ticket`, and by a `Family_Group` key built from `Surname` and `Pclass`. For each group we then compute a 'Survival Rate' feature from the labeled (Train) rows.

In [2]:
# Create Family Group Identifier
df['Family_Group'] = df['Surname'] + "_" + df['Pclass'].astype(str)

# Calculate group survival rates. Test rows have Survived = NaN, so only Train labels contribute.
# For this demo, we'll do a simpler version: each group's rate still includes the passenger's own label.

df['Family_Size'] = df['SibSp'] + df['Parch'] + 1

# The mean of the 0/1 labels is the group's survival rate; NaN labels are skipped
family_rates = df.groupby('Family_Group')['Survived'].mean()
ticket_rates = df.groupby('Ticket')['Survived'].mean()

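def get_survival_rate(row):
    # Prioritize Ticket match (more specific), then Family.
    # Every Ticket and Family_Group is in the index, so test for a
    # non-NaN rate rather than for membership.
    ticket_rate = ticket_rates.get(row['Ticket'])
    if pd.notna(ticket_rate):
        return ticket_rate
    family_rate = family_rates.get(row['Family_Group'])
    if pd.notna(family_rate):
        return family_rate
    return np.nan  # No labeled group members; filled below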

df['Family_Survival_Rate'] = df.apply(get_survival_rate, axis=1)

# Fill NaNs (groups with no labeled members, e.g. groups that appear only in the Test set) with the global mean
df['Family_Survival_Rate'] = df['Family_Survival_Rate'].fillna(df['Survived'].mean())

print(df[['Surname', 'Family_Survival_Rate', 'Survived']].head(10))

     Surname  Family_Survival_Rate  Survived
0     Braund                   0.0       0.0
1    Cumings                   1.0       1.0
2  Heikkinen                   1.0       1.0
3   Futrelle                   0.5       1.0
4      Allen                   0.0       0.0
5      Moran                   0.0       0.0
6   McCarthy                   0.0       0.0
7    Palsson                   0.0       0.0
8    Johnson                   1.0       1.0
9     Nasser                   0.5       1.0
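Because each group's rate includes the passenger's own `Survived` value, the feature partly encodes the label it is meant to predict. One common way to reduce this is a leave-one-out rate, where each row only sees the labels of the other members of its group. The sketch below illustrates that idea on toy data and is not the notebook's method; the helper `leave_one_out_rate` and the column `Ticket_LOO_Rate` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

def leave_one_out_rate(frame, group_col, target_col="Survived"):
    """Mean of target_col over the other labeled members of each group."""
    grouped = frame.groupby(group_col)[target_col]
    label_sum = grouped.transform("sum")      # NaN labels are skipped
    label_count = grouped.transform("count")  # counts non-NaN labels only
    own_label = frame[target_col]
    # Remove the row's own contribution when it has a label
    other_sum = label_sum - own_label.fillna(0)
    other_count = label_count - own_label.notna().astype(int)
    # Rows with no other labeled group member get NaN (to be filled later)
    return other_sum / other_count.where(other_count > 0)

# Toy data (illustrative only): NaN marks an unlabeled, Test-style row
toy = pd.DataFrame({
    "Ticket": ["A", "A", "A", "B", "C", "C"],
    "Survived": [1.0, 0.0, np.nan, 1.0, 0.0, 0.0],
})
toy["Ticket_LOO_Rate"] = leave_one_out_rate(toy, "Ticket")
print(toy)
```

In this toy frame the lone passenger on ticket `B` gets `NaN` rather than their own label, and the unlabeled row on ticket `A` gets 0.5 from its two labeled companions. The remaining `NaN` values could then be filled with the global mean, as in the cell above.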


In [3]:
# Export Feature-Rich Data
df.to_csv('titanic_features.csv', index=False)
print("✅ Data Saved to titanic_features.csv with new features")

✅ Data Saved to titanic_features.csv with new features
