# Titanic - Feature Engineering: Family Size

This time, we’ll keep things simple but make the features slightly more meaningful than before.
Rather than only dropping Name and Ticket, we’ll also create a new feature called FamilySize, which combines SibSp (number of siblings/spouses aboard) and Parch (number of parents/children aboard):

$$
\text{FamilySize} = \text{SibSp} + \text{Parch} + 1
$$

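The + 1 counts the passenger themselves, so FamilySize = 1 means the passenger was traveling alone and larger values mean they were traveling with family members, which could be correlated with survival chances.
Once FamilySize is computed, we drop SibSp and Parch, since the new feature replaces them. Other than that, we’ll leave the data mostly untouched. This serves as our family-size baseline, which we’ll later compare with more advanced feature engineering versions.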
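
To make the formula concrete, here is a minimal sketch on a toy DataFrame. The three rows are made up for illustration and are not taken from the Titanic data, and the `IsAlone` column is a hypothetical helper that only shows how FamilySize = 1 identifies a solo traveler.

```python
import pandas as pd

# Toy rows, made up for illustration (not real Titanic passengers)
toy = pd.DataFrame({"SibSp": [0, 1, 3], "Parch": [0, 0, 2]})

# FamilySize = SibSp + Parch + 1 (the +1 counts the passenger)
toy["FamilySize"] = toy["SibSp"] + toy["Parch"] + 1

# Hypothetical helper column: True when the passenger has no family aboard
toy["IsAlone"] = toy["FamilySize"] == 1

print(toy)
```

The first toy passenger has no relatives aboard and gets FamilySize 1, while the third has three siblings/spouses and two parents/children and gets FamilySize 6.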

In [1]:
import os
import pandas as pd

# Load dataset
path_dir = os.path.join("..", "..", "data")
df = pd.read_csv(os.path.join(path_dir, "preprocessed", "preprocessed_train.csv"))
df_test = pd.read_csv(os.path.join(path_dir, "preprocessed", "preprocessed_test.csv"))

# Remove `Name` and `Ticket` features
df = df.drop(['Name', 'Ticket'], axis=1)
df_test = df_test.drop(['Name', 'Ticket'], axis=1)

# Combine SibSp and Parch into FamilySize (+1 counts the passenger)
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df_test["FamilySize"] = df_test["SibSp"] + df_test["Parch"] + 1

# Drop the original columns now that FamilySize replaces them
df = df.drop(['Parch', 'SibSp'], axis=1)
df_test = df_test.drop(['Parch', 'SibSp'], axis=1)

# Save file (create the output directory first if it does not exist)
out_dir = os.path.join(path_dir, "feature_engineered", "familySize")
os.makedirs(out_dir, exist_ok=True)
df.to_csv(os.path.join(out_dir, "familySize_engineered_train.csv"), index=False)
df_test.to_csv(os.path.join(out_dir, "familySize_engineered_test.csv"), index=False)

df.head()

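| Unnamed: 0 | PassengerId | Survived | Pclass | Sex | Age | Fare | Embarked | FamilySize |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | 1 | 22.0 | 7.25 | 2 | 2 |
| 1 | 2 | 1 | 1 | 0 | 38.0 | 71.2833 | 0 | 2 |
| 2 | 3 | 1 | 3 | 0 | 26.0 | 7.925 | 2 | 1 |
| 3 | 4 | 1 | 1 | 0 | 35.0 | 53.1 | 2 | 2 |
| 4 | 5 | 0 | 3 | 1 | 35.0 | 8.05 | 2 | 1 |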
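
One way to start checking whether FamilySize relates to survival is to group by it and average the Survived column. The sketch below rebuilds only the five rows printed by `df.head()` above, so the numbers it prints say nothing about the full dataset; the names `head` and `survival_by_size` are assumptions introduced for this example. The same `groupby` call could presumably be run on the full training `df`.

```python
import pandas as pd

# The five rows shown in the df.head() output (only the two columns needed here)
head = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "FamilySize": [2, 2, 1, 2, 1],
})

# The mean of a 0/1 column is the fraction of survivors in each group
survival_by_size = head.groupby("FamilySize")["Survived"].mean()

print(survival_by_size)
```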
