# Feature Engineering Assignment (Titanic Dataset)
Name: ____________________
Course: Data Science
---
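This notebook walks through feature engineering on the Titanic dataset step by step: missing-value imputation, categorical encoding, outlier removal, feature scaling, feature selection, and PCA.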

## Step 1: Import Libraries and Load Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.decomposition import PCA
from scipy import stats

df = pd.read_csv('titanic.csv')
df.head()

## Step 2: Handle Missing Values

In [None]:
df.isnull().sum()

In [None]:
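# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which may act on a copy and is deprecated in recent pandas versions.
df['Age'] = df['Age'].fillna(df['Age'].median())                  # numeric: median
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])  # categorical: mode
df = df.drop(columns=['Cabin'])                                   # dropped rather than imputed
df.isnull().sum()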
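
To make the imputation rule easier to check by eye, here is a minimal sketch on a toy frame. The values are made up for illustration and are not taken from `titanic.csv`; only the column names mirror the notebook.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values), not the Titanic file.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 58.0, np.nan],
    "Embarked": ["S", "C", None, "S", "S"],
})

missing_before = toy.isnull().sum()

# Median for the numeric column, most frequent value for the categorical one.
toy["Age"] = toy["Age"].fillna(toy["Age"].median())
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

missing_after = toy.isnull().sum()
print(missing_before.to_dict(), missing_after.to_dict())
```

The median is less sensitive to extreme ages than the mean, which is one common reason to prefer it for a skewed numeric column.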

## Step 3: Handle Categorical Values

In [None]:
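# Use one encoder per column so each category-to-integer mapping
# stays available through the encoder's classes_ attribute.
sex_encoder = LabelEncoder()
embarked_encoder = LabelEncoder()
df['Sex'] = sex_encoder.fit_transform(df['Sex'])
df['Embarked'] = embarked_encoder.fit_transform(df['Embarked'])
df.head()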
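
Label encoding maps each category to an integer, which some models may read as an ordering that the categories do not have. As an alternative (a swapped-in technique, not what the cell above does), the sketch below one-hot encodes a toy `Embarked` column with `pd.get_dummies`. The port codes are illustrative values.

```python
import pandas as pd

# Toy column with hypothetical port codes.
toy = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# One indicator column per category; dtype=int gives 0/1 instead of booleans.
one_hot = pd.get_dummies(toy["Embarked"], prefix="Embarked", dtype=int)
toy = pd.concat([toy.drop(columns=["Embarked"]), one_hot], axis=1)
print(toy)
```

For a two-valued column such as `Sex`, a single 0/1 column from label encoding carries the same information, so one-hot encoding mainly matters for columns with three or more categories.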

## Step 4: Remove Outliers (Z-Score Method)

In [None]:
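# Keep only rows whose Fare lies within 3 standard deviations of the mean.
z = np.abs(stats.zscore(df['Fare']))
df = df[z < 3].copy()  # copy so later column assignments do not target a view
df.shape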
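
The z-score filter can be reproduced by hand, which shows what the threshold actually measures. This is a minimal sketch on made-up fares, assuming the population standard deviation (`ddof=0`), which is the default used by `scipy.stats.zscore`.

```python
import pandas as pd

# Hypothetical fares: 19 ordinary values plus one extreme value.
fares = pd.Series(list(range(7, 26)) + [512], dtype=float)

# z = (x - mean) / std, using the population standard deviation (ddof=0).
z = (fares - fares.mean()) / fares.std(ddof=0)

# Keep values within 3 standard deviations of the mean.
kept = fares[z.abs() < 3]
print(len(fares), len(kept), round(z.max(), 2))
```

Note that with `ddof=0` no value can have a z-score above the square root of n - 1, so on samples of 10 or fewer rows a threshold of 3 never removes anything.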

## Step 5: Feature Scaling

In [None]:
scaler = StandardScaler()
num_cols = ['Age','Fare','SibSp','Parch']
df[num_cols] = scaler.fit_transform(df[num_cols])
df.head()
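
As a sanity check on what `StandardScaler` computes, the sketch below compares it with the formula (x - mean) / std written out by hand. The toy values are made up; only the column names follow the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical values).
toy = pd.DataFrame({
    "Age": [22.0, 35.0, 58.0, 41.0],
    "Fare": [7.25, 71.28, 8.05, 53.10],
})

scaled = StandardScaler().fit_transform(toy)

# Same result by hand: subtract the column mean, divide by the population std.
manual = (toy - toy.mean()) / toy.std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))
```

After scaling, each column has mean 0 and standard deviation 1, so features measured in different units contribute on a comparable scale.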

## Step 6: Feature Selection (SelectKBest)

In [None]:
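# Drop the target plus identifier and free-text columns.
X = df.drop(columns=['Survived','Name','Ticket','PassengerId'])
y = df['Survived']

# chi2 requires non-negative inputs. Taking abs() of standardized values would
# merge below-average and above-average values, so rescale to [0, 1] instead.
from sklearn.preprocessing import MinMaxScaler
X_nonneg = MinMaxScaler().fit_transform(X)

selector = SelectKBest(score_func=chi2, k=5)
X_new = selector.fit_transform(X_nonneg, y)

selected_features = X.columns[selector.get_support()]
selected_features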
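
The sketch below illustrates how `SelectKBest` with `chi2` ranks features, using synthetic data in which one column (`signal`, a hypothetical name) tracks the target and two columns are pure noise. It assumes min-max scaling as one way to satisfy the non-negativity requirement of `chi2`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
y = np.array([0, 1] * 50)  # balanced synthetic target

# Synthetic features (hypothetical names): one informative, two noise.
X = pd.DataFrame({
    "signal": 2.0 * y + rng.normal(0, 0.1, size=100),
    "noise_a": rng.normal(0, 1, size=100),
    "noise_b": rng.normal(0, 1, size=100),
})

# chi2 needs non-negative inputs, so map every column to [0, 1].
X_nonneg = MinMaxScaler().fit_transform(X)

selector = SelectKBest(score_func=chi2, k=1)
selector.fit(X_nonneg, y)

selected = list(X.columns[selector.get_support()])
print(selected, selector.scores_.round(2))
```

The `scores_` attribute is worth inspecting alongside the selected names, since a feature that narrowly misses the cut may still be useful.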

## Step 7: PCA (Dimensionality Reduction)

In [None]:
pca = PCA(n_components=2)
principal_components = pca.fit_transform(X)
pca.explained_variance_ratio_
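
PCA is sensitive to feature scale, so columns with large raw ranges can dominate the components. The sketch below, on synthetic data, standardizes every column first and then reports the cumulative explained variance. This is one reasonable setup under that assumption, not necessarily the configuration the cell above should use.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))

# Three synthetic features: two strongly correlated (on very different
# scales) and one independent.
X = np.hstack([
    base,
    base * 100 + rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Standardize so that no column dominates purely because of its units.
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
components = pca.fit_transform(X_std)

# Running total of the variance captured by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(components.shape, cumulative.round(3))
```

The cumulative ratio gives a quick way to judge whether two components retain enough of the variance or whether `n_components` should be raised.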

## Conclusion
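This notebook covered median and mode imputation, label encoding, z-score outlier removal on `Fare`, standardization, chi-squared feature selection with `SelectKBest`, and a two-component PCA on the Titanic dataset.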