# **📌 2. Tree-Based Models**
✅ **Used For:** Capturing non-linear relationships and estimating feature importance.

| Model | Use Case | Key Concept |
|-------|----------|------------|
| **Decision Trees** | Regression & Classification | Recursive splitting on feature values |
| **Random Forest** | Regression & Classification | Bagging ensemble of decision trees |
| **Gradient Boosting (GBDT)** | Regression & Classification | Sequentially adds weak learners, each fit to the errors of the current ensemble |
| **XGBoost / LightGBM / CatBoost** | Regression & Classification | Optimized gradient boosting libraries |
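
To make "recursive splitting on feature values" concrete, here is a minimal sketch of the single step a decision tree repeats at every node: scanning candidate thresholds on one feature and keeping the one with the lowest weighted Gini impurity. The helper names `gini` and `best_split` and the toy `ages` / `survived` values are assumptions made for illustration, not part of the notebook or of any library.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    Returns (None, parent_gini) when no threshold improves on the parent node.
    """
    n = len(labels)
    best_threshold, best_score = None, gini(labels)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy data, made up for illustration only.
ages = [4, 8, 30, 45, 60]
survived = [1, 1, 0, 0, 0]

threshold, score = best_split(ages, survived)
print(threshold, score)
```

A full tree would apply `best_split` to every feature, split the rows on the winner, and recurse on each side until a stopping rule (maximum depth, minimum samples, or a pure node) is met.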

**📌 2. Implementation from Scratch**

**Preprocessing the Titanic Dataset**


import pandas as pd
url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

df["Age"] = df.groupby("Pclass")["Age"].transform(lambda x: x.fillna(x.median()))
# fillna returns a new Series, so assign it back to the column
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
df.drop(columns=["Cabin", "Ticket", "Name"], inplace=True)

# New variables
df["family_size"] = df["SibSp"] + df["Parch"]
df.drop(columns=["SibSp", "Parch"], inplace=True)

# Transformations
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Categorical features
cat_features = ["Sex", "Embarked"]
encoder = OneHotEncoder(drop="first", sparse_output=False)
encoded_cat = encoder.fit_transform(df[cat_features]) # Encoded array
encoded_df = pd.DataFrame(encoded_cat, columns=encoder.get_feature_names_out(cat_features))

df = df.drop(columns=cat_features).reset_index(drop=True)
df = pd.concat([df, encoded_df], axis=1)

# Numerical variables
scaler = StandardScaler()

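# Scale both columns in one call (each column gets its own mean and std).
# Note: fitting on the full dataset before the split lets test-set statistics
# leak into training; tree splits are also unaffected by feature scaling.
df[["Age", "Fare"]] = scaler.fit_transform(df[["Age", "Fare"]])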
df.head()

from sklearn.model_selection import train_test_split
y = df["Survived"]
X = df.drop(columns=["Survived"])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
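
The cell above scales `Age` and `Fare` before splitting. One common alternative is to split first, fit the scaler on the training rows only, and reuse those statistics on the test rows. The sketch below shows that ordering on a small stand-in frame; the `toy` values and the names `X_tr`, `X_te`, and `toy_scaler` are assumptions made for illustration and are not taken from the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the Titanic features (values are made up for illustration).
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 14.0, 58.0, 20.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08, 11.13, 30.07, 26.55, 8.05],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
})

X_toy = toy.drop(columns=["Survived"])
y_toy = toy["Survived"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# Copy so the assignments below do not write into slices of the original frame.
X_tr, X_te = X_tr.copy(), X_te.copy()

num_cols = ["Age", "Fare"]
toy_scaler = StandardScaler()
# Learn mean and std from the training rows only...
X_tr[num_cols] = toy_scaler.fit_transform(X_tr[num_cols])
# ...and apply those same statistics to the held-out rows.
X_te[num_cols] = toy_scaler.transform(X_te[num_cols])

print(X_tr.shape, X_te.shape)
```

For the tree-based models in this section the scaling step can usually be skipped altogether, since splits depend only on the ordering of feature values.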

**📌 3. Implementation Using Libraries**