# Clayton Seabaugh: Building a Classifier
**Author:** Clayton Seabaugh  
**Date:** 3-28-2025  
**Objective:** Build and evaluate different models for machine learning classification

## Section 1. Import and Inspect the Data
Load the Titanic dataset directly from the seaborn library.

In [1]:
# Imports

import seaborn as sns
import pandas as pd

In [2]:
# Load Titanic dataset
titanic = sns.load_dataset('titanic')
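
Since this section also covers inspection, a first look at the data might resemble the sketch below. It uses a small hand-made stand-in frame (`sample`, with hypothetical values rather than real passenger records) so that it runs without downloading anything; with the real data, the same calls would be made on `titanic`.

```python
import pandas as pd

# Hypothetical stand-in rows with a few Titanic-style columns (not real data)
sample = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "age": [22.0, 38.0, None, 35.0],
    "embark_town": ["Southampton", "Cherbourg", None, "Southampton"],
})

print(sample.head())              # first rows
print(sample.dtypes)              # column types
missing = sample.isnull().sum()   # missing values per column
print(missing)
```

The per-column missing counts are what motivate the imputation step in Section 2.1.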

## Section 2. Data Exploration and Preparation
 
### 2.1 Handle Missing Values and Clean Data

In [10]:
# Impute missing values for age using the median
titanic['age'] = titanic['age'].fillna(titanic['age'].median())

# Fill in missing values for embark_town using the mode
titanic['embark_town'] = titanic['embark_town'].fillna(titanic['embark_town'].mode()[0])
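
One way to confirm that the imputation took effect, while avoiding the chained `inplace=True` pattern that newer pandas versions warn about, is to assign the filled column back and then count the remaining missing values. The sketch below uses a hypothetical stand-in frame (`sample`) with illustrative values, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in frame (values are illustrative, not from the dataset)
sample = pd.DataFrame({
    "age": [20.0, None, 40.0, 30.0],
    "embark_town": ["Southampton", None, "Cherbourg", "Southampton"],
})

# Assign the result back instead of calling fillna(..., inplace=True) on a column
sample["age"] = sample["age"].fillna(sample["age"].median())
sample["embark_town"] = sample["embark_town"].fillna(sample["embark_town"].mode()[0])

# Count missing values left in the two imputed columns
remaining = sample[["age", "embark_town"]].isnull().sum().sum()
print(remaining)
```

Assigning back operates on the original frame directly, so it does not depend on whether the intermediate column object is a view or a copy.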


### 2.2 Feature Engineering


In [11]:
# Create new features

# Add family_size - number of family members on board
titanic['family_size'] = titanic['sibsp'] + titanic['parch'] + 1
# Convert categorical "sex" to numeric
titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})
# Convert categorical "embarked" to numeric
titanic['embarked'] = titanic['embarked'].map({'C': 0, 'Q': 1, 'S': 2})
# Binary feature - convert "alone" to numeric
titanic['alone'] = titanic['alone'].astype(int)
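
`Series.map` with a dictionary returns NaN for any value that has no matching key, including values that are already missing. Section 2.1 fills `embark_town` but not `embarked`, so the mapped `embarked` column may still hold NaN if it is used as a feature later. A minimal sketch of the check follows, using a hypothetical stand-in column; the mode-based fill is one possible fix and an assumption, not a step from the original notebook.

```python
import pandas as pd

# Hypothetical stand-in column with one missing port code
embarked = pd.Series(["S", "C", None, "S"])
port_codes = {"C": 0, "Q": 1, "S": 2}

# Mapping directly leaves the missing entry as NaN
codes = embarked.map(port_codes)
print(codes.isnull().sum())  # number of rows left unmapped

# Possible fix (assumption): fill with the most common port before mapping
filled = embarked.fillna(embarked.mode()[0]).map(port_codes)
print(filled.tolist())
```

The same caution applies to rerunning the cell: once `sex` holds 0/1, mapping it again with the string keys would turn every value into NaN.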

## Section 3. Feature Selection and Justification
### 3.1 Choose features and target

Select two or more input features (numerical for regression, numerical and/or categorical for classification).
<br> Use survived as the target.
<br> We will run three input cases, as in the example.

First:

- input features: alone
- target: survived

Second:

- input features: age (or another variable of your choice)
- target: survived

Third:

- input features: age and family_size (or another combination of your choice)
- target: survived

### 3.2 Define X (features) and y (target)
- Assign input features to X, a pandas DataFrame with one or more input features
- Assign the target variable to y (as applicable), a pandas Series with a single target feature
- Again, use comments to run a single case at a time
- The following starts with only the statements needed for case 1
- Double brackets [[ ]] make a 2D DataFrame
- Single brackets [ ] make a 1D Series
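
The difference between the two bracket styles can be checked directly. The sketch below uses a hypothetical three-row stand-in frame (`sample`) rather than the real dataset.

```python
import pandas as pd

# Hypothetical stand-in frame with one feature and the target
sample = pd.DataFrame({"alone": [1, 0, 1], "survived": [0, 1, 1]})

X = sample[["alone"]]    # double brackets: 2D DataFrame
y = sample["survived"]   # single brackets: 1D Series

print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
```

scikit-learn estimators generally expect a 2D feature array, which is why `X` uses double brackets even when there is only one feature.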
 

**# Case 1: alone only**
- X = titanic[['alone']]
- y = titanic['survived']

**# Case 2: age only**
- X = titanic[['age']]
- y = titanic['survived']

**# Case 3: age + family_size**
- X = titanic[['age', 'family_size']]
- y = titanic['survived']
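
As an alternative to commenting cases in and out, the three cases could be selected through a small helper. This is only a sketch: `CASES` and `select_case` are hypothetical names introduced for illustration, and `sample` is a stand-in frame with made-up values.

```python
import pandas as pd

# Hypothetical mapping from case number to the feature columns listed above
CASES = {
    1: ["alone"],
    2: ["age"],
    3: ["age", "family_size"],
}

def select_case(df, case):
    """Return (X, y) for one case; X is a DataFrame, y is the survived Series."""
    X = df[CASES[case]]
    y = df["survived"]
    return X, y

# Stand-in frame with made-up values (not real passenger data)
sample = pd.DataFrame({
    "alone": [1, 0, 0],
    "age": [22.0, 38.0, 26.0],
    "family_size": [1, 2, 3],
    "survived": [0, 1, 1],
})

X, y = select_case(sample, 3)
print(list(X.columns), y.name)
```

With the real data, the call would be `select_case(titanic, 1)`, and likewise for the other cases.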

## Reflection 3:

Why are these features selected?
<br>Are there features that are likely to be highly predictive of survival?