My solution to the Titanic challenge is presented as a tutorial. Care is taken to provide extensive explanation and guidance through both markdown and code comments. Several approaches to the problem are presented along with the pros and cons of each, as well as a discussion of subjectivity and bias when dealing with incomplete data.
- This solution is still in progress
- Preprocessing, EDA, and Visualization are complete
- Modeling steps in progress
I am splitting this challenge across two notebooks:
- This first notebook focuses on exploration and visualization of the data, including preprocessing to handle missing values, and feature engineering
- The second notebook will focus on the modeling required to predict survival
- Exploration of Missing Values (NaN Observations)
- Consider Either Removal or Imputation
  - Column or Row Deletion
  - Linear/Mean Imputation
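The options listed above can be sketched on a small made-up frame. This is only an illustration, not the notebook's actual code: the values are invented, the column names mirror Kaggle's Titanic columns, and reading "linear imputation" as pandas linear interpolation is an assumption (it could equally mean a regression-based fill).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic training data; the values are made up.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
    "Cabin": [np.nan, "C85", np.nan, "C123", np.nan],
})

# Exploration: count NaN observations per column.
missing_counts = df.isna().sum()

# Removal, option 1: drop a column that is mostly missing.
dropped_cols = df.drop(columns=["Cabin"])

# Removal, option 2: drop the rows where a key column is missing.
dropped_rows = df.dropna(subset=["Age"])

# Imputation, option 1: fill missing ages with the column mean.
mean_imputed = df.assign(Age=df["Age"].fillna(df["Age"].mean()))

# Imputation, option 2 (assumed meaning of "linear"): interpolate
# between neighbouring rows. This depends on row order, which carries
# no meaning for passengers, so treat it with caution.
linear_imputed = df.assign(Age=df["Age"].interpolate(method="linear"))

print(missing_counts)
```

Deletion is simple but discards information, while imputation keeps every row at the cost of introducing values that were never observed. That trade-off is where the subjectivity mentioned above comes in.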
- Data Import
- Exploration of the Variables
  - Datatype
  - Range
  - Min/Max/Mean/Std
  - Unique Values
- Visualization of Data Distributions
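A minimal sketch of these exploration steps might look like the following. It builds a tiny made-up frame so that it runs on its own; in the notebook the frame would presumably come from `pd.read_csv` on the Kaggle training file, whose path within this repository is not stated here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Data import: a toy frame stands in for the real file (values are made up).
# With the Kaggle data this would be something like pd.read_csv(<path to CSV>).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
})

# Datatype of each variable.
dtypes = df.dtypes

# Min/Max/Mean/Std (plus count and quartiles) for the numeric columns.
summary = df.describe()

# Range of a numeric variable.
age_range = df["Age"].max() - df["Age"].min()

# Number of unique values per variable.
unique_counts = df.nunique()

# Visualization of a distribution: histogram of Age.
fig, ax = plt.subplots()
df["Age"].plot.hist(bins=5, ax=ax)
ax.set_xlabel("Age")
ax.set_title("Distribution of Age (toy data)")

print(summary)
```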
Following both the EDA and Preprocessing steps, visuals and preprocessed dataframes (.csv) will be exported to their respective folders within the working directory.
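The export step could be sketched roughly as below. The `images` folder name comes from this README; the `data` folder, both file names, and the temporary directory standing in for the working directory are hypothetical choices made for the example.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the repository's working directory (hypothetical).
work_dir = Path(tempfile.mkdtemp())
images_dir = work_dir / "images"
data_dir = work_dir / "data"  # hypothetical folder name
images_dir.mkdir(parents=True, exist_ok=True)
data_dir.mkdir(parents=True, exist_ok=True)

# Toy preprocessed frame; the values are made up.
df = pd.DataFrame({"Age": [22.0, 38.0, 26.0], "Survived": [0, 1, 1]})

# Export the preprocessed dataframe as .csv, without the index column.
csv_path = data_dir / "train_preprocessed.csv"  # hypothetical file name
df.to_csv(csv_path, index=False)

# Export a visual to the images folder.
fig, ax = plt.subplots()
df["Age"].plot.hist(bins=3, ax=ax)
ax.set_xlabel("Age")
png_path = images_dir / "age_distribution.png"  # hypothetical file name
fig.savefig(png_path, dpi=150, bbox_inches="tight")
plt.close(fig)
```

Writing with `index=False` keeps the pandas row index out of the file, so the second notebook can read the CSV back without an extra unnamed column.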
- Data can be obtained from Kaggle (data is also included in this repository)
- Python Libraries:
  - numpy
  - pandas
  - matplotlib
  - scikit-learn
All visualizations can be found in the images folder.


