# Titanic - Final Report and Conclusions
## Objective
This section summarizes the key findings, evaluates feature importance,
and discusses potential improvements for the Titanic survival prediction model.

## Key Findings
1. **Data Exploration:**
   - Women had a higher survival rate than men.
   - Higher-class passengers had better chances of survival.
   - Many missing values were found in the 'Cabin' column.

2. **Feature Engineering:**
   - Family size and social status (title) proved to be important features.
   - Scaling numeric features and encoding categorical variables improved model performance.

3. **Model Performance:**
   - Random Forest outperformed Logistic Regression in terms of accuracy.
   - ROC analysis revealed an acceptable trade-off between sensitivity and specificity.
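
The model comparison summarized above can be reproduced along the following lines. This is a minimal sketch, not the exact evaluation behind the findings: it uses synthetic stand-in data from `make_classification` so that it runs on its own, and the helper name `compare_models`, the scaling pipeline for Logistic Regression, and the hyperparameters are assumptions. To apply it to the Titanic data, replace `X_demo` and `y_demo` with the features and target from `train_cleaned.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Titanic data); swap in the real X and y.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=11, n_informative=5, random_state=42
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Assumed model settings; Logistic Regression gets scaled inputs.
models = {
    'Logistic Regression': make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
}


def compare_models(models, X_tr, y_tr, X_va, y_va):
    """Fit each model and return validation accuracy and ROC AUC (hypothetical helper)."""
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        # ROC AUC needs scores, so use the predicted probability of class 1.
        proba = model.predict_proba(X_va)[:, 1]
        results[name] = {
            'accuracy': accuracy_score(y_va, model.predict(X_va)),
            'roc_auc': roc_auc_score(y_va, proba),
        }
    return results


results = compare_models(models, X_tr, y_tr, X_va, y_va)
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, ROC AUC={scores['roc_auc']:.3f}")
```

Reporting ROC AUC next to accuracy helps because accuracy depends on a single decision threshold, while ROC AUC summarizes the sensitivity/specificity trade-off across all thresholds.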


## Feature Importance Analysis
The cell below fits a Random Forest and plots its impurity-based importances (`feature_importances_`) to show which features contributed the most to the model's decisions.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the cleaned training data; every feature below must already be numeric.
train_df = pd.read_csv('train_cleaned.csv')
features = [
    'Pclass', 'Sex', 'Age', 'Fare', 'FamilySize', 'IsAlone',
    'Embarked_Q', 'Embarked_S', 'Title_Miss', 'Title_Mr', 'Title_Mrs',
]
X = train_df[features]
y = train_df['Survived']

# Hold out 20% for validation; the importances come from the training split only.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

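# Rank features by the forest's impurity-based importance (values sum to 1).
feature_importances = pd.DataFrame({'Feature': features, 'Importance': rf.feature_importances_})
feature_importances = feature_importances.sort_values(by='Importance', ascending=False)

plt.figure(figsize=(10, 6))
plt.barh(feature_importances['Feature'], feature_importances['Importance'])
plt.gca().invert_yaxis()  # barh draws the first row at the bottom; flip so the top feature is on top
plt.title('Feature Importance')
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.tight_layout()
plt.show()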
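
Impurity-based importances like the ones plotted above are computed from the training data and can favor features with many distinct values, such as `Age` or `Fare`. One possible cross-check is permutation importance measured on held-out data. The sketch below is an illustration under stated assumptions, not part of the original analysis: it uses synthetic data with hypothetical column names `f0` to `f5` so that it is self-contained. With the notebook's objects, the equivalent call would pass `rf`, `X_val`, and `y_val`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Titanic data) with hypothetical feature names.
X_arr, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=42
)
demo_features = [f'f{i}' for i in range(X_arr.shape[1])]
X_demo = pd.DataFrame(X_arr, columns=demo_features)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
rf_demo = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shuffle one column at a time on the validation split and record the score drop.
perm = permutation_importance(rf_demo, X_va, y_va, n_repeats=10, random_state=42)

# Put both importance measures side by side, ranked by permutation importance.
comparison = pd.DataFrame({
    'Feature': demo_features,
    'Impurity': rf_demo.feature_importances_,
    'Permutation': perm.importances_mean,
    'PermutationStd': perm.importances_std,
}).sort_values(by='Permutation', ascending=False)

print(comparison.to_string(index=False))
```

Features that rank high on both measures are the safest to call important. A feature that ranks high only on the impurity measure deserves a closer look before drawing conclusions from it.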

## Potential Improvements
Based on the findings, the following improvements can be considered:
- Fine-tuning hyperparameters of models using GridSearchCV.
- Applying advanced feature selection techniques to reduce dimensionality.
- Trying boosted ensembles such as scikit-learn's Gradient Boosting or XGBoost alongside the Random Forest.
- Collecting additional data to enrich the dataset.
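
As an illustration of the first suggestion, hyperparameter tuning with `GridSearchCV` might look like the sketch below. The parameter grid, the 5-fold setting, and the synthetic stand-in data are assumptions chosen to keep the example small and runnable. They are not tuned values for the Titanic model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (NOT the Titanic data); swap in the real X and y.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=11, n_informative=5, random_state=42
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# A deliberately small, assumed grid: 2 x 2 x 2 = 8 candidate settings.
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 5],
    'min_samples_leaf': [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation on the training split
    scoring='accuracy',
)
search.fit(X_tr, y_tr)

print('Best parameters:', search.best_params_)
print(f'Cross-validated accuracy: {search.best_score_:.3f}')
# The validation split was not used during the search, so this is a fair check.
print(f'Validation accuracy: {search.score(X_va, y_va):.3f}')
```

Keeping a validation split outside the search avoids reporting a score that was itself used to pick the hyperparameters.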


## Final Thoughts
- The Titanic dataset is a good case study for learning data preprocessing and modeling.
- Despite limitations such as missing data, useful insights were derived.
- Further improvements in feature engineering and model selection can enhance predictions.