In [28]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

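# Load the data and normalise column names (trim, lowercase, spaces -> underscores)
titanic_data = pd.read_csv('titanic_copy.csv')
titanic_data.columns = titanic_data.columns.str.strip().str.lower().str.replace(' ', '_')

# Features and target
X = titanic_data[['pclass', 'sex', 'age', 'sibsp', 'parch']]
y = titanic_data['survived']

# One-hot encode 'sex'; drop_first=True leaves a single 'sex_male' column
X = pd.get_dummies(X, columns=['sex'], drop_first=True)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a decision tree with default hyperparameters (fixed seed for reproducibility)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out rows and score the predictions
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Classification Report:")
print(classification_rep)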

titanic_test_data

Accuracy: 0.7988826815642458
Classification Report:
              precision    recall  f1-score   support

           0       0.79      0.90      0.84       105
           1       0.83      0.65      0.73        74

    accuracy                           0.80       179
   macro avg       0.81      0.78      0.78       179
weighted avg       0.80      0.80      0.79       179



Unnamed: 0,passengerid,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,survived
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q,0
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0000,,S,1
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q,0
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S,0
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S,1
...,...,...,...,...,...,...,...,...,...,...,...,...
413,1305,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.0500,,S,0
414,1306,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.9000,C105,C,1
415,1307,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.2500,,S,0
416,1308,3,"Ware, Mr. Frederick",male,,0,0,359309,8.0500,,S,0


In this analysis, a Decision Tree classifier was used to predict survival on the Titanic dataset, which already includes a 'survived' column that serves as the label. A Decision Tree was chosen for its simplicity and interpretability. The model was trained on five features: passenger class ('pclass'), 'sex', 'age', and the two family-related counts 'sibsp' (siblings/spouses aboard) and 'parch' (parents/children aboard). Decision trees can in principle work with both numerical and categorical features, but scikit-learn's implementation requires numeric input, so 'sex' was one-hot encoded into a single 'sex_male' column. The data was then split 80/20 into training and test sets. During training, the algorithm recursively splits the data on feature thresholds, producing a tree in which each internal node tests one feature and each leaf assigns a class.

On the 179 held-out passengers, the model reached an accuracy of about 0.80. It identified non-survivors more reliably (recall 0.90, precision 0.79) than survivors (recall 0.65, precision 0.83), so about 35% of the passengers who actually survived were predicted not to survive.
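
To make the recursive splitting more concrete, here is a minimal pure-Python sketch of how a single split can be scored with Gini impurity, which is scikit-learn's default criterion for `DecisionTreeClassifier`. The helper names `gini` and `best_threshold` and the eight toy rows are hypothetical and only for illustration; they are not taken from the notebook or from scikit-learn's internals.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split `value <= threshold`.

    Returns (None, parent impurity) if no split improves on the parent node.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Impurity of the two children, weighted by their sizes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy rows (NOT taken from titanic_copy.csv)
sex_male = [0, 0, 0, 1, 1, 1, 1, 0]
survived = [1, 1, 1, 0, 0, 0, 1, 0]

parent_impurity = gini(survived)
threshold, child_impurity = best_threshold(sex_male, survived)
print(parent_impurity, threshold, child_impurity)
```

A full tree repeats this search over every feature at every node and recurses into the two children until a stopping rule is met; this sketch covers only one feature at one node.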
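
The metrics in the classification report can be reproduced by hand from confusion-matrix counts. The notebook never prints the confusion matrix, so the four counts below are an assumption: they were inferred from the reported supports (105 and 74) and the rounded precision and recall values, and they appear consistent with every figure in the report. The helper `precision_recall_f1` is a hypothetical name used only for this sketch.

```python
# Assumed counts, inferred from the rounded report (not printed by the notebook):
# true negatives, false positives, false negatives, true positives for class 1
tn, fp, fn, tp = 95, 10, 26, 48

def precision_recall_f1(true_pos, false_pos, false_neg):
    """Precision, recall and F1 for the class treated as 'positive'."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Class 1 (survived) is the positive class
survived_metrics = precision_recall_f1(tp, fp, fn)
# Class 0 (did not survive): swap roles so that 0 is the positive class
not_survived_metrics = precision_recall_f1(tn, fn, fp)

print(f"accuracy: {accuracy:.2f}")
print("class 0:", [round(m, 2) for m in not_survived_metrics])
print("class 1:", [round(m, 2) for m in survived_metrics])
```

Under these assumed counts, the gap between the two classes comes mostly from the 26 survivors who were predicted not to survive, which is what pulls recall for class 1 down to about 0.65.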