👋🛳️ Ahoy, welcome to Kaggle! You’re in the right place.
This is the legendary Titanic ML competition – the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works.

If you want to talk with other users about this competition, come join our Discord! We've got channels for competitions, job postings and career discussions, resources, and socializing with your fellow data scientists. Follow the link here: https://discord.gg/kaggle

The competition is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck.

Read on or watch the video below to explore more details. Once you’re ready to start competing, click on the "Join Competition button to create an account and gain access to the competition data. Then check out Alexis Cook’s Titanic Tutorial that walks you through step by step how to make your first submission!

In [3]:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Leer los datos
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

test.head()




Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S


In [None]:
# Seleccionar features simples para empezar
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']
X_train = train[features]
y_train = train['Survived']

print(X_train.head(10))
print(y_train.head(10))

In [None]:
# Procesamiento X_train
X_train = X_train.copy()  # 🔹 Esto elimina ambigüedad (recomendado)
X_train.loc[:, 'Sex'] = X_train['Sex'].map({'male': 0, 'female': 1})
X_train.loc[:, 'Age'] = X_train['Age'].fillna(X_train['Age'].median())
X_train.loc[:, 'Fare'] = X_train['Fare'].fillna(X_train['Fare'].median())

# Procesamiento X_test

In [None]:
X_test = X_test.copy()
X_test.loc[:, 'Sex'] = X_test['Sex'].map({'male': 0, 'female': 1})
X_test.loc[:, 'Age'] = X_test['Age'].fillna(X_test['Age'].median())
X_test.loc[:, 'Fare'] = X_test['Fare'].fillna(X_test['Fare'].median())

In [None]:
print(X_test['Sex'].unique())

In [12]:
X_test.loc[:, 'Sex'] = X_test['Sex'].map({'male': 0, 'female': 1})
X_test['Sex'] = X_test['Sex'].fillna(0).astype(int)  # Asumimos 'male' como valor por defecto


In [13]:
# Entrenar modelo sencillo
model = DecisionTreeClassifier()
model.fit(X_train, y_train)



0,1,2
,criterion,'gini'
,splitter,'best'
,max_depth,
,min_samples_split,2
,min_samples_leaf,1
,min_weight_fraction_leaf,0.0
,max_features,
,random_state,
,max_leaf_nodes,
,min_impurity_decrease,0.0


In [14]:
# Predecir para test
predictions = model.predict(X_test)



In [15]:
# Crear archivo submission.csv
submission = pd.DataFrame({
    'PassengerId': test['PassengerId'],
    'Survived': predictions
})
submission.head()


Unnamed: 0,PassengerId,Survived
0,892,0
1,893,0
2,894,1
3,895,1
4,896,1


In [17]:
submission.to_csv('submission1.csv', index=False)



In [None]:
## Primer sumit, bastante bajo el score
Complete · now · first summision. 0.53110

In [None]:
## Predicción basado en reglas (en contexto con el problema).
# Generar predicciones simples basadas en reglas: Sex + Pclass
# - Si es mujer => sobrevive
# - Si es hombre de 1ra clase => sobrevive
# - Resto => muere

predictions = []

for sex, pclass in zip(X_test['Sex'], X_test['Pclass']):
    if sex == 1:  # 1 = female (después del mapeo anterior)
        predictions.append(1)
    elif pclass == 1:
        predictions.append(1)
    else:
        predictions.append(0)

# Crear archivo submission.csv
submission = pd.DataFrame({
    'PassengerId': test['PassengerId'],
    'Survived': predictions
})
submission.to_csv('submission2.csv', index=False)

In [None]:
💡 Idea para mejorar las reglas manuales:
👉 Reglas simples pero más inteligentes:

1️⃣ Si es mujer ⇒ predigo "sobrevive" (1)
2️⃣ Si es hombre y Pclass=1 y Age<40 ⇒ predigo "sobrevive" (1)
3️⃣ Todo lo demás ⇒ predigo "muere" (0)

🎯 Esta regla intenta capturar que:

Mujeres sobrevivieron mucho.

Hombres jóvenes de 1ra clase tenían más chances.

El resto tenía bajas probabilidades.

In [22]:
predictions = []

for sex, pclass, age in zip(X_test['Sex'], X_test['Pclass'], X_test['Age']):
    if sex == 1:  # mujer
        predictions.append(1)
    elif pclass == 1 and age < 40:
        predictions.append(1)
    else:
        predictions.append(0)

submission = pd.DataFrame({
    'PassengerId': test['PassengerId'],
    'Survived': predictions
})
submission.to_csv('submission3.csv', index=False) 

In [None]:
# Asegurarse que 'Fare' esté limpio
X_test['Fare'] = X_test['Fare'].fillna(X_test['Fare'].median())

# Generar predicciones basadas en reglas:
# - Mujer ⇒ sobrevive
# - Hombre con Fare > 50 ⇒ sobrevive
# - Resto ⇒ muere

predictions = []

for sex, fare in zip(X_test['Sex'], X_test['Fare']):
    if sex == 1:  # mujer
        predictions.append(1)
    elif fare > 50:
        predictions.append(1)
    else:
        predictions.append(0)

# Crear archivo submission.csv
submission = pd.DataFrame({
    'PassengerId': test['PassengerId'],
    'Survived': predictions
})
submission.to_csv('submission4.csv', index=False)

In [24]:
## Otra regla manual.
for sex, fare in zip(X_test['Sex'], X_test['Fare']):
    if sex == 1:
        predict = 1
    elif pclass == 1:
        predict = 1
    else:
        predict = 0

# Crear archivo submission.csv
submission = pd.DataFrame({
    'PassengerId': test['PassengerId'],
    'Survived': predictions
})
submission.to_csv('submission5.csv', index=False)


In [None]:
## Agregando Feature Engineer manual 
✅ 1️⃣ Incorporar Embarked
Se sabe que la variable Embarked también tiene correlación (los embarcados en C sobreviven más que los de S o Q).

✅ 2️⃣ Definir FamilySize
Podés crear una nueva feature:
train['FamilySize'] = train['SibSp'] + train['Parch'] + 1
test['FamilySize'] = test['SibSp'] + test['Parch'] + 1

✅ 3️⃣ Cambiar de regla a modelo:

from sklearn.ensemble import RandomForestClassifier

In [25]:
from sklearn.ensemble import RandomForestClassifier

# Features a usar
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']

# Aseguramos limpieza 
X_train = train.copy()
X_test = test.copy()

# Convertir Sex a numérico
X_train['Sex'] = X_train['Sex'].map({'male': 0, 'female': 1})
X_test['Sex'] = X_test['Sex'].map({'male': 0, 'female': 1})

# Completar valores faltantes
X_train['Age'] = X_train['Age'].fillna(X_train['Age'].median())
X_test['Age'] = X_test['Age'].fillna(X_test['Age'].median())

X_train['Fare'] = X_train['Fare'].fillna(X_train['Fare'].median())
X_test['Fare'] = X_test['Fare'].fillna(X_test['Fare'].median())

# Entrenar RandomForest
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train[features], X_train['Survived'])

# Predecir para el test set
predictions = model.predict(X_test[features])

# Crear archivo submission.csv
submission = pd.DataFrame({
    'PassengerId': X_test['PassengerId'],
    'Survived': predictions
})
submission.to_csv('submission6.csv', index=False)
