
This notebook contains Python code for a simple Titanic survival prediction project based on the popular Titanic dataset. It uses pandas for data manipulation, scikit-learn for building the prediction model, and matplotlib for data visualization.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the Titanic dataset
titanic_data = pd.read_csv('titanic.csv')

# Data preprocessing
# Drop irrelevant columns and handle missing values
titanic_data.drop(['PassengerId', 'Name', 'Ticket', 'Cabin'], axis=1, inplace=True)
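# Assign the filled column back: fillna(inplace=True) on a column selection
# does not update the DataFrame under pandas Copy-on-Write
titanic_data['Age'] = titanic_data['Age'].fillna(titanic_data['Age'].median())
titanic_data['Embarked'] = titanic_data['Embarked'].fillna(titanic_data['Embarked'].mode()[0])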

# Convert categorical variables to numerical using one-hot encoding
titanic_data = pd.get_dummies(titanic_data, columns=['Sex', 'Embarked'], drop_first=True)

# Define features and target variable
X = titanic_data.drop('Survived', axis=1)
y = titanic_data['Survived']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build a Random Forest Classifier model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion Matrix:")
print(conf_matrix)

# Visualization: Feature Importance
feature_importance = pd.Series(model.feature_importances_, index=X.columns)
feature_importance.sort_values(ascending=False, inplace=True)
plt.figure(figsize=(8, 6))
plt.bar(feature_importance.index, feature_importance.values)
plt.xticks(rotation=90)
plt.xlabel('Features')
plt.ylabel('Feature Importance')
plt.title('Feature Importance for Titanic Survival Prediction')
plt.show()


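In this code, we first load the Titanic dataset and preprocess it. We drop the columns that are not used as features (PassengerId, Name, Ticket, Cabin), fill missing Age values with the median and missing Embarked values with the most frequent port, and convert the categorical Sex and Embarked columns to numeric indicator columns using one-hot encoding.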
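
To make the preprocessing step concrete, here is a minimal sketch on a toy table. The four rows and their values are made up for illustration and are not taken from the real dataset; only the column names mirror it. It shows what median and mode imputation fill in, and which indicator columns `drop_first=True` keeps.

```python
import pandas as pd

# Hypothetical toy rows; column names mirror the Titanic data
toy = pd.DataFrame({
    "Age": [22.0, None, 38.0, 26.0],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", None, "S"],
})

# Median of the known ages (22, 26, 38) is 26, so the missing age becomes 26
toy["Age"] = toy["Age"].fillna(toy["Age"].median())
# The most frequent port in this toy table is "S"
toy["Embarked"] = toy["Embarked"].fillna(toy["Embarked"].mode()[0])

# drop_first=True drops the first category of each column (female, C),
# leaving one indicator per remaining category
encoded = pd.get_dummies(toy, columns=["Sex", "Embarked"], drop_first=True)
print(encoded.columns.tolist())  # ['Age', 'Sex_male', 'Embarked_S']
```

Dropping the first category loses no information: a row with `Sex_male` equal to false is necessarily female.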

Next, we hold out 20% of the data as a test set and train a Random Forest classifier with 100 trees using scikit-learn's RandomForestClassifier. We then make predictions on the test set and evaluate the model using accuracy_score and confusion_matrix.
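
The relationship between the two metrics may be easier to see when computed by hand. The sketch below uses made-up labels (not model output) and a hypothetical helper, `confusion_counts`, to build the same 2x2 layout that `confusion_matrix` returns for binary 0/1 labels: rows are true classes, columns are predicted classes.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0 and 1."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

# Made-up labels for illustration only
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 1, 0, 0]

tn, fp, fn, tp = confusion_counts(y_true_demo, y_pred_demo)
matrix = [[tn, fp], [fn, tp]]  # same layout as sklearn's confusion_matrix

# Accuracy is the diagonal of the matrix divided by the total count
accuracy_demo = (tn + tp) / (tn + fp + fn + tp)
print(matrix)         # [[4, 1], [2, 3]]
print(accuracy_demo)  # 0.7
```

Accuracy alone hides how the errors are distributed. Here the two off-diagonal cells show that the predictions miss survivors (2 false negatives) more often than they wrongly predict survival (1 false positive).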

Finally, we plot the model's feature importances with matplotlib to see which features contribute most to the survival prediction.
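
When reading the plot, it helps to know that `feature_importances_` holds impurity-based scores normalized to sum to 1, so each bar is a share of the total rather than an absolute effect. The sketch below checks this on synthetic data from `make_classification`; the feature names `f0` to `f4` and all parameter values are illustrative assumptions, not part of the Titanic data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 5 features, of which only 2 carry signal
X_arr, y_demo = make_classification(
    n_samples=200, n_features=5, n_informative=2,
    n_redundant=0, random_state=0,
)
X_demo = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# Rank features from most to least important
importance = pd.Series(forest.feature_importances_, index=X_demo.columns)
importance = importance.sort_values(ascending=False)

print(importance.round(3))
print("sum of importances:", round(importance.sum(), 6))
```

Impurity-based importances are computed from the training data and can favor features with many distinct values, so the ranking is best treated as a rough guide.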