# **INTRODUCTION**
![Bvglvr6.gif](https://i.imgur.com/Bvglvr6.gif)

 ## Welcome to EDA! We'll explore the Titanic dataset, uncover insights, and visualize patterns to understand what factors influenced passenger survival. Let's embark on this exciting journey of data discovery and build a strong foundation for our predictive model!


# **DESCRIPTION** :
## ABOUT THE COMPETITION :-
### The Titanic Kaggle Challenge offers an engaging opportunity to explore the infamous historical event through data analysis. In this Exploratory Data Analysis (EDA) phase, we delve into a comprehensive dataset containing information about the passengers aboard the RMS Titanic, one of the most tragic maritime disasters in history. 

## GOAL :-
### Our goal is to gain valuable insights by investigating the factors that influenced passenger survival. Through data cleaning, visualization, and descriptive statistics, we aim to unravel the untold stories hidden within the dataset.  Let's embark on this data-driven adventure and decipher the secrets behind the fate of those onboard the Titanic.

![k1fPqu7.gif](https://i.imgur.com/k1fPqu7.gif)

## **IMPORTING SOME LIBRARIES** :-

### Gather 'round, for the coding superhero team is assembling (just kidding)! NumPy, Pandas, scikit-learn, XGBoost, missingno and Matplotlib are here to crunch numbers, wrangle data, plot missing values, and wield the art of machine learning. Together, we'll conquer the Titanic challenge with humor and coding magic! 🚀💻😄
 
### Pandas :- https://pandas.pydata.org/docs/
### NumPy :- https://numpy.org/doc/
### Sklearn.model_selection :- https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
### Sklearn.linear_model :- https://scikit-learn.org/stable/modules/linear_model.html
### Sklearn.metrics :- https://scikit-learn.org/stable/modules/model_evaluation.html
### XGBoost :- https://xgboost.readthedocs.io/en/stable/
### Sklearn.preprocessing :- https://scikit-learn.org/stable/modules/preprocessing.html
### Missingno :- https://pypi.org/project/missingno/
### Matplotlib :- https://matplotlib.org/stable/index.html
### Warnings :- https://docs.python.org/3/library/warnings.html

In [None]:
import pandas as pd  
import numpy as np
from sklearn.model_selection import train_test_split   
from sklearn.linear_model import LogisticRegression    
from sklearn.metrics import accuracy_score  
from xgboost import XGBClassifier
from sklearn.preprocessing import LabelEncoder
import missingno as msno
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('ignore')

## **Data Collection & Processing** :-

###  It's time to set sail on our Titanic data collection and processing expedition. We'll navigate through various data sources, cast our nets wide, and haul in a treasure trove of information about the passengers aboard that legendary ship.  🌊🗺️💻

### **Why data collection & processing?**
#### Data collection (acquisition) means gathering new data or adding to existing data holdings. Data processing is the series of steps performed on that data to verify, organize, transform, integrate, and extract it in a form suitable for subsequent use. Processing methods must be rigorously documented to ensure the utility and integrity of the data.

In [None]:
data = pd.read_csv("/kaggle/input/titanic/train.csv")
test = pd.read_csv("/kaggle/input/titanic/test.csv")


In [None]:
data.head()

In [None]:
data.shape

In [None]:
data.info()

# **Finding Missing values**
### As we set sail through the vast ocean of data, we'll keep a keen eye out for those sneaky NaNs and missing pieces of information. Let's embark on this Null value hunt and conquer the seas of data processing! 🏴‍☠️🔍💻

In [None]:
msno.matrix(data,figsize=(15,9), fontsize=12);
msno.matrix(test,figsize=(15,9), fontsize=12);

In [None]:
msno.heatmap(data, labels = True, figsize = (10,10), cmap='RdBu')
msno.heatmap(test, labels = True, figsize = (10,10), cmap='RdBu')

In [None]:
missing_columns = [col for col in data.columns if data[col].isnull().sum() > 0]
missing_columns

In [None]:
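msno.dendrogram(data[missing_columns])
# The test set has its own pattern of gaps, so compute its missing columns separately
test_missing_columns = [col for col in test.columns if test[col].isnull().sum() > 0]
msno.dendrogram(test[test_missing_columns])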

In [None]:
def get_numerical_summary(df):
    """Print and return the percentage of missing values per column of `df`."""
    total = df.shape[0]
    # Use the `df` argument (not the global `data`) so the function works on any frame
    missing_columns = [col for col in df.columns if df[col].isnull().sum() > 0]
    missing_percent = {}
    for col in missing_columns:
        null_count = df[col].isnull().sum()
        per = (null_count/total) * 100
        missing_percent[col] = per
        print("{} : {} ({}%)".format(col, null_count, round(per, 5)))
    return missing_percent
missing_percent = get_numerical_summary(data)
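
### A more compact route to the same percentages, as a minimal sketch: the column-wise mean of `isnull()` is the missing fraction. The `toy` frame and its values are made up for illustration and are not the real Titanic data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Titanic data (values are made up).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# isnull() yields booleans; their column mean is the fraction of missing rows.
missing_share = toy.isnull().mean().mul(100)

# Keep only the columns that actually have gaps, largest share first.
missing_share = missing_share[missing_share > 0].sort_values(ascending=False)
print(missing_share)
```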


In [None]:
data.describe()


In [None]:
obj_dtypes = [i for i in data.select_dtypes(include="object").columns]    # Categorical data (`np.object` was removed in NumPy 1.24)
obj_dtypes

In [None]:
num_dtypes = [ i for i in data.select_dtypes(include=np.number).columns]            #Numerical data
num_dtypes

# **Handling the Missing Values**
## When we encounter these elusive NaNs, we'll evaluate the best approach to handle them. Should we bid them adieu and drop them from the dataset? Or perhaps, with the art of imputation, we'll fill these missing pieces cleverly, ensuring the integrity of our precious data.🏴‍☠️🌊💻

In [None]:
features = ['Pclass','Sex','Age','SibSp','Parch','Fare','Embarked']
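x = data[features].copy()   # copy so the fills below don't write into a view of `data`
y = data['Survived']

x['Age'] = x['Age'].fillna(x['Age'].median())                                  # median age
x['Embarked'] = x['Embarked'].fillna(x['Embarked'].value_counts().index[0])    # most frequent port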
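
### To see what median and most-frequent imputation do, here is a small sketch on a made-up frame (the `toy` and `filled` names and all values are assumptions for illustration only).

```python
import numpy as np
import pandas as pd

# Made-up rows with one gap in each column.
toy = pd.DataFrame({
    "Age": [20.0, np.nan, 30.0, 40.0],
    "Embarked": ["S", "C", None, "S"],
})

filled = toy.copy()  # work on a copy so the raw data stays untouched

# Numeric gap: fill with the median of the observed ages.
filled["Age"] = filled["Age"].fillna(filled["Age"].median())

# Categorical gap: fill with the most frequent category.
filled["Embarked"] = filled["Embarked"].fillna(filled["Embarked"].mode()[0])
print(filled)
```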

In [None]:
LE = LabelEncoder()
x['Sex'] = LE.fit_transform(x['Sex'])
x['Embarked'] = LE.fit_transform(x['Embarked'])
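
### A note on `LabelEncoder`: it sorts the categories and numbers them in that order, and refitting one shared encoder overwrites the previous mapping. One possible pattern, sketched on made-up frames (`train_toy`, `test_toy` and the `encoders` dict are hypothetical names), keeps one encoder per column, fits it on the training data and only transforms the test data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up train/test rows for illustration.
train_toy = pd.DataFrame({"Sex": ["male", "female", "male"], "Embarked": ["S", "C", "Q"]})
test_toy = pd.DataFrame({"Sex": ["female", "male"], "Embarked": ["Q", "S"]})

encoders = {}
for col in ["Sex", "Embarked"]:
    # Fit on the training column only, then reuse the same mapping for the test column.
    encoders[col] = LabelEncoder().fit(train_toy[col])
    train_toy[col] = encoders[col].transform(train_toy[col])
    test_toy[col] = encoders[col].transform(test_toy[col])

# classes_ holds the sorted categories; the position in the list is the code.
print(list(encoders["Sex"].classes_), list(encoders["Embarked"].classes_))
```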

In [None]:
x_train , x_test , y_train , y_test = train_test_split(x,y, test_size = 0.1 , random_state = 0)
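
### How `test_size` and `random_state` shape a split, as a sketch on made-up arrays. The `stratify` argument is an addition not used in this notebook; it keeps the class balance the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 made-up rows with a 60% / 40% class balance.
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0] * 12 + [1] * 8)

# test_size=0.25 holds out 5 of the 20 rows; random_state makes the shuffle repeatable.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy
)
print(len(X_tr), len(X_te), y_tr.mean(), y_te.mean())
```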

### The time has come to unleash the mighty XGBoost, the swashbuckling hero of machine learning algorithms, for our prediction quest! With its exceptional power and speed, XGBoost will lead us on an exhilarating journey through the Titanic Kaggle challenge.

In [None]:
classifier = XGBClassifier(colsample_bylevel = 0.9,
                           colsample_bytree = 0.8,
                           gamma = 0.99,
                           max_depth = 5,
                           n_estimators = 10,
                           n_jobs = 4,          # replaces the deprecated `nthread`
                           random_state = 2,
                           verbosity = 0)       # replaces the removed `silent = True`
classifier.fit(x_train, y_train)
classifier.score(x_test, y_test)   # mean accuracy on the held-out 10%
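
### On scikit-learn-style classifiers, `score` reports mean accuracy, the same number `accuracy_score` gives. The sketch below checks that on synthetic data with a `LogisticRegression` baseline (imported above but otherwise unused); the data, the baseline and the majority-class comparison are illustrative assumptions, not results for the Titanic data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 7 features (not the Titanic data).
X_syn, y_syn = make_classification(n_samples=200, n_features=7, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.1, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# .score() and accuracy_score() measure the same thing: the share of correct predictions.
score_acc = baseline.score(X_te, y_te)
manual_acc = accuracy_score(y_te, baseline.predict(X_te))

# Accuracy from always guessing the larger class: a floor any model should beat.
majority_acc = max(y_te.mean(), 1 - y_te.mean())
print(score_acc, manual_acc, majority_acc)
```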
                    

In [None]:
test_x = test[features].copy()   # copy so later fills and encodings don't write into a view of `test`

In [None]:
test_x['Age'] = test_x['Age'].fillna(test_x['Age'].median())
test_x['Fare'] = test_x['Fare'].fillna(test_x['Fare'].median())


In [None]:


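# Fit each encoder on the training categories and only transform the test set,
# so both sets share one mapping (LabelEncoder sorts classes alphabetically)
test_x['Sex'] = LabelEncoder().fit(data['Sex']).transform(test_x['Sex'])
test_x['Embarked'] = LabelEncoder().fit(data['Embarked'].dropna()).transform(test_x['Embarked'])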

##### As we sail through the training process, XGBoost will bravely face the toughest obstacles, using its ensemble of decision trees to seek out hidden patterns and relationships. With every boosting round, it'll sharpen its predictive sword, growing stronger and more accurate. And when the time comes, we'll put XGBoost to the ultimate test, making predictions on the unseen test data with the anticipation of victory in our hearts. With XGBoost by our side, we're prepared to face the challenges of the Titanic Kaggle competition head-on. So let's hoist the XGBoost flag and embark on this thrilling prediction voyage! 🏴‍☠️⚔️🚀

In [None]:
prediction = classifier.predict(test_x)

##### The moment of truth has come! With XGBoost leading the way, our model has made its predictions for the Titanic challenge; the Kaggle leaderboard will decide how accurate they are. Either way, we set sail towards new data adventures, armed with valuable lessons and a passion for uncovering untold stories. Anchors aweigh, full speed ahead! 🏴‍☠️⛵️🌊

In [None]:
output = pd.DataFrame({"PassengerId" : test.PassengerId, 'Survived': prediction})
output.to_csv('submission.csv' , index = False)
output.head()
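
### Before uploading, a quick sanity check of the submission frame can catch format slips. This is a sketch: `check_submission` is a hypothetical helper, and the ids and predictions below are made up.

```python
import pandas as pd

def check_submission(df, expected_ids):
    """Hypothetical helper: basic format checks for a Titanic-style submission."""
    assert list(df.columns) == ["PassengerId", "Survived"], "unexpected columns"
    assert df["PassengerId"].tolist() == list(expected_ids), "ids differ from the test set"
    assert df["Survived"].isin([0, 1]).all(), "predictions must be 0 or 1"
    return True

# Made-up ids and predictions standing in for the real output.
toy_ids = [892, 893, 894]
toy_output = pd.DataFrame({"PassengerId": toy_ids, "Survived": [0, 1, 0]})
print(check_submission(toy_output, toy_ids))
```

### On the real frames the call would look like `check_submission(output, test.PassengerId)`.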

## **CONCLUSION** 
### In conclusion, I extend my heartfelt gratitude for allowing me to be a part of your data adventure. May the memories of our Titanic Kaggle challenge serve as a reminder of the incredible potential that lies within data analysis and machine learning.

## Thank you for joining me on this data adventure! Keep exploring, learning, and making waves in the world of data science. May your journey be filled with success and growth. Happy Kaggling! 🚀💻🌟

![XKvskoK.gif](https://i.imgur.com/3GxShZi.gif)
