## 🚀 Project Initialization   
This repository is the starting point of my Titanic Data Analysis project. The goal is to explore the Titanic dataset, clean and preprocess the data, perform exploratory analysis, and build predictive models to understand which factors influenced passenger survival.

-------------
📦 Next Steps   
- Set up the project structure (data, notebooks, src, reports).  
- Install the required Python libraries.  
- Begin exploratory data analysis (EDA).
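
The folder layout named in the first step can be created with a few lines of Python. This is a minimal sketch, assuming the four folders sit directly under a single project root. The function name `create_project_structure` is hypothetical and not part of the project.

```python
from pathlib import Path

# Folder names taken from the step list above.
PROJECT_DIRS = ("data", "notebooks", "src", "reports")


def create_project_structure(root):
    """Create the project folders under `root` and return their paths."""
    root = Path(root)
    created = []
    for name in PROJECT_DIRS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```
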
-------------
⚙️ Required Libraries:

In [None]:
%pip install pandas numpy matplotlib seaborn scikit-learn jupyterlab nbconvert imbalanced-learn -q

🔹 Next Steps: 

- Read the CSV file using pandas.  
- Inspect the first few rows and basic information (.head(), .info(), .describe()).  
- Begin exploratory data analysis (EDA).

In [1]:
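import pandas as pd

# Absolute path to the local copy of the CSV; adjust it to your own machine.
data = r"D:\Desktop\Analysis_Titanic\Titanic-analysis\data\titanic.csv"

# Load the file into a DataFrame and preview the first five rows.
df = pd.read_csv(data)
df.head()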

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |
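
The absolute Windows path in the loading cell only works on the machine it was written on. A possible alternative is to build the path relative to the project root with `pathlib`. This is a sketch under two assumptions: the CSV lives in the `data` folder of the project, and the notebook is started from the `notebooks` folder. The helper `resolve_data_path` is a hypothetical name.

```python
from pathlib import Path


def resolve_data_path(project_root, filename="titanic.csv"):
    """Return <project_root>/data/<filename> as a Path."""
    return Path(project_root) / "data" / filename


# Assumption: the notebook is started from the `notebooks/` folder,
# so the project root is one level above the working directory.
data_path = resolve_data_path(Path.cwd().parent)
# df = pd.read_csv(data_path)  # same read_csv call, with a portable path
```

`pd.read_csv` accepts a `Path` object directly, so no string conversion is needed.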


#### Exploring Dataset Structure with `df.info()`
`df.info()` prints the number of rows and, for each column, its non-null count and dtype:

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


#### View summary statistics for numeric columns

In [3]:
df.describe()

|   | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 891.0 | 714.0 | 891.0 | 891.0 | 891.0 |
| mean | 446.0 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 1.0 | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 |
| 25% | 223.5 | 0.0 | 2.0 | 20.125 | 0.0 | 0.0 | 7.9104 |
| 50% | 446.0 | 0.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 668.5 | 1.0 | 3.0 | 38.0 | 1.0 | 0.0 | 31.0 |
| max | 891.0 | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 |


#### Checking for Missing Data

After loading the Titanic dataset and reviewing its structure with `df.info()` and `df.describe()`, the next step is to identify missing values in the dataset.

We use:

In [4]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
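
Raw counts are easier to judge as a share of all rows: with 891 passengers, the counts above correspond to roughly 19.9% missing for `Age`, 77.1% for `Cabin`, and 0.2% for `Embarked`. The sketch below shows one way to compute such a summary. The helper `missing_summary` is a hypothetical name, and the small `toy` frame only stands in for the real data so the example runs on its own.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return the count and percentage of missing values per column."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns that have gaps, most affected first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for the Titanic data (values are illustrative only).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})
print(missing_summary(toy))
```

On the real data, the same call would be `missing_summary(df)`.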

#### Separating Features and Target

In supervised machine learning, we divide the dataset into:

- **Target variable (`y`)** → the outcome we want to predict.  
- **Features (`X`)** → the input variables used to make the prediction.  

For the Titanic dataset:

In [5]:
# Target variable
y = df['Survived']

# Features - select relevant columns (can be adjusted later)
X = df[['Pclass', 'Sex', 'Age', 'Fare']]
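
The selected features are not yet model-ready: `Sex` holds strings and `Age` has 177 missing values, while most scikit-learn estimators expect numeric input without gaps. The sketch below shows one simple way to handle both. It is an illustration, not necessarily the approach this project will settle on. The helper `prepare_features`, the 0/1 mapping, and the median fill are all assumptions, and `toy_X` is made-up data.

```python
import numpy as np
import pandas as pd


def prepare_features(features):
    """Return a numeric copy of the feature table with no gaps in Age."""
    prepared = features.copy()
    # Encode Sex as 0/1; the direction of the mapping is an arbitrary choice.
    prepared["Sex"] = prepared["Sex"].map({"male": 0, "female": 1})
    # Fill missing ages with the median age (one common, simple strategy).
    prepared["Age"] = prepared["Age"].fillna(prepared["Age"].median())
    return prepared


# Toy feature table (values are illustrative only).
toy_X = pd.DataFrame({
    "Pclass": [3, 1, 3, 1],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 54.0],
    "Fare": [7.25, 71.28, 7.93, 51.86],
})
X_ready = prepare_features(toy_X)
```

Working on a copy leaves the original table untouched. When a train/test split is introduced later, the median would normally be computed on the training rows only to avoid leaking information from the test set.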