# Titanic Survival Analysis

![Titanic](../data/titanic.webp)

## Introduction

### About the Dataset

The Titanic dataset is a staple of data science and machine learning teaching. It records detailed information about the passengers aboard the RMS Titanic and is widely used in predictive modeling and statistical analysis to estimate the likelihood of survival from socio-economic and demographic factors.

### Data Description:

This comprehensive dataset is frequently used to forecast outcomes and discern statistical associations. It allows for the examination of how factors such as socio-economic status, age, and gender may correlate with a higher chance of surviving the disaster.

- **Number of Instances**: 1309
- **Number of Attributes**: 14

The attributes cover a range of data points, including passenger names, ticket class, age, sex, and survival status, among others. Each entry is a snapshot of history, encapsulating a story within the larger narrative of the Titanic's fateful journey.

The goal is to utilize this dataset to understand the factors that significantly influenced the survival of individuals during the Titanic's sinking on April 15, 1912. The insights drawn from this analysis can offer valuable lessons in both historical context and methodological approaches in data science.

### About the Problem
Our aim is to investigate the factors that contributed to the survival of passengers during the sinking of the RMS Titanic. This is a binary classification problem where the outcome variable indicates survival (1) or not (0).
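
As a rough illustration of this 0/1 encoding, the sketch below computes survival rates overall and per group with pandas. The rows are made up for the example and are not taken from the real dataset; only the column names `Sex`, `Pclass`, and `Survived` follow the attribute list in this document.

```python
import pandas as pd

# Made-up sample rows for illustration only; NOT taken from the real dataset.
sample = pd.DataFrame(
    {
        "Sex": ["female", "male", "female", "male", "male", "female"],
        "Pclass": [1, 3, 3, 1, 2, 2],
        "Survived": [1, 0, 1, 1, 0, 0],
    }
)

# Because Survived is coded 0/1, its mean is the share of survivors.
overall_rate = sample["Survived"].mean()

# Share of survivors within each group of a candidate factor.
rate_by_sex = sample.groupby("Sex")["Survived"].mean()

print(f"Overall survival rate: {overall_rate:.2f}")
print(rate_by_sex)
```

The same `groupby` pattern can be applied to any categorical attribute, such as `Pclass`, to get a first impression of which factors are associated with survival.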

### About the Solution
By employing machine learning techniques, we intend to predict survival outcomes. Our model will be trained on historical data and evaluated on its ability to accurately predict whether a passenger survived based on the available attributes.
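
A minimal sketch of that train-and-evaluate loop is shown below. It assumes logistic regression as the classifier and uses synthetic stand-in data generated by an invented rule, so the resulting accuracy says nothing about the real Titanic data. The `IsFemale` column and the scoring rule are hypothetical and exist only to make the example self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption for illustration, not the real dataset).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame(
    {
        "Pclass": rng.integers(1, 4, size=n),    # ticket class 1, 2 or 3
        "IsFemale": rng.integers(0, 2, size=n),  # hypothetical 0/1 encoding of Sex
        "Age": rng.uniform(1, 80, size=n),
    }
)

# Invented rule plus noise, used only to create a 0/1 target.
score = 1.5 * toy["IsFemale"] - 0.8 * (toy["Pclass"] - 2) + rng.normal(0, 1, size=n)
toy["Survived"] = (score > 0.5).astype(int)

X = toy[["Pclass", "IsFemale", "Age"]]
y = toy["Survived"]

# Hold out 25% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Held-out accuracy: {test_accuracy:.2f}")
```

Stratifying the split keeps the proportion of survivors similar in the training and test sets, which makes the accuracy estimate less sensitive to an unlucky split.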

This work was carried out by:

| Name            | Number      |
|-----------------|-------------|
| Fernando Afonso | up202108686 |
| Gonçalo Matias  | up202108703 |
| Tiago Simões    | up202108857 |


## Work Specification

### Task Overview
The task at hand is to analyze and understand the factors that influenced survival rates among passengers during the tragic sinking of the RMS Titanic on April 15, 1912.

### Dataset Attributes:

- **Pclass**: Ticket class (1 = first, 2 = second, 3 = third), used as a proxy for the passenger's socio-economic status.
- **Survived**: Whether the passenger survived (1) or not (0).
- **Name**: Full name including title (e.g., Mr., Mrs., etc.).
- **Sex**: Gender of each passenger.
- **Age**: Age of each passenger in years.
- **SibSp**: Number of siblings or spouses aboard the Titanic for the respective passenger.
- **Parch**: Number of parents or children aboard the Titanic for the respective passenger.
- **Ticket**: The ticket number assigned to the passenger.
- **Fare**: Amount paid by the passenger for the ticket.
- **Cabin**: Cabin number assigned to the passenger, if available.
- **Embarked**: The port of embarkation for the passenger (C = Cherbourg, Q = Queenstown, S = Southampton).
- **Boat**: If the passenger survived, this column contains the identifier of the lifeboat they were rescued in.
- **Body**: If the passenger did not survive, this column contains the identification number of their recovered body, if applicable.
- **Home.dest**: The destination or place of residence of the passenger.
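
Some of these attributes can be turned into simpler features before modeling. The sketch below is one possible approach, not necessarily the one used in this project: it extracts the title from `Name` and sums `SibSp` and `Parch` into a count of relatives aboard. It assumes names follow the pattern `Surname, Title. Given names`. The rows are invented, and the `Title` and `Relatives` columns are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical rows written for illustration; the names are invented.
passengers = pd.DataFrame(
    {
        "Name": ["Doe, Mr. John", "Doe, Mrs. Jane", "Roe, Miss. Anna"],
        "SibSp": [1, 1, 0],
        "Parch": [0, 0, 2],
    }
)

# Capture the text between the comma and the first period, e.g. "Mr".
passengers["Title"] = (
    passengers["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
)

# Total relatives aboard: siblings/spouses plus parents/children.
passengers["Relatives"] = passengers["SibSp"] + passengers["Parch"]

print(passengers[["Name", "Title", "Relatives"]])
```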

### Objective
The objective is to identify which passenger attributes were most strongly associated with survival and to use them to predict whether each passenger survived (1) or not (0).

## Tools and Algorithms

### Programming Language:
- **Python**

### Python Libraries:
- (TODO: list the libraries we actually use, such as Pandas, NumPy, Scikit-learn, etc.)

### Development Environment:
- **Jupyter Notebook**
- **VSCode**
- **PyCharm**

### Machine Learning Algorithms:
- (TODO: list the algorithms we actually use)



## Step 1: Import Libraries

Before diving into the analysis, we'll need to import several key Python libraries. These libraries provide necessary functions and tools to manipulate, analyze, and visualize the data effectively.

Here's a list of commonly used libraries:

- `pandas`: for data manipulation and analysis.
- `numpy`: for numerical computing.
- `matplotlib.pyplot`: for creating static, interactive, and animated visualizations.
- `seaborn`: for making statistical graphics.
- `scikit-learn`: for machine learning and predictive data analysis.

To import these libraries, you would typically run the following Python code:


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
```
