GiulioCMSanto/titanic-data-analysis

Titanic Data Analysis - A Descriptive Statistics Approach

This project analyses the Kaggle Titanic data (https://www.kaggle.com/c/titanic/data) using descriptive statistics.

Motivation

The motivation of this project is to follow some steps of the CRISP-DM approach: formulating questions, preparing and cleaning the data, and exploring the data. This project, however, does not delve into inferential statistics or machine learning.

Files in this Repository

README.md: the present file

titanic-data-6.csv: the Kaggle Titanic data (https://www.kaggle.com/c/titanic/data)

titanic_project.ipynb: the Jupyter notebook with the code, analysis and visualizations

titanic_project.html: an HTML version of the Jupyter notebook

Results

Some of the obtained results were:

  • Age appears to be associated with survival: the average age among those who did not survive is higher than among those who survived. Moreover, the age distribution among those who did not survive is right-skewed.

  • There is no evidence that couples with children had a lower chance of survival.

  • Economy-class passengers made up a greater proportion of those who did not survive.
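Summaries like the ones above can be computed with a few pandas group-by operations. The sketch below is illustrative rather than the notebook's actual code: it assumes the standard Kaggle column names (`Survived`, `Pclass`, `Age`), and the `toy` frame and the helper function names are hypothetical stand-ins so the example runs without the CSV file.

```python
import pandas as pd

# Hypothetical toy rows using the Kaggle column names; NOT the real data.
# With the repository file, one would instead load:
#   df = pd.read_csv("titanic-data-6.csv")
toy = pd.DataFrame({
    "Survived": [0, 0, 0, 0, 1, 1, 1, 1],
    "Pclass":   [3, 3, 3, 2, 1, 1, 2, 3],
    "Age":      [40.0, 35.0, 28.0, 61.0, 22.0, 30.0, None, 4.0],
})


def mean_age_by_survival(df):
    """Mean age per Survived value; missing ages are skipped by pandas."""
    return df.groupby("Survived")["Age"].mean()


def age_skew_by_survival(df):
    """Sample skewness of age per Survived value (positive = right-skewed)."""
    return df.groupby("Survived")["Age"].skew()


def class_share_by_survival(df):
    """Share of each passenger class within each Survived group (rows sum to 1)."""
    return pd.crosstab(df["Survived"], df["Pclass"], normalize="index")


if __name__ == "__main__":
    print(mean_age_by_survival(toy))
    print(age_skew_by_survival(toy))
    print(class_share_by_survival(toy))
```

Normalizing the crosstab by row (`normalize="index"`) gives the class composition within each survival group, which matches how the class result is phrased. Normalizing by column would instead give the survival rate per class.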

Used Libraries

  • pandas
  • NumPy
  • Matplotlib
  • seaborn
  • mpl_toolkits.mplot3d

Acknowledgements

I would like to acknowledge Udacity and Kaggle for providing the datasets.
