This project analyses the Kaggle Titanic data (https://www.kaggle.com/c/titanic/data) using a descriptive statistics approach.
The motivation is to follow some steps of the CRISP-DM approach: formulating questions, preparing and cleaning the data, and exploring the data. The project does not, however, delve into inferential statistics or machine learning.
- README.md: the present file
- titanic-data-6.csv: the Kaggle Titanic data (https://www.kaggle.com/c/titanic/data)
- titanic_project.ipynb: the Jupyter notebook with the code, analysis and visualizations
- titanic_project.html: an HTML version of the Jupyter notebook
Some of the results obtained were:
- Age and survival appear to be related. The average age among those who didn't survive is higher than among those who survived. Moreover, the age distribution among those who didn't survive is right-skewed.
- There is no evidence that couples with children had a lower chance of survival.
- Passengers in the economy class accounted for a greater proportion of those who did not survive.
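
The kind of summaries behind these results can be sketched with pandas. The snippet below is a minimal illustration, not the notebook's actual code: it uses a few made-up rows rather than the real data, and it assumes the standard Kaggle column names `Survived`, `Pclass` and `Age`. The `summarize` helper is a hypothetical name introduced here.

```python
import pandas as pd

# Toy rows for illustration only; NOT taken from titanic-data-6.csv.
# Column names are assumed to follow the Kaggle Titanic schema.
toy = pd.DataFrame(
    {
        "Survived": [0, 0, 0, 1, 1, 1],
        "Pclass": [3, 3, 1, 1, 2, 3],
        "Age": [40.0, 30.0, None, 20.0, 28.0, 24.0],
    }
)


def summarize(df):
    """Return mean age per survival group and class shares per survival group."""
    # mean() skips missing ages by default
    mean_age = df.groupby("Survived")["Age"].mean()
    # normalize="index" makes each row (one survival group) sum to 1
    class_share = pd.crosstab(df["Survived"], df["Pclass"], normalize="index")
    return mean_age, class_share


mean_age, class_share = summarize(toy)
print(mean_age)
print(class_share)
```

To try this on the real data, one could load the file with `pd.read_csv("titanic-data-6.csv")` and pass the result to `summarize`, assuming the CSV keeps those column names.
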
- pandas
- NumPy
- Matplotlib
- seaborn
- mpl_toolkits.mplot3d
I would like to acknowledge Udacity and Kaggle for providing the datasets.