
This project, carried out in Jupyter Notebook, explores the main data analysis techniques using Python tools: Pandas, NumPy, Seaborn, Matplotlib, Plotly and scikit-learn (sklearn). The work is split into three notebooks covering data cleaning, data analysis and machine learning. For more details and goals, see the README.


LucasDeMatheo/DataScienceProject_Titanic


DataScienceProject_Titanic



General Information

Author: Lucas Lobianco De Matheo
Title: Kaggle Titanic DataSet
This dataset was one of the first I worked on, and today I feel able to explore it more thoroughly and with more techniques.
Extension: .csv
Source: https://www.kaggle.com/azeembootwala/titanic
Date: 01-02-2022

Main Skills of this project:

  • Data Preparation
  • Data Cleansing
  • Data Wrangling
  • Data Preprocessing
  • Exploratory Data Analysis (EDA)
  • Data Visualization

How to Use

This project includes the .csv file used as the starting point (titanic.csv) and two preprocessing output files (titanic_preprocessed.csv and titanic_preprocessed_2.csv).

The project starts with the "Titanic DataSet - Data Cleasing" notebook, which generates titanic_preprocessed.csv, the input for the analysis step.
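The cleaning step can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (Age, Embarked, Sex) assume the classic Kaggle Titanic schema, the Kaggle source linked above may already be partly encoded, and `clean_titanic` and `run_cleaning_step` are hypothetical names.

```python
import pandas as pd

# Hypothetical sketch: column names assume the classic Kaggle Titanic schema.


def clean_titanic(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw passenger table."""
    # Drop exact duplicate rows and work on a copy so the input is untouched.
    df = raw.drop_duplicates().copy()
    # Impute missing ages with the median age.
    df["Age"] = df["Age"].fillna(df["Age"].median())
    # Impute missing embarkation ports with the most frequent port.
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
    # Encode sex as 0/1 so later steps can treat it as a numeric variable.
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    return df


def run_cleaning_step(in_path: str = "titanic.csv",
                      out_path: str = "titanic_preprocessed.csv") -> None:
    """Read the raw CSV, clean it and write the preprocessed CSV."""
    clean_titanic(pd.read_csv(in_path)).to_csv(out_path, index=False)
```

Calling `run_cleaning_step()` from the project folder would read titanic.csv and write titanic_preprocessed.csv. Median and mode imputation are used here only because they are simple; the notebook may handle missing values differently.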

The "Titanic DataSet - Data Analysis" notebook takes titanic_preprocessed.csv as input and generates titanic_preprocessed_2.csv, which can later be loaded into BI platforms such as Power BI as a supplementary data source.
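A minimal sketch of what the analysis step might look like, assuming the cleaned file has numeric Survived (0/1), Sex (0 = male, 1 = female) and Pclass columns. The functions and the label columns are hypothetical; the columns the real notebook writes to titanic_preprocessed_2.csv are not documented here.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Share of passengers who survived within each value of `column`."""
    return df.groupby(column)["Survived"].mean()


def add_bi_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical export step: add readable labels next to the 0/1 codes."""
    out = df.copy()
    out["SurvivedLabel"] = out["Survived"].map({0: "No", 1: "Yes"})
    out["SexLabel"] = out["Sex"].map({0: "male", 1: "female"})
    return out


def run_analysis_step(in_path: str = "titanic_preprocessed.csv",
                      out_path: str = "titanic_preprocessed_2.csv") -> None:
    """Print one example aggregate and write the BI-friendly CSV."""
    df = pd.read_csv(in_path)
    print(survival_rate_by(df, "Pclass"))
    add_bi_labels(df).to_csv(out_path, index=False)
```

Readable text labels are a common choice for BI exports because dashboards display them directly, while the numeric codes stay available for calculations.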

The "Titanic DataSet - Machine Learning" notebook also takes titanic_preprocessed.csv as input, because the ML algorithms it uses work primarily with numerical variables.
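Since the ML notebook works with numerical variables, one way to prepare its input is sketched below. `build_feature_matrix` is a hypothetical helper: it assumes a Survived target column and one-hot encodes any remaining text columns with pandas `get_dummies`, so every feature ends up numeric.

```python
import pandas as pd


def build_feature_matrix(df: pd.DataFrame, target: str = "Survived"):
    """Split a preprocessed table into a numeric feature matrix X and target y."""
    y = df[target]
    # One-hot encode any remaining text columns; numeric columns pass through unchanged.
    X = pd.get_dummies(df.drop(columns=[target]), dtype=int)
    return X, y
```

For example, `X, y = build_feature_matrix(pd.read_csv("titanic_preprocessed.csv"))` would produce inputs that could be passed to a scikit-learn estimator such as `LogisticRegression().fit(X, y)`.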

Pipeline

Insights

Result
