GitHub - LucasNatalePires/kaggle_titanic: Titanic - Machine Learning from Disaster (Kaggle Competition)

This repository was created to publish my solution to the Titanic challenge.

The code was divided into 5 steps, each trying a different alternative in search of the best possible result. What was done in each step, and why, is explained below.

FIRST CODE:

Handled null data using mean() and mode().
Some columns showed high cardinality, detected with the nunique() function. At this stage I chose to exclude them, since they showed no pattern initially.
I excluded the 'Embarked' column because it had string values. At first, I wanted to test the model's accuracy without treating it.
I created 3 different models using KNeighborsClassifier (KNC), Random Forest and Logistic Regression, and checked the accuracy and confusion matrix of each model.
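
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is a small synthetic stand-in (not the Kaggle files), the column selection and the train/test split are my own choices, and the notebooks may differ in the details.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training data (assumption: not the real dataset).
# Labels are random, so the accuracies printed below carry no meaning.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n).astype(float),
    "Sex": rng.integers(0, 2, n),  # assumed to be already encoded as 0/1
    "Age": rng.normal(30, 12, n).clip(1, 80),
    "Fare": rng.exponential(30, n),
    "Survived": rng.integers(0, 2, n),
})
# Simulate missing values in one numeric and one categorical-like column.
df.loc[rng.choice(n, 30, replace=False), "Age"] = np.nan
df.loc[rng.choice(n, 10, replace=False), "Pclass"] = np.nan

# Numeric gaps are filled with the mean, categorical gaps with the mode.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Pclass"] = df["Pclass"].fillna(df["Pclass"].mode()[0])

X = df.drop(columns="Survived")
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

# Fit each model and keep its accuracy and confusion matrix on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    acc = accuracy_score(y_test, pred)
    cm = confusion_matrix(y_test, pred, labels=[0, 1])
    results[name] = (acc, cm)
    print(name, round(acc, 3))
    print(cm)
```

Comparing the three models on the same held-out split keeps the accuracy and confusion matrix figures directly comparable.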

Score: 0.66746

SECOND CODE:

Everything done in the first code was kept. The only addition was:
- I treated the 'Embarked' column with One Hot Encoder, since its values are strings and the models would not work with them otherwise

Score: 0.76555

THIRD CODE:

I used Robust Scaler to scale the 'Age' and 'Fare' columns, which contain very discrepant values (outliers).
These values can be easily detected by plotting them with Matplotlib.
I created new columns from the 'SibSp' and 'Parch' columns, seeking better accuracy.
I computed the correlation between variables to understand what could be created or deleted.
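
The scaling, feature creation and correlation steps might look like the sketch below. The values are made up, and the derived columns `FamilySize` and `IsAlone` are hypothetical names chosen for illustration; the notebook may build different columns.

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler

# Made-up rows, with one extreme value in each of 'Age' and 'Fare'.
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 80.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 512.33],
    "SibSp": [1, 1, 0, 1, 0],
    "Parch": [0, 0, 0, 2, 0],
    "Survived": [0, 1, 1, 1, 0],
})

# RobustScaler centers on the median and scales by the interquartile range,
# so extreme values influence the result less than with mean/std scaling.
df[["Age", "Fare"]] = RobustScaler().fit_transform(df[["Age", "Fare"]])

# Hypothetical derived columns built from 'SibSp' and 'Parch'.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# Correlation of every column with the target, strongest positive first.
corr = df.corr()["Survived"].sort_values(ascending=False)
print(corr)
```

Columns with a correlation near zero are candidates for removal, while a derived column that correlates more strongly than its sources may be worth keeping.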

Score: 0.76555

FOURTH CODE:

In this stage, all treatments carried out in the previous stage were applied.
In addition to Random Forest, I applied MLP Classifier (neural networks) and tried to select the best parameters.
Despite the apparent improvement, there was overfitting (basically, when the algorithm works very well on the training data but does not perform the same on the test data).
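
One common way to spot overfitting is to compare accuracy on the training data with accuracy on held-out data. The sketch below assumes synthetic data from `make_classification` and an arbitrary network size; it is not the configuration used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic, noisy classification data (assumption: stand-in for the real features).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A fairly large network relative to the amount of data, chosen arbitrarily.
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)

# score() returns mean accuracy; a large positive gap suggests overfitting.
train_acc = mlp.score(X_train, y_train)
test_acc = mlp.score(X_test, y_test)
gap = train_acc - test_acc
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={gap:.3f}")
```

A model that memorizes the training rows tends to show a high training accuracy and a clearly lower held-out accuracy, which is consistent with a lower Kaggle score.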

Score: 0.69856

FINAL CODE:

To solve the overfitting problem, I used GridSearchCV to find the best parameters.
In the end, I used Random Forest to make the submission, and the score improved compared to the previous code.
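
A minimal sketch of a grid search over Random Forest parameters, assuming synthetic data and a small parameter grid of my own choosing; the grid actually searched in the notebook is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (assumption: stand-in for the cleaned Titanic features).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Illustrative grid; shallower trees and larger leaves limit model complexity.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
}

# Each combination is scored with 5-fold cross-validation on the training split.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("cv accuracy:", round(search.best_score_, 3))

# The refit best estimator is then checked on data the search never saw.
test_acc = search.best_estimator_.score(X_test, y_test)
print("held-out accuracy:", round(test_acc, 3))
```

Because parameters are chosen by cross-validated score rather than training accuracy, the selected model is less likely to be one that only fits the training rows well.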

Score: 0.7799

Files in the repository:
README.md
test_clean.csv
test_clean_upd.csv
titanic_version1.ipynb
titanic_version2.ipynb
titanic_version3.ipynb
titanic_version4.ipynb
titanic_version5.ipynb
train_clean.csv
train_clean_upd.csv
