ModelSelection

This repo walks through the following steps:

Clean the data
Split the data into train, validation and test
Fit an initial model and evaluate
Tune the hyperparameters
Evaluate the tuned models on the validation set
Select and evaluate final model on test set
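
The splitting step above can be sketched without any library. This is a minimal illustration, assuming a 60/20/20 split and a fixed seed; neither the proportions nor the helper name `train_val_test_split` come from the notebooks.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and cut them into train, validation and test lists."""
    rows = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)    # seeded shuffle for reproducibility
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

# Toy stand-in for the cleaned passenger rows (hypothetical data).
rows = list(range(100))
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

The test set is cut off first and never touched again until the final evaluation, which is what keeps the last step of the workflow honest.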

Titanic.ipynb is the foundation: it cleans the data and splits it into train, validation and test sets. On top of that I built five models:

Logistic Regression
Support Vector Machines
Multilayer Perceptron
Random Forest
Boosting

Below is a brief introduction to each model.

Logistic Regression

Regression is a statistical process for estimating the relationships among variables, and it is often used to predict some outcome. Linear regression is the type of regression used when the target variable is continuous. Logistic regression is the type used when the target variable is binary, which makes it suitable for classification.
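
To make the idea concrete, here is a small sketch of how a logistic regression model turns a weighted sum of features into a probability. The weights and bias are made-up values for illustration, not fitted values from the notebooks.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical weights and bias, chosen only for the example.
weights = [0.8, -1.2]
bias = 0.1
p = predict_proba([1.0, 0.5], weights, bias)
label = 1 if p >= 0.5 else 0   # threshold the probability to get a class
print(round(p, 3), label)
```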

Support Vector Machines

A support vector machine is a classifier that finds the optimal hyperplane, that is, the one that maximizes the margin between two classes.
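
The margin can be computed directly once a hyperplane is known. The sketch below assumes a hand-picked hyperplane `w.x + b = 0` rather than one learned by an SVM solver, and only illustrates the geometry.

```python
import math

def signed_distance(point, w, b):
    """Signed distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

def margin_width(w):
    """Distance between the planes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane and two points lying exactly on the margin planes.
w, b = [1.0, 1.0], -3.0
negative_point = (1.0, 1.0)   # w.x + b = -1
positive_point = (2.0, 2.0)   # w.x + b = +1

print(signed_distance(negative_point, w, b))
print(signed_distance(positive_point, w, b))
print(margin_width(w))
```

A smaller norm of `w` gives a wider margin, which is why training an SVM amounts to minimizing that norm subject to classifying the training points correctly.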

Multilayer Perceptron

A multilayer perceptron (MLP) is a classic feed-forward artificial neural network. It is one of the core components of some deep learning algorithms.
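
"Feed-forward" means the input flows through the layers in one direction only. The following sketch runs one forward pass through a tiny network with a single hidden layer; the weights, the ReLU activation and the layer sizes are all assumptions made for the example.

```python
import math

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each row of weights produces one output."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer (ReLU) -> single sigmoid output."""
    hidden = relu(dense(x, hidden_w, hidden_b))
    logit = dense(hidden, out_w, out_b)[0]
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
hidden_w = [[1.0, -1.0], [0.5, 0.5]]
hidden_b = [0.0, -0.5]
out_w = [[1.0, -2.0]]
out_b = [0.0]

prob = mlp_forward([2.0, 1.0], hidden_w, hidden_b, out_w, out_b)
print(round(prob, 4))
```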

Random Forest

A random forest is a machine learning technique used to solve regression and classification problems. It relies on ensemble learning: it combines the predictions of many decision trees, so the combined model can handle problems that a single tree handles poorly.
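
The combining step for classification is a majority vote. In the sketch below the three "trees" are hand-written rules standing in for learned decision trees, and the feature names are hypothetical; a real random forest would learn each tree from a bootstrap sample of the training data.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by the most ensemble members."""
    return Counter(predictions).most_common(1)[0][0]

# Hand-written stand-ins for learned trees (illustration only).
def tree_a(passenger):
    return 1 if passenger["sex"] == "female" else 0

def tree_b(passenger):
    return 1 if passenger["pclass"] == 1 else 0

def tree_c(passenger):
    return 1 if passenger["age"] < 10 else 0

forest = [tree_a, tree_b, tree_c]
passenger = {"sex": "female", "pclass": 3, "age": 8}

votes = [tree(passenger) for tree in forest]
prediction = majority_vote(votes)
print(votes, prediction)
```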

Boosting

Boosting combines weak learners to form a strong learner. A weak learner is a classifier that is only slightly correlated with the true classification, meaning it does only a little better than random guessing. A strong learner, in contrast, is a classifier that is well correlated with the true classification.
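
As an illustration of weak learners adding up to a strong one, here is a compact AdaBoost sketch using one-feature threshold rules ("stumps"). AdaBoost is one classic boosting algorithm, chosen here for brevity; the notebooks may use a different variant, and the toy data is invented.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` below the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) with the lowest weighted error."""
    best = None
    for threshold in [x + 0.5 for x in xs]:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Fit `rounds` stumps, re-weighting the samples after each one."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = max(error, 1e-10)                      # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)    # vote strength of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples get heavier, correct ones lighter.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single threshold separates the classes (labels are +1 / -1).
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, rounds=3)
training_errors = sum(predict(ensemble, x) != y for x, y in zip(xs, ys))
print(training_errors)
```

On this data the best single stump misclassifies three of the ten points, while the weighted vote of three stumps fits all of them.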

I created one notebook per model. Each notebook trains its model on the training set, searches for the best hyperparameters, and saves the resulting best estimator to that model's JSON file.
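
A hedged sketch of that tune-then-save loop with scikit-learn is shown here. It assumes that only the best hyperparameters are written to JSON, uses synthetic data in place of the train CSV files, and invents both the search grid and the output file name; the notebooks' actual grids and serialization method are not shown in this README.

```python
import json
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training features and labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hypothetical search grid over the regularization strength C.
param_grid = {"C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

# best_params_ is a plain dict, so it can be written to JSON directly.
out_path = Path(tempfile.gettempdir()) / "LR_model_example.json"
out_path.write_text(json.dumps(search.best_params_))

# Later: rebuild an unfitted estimator with the saved hyperparameters.
saved_params = json.loads(out_path.read_text())
restored = LogisticRegression(max_iter=1000, **saved_params)
print(saved_params)
```

Saving only hyperparameters means the model has to be refitted after loading. If the fitted weights are needed as well, a binary format such as `joblib` is the usual choice.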

Best Estimator of each model

Comparison of algorithms

FinalModelSelection.ipynb does the following:

Evaluate all of our saved models on the validation set
Select the best model based on performance on the validation set
Evaluate that model on the holdout test set
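
The three steps above can be sketched in plain Python. The two "models" here are simple functions standing in for the saved estimators, the data is invented, and accuracy is assumed as the selection metric; the notebook may compare other metrics as well.

```python
def accuracy(model, features, labels):
    """Fraction of samples the model labels correctly."""
    correct = sum(model(row) == label for row, label in zip(features, labels))
    return correct / len(labels)

# Hypothetical stand-ins for the saved models: each maps a feature row to 0 or 1.
models = {
    "always_zero": lambda row: 0,
    "threshold_on_first": lambda row: 1 if row[0] > 0.5 else 0,
}

# Invented validation and test data.
val_features = [[0.9], [0.2], [0.7], [0.1]]
val_labels = [1, 0, 1, 0]
test_features = [[0.8], [0.3], [0.6]]
test_labels = [1, 0, 0]

# 1. Score every model on the validation set.
val_scores = {name: accuracy(model, val_features, val_labels)
              for name, model in models.items()}

# 2. Pick the model with the best validation score.
best_name = max(val_scores, key=val_scores.get)

# 3. Score only that model on the held-out test set.
test_score = accuracy(models[best_name], test_features, test_labels)
print(val_scores, best_name, test_score)
```

Only the selected model ever sees the test set, so its test score is an unbiased estimate of how it would do on new data.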

How it does it

Notes:

I used Google Colab with my Google Drive mounted, so all file paths point to my Drive. Change them if you run the notebooks locally or in your own Colab.
titanic.csv is the dataset used to train and evaluate the models.
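
One way to make the paths easier to change is to build them from a single base directory. This is only a suggestion: the `DATA_DIR` environment variable and the `data_path` helper are hypothetical, and the default shown is the usual location of a mounted Drive in Colab.

```python
import os
from pathlib import Path

# DATA_DIR is a hypothetical environment variable. The fallback is where
# Google Drive normally appears in Colab after mounting at /content/drive.
DATA_DIR = Path(os.environ.get("DATA_DIR", "/content/drive/MyDrive"))

def data_path(filename):
    """Build the full path of a data file under the base directory."""
    return DATA_DIR / filename

print(data_path("titanic.csv"))
```

When running locally, setting `DATA_DIR` to the folder that holds the CSV files is then the only change needed.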


callmemaze/ModelSelection
