- Data preprocessing & data visualisation (e.g. how many classes the dataset contains and how the samples are distributed among them).
- Oversampling & undersampling (if the data is imbalanced, balance it by adding minority-class samples or removing majority-class samples).
- Normalisation (scaling large feature values down to a small, common range).
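A minimal sketch of the two preprocessing steps above, using plain NumPy rather than a dedicated library such as `imbalanced-learn`; the function names and the tiny example arrays are illustrative, not from the original project:

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Balance classes by duplicating randomly chosen minority-class samples."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # Draw extra indices (with replacement) until this class reaches `target`.
        extra = rng.choice(idx, size=target - idx.size, replace=True)
        keep = np.concatenate([idx, extra])
        X_parts.append(X[keep])
        y_parts.append(y[keep])
    return np.concatenate(X_parts), np.concatenate(y_parts)

def min_max_normalise(X):
    """Scale each feature to [0, 1] so large values become small, comparable ones."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1)

# Toy data: two samples of class 0, one of class 1 (imbalanced).
X = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
y = np.array([0, 0, 1])
Xb, yb = random_oversample(X, y)   # classes now balanced
Xn = min_max_normalise(X)          # all values now in [0, 1]
```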
- Divide the dataset into three parts:
- Training set
- Validation set
- Testing set
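The three-way split above can be sketched with two calls to scikit-learn's `train_test_split`; the 60/20/20 proportions and the toy arrays are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 50 samples, 2 features, binary labels.
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

# First carve off the test set, then split the remainder into train/validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42, stratify=y_trainval)
# Result: 60% train, 20% validation, 20% test.
```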
- Logistic Regression
- Support Vector Classifier
- Decision Tree
- K-Nearest Neighbors Classifier
- Single Layer Perceptron (SLP) Classifier
- Multi-Layer Perceptron (MLP) Classifier
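A sketch of fitting all six model families listed above and scoring each on a held-out test set; the synthetic dataset and the hyperparameters (hidden layer size, iteration limits) are assumptions, not the project's actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic binary-classification data standing in for the real dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SLP": Perceptron(),  # a single-layer (linear) perceptron
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}

# Fit each model and record its test-set accuracy.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
```

Comparing the entries of `scores` (e.g. `max(scores, key=scores.get)`) is the same model-selection step the notes describe.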
- Using Logistic Regression, I obtained a test-set accuracy of 80%.
- Using K-Nearest Neighbors (KNN), I obtained a test-set accuracy of 80%, the same as Logistic Regression.
- Using the Multi-Layer Perceptron (MLP) Classifier, I obtained a test-set accuracy of 84.168%, the highest among the models.
- Using the Single Layer Perceptron (SLP), I obtained a test-set accuracy of 72.5%, the lowest among the models.
- Using the Support Vector Classifier (SVC), I obtained a test-set accuracy of 80%, the same as Logistic Regression and KNN.
- Using the Decision Tree Classifier, I obtained a test-set accuracy of 83%, slightly below the MLP.
- Comparing the accuracies above, the best model is the MLP Classifier.
- Plotting the True Positive Rate against the False Positive Rate at different classification thresholds gives the ROC curve.
- The ROC curve helps distinguish the performance of a given classifier.
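A minimal sketch of computing the ROC curve described above with `sklearn.metrics.roc_curve`; the labels and scores here are made-up illustrative values:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical true labels and predicted probabilities for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3])

# TPR and FPR evaluated at each decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)
# Plotting tpr against fpr gives the ROC curve; an AUC near 1.0
# indicates a classifier that separates the classes well.
```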
- A confusion matrix gives a comparison between the actual and predicted values.
- ACC = (TP + TN)/(TP + FP + FN + TN)
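The accuracy formula above can be verified directly from a confusion matrix; the label vectors here are made-up examples:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical actual vs. predicted labels.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# For binary labels, ravel() unpacks the 2x2 matrix as TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
acc = (tp + tn) / (tp + fp + fn + tn)  # ACC = (TP + TN)/(TP + FP + FN + TN)
```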
- K-Fold Cross-Validation: cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
- The procedure has a single parameter, k, which refers to the number of groups the data sample is split into.
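The k-fold procedure above can be sketched with `KFold` and `cross_val_score`; the choice of k = 5, the synthetic data, and the use of Logistic Regression as the estimator are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the limited real sample.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# k = 5 groups: each fold serves once as the held-out evaluation set.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)
# scores holds one accuracy per fold; their mean is the CV estimate.
```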