ClassificationAlgorithms

Comparing the performance of three classification algorithms on a dataset: Decision Tree, Random Forest, and Support Vector Machine.
This repository contains a Python script that reads a dataset from a CSV file, performs some data exploration and analysis, and trains three machine learning models on the data: a Decision Tree, a Random Forest, and a Support Vector Machine (SVM). The script uses hyperparameter tuning with cross-validation to find the best hyperparameters for each model, and then evaluates each model on a test dataset using several metrics, including precision, recall, accuracy, and F1 score. Based on these metrics, the SVM is the best of the three models.
The code starts by importing the necessary libraries, including pandas, numpy, and scikit-learn. It then reads the dataset from a CSV file using the pandas read_csv() function, prints summary information about the dataset, including the number of samples in each class, and plots histograms of the features.
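A minimal sketch of this loading and exploration step is shown below. The file name data.csv and the target column name are placeholders, since the actual dataset is not named in this README.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (file name and target column are placeholders)
df = pd.read_csv("data.csv")

# Basic exploration: shape, number of samples per class, feature histograms
print(df.shape)
print(df["target"].value_counts())
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.show()
```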
The code then splits the data into training and testing sets using scikit-learn's train_test_split() function. It uses GridSearchCV to find the best hyperparameters for a Decision Tree model, trains a new Decision Tree with those hyperparameters on the training data, and evaluates the model on the test data using the metrics above.
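The following sketch illustrates this step; the train/test split ratio and the hyperparameter grid values are illustrative assumptions, not the exact settings used in the script.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Separate features and labels (column names are placeholders)
X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hyperparameter grid for the Decision Tree (values are illustrative)
dt_params = {"max_depth": [3, 5, 10, None], "min_samples_split": [2, 5, 10]}
dt_search = GridSearchCV(DecisionTreeClassifier(random_state=42), dt_params, cv=5)
dt_search.fit(X_train, y_train)

# Evaluate the best estimator on the held-out test set
print(dt_search.best_params_)
print(classification_report(y_test, dt_search.best_estimator_.predict(X_test)))
```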
The code then repeats the same process for a Random Forest model, using GridSearchCV to find the best hyperparameters and evaluating the model on the test data using various metrics.
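A corresponding sketch for the Random Forest step, reusing the split and imports from the previous example; again, the grid values are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameter grid for the Random Forest (values are illustrative)
rf_params = {"n_estimators": [100, 200], "max_depth": [5, 10, None]}
rf_search = GridSearchCV(RandomForestClassifier(random_state=42), rf_params, cv=5)
rf_search.fit(X_train, y_train)

print(rf_search.best_params_)
print(classification_report(y_test, rf_search.best_estimator_.predict(X_test)))
```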
Finally, the code repeats the same process for an SVM model, again using GridSearchCV to find the best hyperparameters and evaluating the model on the test data. It then plots confusion matrices for the best models and a bar graph showing the precision, recall, and F1 score for each class.
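The SVM step and the final plots could look roughly like the sketch below. The grid values, the use of ConfusionMatrixDisplay, and the bar-graph layout are assumptions about how the plots are produced, not a copy of the repository code.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import ConfusionMatrixDisplay, precision_recall_fscore_support

# Hyperparameter grid for the SVM (values are illustrative)
svm_params = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"], "gamma": ["scale", "auto"]}
svm_search = GridSearchCV(SVC(random_state=42), svm_params, cv=5)
svm_search.fit(X_train, y_train)
y_pred = svm_search.best_estimator_.predict(X_test)

# Confusion matrix for the best model
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()

# Bar graph of per-class precision, recall, and F1 score
precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred)
classes = np.unique(y_test)
x = np.arange(len(classes))
width = 0.25
plt.bar(x - width, precision, width, label="Precision")
plt.bar(x, recall, width, label="Recall")
plt.bar(x + width, f1, width, label="F1 score")
plt.xticks(x, classes)
plt.legend()
plt.show()
```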

Results

Based on the analysis of the results, the Support Vector Machine (SVM) and Random Forest (RF) algorithms outperformed the Decision Tree classifier on this dataset.
The SVM algorithm achieved the highest accuracy at 0.95, while the RF algorithm reached an accuracy of 0.944. Both algorithms had high precision, recall, and F1-score values for all classes, and their confusion matrices show high true positive rates and low false positive rates for every class.
One potential disadvantage of the SVM algorithm is that it can be computationally expensive and may require more resources compared to other algorithms. On the other hand, the RF algorithm can be more prone to overfitting and may not work well with high-dimensional data.
Overall, this assignment has provided a good understanding of how different classification algorithms work and how they can be evaluated using various metrics. In future projects, it may be helpful to explore other types of algorithms and evaluate their performance on similar datasets. Additionally, it would be useful to consider using more advanced techniques such as feature selection and hyperparameter tuning to further improve the performance of the classification models.
