ClassificationAlgorithms

Comparing the performance of three classification algorithms on a dataset: Decision Tree, Random Forest, and Support Vector Machine.
This repository contains a Python script that reads a dataset from a CSV file, performs some data exploration and analysis, and trains three machine learning models on the data: a Decision Tree, a Random Forest, and a Support Vector Machine (SVM). The script uses hyperparameter tuning with cross-validation to find the best hyperparameters for each model, and then evaluates each model on a test dataset using several metrics, including precision, recall, accuracy, and F1 score. Based on these metrics, the SVM is the best of the three models.
The code starts by importing the necessary libraries, including pandas, numpy, and scikit-learn. It then reads the dataset from a CSV file using the pandas read_csv() function, prints summary information about the dataset, including the number of samples in each class, and plots histograms of the features.
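A minimal sketch of this loading and exploration step is shown below. The file name data.csv and the target column name are placeholders, since the actual dataset is not named in this README.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (file name and target column are placeholders)
df = pd.read_csv("data.csv")

# Basic exploration: shape, number of samples per class, feature histograms
print(df.shape)
print(df["target"].value_counts())
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.show()
```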
The code then splits the data into training and testing sets using scikit-learn's train_test_split() function. It uses GridSearchCV to find the best hyperparameters for a Decision Tree model, trains a new Decision Tree with those hyperparameters on the training data, and evaluates the model on the test data using the metrics above.
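The following sketch illustrates this step; the train/test split ratio and the hyperparameter grid values are illustrative assumptions, not the exact settings used in the script.

```python
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Separate features and labels (column names are placeholders)
X = df.drop(columns=["target"])
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Hyperparameter grid for the Decision Tree (values are illustrative)
dt_params = {"max_depth": [3, 5, 10, None], "min_samples_split": [2, 5, 10]}
dt_search = GridSearchCV(DecisionTreeClassifier(random_state=42), dt_params, cv=5)
dt_search.fit(X_train, y_train)

# Evaluate the best estimator on the held-out test set
print(dt_search.best_params_)
print(classification_report(y_test, dt_search.best_estimator_.predict(X_test)))
```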
The code then repeats the same process for a Random Forest model, using GridSearchCV to find the best hyperparameters and evaluating the model on the test data using various metrics.
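A corresponding sketch for the Random Forest step, reusing the split and imports from the previous example; again, the grid values are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameter grid for the Random Forest (values are illustrative)
rf_params = {"n_estimators": [100, 200], "max_depth": [5, 10, None]}
rf_search = GridSearchCV(RandomForestClassifier(random_state=42), rf_params, cv=5)
rf_search.fit(X_train, y_train)

print(rf_search.best_params_)
print(classification_report(y_test, rf_search.best_estimator_.predict(X_test)))
```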
Finally, the code repeats the same process for an SVM model, again using GridSearchCV to find the best hyperparameters and evaluating the model on the test data. It then plots confusion matrices for the best models and a bar graph showing the precision, recall, and F1 score for each class.
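The SVM step and the final plots could look roughly like the sketch below. The grid values, the use of ConfusionMatrixDisplay, and the bar-graph layout are assumptions about how the plots are produced, not a copy of the repository code.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import ConfusionMatrixDisplay, precision_recall_fscore_support

# Hyperparameter grid for the SVM (values are illustrative)
svm_params = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"], "gamma": ["scale", "auto"]}
svm_search = GridSearchCV(SVC(random_state=42), svm_params, cv=5)
svm_search.fit(X_train, y_train)
y_pred = svm_search.best_estimator_.predict(X_test)

# Confusion matrix for the best model
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()

# Bar graph of per-class precision, recall, and F1 score
precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred)
classes = np.unique(y_test)
x = np.arange(len(classes))
width = 0.25
plt.bar(x - width, precision, width, label="Precision")
plt.bar(x, recall, width, label="Recall")
plt.bar(x + width, f1, width, label="F1 score")
plt.xticks(x, classes)
plt.legend()
plt.show()
```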

Results

Based on the analysis of the results, the Support Vector Machine (SVM) and Random Forest (RF) algorithms outperformed the Decision Tree classifier on this dataset.
The SVM algorithm achieved the highest accuracy at 0.95, while the RF algorithm reached an accuracy of 0.944. Both algorithms had high precision, recall, and F1-score values for all classes, and their confusion matrices show high true positive rates and low false positive rates for every class.
One potential disadvantage of the SVM algorithm is that it can be computationally expensive and may require more resources compared to other algorithms. On the other hand, the RF algorithm can be more prone to overfitting and may not work well with high-dimensional data.
Overall, this assignment has provided a good understanding of how different classification algorithms work and how they can be evaluated using various metrics. In future projects, it may be helpful to explore other types of algorithms and evaluate their performance on similar datasets. Additionally, it would be useful to consider using more advanced techniques such as feature selection and hyperparameter tuning to further improve the performance of the classification models.
