
AppliedML-BERT-Trees-Boost

Project for my graduate-level ML class (COMP 551). The full paper is available as "writeup.pdf" in the repository.

We analyzed AdaBoost, linear SVC, radial basis function (RBF) SVM, random forests, decision trees, and BERT on two NLP text-classification datasets: IMDb and 20 Newsgroups.
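For reference, a minimal sketch of loading the two datasets (assuming scikit-learn's built-in 20 Newsgroups loader and the Hugging Face `datasets` copy of IMDb; the preprocessing used in the paper may differ):

```python
from sklearn.datasets import fetch_20newsgroups
from datasets import load_dataset

# 20 Newsgroups: 20-class topic classification; strip metadata that can leak labels
newsgroups_train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
newsgroups_test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

# IMDb: binary sentiment classification (25,000 train / 25,000 test reviews)
imdb = load_dataset("imdb")

print(len(newsgroups_train.data), len(newsgroups_test.data), len(imdb["train"]))
```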

Abstract

Many machine learning algorithms have been developed in recent decades. In this paper, we explore the performance of some of the most common models on multi-class and binary text-classification problems. These models include AdaBoost, linear SVC, linear regression, radial basis function SVM, random forests, decision trees, and BERT. Our results show the effects of regularization, of resampling methods such as bagging (bootstrap aggregating) and 5-fold cross-validation, and of boosting on model accuracy. We also examine how these strategies affect the bias-variance trade-off in order to determine the best-performing configuration of each algorithm on each data set. Our highest test accuracies were achieved with BERT: 72.40% on 20 Newsgroups and 94.15% on IMDb.
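To make the experimental setup concrete, here is an illustrative sketch (not the code used in the paper) of how the classical models could be compared with TF-IDF features and 5-fold cross-validation on 20 Newsgroups; the hyperparameters shown are placeholders, not the tuned values reported in the write-up:

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC, SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

data = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))

models = {
    "Decision tree": DecisionTreeClassifier(max_depth=50),
    "Random forest (bagging)": RandomForestClassifier(n_estimators=200),
    "AdaBoost (boosting)": AdaBoostClassifier(n_estimators=200),
    "Linear SVC (C controls regularization strength)": LinearSVC(C=1.0),
    "RBF SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
}

for name, clf in models.items():
    # TF-IDF features feed each classifier; 5-fold CV estimates generalization accuracy
    pipe = make_pipeline(TfidfVectorizer(stop_words="english", max_features=20000), clf)
    scores = cross_val_score(pipe, data.data, data.target, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Swapping in the IMDb reviews gives the binary-sentiment counterpart of the same comparison. BERT is handled separately as a fine-tuned pretrained transformer rather than through this scikit-learn pipeline.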
