
Movie review sentiment analysis #14

Closed
bdzyubak opened this issue Mar 27, 2024 · 1 comment
Assignees
Labels
enhancement (New feature or request), Experiment (Run an evaluation and compare multiple approaches on the same data)

Comments

@bdzyubak
Owner

bdzyubak commented Mar 27, 2024

Fine-tune DistilBERT for movie review sentiment analysis on the following dataset:
https://www.kaggle.com/competitions/sentiment-analysis-on-movie-reviews/data

The IMDB dataset is also interesting for sentiment analysis. Potentially, implement it as a separate experiment and then cross-validate training on one or both. Implement the other common networks and compare performance. For those that come with out-of-the-box sentiment analysis, evaluate performance without fine-tuning.
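A minimal sketch of what the fine-tuning could look like with Hugging Face Transformers, assuming the competition's train.tsv has been downloaded locally; the file path and the "Phrase"/"Sentiment" column names follow the Kaggle layout but are not verified here:

```python
# Sketch only: fine-tune DistilBERT on the Kaggle phrase-level sentiment data.
# The path and column names ("Phrase", "Sentiment") are assumptions based on
# the competition's file layout.
import pandas as pd
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

df = pd.read_csv("train.tsv", sep="\t")
dataset = Dataset.from_pandas(
    df[["Phrase", "Sentiment"]].rename(columns={"Phrase": "text", "Sentiment": "label"}))
dataset = dataset.train_test_split(test_size=0.2, seed=42)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# The competition uses 5 sentiment classes (0 = negative ... 4 = positive).
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5)

args = TrainingArguments(
    output_dir="distilbert-movie-sentiment",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    eval_strategy="epoch",   # "evaluation_strategy" in older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```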

@bdzyubak added the enhancement and Experiment labels Mar 27, 2024
@bdzyubak added this to the April Beta Release milestone Mar 27, 2024
@bdzyubak self-assigned this Mar 27, 2024
@bdzyubak
Owner Author

bdzyubak commented Apr 1, 2024

DistilBERT was fine-tuned to 0.80 training accuracy and 0.68 validation accuracy. Peak validation accuracy was reached after one epoch, after which validation performance deteriorated severely as the model overfit the training data and lost its generalized pre-trained weights. Due to checkpointing, the best model from the first epoch was saved, so the later epochs are not a concern.

[Image: training_metrics_version_7]
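For reference, the keep-the-best-epoch checkpointing could look like the following with PyTorch Lightning (the version_N metrics directory suggests Lightning logging, but that is an assumption; the monitored metric name and the model/dataloader variables are placeholders):

```python
# Sketch only: keep the single best checkpoint by validation loss so the
# later, overfit epochs are never used. Assumes a LightningModule that
# logs "val_loss"; names here are placeholders.
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping

checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",   # save whichever epoch has the lowest validation loss
    mode="min",
    save_top_k=1,
)
early_stop_cb = EarlyStopping(monitor="val_loss", mode="min", patience=2)

trainer = Trainer(
    max_epochs=10,
    callbacks=[checkpoint_cb, early_stop_cb],
)
# trainer.fit(model, train_dataloader, val_dataloader)
# Best weights end up at checkpoint_cb.best_model_path
```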

The validation performance is relatively poor and much lower than the training performance. A larger model might fit the training data better than 80%, solving the underfitting issue. On the other hand, I expect even training accuracy to be capped by the observation below. The overfitting may be addressed by freezing most of the network layers and training just the classification head.
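A sketch of that frozen-backbone variant, assuming the standard Hugging Face DistilBertForSequenceClassification layout where the transformer sits under model.distilbert and the head is the pre_classifier/classifier pair:

```python
# Sketch only: freeze the DistilBERT backbone and train just the
# classification head, to limit overfitting of the pre-trained weights.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5)

for param in model.distilbert.parameters():   # transformer backbone
    param.requires_grad = False
# pre_classifier and classifier (the head) remain trainable

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```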

The data labels have been augmented by heavily resampling each review into smaller chunks, down to a single word or letter. The smaller chunks inherit the label of the original review, so the target sentiments for "A", "A series", "occasionally amuses", and "none of which amounts to much of a story" all map to the label of the combined review. Without more intelligent splitting, this may cap the ability of the network to learn sentiment, because datapoints like "A"/"A series" will carry variable labels across reviews.
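That inconsistency can be measured directly; a rough sketch, assuming the same train.tsv and column names as in the fine-tuning snippet above:

```python
# Sketch only: count phrases that appear with more than one sentiment label,
# which is the ceiling on achievable accuracy described above. Column names
# ("Phrase", "Sentiment") follow the Kaggle layout and are assumptions.
import pandas as pd

df = pd.read_csv("train.tsv", sep="\t")
labels_per_phrase = df.groupby("Phrase")["Sentiment"].nunique()
conflicting = labels_per_phrase[labels_per_phrase > 1]
print(f"{len(conflicting)} phrases carry more than one label")
```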

Next steps:

  1. The IMDB dataset is also interesting for sentiment analysis. Potentially, implement it as a separate experiment and then cross-validate training on one or both.
  2. Implement the other common networks and compare performance. For those that come with out-of-the-box sentiment analysis, evaluate performance without fine-tuning (see the pipeline sketch after this list).
  3. Compare full fine-tuning against training with frozen backbone layers, where only the sentiment head is allowed to train.
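For step 2, the no-fine-tuning baseline could be as simple as the stock Transformers sentiment pipeline; a sketch, noting that its default checkpoint is binary (POSITIVE/NEGATIVE) and its scores would need to be mapped onto the 5-class Kaggle labels before any comparison:

```python
# Sketch only: out-of-the-box baseline using the default sentiment-analysis
# pipeline (a binary SST-2 checkpoint). Mapping binary outputs onto the
# 5-class labels is left open here.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
examples = ["occasionally amuses", "none of which amounts to much of a story"]
for text, result in zip(examples, classifier(examples)):
    print(text, "->", result["label"], round(result["score"], 3))
```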
