Reddit Sarcasm Detection

This project uses Reddit data to detect whether comments in Reddit threads are sarcastic. A sarcastic expression or comment can be defined as one that is caustic, bitter, or cutting. Sarcasm detection is a difficult task because it depends largely on context, prior knowledge, and the tone in which the sentence was spoken or written. It is important to pin down what exactly sarcasm is, since its boundaries are less well defined than the categories used in sentiment analysis, which are much clearer ("love" reads as positive and "hate" as negative to almost anyone you ask).

Dataset

This dataset contains 1.3 million sarcastic comments from the Internet commentary website Reddit. It was generated by scraping Reddit comments containing the (sarcasm) tag. Redditors often use this tag to indicate that a comment is in jest and not meant to be taken seriously, so it is generally a reliable indicator of sarcastic content. The dataset is balanced.

Attribute Information:

label: Whether the comment is sarcastic or not
comment: The comment to classify as sarcastic or not
author: The author of the comment
subreddit: The subreddit in which the comment was posted
score: The net of upvotes and downvotes
ups: The number of upvotes
downs: The number of downvotes
date: The date the comment was posted
created_utc: The UTC timestamp of when the comment was posted
parent_comment: The parent comment to which the comment was posted as a response
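
As a rough illustration of how these attributes might be read and the class balance checked, here is a minimal sketch using only the Python standard library. The sample rows are made up for demonstration and are not taken from the dataset, the underscore column names (created_utc, parent_comment) are an assumption about the CSV header, and the helper names load_rows and label_balance are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; not taken from the real dataset.
# The header names are an assumption based on the attribute list above.
SAMPLE_CSV = """label,comment,author,subreddit,score,ups,downs,date,created_utc,parent_comment
1,"Oh great, another Monday.",user_a,AskReddit,5,5,0,2016-10,2016-10-03 09:00:00,How is your week going?
0,I think the patch fixed the bug.,user_b,programming,2,2,0,2016-11,2016-11-12 14:30:00,Did the update help?
1,Yeah because that always works.,user_c,gaming,7,7,0,2016-12,2016-12-01 18:45:00,Just restart the console.
0,The match starts at eight tonight.,user_d,sports,1,1,0,2016-12,2016-12-05 12:10:00,When is kickoff?
"""


def load_rows(handle):
    """Parse CSV rows into dicts, converting the numeric columns to int."""
    rows = []
    for row in csv.DictReader(handle):
        for col in ("label", "score", "ups", "downs"):
            row[col] = int(row[col])
        rows.append(row)
    return rows


def label_balance(rows):
    """Count how many rows carry each label (1 = sarcastic, 0 = not)."""
    return Counter(row["label"] for row in rows)


rows = load_rows(io.StringIO(SAMPLE_CSV))
balance = label_balance(rows)
print(balance)
```

For the real data, the same functions could be pointed at an open file handle instead of the in-memory string; a balanced dataset should show roughly equal counts for both labels.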

Results:

The table below shows that we started with Logistic Regression on TF-IDF features, which gave us an accuracy of 68.32%. Of the other models we tried, only FastText and the Bi-directional LSTM with one_hot improved on this baseline.
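
A baseline of this kind could be sketched with scikit-learn as follows. This is not the project's actual training code: the README does not state the vectorizer or classifier settings, so the defaults used here are assumptions, and the toy comments and labels are made up purely so the example runs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy data (1 = sarcastic, 0 = not sarcastic); not from the dataset.
comments = [
    "Oh great, another Monday",
    "Yeah because that always works",
    "Wow, what a totally original idea",
    "Sure, I just love waiting in line",
    "The patch fixed the bug",
    "The match starts at eight tonight",
    "I bought a new keyboard yesterday",
    "The library closes at six",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# TF-IDF turns each comment into a weighted bag-of-words vector;
# logistic regression then learns a linear decision boundary on those vectors.
# Default settings are an assumption, not the project's configuration.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(comments, labels)

predictions = baseline.predict(
    ["Oh sure, that will totally work", "The meeting starts at noon"]
)
print(predictions)
```

With only eight training comments the predictions carry no real meaning; the point is the shape of the pipeline, which would be fitted on the comment and label columns of the full dataset.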

Of the models listed in the table above, the "Bidirectional-LSTM with one_hot" model performed best, with an accuracy of 72.15%. We therefore moved forward with this model, and the confusion matrix below is based on it.
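
To make the relationship between a confusion matrix and accuracy concrete, here is a minimal standard-library sketch. The label and prediction lists are made-up examples, not the project's results, and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return a 2x2 matrix: rows are true labels (0, 1), columns are predictions (0, 1)."""
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(0, 0)], counts[(0, 1)]],  # true non-sarcastic
        [counts[(1, 0)], counts[(1, 1)]],  # true sarcastic
    ]


def accuracy(matrix):
    """Accuracy is the diagonal (correct predictions) over all predictions."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(y_true, y_pred)
print(matrix)            # [[3, 1], [1, 3]]
print(accuracy(matrix))  # 0.75
```

The off-diagonal cells separate the two kinds of error: non-sarcastic comments flagged as sarcastic (top right) and sarcastic comments that were missed (bottom left).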

PraveenKumarSridhar/reddit_sarcasm_detection
