
MG1800/gsac


GSAC: A Gujarati Sentiment Analysis Corpus from Twitter

This repository contains the GSAC dataset described in the paper cited below. It contains a total of 6,575 tweets, manually annotated by native speakers into three sentiment classes: positive, negative, and neutral. The dataset is divided into train, dev, and test splits in a 70:10:20 ratio. For further details, refer to the paper.
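The following is a minimal sketch for loading one split, checking its labels, and comparing split sizes against the 70:10:20 ratio. It assumes (hypothetically) that each split is a CSV file with a header row and two columns named `id` and `label`; check the actual files in this repository and adjust the column names and delimiter as needed. The helper names `load_split`, `label_distribution`, and `split_proportions` are illustrative, not part of the dataset.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, TextIO

# Assumption: columns are "id" and "label"; adjust to match the real files.
VALID_LABELS = {"positive", "negative", "neutral"}


def load_split(handle: TextIO) -> List[Dict[str, str]]:
    """Read one split, normalising labels and rejecting unknown classes.

    Tweet IDs are kept as strings so that no tool later converts these
    large integers to floats and silently loses precision.
    """
    rows = []
    for row in csv.DictReader(handle):
        label = row["label"].strip().lower()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label {label!r} for tweet {row['id']}")
        rows.append({"id": row["id"].strip(), "label": label})
    return rows


def label_distribution(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count how many tweets fall into each sentiment class."""
    return Counter(row["label"] for row in rows)


def split_proportions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Turn raw split sizes into fractions of the whole dataset."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


if __name__ == "__main__":
    # Tiny in-memory example standing in for a real split file.
    sample = io.StringIO("id,label\n1,positive\n2,neutral\n3,positive\n")
    rows = load_split(sample)
    print(label_distribution(rows))
```

Applying `split_proportions` to the sizes of the three loaded splits should give values close to 0.7, 0.1, and 0.2.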

In accordance with Twitter's privacy policy, the dataset provides only the tweet IDs and their sentiment labels. Before use, the dataset must be hydrated: each tweet ID is passed to the Twitter API to retrieve the full tweet text and other information about that tweet.
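Hydration can be sketched as below, assuming a mapping from tweet ID to label has already been loaded. The Twitter API v2 tweet lookup endpoint accepts at most 100 IDs per request, so IDs are sent in batches of that size. `fetch_batch` is a hypothetical, caller-supplied function (for example, one built on tweepy's `Client.get_tweets`) that returns `{tweet_id: text}` for one batch; authentication and rate limiting are left to it. Tweets that have since been deleted or made private cannot be hydrated, so the sketch reports them separately instead of failing.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# The Twitter API v2 tweet lookup endpoint accepts up to 100 IDs per request.
MAX_IDS_PER_REQUEST = 100


def batched(ids: Sequence[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` tweet IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def hydrate(
    labels: Dict[str, str],
    fetch_batch: Callable[[List[str]], Dict[str, str]],
) -> Tuple[Dict[str, dict], List[str]]:
    """Attach tweet text to each labelled ID.

    `fetch_batch` is a hypothetical function that queries the API for one
    batch and returns {tweet_id: text}; IDs absent from its result are
    treated as unavailable (deleted, protected, or suspended accounts).
    """
    hydrated: Dict[str, dict] = {}
    missing: List[str] = []
    for batch in batched(list(labels)):
        texts = fetch_batch(batch)
        for tweet_id in batch:
            if tweet_id in texts:
                hydrated[tweet_id] = {"text": texts[tweet_id], "label": labels[tweet_id]}
            else:
                missing.append(tweet_id)
    return hydrated, missing
```

Keeping the API call behind `fetch_batch` makes the batching and label-merging logic easy to test offline with a stub, and the list of missing IDs shows how much of the corpus could not be recovered.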

If you use this dataset, please cite the following paper:

@inproceedings{gokani-mamidi-2023-gsac,
  title = "{GSAC}: A {G}ujarati Sentiment Analysis Corpus from {T}witter",
  author = "Gokani, Monil and
    Mamidi, Radhika",
  booktitle = "Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, {\&} Social Media Analysis",
  month = jul,
  year = "2023",
  address = "Toronto, Canada",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.wassa-1.12",
  doi = "10.18653/v1/2023.wassa-1.12",
  pages = "129--137",
}

About

Gujarati Sentiment Analysis Corpus
