VictorianLit

VictorianLit Dataset for Deep Learning-Based Sentiment Analysis of Victorian Literary Texts | by Hoyeol Kim

Download: VictorianLit (Kaggle)

You can download the VictorianLit dataset directly from the following URL:

https://elibooklover.github.io/VictorianLit/VictorianLit.csv

Here is example code for loading the VictorianLit dataset:

import pandas as pd
# Load the dataset straight from the hosted CSV file
df = pd.read_csv('https://elibooklover.github.io/VictorianLit/VictorianLit.csv')
df.head()

Dataset

There are two columns: sentences and label. The VictorianLit dataset has five labels based on sentiment: 0 (very negative), 1 (negative), 2 (neutral), 3 (positive), 4 (very positive).
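The label scheme above can be made explicit in code. The following is a minimal sketch, assuming the labels are stored as integers 0 to 4; `LABEL_NAMES`, `label_name`, and `label_distribution` are hypothetical helper names for illustration and are not part of the dataset or the repository.

```python
from collections import Counter

# Label scheme as described in this README.
LABEL_NAMES = {
    0: "very negative",
    1: "negative",
    2: "neutral",
    3: "positive",
    4: "very positive",
}


def label_name(label):
    """Return the sentiment name for an integer label (hypothetical helper)."""
    return LABEL_NAMES[int(label)]


def label_distribution(labels):
    """Count how often each sentiment name occurs in an iterable of labels."""
    counts = Counter(label_name(label) for label in labels)
    # Report every class, including those with zero occurrences.
    return {name: counts.get(name, 0) for name in LABEL_NAMES.values()}


# Toy labels for illustration only, not taken from the dataset.
print(label_distribution([0, 2, 2, 4, 3, 2]))
```

With the DataFrame loaded by the example code, `label_distribution(df["label"])` should give the per-class counts, which is a quick way to check how balanced the five classes are.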

The VictorianLit dataset, which has 53,826 rows and 2 columns, consists of five different novels from the Victorian era: Charles Dickens' Little Dorrit and Oliver Twist, Elizabeth Gaskell's North and South, George Eliot's Adam Bede, and Mary Elizabeth Braddon's Lady Audley's Secret. The maximum sentence length of the VictorianLit dataset is 372.
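The README does not say whether the maximum length of 372 counts words, characters, or tokens. The sketch below assumes whitespace-separated words, which is only one possible reading; `max_sentence_length` is a hypothetical helper name, and the measure would need to change if the figure refers to characters or BERT tokens.

```python
def max_sentence_length(sentences):
    """Return the longest sentence length in whitespace-separated words.

    Assumption: "length" means word count; the README does not state the unit.
    """
    return max((len(str(s).split()) for s in sentences), default=0)


# Toy sentences for illustration only, not taken from the dataset.
sample = ["It was a dark night.", "Yes.", "She said nothing at all to him."]
print(max_sentence_length(sample))  # 7
```

With the DataFrame loaded by the example code, `df.shape` can be compared against the stated 53,826 rows and 2 columns, and `max_sentence_length(df["sentences"])` against the stated maximum.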

Test Results

To validate the VictorianLit dataset, it was tested with the BERT-Base, Uncased model (12-layer, 768-hidden, 12-heads, 110M parameters) released by Google Research.

BERT was fine-tuned for sentiment analysis with the following hyperparameters and training environment:

tokenizer: BertTokenizer
max_sequence_length: 400
batch_size: 16
model_name: BERT-base, Uncased (12-layer, 768-hidden, 12-heads, 110M parameters)
learning_rate: 1e-5
epochs: 4
GPU: Tesla T4
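The settings above can be collected in a small configuration object, which also makes it easy to estimate how many optimizer steps a run takes. This is a minimal sketch, not the author's training code: `FineTuneConfig` and its methods are hypothetical names, the `bert-base-uncased` identifier is an assumed model name, and the step count assumes all 53,826 rows are used for training, since no train/validation split is stated.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this README (class name is hypothetical)."""

    # Assumed model identifier; the README only names BERT-Base, Uncased.
    model_name: str = "bert-base-uncased"
    max_sequence_length: int = 400
    batch_size: int = 16
    learning_rate: float = 1e-5
    epochs: int = 4

    def steps_per_epoch(self, num_examples: int) -> int:
        # The last batch may be smaller than batch_size, so round up.
        return math.ceil(num_examples / self.batch_size)

    def total_steps(self, num_examples: int) -> int:
        return self.steps_per_epoch(num_examples) * self.epochs


config = FineTuneConfig()
# Assumption: every row is a training example; the README states no split.
print(config.total_steps(53_826))  # 13460
```

Under the same assumption, `FineTuneConfig(batch_size=64).steps_per_epoch(53_826)` gives 842 steps per epoch, compared with 3365 at a batch size of 16.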

The model reached 93% accuracy with an average training loss of 0.12. A larger batch_size may yield higher accuracy; if your GPU has enough memory, I recommend setting batch_size to 64 or 128.

Feedback

The VictorianLit dataset will be continuously updated, expanded, and tested. Please feel free to provide feedback or to suggest changes to sentiment values, with supporting statements.

Citation

Please use the following reference to cite the dataset:

@misc{VictorianLit,
    author       = {Hoyeol Kim},
    title        = {{VictorianLit Dataset for Machine Learning-Based Sentiment Analysis of Victorian Literary Texts}},
    month        = Sep,
    year         = 2020,
    publisher    = {GitHub},
    url          = {https://github.com/elibooklover/VictorianLit}
    }

or

Kim, Hoyeol, VictorianLit Dataset for Machine Learning-Based Sentiment Analysis of Victorian Literary Texts, September 2020. GitHub repository: github.com/elibooklover/VictorianLit.

License
