Hansard Speeches and Sentiment
Repository for a public dataset of speeches in Hansard. The dataset provides information on each speech of ten words or longer made in the House of Commons between 1980 and 2016, with information on the speaking MP, their party, gender and age at the time of the speech. The dataset also includes all speeches of ten words or longer made from 1936 to 1980, for a total of 4,212,134 speeches and 773,585,770 words. More information on the dataset is available here. The dataset itself can be accessed through Zenodo.
The speeches have been classified for sentiment using a total of four libraries from the R package lexicon, one from syuzhet and one from this paper. All six scores used the method from the sentimentr package. The libraries are:
- The Opinion Mining, Sentiment Analysis and Opinion Spam Detection dataset by Bing Liu, Minqing Hu and Junsheng Cheng, labelled bing. The Bing library was accessed through the
- The Hu & Liu dataset, by Minqing Hu and Bing Liu, labelled Hu. The Hu & Liu library was accessed through the
- A modified version of the unnamed lexicon from the paper Measuring Emotion in Parliamentary Debates with Automated Textual Analysis, labelled rheault. As the method in sentimentr does not distinguish between the lexical categories of a word that can occupy more than one, I used the average polarity score assigned to such words.
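The averaging step for words that appear under more than one lexical category can be illustrated with a minimal sketch. This is not the code used to build the dataset: it is written in Python for illustration only, and the function name average_polarity, the (word, category, score) layout and the example words and scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lexicon rows: (word, lexical category, polarity score).
# The words and scores are invented for illustration and are not taken
# from the rheault lexicon.
entries = [
    ("abandon", "noun", -0.5),
    ("abandon", "verb", -0.25),
    ("able", "adjective", 0.5),
]


def average_polarity(rows):
    """Collapse rows to one score per word by averaging across categories."""
    scores = defaultdict(list)
    for word, _category, score in rows:
        scores[word].append(score)
    return {word: mean(values) for word, values in scores.items()}


lexicon = average_polarity(entries)
```

In this sketch a word listed once keeps its original score, while a word listed under several categories receives the unweighted mean of its scores, so the resulting lookup has exactly one entry per word.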
The dataset is licensed under a Creative Commons Attribution 4.0 International License.
The code included in this repository is licensed under an MIT license.
Please contact me or open an issue here if you find any errors in the dataset. The integrity of the public Hansard record is questionable at times, and while I have improved it, the data is presented 'as is'.
New in 2.4.3
- "Julia Dockerill" name changed to "Julia Lopez" to reflect MP's actual name change