GitHub - Democracy-Lab/hansardr: Access a cleaned version of the c19 Hansard corpus with improved speaker names in the R environment.

About hansardr

hansardr makes it easy to access the parsed debates from The Hansard 19th-Century British Parliamentary Debates with Improved Speaker Names within the R environment.

This is a clean corpus of the 19th-century British Parliamentary Debates (1803-1909), also known as Hansard. It identifies debates whose records are missing from UK Parliament’s corpus, and it adds a field for disambiguated speaker names. We believe these improvements will let researchers analyze the Hansard debates, including speaker discourse, in ways that were not previously possible.

For supplementary materials meant to support the analysis of the Hansard debates, including tokens and their raw counts, bigrams and their raw counts, special vocabulary, speaker metadata, and topics from LDA topic modeling, see our full data set hosted on the Harvard Dataverse.

Installation

source("https://raw.githubusercontent.com/stephbuon/hansardr/master/tools/install_hansardr.R")

Now the package can be imported as usual:

library(hansardr)

Accessing the Corpus

hansardr ships with a sample data set containing 10 rows for each decade subset. To download the full corpus, call download_hansard(), which replaces the samples with the data for the entire century.

| Label | Description | Key |
| --- | --- | --- |
| `hansard_YYYY` | Hansard debate text | `sentence_id` |
| `debate_metadata_YYYY` | Hansard debate metadata such as speechdate and title. | `sentence_id` |
| `speaker_metadata_YYYY` | Original speaker name, disambiguated speaker name, and more. | `sentence_id` |
| `file_metadata_YYYY` | Corpus metadata such as IDs for speech, source file, column, and more. | `sentence_id` |

We also provide keyword lists that have been used in scholarly research.

| Label | Description |
| --- | --- |
| `events` | Manually selected list of events and their years |

Usage

Load hansardr.

library(hansardr)

Download the entire corpus. This will only need to be done once.

download_hansard()

Read files into the R environment.

data("hansard_1880")

data("debate_metadata_1880")
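When several decades are needed, the object names can be built programmatically instead of typed one by one. The sketch below assumes that each subset is labelled by the first year of its decade (as in hansard_1880) and that the suffixes run from 1800 to 1900, which is inferred from the 1803-1909 coverage rather than confirmed by the package. decade_names() is a hypothetical helper, not part of hansardr.

```r
# Hypothetical helper: build object names such as "hansard_1880".
# Assumption: subsets are labelled by the first year of each decade.
decade_names <- function(prefix, from = 1800, to = 1900) {
  paste0(prefix, "_", seq(from, to, by = 10))
}

hansard_names <- decade_names("hansard", from = 1850, to = 1880)
print(hansard_names)

# Load the subsets only if hansardr is installed.
# data(list = ..., package = ...) is the standard utils::data() interface.
if (requireNamespace("hansardr", quietly = TRUE)) {
  data(list = hansard_names, package = "hansardr")
}
```

The same helper can be called with "debate_metadata", "speaker_metadata", or "file_metadata" as the prefix.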

The individual tables can be combined into a larger data set.

Tables can be joined on the sentence_id field, a unique ID assigned to each sentence of the Hansard debates.

library(dplyr) # provides left_join()

combined_hansard_df_1800 <- left_join(hansard_1800, debate_metadata_1800, by = "sentence_id")
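Before joining, it can be worth checking that sentence_id really is unique in the metadata table, because a duplicated key would multiply rows. The sketch below uses made-up toy data and base R's merge() with all.x = TRUE as a stand-in for left_join(), so it runs without hansardr or dplyr. All column names other than sentence_id are illustrative assumptions, not the package's actual schema.

```r
# Toy stand-ins for hansard_YYYY and debate_metadata_YYYY (made-up rows).
toy_text <- data.frame(
  sentence_id = c("S1", "S2", "S3"),
  text = c("First sentence.", "Second sentence.", "Third sentence."),
  stringsAsFactors = FALSE
)
toy_meta <- data.frame(
  sentence_id = c("S1", "S2"),
  speechdate = as.Date(c("1803-01-01", "1803-01-01")),
  stringsAsFactors = FALSE
)

# A duplicated key on the right-hand side would multiply rows in the join.
stopifnot(anyDuplicated(toy_meta$sentence_id) == 0)

# Base R counterpart of a left join: keep every row of toy_text.
# Note that merge() sorts the result by the key column.
joined <- merge(toy_text, toy_meta, by = "sentence_id", all.x = TRUE)

# The join should not change the number of sentences.
stopifnot(nrow(joined) == nrow(toy_text))
print(joined)
```

Sentences without a metadata match (S3 here) are kept, with NA in the metadata columns.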

Tables can be bound by row using rbind() from base R, or bind_rows() from the tidyverse.

hansard_df_1850_through_1860 <- rbind(hansard_1850, hansard_1860)

or

library(tidyverse)

hansard_df_1850_through_1860 <- bind_rows(hansard_1850, hansard_1860)
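For more than two decades, the same row-binding can be applied to a list of tables. The sketch below uses made-up toy data in place of the real subsets and also tags each row with its decade, so that the origin of each sentence stays recoverable after binding. The decade column is an illustrative addition, not a field provided by hansardr.

```r
# Toy stand-ins for two decade subsets (made-up rows, illustrative columns).
toy_1850 <- data.frame(sentence_id = c("A1", "A2"), text = c("a", "b"),
                       stringsAsFactors = FALSE)
toy_1860 <- data.frame(sentence_id = "B1", text = "c",
                       stringsAsFactors = FALSE)

subsets <- list(`1850` = toy_1850, `1860` = toy_1860)

# Add a decade column to each subset, taken from the list names.
tagged <- Map(function(df, decade) {
  df$decade <- as.integer(decade)
  df
}, subsets, names(subsets))

# Bind all subsets by row and drop the row names that rbind() generates.
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
print(combined)
```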

Report a Problem

This is the first analysis-ready c19 Hansard corpus with disambiguated speaker names. As described in our research, we use mixed methods (algorithmic and qualitative) to disambiguate speaker names, and we arrive at a disambiguation rate of approximately 85%. If you find a bug while using our data set, we would appreciate you sharing it with us. You can open an issue on our hansard-speakers repository.

Citation

Buongiorno, Steph, 2021, hansardr. Available: https://github.com/stephbuon/hansardr.
