A project for processing data from feature listing tasks and more

doomlab/FLT-Primer

A practical primer on processing semantic property norm data

Erin M. Buchanan, Simon De Deyne, & Maria Montefinese

Abstract: Semantic property listing tasks require participants to generate short propositions (e.g., <barks>, <has fur>) for a specific concept (e.g., dog). This task is the cornerstone of the creation of semantic property norms, which are essential for modelling, stimulus creation, and understanding similarity between concepts. However, despite the wide applicability of semantic property norms for a large variety of concepts across different groups of people, the methodological aspects of the property listing task have received less attention, even though the procedure and processing of the data can substantially affect the nature and quality of the measures derived from them. The goal of this paper is to provide a practical primer on how to collect and process semantic property norms. We will discuss the key methods to elicit semantic properties and compare different methods to derive meaningful representations from them. This will cover the role of instructions and test context, property pre-processing (e.g., lemmatization), property weighting, and relationship encoding using ontologies. With these choices in mind, we propose and demonstrate a processing pipeline that transparently documents these steps, resulting in improved comparability across different studies. The impact of these choices will be demonstrated using intrinsic (e.g., reliability, number of properties) and extrinsic (e.g., categorization, semantic similarity, lexical processing) measures. This practical primer will offer potential solutions to several longstanding problems and allow researchers to develop new property listing norms that overcome the constraints of previous studies.
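For readers who want a concrete picture of the property weighting step mentioned in the abstract, the sketch below illustrates one common choice, production-frequency weighting (the proportion of participants who list a property for a concept). It is a minimal base-R illustration with made-up data and hypothetical column names, not the repository's actual pipeline code.

```r
# Minimal sketch: production-frequency weighting of properties
# (hypothetical long-format responses; not the repository's actual scripts)
responses <- data.frame(
  concept     = c("dog", "dog", "dog", "cat"),
  property    = c("barks", "barks", "has_fur", "has_fur"),
  participant = c(1, 2, 3, 1)
)

# count how many participants listed each concept-property pair
# (assumes one row per participant-property response)
freq <- aggregate(participant ~ concept + property, data = responses, FUN = length)
names(freq)[3] <- "production_frequency"

# number of distinct participants who responded to each concept
n_per_concept <- aggregate(participant ~ concept, data = responses,
                           FUN = function(x) length(unique(x)))
names(n_per_concept)[2] <- "n_participants"

# weight = proportion of participants listing the property for the concept
freq <- merge(freq, n_per_concept, by = "concept")
freq$weight <- freq$production_frequency / freq$n_participants
freq
```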

Docs: Folder contains drafts of the manuscript and comments on previous versions.

Manuscript: Folder contains all information necessary to create the PDF/Docx version of the manuscript. Scripts are written inline with the text.

Output_data: Data created by the scripts used in the processing pipeline.

Packrat: A compiled backup of the packages used in the manuscript and processing pipeline for reproducibility purposes.

R: R scripts detailed in the manuscript, provided for running the processing pipeline steps individually.

Raw_data: Data used to demonstrate the processing pipeline and its convergence with other similar projects.

Update: If you have issues with TreeTagger, please check out our discussion of using udpipe instead.
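As a minimal sketch of that alternative, the snippet below lemmatizes a few example property responses with the udpipe R package. The model download, example strings, and selected columns are illustrative assumptions, not the exact code used in the manuscript's pipeline.

```r
# Minimal sketch: lemmatize raw property responses with the udpipe package
# (example strings are placeholders; not the manuscript's exact pipeline code)
library(udpipe)

# download and load a pretrained English model (only needed once)
dl       <- udpipe_download_model(language = "english")
ud_model <- udpipe_load_model(file = dl$file_model)

# a few example property responses
props <- c("barks loudly", "has fur", "is a pet")

# tokenize, part-of-speech tag, and lemmatize
anno <- as.data.frame(udpipe_annotate(ud_model, x = props))
anno[, c("doc_id", "token", "lemma", "upos")]
```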
