GitHub - billdthompson/cogsci-auto-norm: Automatic Generalisation of Lexical Norms to New Languages

Concreteness Estimates in 50 Languages

Author: Bill Thompson (biltho@mpi.nl)

Summary

This repository contains concreteness estimates in 50 languages, and code to produce further estimates.

Workflow

Here's how you would compute concreteness estimates in Dutch from a set of experimental lexical norms of concreteness.

Clone this repo.
Download the English and Dutch Wikipedia-trained Skipgram semantic models released in 2017 by Facebook Artificial Intelligence Research. Be sure to download the .vec versions of these models (i.e. wiki.en.vec, wiki.nl.vec), and move the files into this directory.
Ensure this directory contains a CSV file with the experimental norms on which you want to train a model. This file should have a column named word and a column named after the norm you are training on (here, concreteness). This repository already includes the file norms.csv, which contains the Brysbaert concreteness norms for English.
Train a simple linear model to predict concreteness from semantic vectors by running:

python distill.py -l en -n concreteness -f norms.csv

This will result in a new dataset of estimated concreteness norms in English, and a vector of estimated coefficients from the linear regression (here's one I made earlier: concreteness-norms-en-prediction-transform.coef).
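To make the idea behind this step concrete, here is a minimal sketch of fitting a linear model that predicts a norm from word vectors. It is not the code in distill.py: the function names, the use of an intercept, and the toy data are all assumptions made for illustration, and NumPy's least-squares solver stands in for whatever fitting routine the script actually uses.

```python
import numpy as np

def fit_norm_model(vectors, norms):
    """Fit ordinary least squares: norm ~ vector . coef + intercept.

    vectors: dict mapping word -> 1-D array (hypothetical in-memory format)
    norms:   dict mapping word -> rating
    """
    # Only words with both a vector and a rating can be used for training.
    words = sorted(w for w in norms if w in vectors)
    X = np.array([vectors[w] for w in words], dtype=float)
    y = np.array([norms[w] for w in words], dtype=float)
    # Append a column of ones so the last fitted weight acts as the intercept
    # (an assumption; the real model may or may not include one).
    X1 = np.hstack([X, np.ones((len(words), 1))])
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights[:-1], weights[-1], words

def predict_norm(vectors, coef, intercept):
    """Estimate the norm for every word that has a vector, rated or not."""
    return {w: float(np.dot(v, coef) + intercept) for w, v in vectors.items()}

# Toy data: 20 random 3-dimensional "word vectors", 15 of which have ratings.
rng = np.random.default_rng(0)
toy_vectors = {f"w{i}": rng.normal(size=3) for i in range(20)}
true_coef = np.array([1.0, -2.0, 0.5])
toy_norms = {w: float(v @ true_coef + 3.0) for w, v in list(toy_vectors.items())[:15]}

coef, intercept, used = fit_norm_model(toy_vectors, toy_norms)
estimates = predict_norm(toy_vectors, coef, intercept)
```

The fitted coefficients are what gets carried over to other languages: once a word vector from any language sits in the same space as the English vectors, the same coefficients can be applied to it.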

Transform the Dutch semantic model into English semantic space using a vector-alignment transform (such as those released by Babylon and available here for 78 languages). This repository already contains the file nl.txt, which is the Babylon-released transform for Dutch. Then apply the inferred regression coefficients to this transformed semantic model. All this is achieved by running:

python extend.py -l nl -n concreteness -v nl.txt -c concreteness-norms-en-prediction-transform.coef

This will produce a new file concreteness-estimates-nl.csv containing estimates of concreteness for the most frequent N terms in the Skipgram vocabulary for Dutch (N = 100000 by default; change this in extend.py).
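The steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the contents of extend.py: the helper names are hypothetical, the transform is assumed to be applied as row vector times matrix, and the coefficient vector is assumed to be a plain list of numbers with an optional intercept. The .vec layout used here (a header line giving the vocabulary size and dimension, then one word and its values per line) is the fastText text format.

```python
import io

def read_vec(handle, limit):
    """Read up to `limit` entries from a fastText .vec text stream."""
    _count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        if len(vectors) >= limit:
            break
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            continue  # skip malformed lines
        vectors[word] = values
    return vectors

def align(vector, transform):
    """Map a source-language vector into the target space.

    Assumes the row-vector-times-matrix convention.
    """
    return [
        sum(vector[i] * transform[i][j] for i in range(len(vector)))
        for j in range(len(transform[0]))
    ]

def estimate(vector, coef, intercept=0.0):
    """Apply the regression coefficients to an aligned vector."""
    return sum(v * c for v, c in zip(vector, coef)) + intercept

# Toy stand-ins for wiki.nl.vec, nl.txt, and the .coef file (all made up).
toy_vec = io.StringIO("3 2\nhuis 1.0 0.0\nidee 0.0 1.0\nboom 1.0 1.0\n")
toy_transform = [[0.0, 1.0], [1.0, 0.0]]  # swaps the two axes
toy_coef = [1.0, 4.0]

dutch = read_vec(toy_vec, limit=2)  # keep only the first N entries
estimates = {w: estimate(align(v, toy_transform), toy_coef) for w, v in dutch.items()}
```

Reading only the first N lines is enough to get the most frequent terms when the .vec file lists its vocabulary in descending frequency order, which is how fastText writes it.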

