# Model training

This notebook contains the code for the computational simulations under the true and the randomized form-to-meaning mappings.

## Preliminaries

Load the libraries required for running the simulations.

In [1]:
# Load libraries
suppressMessages(library(ndl))

Read in the data from the British Lexicon Project (BLP) and the Calgary Semantic Decision Project (CSDP).

In [2]:
# Read data
blp = readRDS("data/blp.rds")
csdp = readRDS("data/csdp.rds")

## True form-to-meaning mapping

Simulation for the true form-to-meaning mapping. Using the BLP data, we code each word's orthographic form as cues and the word itself as the outcome, fit a discrimination learning model, and calculate simulated response times.
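
In symbols (the notation is introduced here for illustration and is not part of the notebook): let $W$ be the cue-by-outcome weight matrix returned by `estimateWeights`, and let $c_1, \dots, c_n$ be the cues that `orthoCoding` produces for word $w$. The code in the next cell then computes the activation of $w$ and its simulated response time as

$$
a_w = \sum_{i=1}^{n} W_{c_i,\,w}, \qquad \mathrm{SimRT}_w = \log\frac{1}{a_w + 10^{-5}} = -\log\left(a_w + 10^{-5}\right)
$$

A cue that occurs more than once in a word contributes once per occurrence, and higher activations map onto shorter simulated response times. The constant $10^{-5}$ appears to serve as a guard against taking the logarithm at an activation of exactly zero.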

In [3]:
# Define cues
blp$Cues = orthoCoding(blp$Word)

# Define outcomes
blp$Outcomes = blp$Word

# Calculate associations between cues and outcomes
w = estimateWeights(blp)

# Define function to get activations of lexical representations
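getActivation.fnc = function(num, data, weights) {
  # Split the cue string of word `num` into its individual cues
  cues = as.character(unlist(strsplit(data$Cues[num], "_")))
  # Sum the weights from these cues to the word's own outcome;
  # as.character() guards against `Word` being stored as a factor
  acts = sum(weights[cues, as.character(data$Word[num])])
  return(acts)
}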

# Get activations of lexical representations
blp$Activation = as.numeric(sapply(1:nrow(blp), getActivation.fnc, data = blp, 
  weights = w))

# Calculate simulated response times
blp$SimRT = log(1 / (blp$Activation + 0.00001))

# Restrict to relevant columns
blp = blp[,c("Word", "RT", "RTInv", "SimRT", "Frequency", "LogFrequency",
  "Length", "LogOLD20Norm", "LogMeanBigramFrequency")]

# Save the data
saveRDS(blp, file = "data/blp_simulation.rds")

# Save version of CSDP data with simulated response times
csdp = merge(csdp, blp[,c("Word", "SimRT")], by = "Word")
rownames(csdp) = 1:nrow(csdp)
saveRDS(csdp, file = "data/csdp_simulation.rds")
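
The activation and response-time steps of the cell above can be traced by hand on a toy example. The sketch below uses base R only and does not call `ndl`; the weight matrix `toy_weights` and the two-word data frame `toy` are made-up values for illustration, not output of `estimateWeights`.

```r
# Hypothetical weights (NOT estimated): rows are bigram cues, columns are outcomes
toy_weights <- matrix(
  c( 0.5, -0.1,
     0.3, -0.1,
     0.1,  0.2,
    -0.1,  0.5,
    -0.1,  0.3),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("#a", "at", "t#", "#i", "it"), c("at", "it"))
)

# Toy data in the same layout as the notebook: cues joined by "_"
toy <- data.frame(
  Word = c("at", "it"),
  Cues = c("#a_at_t#", "#i_it_t#"),
  stringsAsFactors = FALSE
)

# Activation of a word = sum of the weights from its cues to that word
toy$Activation <- vapply(seq_len(nrow(toy)), function(i) {
  cues <- strsplit(toy$Cues[i], "_", fixed = TRUE)[[1]]
  sum(toy_weights[cues, toy$Word[i]])
}, numeric(1))

# Same transformation as in the notebook: higher activation, lower SimRT
toy$SimRT <- log(1 / (toy$Activation + 0.00001))
print(toy)
```

With these made-up weights, "at" receives an activation of 0.9 and "it" an activation of 1.0, so "it" ends up with the shorter simulated response time.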

## Random form-to-meaning mapping

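Simulation for the randomized form-to-meaning mapping. We randomly re-assign word forms to meanings, re-derive the form-based predictors (length, OLD20, and mean bigram frequency) for the re-assigned forms, fit a discrimination learning model, and calculate simulated response times.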
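
The randomization step amounts to permuting the word forms across rows while leaving each row's meaning-related information in place. A minimal base R sketch on made-up data follows; the data frame `toy`, the index vector `perm`, and the `Form` column are hypothetical names used only here (the notebook itself does not store the re-assigned form in a column).

```r
# Made-up mini lexicon (values are illustrative only)
toy <- data.frame(
  Word      = c("cat", "house", "tree", "sun"),
  Frequency = c(120, 300, 80, 150),
  stringsAsFactors = FALSE
)
toy$Length <- nchar(toy$Word)

# Fixed seed so that the permutation is reproducible
set.seed(13)
perm <- sample(seq_len(nrow(toy)))

# Re-assign forms: row i keeps its Word (meaning) and Frequency,
# but receives the form, and the form's length, of row perm[i]
toy_random <- toy
toy_random$Form   <- toy$Word[perm]
toy_random$Length <- toy$Length[perm]
print(toy_random)
```

After the permutation, form-based predictors such as `Length` describe the re-assigned form, whereas `Frequency` still belongs to the original row.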

In [4]:
# Create data frame for random mapping
blp_random = readRDS("data/blp.rds")

# Set seed for reproducibility
set.seed(13)

# Define order of re-assignment of forms
order = sample(1:nrow(blp_random))

# Re-define length to match the re-assigned forms
blp_random$Length = blp$Length[order]

# Re-define OLD20 to match the re-assigned forms
blp_random$LogOLD20Norm = blp$LogOLD20Norm[order]

# Re-define mean bigram frequency for the re-assigned forms
bigram_random_list = strsplit(orthoCoding(blp_random$Word[order]), "_")
bigrams = unique(unlist(bigram_random_list))

# Frequency of a bigram: summed frequency of all rows whose form contains it
bigram_random_freqs = sapply(bigrams, FUN = function(x) {
  contains_x = unlist(lapply(bigram_random_list, function(y) x %in% y))
  sum(blp_random$Frequency[contains_x])
})

# Mean bigram frequency of a form: average over the bigrams of that form
blp_random$MeanBigramFrequency = unlist(lapply(bigram_random_list,
  function(x) sum(bigram_random_freqs[x]) / length(x)))
blp_random$LogMeanBigramFrequency = log(blp_random$MeanBigramFrequency)

# Drop the intermediate column
blp_random$MeanBigramFrequency = NULL

# Define cues
blp_random$Cues = orthoCoding(blp_random$Word[order])

# Define outcomes
blp_random$Outcomes = blp_random$Word

# Calculate associations between cues and outcomes
w_random = estimateWeights(blp_random)

# Get activations of lexical representations
blp_random$Activation = as.numeric(sapply(1:nrow(blp_random), getActivation.fnc, 
  data = blp_random, weights = w_random))

# Calculate simulated response times
blp_random$SimRT = log(1 / (blp_random$Activation + 0.00001))

# Restrict to relevant columns
blp_random = blp_random[,c("Word", "RT", "RTInv", "SimRT", "Frequency", 
  "LogFrequency", "Length", "LogOLD20Norm", "LogMeanBigramFrequency")]

# Save the data
saveRDS(blp_random, file = "data/blp_simulation_random.rds")

# Create random mapping file for CSDP data
update = c("SimRT", "Length", "LogOLD20Norm", "LogMeanBigramFrequency")
csdp = csdp[, !(colnames(csdp) %in% update)]
csdp = merge(csdp, blp_random[,c("Word", update)], by = "Word")
rownames(csdp) = 1:nrow(csdp)
saveRDS(csdp, file = "data/csdp_simulation_random.rds")
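
The re-computed mean bigram frequency in the cell above is the least obvious step, so a small worked example may help. The sketch below uses base R only; `to_bigrams` is a hypothetical stand-in for `orthoCoding` that assumes letter bigrams with `#` as the word-boundary marker, and the forms and frequencies are made up.

```r
# Hypothetical stand-in for orthoCoding(): letter bigrams with "#" boundaries
to_bigrams <- function(word) {
  chars <- strsplit(paste0("#", word, "#"), "")[[1]]
  paste0(head(chars, -1), tail(chars, -1))
}

# Made-up forms and the frequencies of the rows they are assigned to
forms <- c("at", "it")
freq  <- c(10, 5)

bigram_list <- lapply(forms, to_bigrams)
bigrams     <- unique(unlist(bigram_list))

# Frequency of a bigram: summed frequency of all rows whose form contains it
bigram_freq <- sapply(bigrams, function(b) {
  sum(freq[sapply(bigram_list, function(bs) b %in% bs)])
})

# Mean bigram frequency of a form: average over the bigrams of that form
mean_bigram_freq <- sapply(bigram_list, function(bs) mean(bigram_freq[bs]))
print(bigram_freq)
print(mean_bigram_freq)
```

With these values the boundary bigram `t#` occurs in both forms and so receives a frequency of 15, which gives mean bigram frequencies of 35/3 for "at" and 25/3 for "it".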