Chris Cox edited this page Jul 7, 2017 · 4 revisions

Welcome to the wiki for the AAE Modeling Project!

The objective of this work is to study basic computational and representational differences between speaking and reading a dialect or language variant that differs substantially from the mainstream language used in the educational system.

Here is a summary of the work, co-presented by Mark Seidenberg and Chris Cox on June 4, 2015.

ModelingSummary_4_June_2015.pptx

(Additional talks and summary slides should be uploaded under Issue #1).

Where do the representations come from?

The patterns used as semantic and phonological input are based on those used by Harm and Seidenberg (2004). The phonological representations are a subset of those used and described in Mike Harm's dissertation. The orthography is simply a unique bit for each letter of the English alphabet, with no intrinsic structure.
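The orthographic scheme described above can be sketched as a one-hot code, a minimal illustration assuming one slot per letter (the function names here are hypothetical, not from the project's code):

```python
# Sketch of the orthographic scheme described above: each letter gets a
# unique bit, with no intrinsic structure relating one letter to another.
import string

ALPHABET = string.ascii_lowercase  # 26 letters

def letter_vector(letter):
    """Return a 26-unit one-hot vector with a single bit set for `letter`."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(letter)] = 1
    return vec

def encode_word(word):
    """Encode a word as one one-hot letter vector per letter slot."""
    return [letter_vector(ch) for ch in word.lower()]

pattern = encode_word("cat")
# Each letter slot activates exactly one unit.
assert all(sum(v) == 1 for v in pattern)
```

Because every letter vector is orthogonal to every other, no pair of letters is intrinsically more similar than any other pair, which is the sense in which the code has "no intrinsic structure".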

More information

Because this work has not yet culminated in a paper, the documents that summarize and describe it are still rather scattered. A large body of notes and discussion relevant to the work took place over email. To that end, I've archived many of the emails pertaining to the work here within the wiki. Eventually, these will be worked through and the relevant bits extracted into proper documentation. Most of it, though, chronicles trial and error, and attempts to think more deeply about the model in order to overcome the current hurdle.

Things we are manipulating

The chief goal is to show that there are computational challenges associated with using a non-standard dialect in mainstream classes that can explain learning challenges, even when controlling for SES, home life, etc. Modeling is essentially the only way to isolate the challenges associated with juggling two distinct but highly intercorrelated ways of communicating while, for example, trying to learn to read.

To this end, the modeling revolves around defining a phonology that is deemed "standard" and deriving a variant according to some set of rules that have a basis in reality.
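A rule-based derivation of this kind can be sketched as follows. This is an illustration only, not the project's actual rule set; the single rule shown, word-final consonant cluster reduction, is one attested feature of some nonstandard dialects, and the phoneme encoding (one character per phoneme) is a simplifying assumption:

```python
# Illustrative sketch: derive a variant pronunciation from a "standard"
# phonological form by applying an ordered list of rules. The rule set and
# the letter-as-phoneme encoding are assumptions for demonstration.

CONSONANTS = set("bdfgklmnpstvz")

def reduce_final_cluster(phonemes):
    """Word-final consonant cluster reduction: if a word ends in two
    consonants, drop the final one (e.g. 'test' -> 'tes')."""
    if len(phonemes) >= 2 and phonemes[-1] in CONSONANTS and phonemes[-2] in CONSONANTS:
        return phonemes[:-1]
    return phonemes

RULES = [reduce_final_cluster]  # a real rule set would contain many rules

def derive_variant(phonemes):
    """Apply each rule in order to produce the variant form."""
    for rule in RULES:
        phonemes = rule(phonemes)
    return phonemes
```

Defining the variant through explicit rules, rather than by hand-editing forms, is what makes the manipulations below systematic: swapping in a different rule set yields a different dialect while holding everything else constant.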

The challenge, from a theoretical perspective, is that not all deviations from the standard way of speaking are equally troublesome. Various populations have distinctive accents and dialects, and not all of them struggle to the same degree with learning to read. Thus, an important challenge to the modeling enterprise is to identify which deviations seem to matter most, and which don't. We can then see if that pattern of difficulty seems to predict which real-world student populations tend to struggle the most.

There are several things that can be varied:

  • The rule set for defining another dialect/major language variant.
  • The procedure by which more subtle language variants can be derived.
  • The proportion of the words in the training corpus that differ between the standard and alternative ways of speaking.
  • The words that compose the corpus (this is a confound that can be overcome by training the model on many comparable subsamples of the corpus).
  • The pattern of exposure (blocked exposure, interleaved exposure, isolated exposure followed by ...).
  • Others... probably...
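Crossing these manipulations produces a grid of training conditions. The sketch below is hypothetical (the variable names, values, and helper function are illustrative, not taken from the project's scripts), but it shows how the proportion manipulation and exposure schedule might be combined with corpus subsamples:

```python
# Hypothetical sketch of crossing the manipulations above into a grid of
# training conditions; all names and values here are illustrative.
import itertools
import random

proportions = [0.1, 0.25, 0.5]          # share of words that differ between variants
schedules = ["blocked", "interleaved"]  # pattern of exposure
subsamples = range(3)                   # comparable subsamples of the corpus

conditions = [
    {"proportion": p, "schedule": s, "subsample": i}
    for p, s, i in itertools.product(proportions, schedules, subsamples)
]

def mark_divergent_words(corpus, proportion, seed=0):
    """Randomly select the subset of words whose variant form will differ
    from the standard form, at the requested proportion."""
    rng = random.Random(seed)
    k = round(len(corpus) * proportion)
    return set(rng.sample(corpus, k))

# 3 proportions x 2 schedules x 3 subsamples = 18 model runs
assert len(conditions) == 18
```

Even this toy grid yields 18 runs, which is why, as the next section notes, setup and bookkeeping quickly demand scripting.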

Coping with complexity...

Because there are many variables in play, a large number of model variants will need to be fit; each sub-sample of the corpus requires a slightly different model architecture; and the volume of data these models produce is very large. For these reasons, many of the procedures for setting up and running the models have been scripted, and the data are organized in a database.
