
PDP approach to mathematical cognition

Teaching an RL network to count the number of objects in a scene.

A summer research project at Stanford CSLI, supervised by Professor Jay McClelland.

This repository contains an RL-driven model for a counting sub-task. Given a sequence of objects (at most 7) placed on a one-dimensional line, the model learns to touch every object exactly once, from left to right. The figures below display the model's architecture and its performance under different training regimes.
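A minimal sketch of what such a task environment might look like, assuming a fixed-size "touched objects" state vector and a touch-by-index action space. All names here are illustrative, not the repo's actual API:

```python
import numpy as np

class CountingEnv:
    """Sketch of the counting sub-task: n <= 7 objects on a 1-D line;
    the agent must touch each object exactly once, left to right."""

    def __init__(self, max_objects=7):
        self.max_objects = max_objects

    def reset(self):
        # sample a scene with 1..max_objects objects
        self.n = np.random.randint(1, self.max_objects + 1)
        self.touched = np.zeros(self.max_objects)
        self.next_obj = 0  # index of the leftmost untouched object
        return self.touched.copy()

    def step(self, action):
        # action: index of the object the model touches next
        # (in the real model, the scene itself would be part of the state)
        correct = action == self.next_obj
        if correct:
            self.touched[action] = 1.0
            self.next_obj += 1
        done = (not correct) or (self.next_obj == self.n)
        reward = 1.0 if (correct and self.next_obj == self.n) else 0.0
        return self.touched.copy(), reward, done
```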

Here is the model's architecture. It learns via the Q-learning rule, supported by a linear neural network function approximator.
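As a rough sketch of that learning rule: with a linear approximator Q(s, a) = w[a] · s, the Q-learning update reduces to a TD-error-scaled gradient step on the weights. The class name and hyperparameter values below are assumptions, not the repo's:

```python
import numpy as np

class LinearQ:
    """Q-learning with a linear function approximator: Q(s, a) = w[a] @ s."""

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=0.9):
        self.w = np.zeros((n_actions, n_features))
        self.alpha, self.gamma = alpha, gamma

    def q_values(self, s):
        return self.w @ s  # one Q-value per action

    def update(self, s, a, r, s_next, done):
        # standard Q-learning target: r + gamma * max_a' Q(s', a')
        target = r if done else r + self.gamma * np.max(self.w @ s_next)
        td_error = target - self.w[a] @ s
        # for a linear model, the gradient of Q(s, a) w.r.t. w[a] is just s
        self.w[a] += self.alpha * td_error * s
```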

Because the model is so simple, it fails to learn the task under the standard RL setup. However, with "intermediate feedback" and "teaching", the model learns the task quite well, especially when the two strategies are combined.
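One hedged reading of the "teaching" strategy, as a modification of epsilon-greedy action selection: with some probability, the teacher's correct action is executed in place of the agent's own choice. The probability p_teach and the teacher signal (env.next_obj, from the environment sketch above) are assumptions for illustration:

```python
import numpy as np

def select_action(q_agent, env, s, epsilon=0.1, p_teach=0.3):
    """Epsilon-greedy action selection mixed with teacher demonstrations.
    p_teach and the teacher signal are illustrative assumptions."""
    if np.random.rand() < p_teach:
        return env.next_obj  # teacher demonstrates the correct touch
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_agent.w))  # explore
    return int(np.argmax(q_agent.q_values(s)))  # exploit
```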

Even with these additional teaching strategies, the model needed 10,000 epochs of training (figure above). It turns out that the amount of training experience required can be reduced substantially with an experience replay buffer.

Specifically, the replay buffer stores the most recent 500 transitions (s_t, a_t, r_t, s_{t+1}). In each epoch, the model learns from a batch of transitions (of fixed batch size), sampled uniformly with replacement from the buffer. All models use the combination of intermediate reward and teacher demonstration. When the model is augmented with the replay buffer, it needs only 500 epochs to match the previous performance (figure below). However, too much replay (red curve) does not help and can even be detrimental.
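A minimal sketch of such a buffer, matching the description above (most recent 500 transitions, uniform sampling with replacement); the class and method names are illustrative:

```python
import random
from collections import deque

class ReplayBuffer:
    """Keeps the most recent `capacity` transitions; older ones fall off."""

    def __init__(self, capacity=500):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)  # (s, a, r, s_next, done)

    def sample(self, batch_size):
        # uniform sampling *with* replacement, as described above
        return [random.choice(self.buffer) for _ in range(batch_size)]
```

In each epoch, the agent would push its latest transition with buffer.add(...) and then run one Q-learning update per transition in buffer.sample(batch_size).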
