
Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

nnRNN - NeurIPS 2019

The expRNN code is taken from here.

The EURNN tests are based on code taken from here.

Summary of Current Results

Copytask

[Figure: Copytask, T=200, with the same number of hidden units]

Permuted Sequential MNIST

[Figure: Permuted Sequential MNIST, with the same number of hidden units]

PTB

Changes from the paper:

  • Tested the Adam optimizer with betas (0.0, 0.9) on expRNN and nnRNN (see the sketch after this list)
  • Added gradient clipping
  • Note the large improvements for nnRNN
  • expRNN did not improve with the new optimizer, but did improve when searching over higher learning rates
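
A minimal PyTorch sketch of these two changes; the model, learning rate, and loss here are illustrative stand-ins, not the repo's exact training loop:

```python
import torch

model = torch.nn.RNN(input_size=49, hidden_size=1024)  # stand-in model

# Adam with betas = (0.0, 0.9): drops the usual first-moment averaging.
optimizer = torch.optim.Adam(model.parameters(), lr=0.002, betas=(0.0, 0.9))

def training_step(loss):
    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping between backward() and the optimizer step;
    # max_norm=10.0 mirrors the "Grad Clipping Value" of 10 in the
    # PTB table below (assuming norm clipping).
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    optimizer.step()
```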

Test Bit per Character (BPC)

| Model    | Fixed # of params (~1.32M), T_PTB = 150 | Fixed # of params (~1.32M), T_PTB = 300 | Fixed # hidden units (N=1024), T_PTB = 150 | Fixed # hidden units (N=1024), T_PTB = 300 |
|----------|--------------|--------------|--------------|--------------|
| RNN      | 2.89 ± 0.002 | 2.90 ± 0.002 | 2.89 ± 0.002 | 2.90 ± 0.002 |
| RNN-orth | 1.62 ± 0.004 | 1.66 ± 0.006 | 1.62 ± 0.004 | 1.66 ± 0.006 |
| EURNN    | 1.61 ± 0.001 | 1.62 ± 0.001 | 1.69 ± 0.001 | 1.68 ± 0.001 |
| expRNN   | 1.43 ± 0.002 | 1.44 ± 0.002 | 1.45 ± 0.002 | 1.48 ± 0.008 |
| nnRNN    | 1.40 ± 0.003 | 1.42 ± 0.003 | 1.40 ± 0.003 | 1.42 ± 0.003 |
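
BPC is the mean negative log-likelihood of the next-character prediction expressed in bits (the usual convention); if a framework reports the loss in nats, the conversion is a single division, sketched below:

```python
import math

def bits_per_character(mean_nll_nats: float) -> float:
    # A nat is 1/ln(2) bits, so divide a nats-based NLL by ln(2).
    return mean_nll_nats / math.log(2)
```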

Accuracy (%)

| Model    | Fixed # of params (~1.32M), T_PTB = 150 | Fixed # of params (~1.32M), T_PTB = 300 | Fixed # hidden units (N=1024), T_PTB = 150 | Fixed # hidden units (N=1024), T_PTB = 300 |
|----------|----------------|----------------|----------------|----------------|
| RNN      | 40.01 ± 0.026  | 39.97 ± 0.025  | 40.01 ± 0.026  | 39.97 ± 0.025  |
| RNN-orth | 66.29 ± 0.07   | 65.53 ± 0.09   | 66.29 ± 0.07   | 65.53 ± 0.09   |
| EURNN    | 65.68 ± 0.002  | 65.55 ± 0.002  | 64.01 ± 0.002  | 64.20 ± 0.003  |
| expRNN   | 69.02 ± 0.0005 | 68.98 ± 0.0003 | 68.69 ± 0.0004 | 68.57 ± 0.0004 |
| nnRNN    | 69.89 ± 0.001  | 69.54 ± 0.001  | 69.89 ± 0.001  | 69.54 ± 0.001  |

Hyperparameters for reported results

Copytask

| Model    | Hidden Size | Optimizer        | LR     | Orth. LR | δ      | T decay | Recurrent init |
|----------|-------------|------------------|--------|----------|--------|---------|----------------|
| RNN      | 128         | RMSprop (α=0.9)  | 0.001  |          |        |         | Glorot Normal  |
| RNN-orth | 128         | RMSprop (α=0.99) | 0.0002 |          |        |         | Random Orth    |
| EURNN    | 128         | RMSprop (α=0.5)  | 0.001  |          |        |         |                |
| EURNN    | 256         | RMSprop (α=0.5)  | 0.001  |          |        |         |                |
| expRNN   | 128         | RMSprop (α=0.99) | 0.001  | 0.0001   |        |         | Henaff         |
| expRNN   | 176         | RMSprop (α=0.99) | 0.001  | 0.0001   |        |         | Henaff         |
| nnRNN    | 128         | RMSprop (α=0.99) | 0.0005 | 10⁻⁶     | 0.0001 | 10⁻⁶    | Cayley         |
sMNIST

| Model    | Hidden Size | Optimizer        | LR     | Orth. LR | δ   | T decay | Recurrent init |
|----------|-------------|------------------|--------|----------|-----|---------|----------------|
| RNN      | 512         | RMSprop (α=0.9)  | 0.0001 |          |     |         | Glorot Normal  |
| RNN-orth | 512         | RMSprop (α=0.99) | 5×10⁻⁵ |          |     |         | Random Orth    |
| EURNN    | 512         | RMSprop (α=0.9)  | 0.0001 |          |     |         |                |
| EURNN    | 1024        | RMSprop (α=0.9)  | 0.0001 |          |     |         |                |
| expRNN   | 512         | RMSprop (α=0.99) | 0.0005 | 5×10⁻⁵   |     |         | Cayley         |
| expRNN   | 722         | RMSprop (α=0.99) | 5×10⁻⁵ |          |     |         | Cayley         |
| nnRNN    | 512         | RMSprop (α=0.99) | 0.0002 | 2×10⁻⁵   | 0.1 | 0.0001  | Cayley         |
| LSTM     | 512         | RMSprop (α=0.99) | 0.0005 |          |     |         | Glorot Normal  |
| LSTM     | 257         | RMSprop (α=0.9)  | 0.0005 |          |     |         | Glorot Normal  |
PTB

Length = 150

| Model    | Hidden Size | Optimizer           | LR     | Orth. LR | δ      | T decay | Recurrent init | Grad Clipping Value |
|----------|-------------|---------------------|--------|----------|--------|---------|----------------|---------------------|
| RNN      | 1024        | RMSprop (α=0.9)     | 10⁻⁵   |          |        |         | Glorot Normal  |                     |
| RNN-orth | 1024        | RMSprop (α=0.9)     | 0.0001 |          |        |         | Cayley         |                     |
| EURNN    | 1024        | RMSprop (α=0.9)     | 0.001  |          |        |         |                |                     |
| EURNN    | 2048        | RMSprop (α=0.9)     | 0.001  |          |        |         |                |                     |
| expRNN   | 1024        | RMSprop (α=0.9)     | 0.001  |          |        |         | Cayley         |                     |
| expRNN   | 1386        | RMSprop (α=0.9)     | 0.008  | 0.0008   |        |         | Cayley         |                     |
| nnRNN    | 1024        | Adam (β=(0.0, 0.9)) | 0.002  | 0.0002   | 0.0001 | 10⁻⁵    | Cayley         | 10                  |

Length = 300

| Model    | Hidden Size | Optimizer           | LR     | Orth. LR | δ      | T decay | Recurrent init | Grad Clipping Value |
|----------|-------------|---------------------|--------|----------|--------|---------|----------------|---------------------|
| RNN      | 1024        | RMSprop (α=0.9)     | 10⁻⁵   |          |        |         | Glorot Normal  |                     |
| RNN-orth | 1024        | RMSprop (α=0.9)     | 0.0001 |          |        |         | Cayley         |                     |
| EURNN    | 1024        | RMSprop (α=0.9)     | 0.001  |          |        |         |                |                     |
| EURNN    | 2048        | RMSprop (α=0.9)     | 0.001  |          |        |         |                |                     |
| expRNN   | 1024        | RMSprop (α=0.9)     | 0.001  |          |        |         | Cayley         |                     |
| expRNN   | 1386        | RMSprop (α=0.9)     | 0.001  |          |        |         | Cayley         |                     |
| nnRNN    | 1024        | Adam (β=(0.0, 0.9)) | 0.002  | 0.0002   | 0.0001 | 10⁻⁶    | Cayley         | 5                   |
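
Several rows above initialize the recurrent matrix with a Cayley construction. As a rough illustration only (not necessarily the exact initializer used in this repo or in expRNN), the Cayley transform turns any skew-symmetric matrix into an orthogonal one:

```python
import torch

def cayley_orthogonal(n: int) -> torch.Tensor:
    """Orthogonal matrix via the Cayley transform W = (I - A)(I + A)^-1,
    where A is skew-symmetric; any such W satisfies W @ W.T == I."""
    s = torch.randn(n, n)
    A = s - s.T                                    # skew-symmetric: A.T == -A
    I = torch.eye(n)
    return (I - A) @ torch.linalg.solve(I + A, I)  # (I - A)(I + A)^-1
```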

Usage

Copytask

python copytask.py [args]

Options:

  • net-type : type of RNN to use in the test
  • nhid : number of hidden units
  • cuda : use CUDA
  • T : length of the delay before recall
  • labels : number of labels in output and input, maximum 8
  • c-length : sequence length
  • onehot : use one-hot labels and inputs
  • vari : use variable-length sequences
  • random-seed : random seed for the experiment
  • batch : batch size
  • lr : learning rate for the optimizer
  • lr_orth : learning rate for the orthogonal optimizer
  • alpha : alpha value for the optimizer (always RMSprop)
  • rinit : recurrent weight matrix initialization, options: [xavier, henaff, cayley, random orth.]
  • iinit : input weight matrix initialization, options: [xavier, kaiming]
  • nonlin : nonlinearity type, options: [None, tanh, relu, modrelu]
  • alam : strength of the penalty term (δ in the paper)
  • Tdecay : weight decay on the upper-triangular matrix values
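
For example, a run matching the nnRNN Copytask row in the hyperparameter table might look like the following; the flag spellings are assumed to match the option names above, so check `python copytask.py --help` for the exact interface:

python copytask.py --net-type nnRNN --nhid 128 --lr 0.0005 --lr_orth 1e-6 --rinit cayley --alam 0.0001 --Tdecay 1e-6 --cuda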

Permuted Sequential MNIST

python sMNIST.py [args]

Options:

  • net-type : type of RNN to use in the test
  • nhid : number of hidden units
  • epochs : number of epochs
  • cuda : use CUDA
  • permute : permute the order of the input
  • random-seed : random seed for the experiment (excluding the permute order, which has an independent seed)
  • batch : batch size
  • lr : learning rate for the optimizer
  • lr_orth : learning rate for the orthogonal optimizer
  • alpha : alpha value for the optimizer (always RMSprop)
  • rinit : recurrent weight matrix initialization, options: [xavier, henaff, cayley, random orth.]
  • iinit : input weight matrix initialization, options: [xavier, kaiming]
  • nonlin : nonlinearity type, options: [None, tanh, relu, modrelu]
  • alam : strength of the penalty term (δ in the paper)
  • Tdecay : weight decay on the upper-triangular matrix values
  • save_freq : frequency (in epochs) at which to save data and the network
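
As above, an illustrative invocation for the nnRNN sMNIST row (flag spellings assumed from the option list):

python sMNIST.py --net-type nnRNN --nhid 512 --permute --lr 0.0002 --lr_orth 2e-5 --alam 0.1 --Tdecay 0.0001 --rinit cayley --cuda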

PTB

Adapted from here

python language_task.py [args]

Options:

  • net-type : type of RNN to use in the test
  • emsize : size of word embeddings
  • nhid : number of hidden units
  • epochs : number of epochs
  • bptt : sequence length for backpropagation
  • cuda : use CUDA
  • seed : random seed for the experiment
  • batch : batch size
  • log-interval : reporting interval
  • save : path to save the final model and test info
  • lr : learning rate for the optimizer
  • lr_orth : learning rate for the orthogonal optimizer
  • rinit : recurrent weight matrix initialization, options: [xavier, henaff, cayley, random orth.]
  • iinit : input weight matrix initialization, options: [xavier, kaiming]
  • nonlin : nonlinearity type, options: [None, tanh, relu, modrelu]
  • alam : strength of the penalty term (δ in the paper)
  • Tdecay : weight decay on the upper-triangular matrix values
  • optimizer : choice of optimizer, RMSprop or Adam
  • alpha : alpha value for the RMSprop optimizer
  • betas : beta values for the Adam optimizer
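
An illustrative invocation approximating the nnRNN PTB row for length 150; flag spellings are assumed from the option list, and the Adam betas and gradient-clipping value from the table would have to be passed in whatever form the script actually expects (the option list above does not show a clipping flag):

python language_task.py --net-type nnRNN --nhid 1024 --bptt 150 --optimizer Adam --lr 0.002 --lr_orth 0.0002 --alam 0.0001 --Tdecay 1e-5 --rinit cayley --cuda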
