char-rnn pos tagger

Requirements

This code is written in Lua and requires Torch. If you're on Ubuntu, installing Torch in your home directory may look something like:

$ curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd ~/torch;
$ ./install.sh      # and enter "yes" at the end to modify your bashrc
$ source ~/.bashrc

See the Torch installation documentation for more details. After Torch is installed we need to get a few more packages using LuaRocks (which already came with the Torch install). In particular:

$ luarocks install luautf8
$ luarocks install nngraph
$ luarocks install optim
$ luarocks install nn

If you'd like to train on an NVIDIA GPU using CUDA (this can be to about 15x faster), you'll of course need the GPU, and you will have to install the CUDA Toolkit. Then get the cutorch and cunn packages:

$ luarocks install cutorch
$ luarocks install cunn

If you'd like to use OpenCL GPU instead (e.g. ATI cards), you will instead need to install the cltorch and clnn packages, and then use the option -opencl during training (cltorch issues):

$ luarocks install cltorch
$ luarocks install clnn

Usage

Data

All input data is stored inside the torch_data/data directory. You'll notice that the torch_data/ directory does not exists, in fact that directory is created on first script run. All the needed data, except for the Evalita dataset, will be automatically downloaded in that folder for you.

All the checkpoints are stored in torch_data/save directory.

Evalita: In order to use the evalita dataset, you need to download the evalita.tar.gz archive and unpack it in torch_data/data/CoNLL directory. At the end you should have devel and train files in torch_data/data/CoNLL/Evalita.

With

$ th init.lua

you can generate torch_data/ directory and all the needed subdirectory without actually starting training.

Training

To start training multiple rnn instances with one command you can use create_experiment_start.lua script.

Simply type

$ th create_experiment_starter.lua && ./experiment_starter.sh

to generate and execute a shell script with all the commands needed to start the experiment.

The script provides some parameter to customize the experiment_starter.sh generation. Below the full list of available parameters with default values:

$ th create_experiment_starter.lua -h
#Creates a shell script to start the experiment

#Options
#  -use_space       True to train with space too. [true]
#  -min_nodes       The min number of node to use. [128]
#  -max_nodes       The max number of node to use. [1024]
#  -max_layers      The max number of layers to use [5]
#  -min_seq_lengths The min value of sequence length to use [60]
#  -max_seq_lengths The max value of sequence length to use [100]
#  -seq_length_step The step value to use for increment sequence length [20]
#  -max_epochs      The max number of full passes through the training data [150]

Name		Name	Last commit message	Last commit date
Latest commit History 112 Commits
data		data
modules		modules
utils		utils
.gitignore		.gitignore
Readme.md		Readme.md
classify.lua		classify.lua
convert_gpu_cpu_checkpoint.lua		convert_gpu_cpu_checkpoint.lua
create_evaluation_starter.lua		create_evaluation_starter.lua
create_experiment_starter.lua		create_experiment_starter.lua
english2000.tar.gz		english2000.tar.gz
evaluate.lua		evaluate.lua
init.lua		init.lua
plot_all.py		plot_all.py
train.lua		train.lua

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

char-rnn pos tagger

Requirements

Usage

Data

Training

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

char-rnn pos tagger

Requirements

Usage

Data

Training

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages