Char2Char

A character-level language model built using TensorFlow.

Getting Started

Clone the repository
```
git clone https://github.com/coder3101/char2char
```
OR

Download the repository as a zip from this link and unzip it.

Install the required packages with the command below:
```
pip install -r requirements.txt
```
Run the above command from the directory where you downloaded or cloned the repository. You need a working internet connection for the command to complete.

Training your own model

To train a model on your own dataset, run the train.py script. First, put your data in a text file; say data.txt contains a large amount of text. Copy that file into the folder where you cloned or unzipped the repository.

Run the following to train the model on your own data (say data.txt):

```
python train.py --file_name "./data.txt" --name "name_of_model"
```
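To make the idea of a character-level model concrete, here is a minimal sketch of how a text file such as data.txt can be turned into training examples: every distinct character gets an integer id, and each target sequence is the input sequence shifted by one character. This is an illustrative assumption, not the repository's reader.py; the helper names build_vocab and make_pairs are hypothetical.

```python
# Hypothetical sketch of character-level data preparation.
# NOT the repository's reader.py; names are illustrative only.


def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(chars)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return char_to_id, id_to_char


def make_pairs(text, char_to_id, seq_length):
    """Cut text into non-overlapping (input, target) id sequences.

    The target is the input shifted one character to the right, so the
    model learns to predict the next character at every position.
    """
    ids = [char_to_id[ch] for ch in text]
    pairs = []
    for start in range(0, len(ids) - seq_length, seq_length):
        inputs = ids[start:start + seq_length]
        targets = ids[start + 1:start + seq_length + 1]
        pairs.append((inputs, targets))
    return pairs


if __name__ == "__main__":
    text = "hello world"  # replace with the contents of data.txt
    char_to_id, id_to_char = build_vocab(text)
    for inputs, targets in make_pairs(text, char_to_id, seq_length=4):
        decode = lambda seq: "".join(id_to_char[i] for i in seq)
        print(repr(decode(inputs)), "->", repr(decode(targets)))
```

For a real dataset you would read the file first, e.g. `text = open("data.txt", encoding="utf-8").read()`, matching the --encoding you pass to the train script.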

To tune hyperparameters such as batch_size and learning_rate, pass additional arguments to the train script:

```
python train.py --file_name "./data.txt" \
                --name "name_of_model" \
                --encoding "utf-8" \
                --epochs 50 \
                --batch_size 100 \
                --units 256 \
                --num_layers 3 \
                --cell_type "gru" \
                --input_dropout_keep_prob 0.8 \
                --output_dropout_keep_prob 0.8 \
                --learning_rate 0.01 \
                --optimizer "rms"
```
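If you want to compare several settings, a small wrapper can build and launch the same command for each configuration. The sketch below only uses the flags shown above; the helper names, the learning-rate values in the sweep, and the run-naming scheme are hypothetical choices, and it defaults to a dry run that just prints each command.

```python
import shlex
import subprocess
import sys

# Flags copied from the example command above; values are the example values.
BASE_PARAMS = {
    "encoding": "utf-8",
    "epochs": 50,
    "batch_size": 100,
    "units": 256,
    "num_layers": 3,
    "cell_type": "gru",
    "input_dropout_keep_prob": 0.8,
    "output_dropout_keep_prob": 0.8,
    "optimizer": "rms",
}


def build_train_command(file_name, name, hyperparams):
    """Return the argv list for one train.py run (hypothetical helper)."""
    cmd = [sys.executable, "train.py", "--file_name", file_name, "--name", name]
    for flag, value in hyperparams.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


def launch(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical sweep over learning rates; each run gets its own --name.
    for lr in (0.01, 0.001):
        params = dict(BASE_PARAMS, learning_rate=lr)
        launch(build_train_command("./data.txt", f"model_lr{lr}", params))
```

Giving each run a distinct --name keeps the JSON files produced by different runs from overwriting one another.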

To see what each argument does, run:

```
python train.py --help
```

The train script runs for the specified number of epochs and shows a progress bar in the terminal window. When it finishes, it produces a JSON file and a folder of trained parameters. The sample script uses these files to get predictions from the trained model.

| File(s) produced | Use |
|---|---|
| model_name.json | Contains the configuration used by the sample script. |
| saved-v1 directory | Contains the parameters learned during training. |
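To check which settings a trained model was saved with, you can open the JSON file directly. The key names inside it are not documented here, so this minimal sketch, assuming only that the file holds a top-level JSON object, prints whatever keys it finds; load_config is a hypothetical helper.

```python
import json
from pathlib import Path


def load_config(path):
    """Read the JSON config written by train.py (assumed to be a JSON object)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    path = Path("model_name.json")  # use the name you passed to --name
    if path.exists():
        for key, value in sorted(load_config(path).items()):
            print(f"{key}: {value}")
    else:
        print(f"{path} not found; run train.py first")
```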

Sampling from trained model

To get predictions from a trained model, run the sample script, which writes the predictions to a file:

```
python sample.py --output_json "name_of_model.json" \
                 --seq_len 200 \
                 --source_file "./data.txt"
```

The command above generates a file named name_of_model-output.txt containing the text produced by the model. The --seq_len argument sets the number of characters in that file.

Here, --output_json is the name of the JSON file generated by the train script, and --source_file is the file you trained on.
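Generating text from a character-level model is autoregressive: sample one character from the model's predicted distribution, append it, feed it back in, and repeat until you have --seq_len characters. The sketch below shows that loop with a simple bigram frequency table standing in for the trained RNN; the bigram model and the helper names are substitutions for illustration, not what sample.py does internally.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(text):
    """Count how often each character follows each other character."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


def sample(counts, seed_char, seq_len, rng):
    """Generate up to seq_len characters, one at a time, starting from seed_char."""
    out = [seed_char]
    current = seed_char
    while len(out) < seq_len:
        followers = counts.get(current)  # .get() avoids inserting a new key
        if not followers:
            break  # dead end: nothing ever followed this character
        chars = list(followers)
        weights = [followers[c] for c in chars]
        # Draw the next character in proportion to how often it was seen.
        current = rng.choices(chars, weights=weights, k=1)[0]
        out.append(current)
    return "".join(out)


if __name__ == "__main__":
    source = "marsen\npenin\ngenir\n"  # stand-in for the --source_file contents
    model = fit_bigram(source)
    print(sample(model, "m", 30, random.Random(0)))
```

An RNN replaces the bigram table with a distribution conditioned on the whole history, but the sample-append-repeat loop is the same.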

For more information, run:

```
python sample.py --help
```

Examples

We trained the model on company_names.txt, sampled from it, and got the following new names:

Marsen
Penin
Genir

Many more names were generated; see the new_names.txt file produced by the sample script.

The names may not seem very novel. We accept this, and it is because we trained for only a small number of epochs. Training again with more epochs and more units should generate much more novel text.
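One way to measure how novel the output is: count how many generated names do not already appear in the training data. This minimal sketch assumes both company_names.txt and new_names.txt hold one name per line, and it compares names case-insensitively; both assumptions, and the helper names, are hypothetical.

```python
import os


def load_names(path):
    """Read one name per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def novel_names(generated, training):
    """Return generated names absent from training, case-insensitively, without repeats."""
    seen = {name.casefold() for name in training}
    result, emitted = [], set()
    for name in generated:
        key = name.casefold()
        if key in seen or key in emitted:
            continue
        emitted.add(key)
        result.append(name)
    return result


if __name__ == "__main__":
    if os.path.exists("company_names.txt") and os.path.exists("new_names.txt"):
        training = load_names("company_names.txt")
        generated = load_names("new_names.txt")
        novel = novel_names(generated, training)
        print(f"{len(novel)} novel names out of {len(generated)} generated")
```

Re-running this after training with more epochs or units gives a simple number to compare across runs.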

Special thanks to Rohan for his hard work in collecting the data.
