SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets Using a Recurrent Neural Network Approach
- Before generating any data, we first transform the input data into a paragraph. `transform_data.py` is an example of a code snippet that converts a CSV file of features to a text file. This text file is fed as input to SIGRNN in the next step (see the sketch below for one possible way to do this conversion).
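A minimal sketch of such a CSV-to-text conversion, assuming a comma-separated input with a header row and one space-separated record per output line. The file names and output format here are placeholders, not the exact format used by `transform_data.py`:

```python
# Hypothetical sketch: turn each CSV row of feature values into a space-separated
# "sentence" so the language model can treat every feature value as a word.
# File names and the one-record-per-line format are assumptions, not SIGRNN's exact format.
import csv

def csv_to_text(csv_path="features.csv", txt_path="features.txt"):
    with open(csv_path, newline="") as src, open(txt_path, "w") as dst:
        reader = csv.reader(src)
        next(reader)  # skip the header row (assumed to be present)
        for row in reader:
            # One record per line: feature values separated by spaces.
            dst.write(" ".join(value.strip() for value in row) + "\n")

if __name__ == "__main__":
    csv_to_text()
```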
- To train SIGRNN, run `main.py`. The script takes a set of inputs and can be executed as in the following example: `python main.py --cuda --nhid 1024 --dropout 0.5 --epochs 20 --batch_size 8 --nlayers 2 --data ./data/seer --bptt 11`
- Flag descriptions:
  - `--cuda` is used for GPU execution. The GPU id can be set within the code.
  - `--nhid` is the number of hidden units.
  - `--dropout` is the percentage of dropped-out connections.
  - `--batch_size` specifies the batch size.
  - `--data` is the folder where the data is saved. Three files need to be present under this directory: `train.txt`, `test.txt`, and `valid.txt` (see the sketch after this list for one possible way to produce them).
  - `--nlayers` is the number of hidden layers. The default cell is LSTM.
  - `--bptt` is the length of the sequence.
- The model will be saved in `model.pt`.
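Since `train.txt`, `valid.txt`, and `test.txt` must exist under the `--data` directory, here is a minimal sketch of one way to produce them from the transformed text file. The 80/10/10 split ratios, shuffling, and file paths are assumptions, not part of SIGRNN itself:

```python
# Hypothetical sketch: split the transformed text file into the three files
# expected under --data. The split ratios and the source file name are assumptions.
import os
import random

def split_corpus(txt_path="features.txt", data_dir="./data/seer", seed=0):
    with open(txt_path) as f:
        lines = f.readlines()
    random.Random(seed).shuffle(lines)  # shuffle records before splitting
    n = len(lines)
    splits = {
        "train.txt": lines[: int(0.8 * n)],
        "valid.txt": lines[int(0.8 * n): int(0.9 * n)],
        "test.txt":  lines[int(0.9 * n):],
    }
    os.makedirs(data_dir, exist_ok=True)
    for name, chunk in splits.items():
        with open(os.path.join(data_dir, name), "w") as f:
            f.writelines(chunk)

if __name__ == "__main__":
    split_corpus()
```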
- To generate synthetic instances with SIGRNN, run `generate.py`. The script takes a set of inputs and can be executed as in the following example: `python generate.py --cuda --words 1500000 --data ./data/seer --outf ./data/seer/generated.txt --temperature 1.3`
- Flag descriptions:
  - `--cuda` is used for GPU execution. The GPU id can be set within the code.
  - `--words` is the number of words to be generated.
  - `--outf` specifies the output file (a sketch of converting this file back into CSV rows follows the list below).
  - `--temperature` increases/decreases the variance in the generated examples.
  - `--data` is the folder where the data is saved. Three files need to be present under this directory: `train.txt`, `test.txt`, and `valid.txt`.
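After generation, the text file typically needs to be converted back into feature rows before it can be used to augment the minority class. Below is a minimal sketch of that inverse step, assuming each generated line uses the same space-separated format produced in the first step; the expected number of features, the output path, and the filtering of malformed lines are placeholders, not SIGRNN's own post-processing:

```python
# Hypothetical sketch: convert generated.txt back into a CSV of synthetic
# minority instances. The expected number of features per line and the output
# file name are assumptions about the format produced in the first step.
import csv

def text_to_csv(txt_path="./data/seer/generated.txt",
                csv_path="./data/seer/synthetic.csv",
                n_features=11):
    with open(txt_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            tokens = line.split()
            # Keep only lines that parse into the expected number of features.
            if len(tokens) == n_features:
                writer.writerow(tokens)

if __name__ == "__main__":
    text_to_csv()
```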