SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets Using a Recurrent Neural Network Approach

NU-CUCIS/SIGRNN

SIGRNN

Transform data

  • Before generating any data, we first transform the input data into a paragraph of text. transform_data.py is an example code snippet that converts a CSV file of features into a text file. This text file is fed as input to SIGRNN in the next step.
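
As a rough illustration of this step, the sketch below turns CSV rows into plain text. It is not the repository's transform_data.py: the function name csv_to_text, the one-row-per-line layout, and the underscore replacement for spaces inside a cell are all assumptions made for the example.

```python
import csv
import io


def csv_to_text(csv_text, has_header=True):
    """Convert CSV content into whitespace-separated text, one row per line.

    Assumption: each cell becomes exactly one token. The layout used by the
    real transform_data.py may differ.
    """
    reader = csv.reader(io.StringIO(csv_text))
    rows = [row for row in reader if row]  # skip blank lines
    if has_header and rows:
        rows = rows[1:]  # drop the column names
    lines = []
    for row in rows:
        # Replace inner spaces so every cell stays a single token.
        tokens = [cell.strip().replace(" ", "_") for cell in row]
        lines.append(" ".join(tokens))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "age,stage,grade\n63,II,high\n41,I,low\n"
    print(csv_to_text(sample))
```

Keeping one token per feature value means a fixed number of tokens per instance, which makes it straightforward to map generated text back to feature columns.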

Train SIGRNN

  • To train SIGRNN, run main.py. The script takes a set of flags and can be executed as in the following example: python main.py --cuda --nhid 1024 --dropout 0.5 --epochs 20 --batch_size 8 --nlayers 2 --data ./data/seer --bptt 11
    • Flag descriptions:
      • --cuda enables GPU execution. The GPU id can be set within the code.
      • --nhid is the number of hidden units.
      • --dropout is the fraction of connections that are dropped out.
      • --epochs is the number of training epochs.
      • --batch_size specifies the batch size.
      • --data is the folder where the data is saved.
        • Three files need to be present under this directory: train.txt, test.txt, and valid.txt.
      • --nlayers is the number of hidden layers. The default cell is LSTM.
      • --bptt is the sequence length.
      • The model will be saved in model.pt.
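
Since the data folder must contain train.txt, valid.txt, and test.txt, one way to prepare it is sketched below. This is a hypothetical helper, not part of the repository: the names split_lines and write_splits, the 10% validation and test fractions, and the assumption that the transformed text holds one instance per line are all illustrative.

```python
import os
import random


def split_lines(lines, valid_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle lines and split them into train/valid/test lists.

    The fractions and the seed are illustrative assumptions.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "valid": shuffled[:n_valid],
        "test": shuffled[n_valid:n_valid + n_test],
        "train": shuffled[n_valid + n_test:],
    }


def write_splits(splits, data_dir):
    """Write train.txt, valid.txt and test.txt under data_dir."""
    os.makedirs(data_dir, exist_ok=True)
    for name, part in splits.items():
        path = os.path.join(data_dir, name + ".txt")
        with open(path, "w") as f:
            f.write("\n".join(part) + "\n")
```

For example, write_splits(split_lines(open("features.txt").read().splitlines()), "./data/seer") would populate the folder passed to --data, where features.txt is a placeholder name for the transformed text file.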

Generate using SIGRNN

  • To generate synthetic instances with SIGRNN, run generate.py. The script takes a set of flags and can be executed as in the following example: python generate.py --cuda --words 1500000 --data ./data/seer --outf ./data/seer/generated.txt --temperature 1.3
    • Flag descriptions:
      • --cuda enables GPU execution. The GPU id can be set within the code.
      • --words is the number of words to be generated.
      • --outf specifies the output file.
      • --temperature increases or decreases the variance in the generated examples.
      • --data is the folder where the data is saved.
        • Three files need to be present under this directory: train.txt, test.txt, and valid.txt.
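
To give an intuition for the --temperature flag, the sketch below shows the common temperature-scaled sampling technique: logits are divided by the temperature before the softmax, so values above 1 flatten the distribution (more varied output) and values below 1 sharpen it. This is a generic pure-Python illustration and not necessarily how generate.py implements it; the names temperature_probs and sample_token are hypothetical.

```python
import math
import random


def temperature_probs(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token from vocab using temperature-scaled probabilities."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    for t in (0.5, 1.0, 1.3):
        rounded = [round(p, 3) for p in temperature_probs(logits, t)]
        print(t, rounded)
```

Running the script prints the probabilities at each temperature, which shows the most likely token losing probability mass to the others as the temperature rises.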
