SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets Using a Recurrent Neural Network Approach
- Before generating any data, we first transform the input data into a paragraph. `transform_data.py` is an example of a code snippet that converts a CSV file of features to a text file. This text file is fed as input to SIGRNN in the next step (see the sketch below for one possible way to do this conversion).
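A minimal sketch of such a CSV-to-text conversion, assuming a comma-separated input with a header row and one space-separated record per output line. The file names and output format here are placeholders, not the exact format used by `transform_data.py`:

```python
# Hypothetical sketch: turn each CSV row of feature values into a space-separated
# "sentence" so the language model can treat every feature value as a word.
# File names and the one-record-per-line format are assumptions, not SIGRNN's exact format.
import csv

def csv_to_text(csv_path="features.csv", txt_path="features.txt"):
    with open(csv_path, newline="") as src, open(txt_path, "w") as dst:
        reader = csv.reader(src)
        next(reader)  # skip the header row (assumed to be present)
        for row in reader:
            # One record per line: feature values separated by spaces.
            dst.write(" ".join(value.strip() for value in row) + "\n")

if __name__ == "__main__":
    csv_to_text()
```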
- To train SIGRNN, run `main.py`. The script takes a set of inputs and can be executed as in the following example: `python main.py --cuda --nhid 1024 --dropout 0.5 --epochs 20 --batch_size 8 --nlayers 2 --data ./data/seer --bptt 11`
- Flag descriptions:
  - `--cuda` is used for GPU execution. The GPU id can be set within the code.
  - `--nhid` is the number of hidden units.
  - `--dropout` is the percentage of dropped-out connections.
  - `--batch_size` specifies the batch size.
  - `--data` is the folder where the data is saved. Three files need to be present under this directory: `train.txt`, `test.txt`, and `valid.txt` (see the sketch after this list for one possible way to produce them).
  - `--nlayers` is the number of hidden layers. The default cell is LSTM.
  - `--bptt` is the length of the sequence.
- The model will be saved in `model.pt`.
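Since `train.txt`, `valid.txt`, and `test.txt` must exist under the `--data` directory, here is a minimal sketch of one way to produce them from the transformed text file. The 80/10/10 split ratios, shuffling, and file paths are assumptions, not part of SIGRNN itself:

```python
# Hypothetical sketch: split the transformed text file into the three files
# expected under --data. The split ratios and the source file name are assumptions.
import os
import random

def split_corpus(txt_path="features.txt", data_dir="./data/seer", seed=0):
    with open(txt_path) as f:
        lines = f.readlines()
    random.Random(seed).shuffle(lines)  # shuffle records before splitting
    n = len(lines)
    splits = {
        "train.txt": lines[: int(0.8 * n)],
        "valid.txt": lines[int(0.8 * n): int(0.9 * n)],
        "test.txt":  lines[int(0.9 * n):],
    }
    os.makedirs(data_dir, exist_ok=True)
    for name, chunk in splits.items():
        with open(os.path.join(data_dir, name), "w") as f:
            f.writelines(chunk)

if __name__ == "__main__":
    split_corpus()
```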
- To generate synthetic instances with SIGRNN, run `generate.py`. The script takes a set of inputs and can be executed as in the following example: `python generate.py --cuda --words 1500000 --data ./data/seer --outf ./data/seer/generated.txt --temperature 1.3`
- Flag descriptions:
  - `--cuda` is used for GPU execution. The GPU id can be set within the code.
  - `--words` is the number of words to be generated.
  - `--outf` specifies the output file (a sketch of converting this file back into CSV rows follows the list below).
  - `--temperature` increases/decreases the variance in the generated examples.
  - `--data` is the folder where the data is saved. Three files need to be present under this directory: `train.txt`, `test.txt`, and `valid.txt`.
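After generation, the text file typically needs to be converted back into feature rows before it can be used to augment the minority class. Below is a minimal sketch of that inverse step, assuming each generated line uses the same space-separated format produced in the first step; the expected number of features, the output path, and the filtering of malformed lines are placeholders, not SIGRNN's own post-processing:

```python
# Hypothetical sketch: convert generated.txt back into a CSV of synthetic
# minority instances. The expected number of features per line and the output
# file name are assumptions about the format produced in the first step.
import csv

def text_to_csv(txt_path="./data/seer/generated.txt",
                csv_path="./data/seer/synthetic.csv",
                n_features=11):
    with open(txt_path) as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            tokens = line.split()
            # Keep only lines that parse into the expected number of features.
            if len(tokens) == n_features:
                writer.writerow(tokens)

if __name__ == "__main__":
    text_to_csv()
```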