DataSifterText

$ python3 run_classifier.py \
    --task_name=cdc \
    --do_train=true \
    --do_eval=true \
    --do_predict=true \
    --data_dir=./data/ \
    --vocab_file=./cased_L-12_H-768_A-12/vocab.txt \
    --bert_config_file=./cased_L-12_H-768_A-12/bert_config.json \
    --max_seq_length=512 \
    --train_batch_size=32 \
    --learning_rate=2e-5 \
    --num_train_epochs=3.0 \
    --output_dir=./bert_output/ \
    --do_lower_case=False

The results are written to the ./bert_output directory.
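The training invocation above can also be assembled and checked from Python before it is launched. The following is a minimal sketch, not part of the repository: the helper names `build_command`, `missing_paths`, and `main` are hypothetical, and the flag values are copied from the command above. It assumes the script is run from the BERT repository root, where run_classifier.py and the downloaded model directory live.

```python
import subprocess
import sys
from pathlib import Path

# Flag values copied from the training command shown above.
MODEL_DIR = "./cased_L-12_H-768_A-12"
FLAGS = {
    "task_name": "cdc",
    "do_train": "true",
    "do_eval": "true",
    "do_predict": "true",
    "data_dir": "./data/",
    "vocab_file": f"{MODEL_DIR}/vocab.txt",
    "bert_config_file": f"{MODEL_DIR}/bert_config.json",
    "max_seq_length": "512",
    "train_batch_size": "32",
    "learning_rate": "2e-5",
    "num_train_epochs": "3.0",
    "output_dir": "./bert_output/",
    "do_lower_case": "False",
}


def build_command(flags=FLAGS, script="run_classifier.py"):
    """Return the argument list for the training run (hypothetical helper)."""
    # sys.executable reuses the interpreter of the active virtual environment.
    return [sys.executable, script] + [f"--{k}={v}" for k, v in flags.items()]


def missing_paths(flags=FLAGS):
    """Return the input paths that the command needs but that do not exist."""
    required = [flags["data_dir"], flags["vocab_file"], flags["bert_config_file"]]
    return [p for p in required if not Path(p).exists()]


def main():
    """Create the output directory, verify the inputs, then start training."""
    Path(FLAGS["output_dir"]).mkdir(parents=True, exist_ok=True)
    missing = missing_paths()
    if missing:
        sys.exit(f"Missing required paths: {missing}")
    subprocess.run(build_command(), check=True)
```

Calling `main()` fails early with a list of missing inputs instead of letting the training script stop partway through start-up. Keeping the flags in one dictionary also makes it easier to change a single value, such as the batch size, without retyping the whole command.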

References

DataSifter-Lite (V 1.0)
DataSifter website
Marino, S, Zhou, N, Zhao, Y, Wang, L, Wu, Q, and Dinov, ID. (2019) DataSifter: Statistical Obfuscation of Electronic Health Records and Other Sensitive Datasets, Journal of Statistical Computation and Simulation, 89(2): 249–271, DOI: 10.1080/00949655.2018.1545228.
Zhou, N, Wang, L, Marino, S, Zhao, Y, Dinov, ID. (2022) DataSifter II: Partially Synthetic Data Sharing of Sensitive Information Containing Time-varying Correlated Observations, Journal of Algorithms & Computational Technology, Volume 15: 1–17, DOI: 10.1177/17483026211065379.


Table of Contents

Setup:

Set up a Python virtual environment

Remove any pre-existing environment

Define a new environment

Activate the virtual environment

Install the required packages
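The first two setup steps above (removing a pre-existing environment and defining a new one) can be sketched with Python's standard `venv` module. This is an illustrative sketch only: the helper name `recreate_env` and the default directory name `env` are assumptions, not names taken from the repository.

```python
import shutil
import venv
from pathlib import Path


def recreate_env(env_dir="env", with_pip=True):
    """Remove any environment at env_dir, then create a fresh one there.

    Both the function name and the default directory are hypothetical.
    """
    env_path = Path(env_dir)
    if env_path.exists():
        shutil.rmtree(env_path)  # remove the pre-existing environment
    venv.create(env_path, with_pip=with_pip)  # define the new environment
    return env_path
```

Activation is a shell step and cannot be done from inside this script: on Linux or macOS, run `source env/bin/activate` after calling `recreate_env()`. The required packages are then installed with pip inside the activated environment.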

Usage:

Run the whole obfuscation model:

Example

To train a BERT model:

Clone the BERT GitHub repository:

Download the pre-trained BERT model (our work uses BERT-Base, Cased):

Replace the run_classifier.py in the BERT repository with the run_classifier.py from this repository

Create the "./data" and "./bert_output" directories

Move train_sifter.py into the BERT repository and run it there; make sure the data is in the "./data" directory

Now the data is ready. Run the following command to start training:

See also

References
