# Unsupervised Data Augmentation Experiments

## CS2565

## Final Project



### Acknowledgement
This project builds on the experiments in the Unsupervised Data Augmentation (UDA) paper (https://arxiv.org/pdf/1904.12848.pdf) and its open-source code release.

## Introduction 
This is the main notebook for the various experiments run for the project. The preprocessing step is done in a separate notebook. 

### Hyper-parameter Definitions

### Task Options:
`do_train` - *Boolean type*, indicates whether to perform training.

`do_eval` - *Boolean type*, indicates whether to perform evaluation.

### Training Options:
`sup_train_data_dir` - *String type*, the input directory of the supervised data.

`eval_data_dir` - *String type*, the input directory of the evaluation data.

`unsup_data_dir` - *String type*, the input directory of the unsupervised data.

`bert_config_file` - *String type*, absolute path to the JSON config file of the pre-trained BERT model.

`vocab_file` - *String type*, the vocabulary file that the BERT model was trained on.

`init_checkpoint` - *String type*, initial checkpoint from the pre-trained BERT model.

`task_name` - *String type*, the name of the task to train.

`model_dir` - *String type*, the output directory where the model checkpoints will be written.

### Model configuration
`use_one_hot_embeddings` - *Boolean type*, default = True. If True, `tf.one_hot` is used for embedding lookups; otherwise `tf.nn.embedding_lookup` is used. On TPUs this should be True, since it is much faster.
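
To make the two lookup styles concrete, here is a minimal pure-Python sketch (not the TensorFlow implementation) showing that multiplying a one-hot row by the embedding table gives the same result as indexing the table directly. The function names and the toy table values are hypothetical and chosen only for illustration.

```python
def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def lookup_via_one_hot(token_ids, table):
    """Embed tokens by multiplying a one-hot row with the embedding table."""
    vocab_size = len(table)
    hidden_size = len(table[0])
    embedded = []
    for token_id in token_ids:
        row = one_hot(token_id, vocab_size)
        embedded.append([
            sum(row[v] * table[v][h] for v in range(vocab_size))
            for h in range(hidden_size)
        ])
    return embedded


def lookup_via_gather(token_ids, table):
    """Embed tokens by indexing rows of the embedding table directly."""
    return [list(table[token_id]) for token_id in token_ids]


# Toy embedding table (hypothetical values): vocabulary of 4, hidden size 2.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
ids = [2, 0, 3]
print(lookup_via_one_hot(ids, table) == lookup_via_gather(ids, table))
```

Both paths return the same vectors; the one-hot form turns the lookup into a dense matrix multiplication, which is the kind of operation the option description says runs faster on TPUs.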

`max_seq_length` - *Integer type*, default = 128, the maximum total sequence length after WordPiece tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded.
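
The truncate-or-pad behaviour can be sketched in a few lines. This is a simplified illustration, not the project's preprocessing code: it ignores the special `[CLS]` and `[SEP]` tokens, and both the helper name `pad_or_truncate` and the padding id of 0 are assumptions.

```python
def pad_or_truncate(token_ids, max_seq_length=128, pad_id=0):
    """Return (input_ids, input_mask), each exactly max_seq_length long.

    pad_id=0 is an assumption for this sketch; check the vocabulary file
    for the real id of the padding token.
    """
    ids = list(token_ids[:max_seq_length])   # truncate long sequences
    mask = [1] * len(ids)                    # 1 marks real tokens
    padding = max_seq_length - len(ids)      # 0 when already full length
    return ids + [pad_id] * padding, mask + [0] * padding


input_ids, input_mask = pad_or_truncate([101, 2023, 2003, 102], max_seq_length=6)
print(input_ids)   # padded with pad_id up to length 6
print(input_mask)  # zeros mark the padded positions
```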

`model_dropout` - *Float type*, default = -1 (i.e., no dropout). Dropout rate for both the attention and the hidden states.

### Training hyper-parameters
`train_batch_size` - *Integer type*, default = 32. 

`eval_batch_size` - *Integer type*, default = 8, base batch size for evaluation.

`save_checkpoints_num` - *Integer type*, set to 4, number of checkpoints to save during training.

`iterations_per_loop` - *Integer type*, default = 200, number of steps to make in each estimator call.

`num_train_steps` - *Integer type*, no default, number of training steps
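
As a rough worked example of how `save_checkpoints_num` and `num_train_steps` interact, the sketch below assumes the checkpoint interval is the integer quotient of the two. That mapping is an assumption about the training script, and `checkpoint_interval` is a hypothetical helper, not a function from the code base.

```python
def checkpoint_interval(num_train_steps, save_checkpoints_num):
    """Steps between checkpoints, ASSUMING interval = steps // checkpoints."""
    if save_checkpoints_num <= 0:
        raise ValueError("save_checkpoints_num must be positive")
    return max(1, num_train_steps // save_checkpoints_num)


# Settings taken from runs later in this notebook.
for steps, num_ckpt in [(9000, 6), (20000, 5), (18000, 15)]:
    print(steps, num_ckpt, checkpoint_interval(steps, num_ckpt))
```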

### Optimizer hyperparameters
`learning_rate` - *Float type*, default = 2e-5, the initial learning rate for the Adam optimizer.

`num_warmup_steps` - *Integer type*, no default, number of warmup steps

`clip_norm` - *Float type*, default = 1.0, the maximum gradient norm used for gradient clipping.
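
The optimizer options above can be illustrated with a small sketch. It assumes a BERT-style schedule (linear warmup followed by linear decay to zero) and clipping by the global gradient norm; whether `main.py` and `main_fix.py` follow exactly this recipe is an assumption, and both function names are hypothetical.

```python
import math


def learning_rate_at(step, init_lr=2e-5, num_train_steps=9000, num_warmup_steps=400):
    """Learning rate at `step`, assuming linear warmup then linear decay to 0."""
    if step < num_warmup_steps:
        return init_lr * step / num_warmup_steps
    return init_lr * max(0.0, 1.0 - step / num_train_steps)


def clip_by_global_norm(grads, clip_norm=1.0):
    """Rescale all gradients so their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    scale = clip_norm / max(global_norm, clip_norm)  # 1.0 when already small
    return [[g * scale for g in vec] for vec in grads], global_norm


for step in (0, 200, 400, 4500, 9000):
    print(step, learning_rate_at(step))

clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(norm, clipped)
```

Gradients whose global norm is already below `clip_norm` pass through unchanged, so the clip only affects unusually large updates.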

### UDA Options:
`unsup_ratio` - *Integer type*, the ratio between the unsupervised batch size and the supervised batch size. If zero, unsupervised data is not used.

`aug_ops` - *String type*, the augmentation operation to run, e.g. `tf_idf-0.9`.

`aug_copy` - *Integer type*, the number of augmented copies to generate per example.

`uda_coeff` - *Float type*, default = 1, the coefficient on the UDA loss. Raising or lowering it makes training rely more or less on the UDA loss relative to the supervised loss. The UDA paper generally kept this at 1.
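
A minimal sketch of how `unsup_ratio` and `uda_coeff` enter training, assuming the unsupervised batch is `unsup_ratio` times the supervised batch and the total loss is the supervised loss plus the weighted UDA loss. The helper names and the loss values are hypothetical.

```python
def unsup_batch_size(train_batch_size, unsup_ratio):
    """Unsupervised batch size implied by the supervised batch size and ratio."""
    return train_batch_size * unsup_ratio


def total_loss(sup_loss, unsup_loss, uda_coeff=1.0, unsup_ratio=3):
    """Supervised loss plus weighted UDA loss; the UDA term is off when ratio is 0."""
    if unsup_ratio == 0:
        return sup_loss
    return sup_loss + uda_coeff * unsup_loss


print(unsup_batch_size(32, 3))                 # default batch size, ratio used below
print(total_loss(0.5, 0.2, uda_coeff=1.0))     # hypothetical loss values
print(total_loss(0.5, 0.2, unsup_ratio=0))     # supervised-only training
```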

`tsa` - *String type*, the training signal annealing (TSA) schedule to use. Options are `""` (none), `linear_schedule`, `log_schedule`, and `exp_schedule`.
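
The three schedules can be sketched as follows, using the threshold formulas described in the UDA paper: the threshold grows from `1 / num_classes` to 1 over training, and a labeled example only contributes to the supervised loss while the model's probability for its correct class is at or below the threshold. The function names, the `scale=5.0` constant, and the use of 20 classes for the 20news task are assumptions of this sketch.

```python
import math


def tsa_threshold(schedule, step, num_train_steps, num_classes, scale=5.0):
    """TSA threshold at `step`, rising from 1/num_classes to 1."""
    progress = step / num_train_steps
    if schedule == "linear_schedule":
        alpha = progress
    elif schedule == "exp_schedule":
        alpha = math.exp((progress - 1.0) * scale)   # slow start, fast finish
    elif schedule == "log_schedule":
        alpha = 1.0 - math.exp(-progress * scale)    # fast start, slow finish
    else:
        raise ValueError("unknown TSA schedule: %r" % schedule)
    start = 1.0 / num_classes
    return alpha * (1.0 - start) + start


def keep_in_supervised_loss(correct_class_prob, threshold):
    """A labeled example is kept only while the model is not yet too confident."""
    return correct_class_prob <= threshold


for name in ("log_schedule", "linear_schedule", "exp_schedule"):
    print(name, round(tsa_threshold(name, 4500, 9000, num_classes=20), 3))
```

At the midpoint of training the log schedule has released most of the supervised signal, while the exp schedule is still holding most of it back.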

`uda_softmax_temp` - *Float type*, default = -1. The softmax temperature used to sharpen predictions; a smaller temperature accentuates differences between class probabilities. The UDA paper used low temperatures when few labeled examples were available, after masking out uncertain predictions.

`uda_confidence_thresh` - *Float type*, default = -1. The confidence threshold above which an unlabeled example contributes to the UDA consistency loss. This keeps the loss from being driven by low-confidence predictions that are close to random guesses.
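
The effect of the temperature and the confidence threshold can be sketched in plain Python. This is an illustration under assumptions, not the project's TensorFlow code: the function names are hypothetical, the logits are made up, and treating -1 as "disabled" is an assumption based on the defaults listed above.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits / temperature; temperatures below 1 sharpen the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_mask(probs, confidence_thresh):
    """1.0 if the example counts toward the consistency loss, else 0.0."""
    if confidence_thresh == -1:              # assumed to mean "no masking"
        return 1.0
    return 1.0 if max(probs) > confidence_thresh else 0.0


logits = [2.0, 1.0, 0.5]                     # hypothetical model outputs
plain = softmax(logits)
sharp = softmax(logits, temperature=0.5)
print([round(p, 3) for p in plain])
print([round(p, 3) for p in sharp])
print(consistency_mask(plain, 0.8), consistency_mask([0.05, 0.9, 0.05], 0.8))
```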

### TPU and GPU Options:
`use_tpu` - *Boolean type*, whether to use a TPU or a GPU/CPU.

`tpu_name` - *String type*, the address of the TPU.

`gcp_project` - *String type*, the project name when using a TPU.

`tpu_zone` - *String type*, can be set explicitly or detected automatically.

`master` - *String type*, the address of the TPU master, if applicable.

### Setup - Imports and Mounting Google Drive
Note that the current TensorFlow default in Colab is 2.x and this code runs on 1.15, so the `%tensorflow_version` magic must be run before `import tensorflow`.

In [1]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import json
import os
import collections
from absl import app
from absl import logging
%tensorflow_version 1.x 
import tensorflow as tf
#tf.enable_eager_execution()
print(tf.__version__)
import yaml
import pprint
import pandas as pd
import numpy as np
import gzip

# from google.colab import drive
# drive.mount('/content/drive')
drive_path = '/content/drive/My Drive/MLproj/'

TensorFlow 1.x selected.
1.15.2


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Additional Setup for TPU

In [3]:
os.listdir(drive_path)
if 'COLAB_TPU_ADDR' not in os.environ:
  print('ERROR: Not connected to a TPU runtime!')
  tpu_address = False
else:
  from google.colab import auth
  tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  print ('TPU address is', tpu_address)

# # Create a bucket in GCP to store datasets and models.
#   bucket_address = "gs://mlproj_0"
#   assert bucket_address, 'Must specify an existing GCS bucket name'
#   print('Using bucket: {}'.format(bucket_address))

#   auth.authenticate_user()
  
  # # Upload credentials to TPU.
  # with tf.Session(tpu_address) as sess:    
  #   with open('/content/adc.json', 'r') as f:
  #     auth_info = json.load(f)
  #   tf.contrib.cloud.configure_gcs(sess, credentials=auth_info)

  with tf.Session(tpu_address) as session:
    devices = session.list_devices()
    
  print('TPU devices:')
  pprint.pprint(devices)

TPU address is grpc://10.94.141.2:8470
TPU devices:
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 3043255326141030630),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 16263463755364228666),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 4934673428699722086),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 5293267956941169635),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 8847239818130559550),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 16386501669672467513),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 18238429429772722315),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 10942415255710021637),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 8916143593705

In [4]:
try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
  raise RuntimeError('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)

Running on TPU  ['10.94.141.2:8470']
INFO:tensorflow:Initializing the TPU system: 10.94.141.2:8470
INFO:tensorflow:Finished initializing TPU system.
INFO:tensorflow:Querying Tensorflow master (grpc://10.94.141.2:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 3043255326141030630)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 4934673428699722086)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 5293267956941169635)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 8847239818130559550)
INFO:tensorflow:*** Available Device: _DeviceAttributes(

In [5]:
# Create a bucket in GCP to store datasets and models.
bucket_address = "gs://mlproj_0"
assert bucket_address, 'Must specify an existing GCS bucket name'
print('Using bucket: {}'.format(bucket_address))

auth.authenticate_user()

Using bucket: gs://mlproj_0


The pretrained BERT model is downloaded, along with the back-translation model provided by the UDA code. The IMDb dataset is then converted to a more usable format with the `utils/imdb_format.py` script. Since the IMDb dataset is rather large and consists of a huge number of files, the conversion is done on the local machine. The resulting files, `test.csv` and `train.csv`, are used for training.
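
For orientation, the kind of conversion described above can be sketched with the standard library. This is not the `utils/imdb_format.py` script: the function name `imdb_to_csv`, the column names, and the comma-separated output are assumptions, and the sketch only relies on the raw archive's `<split>/<pos|neg>/*.txt` layout.

```python
import csv
import os


def imdb_to_csv(raw_dir, split, out_path):
    """Collect <raw_dir>/<split>/{pos,neg}/*.txt into one CSV; return the row count."""
    rows = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["content", "label", "id"])  # hypothetical column names
        for label in ("pos", "neg"):
            folder = os.path.join(raw_dir, split, label)
            for name in sorted(os.listdir(folder)):
                if not name.endswith(".txt"):
                    continue
                with open(os.path.join(folder, name), encoding="utf-8") as review:
                    text = review.read().strip()
                writer.writerow([text, label, name[:-4]])  # id = file name without .txt
                rows += 1
    return rows
```

Calling `imdb_to_csv(raw_dir, "train", "train.csv")` and `imdb_to_csv(raw_dir, "test", "test.csv")` would then yield one file per split, which is far easier to move around than tens of thousands of small text files.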

In [None]:
OUTPUT_DIR = 'gs://mlproj_0/bert_pretrained'
# tf.gfile.MakeDirs(OUTPUT_DIR)
# ! mkdir gs://mlproj_0/bert_pretrained
# # download bert base
# ! wget -P gs://mlproj_0/bert_pretrained https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
# ! unzip gs://mlproj_0/bert_pretrained/uncased_L-12_H-768_A-12.zip 
# ! rm gs://mlproj_0/bert_pretrained/uncased_L-12_H-768_A-12.zip
# ! mkdir gs://mlproj_0/bert_pretrained/bert_base
# ! mv uncased_L-12_H-768_A-12 gs://mlproj_0/bert_pretrained/bert_base
# ! mv gs://mlproj_0/bert_pretrained/bert_base/uncased*/* gs://mlproj_0/bert_pretrained/bert_base/
# os.listdir(drive_path + '/bert_pretrained/bert_base/')

In [None]:
#!mkdir /content/drive/My\ Drive/W266/Project/data
#!mkdir /content/drive/My\ Drive/W266/Project/data/back_translation
#!wget -P /content/drive/My\ Drive/W266/Project/data/back_translation https://storage.googleapis.com/uda_model/text/imdb_back_trans.zip
#!unzip /content/drive/My\ Drive/W266/Project/data/back_translation/imdb_back_trans.zip 
#!mkdir /content/drive/My\ Drive/W266/Project/data/IMDB_raw
#!wget -P /content/drive/My\ Drive/W266/Project/data/IMDB_raw https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
# !tar xzvf /content/drive/My\ Drive/MLproj/data/IMDB_raw/aclImdb_v1.tar.gz -C /content/drive/My\ Drive/W266/Project/data/IMDB_raw/

#!python /content/drive/My\ Drive/W266/Project/utils/imdb_format.py --raw_data_dir=/content/drive/My\ Drive/W266/Project/data/IMDB_raw/AcLImdb --train_id_path=/content/drive/My\ Drive/W266/Project/data/IMDB_raw/train_id_list.txt --output_dir=/content/drive/My\ Drive/W266/Project/data/IMDB_raw/csv

In [None]:
print(bucket_address)
print(tpu_address)

gs://mlproj_0
grpc://10.67.184.146:8470


In [None]:
!python3 /content/drive/My\ Drive/MLproj/preprocess.py \
  --task_name=20news \
  --raw_data_dir=gs://mlproj_0/data/proc_data/20news \
  --output_base_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --data_type=unsup \
  --sub_set=unsup_in \
  --aug_ops=tf_idf-0.7 \
  --aug_copy_num=0 \
  --max_seq_length=128 \
  --vocab_file=/content/drive/My\ Drive/MLproj/pretrained_models/bert_base/vocab.txt \


# Running experiments

FixMatch with SGD, 9000 training steps

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_9_fix_SGD \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=400 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \




W0506 00:28:30.478863 140506364635008 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0506 00:28:30.479045 140506364635008 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0506 00:28:30.479209 140506364635008 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0506 00:28:30.677878 140506364635008 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:188: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0506 00:28:31.141680 140506364635008 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:191: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFil

FixMatch with SGD, next 9000 training steps

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_9_fix_SGD \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=400 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \




W0506 03:03:17.925849 139876287694720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0506 03:03:17.926016 139876287694720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0506 03:03:17.926177 139876287694720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0506 03:03:18.480052 139876287694720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:188: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0506 03:03:18.569794 139876287694720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:191: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFil

FixMatch with SGD, continued up to 20000 training steps

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=5 \
  --model_dir=gs://mlproj_0/ckpt/news_9_fix_SGD \
  --num_train_steps=20000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=400 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \




W0506 05:10:48.026136 139978373556096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0506 05:10:48.026318 139978373556096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0506 05:10:48.026485 139978373556096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0506 05:10:48.239673 139978373556096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:188: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0506 05:10:48.323203 139978373556096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:191: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFil

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=False \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=4 \
  --model_dir=gs://mlproj_0/ckpt/base_uda \
  --num_train_steps=6000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=400 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.7 \
  --aug_copy=1 \
  --uda_coeff=1 \




W0503 23:10:14.022547 140335273072512 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0503 23:10:14.022789 140335273072512 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0503 23:10:14.023034 140335273072512 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0503 23:10:14.278149 140335273072512 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0503 23:10:14.373395 140335273072512 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.


W05

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=4 \
  --model_dir=gs://mlproj_0/ckpt/news_9 \
  --num_train_steps=6000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=400 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \
# --train_batch_size=8 \
# --train_batch_size=4 \




W0503 20:03:06.307394 140071658047360 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0503 20:03:06.307621 140071658047360 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0503 20:03:06.307824 140071658047360 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0503 20:03:06.500219 140071658047360 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0503 20:03:07.053009 140071658047360 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.


W05

FixMatch

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=4 \
  --model_dir=gs://mlproj_0/ckpt/news_9_test \
  --num_train_steps=6000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=400 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \
# --train_batch_size=8 \
# --train_batch_size=4 \




W0505 10:01:44.750872 139810620811136 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0505 10:01:44.751105 139810620811136 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0505 10:01:44.751343 139810620811136 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0505 10:01:44.986563 139810620811136 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:188: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0505 10:01:45.082596 139810620811136 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:191: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFil

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_9_fix \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=400 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \




W0505 20:10:45.048820 140006851504000 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0505 20:10:45.049064 140006851504000 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0505 20:10:45.049322 140006851504000 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0505 20:10:45.215125 140006851504000 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:188: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0505 20:10:45.297553 140006851504000 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:191: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFil

Test on 20news with `tf_idf-0.9`: unsupervised learning, FixMatch with SGD

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_9_fix \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=400 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=5 \
  --model_dir=gs://mlproj_0/ckpt/news_9_fix \
  --num_train_steps=5000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \




W0505 06:21:48.871197 139970328373120 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0505 06:21:48.871814 139970328373120 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0505 06:21:48.872291 139970328373120 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0505 06:21:49.146124 139970328373120 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:188: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0505 06:21:49.277007 139970328373120 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:191: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFil

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=5 \
  --model_dir=gs://mlproj_0/ckpt/news_9_fix_test \
  --num_train_steps=5000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \




W0505 09:21:12.169082 140534490830720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0505 09:21:12.169289 140534490830720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:178: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0505 09:21:12.169525 140534490830720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0505 09:21:12.380596 140534490830720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:188: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0505 09:21:12.472637 140534490830720 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main_fix.py:191: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFil

# 20news with TF-IDF 0.6

Preprocess 20news with `tf_idf-0.6`

In [None]:
!python3 /content/drive/My\ Drive/MLproj/preprocess.py \
  --task_name=20news \
  --raw_data_dir=gs://mlproj_0/data/proc_data/20news \
  --output_base_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --data_type=unsup \
  --sub_set=unsup_in \
  --aug_copy_num=0 \
  --max_seq_length=128 \
  --vocab_file=/content/drive/My\ Drive/MLproj/pretrained_models/bert_base/vocab.txt \
  --aug_ops=tf_idf-0.6 \


Run 20news with `tf_idf-0.6`

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=15 \
  --model_dir=gs://mlproj_0/ckpt/news_tf6 \
  --num_train_steps=18000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.6 \
  --aug_copy=1 \
  --uda_coeff=1 \




W0508 08:41:36.447047 140571949410176 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0508 08:41:36.447339 140571949410176 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0508 08:41:36.447552 140571949410176 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0508 08:41:36.586365 140571949410176 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0508 08:41:37.030520 140571949410176 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.


W05

# 20news with back translation (bt)

Preprocess 20news for back translation

In [None]:
!python3 /content/drive/My\ Drive/MLproj/preprocess1.py \
  --task_name=20news \
  --raw_data_dir=gs://mlproj_0/data/proc_data/20news \
  --output_base_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --data_type=unsup \
  --sub_set=unsup_fix \
  --aug_copy_num=0 \
  --max_seq_length=128 \
  --vocab_file=/content/drive/My\ Drive/MLproj/pretrained_models/bert_base/vocab.txt \
  --aug_ops=tf-0.9


W0507 16:22:22.214705 140573843003264 module_wrapper.py:139] From /content/drive/My Drive/MLproj/utils/tokenization.py:40: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0507 16:22:22.284399 140573843003264 module_wrapper.py:139] From /content/drive/My Drive/MLproj/preprocess1.py:583: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

INFO:tensorflow:Create unsup. data: subset unsup_fix => gs://mlproj_0/data/proc_data/20news/unsup_bt/tf-0.9/0
I0507 16:22:22.284620 140573843003264 preprocess1.py:584] Create unsup. data: subset unsup_fix => gs://mlproj_0/data/proc_data/20news/unsup_bt/tf-0.9/0
INFO:tensorflow:random seed: 27227
I0507 16:22:22.284818 140573843003264 preprocess1.py:471] random seed: 27227
INFO:tensorflow:getting examples
I0507 16:22:22.284952 140573843003264 preprocess1.py:473] getting examples


INFO:tensorflow:finished tokenizing example 10000
I0507 16:22:47.945415 140573843003264 preprocess1.py:164] finished tokenizing example 10000
INFO:tensorflow:number of examples to process: 11314
I0507 16:22:50.449308 140573843003264 preprocess1.py:177] number of examples to process: 11314
INFO:tensorflow:processing 0
I0507 16:22:50.449520 140573843003264 preprocess1.py:190] processing 0
INFO:tensorflow:*** Example ***
I0507 16:22:50.449880 140573843003264 preprocess1.py:265] *** Example ***
INFO:tensorflow:guid: unsup_in-0
I0507 16:22:50.449943 140573843003264 preprocess1.py:266] guid: unsup_in-0
INFO:tensorflow:tokens: [CLS] i was wondering if anyone out there could en ##light ##en me on this car i saw the other day . it was a 2 - door sports car , looked to be from the late 60s / early 70s . it was called a brick ##lin . the doors were really small . in addition , the front bumper was separate from the rest of the body . this is all i know . if anyone can tell ##me a model name , eng
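
The logged tokens are WordPiece units, where a `##` prefix marks a continuation of the previous token. A small helper such as the following (hypothetical, not part of `preprocess1.py`) can merge them back into words when inspecting examples by eye:

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def merge_wordpieces(tokens):
    """Join '##' continuation pieces onto the preceding token and drop special tokens."""
    words = []
    for token in tokens:
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return words


tokens = "[CLS] i was wondering if anyone out there could en ##light ##en me".split()
print(" ".join(merge_wordpieces(tokens)))
# prints: i was wondering if anyone out there could enlighten me
```
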

20news bt preprocess tokenized



INFO:tensorflow:Create unsup. data: subset unsup_in => gs://mlproj_0/data/proc_data/20news/unsup_bt/tf-0.1/0
I0508 03:19:29.861371 140685010528128 preprocess1.py:585] Create unsup. data: subset unsup_in => gs://mlproj_0/data/proc_data/20news/unsup_bt/tf-0.1/0
INFO:tensorflow:random seed: 81295
I0508 03:19:29.861531 140685010528128 preprocess1.py:471] random seed: 81295
INFO:tensorflow:getting examples
I0508 03:19:29.861604 140685010528128 preprocess1.py:473] getting examples


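The "Create unsup. data" log lines suggest that the unsupervised output lands in `<output_base_dir>/<aug_ops>/<aug_copy_num>`, and that each `--aug_ops` value is an operation name followed by a number. The sketch below reproduces that layout under those assumptions. The helper names are made up for illustration, and reading the number as a probability-like strength is an assumption rather than something the logs confirm.

```python
import posixpath


def parse_aug_op(aug_ops):
    """Split a value such as 'tf_idf-0.6' into ('tf_idf', 0.6)."""
    name, sep, value = aug_ops.rpartition("-")
    if not sep or not name:
        raise ValueError("expected '<name>-<number>', got %r" % aug_ops)
    return name, float(value)


def unsup_output_dir(output_base_dir, aug_ops, aug_copy_num):
    """Build the directory layout seen in the preprocessing logs."""
    return posixpath.join(output_base_dir, aug_ops, str(aug_copy_num))


print(parse_aug_op("unif-0.7"))
print(unsup_output_dir("gs://mlproj_0/data/proc_data/20news/unsup_bt", "tf-0.1", 0))
```
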
20news bt preprocess unif

In [None]:
!python3 /content/drive/My\ Drive/MLproj/preprocess1.py \
  --task_name=20news \
  --raw_data_dir=gs://mlproj_0/data/proc_data/20news \
  --output_base_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --data_type=unsup \
  --sub_set=unsup_in \
  --aug_ops=unif-0.7 \
  --aug_copy_num=0 \
  --max_seq_length=128 \
  --vocab_file=/content/drive/My\ Drive/MLproj/pretrained_models/bert_base/vocab.txt


20news bt preprocess test

In [None]:
!python3 /content/drive/My\ Drive/MLproj/preprocess.py \
  --task_name=20news \
  --raw_data_dir=gs://mlproj_0/data/20news \
  --output_base_dir=gs://mlproj_0/data/proc_data/20news/dev1 \
  --data_type=sup \
  --sub_set=dev \
  --max_seq_length=128 \
  --vocab_file=/content/drive/My\ Drive/MLproj/pretrained_models/bert_base/vocab.txt


20news bt preprocess train

In [None]:
!python3 /content/drive/My\ Drive/MLproj/preprocess.py \
  --task_name=20news \
  --raw_data_dir=gs://mlproj_0/data/20news \
  --output_base_dir=gs://mlproj_0/data/proc_data/20news/train_25_2 \
  --data_type=sup \
  --sub_set=train \
  --sup_size=500 \
  --max_seq_length=128 \
  --vocab_file=/content/drive/My\ Drive/MLproj/pretrained_models/bert_base/vocab.txt


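The `--sup_size=500` flag, together with the `train_25` directory names and the 20 newsgroup classes, suggests a labeled set of 25 examples per class. A minimal sketch of such a class-balanced draw, assuming `(text, label)` pairs and a hypothetical `sample_balanced` helper (the real sampling logic lives in `preprocess.py` and may differ):

```python
import random
from collections import defaultdict


def sample_balanced(examples, sup_size, num_classes, seed=0):
    """Draw sup_size // num_classes examples from every class."""
    per_class = sup_size // num_classes
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    subset = []
    for label in sorted(by_label):
        # random.sample raises ValueError if a class has too few examples.
        subset.extend(rng.sample(by_label[label], per_class))
    return subset


# Toy data: 20 classes with 30 documents each.
toy = [("doc %d-%d" % (c, i), c) for c in range(20) for i in range(30)]
subset = sample_balanced(toy, sup_size=500, num_classes=20)
print(len(subset))
```
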
20news bt old translate 0.67

20news bt new translate (without EU commission)

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=10 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_2 \
  --num_train_steps=15000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1





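`--uda_coeff=1` weights the unsupervised consistency term against the supervised loss. As described in the UDA paper, that term is a KL divergence between the model's prediction on an unlabeled example and its prediction on the augmented copy. A plain-Python sketch for a single example follows; the function names are illustrative, and the confidence masking and prediction sharpening that the paper also describes are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def uda_loss(sup_loss, ori_logits, aug_logits, uda_coeff=1.0):
    """Supervised loss plus the weighted consistency term for one unlabeled example."""
    consistency = kl_divergence(softmax(ori_logits), softmax(aug_logits))
    return sup_loss + uda_coeff * consistency


# Identical predictions add nothing; diverging predictions add a penalty.
print(uda_loss(0.7, [2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))
print(uda_loss(0.7, [2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))
```
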
unif-0.7 20news bt new translate (without EU commission)

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=10 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_tonken \
  --num_train_steps=15000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=unif-0.7 \
  --aug_copy=1 \
  --uda_coeff=1





In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=10 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_tonken \
  --num_train_steps=15000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=0 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=unif-0.7 \
  --aug_copy=1 \
  --uda_coeff=1





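The run above repeats the `unif-0.7` configuration with `--num_warmup_steps=0` instead of 500. Assuming the optimizer follows the usual BERT fine-tuning schedule (linear warmup followed by linear decay to zero, which is an assumption about `main.py` rather than something shown here), the effect on the learning rate can be sketched as:

```python
def learning_rate_at(step, base_lr, num_train_steps, num_warmup_steps):
    """Linear warmup to base_lr, then linear decay to zero at num_train_steps."""
    if num_warmup_steps > 0 and step < num_warmup_steps:
        return base_lr * step / num_warmup_steps
    return base_lr * (1.0 - step / num_train_steps)


# Compare the first steps with and without warmup.
for step in (0, 250, 500, 7500):
    with_warmup = learning_rate_at(step, 2e-5, 15000, 500)
    without_warmup = learning_rate_at(step, 2e-5, 15000, 0)
    print(step, with_warmup, without_warmup)
```

Under this assumption, removing warmup only changes the first 500 steps: training starts at the full rate of 2e-05 instead of ramping up from zero.
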
20news bt new translate (without EU commission) and old train/test, new tokenized unsup

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=False \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=10 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_tonken \
  --num_train_steps=15000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.1 \
  --aug_copy=1 \
  --uda_coeff=1





20news bt new translate (without EU commission) and new train with new test

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25_2 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev1 \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=10 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_newtrain \
  --num_train_steps=15000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1





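Several runs vary `--save_checkpoints_num` together with `--num_train_steps`. Assuming checkpoints are spaced evenly over training (an assumption about how `main.py` interprets the flag), the interval and the resulting checkpoint steps work out as follows. Both helpers are hypothetical.

```python
def checkpoint_interval(num_train_steps, save_checkpoints_num):
    """Number of steps between checkpoints when they are spaced evenly."""
    if save_checkpoints_num <= 0:
        raise ValueError("save_checkpoints_num must be positive")
    return max(1, num_train_steps // save_checkpoints_num)


def checkpoint_steps(num_train_steps, save_checkpoints_num):
    """List the global steps at which a checkpoint would be written."""
    interval = checkpoint_interval(num_train_steps, save_checkpoints_num)
    return list(range(interval, num_train_steps + 1, interval))


# Settings used by runs in this notebook.
for steps, count in ((15000, 10), (9000, 6), (6000, 6), (5000, 4)):
    print(steps, count, checkpoint_interval(steps, count))
```
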
Evaluate with new test

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=False \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev1 \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_1 \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1





In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=False \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_1 \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1





Train with old test

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_1 \
  --num_train_steps=6000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=600 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1





In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_1 \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=600 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1





20news bt new translate (without EU commission) fix adam

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_fix \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main_fix.py \
  --use_tpu=True \
  --do_train=False \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/20news/unsup_bt \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev1 \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=6 \
  --model_dir=gs://mlproj_0/ckpt/news_bt_fix \
  --num_train_steps=9000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --unsup_ratio=3 \
  --tsa=linear_schedule \
  --aug_ops=tf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1





Baseline

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=4 \
  --model_dir=gs://mlproj_0/ckpt/news_all \
  --num_train_steps=5000 \
  --learning_rate=3e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address





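Among other differences, the baseline omits the unsupervised flags used by the UDA runs (`--unsup_data_dir`, `--unsup_ratio`, `--tsa`, `--aug_ops`, `--aug_copy`, `--uda_coeff`). One way to make such differences explicit, sketched here with a hypothetical `build_command` helper and only a subset of the flags, is to keep shared settings in a dictionary and layer per-run overrides on top:

```python
import shlex

SCRIPT = "/content/drive/My Drive/MLproj/main.py"

# Flags shared by the 20news runs (subset, for illustration).
SHARED_FLAGS = {
    "use_tpu": True,
    "do_train": True,
    "do_eval": True,
    "task_name": "20news",
    "max_seq_length": 128,
    "num_warmup_steps": 500,
}

# Flags that only the UDA runs pass.
UDA_FLAGS = {
    "unsup_ratio": 3,
    "tsa": "linear_schedule",
    "aug_ops": "tf-0.9",
    "aug_copy": 1,
    "uda_coeff": 1,
}


def build_command(script, *flag_sets):
    """Merge flag dictionaries (later ones win) into an argument list."""
    merged = {}
    for flags in flag_sets:
        merged.update(flags)
    return ["python3", script] + ["--%s=%s" % (k, v) for k, v in merged.items()]


def as_shell(args):
    """Render an argument list as a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in args)


baseline = build_command(SCRIPT, SHARED_FLAGS,
                         {"learning_rate": "3e-05", "num_train_steps": 5000})
uda = build_command(SCRIPT, SHARED_FLAGS, UDA_FLAGS,
                    {"learning_rate": "2e-05", "num_train_steps": 15000})
print(as_shell(baseline))
print(as_shell(uda))
```

The resulting argument list can be passed to `subprocess.run` or pasted into a `!` cell. The data and checkpoint paths would be added the same way.
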
Baseline with new train

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/20news/train_25_1 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/20news/dev1 \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=20news \
  --save_checkpoints_num=4 \
  --model_dir=gs://mlproj_0/ckpt/news_all_new \
  --num_train_steps=5000 \
  --learning_rate=3e-05 \
  --num_warmup_steps=500 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address





In [None]:
!python /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_1000 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup5 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t1000_u50k_tf9 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --num_train_steps=600 \
  --learning_rate=2e-05 \
  --num_warmup_steps=60 \
  --save_checkpoints_num=4 \
  --unsup_ratio=3 \
  --train_batch_size=8 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.8 \
  --aug_copy=1 \
  --uda_coeff=1





In [None]:
!python /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_1000 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup5 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t1000_u50k_tf9 \
  --max_seq_length=128 \
  --tpu_name=$tpu_address \
  --num_train_steps=3700 \
  --learning_rate=2e-05 \
  --num_warmup_steps=370 \
  --save_checkpoints_num=2 \
  --unsup_ratio=3 \
  --train_batch_size=8 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.8 \
  --aug_copy=1 \
  --uda_coeff=1




W0502 12:07:11.231901 140286837073792 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0502 12:07:11.232121 140286837073792 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0502 12:07:11.232339 140286837073792 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0502 12:07:11.534437 140286837073792 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0502 12:07:11.640637 140286837073792 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.
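The command for this run sets `num_warmup_steps` to one tenth of `num_train_steps` (370 of 3700), and the other training cells in this section keep the same ratio. The sketch below derives both the warmup length and a checkpoint interval from the total step count. The helper name `schedule_from_steps` is hypothetical, and the even checkpoint spacing is an assumption about how `main.py` interprets `save_checkpoints_num`, not something confirmed by the logs.

```python
def schedule_from_steps(num_train_steps, save_checkpoints_num, warmup_fraction=0.1):
    """Derive warmup length and checkpoint spacing from the total step count."""
    # The training cells in this section use a 10% warmup.
    num_warmup_steps = round(num_train_steps * warmup_fraction)
    # Assumption: checkpoints are spread evenly over the run.
    save_every = num_train_steps // save_checkpoints_num
    return num_warmup_steps, save_every


# Values taken from the train_1000 run (3700 steps, 2 checkpoints).
warmup, save_every = schedule_from_steps(3700, 2)
print("warmup steps:", warmup, "checkpoint every:", save_every, "steps")
```

Computing the two values from one number keeps them consistent when only `num_train_steps` changes between runs.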



train_40, unsup 80k

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_40 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup8 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t40_u80k_tf9 \
  --max_seq_length=128 \
  --tpu_name=$TPU_NAME \
  --num_train_steps=10000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=1000 \
  --save_checkpoints_num=10 \
  --unsup_ratio=3 \
  --train_batch_size=8 \
  --eval_batch_size=32 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1




W0502 15:00:04.163624 140088081844096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0502 15:00:04.163822 140088081844096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0502 15:00:04.164000 140088081844096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0502 15:00:04.595355 140088081844096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0502 15:00:04.677976 140088081844096 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.
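To judge whether 10000 steps is enough for this setting, it helps to estimate how often the unlabeled set is traversed. The sketch below assumes that each step draws `train_batch_size * unsup_ratio` unlabeled examples and that the `unsup8` directory holds roughly 80,000 examples, as the "unsup 80k" label suggests. Both points are assumptions, and the function name is hypothetical.

```python
def unlabeled_passes(num_train_steps, train_batch_size, unsup_ratio, num_unsup_examples):
    """Return the unlabeled batch size and the number of passes over the unlabeled set."""
    # Assumption: the unlabeled batch is unsup_ratio times the labeled batch.
    unsup_batch_size = train_batch_size * unsup_ratio
    examples_drawn = num_train_steps * unsup_batch_size
    return unsup_batch_size, examples_drawn / num_unsup_examples


# Flags from the train_40 / unsup 80k run: 10000 steps, batch 8, ratio 3.
batch, passes = unlabeled_passes(10000, 8, 3, 80_000)
print("unlabeled batch size:", batch, "passes over unlabeled data:", passes)
```

Under these assumptions the run sees each unlabeled example about three times, while the 40 labeled examples are each revisited about 2000 times (10000 steps × 8 per batch ÷ 40).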



In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_40 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup8 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t40_u80k_tf9 \
  --max_seq_length=128 \
  --tpu_name=$TPU_NAME \
  --num_train_steps=11000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=1100 \
  --save_checkpoints_num=10 \
  --unsup_ratio=3 \
  --train_batch_size=8 \
  --eval_batch_size=32 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1




W0502 17:43:01.807129 140460146300800 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0502 17:43:01.807330 140460146300800 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0502 17:43:01.807520 140460146300800 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0502 17:43:02.080708 140460146300800 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0502 17:43:02.194985 140460146300800 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.
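The `--tsa=linear_schedule` flag selects the linear variant of Training Signal Annealing from the UDA paper. At step t of T, labeled examples whose predicted probability for the correct class exceeds a threshold are left out of the supervised loss, and the threshold grows from 1/K to 1 for K classes. The following is a minimal sketch of the schedules as described in the paper. The function name and the `scale=5` default are written here for illustration and may differ from the released code.

```python
import math


def tsa_threshold(step, num_train_steps, num_classes, schedule="linear_schedule", scale=5.0):
    """Threshold eta_t = alpha_t * (1 - 1/K) + 1/K for Training Signal Annealing."""
    progress = step / num_train_steps
    if schedule == "linear_schedule":
        alpha = progress
    elif schedule == "exp_schedule":
        alpha = math.exp((progress - 1.0) * scale)
    elif schedule == "log_schedule":
        alpha = 1.0 - math.exp(-progress * scale)
    else:
        raise ValueError("unknown schedule: {}".format(schedule))
    start = 1.0 / num_classes
    return alpha * (1.0 - start) + start


# AG News has four classes, so the threshold starts at 0.25.
for step in (0, 5500, 11000):
    print(step, tsa_threshold(step, 11000, num_classes=4))
```

With only 40 labeled examples, a slowly rising threshold keeps the model from fitting the labeled set long before the consistency loss has had an effect.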



In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_40 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup8 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t40_u80k_tf9 \
  --max_seq_length=128 \
  --tpu_name=$TPU_NAME \
  --num_train_steps=20000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=2000 \
  --save_checkpoints_num=10 \
  --unsup_ratio=3 \
  --train_batch_size=8 \
  --eval_batch_size=32 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1




W0502 18:49:39.696171 140088662022016 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0502 18:49:39.696353 140088662022016 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0502 18:49:39.696531 140088662022016 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0502 18:49:39.872024 140088662022016 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0502 18:49:39.954168 140088662022016 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.
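The three train_40 / unsup 80k cells differ only in `num_train_steps` and `num_warmup_steps`. One way to avoid copying the whole command is to keep the shared flags in a dictionary and override the values that change. This is a sketch only: `BASE_FLAGS` and `build_command` are hypothetical names, and only a subset of the flags is listed.

```python
import shlex

# Shared flags copied from the cells in this section (subset shown).
BASE_FLAGS = {
    "use_tpu": True,
    "do_train": True,
    "do_eval": True,
    "task_name": "ag1000",
    "model_dir": "gs://mlproj_0/ckpt/uda_t40_u80k_tf9",
    "max_seq_length": 128,
    "learning_rate": "2e-05",
    "unsup_ratio": 3,
    "train_batch_size": 8,
    "tsa": "linear_schedule",
    "aug_ops": "tf_idf-0.9",
}


def build_command(script, base_flags, **overrides):
    """Merge overrides into the base flags and return a shell-safe command string."""
    flags = {**base_flags, **overrides}
    args = ["python3", script]
    args += ["--{}={}".format(name, value) for name, value in flags.items()]
    # shlex.quote protects the space in the Drive path.
    return " ".join(shlex.quote(arg) for arg in args)


cmd = build_command(
    "/content/drive/My Drive/MLproj/main.py",
    BASE_FLAGS,
    num_train_steps=20000,
    num_warmup_steps=2000,
)
print(cmd)
```

In a notebook cell the resulting string can be run with `!{cmd}`, so each experiment states only the flags that differ from the baseline.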



train_40, unsup all

In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_40 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t40_uall_tf9 \
  --max_seq_length=128 \
  --tpu_name=$TPU_NAME \
  --num_train_steps=40000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=4000 \
  --save_checkpoints_num=16 \
  --unsup_ratio=3 \
  --eval_batch_size=32 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \
  --train_batch_size=8




W0506 10:08:47.362271 139709988710272 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0506 10:08:47.362457 139709988710272 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0506 10:08:47.362630 139709988710272 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0506 10:08:47.513147 139709988710272 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0506 10:08:47.594380 139709988710272 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.
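The `--uda_coeff=1` flag weights the unsupervised consistency term against the supervised loss. In the UDA paper the objective is the supervised cross-entropy plus a coefficient times the KL divergence between the prediction on an unlabeled example and the prediction on its augmented version. The sketch below shows that combination for a single example in plain Python. The function names are hypothetical, and the real training code averages over a batch and does not propagate gradients through the prediction on the original example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))


def uda_total_loss(sup_loss, probs_original, probs_augmented, uda_coeff=1.0):
    """Supervised loss plus the weighted consistency loss for one unlabeled example."""
    consistency = kl_divergence(probs_original, probs_augmented)
    return sup_loss + uda_coeff * consistency


# Four-class example: the augmented input yields a flatter prediction.
original = [0.7, 0.1, 0.1, 0.1]
augmented = [0.4, 0.2, 0.2, 0.2]
print(uda_total_loss(0.5, original, augmented, uda_coeff=1.0))
```

When the two predictions agree, the consistency term is zero and only the supervised loss remains.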



In [None]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=True \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_40 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t40_uall_tf9 \
  --max_seq_length=128 \
  --tpu_name=$TPU_NAME \
  --num_train_steps=40000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=4000 \
  --save_checkpoints_num=16 \
  --unsup_ratio=3 \
  --eval_batch_size=32 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \
  --train_batch_size=8

Streaming output truncated to the last 5000 lines.
I0506 17:11:46.565225 140270893934336 tpu_estimator.py:551] Outfeed thread finished, shutting down.
INFO:tensorflow:outfeed marked as finished
I0506 17:11:46.565469 140275764799360 error_handling.py:101] outfeed marked as finished
INFO:tensorflow:Shutdown TPU system.
I0506 17:11:46.565661 140275764799360 tpu_estimator.py:616] Shutdown TPU system.
INFO:tensorflow:Loss for final step: 0.068892.
I0506 17:11:48.374086 140275764799360 estimator.py:371] Loss for final step: 0.068892.
INFO:tensorflow:training_loop marked as finished
I0506 17:11:48.374377 140275764799360 error_handling.py:101] training_loop marked as finished
INFO:tensorflow:*** Running evaluation ***
I0506 17:11:48.374493 140275764799360 main.py:288] *** Running evaluation ***
INFO:tensorflow:Calling model_fn.
I0506 17:11:48.549910 140275764799360 estimator.py:1148] Calling model_fn.
W0506 17:11:48.573835 140275764799360 ag_logging.py:146] Entity <function get_e

In [8]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=False \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_1000 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup5 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t1000_u50k_tf9 \
  --max_seq_length=128 \
  --tpu_name=$TPU_NAME \
  --num_train_steps=40000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=4000 \
  --save_checkpoints_num=16 \
  --unsup_ratio=3 \
  --eval_batch_size=32 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.9 \
  --aug_copy=1 \
  --uda_coeff=1 \
  --train_batch_size=8




W0509 08:53:16.196018 139932010006400 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0509 08:53:16.196206 139932010006400 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0509 08:53:16.196375 139932010006400 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0509 08:53:16.353045 139932010006400 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0509 08:53:16.404841 139932010006400 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.



In [9]:
!python3 /content/drive/My\ Drive/MLproj/main.py \
  --use_tpu=True \
  --do_train=False \
  --do_eval=True \
  --sup_train_data_dir=gs://mlproj_0/data/proc_data/AG1000/train_1000 \
  --unsup_data_dir=gs://mlproj_0/data/proc_data/AG1000/unsup4 \
  --eval_data_dir=gs://mlproj_0/data/proc_data/AG1000/dev \
  --bert_config_file=gs://mlproj_0/bert_pretrained/bert_base/bert_config.json \
  --vocab_file=gs://mlproj_0/bert_pretrained/bert_base/vocab.txt \
  --init_checkpoint=gs://mlproj_0/bert_pretrained/bert_base/bert_model.ckpt \
  --task_name=ag1000 \
  --model_dir=gs://mlproj_0/ckpt/uda_t1000_u40k_tf8 \
  --max_seq_length=128 \
  --tpu_name=$TPU_NAME \
  --num_train_steps=40000 \
  --learning_rate=2e-05 \
  --num_warmup_steps=4000 \
  --save_checkpoints_num=16 \
  --unsup_ratio=3 \
  --eval_batch_size=32 \
  --tsa=linear_schedule \
  --aug_ops=tf_idf-0.8 \
  --aug_copy=1 \
  --uda_coeff=1 \
  --train_batch_size=8




W0509 08:59:04.228083 140004943918976 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.


W0509 08:59:04.228288 140004943918976 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:177: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.


W0509 08:59:04.228455 140004943918976 module_wrapper.py:139] From /content/drive/My Drive/MLproj/bert/modeling.py:91: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.


W0509 08:59:04.434479 140004943918976 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:187: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.


W0509 08:59:04.522319 140004943918976 module_wrapper.py:139] From /content/drive/My Drive/MLproj/main.py:190: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.



In [None]:
# from typing import Sequence

# from google.cloud import storage


# def set_bucket_public_iam(
#     bucket_name: str = "mlproj_0",  # bucket name only, without the gs:// prefix
#     members: Sequence[str] = ("allUsers",),  # tuple avoids a mutable default
# ):
#     """Grant read-only (objectViewer) access on the bucket to `members`."""
#     storage_client = storage.Client()
#     bucket = storage_client.bucket(bucket_name)

#     # Request policy version 3 so that conditional bindings are preserved.
#     policy = bucket.get_iam_policy(requested_policy_version=3)
#     policy.bindings.append(
#         {"role": "roles/storage.objectViewer", "members": list(members)}
#     )

#     bucket.set_iam_policy(policy)

#     print("Bucket {} is now publicly readable".format(bucket.name))


# set_bucket_public_iam()