This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

ICML 2021 - Voice2Series: Reprogramming Acoustic Models for Time Series Classification






Voice2Series: Reprogramming Acoustic Models for Time Series Classification

  • We provide an end-to-end approach (Repro. layer) that reprograms time series data as raw waveforms, using a differentiable mel-spectrogram layer from kapre.

  • No offline acoustic feature extraction is needed, and all layers are differentiable.


TensorFlow 2.2 (CUDA=10.0) and Kapre 0.2.0.

  • PyTorch note: In response to interest from the community, we will also provide PyTorch V2S layers and frameworks around this September, incorporating the new torchaudio layers. Feel free to email the authors about further reprogramming collaboration.

  • option 1 (from yml)

conda env create -f V2S.yml
  • option 2 (from clean python 3.6)
pip install tensorflow-gpu==2.1.0
pip install kapre==0.2.0
pip install h5py==2.10.0


  • Random Mapping

Please also check the paper for actual validation details. Many Thanks!

python --dataset 0 --eps 5 --mod 0
  • Result
seg idx: 1 --> start: 5000, end: 5500
seg idx: 2 --> start: 10000, end: 10500
Tensor("AddV2_2:0", shape=(None, 16000, 1), dtype=float32)
--- Preparing Masking Matrix
Model: "model_1"
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 500, 1)]     0                                            
zero_padding1d (ZeroPadding1D)  (None, 16000, 1)     0           input_1[0][0]                    
tf_op_layer_AddV2 (TensorFlowOp [(None, 16000, 1)]   0           zero_padding1d[0][0]             
zero_padding1d_1 (ZeroPadding1D (None, 16000, 1)     0           input_1[0][0]                    
tf_op_layer_AddV2_1 (TensorFlow [(None, 16000, 1)]   0           tf_op_layer_AddV2[0][0]          
zero_padding1d_2 (ZeroPadding1D (None, 16000, 1)     0           input_1[0][0]                    
tf_op_layer_AddV2_2 (TensorFlow [(None, 16000, 1)]   0           tf_op_layer_AddV2_1[0][0]        
art_layer (ARTLayer)            (None, 16000, 1)     16000       tf_op_layer_AddV2_2[0][0]        
reshape_1 (Reshape)             (None, 16000)        0           art_layer[0][0]                  
model (Model)                   (None, 36)           1292911     reshape_1[0][0]                  
Total params: 1,308,911
Trainable params: 16,000
Non-trainable params: 1,292,911
Epoch 1/5
2021-09-21 00:39:41.269756: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-09-21 00:39:41.497716: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
113/113 [==============================] - 6s 49ms/step - loss: 5.0755 - accuracy: 0.9431 - val_loss: 3.7315 - val_accuracy: 0.9985
Epoch 2/5
113/113 [==============================] - 4s 39ms/step - loss: 3.1852 - accuracy: 0.9939 - val_loss: 2.7873 - val_accuracy: 0.9902
Epoch 3/5
113/113 [==============================] - 4s 39ms/step - loss: 2.5128 - accuracy: 0.9989 - val_loss: 2.2929 - val_accuracy: 0.9985
Epoch 4/5
113/113 [==============================] - 4s 39ms/step - loss: 2.1230 - accuracy: 0.9994 - val_loss: 1.9733 - val_accuracy: 0.9992
Epoch 5/5
113/113 [==============================] - 4s 38ms/step - loss: 1.8629 - accuracy: 0.9997 - val_loss: 1.7518 - val_accuracy: 1.0000
--- Train loss: 1.7529315948486328
- Train accuracy: 1.0
--- Test loss: 1.7516217231750488
- Test accuracy: 1.0
=== Best Val. Acc:  1.0  At Epoch of  4
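
The model summary above shows a 500-sample input being zero-padded to 16,000 samples three times, summed, and then passed through a layer with 16,000 trainable parameters. The input side of that pipeline can be sketched in plain Python as follows. This is only an illustration under stated assumptions: segment 0 is assumed to start at offset 0 (the log only shows segments 1 and 2), the trainable layer is assumed to add its parameters element-wise, and `place_segment`, `reprogram_input`, and `delta` are hypothetical names rather than identifiers from the repository.

```python
from typing import List, Sequence

SOURCE_LEN = 16000  # input length expected by the frozen acoustic model


def place_segment(series: Sequence[float], start: int,
                  total: int = SOURCE_LEN) -> List[float]:
    """Zero-pad `series` so it occupies [start, start + len(series))."""
    if start < 0 or start + len(series) > total:
        raise ValueError("segment does not fit in the source input")
    padded = [0.0] * total
    padded[start:start + len(series)] = list(series)
    return padded


def reprogram_input(series: Sequence[float], delta: Sequence[float],
                    starts: Sequence[int] = (0, 5000, 10000)) -> List[float]:
    """Sum the padded copies of the series and add the perturbation `delta`.

    Assumption: the trainable parameters are added element-wise.
    """
    if len(delta) != SOURCE_LEN:
        raise ValueError("delta must match the source input length")
    out = list(delta)
    for start in starts:
        padded = place_segment(series, start)
        out = [a + b for a, b in zip(out, padded)]
    return out


# Usage: a constant 500-sample series with an all-zero (untrained) perturbation.
x = [1.0] * 500
waveform = reprogram_input(x, [0.0] * SOURCE_LEN)
```

In training, only `delta` would be updated while the acoustic model stays frozen, which matches the 16,000 trainable versus 1,292,911 non-trainable parameters reported in the summary.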

  • Many-to-one Label Mapping
python --dataset 0 --eps 5 --mapping 3 --mod 1
  • Results
seg idx: 0 --> start: 0, end: 500
Tensor("AddV2:0", shape=(None, 16000, 1), dtype=float32)
--- Preparing Masking Matrix
Model: "model_1"
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 500, 1)]     0                                            
zero_padding1d (ZeroPadding1D)  (None, 16000, 1)     0           input_1[0][0]                    
tf_op_layer_AddV2 (TensorFlowOp [(None, 16000, 1)]   0           zero_padding1d[0][0]             
art_layer (ARTLayer)            (None, 16000, 1)     16000       tf_op_layer_AddV2[0][0]          
reshape_1 (Reshape)             (None, 16000)        0           art_layer[0][0]                  
model (Model)                   (None, 36)           1292911     reshape_1[0][0]                  
tf_op_layer_MatMul (TensorFlowO [(None, 6)]          0           model[1][0]                      
tf_op_layer_Shape (TensorFlowOp [(2,)]               0           tf_op_layer_MatMul[0][0]         
tf_op_layer_strided_slice (Tens [()]                 0           tf_op_layer_Shape[0][0]          
tf_op_layer_Reshape_2/shape (Te [(3,)]               0           tf_op_layer_strided_slice[0][0]  
tf_op_layer_Reshape_2 (TensorFl [(None, 2, 3)]       0           tf_op_layer_MatMul[0][0]         
tf_op_layer_Mean (TensorFlowOpL [(None, 2)]          0           tf_op_layer_Reshape_2[0][0]      
Total params: 1,308,911
Trainable params: 16,000
Non-trainable params: 1,292,911
Epoch 1/5
2021-09-21 01:23:21.163046: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2021-09-21 01:23:21.389418: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
113/113 [==============================] - 5s 48ms/step - loss: 2.0503 - accuracy: 1.0000 - val_loss: 1.3729 - val_accuracy: 1.0000
Epoch 2/5
113/113 [==============================] - 4s 40ms/step - loss: 1.1730 - accuracy: 1.0000 - val_loss: 1.0234 - val_accuracy: 1.0000
Epoch 3/5
113/113 [==============================] - 4s 40ms/step - loss: 0.9352 - accuracy: 1.0000 - val_loss: 0.8614 - val_accuracy: 1.0000
Epoch 4/5
113/113 [==============================] - 4s 40ms/step - loss: 0.8044 - accuracy: 1.0000 - val_loss: 0.7538 - val_accuracy: 1.0000
Epoch 5/5
113/113 [==============================] - 4s 39ms/step - loss: 0.7154 - accuracy: 1.0000 - val_loss: 0.6810 - val_accuracy: 1.0000
--- Train loss: 0.680957019329071
- Train accuracy: 1.0
--- Test loss: 0.6809701919555664
- Test accuracy: 1.0
=== Best Val. Acc:  1.0  At Epoch of  0
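
In the summary above, the 36 source-class outputs are reduced to 6, reshaped to (2, 3), and averaged to give 2 target-class scores, which is consistent with `--mapping 3` assigning three source labels to each target class. A minimal plain-Python sketch of that many-to-one averaging is shown below. The function name `many_to_one` and the specific source indices in `assignment` are hypothetical and chosen only for illustration; how the repository actually selects the source labels is not shown here.

```python
from typing import List, Sequence


def many_to_one(source_probs: Sequence[float],
                assignment: Sequence[Sequence[int]]) -> List[float]:
    """Average the source-class scores assigned to each target class."""
    used = [idx for group in assignment for idx in group]
    if len(used) != len(set(used)):
        raise ValueError("a source class may map to at most one target class")
    return [sum(source_probs[i] for i in group) / len(group)
            for group in assignment]


# Usage: 36 source classes, 2 target classes, 3 source labels per target class.
source_probs = [0.0] * 36
source_probs[4] = source_probs[9] = source_probs[20] = 0.3  # mapped to target 0
source_probs[1] = 0.1                                       # mapped to target 1
assignment = [[4, 9, 20], [1, 7, 30]]  # hypothetical source-label assignment

target_scores = many_to_one(source_probs, assignment)
predicted = max(range(len(target_scores)), key=target_scores.__getitem__)
```

Averaging over several source labels, rather than relying on a single one, spreads each target class across more of the source model's output space.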

Class Activation Mapping

python --dataset 5 --weight wNo5_map6-88-0.7662.h5 --mapping 6 --layer conv2d_1

Theoretical Discussion

  • For sliced Wasserstein distance mapping and theoretical analysis, we use the POT package (JMLR 2021).

  • The population risk for the target task via reprogramming a K-way source neural network classifier is upper bounded by the equation above.
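
For readers unfamiliar with the sliced Wasserstein distance, the idea can be sketched in plain Python: project both point sets onto random directions, compute the one-dimensional Wasserstein distance along each direction, and average. The code below is a Monte Carlo illustration only, not the POT implementation used by the authors. It assumes equal-size point sets and the 1-Wasserstein distance, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-Wasserstein distance between two equal-size 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling matches sorted samples.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def random_direction(dim: int, rng: random.Random) -> List[float]:
    """Draw a unit vector by normalising a standard Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def sliced_wasserstein(xs: Sequence[Sequence[float]],
                       xt: Sequence[Sequence[float]],
                       n_projections: int = 50, seed: int = 0) -> float:
    """Average the 1-D Wasserstein distance over random projections."""
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        proj_s = [sum(c * t for c, t in zip(p, theta)) for p in xs]
        proj_t = [sum(c * t for c, t in zip(p, theta)) for p in xt]
        total += wasserstein_1d(proj_s, proj_t)
    return total / n_projections


# Usage: a small 2-D point set compared with itself and with a shifted copy.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[x + 2.0, y] for x, y in source]
d_same = sliced_wasserstein(source, source)
d_shift = sliced_wasserstein(source, shifted)
```

A set compared with itself gives zero, and a shifted copy gives a positive value no larger than the shift length, since each projection shortens the shift.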


    1. Tips for tuning the model?

I would recommend trying different label mapping numbers during training. For instance, you could use --mapping 7 for the ECG 5000 dataset. The dropout rate is also an important hyperparameter for tuning the testing loss. You could try values between 0.2 and 0.5; for example, --dr 4 sets a dropout rate of 0.4.

    2. Is masking the target sequence important?

The V2S mask is provided as an option, but the training script does not use the mask in the forward pass. In our experiments, using or not using the mask made only a small difference in performance. This does not conflict with the proposed theoretical analysis of learning target domain adaptation.

    3. Can we use Voice2Series for other domains or collaborate with the team?

Yes, you are welcome to. Please send an email to the author about potential collaboration.


  • Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Please consider citing the paper if you find this work helpful or relevant to your research. The draft was completed in Jan 2021; the project started in Sep 2020.

  title = 	 {Voice2Series: Reprogramming Acoustic Models for Time Series Classification},
  author =       {Yang, Chao-Han Huck and Tsai, Yun-Yun and Chen, Pin-Yu},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {11808--11819},
  year = 	 {2021},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},

Additional Questions

Please open an issue here for discussion. Thank you!

