# SpecAugment

Notes on the SpecAugment (spectrogram augmentation) paper.

SpecAugment operates on the log mel spectrogram of the input audio rather than on the raw audio itself.

It acts directly on the log mel spectrogram as if it were an image and does not require any additional data.

SpecAugment consists of three kinds of deformations of the log mel spectrogram. "The first is time warping, a deformation of the time-series in the time direction." The other two augmentations, inspired by "Cutout" from computer vision [T. DeVries and G. Taylor, "Improved Regularization of Convolutional Neural Networks with Cutout," arXiv, 2017], are time masking and frequency masking, which mask a block of consecutive time steps or mel frequency channels.

---------------------------------

## Time Warping 

-------------

Time warping is applied via TensorFlow's `sparse_image_warp` function. Given a log mel spectrogram with $\tau$ time steps, a random point on the horizontal line through the center of the spectrogram, within the time steps $(W, \tau - W)$, is warped either to the left or to the right by a distance $w$ chosen from a uniform distribution from 0 to the time warp parameter $W$ along that line.

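W : time warp parameter (upper bound on the warp distance)    
w : warp distance, drawn uniformly from (0, W)    
t : length of the spectrogram ($\tau$ time steps)

a : random start point   
b : destination point   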

(Original data)   
0 - ############################################################################ - t  (Original)<br>
(Pick a random value w between 0 and W, then trim w from both ends)  
0 - 00000w#############################################################t-w000000 - t  (Random range, w (0,W))<br>
(Pick a random point a between w and t-w)   
0 - 00000w###############################a#############################t-w000000 - t  (a in (W ~ t-W))<br>
(Mark a distance w on either side of a)   
0 - 00000w###########a-w#################a###############a+w###########t-w000000 - t<br>
(Pick a random point b between a-w and a+w)   
0 - 00000w###########a-w#####b###########a###############a+w###########t-w000000 - t<br>
(Warp point a to point b with sparse_image_warp)    
0 - 00000w###########a-w#####b##<-warp-##a###############a+w###########t-w000000 - t<br>
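The real warp relies on the full `sparse_image_warp` machinery in the Code section. For intuition about the a to b move in the diagram, here is a minimal NumPy sketch of a *simplified, piecewise-linear* time warp. It is a swapped-in technique, not the paper's spline warp: frames [0, a] are resampled onto [0, b] and frames [a, t-1] onto [b, t-1], identically for every mel channel, whereas the spline warp bends smoothly. Assumptions: the helper name `piecewise_linear_time_warp` is hypothetical, the spectrogram is laid out as (num_mel, num_frames), a is drawn from [W+1, t-W-2] (slightly inside (W, t-W) so both segments are non-empty), and b = a + w with w uniform on the integers in [-W, W].

```python
import numpy as np


def piecewise_linear_time_warp(spec, W=5, rng=None):
    """Hypothetical helper: simplified piecewise-linear time warp.

    NOT the spline-based sparse_image_warp used by SpecAugment.
    spec: log mel spectrogram of shape (num_mel, num_frames); needs num_frames > 2 * W + 2.
    Returns (warped_spec, a, b), where frame a of the input ends up at frame b.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = spec.shape[1]
    a = int(rng.integers(W + 1, t - W - 1))   # source point, in [W + 1, t - W - 2]
    b = a + int(rng.integers(-W, W + 1))      # destination point, a + w with w in [-W, W]

    out_frames = np.arange(t, dtype=np.float64)
    # For every output frame, the (fractional) input frame it samples from:
    # [0, b] maps linearly back onto [0, a], and [b, t - 1] onto [a, t - 1].
    src = np.where(out_frames <= b,
                   out_frames * a / b,
                   a + (out_frames - b) * (t - 1 - a) / (t - 1 - b))
    # Linear interpolation along time, one mel channel (row) at a time.
    warped = np.stack([np.interp(src, np.arange(t), row) for row in spec])
    return warped, a, b
```

Usage: `warped, a, b = piecewise_linear_time_warp(spec, W=5)`; the first and last frames stay fixed and the column at `a` reappears at `b`.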




## Code    
Adapted from [github.DemisEom/SpecAugment](https://github.com/DemisEom/SpecAugment/blob/master/SpecAugment/sparse_image_warp_np.py), with additions from [zcaceres/spec_augment](https://github.com/zcaceres/spec_augment).

### Main Function


-------------------------------------------------------

#### Import

```python
import random

import numpy as np
```

#### Get grid locations

```python
def _get_grid_locations(image_height, image_width):
  """Wrapper for np.meshgrid."""

  y_range = np.linspace(0, image_height - 1, image_height)
  x_range = np.linspace(0, image_width - 1, image_width)
  y_grid, x_grid = np.meshgrid(y_range, x_range, indexing='ij')
  return np.stack((y_grid, x_grid), -1)
```

#### Expand to minibatch

```python
def _expand_to_minibatch(np_array, batch_size):
  """Tile arbitrarily-sized np_array to include new batch dimension."""
  tiles = [batch_size] + [1] * np_array.ndim
  return np.tile(np.expand_dims(np_array, 0), tiles)
```

#### Get boundary locations

```python
def _get_boundary_locations(image_height, image_width, num_points_per_edge):
  """Compute evenly-spaced indices along edge of image."""
  y_range = np.linspace(0, image_height - 1, num_points_per_edge + 2)
  x_range = np.linspace(0, image_width - 1, num_points_per_edge + 2)
  ys, xs = np.meshgrid(y_range, x_range, indexing='ij')
  is_boundary = np.logical_or(
      np.logical_or(xs == 0, xs == image_width - 1),
      np.logical_or(ys == 0, ys == image_height - 1))
  return np.stack([ys[is_boundary], xs[is_boundary]], axis=-1)
```

#### Add zero flow controls at boundary

```python
def _add_zero_flow_controls_at_boundary(control_point_locations,
                                        control_point_flows, image_height,
                                        image_width, boundary_points_per_edge):
  """Append zero-flow control points on the image border so the edges stay fixed."""
  batch_size = control_point_locations.shape[0]
  type_to_use = control_point_locations.dtype

  boundary_point_locations = _get_boundary_locations(image_height, image_width,
                                                     boundary_points_per_edge)
  boundary_point_flows = np.zeros([boundary_point_locations.shape[0], 2])

  # Repeat the same boundary points (and zero flows) for every batch element.
  boundary_point_locations = _expand_to_minibatch(
      boundary_point_locations, batch_size).astype(type_to_use)
  boundary_point_flows = _expand_to_minibatch(
      boundary_point_flows, batch_size).astype(type_to_use)

  # Concatenate along the points axis: [batch, num_points + num_boundary, 2].
  merged_control_point_locations = np.concatenate(
      [control_point_locations, boundary_point_locations], 1)
  merged_control_point_flows = np.concatenate(
      [control_point_flows, boundary_point_flows], 1)

  return merged_control_point_locations, merged_control_point_flows
```

#### Sparse Image Warp

```python
from scipy.interpolate import RBFInterpolator

# Closest RBFInterpolator kernels to TensorFlow's interpolation_order values
# 1, 2, 3 (phi = r, r^2 log r, r^3; 'linear' differs only in sign).
_ORDER_TO_KERNEL = {1: 'linear', 2: 'thin_plate_spline', 3: 'cubic'}


def sparse_image_warp_np(image,
                         source_control_point_locations,
                         dest_control_point_locations,
                         interpolation_order=2,
                         regularization_weight=0.0,
                         num_boundary_points=0):
  """Warp `image` so each source control point moves to its destination.

  image: [batch, height, width, channels]
  *_control_point_locations: [batch, num_points, 2] as (y, x) pixel coordinates
  Returns (warped_image, dense_flows).
  """
  image = np.asarray(image, dtype=np.float64)
  source_control_point_locations = np.asarray(
      source_control_point_locations, dtype=np.float64)
  dest_control_point_locations = np.asarray(
      dest_control_point_locations, dtype=np.float64)

  control_point_flows = (
      dest_control_point_locations - source_control_point_locations)

  clamp_boundaries = num_boundary_points > 0
  boundary_points_per_edge = num_boundary_points - 1

  batch_size, image_height, image_width, _ = image.shape

  # Dense (y, x) locations where the interpolant will be evaluated.
  grid_locations = _get_grid_locations(image_height, image_width)
  flattened_grid_locations = np.reshape(grid_locations,
                                        [image_height * image_width, 2])

  if clamp_boundaries:
    # Pin the image border with zero-flow control points.
    (dest_control_point_locations,
     control_point_flows) = _add_zero_flow_controls_at_boundary(
         dest_control_point_locations, control_point_flows, image_height,
         image_width, boundary_points_per_edge)

  # Fit a polyharmonic spline to the flows at the destination control points
  # and evaluate it on the dense grid, one batch element at a time.
  # SciPy's RBFInterpolator stands in for TF's interpolate_spline; `smoothing`
  # roughly plays the role of regularization_weight. The default thin-plate
  # kernel adds a linear polynomial term, so it needs at least 3 non-collinear
  # control points: with a single moving point, pass num_boundary_points > 0.
  kernel = _ORDER_TO_KERNEL[interpolation_order]
  flattened_flows = np.stack([
      RBFInterpolator(dest_control_point_locations[b], control_point_flows[b],
                      kernel=kernel,
                      smoothing=regularization_weight)(flattened_grid_locations)
      for b in range(batch_size)])

  dense_flows = np.reshape(flattened_flows,
                           [batch_size, image_height, image_width, 2])

  # Resample the image along the dense flow field.
  warped_image = dense_image_warp(image, dense_flows)

  return warped_image, dense_flows
```
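The spline step in `sparse_image_warp_np` turns a handful of control-point flows into a dense flow field. A minimal, self-contained sketch of that idea, using SciPy's `RBFInterpolator` with the thin-plate kernel as an assumed stand-in for TensorFlow's `interpolate_spline` (order 2); the 5 x 9 grid and the point values are arbitrary illustration choices:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

height, width = 5, 9

# (y, x) control points at their *destination* positions: the point that
# started at (2, 4) now sits at (2, 6), i.e. flow (dy, dx) = (0, +2).
# The four corners are pinned with zero flow.
locations = np.array([[2.0, 6.0], [0.0, 0.0], [0.0, 8.0], [4.0, 0.0], [4.0, 8.0]])
flows = np.array([[0.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])

# Thin-plate spline (phi(r) = r^2 log r) plus a linear polynomial term.
spline = RBFInterpolator(locations, flows, kernel='thin_plate_spline')

# Evaluate the spline at every pixel to get a dense [height, width, 2] flow.
ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing='ij')
grid = np.stack([ys.ravel(), xs.ravel()], axis=-1).astype(np.float64)
dense_flow = spline(grid).reshape(height, width, 2)
```

With the default `smoothing=0` the spline interpolates exactly, so the flow is (0, 2) at the moved point and zero at the pinned corners, varying smoothly in between.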

#### Dense image warp

```python
from scipy.ndimage import map_coordinates


def dense_image_warp(image, flow):
    """Resample `image` ([batch, height, width, channels]) along `flow`.

    `flow` has shape [batch, height, width, 2] holding (dy, dx). Output pixel
    (y, x) takes the bilinearly interpolated input value at (y, x) - flow[y, x];
    query points outside the image are clamped to the nearest edge pixel.
    """
    batch_size, height, width, channels = np.shape(image)

    # The flow is defined on the image grid. Turn the flow into a list of query
    # points in the grid space.
    grid_y, grid_x = np.meshgrid(np.arange(height), np.arange(width),
                                 indexing='ij')
    stacked_grid = np.stack([grid_y, grid_x], axis=2).astype(flow.dtype)
    batched_grid = np.expand_dims(stacked_grid, axis=0)
    query_points_on_grid = batched_grid - flow   # [batch, height, width, 2]

    # Compute values at the query points (bilinear, order=1), channel by channel.
    interpolated = np.empty((batch_size, height, width, channels))
    for b in range(batch_size):
        coords = np.moveaxis(query_points_on_grid[b], -1, 0)   # [2, height, width]
        for c in range(channels):
            interpolated[b, :, :, c] = map_coordinates(
                image[b, :, :, c], coords, order=1, mode='nearest')
    return interpolated
```
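The sign convention in `dense_image_warp` (query point = grid - flow) is easy to get backwards. A minimal check, assuming SciPy's `scipy.ndimage.map_coordinates` as the bilinear resampler and an arbitrary 5 x 5 test image: a constant flow of +2 along x moves a single bright pixel two columns to the right.

```python
import numpy as np
from scipy.ndimage import map_coordinates

image = np.zeros((5, 5))
image[2, 1] = 1.0                      # single bright pixel at (y, x) = (2, 1)

flow = np.zeros((5, 5, 2))             # (dy, dx) per output pixel
flow[..., 1] = 2.0                     # constant flow of +2 columns

ys, xs = np.meshgrid(np.arange(5), np.arange(5), indexing='ij')
# Output pixel (y, x) reads the input at (y, x) - flow[y, x].
query = np.stack([ys - flow[..., 0], xs - flow[..., 1]])   # shape [2, 5, 5]
warped = map_coordinates(image, query, order=1, mode='nearest')
```

The bright pixel lands at (2, 3); the two left-most output columns read positions off the image and pick up the (zero) edge values because of `mode='nearest'`.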

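#### Time warp

```python
import random

import numpy as np


def time_warp(spec, W=5):
    """Time-warp a NumPy log mel spectrogram of shape [1, num_mel, num_frames]."""
    num_rows = spec.shape[1]
    spec_len = spec.shape[2]

    # Control point on the horizontal line through the center of the image.
    y = num_rows // 2

    # Random time step (an index, not a spectrogram value). The range is one
    # frame narrower on each side than (W, spec_len - W) so the warped point
    # stays off the image border. Requires spec_len > 2 * W + 2.
    point_to_warp = random.randrange(W + 1, spec_len - W - 1)

    # Warp distance drawn uniformly from [-W, W]: left or right by up to W frames.
    dist_to_warp = random.randint(-W, W)
    src_pts = np.array([[[y, point_to_warp]]], dtype=np.float64)
    dest_pts = np.array([[[y, point_to_warp + dist_to_warp]]], dtype=np.float64)

    # sparse_image_warp_np expects [batch, height, width, channels].
    # num_boundary_points=2 pins the corners and edge midpoints (an assumption;
    # the thin-plate spline needs at least 3 control points).
    warped_spectro, dense_flows = sparse_image_warp_np(
        spec[..., np.newaxis], src_pts, dest_pts, num_boundary_points=2)
    return warped_spectro.squeeze(3)
```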

## Frequency Masking
---------------------------------
Frequency masking is applied so that $f$ consecutive mel frequency channels $[f_0, f_0 + f)$ are masked, where $f$ is first chosen from a uniform distribution from 0 to the frequency mask parameter $F$, and $f_0$ is chosen from $[0, v - f)$. Here $v$ is the number of mel frequency channels.

f0 : first masked channel (random start point)   
f  : mask width (number of masked channels)    
v  : number of mel frequency channels    
F  : frequency mask parameter (maximum mask width)
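A minimal NumPy sketch of a single frequency mask, assuming a spectrogram laid out as (v, num_frames). The function name `freq_mask`, the `rng` argument, and the fill value `mask_value=0.0` are illustrative assumptions (the fill value is not specified above), and the default `F=15` is arbitrary.

```python
import numpy as np


def freq_mask(spec, F=15, mask_value=0.0, rng=None):
    """Hypothetical helper: mask f consecutive mel channels of spec (v x num_frames).

    f ~ uniform on {0, ..., F}, f0 ~ uniform on {0, ..., v - f - 1};
    rows f0 .. f0 + f - 1 are set to mask_value. Assumes F < v.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = spec.shape[0]                          # number of mel frequency channels
    assert F < v, "F must be smaller than the number of mel channels"
    f = int(rng.integers(0, F + 1))            # mask width, uniform on [0, F]
    f0 = int(rng.integers(0, v - f))           # start channel, uniform on [0, v - f)
    masked = spec.copy()                       # leave the caller's array untouched
    masked[f0:f0 + f, :] = mask_value
    return masked
```

Usage: `masked = freq_mask(log_mel, F=15)` returns a masked copy; `log_mel` here is any (v, num_frames) array.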

## Time Masking 
----------------------------------
Time masking is applied so that $t$ consecutive time steps $[t_0, t_0 + t)$ are masked, where $t$ is first chosen from a uniform distribution from 0 to the time mask parameter $T$, and $t_0$ is chosen from $[0, \tau - t)$.

t0 : first masked time step (random start point)   
t : mask width (number of masked time steps)   
$\tau$ : number of time steps   
T : time mask parameter (maximum mask width)
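The time mask is the same operation along the other axis. A minimal NumPy sketch, assuming a (num_mel, tau) layout; the name `time_mask`, the `rng` argument, and `mask_value=0.0` are illustrative assumptions, and the default `T=20` is arbitrary.

```python
import numpy as np


def time_mask(spec, T=20, mask_value=0.0, rng=None):
    """Hypothetical helper: mask t consecutive time steps of spec (num_mel x tau).

    t ~ uniform on {0, ..., T}, t0 ~ uniform on {0, ..., tau - t - 1};
    columns t0 .. t0 + t - 1 are set to mask_value. Assumes T < tau.
    """
    rng = np.random.default_rng() if rng is None else rng
    tau = spec.shape[1]                        # number of time steps
    assert T < tau, "T must be smaller than the number of time steps"
    t = int(rng.integers(0, T + 1))            # mask width, uniform on [0, T]
    t0 = int(rng.integers(0, tau - t))         # start step, uniform on [0, tau - t)
    masked = spec.copy()                       # leave the caller's array untouched
    masked[:, t0:t0 + t] = mask_value
    return masked
```

Chaining `time_mask(freq_mask(spec))` applies both masks to one spectrogram.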