This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[PERFORMANCE] [v1.x] Layer normalization code from Marian for CPU #19601

Merged on Jan 5, 2021 (13 commits)

Commits on Nov 30, 2020

  1. Layer normalization code from Marian

    Kenneth Heafield committed Nov 30, 2020
    2f87167
  2. Remove MKL version of LayerNorm.

    Experiment with OMP_NUM_THREADS=4 on a c5.12xlarge instance. Times are mean seconds per LayerNorm call.
    
    |batch x chan| New code | MKL      |
    |    1x   32 | 0.0000288| 0.0000278|
    |  128x   32 | 0.0000308| 0.0000311|
    | 2560x   32 | 0.0000712| 0.0000672|
    | 4096x   32 | 0.0000946| 0.0000910|
    | 8192x   32 | 0.0001597| 0.0001523|
    |16384x   32 | 0.0002905| 0.0002619|
    |    1x   64 | 0.0000264| 0.0000256|
    |  128x   64 | 0.0000339| 0.0000330|
    | 2560x   64 | 0.0000829| 0.0000972|
    | 4096x   64 | 0.0001137| 0.0001356|
    | 8192x   64 | 0.0002027| 0.0002435|
    |16384x   64 | 0.0003715| 0.0004639|
    |    1x  128 | 0.0000262| 0.0000263|
    |  128x  128 | 0.0000325| 0.0000389|
    | 2560x  128 | 0.0001074| 0.0001580|
    | 4096x  128 | 0.0001505| 0.0002336|
    | 8192x  128 | 0.0002861| 0.0004481|
    |16384x  128 | 0.0005648| 0.0008613|
    |    1x  256 | 0.0000273| 0.0000276|
    |  128x  256 | 0.0000390| 0.0000431|
    | 2560x  256 | 0.0001533| 0.0002811|
    | 4096x  256 | 0.0002258| 0.0004300|
    | 8192x  256 | 0.0004300| 0.0008464|
    |16384x  256 | 0.0010436| 0.0017613|
    |    1x  512 | 0.0000256| 0.0000302|
    |  128x  512 | 0.0000408| 0.0000551|
    | 2560x  512 | 0.0002444| 0.0005225|
    | 4096x  512 | 0.0003828| 0.0008147|
    | 8192x  512 | 0.0008832| 0.0017192|
    |16384x  512 | 0.0058463| 0.0074497|
    |    1x  768 | 0.0000252| 0.0000308|
    |  128x  768 | 0.0000450| 0.0000676|
    | 2560x  768 | 0.0003440| 0.0007719|
    | 4096x  768 | 0.0005890| 0.0013346|
    | 8192x  768 | 0.0014946| 0.0026145|
    |16384x  768 | 0.0089495| 0.0113557|
    |    1x 1024 | 0.0000285| 0.0000308|
    |  128x 1024 | 0.0000487| 0.0000786|
    | 2560x 1024 | 0.0004614| 0.0010190|
    | 4096x 1024 | 0.0008083| 0.0017376|
    | 8192x 1024 | 0.0059020| 0.0075588|
    |16384x 1024 | 0.0116553| 0.0146855|
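
The relative speed of the two implementations is easier to read as a ratio. The following is a minimal sketch that computes MKL time divided by new-code time for three rows copied from the table above. The helper name `speedup` and the choice of rows are illustrative and not part of the original commit.

```python
# Rows copied verbatim from the table: (batch, channel, new_code_s, mkl_s).
ROWS = [
    (16384, 32, 0.0002905, 0.0002619),
    (16384, 128, 0.0005648, 0.0008613),
    (4096, 768, 0.0005890, 0.0013346),
]


def speedup(new_s, mkl_s):
    """Return MKL time over new-code time; above 1 means the new code is faster."""
    return mkl_s / new_s


for batch, channel, new_s, mkl_s in ROWS:
    print("{:5d}x{:5d} | {:.2f}x".format(batch, channel, speedup(new_s, mkl_s)))
```

A ratio below 1, as in the 32-channel row, marks a shape where the MKL version was slightly faster in this experiment.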
    
    Benchmark program
    ```python
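    import time
    
    import mxnet as mx
    
    def time_procedure(shape, count):
      """Return the mean wall-clock time in seconds of one LayerNorm call."""
      data = mx.nd.random_uniform(shape=shape, low=-1.0, high=1.0)
      # The same random vector serves as both gamma (scale) and beta (shift).
      factors = mx.nd.random_uniform(shape=(shape[-1],))
      # Finish the asynchronous setup before starting the clock.
      mx.nd.waitall()
      begin = time.time()
      for _ in range(count):
        out = mx.nd.LayerNorm(data, factors, factors)
        # Block until the operator completes so each call is fully timed.
        mx.nd.waitall()
      return (time.time() - begin) / count
    
    count = 200  # timed calls per shape
    
    for channel in [32, 64, 128, 256, 512, 768, 1024]:
      for batch in [1, 128, 2560, 4096, 8192, 16384]:
        s = (batch, channel)
        timing = time_procedure(s, count)
        print("{:5d}x{:5d} | {:.7f}".format(s[0], s[1], timing))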
    ```
    Kenneth Heafield committed Nov 30, 2020
    95efe8f
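
For readers unfamiliar with the operator being timed, here is a pure-Python reference sketch of layer normalization over the last axis, matching the benchmark's call shape of `(data, gamma, beta)`. It is not the Marian-derived C++ kernel from this PR. The function names are hypothetical, and the `eps=1e-5` default is an assumption about the operator's default rather than something stated in the commit.

```python
import math


def layer_norm_row(row, gamma, beta, eps=1e-5):
    """Normalize one row to zero mean and unit variance, then scale and shift."""
    n = len(row)
    mean = sum(row) / n
    # Population variance (divide by n), as is usual for layer normalization.
    var = sum((x - mean) ** 2 for x in row) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std * g + b for x, g, b in zip(row, gamma, beta)]


def layer_norm(batch, gamma, beta, eps=1e-5):
    """Apply layer_norm_row independently to every row of a 2-D batch."""
    return [layer_norm_row(row, gamma, beta, eps) for row in batch]


# Identity scale and zero shift expose the normalized values directly.
example = layer_norm([[1.0, 2.0, 3.0, 4.0]], [1.0] * 4, [0.0] * 4)
print(example[0])
```

Each row is normalized independently, so the work is proportional to batch times channel. That is consistent with the benchmark varying both dimensions.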

Commits on Dec 4, 2020

  1. Enable pragma omp simd on MSVC

    Kenneth Heafield committed Dec 4, 2020
    40d3326
  2. Merge branch 'v1.x' into layernorm

    Kenneth Heafield committed Dec 4, 2020
    c6b653e

Commits on Dec 7, 2020

  1. Fix MSVC error C3016: 'j': index variable in OpenMP 'for' statement must have signed integral type

    Kenneth Heafield committed Dec 7, 2020
    3605226
  2. Try to make MSVC happy since it doesn't have ssize_t

    Kenneth Heafield committed Dec 7, 2020
    dcb61aa
  3. Change gcc 8 PPA to ppa:jonathonf/gcc

    Kenneth Heafield committed Dec 7, 2020
    a11dc7e
  4. Merge PPA fix into layernorm

    Kenneth Heafield committed Dec 7, 2020
    5ae7ae4

Commits on Dec 21, 2020

  1. Merge branch 'v1.x' into layernorm

    Kenneth Heafield committed Dec 21, 2020
    606759d
  2. Option to use MKL version requested by @samskalicky

    Kenneth Heafield committed Dec 21, 2020
    2d2a91e
  3. Fix order if MKL override is on

    Kenneth Heafield committed Dec 21, 2020
    e5093eb

Commits on Dec 28, 2020

  1. Have CI test MKL layer norm in build_ubuntu_cpu_mkl

    Kenneth Heafield committed Dec 28, 2020
    a566558
  2. Merge branch 'v1.x' of https://github.com/apache/incubator-mxnet into layernorm

    Kenneth Heafield committed Dec 28, 2020
    eb3d9d9