This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Adam Optimizer Memory Leak in Scala #14080

Closed
satyakrishnagorti opened this issue Feb 6, 2019 · 3 comments · Fixed by #14372

Comments

@satyakrishnagorti
Contributor

Description

There is a memory leak when using the Adam optimizer with the MXNet Scala bindings. Running the code below consumes more and more memory until the process runs out.

Steps to Reproduce

import org.apache.mxnet._
import org.apache.mxnet.io.NDArrayIter
import org.apache.mxnet.optimizer.Adam

// Simple MLP network
def mlpNetwork(): Symbol = {
  val input = Symbol.Variable("data")
  val label = Symbol.Variable("label")
  val fc1 = Symbol.FullyConnected(name = "fc1")()(Map("data" -> input, "num_hidden" -> 128))
  val act1 = Symbol.Activation(name = "relu")()(Map("data" -> fc1, "act_type" -> "relu"))
  val fc2 = Symbol.FullyConnected(name = "fc2")()(Map("data" -> act1, "num_hidden" -> 1))
  val loss = Symbol.LinearRegressionOutput(name = "loss")()(Map("data" -> fc2, "label" -> label))
  loss
}

// Dummy all-zero data set: 100 samples of shape 20x20, one label per sample
def getNDArrayIter(): NDArrayIter = {
  val f = NDArray.zeros(100, 20, 20)
  val l = NDArray.zeros(100, 1)
  val data = Array(f)
  val labels = Array(l)
  val batchSize = 10
  val iter = new NDArrayIter(data, labels, batchSize)
  iter
}

val net = mlpNetwork()
val iter = getNDArrayIter()
// Arguments: learningRate, beta1, beta2, epsilon, decayFactor, wd, clipGradient, lrScheduler
val optimizer = new Adam(0.001f, 0.9f, 0.999f, 1e-8f, 1 - 1e-8f, 0f, 10f, null)
val init = new Normal(0.01f)
// Pass the network symbol built above (the original snippet referenced an undefined `modelSpec`)
val model = FeedForward.newBuilder(net)
  .setContext(Array(Context.gpu(0)))
  .setInitializer(init)
  .setNumEpoch(100000)
  .setOptimizer(optimizer)
  .setTrainData(iter)
  .setEvalData(iter)
  .build()

Issue

I think the problem is that some temporary NDArrays are not being disposed in the Adam optimizer when disposeDepsExcept is used.

The leak occurs at the 3 places in Adam's update method where disposeDepsExcept is called.

Temporary Fix

Replace the 3 statements that use disposeDepsExcept in the update method of Adam.scala with code that explicitly disposes the temporary NDArrays it creates, as shown below.

Instead of the following 3 statements in Adam.scala:

val meanT = (beta1t * mean + (1.0 - beta1t) * resdGrad)
      .disposeDepsExcept(mean, resdGrad)

val varianceT = (beta2 * variance + (1.0f - beta2) * resdGrad * resdGrad)
      .disposeDepsExcept(variance, resdGrad)

val step = (learningRate * meanT / (NDArray.sqrt(varianceT) + epsilon))
      .disposeDepsExcept(meanT, varianceT)

Replace them with:

val beta1Mean = beta1t * mean
val beta1ResGrad = (1.0 - beta1t) * resdGrad
val meanT = beta1Mean + beta1ResGrad
// dispose temp NDArrays
beta1Mean.dispose()
beta1ResGrad.dispose()

val beta2Variance = beta2 * variance
val beta2ResGrad = (1.0f - beta2) * resdGrad
val beta2ResGradSquare = beta2ResGrad * resdGrad
val varianceT = beta2Variance + beta2ResGradSquare
// dispose temp NDArrays
beta2Variance.dispose()
beta2ResGrad.dispose()
beta2ResGradSquare.dispose()

val lrMeanT = learningRate * meanT
val sqrtVar = NDArray.sqrt(varianceT)
val sqrtVarPlusEpsilon = sqrtVar + epsilon
val step = lrMeanT / sqrtVarPlusEpsilon
// dispose temp NDArrays
lrMeanT.dispose()
sqrtVar.dispose()
sqrtVarPlusEpsilon.dispose()

The above changes fix things for now, but for some reason disposeDepsExcept is not doing its job in this case.
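
To make the temporary-array bookkeeping concrete, here is a minimal sketch that runs without MXNet. `Buf`, `LiveCount` and `TempDemo` are hypothetical stand-ins invented for this illustration (they are not MXNet classes); `Buf` only counts how many instances have not been disposed. It shows that a chained expression of the form used in Adam's update allocates one object per operator, and that naming each intermediate lets it be released once the result exists.

```scala
// Hypothetical stand-ins for illustration only; NOT the MXNet NDArray API.
object LiveCount {
  var live = 0 // number of Buf instances that have not been disposed
}

final class Buf(val data: Array[Float]) {
  LiveCount.live += 1
  private var disposed = false

  // Release this buffer; calling it twice has no further effect.
  def dispose(): Unit = if (!disposed) { disposed = true; LiveCount.live -= 1 }

  // Each operator allocates a new Buf for its result.
  def +(that: Buf): Buf = new Buf(data.zip(that.data).map { case (a, b) => a + b })
  def *(s: Float): Buf = new Buf(data.map(_ * s))
}

object TempDemo {
  // Chained form: the two products are allocated but never released.
  def chained(beta: Float, mean: Buf, grad: Buf): Buf =
    mean * beta + grad * (1f - beta)

  // Explicit form: name each temporary and dispose it after the sum is built.
  def explicit(beta: Float, mean: Buf, grad: Buf): Buf = {
    val a = mean * beta
    val b = grad * (1f - beta)
    val out = a + b
    a.dispose()
    b.dispose()
    out
  }

  def main(args: Array[String]): Unit = {
    val mean = new Buf(Array(1f, 2f))
    val grad = new Buf(Array(3f, 4f))

    val before = LiveCount.live
    chained(0.9f, mean, grad)
    println(s"chained: ${LiveCount.live - before} new live buffers")

    val mid = LiveCount.live
    explicit(0.9f, mean, grad)
    println(s"explicit: ${LiveCount.live - mid} new live buffers")
  }
}
```

In this toy model the chained form leaves three new buffers alive (two products and the sum), while the explicit form leaves only the result. Whether disposeDepsExcept fails for the same reason in MXNet is not established here; the sketch only illustrates the pattern the temporary fix relies on.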

Environment info (Required)

----------Python Info----------
Version      : 3.7.1
Compiler     : GCC 7.3.0
Build        : ('default', 'Dec 14 2018 19:28:38')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 18.1
Directory    : /home/satya/anaconda3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.3.1
Directory    : /home/satya/Documents/workspace/mxnet_1.3.x/python/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid
system       : Linux
node         : DS5
release      : 4.4.0-141-generic
version      : #167-Ubuntu SMP Wed Dec 5 10:40:15 UTC 2018
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0405 sec, LOAD: 0.6186 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1403 sec, LOAD: 0.4726 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2418 sec, LOAD: 0.4049 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0445 sec, LOAD: 0.1894 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0779 sec, LOAD: 0.2447 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0409 sec, LOAD: 0.0746 sec.

Package used (Python/R/Scala/Julia): Scala

For Scala user, please provide:

  1. Java version: 1.8.0_201
  2. Maven version: 3.6.0
  3. Scala runtime if applicable: 2.11.6

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): gcc

MXNet commit hash: 96b4b6e

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Scala, Bug

@piyushghai
Contributor

@satyakrishnagorti Here is the documentation on how to manage memory in MXNet Scala:

https://github.com/apache/incubator-mxnet/blob/master/scala-package/memory-management.md

The first method in this document, ResourceScope, can help in your use case.
Do try it out and let us know the results.
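
For readers unfamiliar with scope-based disposal, the idea behind ResourceScope can be sketched without MXNet as follows. Everything in this block (`Res`, `Scope`, `Scope.using`, `ScopeDemo`) is a hypothetical toy written for illustration and is not MXNet's implementation; the linked document describes the real API. The sketch assumes the behaviour this thread cares about: resources created inside the scope are released when the block exits, except the one the block returns.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical resource; stands in for anything that holds native memory.
final class Res(val name: String) {
  var disposed = false
  def dispose(): Unit = { disposed = true }
}

// Hypothetical scope: remembers what was allocated through it.
final class Scope {
  private val tracked = ArrayBuffer.empty[Res]

  def alloc(name: String): Res = {
    val r = new Res(name)
    tracked += r
    r
  }

  // Dispose every tracked resource except `keep` (which may be null).
  def closeExcept(keep: Res): Unit =
    tracked.foreach(r => if (!(r eq keep)) r.dispose())
}

object Scope {
  // Run `body` in a fresh scope and release its temporaries on exit.
  // If `body` throws, `result` is still null and everything is disposed.
  def using(body: Scope => Res): Res = {
    val scope = new Scope
    var result: Res = null
    try {
      result = body(scope)
      result
    } finally {
      scope.closeExcept(result)
    }
  }
}

object ScopeDemo {
  def main(args: Array[String]): Unit = {
    var temps = List.empty[Res]
    val step = Scope.using { s =>
      val a = s.alloc("lr * meanT")
      val b = s.alloc("sqrt(varianceT) + epsilon")
      temps = List(a, b)
      s.alloc("step") // the returned resource survives the scope
    }
    println(s"step disposed: ${step.disposed}")
    println(s"temporaries disposed: ${temps.forall(_.disposed)}")
  }
}
```

The appeal of this design is that the caller no longer has to name and dispose each intermediate by hand, which is the bookkeeping the temporary fix in the issue description does manually.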

@andrewfayres
Contributor

Looks like the Adam Optimizer needs to be updated to use ResourceScope. Should probably take a look at the other optimizers at the same time. Thanks for reporting this @satyakrishnagorti.

@mxnet-label-bot add [Scala, bug]

nswamy pushed a commit that referenced this issue Mar 28, 2019
* Fixes for memory leak when reshaping executor

* Fixed Adam Optimizer memory leak

* Cleanup for PR

* Added unit test for new ResourceScope method

* Removing import that was added by overzealous ide

* Add back in an import

* Added flags for executor to know whether or not it owns NDArrays for disposal

* Moving to ResourceScope.using implementation

* Changes to make ResourceScope.using work with existing scope

* Updating ResourceScope to work with existing scopes via usingIfScopeExists method

* Fix clojure unit tests

* Fixes to be compatible with how clojure is using ResourceScope

* Removing some unnecessary changes

* Adding scope assertion in unit test
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this issue Mar 31, 2019
lanking520 pushed a commit to lanking520/incubator-mxnet that referenced this issue Apr 1, 2019
ZhennanQin pushed a commit to ZhennanQin/incubator-mxnet that referenced this issue Apr 3, 2019
nswamy pushed a commit that referenced this issue Apr 5, 2019
lanking520 added a commit that referenced this issue Apr 5, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this issue Jun 23, 2019