
Out of Memory #12
Closed · 0three opened this issue Apr 29, 2019 · 8 comments


0three commented Apr 29, 2019

I ran this command on my lab server:
th main.lua --develop --name test-run --type float
and got an error like this:
{
maxPoolStride : 2
noProgress : false
name : "test-run"
learningRate : 0.001
transmissionJPEGU_yc : 5
batchSize : 12
develop : true
optimType : "adam"
adversaryFeatureDepth : 64
messageLength : 30
transmissionCropout : 0.4
transmissionDropout : 0.4
transmissionJPEGQuality : 50
type : "float"
transmissionCropSize : 0.5
decoderConvolutions : 6
loadCheckpoint : ""
fixImage : false
encoderPreMessageConvolution : 3
noSave : false
seed : 1234
maxPoolWindowSize : 4
transmissionGaussianSigma : 2
small : false
encoderFeatureDepth : 64
confusionPer : 20
imageSize : 128
savePer : 20
imagePenaltyCoef : 1
testPer : 1
save : "checkpoints"
transmissionJPEGU_yd : 0
fixMessage : false
epochs : 200
decoderFeatureDepth : 64
transmissionNoiseType : "identity"
thin : false
transmissionJPEGCutoff : 5
transmissionJPEGU_uvd : 0
transmissionJPEGU_uvc : 3
small16 : false
transmissionOutsize : 128
transmissionCombinedRecipe : ""
adversary_gradient_scale : 0.1
adversaryConvolutions : 2
messagePenaltyCoef : 1
grayscale : false
transmissionConcatenatedRecipe : ""
encoderPostMessageConvolution : 1
randomImage : false
}
{
beta1 : 0.9
epsilon : 1e-08
learningRateDecay : 0
learningRate : 0.001
beta2 : 0.999
}
Loading training dataset
Accepting non-grayscale input
test-run: starting to train

epoch: 1

slurmstepd: error: Detected 1 oom-kill event(s) in step 10160.1 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: wmc-slave-g6: task 0: Out Of Memory

It looks like I don't have enough memory to run this. I just cloned the code and ran the test command.

Could you please share the requirements for running this? Thank you!

By the way, I hope the pretrained models can be released for research.

Thank you!
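Note: the slurmstepd message above reports a cgroup out-of-memory kill, i.e. the step exceeded the memory limit Slurm granted it, not necessarily that the node ran out of physical RAM. Assuming a standard Slurm setup, the usual first things to try are requesting more memory for the step (for example `srun --mem=32G th main.lua ...`, with the value adjusted to what the cluster allows; the 32G figure is only a suggestion) or lowering `batchSize` / `imageSize` in the options printed above.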


0three commented Apr 29, 2019

My memory configuration:

cat /proc/meminfo
MemTotal: 131663888 kB
MemFree: 51571416 kB
MemAvailable: 95205204 kB
Buffers: 796904 kB
Cached: 44341812 kB
SwapCached: 2428 kB
Active: 62199800 kB
Inactive: 14218780 kB
Active(anon): 30307824 kB
Inactive(anon): 3407284 kB
Active(file): 31891976 kB
Inactive(file): 10811496 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 999420 kB
SwapFree: 949476 kB
Dirty: 148 kB
Writeback: 0 kB
AnonPages: 31277964 kB
Mapped: 2979180 kB
Shmem: 2435292 kB
Slab: 1860776 kB
SReclaimable: 1556396 kB
SUnreclaim: 304380 kB
KernelStack: 14688 kB
PageTables: 94932 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 66831364 kB
Committed_AS: 36091084 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 34013280 kB
DirectMap2M: 96712704 kB
DirectMap1G: 5242880 kB


0three commented Apr 29, 2019

srun nvidia-smi
Mon Apr 29 21:08:18 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72 Driver Version: 410.72 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:B2:00.0 Off | N/A |
| 23% 26C P8 15W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+


0three commented Apr 29, 2019

Thank you!

ando-khachatryan (Owner) commented

Hi,
I'm a bit confused; it seems you are running the Lua version. This repo is the PyTorch implementation. The Lua implementation is here: https://github.com/jirenz/HiDDeN


0three commented Apr 29, 2019

I'm terribly sorry, I mixed up the two repos.

But I also have a question about the PyTorch implementation.
In the noise layer, the PyTorch implementation uses apply_conv() to simulate JPEG compression in the DCT domain. The original source code, however, does it differently: it requires Image and uses Image.JPGCompression. That seems strange, and I can't find any DCT mask in the original source code (https://github.com/jirenz/HiDDeN).

Do you have any comments on this point?


ando-khachatryan commented Apr 29, 2019

OK, I'm not the author of the paper, but I'll do my best.
The jpegn.lua file in their repo is non-differentiable (see the very first comment line in the source file). That layer is not used for training, but for verifying that the differentiable approximation of JPEG is actually a good approximation. See Figure 5 and the related explanations in their paper.
The differentiable approximations of JPEG compression are defined in DCT_layer.lua. There may still be differences in implementation (theirs vs. mine), but they seem to be doing the same thing.
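
To make the idea concrete, here is a minimal sketch of a "JPEG-Mask"-style approximation: the 8×8 blockwise DCT is expressed as a fixed convolution, high-frequency coefficients are zeroed with a mask, and a transposed convolution with the same (orthonormal) filters maps back to pixel space. This is an illustration only, assuming a single-channel input whose height and width are divisible by 8; the helper names (`dct_basis`, `jpeg_mask`) are made up and it is not the exact code of either repository.

```python
# Sketch of a differentiable JPEG-Mask approximation (illustration, not repo code).
import math
import torch
import torch.nn.functional as F

def dct_basis():
    # 64 filters of size 8x8; filter (u, v) is the 2-D DCT-II basis function.
    basis = torch.zeros(64, 1, 8, 8)
    for u in range(8):
        for v in range(8):
            cu = math.sqrt(1 / 8) if u == 0 else math.sqrt(2 / 8)
            cv = math.sqrt(1 / 8) if v == 0 else math.sqrt(2 / 8)
            for x in range(8):
                for y in range(8):
                    basis[u * 8 + v, 0, x, y] = (
                        cu * cv
                        * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                    )
    return basis

def jpeg_mask(image, keep=5):
    # image: (N, 1, H, W) with H and W divisible by 8.
    basis = dct_basis().to(image.device)
    # Forward blockwise DCT as a stride-8 convolution: one map per coefficient.
    coeffs = F.conv2d(image, basis, stride=8)  # (N, 64, H/8, W/8)
    # Zero the high-frequency coefficients; exactly which ones are kept is a
    # detail of the particular recipe (here: those with u + v < keep).
    mask = torch.zeros(64)
    for u in range(8):
        for v in range(8):
            if u + v < keep:
                mask[u * 8 + v] = 1.0
    coeffs = coeffs * mask.view(1, 64, 1, 1).to(image.device)
    # Inverse DCT: the filters are orthonormal, so a transposed convolution
    # with the same weights maps the (masked) coefficients back to pixels.
    return F.conv_transpose2d(coeffs, basis, stride=8)

if __name__ == "__main__":
    x = torch.rand(1, 1, 128, 128)   # same spatial size as imageSize above
    print(jpeg_mask(x).shape)        # torch.Size([1, 1, 128, 128])
```

Because every step is a fixed linear operation plus an elementwise mask, gradients flow through the whole thing during training, whereas Image.JPGCompression in jpegn.lua is a black-box encode/decode and is therefore only useful for evaluation.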


0three commented Apr 29, 2019

Thank you very much!

That clears up an important point for me.

Thanks again!

0three closed this as completed Apr 29, 2019
ando-khachatryan (Owner) commented

You're welcome!
