loss progress #10

Closed
Sh0lim opened this issue Feb 3, 2018 · 9 comments

Comments

@Sh0lim

Sh0lim commented Feb 3, 2018

Hi Matteo,

I was wondering how the loss progressed during your training. For example, after 3 epochs the loss hasn't changed at all for me. It is around 8.1 and not dropping.
What was the loss when you finished training, and what should we expect it to be after 49 epochs?

All the best.

@matteo-dunnhofer
Owner

Hello @Sh0lim

Unfortunately I don't have the logs of the experiments anymore, so I hope I remember correctly. At the beginning I had to be patient; after a few epochs the loss started decreasing. You could also try lowering the learning rate by a factor of 10 earlier.
At epoch 49 the loss should be around 2.7.

@Sh0lim
Author

Sh0lim commented Feb 4, 2018

Yes, patience. :)
Anyway, that was useful. Thank you.

@donghn

donghn commented Feb 11, 2018

@Sh0lim Have you finished your training yet? Can you tell me about your results? I have a similar problem.

@Sh0lim
Author

Sh0lim commented Feb 11, 2018

Hi @donghn,
I had a problem with the learning rate. I had modified the code a bit and introduced a bug: I was dividing the learning rate by 10 at every step, so it was almost 0.
After I fixed it, the loss dropped normally. It was around 4.5 after 3 epochs and around 3 after 7 epochs. I started with a learning rate of 0.001 and divided it by 10 after 10 epochs. Training is still running; I'll let you know what the loss is at the end. But I have an old GPU, so I won't run it all the way to the end. I'll stop after 20 epochs.
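The difference between the buggy and the fixed schedule can be sketched in a few lines. This is only an illustration, not the repository's code: the function names, the 0-based epoch counter, and the `decay_epochs` parameter are assumptions; the 0.001 starting rate and the division by 10 after 10 epochs come from the description above.

```python
def learning_rate_for_epoch(epoch, base_lr=0.001, decay_epochs=(10,), factor=10.0):
    """Intended behaviour: divide the rate by `factor` once per milestone reached.

    `epoch` is assumed to be 0-based, so epoch 10 is the first epoch that
    runs after 10 completed epochs.
    """
    drops = sum(1 for milestone in decay_epochs if epoch >= milestone)
    return base_lr / (factor ** drops)


def buggy_learning_rate(step, base_lr=0.001, factor=10.0):
    """The bug described above: the rate is divided by `factor` at every step."""
    return base_lr / (factor ** step)


if __name__ == "__main__":
    for epoch in (0, 9, 10, 19):
        print("epoch", epoch, "lr", learning_rate_for_epoch(epoch))
    # After only a handful of steps the buggy rate is effectively zero,
    # which is consistent with a loss that never moves.
    for step in (0, 5, 20):
        print("step", step, "buggy lr", buggy_learning_rate(step))
```

With the buggy version the weights stop updating almost immediately, so a loss stuck near its initial value is the expected symptom.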

All the best.

@donghn

donghn commented Feb 11, 2018

@Sh0lim Yes. Thanks so much!

@donghn

donghn commented Feb 13, 2018

@Sh0lim, @dontfollowmeimcrazy Hi. Can you show me how to run this program on the GPU? I have installed tensorflow_GPU. When I ran the program I checked my GPU and saw that it was being used for training, but it is very slow: one epoch takes over 12 hours. Thanks.
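One way to check a figure like this without waiting for a full epoch is to time a few training steps and extrapolate. The sketch below is a generic helper, not part of the repository: `estimate_epoch_hours`, `step_fn`, and `steps_per_epoch` are hypothetical names, and `step_fn` is assumed to wrap whatever runs a single training step (for example one `sess.run` call on the train op).

```python
import time


def estimate_epoch_hours(step_fn, steps_per_epoch, warmup_steps=2, timed_steps=10):
    """Time a few calls to `step_fn` and extrapolate to hours per epoch."""
    # Warm-up calls are excluded because the first steps often include
    # one-off costs such as graph setup or memory allocation.
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    seconds_per_step = (time.perf_counter() - start) / timed_steps
    return seconds_per_step * steps_per_epoch / 3600.0


if __name__ == "__main__":
    # Stand-in for a real training step; replace with the actual step call.
    def fake_step():
        time.sleep(0.01)

    print(estimate_epoch_hours(fake_step, steps_per_epoch=10000))
```

If the estimate is dominated by time spent outside the GPU (for example loading and preprocessing images), changing the GPU settings alone may not help much.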

@Sh0lim
Author

Sh0lim commented Feb 13, 2018

Try this. gpu_memory_fraction can be set as high as 1.

import tensorflow as tf  # TensorFlow 1.x API

gpu_memory_fraction = 1.0  # fraction of GPU memory this process may allocate
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_memory_fraction)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, allow_soft_placement=True, log_device_placement=False))

with sess.as_default():
    ...

Best,

M.

@briansune

@Sh0lim May I ask for the details of your working settings (learning rate, test_step, batch_size, etc.)?
That would be really helpful; I have also run into the loss getting stuck.

@matteo-dunnhofer
Owner

Hi guys

If you happen to find better hyperparameters, just submit a PR. That would be appreciated! 👍
