This repository has been archived by the owner on Nov 8, 2018. It is now read-only.

Weights not being updated #30

Closed
rosswlewis opened this issue Jun 22, 2017 · 4 comments

Comments

@rosswlewis

I'm following this notebook almost exactly. However, after I train my model, my weights aren't getting updated:

model.layers[0].get_weights()
trained_model.layers[0].get_weights()

Both of these give me:

[array([[-0.39513412,  0.26937097, -0.36478603,  0.30427128, -0.13985097,
         -0.22316453,  0.13130313, -0.08426034],
        [ 0.41418487, -0.46847233,  0.58078319, -0.63027477, -0.45647684,
         -0.325973  ,  0.22211522,  0.55291325],
        [ 0.54379755, -0.30091569, -0.02049094, -0.4734239 , -0.41363743,
         -0.38102722, -0.19341171, -0.36358535],
        [-0.08354402,  0.39400059,  0.04485017, -0.1212253 ,  0.07950532,
          0.37202805,  0.30843312, -0.25526762]], dtype=float32),
 array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)]

Why is this?
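
For anyone repeating this check, comparing the two weight lists programmatically is less error-prone than reading the printed arrays by eye. The following is only a minimal sketch: `weights_changed` is a hypothetical helper name, not part of Keras or dist-keras, and it assumes both arguments are the lists of NumPy arrays returned by `get_weights()`.

```python
import numpy as np


def weights_changed(before, after):
    """Return True if any corresponding pair of weight arrays differs.

    `before` and `after` are lists of NumPy arrays, such as the values
    returned by a Keras layer's get_weights(). Hypothetical helper.
    """
    if len(before) != len(after):
        raise ValueError("weight lists have different lengths")
    # Exact element-wise comparison: any difference at all counts as a change.
    return any(not np.array_equal(b, a) for b, a in zip(before, after))


# Usage with the objects from the post above (not executed here):
# weights_changed(model.layers[0].get_weights(),
#                 trained_model.layers[0].get_weights())
```

As a precaution, it may be worth saving a copy of the initial weights before training, in case both names end up referring to the same underlying model.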

@JoeriHermans
Collaborator

Could you check the error log of your executors? Maybe the executors are not able to connect to the parameter server for some reason.

@rosswlewis
Author

rosswlewis commented Jun 23, 2017

Thank you for your response. I'm using the datascience.ibm.com environment, so debugging isn't ideal. Looking at my Spark history, the only failure on this job was:

ExecutorLostFailure (executor 04397b5a-9f98-4655-9325-c3a6c21b93b5 exited caused by one of the running tasks) Reason: remote Rpc client disassociated.

A coworker had some success, and even after I reran their code directly I'm having this issue. Is there some configuration that I'm missing?

Feel free to answer on Stack Overflow.

@JoeriHermans
Collaborator

My gut feeling tells me something went wrong with the parameter server connection. However, I can only validate this if you can provide me with the error log of the executor that failed. On your Spark application page, click on the tab "Executors" -> (id of failed executor) -> Error log.

This log should give an indication of what exactly happened.

@rosswlewis
Author

Thank you so much for your responses! I was starting out using the Iris dataset, and it looks like I had the batch size too high and the number of epochs too low. I changed these parameters and I'm now getting a good model.
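
A rough way to see why a large batch size combined with few epochs can leave the weights essentially untouched on a dataset as small as Iris (150 rows) is to count the gradient updates each worker can perform. This is only a back-of-the-envelope sketch: the worker count and parameter values are hypothetical, the thread does not say what values were actually used, and whether dist-keras drops or keeps a partial final batch is an assumption exposed here as the `drop_partial` flag.

```python
import math

IRIS_ROWS = 150  # number of rows in the classic Iris dataset


def updates_per_worker(num_rows, num_workers, batch_size, num_epoch,
                       drop_partial=True):
    """Estimate how many mini-batch updates one worker performs.

    Assumes rows are split evenly across workers. `drop_partial` models
    the (unconfirmed) case where an incomplete final batch is skipped.
    """
    rows_per_worker = num_rows // num_workers
    if drop_partial:
        batches = rows_per_worker // batch_size
    else:
        batches = math.ceil(rows_per_worker / batch_size)
    return batches * num_epoch


# Hypothetical settings: 4 workers, batch size 64, a single epoch.
few = updates_per_worker(IRIS_ROWS, 4, 64, 1)
# Hypothetical settings: 4 workers, batch size 4, 50 epochs.
many = updates_per_worker(IRIS_ROWS, 4, 4, 50)
print(few, many)
```

Under these assumptions each worker holds 37 rows, so a batch size of 64 never fills a single batch, while a small batch size over many epochs yields hundreds of updates.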
