Weights not being updated #30
Comments
Could you check the error log of your executors? Maybe the executors are not able to connect to the parameter server for some reason.
Thank you for your response. I'm using the datascience.ibm.com environment, so debugging isn't ideal. Looking at my Spark history, the only failures on this job were: ExecutorLostFailure (executor 04397b5a-9f98-4655-9325-c3a6c21b93b5 exited caused by one of the running tasks) Reason: remote Rpc client disassociated. A coworker had some success, but even when I rerun their code directly I hit this issue. Is there some configuration that I'm missing? Feel free to answer on Stack Overflow.
My gut feeling is that something went wrong with the parameter server connection. However, I can only confirm this if you can provide the error log of the executor that failed. On your Spark application page, click the "Executors" tab -> (id of failed executor) -> Error log. This log should indicate what exactly happened.
Thank you so much for your responses! I was starting out with the Iris dataset, and it looks like I had the batch size too high and the number of epochs too low. I changed these parameters and I'm now getting a good model.
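For anyone hitting the same symptom, here is a rough sketch of why a large batch size combined with few epochs can leave a model almost untrained on a dataset as small as Iris (150 samples). It assumes a single worker and plain mini-batch training. The batch sizes and epoch counts are hypothetical, not the values used in this thread, and in a distributed setup the data is split across workers, so the real counts will differ.

```python
import math

def gradient_updates(n_samples, batch_size, epochs):
    """Approximate number of weight updates for single-worker mini-batch training."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch * epochs

IRIS_SAMPLES = 150  # size of the Iris dataset

# Hypothetical settings, for illustration only.
few_updates = gradient_updates(IRIS_SAMPLES, batch_size=128, epochs=1)
many_updates = gradient_updates(IRIS_SAMPLES, batch_size=8, epochs=50)

print(few_updates)   # 2
print(many_updates)  # 950
```

With only a couple of updates the weights barely move from their initial values, which can look as if training did nothing.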
I'm following this notebook almost exactly. However, after I train my model, my weights aren't getting updated:
model.layers[0].get_weights()
trained_model.layers[0].get_weights()
Both of these give me:
[array([[-0.39513412, 0.26937097, -0.36478603, 0.30427128, -0.13985097,
-0.22316453, 0.13130313, -0.08426034],
[ 0.41418487, -0.46847233, 0.58078319, -0.63027477, -0.45647684,
-0.325973 , 0.22211522, 0.55291325],
[ 0.54379755, -0.30091569, -0.02049094, -0.4734239 , -0.41363743,
-0.38102722, -0.19341171, -0.36358535],
[-0.08354402, 0.39400059, 0.04485017, -0.1212253 , 0.07950532,
0.37202805, 0.30843312, -0.25526762]], dtype=float32),
array([ 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
Why is this?
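Comparing printed arrays by eye is error-prone, so a small helper can check the weights programmatically. This is a minimal sketch using NumPy. `weights_changed` is a hypothetical helper, and the arrays below are stand-ins for what `get_weights()` returns, not the real values from the model.

```python
import numpy as np

def weights_changed(before, after):
    """Return one boolean per weight array: True if that array differs at all."""
    return [not np.array_equal(b, a) for b, a in zip(before, after)]

# Stand-in values; in practice pass model.layers[0].get_weights()
# and trained_model.layers[0].get_weights().
before = [
    np.array([[0.1, -0.2], [0.3, 0.4]], dtype=np.float32),  # kernel
    np.zeros(2, dtype=np.float32),                          # bias
]
untrained = [w.copy() for w in before]
trained = [before[0] + 0.05, before[1] + 0.01]

print(weights_changed(before, untrained))  # [False, False]
print(weights_changed(before, trained))    # [True, True]
```

Note that the all-zero bias vector in the output above is what a freshly initialized Keras `Dense` layer looks like, since the default bias initializer is zeros. That is consistent with the returned model never having received any updates.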