
Network architecture dependence #1

Open
everthemore opened this issue Feb 5, 2021 · 3 comments

Comments

@everthemore
Contributor

A simple feedforward network with 10 -> 100 -> 1 structure does a really good job already, so we don't even have to try more complicated network types. But we should:

  • See how low we can go with the first hidden layer (e.g. train using Early Stopping, and try 10, 20, 30, ..., 100 neurons).
  • See if adding another hidden layer makes things even better (or if it just makes training harder).
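For reference, a minimal sketch of the baseline described above, assuming a Keras Sequential setup; the activation, optimizer, and loss here are illustrative assumptions, not taken from the repo:

```python
# Minimal sketch of the 10 -> 100 -> 1 baseline (assumed Keras API);
# activation, optimizer, and loss are illustrative choices, not the repo's.
from tensorflow import keras

def build_baseline(n_hidden=100):
    """Feedforward network: 10 inputs -> n_hidden hidden units -> 1 output."""
    model = keras.Sequential([
        keras.layers.Input(shape=(10,)),
        keras.layers.Dense(n_hidden, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```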
@Torbjorn-Rasmussen
Collaborator

So I tried both of these tests; in summary:

  • the performance of the network is decent for a number of hidden nodes down to 20.
  • a second hidden layer improved the network quite a lot, without requiring more training, because the early stopping criterion is reached sooner.

For testing the number of hidden nodes:
I used EarlyStopping from Keras, monitoring val_loss with patience=50; running for 1000 epochs typically did not make it stop early.
These are plots of the loss as a function of the number of hidden nodes, and a typical run.
[Plot: loss vs. number of hidden nodes (hidden_nodes_test)]
[Plot: typical training run with one hidden layer (one_hidden_layer)]
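Roughly the sweep I mean, as a sketch; x_train, y_train, x_val, y_val and build_baseline are placeholder names, not the actual code in the repo:

```python
# Rough sketch of the hidden-node sweep with Keras EarlyStopping
# (monitor val_loss, patience=50, up to 1000 epochs); the data arrays and
# the build_baseline helper are placeholders, not the repo's real names.
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=50)

val_losses = {}
for n_hidden in range(10, 110, 10):        # 10, 20, ..., 100 hidden nodes
    model = build_baseline(n_hidden)       # builder as sketched above
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=1000, verbose=0,
                        callbacks=[early_stop])
    val_losses[n_hidden] = min(history.history["val_loss"])
```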

For testing a second hidden layer, I added a second hidden layer of the same size as the first and trained it with the same early stopping and number of epochs. It stopped early in most cases, actually reducing the time spent training compared to a single hidden layer, while simultaneously improving the performance on the test data.
This is a typical run with two hidden layers.
[Plot: typical training run with two hidden layers (two_hidden_layer)]
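The two-layer variant is just the baseline with the hidden layer duplicated; a sketch under the same assumptions as above:

```python
# Sketch of the two-hidden-layer variant: second hidden layer the same
# size as the first, otherwise identical to the baseline sketched above.
from tensorflow import keras

def build_two_layer(n_hidden=100):
    """Feedforward network: 10 -> n_hidden -> n_hidden -> 1."""
    model = keras.Sequential([
        keras.layers.Input(shape=(10,)),
        keras.layers.Dense(n_hidden, activation="relu"),
        keras.layers.Dense(n_hidden, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```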

@everthemore
Contributor Author

Great, so two layers with the same number of neurons achieve a lower loss, better generalization, and even train faster? :)

@Torbjorn-Rasmussen
Collaborator

I think the improvement in training time is not that large, but going from 200 s to 120 s is still an improvement.
