Meeting 1 December 2017
Decidetto edited this page Dec 1, 2017
·
3 revisions
- Mark Winands
- Joshua Scheidt
- Marciano Geijselaers
- Max Meijers
- Simon Craenen (late)
Absent: Timo Raff (overslept)
Explanation of how approach works
-
Autoencoder, how many hidden layers do we need..?
- Three values feel like too few
- Variational autoencoders
- Might want to run on GPU -> need CUDA and all
-
Python Keras framework
- Already in place
-
Not using planning track software, because Kurt said so
- Easier to get states in planning though
- Might want to switch back to it
-
Problem with square distance going to origin
- Maybe rewrite on our own
- Send email, include Mark in CC
- With already modified code, working, to push the process along
-
Accuracy still poor
- Don't know how many nodes needed
-
Connection between Java and Python
- Maven?
-
Good thing that we're using our own software, but that does mean that participation in the competition is not an option anymore.
- But we weren't planning on participating anymore anyway.
-
If needed, go from model <s,a,s',r> to <s,a,s,a>
-
Accuracy between 0.24 and 0.33.
- Not bad, but...:
- Mostly zeros and 121 in a few places.
-
Gradient normalisation: DEFINITELY something to implement!
- Has to be used to prevent the values from "flying off the rails"
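A minimal sketch of what this could look like, assuming "gradient normalisation" here means rescaling the gradient whenever its L2 norm exceeds a threshold (clipping by norm). The function name and the threshold are illustrative, not taken from our code. In Keras the same effect is available through the `clipnorm` argument of the optimizers, so this is only meant to show the idea.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the original norm.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        # Small gradients are left untouched.
        return list(grads), norm
    scale = max_norm / norm
    # The direction is preserved, only the length shrinks.
    return [g * scale for g in grads], norm


clipped, original_norm = clip_by_norm([3.0, 4.0], max_norm=1.0)
```

The direction of the update stays the same, so learning still moves the right way, just with a bounded step.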
-
Softmax, possible approach for action choice.
- All possible actions sum to 1 -> choose best from these
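As a rough illustration of the softmax idea, the sketch below turns raw per-action scores into values that sum to 1 and then picks the highest one. The scores are made-up numbers and the function names are hypothetical. In the real network this would be a softmax activation on the output layer.

```python
import math


def softmax(scores):
    """Map raw scores to positive values that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum avoids overflow in exp() without changing the result.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_action(scores):
    """Return the index of the most probable action and all probabilities."""
    probs = softmax(scores)
    index = max(range(len(probs)), key=lambda i: probs[i])
    return index, probs


action, probs = best_action([1.0, 3.0, 2.0])
```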
-
We're using regression, but that's expensive!
- Use classification if possible: an alternative approach (goes against Kurt?)
- Use MCTS player from planning track to observe and learn best actions for use in classification
- Put learned neural net back into MCTS, to influence exploration, and observe and learn more from it!
- When in a certain state, give the network which action would be best
- Check legality though!
- Alternative extension
- 5 nodes on input, 5 nodes on output, indicating whether an action is used.
- Instead of <s,a,r>, return <s,a> (This is not compatible with Q-learning anymore)
- This is a policy network, instead of the value network we have now
- MCTS already used to gather data, so extend this
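The classification idea above could be sketched roughly as follows, assuming 5 possible actions as in the notes. The action chosen by the MCTS player becomes a one-hot training target, and at play time the network's outputs are filtered by legality before picking. All names and the example numbers are hypothetical, and how the legal actions are obtained from the game is left open.

```python
N_ACTIONS = 5  # assumption: matches the "5 nodes on output" in the notes


def one_hot(action, n_actions=N_ACTIONS):
    """Encode the action chosen by MCTS as a classification target."""
    if not 0 <= action < n_actions:
        raise ValueError("action index out of range")
    return [1 if i == action else 0 for i in range(n_actions)]


def pick_legal_action(probs, legal):
    """Pick the highest-scoring action among the legal ones.

    probs: network output per action; legal: one boolean per action.
    """
    candidates = [i for i, ok in enumerate(legal) if ok]
    if not candidates:
        raise ValueError("no legal action available")
    return max(candidates, key=lambda i: probs[i])


target = one_hot(2)
# Action 1 has the highest score but is illegal in this state, so it is skipped.
chosen = pick_legal_action(
    [0.10, 0.60, 0.20, 0.05, 0.05],
    [True, False, True, True, True],
)
```

Filtering after the network rather than inside it keeps the network itself simple. The legality check stays in the game code, where that information already lives.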
-
Aachen cluster?
- Simply used for optimising weights
- Can put computations onto all CPUs on the cluster if multi-core
-
Report period 2?
- Not necessarily now, but it's good to write things down before we forget
- Problems we run into
- Choices we make to get around them
- Possible future research ideas