Meeting 1 December 2017

Decidetto edited this page Dec 1, 2017 · 3 revisions

Meeting Minutes - Project Group 8

Location: 2.002
Date: 1 December 2017
Time:   9:32 - 10:06

Attendance

  • Mark Winands
  • Joshua Scheidt
  • Marciano Geijselaers
  • Max Meijers
  • Simon Craenen (late)

Absent: Timo Raff (overslept)

Meeting:

Explanation of how approach works

  • Autoencoder, how many hidden layers do we need..?

    • Three values feel like too few
    • Variational autoencoders
    • Might want to run on GPU -> need CUDA and all
  • Python keras framework

    • Already in place
  • Not using planning track software, because Kurt said so

    • Easier to get states in planning though
    • Might want to switch back to it
  • Problem with square distance going to origin

    • Maybe rewrite on our own
      • Send email, include Mark in CC
      • With already modified code, working, to push the process along
  • Accuracy still poor

    • Don't know how many nodes needed
  • Connection between Java and Python

    • Maven?
  • Good thing that we're using our own software, but that does mean that participation in the competition is not an option anymore.

    • But we weren't planning on participating anymore anyway.
  • If needed, go from model <s,a,s',r> to <s,a,s,a>

  • Accuracy between 0.24 and 0.33.

    • Not bad, but...:
    • Mostly zeros and 121 in a few places.
  • Gradient normalisation: DEFINITELY something to implement!

    • Has to be used to prevent the values from "flying off the rails"
  • Softmax, possible approach for action choice.

    • All possible actions sum to 1 -> choose best from these
  • We're using regression, but that's expensive!

    • Use classification if possible: alternative approach (goes against Kurt?)
      • Use MCTS player from planning track to observe and learn best actions for use in classification
      • Put learned neural net back into MCTS, to influence exploration, and observe and learn more from it!
      • When in a certain state, give the network which action would be best
        • Check legality though!
      • Alternative extension
        • 5 nodes on input, 5 nodes on output, indicating whether an action is used.
        • Instead of <s,a,r>, return <s,a> (This is not compatible with Q-learning anymore)
        • Is a policy network, instead of value network we have now
      • MCTS already used to gather data, so extend this
  • Aachen cluster?

    • Simply used for optimising weights
    • Can put computations onto alllllll CPUs on the cluster if multi-core
  • Report period 2?

    • Not necessarily now, but it's good to write things down before we forget
      • Problems we run into
      • Choices we make to get around them
      • Possible future research ideas
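
Sketch for the softmax and legality points above: one possible way to turn five network outputs into an action choice. The scores, the legality mask and the function names `masked_softmax` and `choose_action` are made up for illustration and are not taken from the project code. Illegal actions are given probability 0 before picking the best one.

```python
import math

def masked_softmax(scores, legal):
    """Softmax over the legal actions only; illegal actions get probability 0."""
    legal_scores = [s for s, ok in zip(scores, legal) if ok]
    if not legal_scores:
        raise ValueError("no legal action available")
    # Subtract the maximum legal score for numerical stability.
    m = max(legal_scores)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, legal)]
    total = sum(exps)
    return [e / total for e in exps]

def choose_action(scores, legal):
    """Return the index of the legal action with the highest probability."""
    probs = masked_softmax(scores, legal)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical outputs of a 5-node output layer, with action 3 being illegal.
scores = [2.0, 1.0, 0.5, 3.0, 0.0]
legal = [True, True, True, False, True]
probs = masked_softmax(scores, legal)
best = choose_action(scores, legal)
```

Action 3 has the highest raw score here but is masked out, so the choice falls on the best remaining legal action. Sampling from `probs` instead of taking the maximum would be an alternative if more exploration is wanted.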

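Sketch for the gradient normalisation point above: the minutes do not say which variant is meant, so this assumes clipping by the L2 norm, one common way to keep values from "flying off the rails". The function name `clip_by_norm` and the example numbers are illustrative only. Keras optimizers also accept a `clipnorm` argument, which might be the simpler route in the existing setup.

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale gradients so that their L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the original norm.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        # Small enough already: leave the gradients untouched.
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm

# Illustrative gradient with L2 norm 5, clipped down to norm 1.
clipped, original_norm = clip_by_norm([3.0, 4.0], max_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
```

The direction of the gradient is kept; only its length is limited, so large updates are damped without changing which way the weights move.
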