
Learning Tetris using the noisy Cross Entropy method

Project conducted as part of the second-year Monte Carlo course at ENSAE. Most of the explanations below are excerpts from preexisting articles (see bibliography); however, the code and the results are ours.

Main idea

The cross-entropy method is an efficient and general optimization algorithm. However, although it is fast, its applicability in reinforcement learning appears limited because it often converges to suboptimal policies. A standard technique for preventing early convergence is to introduce noise. We apply the noisy cross-entropy method to the game of Tetris to demonstrate its efficiency.

Code

  • Tetris.py implements the Tetris game and the state-value functions.
  • Tetris_tuned.py adds two more features to the state-value function (depth of the holes & number of lines with holes), as proposed in [2].
  • CE_method.py implements the classical cross-entropy method optimizer.
  • CE_method_with_noise.py implements the noisy cross-entropy method (constant & decreasing noise).
  • Simulated_annealing.py implements the simulated annealing optimizer.

Tetris controller

Following the approach of Thiery and Scherrer [2], we learn state-value functions that are linear combinations of 21 basis functions.

| Feature | Id | Description | Comments |
|---|---|---|---|
| Column height | $h_p$ | Height of the $p$-th column of the board | There are $P$ such features, where $P$ is the board width |
| Column difference | $\Delta h_p$ | Absolute difference $\mid h_p - h_{p+1} \mid$ between adjacent columns | There are $P - 1$ such features |
| Maximum height | $L$ | Maximum pile height | Penalizes a tall pile |
| Holes | $H$ | Number of empty cells covered by at least one full cell | Penalizes creating holes |
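These features can be computed directly from the board's occupancy grid. A minimal sketch, assuming a NumPy array of 0/1 cells with row 0 at the top of the well (function names are ours, not the repository's):

```python
import numpy as np

def column_heights(board):
    """Height of each column, counted from the first full cell down.

    board: 2D array of 0/1 cells, row 0 is the top of the well.
    """
    rows, cols = board.shape
    heights = np.zeros(cols, dtype=int)
    for p in range(cols):
        filled = np.nonzero(board[:, p])[0]
        heights[p] = rows - filled[0] if filled.size else 0
    return heights

def features(board):
    """Return (h, dh, L, H) as defined in the table above."""
    rows, cols = board.shape
    h = column_heights(board)
    dh = np.abs(np.diff(h))   # |h_p - h_{p+1}| for adjacent columns
    L = h.max()               # maximum pile height
    # A hole is an empty cell with at least one full cell above it.
    H = sum(
        int(board[r, p] == 0 and board[:r, p].any())
        for p in range(cols)
        for r in range(rows)
    )
    return h, dh, L, H
```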

The value function to optimize is:

$$ V_w(s):=\sum_{i=1}^{10} w_i h_i(s) + \sum_{i=1}^{9} w_{10+i}\Delta h_i(s) + w_{20}L + w_{21}H $$

where $s$ denotes a Tetris state and $w=(w_1,w_2,...,w_{21})$ the weight vector to optimize.
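Evaluating $V_w$ then reduces to a dot product between the weight vector and the feature vector. A sketch for the standard 10-column board (so $10 + 9 + 1 + 1 = 21$ weights); the function name is ours:

```python
import numpy as np

def value(w, h, dh, L, H):
    """V_w(s) = sum_i w_i h_i + sum_i w_{10+i} dh_i + w_20 L + w_21 H,
    for a board of width P = 10 (hence 10 + 9 + 1 + 1 = 21 weights)."""
    w = np.asarray(w, dtype=float)
    assert w.size == 21 and len(h) == 10 and len(dh) == 9
    return w[:10] @ h + w[10:19] @ dh + w[19] * L + w[20] * H
```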

One of our Tetris simulations with an optimized weight vector (for the GIF generator, see Tetris.py):

Optimizer

Cross entropy

Presentation

An illustration of this principle for $w=(w_1, w_2)$ is provided in this repository:

  • The red crosses: the 10 best vectors, selected and used to estimate the next round's distribution $\mathcal{N}\left(\mu, \sigma^2\right)$
  • The black dots: the 100 vectors generated and evaluated in the next round
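One iteration of the loop pictured above: sample 100 weight vectors from a diagonal Gaussian $\mathcal{N}(\mu, \sigma^2)$, keep the 10 best, and refit the mean and variance on those elites. A minimal sketch with a stand-in quadratic objective (the actual objective would be the controller's average Tetris score):

```python
import numpy as np

rng = np.random.default_rng(0)

def ce_step(mu, sigma2, score, n_samples=100, n_elite=10):
    """One cross-entropy update of a diagonal Gaussian N(mu, sigma2)."""
    samples = rng.normal(mu, np.sqrt(sigma2), size=(n_samples, mu.size))
    scores = np.array([score(w) for w in samples])
    elite = samples[np.argsort(scores)[-n_elite:]]  # keep the 10 best
    return elite.mean(axis=0), elite.var(axis=0)

# Stand-in objective: maximize -||w - 3||^2 (optimum at w = 3).
score = lambda w: -np.sum((w - 3.0) ** 2)
mu, sigma2 = np.zeros(5), np.full(5, 100.0)
for _ in range(30):
    mu, sigma2 = ce_step(mu, sigma2, score)
```

Note that the variance is refit on the elites alone, so it shrinks every round; this is exactly the mechanism behind the premature convergence that the noisy variants below are designed to counter.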

Results

Cross entropy with constant noise

Presentation
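The constant-noise variant changes a single line of the plain update: a constant $Z$ is added to the fitted variance, so the sampling distribution can never collapse below that floor. A sketch; the value `z = 4.0` is illustrative, not the repository's setting:

```python
import numpy as np

rng = np.random.default_rng(1)

def ce_step_const_noise(mu, sigma2, score, z=4.0, n_samples=100, n_elite=10):
    """Cross-entropy update with constant noise added to the variance."""
    samples = rng.normal(mu, np.sqrt(sigma2), size=(n_samples, mu.size))
    scores = np.array([score(w) for w in samples])
    elite = samples[np.argsort(scores)[-n_elite:]]
    # Constant noise: the variance never drops below z, so the search
    # keeps exploring instead of collapsing prematurely.
    return elite.mean(axis=0), elite.var(axis=0) + z
```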

Results

Cross entropy with decreasing noise

Presentation
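With decreasing noise, the added term $Z_t$ shrinks with the iteration index $t$: strong exploration early on, plain cross-entropy behaviour later. A sketch using the schedule $Z_t = \max(0, 5 - t/10)$ from the original noisy cross-entropy paper; the other hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def noise(t):
    """Decreasing noise schedule Z_t = max(0, 5 - t/10)."""
    return max(0.0, 5.0 - t / 10.0)

def ce_step_decreasing_noise(mu, sigma2, score, t, n_samples=100, n_elite=10):
    """Cross-entropy update whose variance noise decays with iteration t."""
    samples = rng.normal(mu, np.sqrt(sigma2), size=(n_samples, mu.size))
    scores = np.array([score(w) for w in samples])
    elite = samples[np.argsort(scores)[-n_elite:]]
    return elite.mean(axis=0), elite.var(axis=0) + noise(t)
```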

Results

Further improvements

  • Try a two pieces Tetris controller
  • Use new features in the controller
  • Optimize the hyperparameters
  • Simulated annealing optimizer

Bibliography

Contacts
