# Training a Rubik's Cube Solver via Reinforcement Learning: Part 2

## Improving the solver with Monte Carlo Tree Search (MCTS)

In Solving the Rubik's Cube Without Human Knowledge [1], the authors combine a deep neural network, trained with a reinforcement learning procedure they call Autodidactic Iteration, with Monte Carlo Tree Search (MCTS) to achieve impressive performance with their Rubik's Cube solver. This solver adopts the same idea of pairing a learned network with MCTS to improve its performance as well.
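
To make the search step more concrete, here is a minimal sketch of a PUCT-style action-selection rule of the kind that MCTS variants such as the one in [1] use when descending the tree. It is an illustration under assumptions, not this repo's implementation: `EdgeStats`, `select_action`, and the exploration constant `c` are hypothetical names, and the priors are assumed to come from a policy network.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class EdgeStats:
    """Per-move statistics kept at a tree node (illustrative field names)."""
    prior: float                # probability the policy network assigns to the move
    visits: int = 0             # number of times the move has been taken from this node
    max_value: float = 0.0      # best value estimate seen below this move
    virtual_loss: float = 0.0   # penalty that discourages re-walking the same path


def select_action(edges: Dict[str, EdgeStats], c: float = 1.0) -> str:
    """Pick the move that maximizes exploration bonus plus value estimate."""
    total_visits = sum(e.visits for e in edges.values())

    def score(e: EdgeStats) -> float:
        # Exploration: large for moves with a high prior and few visits.
        exploration = c * e.prior * math.sqrt(total_visits) / (1 + e.visits)
        # Exploitation: best value seen so far, minus the virtual loss.
        exploitation = e.max_value - e.virtual_loss
        return exploration + exploitation

    return max(edges, key=lambda move: score(edges[move]))
```

With a large `c` the search favors rarely visited moves; with `c = 0` it always follows the best value estimate seen so far.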

## Training with a GPU on AWS

To improve the performance of the Rubik's Cube solver, its neural network must be trained for longer and attempt to solve many more cubes. For example, the network trained in [1] is exposed to roughly 8 billion examples of a shuffled cube. In comparison, the networks trained in part one saw only 1,000 examples of a shuffled cube.


Because training the solver for extended periods on my MacBook is impractical, longer training runs use a cloud computing service instead, specifically Amazon Web Services (AWS).

### Steps for training the Rubik's Cube solver on AWS

#### 1) Launch an EC2 instance

Recommended AMI: TensorFlow from NVIDIA AMI (https://aws.amazon.com/marketplace/pp/B07S2Z9N33)

Recommended instance type: g4dn.xlarge (one of the least expensive GPU-enabled instance types)
        
#### 2) SSH into instance 

    ssh -i <private-key.pem> ubuntu@<instance Public DNS (IPv4)>
        
#### 3) Clone repo onto EC2 instance

    git clone https://github.com/MattD18/rubix-cube.git 

#### 4) Pull the TensorFlow v2.0 GPU Docker image

    docker pull tensorflow/tensorflow:2.0.0-gpu-py3
    
#### 5) Run training script in a detached Docker container

    docker run -v /home/ubuntu/:/home -d -w /home/rubix-cube/scripts --gpus all -it --rm \
        tensorflow/tensorflow:2.0.0-gpu-py3 python train.py

Note: Running the training script in a detached Docker container lets you disconnect from the SSH session while training continues on the instance.
    
#### 6) Set up AWS CLI

    sudo apt install awscli
    aws configure

#### 7) Upload results to S3 bucket from repo root directory

    ./save_results.sh
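
The contents of `save_results.sh` are not shown here. As a rough sketch of what an equivalent upload step could look like from Python, the snippet below shells out to the AWS CLI's `aws s3 sync` command. The function names, the local directory, the bucket, and the prefix are all hypothetical placeholders, and the AWS CLI is assumed to be installed and configured as in the previous step.

```python
import subprocess
from typing import List


def build_sync_command(local_dir: str, bucket: str, prefix: str) -> List[str]:
    """Build an `aws s3 sync` command copying local_dir to s3://bucket/prefix."""
    return ["aws", "s3", "sync", local_dir, f"s3://{bucket}/{prefix}"]


def upload_results(local_dir: str, bucket: str, prefix: str) -> None:
    """Run the sync; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(build_sync_command(local_dir, bucket, prefix), check=True)
```

Because `sync` only copies files that are new or changed, calling it repeatedly during a long run does not re-upload checkpoints that are already in the bucket.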


### Note on spot pricing

AWS allows users to request EC2 instances at the current "spot price", which is often substantially cheaper than the "on-demand" price [2]. Spot instances are consequently a much more cost-effective resource for training models. However, because AWS can terminate a spot instance based on current demand for the instance type, the training procedure must be fault tolerant.
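
One way a training process can learn that its spot instance is about to be reclaimed is to poll the EC2 instance metadata service, which exposes a `spot/instance-action` entry once an interruption is scheduled. The sketch below is an optional illustration rather than part of this repo: `parse_instance_action` and `spot_interruption_time` are hypothetical helper names, and the code assumes it runs on an EC2 instance with the metadata service (IMDSv2) reachable.

```python
import json
import urllib.request
from typing import Optional

METADATA_URL = "http://169.254.169.254/latest"


def parse_instance_action(body: str) -> Optional[str]:
    """Return the scheduled interruption time from the notice, or None."""
    try:
        notice = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(notice, dict):
        return None
    return notice.get("time")


def spot_interruption_time(timeout: float = 1.0) -> Optional[str]:
    """Ask the metadata service whether an interruption is scheduled."""
    try:
        # IMDSv2 requires a short-lived session token for metadata requests.
        token_request = urllib.request.Request(
            METADATA_URL + "/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_request, timeout=timeout) as resp:
            token = resp.read().decode()

        action_request = urllib.request.Request(
            METADATA_URL + "/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(action_request, timeout=timeout) as resp:
            return parse_instance_action(resp.read().decode())
    except OSError:
        # Covers HTTP 404 (no interruption scheduled), timeouts, and the
        # metadata service being unreachable (e.g. when not running on EC2).
        return None
```

A training loop could call `spot_interruption_time()` every few seconds and trigger a final checkpoint and upload as soon as it returns a timestamp.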

For the Rubik's Cube solver, fault tolerance is achieved by checkpointing the network's weights every 500 epochs and saving both the checkpointed weights and the training logs (loss and validation accuracy) to an S3 bucket every 2 hours.
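
The schedule described above can be sketched as a training loop with two independent triggers: an epoch counter for checkpoints and a wall-clock timer for uploads. This is a minimal sketch and does not reproduce `train.py`; the callbacks `train_one_epoch`, `save_checkpoint`, and `upload_results`, as well as the injectable `clock`, are hypothetical stand-ins for the real training, saving, and S3 upload code.

```python
import time
from typing import Callable

CHECKPOINT_EVERY_EPOCHS = 500
UPLOAD_EVERY_SECONDS = 2 * 60 * 60  # 2 hours


def train_with_checkpoints(
    num_epochs: int,
    train_one_epoch: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
    upload_results: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Train for num_epochs, checkpointing by epoch count and uploading by elapsed time."""
    last_upload = clock()
    for epoch in range(1, num_epochs + 1):
        train_one_epoch(epoch)

        # Epoch-based trigger: write the network weights to local disk.
        if epoch % CHECKPOINT_EVERY_EPOCHS == 0:
            save_checkpoint(epoch)

        # Time-based trigger: push checkpoints and logs to durable storage.
        if clock() - last_upload >= UPLOAD_EVERY_SECONDS:
            upload_results()
            last_upload = clock()
```

Keeping the two triggers separate means that a spot termination loses at most the work since the last upload, regardless of how long an individual epoch takes.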
        


## Sources

[1] https://arxiv.org/pdf/1805.07470.pdf

[2] https://aws.amazon.com/ec2/spot/pricing/