Need to implement GPU back-end to improve training and inference speed. Back-end can be implemented using CUDA or OpenAcc.