Overview
This project implements a self-learning blackjack player by modeling the game as a Markov Decision Process (MDP). The game environment is written in Python, and the player learns a basic strategy through Q-learning. The experiments track how the win rate improves during training and how the learned policy changes under different game rules and hyperparameter settings.
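The sketch below shows what a minimal tabular Q-learning loop for blackjack can look like. It is not the project's implementation. It assumes a simplified game: an infinite deck, hit and stand as the only actions, a dealer who stands on all 17s, no bonus for a natural blackjack, and no doubling or splitting. Each state is a (player total, dealer up card, usable ace) tuple. After every action the table is updated with the standard rule Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') − Q(s, a)]. The function names (`train`, `win_rate`, `greedy_policy`, and so on) and the hyperparameter values `alpha`, `gamma`, and `epsilon` are hypothetical illustrations.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # 0 = stand, 1 = hit (hypothetical encoding)


def draw_card(rng):
    # Infinite-deck assumption: ranks 1..13, face cards count as 10, ace drawn as 1.
    return min(rng.randint(1, 13), 10)


def hand_value(cards):
    # Returns (total, usable_ace): one ace counts as 11 if that does not bust.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def state_of(player, dealer_up):
    total, usable = hand_value(player)
    return (total, dealer_up, usable)


def play_dealer(dealer, rng):
    # Assumed rule: the dealer draws below 17 and stands on every 17, soft or hard.
    while hand_value(dealer)[0] < 17:
        dealer.append(draw_card(rng))
    return hand_value(dealer)[0]


def outcome(player_total, dealer_total):
    # +1 win, -1 loss, 0 push (the player is assumed not to have busted).
    if dealer_total > 21 or player_total > dealer_total:
        return 1.0
    if player_total < dealer_total:
        return -1.0
    return 0.0


def choose_action(Q, state, epsilon, rng):
    # Epsilon-greedy exploration; ties go to the first action (stand).
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def train(episodes=200_000, alpha=0.05, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        player = [draw_card(rng), draw_card(rng)]
        dealer = [draw_card(rng), draw_card(rng)]
        state = state_of(player, dealer[0])
        while True:
            action = choose_action(Q, state, epsilon, rng)
            if action == 1:
                player.append(draw_card(rng))
                if hand_value(player)[0] > 21:
                    reward, next_state = -1.0, None  # bust ends the episode
                else:
                    reward, next_state = 0.0, state_of(player, dealer[0])
            else:
                reward = outcome(hand_value(player)[0], play_dealer(dealer, rng))
                next_state = None
            # Q-learning target: reward plus discounted greedy value of the next state.
            target = reward
            if next_state is not None:
                target += gamma * max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            if next_state is None:
                break
            state = next_state
    return Q


def greedy_policy(Q):
    return lambda state: max(ACTIONS, key=lambda a: Q[(state, a)])


def win_rate(policy, episodes=50_000, seed=1):
    # Fraction of hands won outright (pushes count as non-wins).
    rng = random.Random(seed)
    wins = 0
    for _ in range(episodes):
        player = [draw_card(rng), draw_card(rng)]
        dealer = [draw_card(rng), draw_card(rng)]
        while policy(state_of(player, dealer[0])) == 1:
            player.append(draw_card(rng))
            if hand_value(player)[0] > 21:
                break
        total = hand_value(player)[0]
        if total <= 21 and outcome(total, play_dealer(dealer, rng)) > 0:
            wins += 1
    return wins / episodes
```

To see how the win rate responds to a setting, one could call `train` with different `alpha`, `epsilon`, or episode counts and compare `win_rate(greedy_policy(Q))` across runs. Different rule variants would need changes to `play_dealer` and `outcome`.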