This project implements core Markov Decision Process (MDP) and Reinforcement Learning techniques in the classic UC Berkeley Pac-Man environment. Using value iteration, the agent first learns an optimal policy offline by repeatedly evaluating state utilities and computing the best action for each state based on expected long-term reward. The implementation performs batch Bellman updates of state values and Q-values using transition probabilities, rewards, and a discount factor, enabling the agent to solve the Gridworld MDP and act optimally after sufficient iterations.
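The value-iteration backbone described above can be sketched as follows. This is a minimal illustration on a tiny made-up MDP (the states, rewards, and transition model here are assumptions for demonstration, not the project's Gridworld spec): each sweep applies the Bellman update V(s) ← max_a Σ T(s,a,s')·[R(s,a,s') + γ·V(s')] in batch, using the previous sweep's values throughout.

```python
# Hypothetical transition model: mdp[state][action] = [(next_state, prob, reward), ...]
# States/rewards are illustrative only, not the project's Gridworld layout.
mdp = {
    "A": {"right": [("B", 1.0, 0.0)]},
    "B": {"right": [("exit", 1.0, 1.0)], "left": [("A", 1.0, 0.0)]},
    "exit": {},  # terminal state: no actions available
}

def value_iteration(mdp, gamma=0.9, iterations=100):
    V = {s: 0.0 for s in mdp}
    for _ in range(iterations):
        new_V = {}
        for s, actions in mdp.items():
            if not actions:            # terminal states keep value 0
                new_V[s] = 0.0
                continue
            # Batch Bellman update: every state uses the OLD values this sweep
            new_V[s] = max(
                sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
                for outcomes in actions.values()
            )
        V = new_V
    return V

V = value_iteration(mdp)
# Converges to V["B"] = 1.0 and V["A"] = 0.9 (one discounted step from B)
```

The batch update (computing all new values from the previous iteration's estimates before replacing them) is what distinguishes this from asynchronous, in-place value iteration.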
In addition to planning-based methods, the project extends into model-free reinforcement learning through Q-learning. The Q-learning agent updates its action–value estimates directly from experience, balancing exploration against exploitation and weighting each new sample by a learning rate and discounted future rewards. Once trained, the same implementation transfers directly to Pac-Man, where the agent learns effective strategies over thousands of training episodes. Together, these two components contrast offline MDP planning with experiential learning, showcasing foundational AI techniques for optimal decision-making and autonomous game-playing.
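The Q-learning update and exploration strategy can be sketched roughly as below. This is a hedged, self-contained illustration (the function names, hyperparameters, and epsilon-greedy policy are assumptions about a typical setup, not the project's exact implementation): on each transition (s, a, r, s'), the agent blends its old estimate with a new sample via the learning rate α.

```python
import random

def q_update(Q, s, a, r, s_next, actions_next, alpha=0.5, gamma=0.9):
    """Tabular Q-learning: Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))."""
    # Value of the best next action; 0.0 if s' is terminal (no actions)
    future = max((Q.get((s_next, a2), 0.0) for a2 in actions_next), default=0.0)
    sample = r + gamma * future
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * sample
    return Q

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Explore a random action with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```

For example, starting from an empty table, observing reward 1.0 on a transition into a terminal state gives `Q[(s, a)] = 0.5 * 1.0 = 0.5` with α = 0.5; repeated visits move the estimate toward the true value. The same update rule is what carries over unchanged from Gridworld to Pac-Man, since it never consults the transition model.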