This repo contains my completed pre-screening exercise for the Reinforcement Learning Open Source Fest 2021.
Analyzing how non-stationarity affects different Contextual Bandit algorithms
Changing the reward distribution over time and adding varying noise
Comparing the results of different exploration algorithms
My code is based on the Vowpal Wabbit tutorial "Simulating Content Personalization with Contextual Bandits": https://vowpalwabbit.org/tutorials/cb_simulation.html.
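
As a rough illustration of what "changing the reward distribution over time and adding varying noise" can mean, here is a minimal sketch using only the Python standard library. It is not the code from this repo: the function name `get_reward`, the preference tables, the `change_point` and `noise_std` parameters, and the 1.0/0.0 reward convention are all assumptions made for this example, with user and time-of-day names loosely echoing the tutorial's setup.

```python
import random

# Hypothetical reward convention for this sketch: 1.0 = liked, 0.0 = disliked.
LIKED, DISLIKED = 1.0, 0.0

# Hypothetical preference tables: the single action each (user, time) context likes.
PREFS_BEFORE = {
    ("Tom", "morning"): "politics",
    ("Tom", "afternoon"): "music",
    ("Anna", "morning"): "sports",
    ("Anna", "afternoon"): "politics",
}
PREFS_AFTER = {
    ("Tom", "morning"): "politics",
    ("Tom", "afternoon"): "sports",
    ("Anna", "morning"): "sports",
    ("Anna", "afternoon"): "sports",
}


def get_reward(context, action, step, change_point=2000, noise_std=0.0, rng=random):
    """Return a reward whose underlying preferences switch at `change_point`.

    Gaussian noise with standard deviation `noise_std` is added on top,
    so the same (context, action) pair can yield different rewards.
    """
    prefs = PREFS_BEFORE if step < change_point else PREFS_AFTER
    base = LIKED if prefs[context] == action else DISLIKED
    return base + rng.gauss(0.0, noise_std)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same numbers
    ctx = ("Tom", "afternoon")
    for step in (0, 1999, 2000, 4000):
        r = get_reward(ctx, "music", step, noise_std=0.1, rng=rng)
        print(f"step={step:5d} action=music reward={r:.3f}")
```

In a simulation loop, `step` would be the iteration counter, and `noise_std` could itself be varied over time to model changing noise levels. An exploration algorithm that keeps exploring after `change_point` has a chance to notice that "music" stopped paying off for this context, which is the kind of effect the comparison is meant to surface.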