
# Running RecSim
In this Colab we explore how to train and evaluate an agent within RecSim using the provided environments, and clarify some basic concepts along the way.


# RecSim at a Glance
RecSim is a configurable platform for simulating a recommendation system environment in which a recommender agent interacts with a corpus of documents (or recommendable items) and a set of users, in a natural but abstract fashion, to support the development of new recommendation algorithms.
At its core, a RecSim simulation consists of running the following event loop for some fixed number of sessions (episodes):



![RecSim at a glance](https://github.com/google-research/recsim/blob/master/recsim/colab/figures/recsim_at_a_glance.png?raw=true)



```
for episode in range(number_of_episodes):
  user = sample_user()
  recommended_slate = None  # no slate has been shown yet
  while session_not_over:
    # The user reacts to the previous slate (nothing on the first step).
    user_response = user_responds_to_recommendation(recommended_slate)
    available_documents = sample_documents_from_database()
    recommended_slate = agent_step(available_documents, user_response)
```
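
To make the loop concrete, here is a minimal runnable sketch in which every component is a hypothetical toy stand-in (not a RecSim API): a user is a preferred topic, documents are `(doc_id, topic)` pairs, the user clicks documents on their topic, and the "agent" always shows the first two candidates. `session_not_over` is assumed to be a fixed session length.

```python
import random

# Hypothetical stand-ins for the components named in the event loop.
# None of these are RecSim APIs; they only make the loop executable.


def sample_user(rng):
    # A toy "user" is just a preferred topic id in [0, 5).
    return {"topic": rng.randrange(5)}


def sample_documents_from_database(rng, num_candidates=10):
    # Each toy document is a (doc_id, topic) pair.
    return [(doc_id, rng.randrange(5)) for doc_id in range(num_candidates)]


def user_responds_to_recommendation(user, slate):
    # No slate has been shown at the very start of a session.
    if slate is None:
        return []
    # Click (1) on documents whose topic matches the user's topic, else 0.
    return [1 if topic == user["topic"] else 0 for _, topic in slate]


def agent_step(available_documents, user_response, slate_size=2):
    # A trivial agent that ignores the response and shows the first candidates.
    return available_documents[:slate_size]


def run_simulation(number_of_episodes=3, session_length=4, seed=0):
    rng = random.Random(seed)
    total_clicks = 0
    for _episode in range(number_of_episodes):
        user = sample_user(rng)
        recommended_slate = None
        # "session_not_over" is modeled here as a fixed number of steps.
        for _step in range(session_length):
            user_response = user_responds_to_recommendation(user, recommended_slate)
            total_clicks += sum(user_response)
            available_documents = sample_documents_from_database(rng)
            recommended_slate = agent_step(available_documents, user_response)
    return total_clicks


print(run_simulation())
```

As in the pseudocode, the slate produced on the last step of a session is never shown to the user.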

The document database (document model), user model, and recommender agent each have various internal components, and we will discuss how to design and implement them in later colabs ([Developing an Environment](RecSim_Developing_an_Environment.ipynb), [Developing an Agent](RecSim_Developing_an_Agent.ipynb)). For now, we will see how to set up one of the ready-made environments that ship with RecSim in order to run a simulation.


```
!pip install --upgrade --no-cache-dir recsim
# Load the TensorBoard notebook extension
%load_ext tensorboard
```

```python
import numpy as np
import tensorflow as tf
```

In RecSim, a user model and a document model are packaged together within an OpenAI Gym-style environment. In this Colab, we will use the "Interest Evolution" environment used in [Ie et al.](https://arxiv.org/abs/1905.12767), together with the full Slate-Q agent. Both are implemented in RecSim.
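
"Gym-style" means the environment is driven through `reset()` and `step(action)`, where `step` returns the classic Gym 4-tuple `(observation, reward, done, info)`. The sketch below is a hypothetical toy environment (`ToySlateEnv` is not RecSim's `interest_evolution` environment) that only illustrates this interface for slates: the action is a list of `slate_size` indices into the current candidate documents.

```python
import random


class ToySlateEnv:
    """Hypothetical Gym-style slate environment (not RecSim's interest_evolution).

    reset() returns an initial observation; step(slate) returns the classic
    Gym 4-tuple (observation, reward, done, info).
    """

    def __init__(self, num_candidates=10, slate_size=2, session_length=5, seed=0):
        self.num_candidates = num_candidates
        self.slate_size = slate_size
        self.session_length = session_length
        self._rng = random.Random(seed)

    def _sample_docs(self):
        # Toy "document model": a topic id for every candidate document.
        return [self._rng.randrange(3) for _ in range(self.num_candidates)]

    def _observe(self):
        # The observation bundles user-side and document-side information.
        return {"user": self._user_topic, "doc": list(self._doc_topics)}

    def reset(self):
        self._t = 0
        self._user_topic = self._rng.randrange(3)  # toy "user model": a topic of interest
        self._doc_topics = self._sample_docs()
        return self._observe()

    def step(self, slate):
        # slate: slate_size indices into the current candidate documents.
        if len(slate) != self.slate_size:
            raise ValueError("slate must contain exactly slate_size indices")
        reward = float(sum(self._doc_topics[i] == self._user_topic for i in slate))
        self._t += 1
        self._doc_topics = self._sample_docs()  # new candidates for the next step
        done = self._t >= self.session_length
        return self._observe(), reward, done, {}


def run_episode(env, slate):
    # Show the same slate positions at every step and sum the rewards.
    env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done:
        _observation, reward, done, _info = env.step(slate)
        total_reward += reward
        steps += 1
    return total_reward, steps
```

For example, `run_episode(ToySlateEnv(seed=0), [0, 1])` plays one five-step session and returns the total reward and the number of steps.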

```python
from recsim.environments import interest_evolution
from recsim.agents import full_slate_q_agent
from recsim.simulator import runner_lib
```

# Creating an Agent

Similarly to [Dopamine](https://github.com/google/dopamine), RecSim takes an environment creation function and an agent creation function as input.

Variations in the environment are handled by the environment creation function, so here we restrict our attention to the agent.

A *create_agent* function receives a TensorFlow session, the environment object, a flag indicating training or evaluation mode (*eval_mode*), and a TensorFlow summary writer.

For Slate-Q, we read the action and observation spaces from the environment and pass them to the agent constructor.

```python
def create_agent(sess, environment, eval_mode, summary_writer=None):
  # Observation and action spaces are taken from the environment itself.
  kwargs = {
      'observation_space': environment.observation_space,
      'action_space': environment.action_space,
      'summary_writer': summary_writer,
      'eval_mode': eval_mode,
  }
  return full_slate_q_agent.FullSlateQAgent(sess, **kwargs)
```
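
One likely reason for handing the runner a *function* rather than an agent instance is that the same `create_agent` can then be passed to both a training runner and an evaluation runner, each of which builds its own agent with its own session and `eval_mode` flag. The sketch below illustrates that pattern with hypothetical stand-ins (`RandomSlateAgent`, `StubEnvironment` and `ToyRunner` are not RecSim classes, and the action space is simplified to a `(num_candidates, slate_size)` tuple).

```python
import random


class RandomSlateAgent:
    """Hypothetical agent accepting the same keywords create_agent passes."""

    def __init__(self, sess, observation_space, action_space,
                 summary_writer=None, eval_mode=False):
        self.sess = sess
        self.observation_space = observation_space
        self.action_space = action_space  # simplified: (num_candidates, slate_size)
        self.summary_writer = summary_writer
        self.eval_mode = eval_mode
        self._rng = random.Random(0)

    def step(self, reward, observation):
        # Pick slate_size distinct candidate indices at random.
        num_candidates, slate_size = self.action_space
        return self._rng.sample(range(num_candidates), slate_size)


class StubEnvironment:
    observation_space = "toy observation space"
    action_space = (10, 2)


def create_agent(sess, environment, eval_mode, summary_writer=None):
    # Same shape as the notebook's create_agent: spaces come from the environment.
    kwargs = {
        "observation_space": environment.observation_space,
        "action_space": environment.action_space,
        "summary_writer": summary_writer,
        "eval_mode": eval_mode,
    }
    return RandomSlateAgent(sess, **kwargs)


class ToyRunner:
    """Hypothetical runner that builds the agent itself from create_agent_fn."""

    def __init__(self, create_agent_fn, env, eval_mode):
        sess = None  # stand-in for the TensorFlow session a real runner would own
        self.agent = create_agent_fn(sess, env, eval_mode=eval_mode,
                                     summary_writer=None)
```

Constructing `ToyRunner(create_agent, StubEnvironment(), eval_mode=False)` and `ToyRunner(create_agent, StubEnvironment(), eval_mode=True)` yields two independent agents from the one factory.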

# Training and Evaluating the Agent in a Simulation Loop
Before we run the agent, we need to set up a few environment parameters. These are the bare minimum:
* *slate_size* sets the number of documents presented to the user at each step;
* *num_candidates* specifies the number of documents present in the document database at any given time;
* *resample_documents* specifies whether the set of candidates should be resampled between time steps according to the document distribution (more on this in [later notebooks](RecSim_Developing_an_Environment.ipynb));
* *seed* sets the random seed.

```python
seed = 0
np.random.seed(seed)
env_config = {
  'num_candidates': 10,
  'slate_size': 2,
  'resample_documents': True,
  'seed': seed,
}
```
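
The effect of *num_candidates* and *resample_documents* can be illustrated with a small hypothetical class (`ToyCandidateSet` is not RecSim code). It assumes documents are identified by unique integer ids; real RecSim samples new documents from a document distribution, whereas this sketch simply draws fresh ids.

```python
class ToyCandidateSet:
    """Hypothetical illustration of num_candidates / resample_documents (not RecSim code).

    Documents are represented by unique integer ids.
    """

    def __init__(self, num_candidates, resample_documents):
        self.num_candidates = num_candidates
        self.resample_documents = resample_documents
        self._next_id = 0
        self.documents = self._draw()

    def _draw(self):
        # Draw num_candidates brand-new document ids.
        docs = list(range(self._next_id, self._next_id + self.num_candidates))
        self._next_id += self.num_candidates
        return docs

    def advance(self):
        # Called between time steps: resample only if the flag is set.
        if self.resample_documents:
            self.documents = self._draw()
        return self.documents
```

With the configuration above, `ToyCandidateSet(env_config['num_candidates'], env_config['resample_documents'])` would offer ten new candidates at every step; with `resample_documents=False` the same ten documents stay available for the whole session.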

Once we have this configuration dictionary, we can run training, additionally specifying the number of training steps (*max_training_steps*), the number of iterations (*num_iterations*), and a directory in which to checkpoint the agent (*base_dir*).


```python
tmp_base_dir = '/tmp/recsim/'
runner = runner_lib.TrainRunner(
    base_dir=tmp_base_dir,
    create_agent_fn=create_agent,
    env=interest_evolution.create_environment(env_config),
    episode_log_file="",
    max_training_steps=50,
    num_iterations=10)
runner.run_experiment()
```
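
The interplay between *num_iterations* and *max_training_steps* can be sketched as a nested loop. This is a hypothetical illustration, not `TrainRunner`'s code: it assumes *max_training_steps* is a per-iteration step budget, that episodes always run to completion (so an iteration may slightly overshoot the budget), and that a checkpoint is written after every iteration; check `runner_lib` for the actual semantics.

```python
def train(num_iterations, max_training_steps, run_one_episode, save_checkpoint):
    """Hypothetical iteration/step structure; not RecSim's TrainRunner code.

    run_one_episode() must return (episode_length, episode_return) with
    episode_length > 0; save_checkpoint(iteration) persists the agent.
    """
    if max_training_steps <= 0:
        raise ValueError("max_training_steps must be positive")
    mean_returns = []
    for iteration in range(num_iterations):
        steps, returns = 0, []
        # Keep running whole episodes until this iteration's step budget is used.
        while steps < max_training_steps:
            episode_length, episode_return = run_one_episode()
            steps += episode_length
            returns.append(episode_return)
        save_checkpoint(iteration)  # e.g. write agent state under base_dir
        mean_returns.append(sum(returns) / len(returns))
    return mean_returns
```

With `num_iterations=10` and `max_training_steps=50`, as in the cell above, and hypothetical seven-step episodes, each iteration would run eight episodes (56 steps) before checkpointing.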

After training is finished, we can run a separate simulation to evaluate the agent's performance. 

```python
runner = runner_lib.EvalRunner(
    base_dir=tmp_base_dir,
    create_agent_fn=create_agent,
    env=interest_evolution.create_environment(env_config),
    max_eval_episodes=5,
    test_mode=True)
runner.run_experiment()
```
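
Conceptually, evaluation runs complete episodes with a fixed policy (no learning) and summarizes the per-episode returns. The sketch below shows that computation with a hypothetical toy simulator and a baseline policy (`make_toy_episode_runner`, `evaluate` and `first_k` are illustrative names, not RecSim functions); reward is one point per shown document that matches the user's topic.

```python
import random


def make_toy_episode_runner(seed, session_length=5, num_candidates=10, slate_size=2):
    """Hypothetical episode simulator (not RecSim code)."""
    rng = random.Random(seed)

    def run_episode(policy):
        user_topic = rng.randrange(3)
        episode_return = 0.0
        for _step in range(session_length):
            doc_topics = [rng.randrange(3) for _ in range(num_candidates)]
            slate = policy(doc_topics, slate_size)
            episode_return += sum(1.0 for i in slate if doc_topics[i] == user_topic)
        return episode_return

    return run_episode


def evaluate(policy, run_episode, max_eval_episodes):
    # Run complete episodes with a fixed policy and average their returns.
    returns = [run_episode(policy) for _ in range(max_eval_episodes)]
    return sum(returns) / len(returns)


def first_k(doc_topics, k):
    # A fixed baseline policy: always show the first k candidates.
    return list(range(k))


print(evaluate(first_k, make_toy_episode_runner(seed=0), max_eval_episodes=5))
```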

The cumulative reward across the training episodes will be stored in *base_dir/eval/*. In addition, RecSim exports a more detailed set of summaries, including environment-specific ones, that can be visualized in TensorBoard.

```
#@title Tensorboard
%tensorboard --logdir=/tmp/recsim/
```


## References
[SlateQ: A Tractable Decomposition for Reinforcement Learning with Recommendation Sets. IJCAI 2019: 2592-2599](https://arxiv.org/abs/1905.12767)