# Two-state MDP for MANP

We consider the MANP problem (*M*aximizing the *A*verage *N*umber of *P*ages read in multi-page publications) as a simple MDP.<br>
The content producer is the active agent that takes actions. The content consumer (also called the visitor or user) is part of the environment, and the user's reactions to the agent's actions define the dynamics of the environment. <br>

The process can be described roughly as follows:<br>
* At each step the agent shows the user one page $c_i \in C =\{c_1,\dots,c_n\}$.<br>
* After seeing $c_i$ the user **either** proceeds and sees the next page **or** leaves. We sometimes say that the user presses the "next" or the "leave" button.<br>


#### Example of presenting the problem as a state machine with 4 pages $C=\{c_1,c_2,c_3,c_4\}$: 

Each action "*show page x*" moves the system to the final state (the user left **after** seeing page x) with probability $p_x$, **or** to the state in which the user wants to continue seeing pages with probability $1 - p_x$. There is one special action, "*stop showing pages*", which always moves the system to the final state. 
Red arrows denote transitions of the process to the final state, i.e. the visitor left.<br>
<img src="Simple_MDP_includes/Two_states.png" style="width:450px;height:400px"/>

<hr style="border:2px solid gray"> </hr>

## MDP definition <br>

 1. **State space** $S = \{s_{next},s_{leaved}\}$ : 
    * $s_{next}$ is the state in which the user wants to see the next page.
    * $s_{next}$ is the initial state.
    * $s_{leaved}$ is the state in which the user has left. It is the final state.
 <br> <br> 
 2.  **Action space** $A=\{a_{c_1},\dots,a_{c_n}\} \cup \{a_{stop}\}$ :
    * Action $a_{c_k}$ denotes the action of presenting page $c_k \in C$ to the visitor.
    * Action $a_{stop}$ denotes the action "stop showing pages".
 <br><br>
 3.  **Reward function** $R(a,s,s')$ for $a \in A,\; s,s'\in S$ :
    * The reward function depends only on the resulting state $s'$.
    * If the process moves to $s_{leaved}$, the reward is 0; otherwise it is 1.
 <br><br>
 4. **Transition function** $T(a,s,s')$ for $a \in A,\; s,s'\in S$ :
     * From $s_{next}$ it depends only on the action $a$, i.e. only on the page currently shown to the user; $s_{leaved}$ is absorbing.<br>
     

<hr style="border:2px solid gray"> </hr>
* Example of the transition matrix for the action "show page 1", where $p_1$ is the probability that the current user leaves after page 1:
    <table style="border-collapse:collapse;border-spacing:0" class="tg">
    <thead>
    <tr><th style="border:1px solid;padding:10px 5px;text-align:center"><u>Action:</u><br><u>show page</u><br>1</th><th style="border:1px solid;padding:10px 5px;text-align:center" colspan="3">Next state:<br>s'</th></tr>
    </thead>
    <tbody>
    <tr><td style="border:1px solid;padding:10px 5px;text-align:center" rowspan="3"><b>Current<br>state:<br>s</b></td><td style="border:1px solid;padding:10px 5px"></td><td style="border:1px solid;padding:10px 5px;text-align:left">s_next</td><td style="border:1px solid;padding:10px 5px;text-align:left">s_leaved</td></tr>
    <tr><td style="border:1px solid;padding:10px 5px;text-align:left">s_next</td><td style="border:1px solid;padding:10px 5px;text-align:center">1 - p_1</td><td style="border:1px solid;padding:10px 5px;text-align:center">p_1</td></tr>
    <tr><td style="border:1px solid;padding:10px 5px;text-align:left">s_leaved</td><td style="border:1px solid;padding:10px 5px;text-align:center">0</td><td style="border:1px solid;padding:10px 5px;text-align:center">1</td></tr>
    </tbody>
    </table>
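
As a rough worked illustration of these definitions (not part of the original model description), suppose the agent shows pages $c_{i_1},\dots,c_{i_m}$ in a fixed order and then stops, and assume rewards are not discounted. A reward of 1 is collected each time the user proceeds, and the user can proceed after the $k$-th page only if they also proceeded after every earlier page, so the expected total reward is

$$\mathbb{E}[R] \;=\; \sum_{k=1}^{m}\;\prod_{j=1}^{k}\bigl(1-p_{i_j}\bigr).$$

For example, with $m=2$ and leaving probabilities $0.1$ and $0.2$, this gives $0.9 + 0.9\cdot 0.8 = 1.62$.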

##  MDP Implementation
<a id='impl'></a>

In [1]:
# File with modules to use
include("Simple_MDP_includes/modules_to_use.jl")

### Basic objects
The two basic objects of the model are Page and User_simple.<br>

The User_simple object encapsulates a particular user's behavior. Although the user is part of the environment in this model and could be fully described by the transition function of the process, we represent it as a separate object.<br>

The key attribute of User_simple is the userPreferencesFunction function. Given the page currently shown to the user, it returns the probability that the user leaves (the probability of proceeding is its complement).<br>

Page is the object that represents the unit of content that the user consumes (views).

In [2]:
struct Page
   id :: Int64 
end

pages_collection = nothing

struct User_simple 
   id :: Int64
   userPreferencesFunction :: Any   # Receives the current page shown; returns the leaving probability
end;
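
The notebook builds its preferences function with `userPreferencesCreate` from the included file, which is not shown here. The following is only a minimal sketch of what such a function could look like, assuming that leaving probabilities are stored in a vector indexed by page id. `DemoPage` and `make_preferences` are hypothetical names introduced for this illustration, not part of the notebook.

```julia
# Hypothetical stand-in for the notebook's Page type, so the sketch runs on its own.
struct DemoPage
    id::Int
end

# Build a preferences function from a vector of leaving probabilities.
# Assumed convention: leave_probs[k] belongs to the page with id k.
function make_preferences(leave_probs::Vector{Float64})
    all(p -> 0.0 <= p <= 1.0, leave_probs) ||
        throw(ArgumentError("leaving probabilities must lie in [0, 1]"))
    # The returned closure maps a page to its leaving probability.
    return (page::DemoPage) -> leave_probs[page.id]
end

prefs = make_preferences([1.0, 0.5, 0.25])
println(prefs(DemoPage(2)))   # leaving probability of the page with id 2
```

Storing the probabilities in a closure keeps the user object small, and the transition function only needs to call it with the current page.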



### MDP basic elements
<a id='elem'></a>
In this section we define the basic elements of the MDP: actions, states, and the MDP object itself.<br>

#### States and Actions

The <span style='background:#D3D3D3'> Action_simple</span>  object represents the action of presenting a particular unit of content (a page) to the user. There is one additional action whose show_page attribute equals "nothing"; it is the action of stopping showing pages to the user.  

The <span style='background:#D3D3D3'> State_simple</span> object specifies the state. There are only two states in the MDP, so its only attribute indicates whether the state is final.

In [3]:
struct Action_simple
   show_page :: Union{Page,Nothing}           # The page that is presented to the user
end

struct State_simple                             
    if_terminal :: Bool
end 

#### MDP object
As part of the POMDPs interface specification, we define an MDP object type <span style='background:#D3D3D3'> ContentProducerMdp1 </span> that subtypes the abstract type MDP, parametrized by the types of the state and action objects. The structure holds the parameters of the MDP environment for MANP and is passed as the first argument to all POMDPs interface functions. 

It contains the current user, who determines the process dynamics, an array of content units (pages), and tools for tracking and statistics. 

In addition we define <span style='background:#D3D3D3'> MdpStatistics1 </span>, the object that aggregates the statistics for a concrete <span style='background:#D3D3D3'> ContentProducerMdp1 </span>.

In [4]:
mutable struct MdpStatistics1
    current_user_path :: Array{Page,1} 
    nexts_per_page :: Array{Int64,1}
    leaves_per_page :: Array{Int64,1}
    total_pages_seen :: Int64
    users_number :: Int64
    pages_seen_per_user :: Array{Int64,1}
    page_fialures_num :: Int64
    
end

mutable struct ContentProducerMdp1 <: MDP{State_simple,Action_simple}
    pages :: Array{Page,1}
    current_user :: User_simple
    statistics :: MdpStatistics1
    statistics_on :: Bool            # Whether the statistics should be collected
                                     # (Not necessary for value iteration, for example)
    
end


##### Auxiliary functions specified in POMDPs interface

In [5]:
POMDPs.discount(mdp::ContentProducerMdp1) = 1

POMDPs.initialstate(mdp::ContentProducerMdp1)= SparseCat([State_simple(false)],[1])

POMDPs.isterminal(mdp::ContentProducerMdp1, s::State_simple) = s.if_terminal

#### Transition function

In the current, simplified formulation of the problem, the transition distribution depends only on the action (and not on the state), except that the terminal state is absorbing. The user's decision whether to proceed or to leave is explained only by the page that was presented last.

The transition distribution is defined by the user's probability of leaving each page. The transition function queries the current user to get it (via the userPreferencesFunction attribute of User_simple). <br>
 

In [6]:
function POMDPs.transition(mdp::ContentProducerMdp1,state::State_simple,act::Action_simple)

    current_page = act.show_page   
    
    # Attempt to take an action in the terminal state
    if POMDPs.isterminal(mdp,state)
       return SparseCat([State_simple(true)], [1.0]) 
    end
    
    # Stop showing pages to the user 
    if current_page === nothing
        return SparseCat([State_simple(true)], [1.0]) 
    end
    
    # Getting user's leaving probability
    leaving_probab = mdp.current_user.userPreferencesFunction(current_page)
    
    # Return distribution object on possible states
    return SparseCat([State_simple(false), State_simple(true)],[1-leaving_probab,leaving_probab])
end
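
The three branches of the transition function can be checked against the transition-matrix view of the model. The sketch below does not use POMDPs. It builds the 2x2 matrices directly, with index 1 standing for $s_{next}$ and index 2 for $s_{leaved}$. `show_page_matrix` and `stop_matrix` are hypothetical helper names introduced only for this illustration.

```julia
# Row-stochastic 2x2 matrix for a "show page" action with leaving probability p.
# Rows are the current state, columns the next state; 1 = s_next, 2 = s_leaved.
function show_page_matrix(p::Float64)
    0.0 <= p <= 1.0 || throw(ArgumentError("p must lie in [0, 1]"))
    return [(1.0 - p) p;
            0.0       1.0]
end

# The "stop showing pages" action sends every state to s_leaved.
stop_matrix() = [0.0 1.0;
                 0.0 1.0]

T = show_page_matrix(0.3)

# Each row must be a probability distribution over next states.
@assert all(isapprox.(sum(T, dims=2), 1.0))
@assert all(sum(stop_matrix(), dims=2) .== 1.0)
```

The second row of both matrices is the same, which reflects that the terminal state is absorbing regardless of the action.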


#### Reward function

The reward function depends on the destination state $s'$. If $s'$ is the final state, the user left after seeing the last content unit and the reward is zero; if $s'$ is the non-terminal state, the user pressed "next" and the reward is 1.

In [7]:
function POMDPs.reward(mdp::ContentProducerMdp1, state::State_simple, act::Action_simple, stateP::State_simple)
    
    # Collect statistics
    RecordStatistics(mdp, state, stateP, act)
    # The user pressed "next"     
    if !POMDPs.isterminal(mdp, stateP)    
        return 1
    # The user pressed "leave"
    else
        return 0
    end

end
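
For intuition, the optimal value of this MDP can be worked out by hand. This is a sketch under stated assumptions: the discount factor is 1 as defined above, $V(s_{leaved}) = 0$, every leaving probability satisfies $p_k > 0$, and the agent may show the same page repeatedly (nothing in the two-state model prevents it). The Bellman optimality equation for the initial state then reads

$$V(s_{next}) \;=\; \max\Bigl(0,\; \max_{k}\,(1-p_k)\bigl(1 + V(s_{next})\bigr)\Bigr),$$

where the $0$ corresponds to $a_{stop}$. The inner maximum is attained by the page with the smallest leaving probability $p_{\min}$, so $V(s_{next}) = (1-p_{\min})\bigl(1+V(s_{next})\bigr)$, which gives

$$V(s_{next}) \;=\; \frac{1-p_{\min}}{p_{\min}}.$$

For example, $p_{\min} = 0.1$ yields $V(s_{next}) = 9$.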


##### Auxiliary functions 

In [8]:
include("Simple_MDP_includes/functions.jl")

## Tests of the MDP
1. Create the Page objects.
2. Create a user with preferences (the probability of leaving page $x$) $1 - \frac{x-1}{N}$, where $N$ is the number of pages and $x$ is the page id.<br> 
3. Create and initialize the statistics collector object for the MDP, then create the MDP object.<br>
4. Create a random policy (from the POMDPs package).<br> The out-of-the-box random policy can show the same page more than once. 
5. Create a simulator (from the POMDPs package).
6. Run several iterations of the simulation.
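
As an independent sanity check on the numbers such a test should produce, the expected reward of a random policy can be estimated without POMDPs. The sketch below assumes that the policy picks uniformly among the $N$ "show page" actions and the single "stop" action (an assumption about how the random policy samples, not something confirmed here), and uses the leaving probability from step 2. All function names are hypothetical. Under these assumptions the expected reward $E$ satisfies $E = q(1+E)$, where $q$ is the per-step probability that the user presses "next"; for $N = 10$ this gives $E = 9/13 \approx 0.69$.

```julia
using Random

# Leaving probability of page x out of N pages, as in step 2 of the list above.
leave_prob(x::Int, N::Int) = 1 - (x - 1) / N

# One episode under a uniformly random policy over N "show page" actions
# plus one "stop" action (assumed sampling scheme).
function random_policy_episode(rng::AbstractRNG, N::Int)
    total = 0
    while true
        a = rand(rng, 1:N+1)                           # N + 1 encodes "stop"
        a == N + 1 && return total                     # agent stops: no reward
        rand(rng) < leave_prob(a, N) && return total   # user leaves: no reward
        total += 1                                     # user pressed "next"
    end
end

# Closed form: E = q * (1 + E), so E = q / (1 - q).
function expected_reward(N::Int)
    q = sum(1 - leave_prob(x, N) for x in 1:N) / (N + 1)
    return q / (1 - q)
end

rng = MersenneTwister(42)
N = 10
episodes = 200_000
estimate = sum(random_policy_episode(rng, N) for _ in 1:episodes) / episodes
println("closed form: ", expected_reward(N), ", Monte Carlo: ", estimate)
```

If the average reward reported by the simulator over many runs is far from this value, either the sampling assumption above does not hold or the model has a bug worth investigating.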

In [10]:

pages_number = 10

pages1 = CreatePages(pages_number)

# Creating user preferences function
User_simple1 = User_simple(1,
    userPreferencesCreate(pages1, collect(1:-(1/pages_number):1/pages_number))
             );

# MDP and statistics initialization 
statistics1 = InitStatistics1(length(pages1))
mdp1 = ContentProducerMdp1(pages1, User_simple1, statistics1, true);

# We select a random policy that is provided by POMDPs package to perform an initial testing for the model.
policy1 = RandomPolicy(mdp1);

# The simplest POMDPs simulator.
rs = RolloutSimulator();

for _ in 1:5
    println("\n\n=========== Simulation starts ===============")
    r = simulate(rs,mdp1,policy1)
    println("\033[1m Reward \033[0m : $(r)")
end




The agent showed page : Page(5)
The user leaved.
Summary : 
 Pages seen: Page[Page(5)] 
 Number of pages seen: 1
 Reward : 0.0


The agent showed page : Page(10)
The agent showed page : Page(7)
The agent showed page : Page(4)
The agent showed page : Page(8)
The user leaved.
Summary : 
 Pages seen: Page[Page(10), Page(7), Page(4), Page(8)] 
 Number of pages seen: 4
 Reward : 3.0


The agent showed page : Page(2)
The user leaved.
Summary : 
 Pages seen: Page[Page(2)] 
 Number of pages seen: 1
 Reward : 0.0


The agent showed page : Page(6)
The agent showed page : Page(7)
The agent stopped showing the pages.
Summary: 
 Pages seen: Page[Page(6), Page(7)] 
 Number of pages seen: 2
 Reward : 2.0


The agent showed page : Page(1)
The user leaved.
Summary : 
 Pages seen: Page[Page(1)] 
 Number of pages seen: 1
 Reward : 0.0
