Trajectory Recommendation with Active Learning
===================

# 1. Problem Formulation

We formulate trajectory recommendation as a contextual multi-armed bandit problem. The trade-off is between recommending with the available POI/user features and querying the user for their preferences.

1) Arm: a transition from the current POI$_i$ to another POI$_j$ (allow sub-tours?).

2) Context: transition features, including POI category transition probabilities, POI popularity transition probabilities, POI-pair distance transition probabilities, and user-specific features (visit duration/frequency), available in [this notebook](./tour_MC.ipynb).

3) Reward: an F1-score-based metric. For example, if the trajectory up to time $t$ (after choosing a certain arm) is `[p1, p2, p3]` and the ground truth trajectories are `[p1, p2, p3]` and `[p1, p2, p4]`, the reward is computed as
\begin{equation}
\max\{\text{F1}([p1, p2, p3], [p1, p2, p3]),\ \text{F1}([p1, p2, p3], [p1, p2, p4])\} = \max\{1.0, 0.667\} = 1.0
\end{equation}
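
The reward above can be sketched in a few lines of Python. This is a minimal sketch, assuming F1 is computed over the sets of POIs in the two trajectories (visiting order ignored), which reproduces the 0.667 in the example; the function names `f1_score` and `reward` are hypothetical and not taken from the notebook.

```python
def f1_score(recommended, truth):
    """F1 between two trajectories, treated as sets of POIs (assumption)."""
    rec, ref = set(recommended), set(truth)
    overlap = len(rec & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def reward(recommended, ground_truths):
    """Best F1 of the partial trajectory against any ground truth trajectory."""
    return max(f1_score(recommended, gt) for gt in ground_truths)


if __name__ == "__main__":
    truths = [["p1", "p2", "p3"], ["p1", "p2", "p4"]]
    print(reward(["p1", "p2", "p3"], truths))
```

Taking the maximum means a recommendation is rewarded for matching any one observed trajectory, rather than being penalised for differing from the others.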

4) Regret: the regret after $T$ trials is defined as
\begin{equation}
R(T) = \textbf{E}\left[\sum_{t=1}^{T}(\text{optimal reward})_t\right] - \textbf{E}\left[\sum_{t=1}^{T}(\text{actual reward})_t\right]
\end{equation}
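
For a single simulated run, the regret can be tracked empirically as a running sum of per-trial gaps. The sketch below assumes the optimal reward at each trial is known (e.g. from held-out ground truth); the expectation in $R(T)$ would then be estimated by averaging these curves over several runs. The name `cumulative_regret` is hypothetical.

```python
def cumulative_regret(optimal_rewards, actual_rewards):
    """Running regret of one run: sum over trials of (optimal - actual)."""
    if len(optimal_rewards) != len(actual_rewards):
        raise ValueError("both reward sequences must cover the same trials")
    total = 0.0
    curve = []
    for optimal, actual in zip(optimal_rewards, actual_rewards):
        total += optimal - actual
        curve.append(total)
    return curve


if __name__ == "__main__":
    # Three trials: the last entry is the regret after T = 3.
    print(cumulative_regret([1.0, 1.0, 1.0], [1.0, 0.5, 0.75]))
```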

5) Sampling strategies: 
1. $\epsilon$-greedy: choose the arm (i.e. transition) with the highest log-likelihood with probability $1-\epsilon$, and a random arm with probability $\epsilon$.

1. Thompson sampling: design a prior over each arm's reward (uniform? Gaussian? ...) and compute the likelihood as in [this approach](./tour_MC.ipynb#sec4).

1. UCB-type strategies such as LinUCB, introduced in [this paper](http://www.research.rutgers.edu/~lihong/pub/Li10Contextual.pdf).
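
The first two strategies can be sketched with the standard library alone. This is an illustrative sketch, not the notebook's implementation: it assumes the per-arm log-likelihoods are already computed, and for Thompson sampling it picks one of the open options above, a $N(0, 1)$ prior on each arm's mean reward with unit-variance Gaussian noise, and ignores context. All names (`epsilon_greedy`, `GaussianThompsonArm`, `thompson_choose`) are hypothetical.

```python
import math
import random


def epsilon_greedy(log_likelihoods, epsilon, rng=random):
    """Return an arm index: random with probability epsilon, else the best."""
    arms = range(len(log_likelihoods))
    if rng.random() < epsilon:
        return rng.choice(list(arms))
    return max(arms, key=lambda a: log_likelihoods[a])


class GaussianThompsonArm:
    """Posterior over one arm's mean reward.

    Assumes a N(0, 1) prior and unit-variance Gaussian reward noise, so the
    posterior after n rewards is N(sum / (n + 1), 1 / (n + 1)).
    """

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def posterior(self):
        return self.total / (self.n + 1), 1.0 / (self.n + 1)

    def sample(self, rng=random):
        mean, var = self.posterior()
        return rng.gauss(mean, math.sqrt(var))

    def update(self, reward):
        self.n += 1
        self.total += reward


def thompson_choose(arms, rng=random):
    """Draw one sample per arm and play the arm with the largest draw."""
    samples = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=samples.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_greedy([-2.0, -0.5, -1.0], epsilon=0.1, rng=rng))
    arms = [GaussianThompsonArm() for _ in range(3)]
    arms[1].update(1.0)
    print(thompson_choose(arms, rng=rng))
```

A Gaussian model is only a convenience here, since the F1-based reward lies in $[0, 1]$; a contextual variant such as LinUCB would replace the per-arm mean with a model over the transition features.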

6) Query: ask users which POIs they prefer, or which characteristics (category/popularity/distance from the current place) they prefer in POIs (show selected photos of these POIs?).