Trajectory Recommendation with Active Learning
===================

# 1. Problem Formulation

## 1.1 Active Learning Formulation

Formulate trajectory recommendation as an active learning problem.
- Example: a tuple (user, trajectory)
- Label of example: binary
   - Observed tuples (user, trajectory) (i.e. those we extracted from data) are labelled as positive
   - Unobserved tuples (e.g. trajectories generated on the fly, such as those synthesised by enumeration) are unlabelled
   - Tuples used to query a user are labelled positive or negative according to the user's feedback

Query strategies: 
given (start, end) and trajectory length $l$ for a specific user $u$, we'll select an example for user $u$ to label.
- Compute the likelihood of trajectories that satisfy the (start, end) and length constraint, 
  sort these trajectories by likelihood
- Choose a trajectory from the top $K$ (e.g. 5) trajectories with probability proportional to its likelihood
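
The two steps above can be sketched as follows. This is a minimal sketch, not the notebook's implementation: the function name `sample_top_k` and the input format (a list of `(trajectory, likelihood)` pairs that already satisfy the (start, end) and length constraints) are assumptions made for illustration.

```python
import random


def sample_top_k(candidates, k=5, rng=None):
    """Pick one trajectory from the k most likely candidates.

    candidates: list of (trajectory, likelihood) pairs (hypothetical format),
    where each likelihood is a positive number.
    The chosen trajectory is drawn with probability proportional to its likelihood.
    """
    if rng is None:
        rng = random.Random()
    # Sort by likelihood, highest first, and keep the top k.
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    trajectories = [traj for traj, _ in top]
    weights = [likelihood for _, likelihood in top]
    # random.choices normalises the weights internally.
    return rng.choices(trajectories, weights=weights, k=1)[0]
```

With `k=1` this reduces to always querying the most likely trajectory; larger `k` adds some diversity to the queries.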

**TODO**: utilise user-specific features when selecting trajectory for user $u$ to label if there are trajectories of user $u$ in our dataset.

## 1.2 Bandit Formulation

Formulate trajectory recommendation as a contextual multi-armed bandit problem. The trade-off is between recommending with the available POI/user features and querying the user's preferences.

1) Arm: a transition from the current POI$_i$ to another POI$_j$ (allow sub-tours?)

2) Context: transition features, including POI category transition probabilities, POI popularity transition probabilities, POI-pair distance transition probabilities, and user-specific features (visit duration/frequency), available in [this notebook](./traj_feature.ipynb).

3) Reward: 
- [F1-score](./ijcai15.ipynb#sec2.1)-based metric, e.g. if the trajectory up to time $t$ (after choosing a certain arm) is `[p1, p2, p3]` and the ground truth trajectories are `[p1, p2, p3]` and `[p1, p2, p4]`, the reward is computed as 
\begin{equation}
\max\{\text{F1}([p1, p2, p3], [p1, p2, p3]), \text{F1}([p1, p2, p3], [p1, p2, p4])\} = \max\{1.0, 0.667\} = 1.0
\end{equation}

- Visiting-order-based metric, e.g. suppose the trajectory up to time $t$ (after choosing a certain arm) is `[p1, p3, p2]` and the ground truth trajectories are `[p1, p2, p3]` and `[p1, p2, p4]`. With respect to `[p1, p2, p3]`, the pair `(p1, p3)` in `[p1, p3, p2]` counts because `p3` is visited after `p1` in both trajectories; similarly, the pair `(p1, p2)` counts, but the pair `(p3, p2)` does not because `p3` is not visited before `p2` in `[p1, p2, p3]`. So `2` of the `3` pairs count and the metric is `2/3`. With respect to `[p1, p2, p4]`, the metric is `1/3` because only one of the `3` pairs, `(p1, p2)`, counts. The reward is computed as 
\begin{equation}
\max\left\{\frac{2}{3}, \frac{1}{3}\right\} = \frac{2}{3}
\end{equation}

- Metrics incorporating POI category, POI popularity, and the travel distance from the current POI, e.g. suppose the trajectory up to time $t$ (after choosing a certain arm) is `[p1, p3]` and the ground truth trajectory is `[p1, p4]`. 
If `p3` and `p4` have the same category, similar popularity (i.e. they fall in the same popularity class after discretizing POI popularity) and a similar distance from the current POI (i.e. the same distance class after discretization), then the reward of arm `(p1, p3)` would be high.
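
The first two reward metrics above can be sketched as follows. This is an illustrative sketch under stated assumptions: F1 is computed on the sets of POIs (ignoring order), trajectories contain no repeated POIs, and the function names `f1_score`, `order_score` and `reward` are hypothetical rather than taken from the linked notebooks.

```python
from itertools import combinations


def f1_score(recommended, truth):
    """F1 between the POI sets of two trajectories (order is ignored)."""
    rec, tru = set(recommended), set(truth)
    overlap = len(rec & tru)
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec)
    recall = overlap / len(tru)
    return 2 * precision * recall / (precision + recall)


def order_score(recommended, truth):
    """Fraction of POI pairs in `recommended` visited in the same order in `truth`."""
    position = {poi: i for i, poi in enumerate(truth)}
    pairs = list(combinations(recommended, 2))  # (earlier, later) pairs
    if not pairs:
        return 0.0
    counted = sum(
        1 for a, b in pairs
        if a in position and b in position and position[a] < position[b]
    )
    return counted / len(pairs)


def reward(recommended, ground_truths, metric):
    """Reward = best metric value over all ground truth trajectories."""
    return max(metric(recommended, truth) for truth in ground_truths)
```

For the examples in the text, `reward(["p1", "p2", "p3"], truths, f1_score)` and `reward(["p1", "p3", "p2"], truths, order_score)` with `truths = [["p1", "p2", "p3"], ["p1", "p2", "p4"]]` give `1.0` and `2/3` respectively.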

4) Regret: regret after $T$ trials is defined as 
\begin{equation}
R(T) = \mathbb{E}\left[\sum_{t=1}^T(\text{optimal reward})_t\right] - \mathbb{E}\left[\sum_{t=1}^T(\text{actual reward})_t\right]
\end{equation}
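
For a single simulated run, the quantity inside the expectations can be computed directly, and the expectation can then be estimated by averaging over several runs. The sketch below assumes the per-trial optimal and actual rewards are available as two equal-length lists; the name `cumulative_regret` is hypothetical.

```python
def cumulative_regret(optimal_rewards, actual_rewards):
    """Regret of one run: sum over trials of (optimal reward - actual reward)."""
    if len(optimal_rewards) != len(actual_rewards):
        raise ValueError("both reward sequences must cover the same T trials")
    return sum(opt - act for opt, act in zip(optimal_rewards, actual_rewards))
```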

5) Sampling strategies: 
1. $\epsilon$-greedy: with probability $1-\epsilon$ choose the arm (i.e. transition) with the highest log likelihood, and with probability $\epsilon$ choose a random arm.

1. Thompson sampling: design a prior over each arm's reward (uniform? Gaussian? ...) and compute the likelihood as in [this approach](./traj_MC.ipynb#sec4); see also this interesting [paper](http://www.research.rutgers.edu/~lihong/pub/Chapelle12Empirical.pdf).

1. UCB-type strategies such as LinUCB introduced in [this paper](http://www.research.rutgers.edu/~lihong/pub/Li10Contextual.pdf).
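
As an illustration of the simplest of these strategies, $\epsilon$-greedy can be sketched as follows. This is a minimal sketch: the function name `epsilon_greedy` and the input format (a dict mapping each candidate arm to the log likelihood of that transition) are assumptions, and the random arm is drawn uniformly from all arms, including the best one.

```python
import random


def epsilon_greedy(arm_log_likelihoods, epsilon=0.1, rng=None):
    """Choose an arm given a dict {arm: log likelihood} (hypothetical format).

    With probability epsilon explore (uniformly random arm);
    otherwise exploit (arm with the highest log likelihood).
    """
    if rng is None:
        rng = random.Random()
    arms = list(arm_log_likelihoods)
    if rng.random() < epsilon:
        return rng.choice(arms)
    return max(arms, key=arm_log_likelihoods.get)
```

Setting `epsilon=0` gives a purely greedy recommender, which is a useful sanity check before adding exploration.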

6) Query: 
- query the user about the selected arm, or
- ask users which POIs they prefer, or which characteristics (category/popularity/distance from current place) they prefer in POIs (show selected photos of these POIs?)

## 1.3 Baseline

Given (start, end) and trajectory length $l$ for a specific user $u$, we'll recommend a trajectory to user $u$.
- Compute the likelihood of trajectories that satisfy the (start, end) and length constraint, 
  sort these trajectories by likelihood
- Recommend a trajectory from the top $K$ (e.g. 5) trajectories with probability proportional to its likelihood

**TODO**: utilise user-specific features when recommending trajectory for user $u$ if there are trajectories of user $u$ in our dataset.

# 2. Experiment for choosing formulation