# Trajectory Recommendation - Structured SVM

Table of contents:
1. [Description of structured prediction](#1.-Description-of-structured-prediction)
1. [Inference](#2.-Inference)
 1. [Brute force search](#2.1-Brute-force-search)
 1. [Greedy search](#2.2-Greedy-search)
 1. [The Viterbi algorithm](#2.3-The-Viterbi-algorithm)
 1. [The list Viterbi algorithm](#2.4-The-list-Viterbi-algorithm)
 1. [Integer linear programming](#2.5-Integer-linear-programming)
1. [Structured SVM](#3.-Structured-SVM)

In [None]:
#% matplotlib inline

import os, sys, time, pickle, tempfile
import math, random, itertools
import pandas as pd
import numpy as np
import heapq as hq
from scipy.optimize import minimize

from sklearn.preprocessing import MinMaxScaler, StandardScaler, MaxAbsScaler

from pystruct.models import StructuredModel
from pystruct.learners import OneSlackSSVM

from joblib import Parallel, delayed
import cython
import pulp

```dat_ix``` is required in notebook ```shared.ipynb```.

In [None]:
dat_ix = 0

Run notebook ```shared.ipynb```.

In [None]:
%run 'shared.ipynb'

Hyperparameters.

In [None]:
N_JOBS = 6         # number of parallel jobs
USE_GUROBI = False # whether to use GUROBI as ILP solver
ABS_SCALER = True  # feature scaling, True: MaxAbsScaler, False: MinMaxScaler #False: StandardScaler
C_SET = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100, 300, 1000]  # regularisation parameter
MC_PORTION = 0.1   # the portion of data sampled by Monte-Carlo cross-validation
MC_NITER = 5       # number of iterations for Monte-Carlo cross-validation

Pickle files for saving results.

# 1. Description of structured prediction

## 1.1 Structured Prediction using PyStruct

We will analyse the process of using a structured SVM to train a CRF and make predictions on new instances.

Recall that the 1-slack formulation (with margin rescaling) of structured SVM is

\begin{align}
\min_{\mathbf{w}, \xi \ge 0} & \frac{1}{2} \mathbf{w}^T \mathbf{w} + C \xi \\
s.t. \forall(\bar{y}_1, \dots, \bar{y}_n) \in \mathcal{Y}^n: & 
\frac{1}{n} \mathbf{w}^T \sum_{i=1}^n \left( \Psi(x_i, y_i) - \Psi(x_i, \bar{y}_i) \right) \ge 
\frac{1}{n} \sum_{i=1}^n \Delta(y_i, \bar{y}_i) - \xi
\end{align}

Where 
- $\mathbf{w}$ is the parameter vector
- $\Psi(x_i, y_i)$ is the joint feature (vector) related to example $x_i$ and its label $y_i$
- The size of $\mathbf{w}$ is the same as $\Psi(x_i, y_i)$
- $\Delta(\centerdot)$ is the loss function. Here we use Hamming loss, i.e., per-variable 0-1 loss, as indicated by the functions [loss()](https://github.com/pystruct/pystruct/blob/master/pystruct/models/base.py) and [fit()](https://github.com/pystruct/pystruct/blob/master/pystruct/learners/one_slack_ssvm.py).
- $n$ is the total number of training examples, $C$ is the regularisation parameter, $\xi$ is the slack variable
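
To make the constraint concrete, the following minimal sketch evaluates a single 1-slack constraint for a fixed $\mathbf{w}$ and one tuple of candidate labellings $(\bar{y}_1, \dots, \bar{y}_n)$. The function names and the toy numbers are assumptions for illustration; they are not part of PyStruct.

```python
# Minimal sketch (hypothetical helper names, toy data): the smallest slack xi
# that satisfies one 1-slack constraint for a fixed parameter vector w.

def hamming_loss(y, y_bar):
    """Unnormalised Hamming loss: number of positions where the labels differ."""
    return sum(a != b for a, b in zip(y, y_bar))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def required_slack(w, psi_true, psi_bar, Y, Y_bar):
    """xi >= mean loss - mean margin, clipped at 0 because xi >= 0."""
    n = len(Y)
    margin = sum(dot(w, [a - b for a, b in zip(pt, pb)])
                 for pt, pb in zip(psi_true, psi_bar)) / n
    loss = sum(hamming_loss(y, yb) for y, yb in zip(Y, Y_bar)) / n
    return max(0.0, loss - margin)

w = [1.0, 0.5]
psi_true = [[2.0, 1.0], [1.0, 1.0]]   # Psi(x_i, y_i), toy values
psi_bar = [[1.0, 1.0], [1.0, 3.0]]    # Psi(x_i, y_bar_i), toy values
Y = [[0, 1], [1, 2, 0]]
Y_bar = [[0, 2], [1, 0, 2]]
xi = required_slack(w, psi_true, psi_bar, Y, Y_bar)
```

Cutting-plane training repeatedly looks for the tuple of labellings that requires the largest slack and adds the corresponding constraint to the QP.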

### 1.1.1 Basics

Before introducing the training and prediction procedure, we define some concepts that will be used later.
- `n_states`: #states for all variables, (this is the total number of unique POIs in training set here).
- `n_features`: #features per node, (this is the number of POI features, i.e., the ranking probabilities of all POIs).
- `n_edges`: #edges in each training/test example, (for a chain-structured trajectory this is the number of POIs minus one).
- `n_edge_features`: #features per edge, (this is the number of features for each transition, 
   i.e., the out-going transition probabilities to all POIs).
- $x$ is made up of three parts: (`node_features`, `edges`, `edge_features`).
- `node_features`: `n_nodes` $\times$ `n_features`
- `edge_features`: `n_edges` $\times$ `n_edge_features`
- `edges`: `n_edges` $\times$ $2$, e.g. for trajectory `[3, 1, 2]` and `[5, 9, 6]`, their `edges` are the same matrix
   `[[0, 1], [1, 2]]`.

For [EdgeFeatureGraphCRF](https://pystruct.github.io/generated/pystruct.models.EdgeFeatureGraphCRF.html), the pairwise potentials are asymmetric and shared over all edges, and the sizes of its parameters are as follows:
- **Parameter vector $\mathbf{w}$: `n_states` $\times$ `n_features` $+$ `n_edge_features` $\times$ `(n_states)`$^2$**
- The first part of $\mathbf{w}$, let's call it **`unary_params`**: `n_states` $\times$ `n_features`, holds the parameters 
  used to compute unary potentials.
- The second part of $\mathbf{w}$, let's call it **`pairwise_params`**: `n_edge_features` $\times$ `(n_states)`$^2$, 
  holds the parameters used to compute pairwise potentials. 
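
The size bookkeeping above can be sketched in a few lines. The helper `split_params` and the toy sizes are assumptions for illustration, not a PyStruct function.

```python
# Minimal sketch (hypothetical helper): split a flat parameter vector w into
# its unary and pairwise parts according to the sizes listed above.

def split_params(w, n_states, n_features, n_edge_features):
    n_unary = n_states * n_features
    n_pairwise = n_edge_features * n_states ** 2
    assert len(w) == n_unary + n_pairwise, 'w has the wrong length'
    return w[:n_unary], w[n_unary:]

# Toy sizes: 5 POIs, 3 features per node, 2 features per edge
n_states, n_features, n_edge_features = 5, 3, 2
w = list(range(n_states * n_features + n_edge_features * n_states ** 2))
unary_params, pairwise_params = split_params(w, n_states, n_features, n_edge_features)
```

With these toy sizes, $\mathbf{w}$ has $5 \times 3 + 2 \times 5^2 = 65$ entries.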

### 1.1.2 Training

#### Compute the joint feature vector $\Psi(x, y)$

When training a CRF using [OneSlackSSVM](https://pystruct.github.io/generated/pystruct.learners.OneSlackSSVM.html), we need to compute the joint feature vector $\Psi(x, y)$ for each training example. It is computed 
(for EdgeFeatureGraphCRF in PyStruct) as follows:

**Unary part of $\Psi(x, y)$**:
- make one-hot encoding of $y$, its size: `n_nodes` $\times$ `n_states`
- *value*: $y^T \times$ `node_features`
- *dimension*: `(n_nodes` $\times$ `n_states)`$^T$ $\times$ `(n_nodes` $\times$ `n_features)` 
  $\to$ `(n_states` $\times$ `n_features)`

**Pairwise Part of $\Psi(x, y)$**:
- make one-hot encoding of `edges`, its size: `n_edges` $\times$ `(n_states)`$^2$
- *value*: `edge_features`$^T$ $\times$ `edges`
- *dimension*: `(n_edges` $\times$ `n_edge_features)`$^T$ $\times$ `(n_edges` $\times$ `(n_states)`$^2$`)` 
  $\to$ `(n_edge_features` $\times$ `(n_states)`$^2$`)`

For each training example, $\Psi(x_i, y_i)$ is the concatenation `[unary part, pairwise part]`. Solving the QP above (the 1-slack formulation) with these joint features yields the parameter vector $\mathbf{w}$.
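
The two matrix products can be written out with NumPy as follows. This is a sketch of the computation as described above on toy data, not a copy of PyStruct's `joint_feature()`; all variable names and values are assumptions.

```python
import numpy as np

# Toy example: a chain with 3 nodes, 3 states, 2 node features, 1 edge feature
n_states = 3
y = np.array([2, 0, 1])                                    # label of each node
edges = np.array([[0, 1], [1, 2]])                         # chain edges
node_features = np.array([[1., 2.], [3., 4.], [5., 6.]])   # n_nodes x n_features
edge_features = np.array([[10.], [20.]])                   # n_edges x n_edge_features

# Unary part: one-hot(y)^T x node_features -> n_states x n_features
y_onehot = np.zeros((len(y), n_states))
y_onehot[np.arange(len(y)), y] = 1
unary = y_onehot.T @ node_features

# Pairwise part: one-hot over (state_from, state_to) for each edge,
# then edge_features^T x one-hot -> n_edge_features x n_states^2
pair_index = y[edges[:, 0]] * n_states + y[edges[:, 1]]
edges_onehot = np.zeros((len(edges), n_states ** 2))
edges_onehot[np.arange(len(edges)), pair_index] = 1
pairwise = edge_features.T @ edges_onehot

# Joint feature vector: [unary part, pairwise part]
psi = np.hstack([unary.ravel(), pairwise.ravel()])
```

Row `s` of `unary` accumulates the features of all nodes labelled `s`, and column `s * n_states + s'` of `pairwise` accumulates the features of all edges that go from state `s` to state `s'`.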

### 1.1.3 Prediction

Since a trajectory is chain structured, we use `max-product` belief propagation (the Viterbi algorithm in this case) to do inference in the trained CRF.  
To predict the label of a new instance $x$, we need to compute the unary potential and pairwise potential of $x$.

**Unary potential:** 
- *value*: `node_features` $\times$ `(unary_params)`$^T$ (first part of $\mathbf{w}$)
- *dimension*: `(n_nodes` $\times$ `n_features)` $\times$ `(n_states` $\times$ `n_features)`$^T$ $\to$ 
  `(n_nodes` $\times$ `n_states)`

**Pairwise potential:**
- *value*: `edge_features` $\times$ `pairwise_params` (second part of $\mathbf{w}$)
- *dimension*: `(n_edges` $\times$ `n_edge_features)` $\times$ `(n_edge_features` $\times$ `n_states`$^2$ `)`, 
  reshape to `(n_edges` $\times$ `n_states` $\times$ `n_states)`

With the unary and pairwise potentials computed, and knowing from `edges` that the example $x$ is chain structured, we run the Viterbi algorithm to compute the most likely label of $x$.
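
The prediction steps can be sketched end to end on random toy data. The code below is an illustrative max-product pass over a generic chain, with a brute-force check; the names, sizes and random parameters are assumptions, and it is not the PyStruct implementation.

```python
import itertools
import numpy as np

# Toy sizes and random parameters (assumptions for illustration only)
n_nodes, n_states, n_features, n_edge_features = 3, 4, 2, 1
n_edges = n_nodes - 1
rng = np.random.RandomState(0)
node_features = rng.rand(n_nodes, n_features)
edge_features = rng.rand(n_edges, n_edge_features)
unary_params = rng.rand(n_states, n_features)
pairwise_params = rng.rand(n_edge_features, n_states ** 2)

# Potentials, with the shapes given above
unary_pot = node_features @ unary_params.T                    # n_nodes x n_states
pairwise_pot = (edge_features @ pairwise_params).reshape(n_edges, n_states, n_states)

def viterbi_chain(unary, pairwise):
    """Max-product on a chain: return the highest-scoring state sequence."""
    n_nodes, n_states = unary.shape
    score = unary[0].copy()
    back = np.zeros((n_nodes, n_states), dtype=int)
    for t in range(1, n_nodes):
        # cand[a, b]: best score ending in state a at t-1, then moving to b at t
        cand = score[:, None] + pairwise[t - 1] + unary[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    y = [int(score.argmax())]
    for t in range(n_nodes - 1, 0, -1):
        y.append(int(back[t, y[-1]]))
    return y[::-1]

def chain_score(y):
    """Total score of a labelling: sum of unary and pairwise potentials."""
    return (sum(unary_pot[t, y[t]] for t in range(n_nodes)) +
            sum(pairwise_pot[t, y[t], y[t + 1]] for t in range(n_edges)))

y_hat = viterbi_chain(unary_pot, pairwise_pot)
# Exhaustive check over all n_states ** n_nodes labellings
y_brute = max(itertools.product(range(n_states), repeat=n_nodes), key=chain_score)
```

Note that this generic pass allows a state to repeat along the chain, which is why the inference methods in Section 2 add further restrictions for trajectories.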

## 1.2 Node Features - POI/Query Specific Features

For a trajectory `[start, ..., end]`, the features used for training and testing are the same as those used to rank POIs.

[PyStruct](https://pystruct.github.io/) assumes that [label `y` is a discrete vector](https://pystruct.github.io/intro.html) and [pystruct.learners assume labels `y` are integers starting with `0`](https://github.com/pystruct/pystruct/issues/114), concretely,
- values in label vector $y$ should satisfy $y_i \in Y$, 
  where $Y$ is the **index** of a discrete value space, and the index starts at 0.
- label vector $y$ will be [transformed to one hot encoding (see function `joint_feature()`)](https://github.com/pystruct/pystruct/blob/master/pystruct/models/graph_crf.py).

For example, if the labels in the training set are `[[1, 2], [0, 4, 9]]`, 
an index out of bounds error occurs, because PyStruct does something like this:
1. construct a discrete value space: 
   - `set([1, 2] + [0, 4, 9]) -> {0, 1, 2, 4, 9}`
   - `size({0, 1, 2, 4, 9}) = 5`
1. convert labels using one hot encoding: 
   - label vector `[1, 2]` will be converted to a matrix of shape $2 \times 5$,
     with cells at `(0, 1), (1, 2)` set to `1` and others set to `0`.
   - label vector `[0, 4, 9]` will be converted to a matrix of shape $3 \times 5$,
     with cells at `(0, 0)`, `(1, 4)`, **`(2, 9)` INDEX_OUT_OF_BOUNDS** set to `1` and others set to `0`.

We therefore need to build a mapping *POI_ID $\to$ POI_INDEX* over the POIs that appear in the training-set trajectories, together with the reverse mapping.
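
A minimal sketch of such a mapping, applied to the labels from the example above. The helper name `build_poi_index` is hypothetical.

```python
# Minimal sketch (hypothetical helper): map POI IDs to contiguous indices
# starting at 0, and keep the reverse map for decoding predictions.

def build_poi_index(trajectories):
    poi_ids = sorted({p for traj in trajectories for p in traj})
    id2index = {pid: ix for ix, pid in enumerate(poi_ids)}
    index2id = {ix: pid for pid, ix in id2index.items()}
    return id2index, index2id

trajs = [[1, 2], [0, 4, 9]]
id2index, index2id = build_poi_index(trajs)

# Labels expressed as indices: every value is now < number of distinct POIs
labels = [[id2index[p] for p in traj] for traj in trajs]
# Decoding recovers the original POI IDs
decoded = [[index2id[ix] for ix in lab] for lab in labels]
```

After the mapping, the largest label is `4`, so a one-hot matrix with 5 columns is large enough for every trajectory.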

### 1.2.1 Feature Scaling

Scale the joint features (when training) linearly to `[-1, 1]`, i.e., for feature $x$, we fit a linear function
\begin{equation}
    f(x) = ax + b 
\end{equation}
such that 
\begin{equation}
    a x_\texttt{max} + b = +1
\end{equation}
\begin{equation}
    a x_\texttt{min} + b = -1
\end{equation}

Solving the above linear equations gives the function
\begin{equation}
    f(x) = -1 + \frac{2(x-x_\texttt{min})}
                     {x_\texttt{max} - x_\texttt{min}}
\end{equation}
This approach is used by [libsvm and ranksvm](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/ranksvm/liblinear-ranksvm-2.1.zip); the code is at lines `349-383` of `svm-scale.c` (functions `output` and `output_target`).

In addition, for features with uniform values, we set them to `0`, i.e., 
\begin{equation}
    \textbf{if}~ x_\texttt{max} == x_\texttt{min} ~\textbf{then}~ f(x) = 0
\end{equation}
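
The two rules above can be transcribed directly for a single scalar feature. This sketch is a plain transcription of the formulas, and the tolerance `eps` used to detect $x_\texttt{max} = x_\texttt{min}$ is an assumption.

```python
# Minimal sketch: linear scaling of one feature value to [-1, 1].
# eps is an assumed tolerance for treating x_max and x_min as equal.

def scale_linear(x, x_min, x_max, eps=1e-9):
    if abs(x_max - x_min) < eps:
        return 0.0  # feature with uniform values
    return -1.0 + 2.0 * (x - x_min) / (x_max - x_min)

lowest = scale_linear(1.0, 1.0, 5.0)    # x == x_min
middle = scale_linear(3.0, 1.0, 5.0)    # midpoint of [x_min, x_max]
highest = scale_linear(5.0, 1.0, 5.0)   # x == x_max
constant = scale_linear(2.0, 2.0, 2.0)  # x_max == x_min
```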

In [None]:
%%cython
#%%cython -a
cimport numpy as np # for np.ndarray
import numpy as np # for np.shape

cpdef tuple scale_features_linear(
    np.ndarray[dtype=np.float64_t, ndim=2] node_features, 
    np.ndarray[dtype=np.float64_t, ndim=3] edge_features, 
    np.ndarray[dtype=np.float64_t, ndim=2] node_max, 
    np.ndarray[dtype=np.float64_t, ndim=2] node_min, 
    np.ndarray[dtype=np.float64_t, ndim=3] edge_max, 
    np.ndarray[dtype=np.float64_t, ndim=3] edge_min):
    
    # DEBUG
    #return (node_features, edge_features)
    
    assert(np.shape(node_features) == np.shape(node_max) == np.shape(node_min))
    assert(np.shape(edge_features) == np.shape(edge_max) == np.shape(edge_min))
    
    # x_max == x_min means feature x has uniform (such as constant) values, i.e. x == x_max == x_min
    #node_delta = node_max - node_min
    #edge_delta = edge_max - edge_min
    #node_delta[np.abs(node_delta) < 1e-9] = 1.
    #edge_delta[np.abs(edge_delta) < 1e-9] = 1.
    #return (2 * np.divide(x0-node_min, node_delta) - 1, 2 * np.divide(x1-edge_min, edge_delta) - 1)

    #TODO: loop-over each element using cython
    # max <=1 and min >= -1 and x in [-1, 1], no need to scale
    # max == min, set x = 0
    # boolean features, no scaling
    cdef int I, J, K, M, N
    cdef int i, j, k, m, n
    M, N = np.shape(node_features)
    for m in range(M):
        for n in range(N):
            # skip features distributed in [-1, 1] and single-valued features 
            if (node_max[m, n] > 1. or node_min[m, n] < -1.) and node_max[m, n] - node_min[m, n] > 1e-6:
                    node_features[m, n] = 2. * (node_features[m,n]-node_min[m,n]) / (node_max[m,n]-node_min[m,n]) - 1
                         
            #if node_max[m, n] < 1.1 and node_min[m, n] > -1.1 and -1.1 < node_features[m, n] < 1.1: continue
            #elif np.fabs(node_max[m, n] - node_min[m, n]) < 1e-9: node_features[m, n] = 0. #continue
            #else: node_features[m, n] = 2. * (node_features[m,n]-node_min[m,n]) / (node_max[m,n]-node_min[m,n]) - 1
            
    I, J, K = np.shape(edge_features)
    for i in range(I):
        for j in range(J):
            for k in range(K):
                #if edge_max[i,j,k] < 1.1 and edge_min[i,j,k] > -1.1 and -1.1 < edge_features[i,j,k] < 1.1: continue
                #elif np.fabs(edge_max[i,j,k] - edge_min[i,j,k]) < 1e-9: edge_features[i,j,k] = 0. #continue
                #else:edge_features[i,j,k]=2.*(edge_features[i,j,k]-edge_min[i,j,k])/(edge_max[i,j,k]-edge_min[i,j,k])-1
                if (edge_max[i,j,k] > 1. or edge_min[i,j,k] < -1.) and edge_max[i,j,k] - edge_min[i,j,k] > 1e-6:
                        edge_features[i,j,k] = \
                        2. * (edge_features[i,j,k] - edge_min[i,j,k]) / (edge_max[i,j,k] - edge_min[i,j,k]) - 1
    return (node_features, edge_features)


cpdef scale_vector(np.ndarray[dtype=np.float64_t, ndim=1] features_unscaled, 
                              np.ndarray[dtype=np.float64_t, ndim=1] features_max, 
                              np.ndarray[dtype=np.float64_t, ndim=1] features_min):
    # DEBUG
    return features_unscaled

    assert(np.shape(features_unscaled) == np.shape(features_max) == np.shape(features_min))
    cdef int N, n
    N = np.shape(features_unscaled)[0]
    feature_scaled = np.zeros(N, dtype=np.float64)
    
    for n in range(N):
        if (features_max[n] > 1. or features_min[n] < -1.) and features_max[n] - features_min[n] > 1e-6:
            feature_scaled[n] = -1 + 2. * (features_unscaled[n]-features_min[n]) / (features_max[n]-features_min[n])
    return feature_scaled


cpdef tuple scale_features_norm(
    np.ndarray[dtype=np.float64_t, ndim=2] node_features, 
    np.ndarray[dtype=np.float64_t, ndim=3] edge_features, 
    np.ndarray[dtype=np.float64_t, ndim=2] node_mean, 
    np.ndarray[dtype=np.float64_t, ndim=2] node_std, 
    np.ndarray[dtype=np.float64_t, ndim=3] edge_mean, 
    np.ndarray[dtype=np.float64_t, ndim=3] edge_std):
    
    assert(np.shape(node_features) == np.shape(node_mean) == np.shape(node_std))
    assert(np.shape(edge_features) == np.shape(edge_mean) == np.shape(edge_std))
    
    #return (np.divide(x0-node_means, node_stds), np.divide(x1-edge_means, edge_stds))
    
    cdef int I, J, K, M, N
    cdef int i, j, k, m, n
    M, N = np.shape(node_features)
    for m in range(M):
        for n in range(N):
            # skip single-valued features
            if np.fabs(node_std[m, n]) > 1e-6:
                node_features[m, n] = (node_features[m, n] - node_mean[m, n]) / node_std[m, n]
            
    I, J, K = np.shape(edge_features)
    for i in range(I):
        for j in range(J):
            for k in range(K):
                if np.fabs(edge_std[i, j, k]) > 1e-6:
                    edge_features[i, j, k] = (edge_features[i, j, k] - edge_mean[i, j, k]) / edge_std[i, j, k]
    
    return (node_features, edge_features)


cpdef tuple build_joint_feature(np.ndarray[dtype=np.float64_t, ndim=2] node_features,
                                np.ndarray[dtype=np.float64_t, ndim=3] edge_features,
                                np.ndarray[dtype=np.float64_t, ndim=2] node_max, 
                                np.ndarray[dtype=np.float64_t, ndim=2] node_min, 
                                np.ndarray[dtype=np.float64_t, ndim=3] edge_max, 
                                np.ndarray[dtype=np.float64_t, ndim=3] edge_min,
                                np.ndarray[dtype=np.long_t, ndim=1] y):
    cdef int j
    L = np.shape(y)[0]

    unary_features = np.zeros(np.shape(node_features), dtype=np.float64)
    pw_features = np.zeros(np.shape(edge_features), dtype=np.float64)
    
    unary_features[y[0], :] = node_features[y[0], :]
    
    for j in range(L-1):
        ss, tt = y[j], y[j+1]
        unary_features[tt, :] = node_features[tt, :]
        pw_features[ss, tt, :] = edge_features[ss, tt, :]

    # feature scaling here
    ret = scale_features_linear(unary_features, pw_features,
                                node_max=node_max, node_min=node_min,
                                edge_max=edge_max, edge_min=edge_min)
    return (ret[0], ret[1])

Sanity check.

In [None]:
nf = np.array([[1, 2, 3], [4, 5, 6]]).astype(np.float64)
ef = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]).astype(np.float64)
nmax = np.array([[2, 2, 3], [5, 5, 6]]).astype(np.float64)
nmin = np.array([[1, 1, 2], [3, 3, 3]]).astype(np.float64)
emax = np.array([[[2, 2], [4, 4]], [[6, 6], [9, 8]]]).astype(np.float64)
emin = np.array([[[1, 1], [2, 2]], [[3, 3], [3, 3]]]).astype(np.float64)

In [None]:
#scale_features_linear(nf, ef, nmax, nmin, emax, emin)

# 2. Inference

Inference for SSVM: loss-augmented inference for cutting-plane training and inference for prediction.
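
The difference between the two kinds of inference is only the extra loss term in the objective. The sketch below shows this with an exhaustive search over paths and a toy score function; the function names, the toy `gain` values and the score are assumptions for illustration.

```python
import itertools

# Minimal sketch (toy score, hypothetical names): prediction maximises the
# model score, loss-augmented inference maximises score + Hamming loss.

def hamming(y, y_true):
    return sum(a != b for a, b in zip(y, y_true))

def argmax_path(ps, L, M, score, y_true=None):
    """Best path of length L over M POIs starting at ps, found by enumeration."""
    best, best_val = None, float('-inf')
    others = [p for p in range(M) if p != ps]
    for rest in itertools.permutations(others, L - 1):
        y = (ps,) + rest
        val = score(y)
        if y_true is not None:
            val += hamming(y, y_true)  # loss-augmented objective
        if val > best_val:
            best, best_val = y, val
    return list(best)

gain = [0, 3, 2, 1]  # toy per-POI score

def toy_score(y):
    return sum(gain[p] for p in y[1:])

y_pred = argmax_path(0, 3, 4, toy_score)
y_aug = argmax_path(0, 3, 4, toy_score, y_true=[0, 1, 2])
```

Here the loss-augmented result differs from the plain prediction: it is a high-scoring path that is also far from the ground truth, which is the kind of labelling cutting-plane training uses to build a new constraint.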

Examples for sanity check.

In [None]:
M0, L0 = 5, 3
w_u = np.array([1, 2, 3, 2, 3]).reshape((M0, 1))
f_u = np.array([2, 1, 1, 3, 1]).reshape((M0, 1))
w_p = np.array([1,1,1,1,3, 1,1,1,2,1, 1,3,1,1,1, 2,1,1,1,1, 1,1,3,1,1]).reshape((M0, M0, 1))
f_p = np.array([1,2,1,1,1, 1,1,1,1,3, 2,1,1,1,1, 1,1,3,1,1, 1,1,1,2,1]).reshape((M0, M0, 1))
ps0, y_true0 = 1, [1, 0, 2]

In [None]:
M0, L0 = 6, 4
w_u = np.array([1, 1, 1, 2, 1, 2]).reshape((M0, 1))
f_u = np.array([2, 1, 1, 2, 1, 1]).reshape((M0, 1))
w_p = np.array([1,1,1,1,3,2, 1,1,1,2,1,1, 1,3,1,1,1,2, 2,1,1,1,1,1, 1,1,3,1,1,1, 1,2,1,1,2,1]).reshape((M0, M0, 1))
f_p = np.array([1,2,1,1,1,1, 1,1,1,1,3,2, 2,1,1,1,1,2, 1,1,3,1,1,1, 1,1,1,2,1,1, 2,1,1,2,1,1]).reshape((M0, M0, 1))
ps0, y_true0 = 1, [1, 2, 0, 5]

## 2.1 Brute force search

Inference using **brute force search** (for sanity check).

In [None]:
def do_inference_bruteForce(ps, L, M, unary_params, pw_params, unary_features, pw_features, y_true=None):
    assert(L > 1)
    assert(L <= M)
    assert(ps >= 0)
    assert(ps < M)
    
    Cu = np.zeros(M, dtype=np.float64)      # unary_param[p] x unary_features[p]
    Cp = np.zeros((M, M), dtype=np.float64) # pw_param[pi, pj] x pw_features[pi, pj]
    # an intermediate POI should NOT be the start POI, NO self-loops
    for pi in range(M):
        Cu[pi] = np.dot(unary_params[pi, :], unary_features[pi, :]) # if pi != ps else -np.inf
        for pj in range(M):
            Cp[pi, pj] = -np.inf if (pj == ps or pi == pj) else np.dot(pw_params[pi, pj, :], pw_features[pi, pj, :])
            
    Q = []
    for x in itertools.permutations([p for p in range(M) if p != ps], L-1):
        #print([ps] + list(x))
        y = [ps] + list(x)
        score = 0
        for j in range(1, L):
            score += Cp[y[j-1], y[j]] + Cu[y[j]]
        if y_true is not None:
            score += np.sum(np.asarray(y) != np.asarray(y_true))
        priority = -score
        hq.heappush(Q, (priority, y))
    
    k = 20
    while k > 0 and len(Q) > 0:
        priority, pathwalk = hq.heappop(Q)
        print(pathwalk, -priority)
        k -= 1

In [None]:
#do_inference_bruteForce(ps0, L0, M0, w_u, w_p, f_u, f_p)

## 2.2 Greedy search

Inference using **greedy search** (baseline).

In [None]:
def do_inference_greedy(ps, L, M, unary_params, pw_params, unary_features, pw_features, y_true=None):
    assert(L > 1)
    assert(L <= M)
    assert(ps >= 0)
    assert(ps < M)
    
    Cu = np.zeros(M, dtype=np.float64)      # unary_param[p] x unary_features[p]
    Cp = np.zeros((M, M), dtype=np.float64) # pw_param[pi, pj] x pw_features[pi, pj]
    # an intermediate POI should NOT be the start POI, NO self-loops
    for pi in range(M):
        Cu[pi] = np.dot(unary_params[pi, :], unary_features[pi, :]) # if pi != ps else -np.inf
        for pj in range(M):
            Cp[pi, pj] = -np.inf if (pj == ps or pi == pj) else np.dot(pw_params[pi, pj, :], pw_features[pi, pj, :])
    
    y_hat = [ps]
    
    for t in range(1, L):
        candidate_points = [p for p in range(M) if p not in y_hat]
        p = y_hat[-1]
        maxix = np.argmax([Cp[p, p1] + Cu[p1] + float(p1 != y_true[t]) if y_true is not None else \
                           Cp[p, p1] + Cu[p1] for p1 in candidate_points])
        y_hat.append(candidate_points[maxix])
        
    return np.asarray(y_hat)

In [None]:
#print(do_inference_greedy(ps0, L0, M0, w_u, w_p, f_u, f_p))
#print(do_inference_greedy(ps0, L0, M0, w_u, w_p, f_u, f_p, y_true=y_true0))

## 2.3 The Viterbi algorithm

Inference using **the Viterbi algorithm**.

In [None]:
def do_inference_viterbi(ps, L, M, unary_params, pw_params, unary_features, pw_features, y_true=None):
    assert(L > 1)
    assert(L <= M)
    assert(ps >= 0)
    assert(ps < M)
    
    Cu = np.zeros(M, dtype=np.float64)      # unary_param[p] x unary_features[p]
    Cp = np.zeros((M, M), dtype=np.float64) # pw_param[pi, pj] x pw_features[pi, pj]
    # an intermediate POI should NOT be the start POI, NO self-loops
    for pi in range(M):
        Cu[pi] = np.dot(unary_params[pi, :], unary_features[pi, :]) # if pi != ps else -np.inf
        for pj in range(M):
            Cp[pi, pj] = -np.inf if (pj == ps or pi == pj) else np.dot(pw_params[pi, pj, :], pw_features[pi, pj, :])
    
    A = np.zeros((L-1, M), dtype=np.float64)   # scores matrix
    B = np.ones((L-1, M), dtype=int) * (-1)    # backtracking pointers
    
    for p in range(M): # ps--p
        A[0, p] = Cp[ps, p] + Cu[p]
        #if y_true is not None and p != ps: A[0, p] += float(p != y_true[1])/L  # loss term: normalised
        if y_true is not None and p != ps: A[0, p] += float(p != y_true[1])
        B[0, p] = ps

    for t in range(0, L-2):
        for p in range(M):
            #loss = float(p != y_true[l+2])/L if y_true is not None else 0  # loss term: normlised
            loss = float(p != y_true[t+2]) if y_true is not None else 0
            scores = [A[t, p1] + Cp[p1, p] + Cu[p] for p1 in range(M)] # ps~~p1--p
            maxix = np.argmax(scores)
            A[t+1, p] = scores[maxix] + loss
            #B[l+1, p] = np.array(range(N))[maxix]
            B[t+1, p] = maxix

    y_hat = [np.argmax(A[L-2, :])]
    p, t = y_hat[-1], L-2
    while t >= 0:
        y_hat.append(B[t, p])
        p, t = y_hat[-1], t-1
    y_hat.reverse()

    return np.asarray(y_hat)

In [None]:
#print(do_inference_viterbi(ps0, L0, M0, w_u, w_p, f_u, f_p))
#print(do_inference_viterbi(ps0, L0, M0, w_u, w_p, f_u, f_p, y_true=y_true0))

## 2.4 The list Viterbi algorithm

Inference using **the List Viterbi algorithm**, which *sequentially* finds the (k+1)-th best path/walk given the 1st, 2nd, ..., k-th best paths/walks.

Reference papers:
- [*Sequentially finding the N-Best List in Hidden Markov Models*](http://www.eng.biu.ac.il/~goldbej/papers/ijcai01.pdf), Dennis Nilsson and Jacob Goldberger, IJCAI 2001.
- [*A tutorial on hidden Markov models and selected applications in speech recognition*](http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf), L.R. Rabiner, Proceedings of the IEEE, 1989.

Implementation is adapted from the above references.

In [None]:
class HeapItem:  # an item in heapq (min-heap)
    def __init__(self, priority, task):
        self.priority = priority
        self.task = task
        self.string = str(priority) + ': ' + str(task)
        
    def __lt__(self, other):
        return self.priority < other.priority
    
    def __repr__(self):
        return self.string
    
    def __str__(self):
        return self.string

In [None]:
def do_inference_listViterbi(ps, L, M, unary_params, pw_params, unary_features, pw_features, y_true=None, debug=False):
    assert(L > 1)
    assert(M >= L)
    assert(ps >= 0)
    assert(ps < M)
    
    Cu = np.zeros(M, dtype=np.float64)      # unary_param[p] x unary_features[p]
    Cp = np.zeros((M, M), dtype=np.float64) # pw_param[pi, pj] x pw_features[pi, pj]
    
    # an intermediate POI should NOT be the start POI, NO self-loops
    for pi in range(M):
        Cu[pi] = np.dot(unary_params[pi, :], unary_features[pi, :]) # if pi != ps else -np.inf
        for pj in range(M):
            Cp[pi, pj] = -np.inf if (pj == ps or pi == pj) else np.dot(pw_params[pi, pj, :], pw_features[pi, pj, :])
            
    # forward-backward procedure: adapted from the Rabiner paper
    Alpha = np.zeros((L, M), dtype=np.float64)  # alpha_t(p_i)
    Beta  = np.zeros((L, M), dtype=np.float64)  # beta_t(p_i)
    
    for pj in range(M): Alpha[1, pj] = Cp[ps, pj] + Cu[pj]
    for t in range(2, L):
        for pj in range(M):
            Alpha[t, pj] = np.max([Alpha[t-1, pi] + Cp[pi, pj] + Cu[pj] for pi in range(M)])
            
    for t in range(L-1, 1, -1):
        for pi in range(M):
            Beta[t-1, pi] = np.max([Cp[pi, pj] + Cu[pj] + Beta[t, pj] for pj in range(M)])        
    Beta[0, ps] = np.max([Cp[ps, pj] + Cu[pj] + Beta[1, pj] for pj in range(M)])
    
    Fp = np.zeros((L-1, M, M), dtype=np.float64)  # f_{t, t+1}(p, p')
    
    for t in range(L-1):
        for pi in range(M):
            for pj in range(M):
                Fp[t, pi, pj] = Alpha[t, pi] + Cp[pi, pj] + Cu[pj] + Beta[t+1, pj]
                
    # identify the best path/walk: adapted from the IJCAI01 paper
    y_best = np.ones(L, dtype=int) * (-1)
    y_best[0] = ps
    maxix = np.argmax(Fp[0, ps, :])  # the start POI is specified
    y_best[1] = maxix
    for t in range(2, L): 
        y_best[t] = np.argmax(Fp[t-1, y_best[t-1], :])
        
    Q = []  # priority queue (min-heap)
    maxIter = np.power(M,L-1) - np.prod([M-kx for kx in range(1,L)]) + 1 # guaranteed to find a path in maxIter iterations
    if debug == True: maxIter = np.min([maxIter, 200]); print('#iterations:', maxIter) 
        
    # heap item for the best path/walk
    priority, partition_index, exclude_set = -np.max(Alpha[L-1, :]), None, set()  # -1 * score as priority
    hq.heappush(Q, HeapItem(priority, (y_best, partition_index, exclude_set)))
    
    histories = set()
        
    k = 0
    while len(Q) > 0 and k < maxIter:
        #print('------------------\n', Q, '\n------------------')
        hitem = hq.heappop(Q)
        k_priority, (k_best, k_partition_index, k_exclude_set) = hitem.priority, hitem.task
        k += 1
        
        histories.add(''.join([str(x) + ',' for x in k_best]))
        #print(k, len(histories))
        
        #print('pop:', k_priority, k_best, k_partition_index, k_exclude_set)
        if debug == True: 
            print(k_best, -k_priority)
        else:
            if len(set(k_best)) == L: return k_best
            
        
        # identify the (k+1)-th best path/walk given the 1st, 2nd, ..., k-th best: adapted from the IJCAI01 paper
        partition_index_start = 1
        if k_partition_index is not None:
            assert(k_partition_index > 0)
            assert(k_partition_index < L)
            partition_index_start = k_partition_index
            
        for parix in range(partition_index_start, L):    
            new_exclude_set = set({k_best[parix]})
            if parix == partition_index_start:
                new_exclude_set = new_exclude_set | k_exclude_set
            
            new_best = np.ones(L, dtype=int) * (-1)
            for pk in range(parix):
                new_best[pk] = k_best[pk]
            
            candidate_points = [p for p in range(M) if p not in new_exclude_set]
            if len(candidate_points) == 0: continue
            candidate_maxix = np.argmax([Fp[parix-1, k_best[parix-1], p] for p in candidate_points])
            new_best[parix] = candidate_points[candidate_maxix]
            
            for pk in range(parix+1, L):
                new_best[pk] = np.argmax([Fp[pk-1, new_best[pk-1], p] for p in range(M)])
            
            new_priority = Fp[parix-1, k_best[parix-1], new_best[parix]]
            if k_partition_index is not None:
                new_priority += (-k_priority) - Fp[parix-1, k_best[parix-1], k_best[parix]]
            new_priority *= -1.0  # NOTE: -np.inf - np.inf + np.inf = nan
            
            #if debug == True and np.isnan(new_priority):
            #    print(Fp[parix-1,k_best[parix-1],new_best[parix]], (-k_priority), \
            #          Fp[parix-1,k_best[parix-1],k_best[parix]])
            #    print(Fp[parix-1,k_best[parix-1],new_best[parix]] - k_priority - \
            #          Fp[parix-1,k_best[parix-1],k_best[parix]])
                
            #print(' '*3, 'push:', new_priority, new_best, parix, new_exclude_set)
            
            hq.heappush(Q, HeapItem(new_priority, (new_best, parix, new_exclude_set)))
            #print('------------------\n', Q, '\n------------------')
    if debug == True: print('#iterations: %d, #distinct_trajectories: %d' % (k, len(histories)))

In [None]:
#do_inference_listViterbi(ps0, L0, M0, w_u, w_p, f_u, f_p, debug=True)

In [None]:
#print(do_inference_listViterbi(ps0, L0, M0, w_u, w_p, f_u, f_p))
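
The iteration cap `maxIter` in `do_inference_listViterbi` can be checked by enumeration on small instances. One way to read the bound: there are $M^{L-1}$ sequences of $L-1$ POIs that can follow the start POI, of which $\prod_{k=1}^{L-1}(M-k)$ visit $L$ distinct POIs, so in the worst case every sequence with a repeated POI is popped before the first valid path. The function names below are hypothetical, and this reading of the bound is an interpretation of the code rather than a statement from the referenced papers.

```python
import itertools

# Minimal sketch: the closed-form cap versus a direct count.

def max_iterations(M, L):
    n_walks = M ** (L - 1)      # all sequences of L-1 POIs after the start
    n_paths = 1                 # sequences visiting L distinct POIs
    for k in range(1, L):
        n_paths *= M - k
    return n_walks - n_paths + 1

def count_by_enumeration(M, L, ps=0):
    walks = itertools.product(range(M), repeat=L - 1)
    # a sequence is invalid if the start POI or any other POI repeats
    n_invalid = sum(1 for w in walks if len({ps, *w}) < L)
    return n_invalid + 1

cap_small = max_iterations(5, 3)   # sizes of the first sanity-check example
cap_large = max_iterations(6, 4)   # sizes of the second sanity-check example
```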

## 2.5 Integer linear programming

Inference using **integer linear programming (ILP)** to avoid sub-tours.

In [None]:
def do_inference_ILP(ps, L, M, unary_params, pw_params, unary_features, pw_features, y_true=None):
    assert(L > 1)
    assert(L <= M)
    assert(ps >= 0)
    assert(ps < M)
    p0 = str(ps)
    
    #print('===:', p0)
    
    pois = [str(p) for p in range(M)] # create a string list for each POI
    pb = pulp.LpProblem('Inference_ILP', pulp.LpMaximize) # create problem
    # visit_i_j = 1 means POI i and j are visited in sequence
    visit_vars = pulp.LpVariable.dicts('visit', (pois, pois), 0, 1, pulp.LpInteger) 
    # isend_l = 1 means POI l is the END POI of trajectory
    isend_vars = pulp.LpVariable.dicts('isend', pois, 0, 1, pulp.LpInteger) 
    # a dictionary contains all dummy variables
    dummy_vars = pulp.LpVariable.dicts('u', [x for x in pois if x != p0], 2, M, pulp.LpInteger)
    
    # add objective
    objlist = []
    for pi in pois:     # from
        for pj in pois: # to
            objlist.append(visit_vars[pi][pj] * (np.dot(unary_params[int(pj)], unary_features[int(pj)]) + \
                                                 np.dot(pw_params[int(pi), int(pj)], pw_features[int(pi), int(pj)])))
    if y_true is not None: # Loss: number of mispredicted POIs; Hamming loss is not linear in 'visit'
        objlist.append(1)
        for j in range(M):
            pj = pois[j]
            for k in range(1, L): 
                pk = str(y_true[k])
                #objlist.append(-1.0 * visit_vars[pj][pk] / L) # loss term: normalised
                objlist.append(-1.0 * visit_vars[pj][pk])
    pb += pulp.lpSum(objlist), 'Objective'
    
    # add constraints, each constraint should be in ONE line
    pb += pulp.lpSum([visit_vars[pi][pi] for pi in pois]) == 0, 'NoSelfLoops'
    pb += pulp.lpSum([visit_vars[p0][pj] for pj in pois]) == 1, 'StartAt_p0'
    pb += pulp.lpSum([visit_vars[pi][p0] for pi in pois]) == 0, 'NoIncoming_p0'
    pb += pulp.lpSum([visit_vars[pi][pj] for pi in pois for pj in pois]) == L-1, 'Length'
    pb += pulp.lpSum([isend_vars[pi] for pi in pois]) == 1, 'OneEnd'
    pb += isend_vars[p0] == 0, 'StartNotEnd'
    
    for pk in [x for x in pois if x != p0]:
        pb += pulp.lpSum([visit_vars[pi][pk] for pi in pois]) == isend_vars[pk] + \
              pulp.lpSum([visit_vars[pk][pj] for pj in pois if pj != p0]), 'ConnectedAt_' + pk
        pb += pulp.lpSum([visit_vars[pi][pk] for pi in pois]) <= 1, 'Enter_' + pk + '_AtMostOnce'
        pb += pulp.lpSum([visit_vars[pk][pj] for pj in pois if pj != p0]) + isend_vars[pk] <= 1, \
              'Leave_' + pk + '_AtMostOnce'
    for pi in [x for x in pois if x != p0]:
        for pj in [y for y in pois if y != p0]:
            pb += dummy_vars[pi] - dummy_vars[pj] + 1 <= (M - 1) * (1 - visit_vars[pi][pj]), \
                    'SubTourElimination_' + pi + '_' + pj
    #pb.writeLP("traj_tmp.lp")
    
    # solve problem: solver should be available in PATH
    if USE_GUROBI == True:
        gurobi_options = [('TimeLimit', '7200'), ('Threads', str(N_JOBS)), ('NodefileStart', '0.2'), ('Cuts', '2')]
        pb.solve(pulp.GUROBI_CMD(path='gurobi_cl', options=gurobi_options)) # GUROBI
    else:
        pb.solve(pulp.COIN_CMD(path='cbc', options=['-threads', str(N_JOBS), '-strategy', '1', '-maxIt', '2000000']))#CBC
    visit_mat = pd.DataFrame(data=np.zeros((len(pois), len(pois)), dtype=np.float64), index=pois, columns=pois)
    isend_vec = pd.Series(data=np.zeros(len(pois), dtype=np.float64), index=pois)
    for pi in pois:
        isend_vec.loc[pi] = isend_vars[pi].varValue
        for pj in pois: visit_mat.loc[pi, pj] = visit_vars[pi][pj].varValue
    #visit_mat.to_csv('visit.csv')

    # build the recommended trajectory
    recseq = [p0]
    while True:
        pi = recseq[-1]
        pj = visit_mat.loc[pi].idxmax()
        value = visit_mat.loc[pi, pj]
        #print(value, int(round(value)))
        #print(recseq)
        assert(int(round(value)) == 1)
        recseq.append(pj)
        if len(recseq) == L: 
            assert(int(round(isend_vec[pj])) == 1)
            #print('===:', recseq, ':====')
            return np.asarray([int(x) for x in recseq])

In [None]:
#print(do_inference_ILP(ps0, L0, M0, w_u, w_p, f_u, f_p))
#print(do_inference_ILP(ps0, L0, M0, w_u, w_p, f_u, f_p, y_true=y_true0))
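
The `SubTourElimination` constraints in `do_inference_ILP` can be checked by hand on a tiny instance. The sketch below does not call a solver: for a given set of visited edges it searches for integer values $u_i \in [2, M]$ that satisfy $u_i - u_j + 1 \le (M-1)(1 - \texttt{visit}_{ij})$ for all non-start POIs. The helper name and the toy edge sets are assumptions for illustration.

```python
import itertools

# Minimal sketch (hypothetical helper): is there an assignment of the dummy
# variables u that satisfies every sub-tour elimination constraint?

def mtz_feasible(M, ps, visit_edges):
    others = [p for p in range(M) if p != ps]
    for values in itertools.product(range(2, M + 1), repeat=len(others)):
        u = dict(zip(others, values))
        if all(u[i] - u[j] + 1 <= (M - 1) * (1 - int((i, j) in visit_edges))
               for i in others for j in others):
            return True
    return False

# A proper path 0 -> 1 -> 2 -> 3: u can increase along the path
path_ok = mtz_feasible(4, 0, {(0, 1), (1, 2), (2, 3)})
# Edge 0 -> 1 plus a detached cycle 2 -> 3 -> 2: no consistent ordering exists
subtour_ok = mtz_feasible(4, 0, {(0, 1), (2, 3), (3, 2)})
```

A visited edge `(i, j)` forces $u_j \ge u_i + 1$, so any cycle among non-start POIs leads to a contradiction, while unvisited pairs leave $u$ unconstrained within its bounds.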

# 3. Structured SVM

In [None]:
class MyModel(StructuredModel):
    
    def __init__(self, n_states=None, n_features=None, n_edge_features=None, inference_fun=do_inference_listViterbi):
        self.inference_method = 'customized'
        self.inference_fun = inference_fun
        self.class_weight = None
        self.inference_calls = 0
        self.n_states = n_states
        self.n_features = n_features
        self.n_edge_features = n_edge_features
        self._set_size_joint_feature()
        self._set_class_weight()

        
    def _set_size_joint_feature(self):
        if None not in [self.n_states, self.n_features, self.n_edge_features]:
            self.size_joint_feature = self.n_states * self.n_features + \
                                      self.n_states * self.n_states * self.n_edge_features
   

    def loss(self, y, y_hat):
        #return np.mean(np.asarray(y) != np.asarray(y_hat))     # hamming loss (normalised)
        return np.sum(np.asarray(y) != np.asarray(y_hat))     # hamming loss
        #return loss_F1(y, y_hat)      # F1 loss
        #return loss_pairsF1(y, y_hat) # pairsF1 loss
        #return loss_pairsF1(np.array(y), np.array(y_hat)) # pairsF1 loss

    
    def initialize(self, X, Y):
        assert(len(X) == len(Y))
        n_features = X[0][0].shape[1]
        if self.n_features is None: 
            self.n_features = n_features
        else:
            assert(self.n_features == n_features)

        n_states = len(np.unique(np.hstack([y.ravel() for y in Y])))
        if self.n_states is None: 
            self.n_states = n_states
        else:
            assert(self.n_states == n_states)
            
        n_edge_features = X[0][1].shape[2]
        if self.n_edge_features is None:
            self.n_edge_features = n_edge_features
        else:
            assert(self.n_edge_features == n_edge_features)
            
        self._set_size_joint_feature()
        self._set_class_weight()
        
        # joint feature scaling
        #n_samples = len(Y)
        #node_features_all = np.zeros((n_samples, self.n_states, self.n_features), dtype=np.float)
        #edge_features_all = np.zeros((n_samples, self.n_states, self.n_states, self.n_edge_features), dtype=np.float)
        #for ii in range(n_samples):
        #    x0, x1, y = X[ii][0], X[ii][1], Y[ii]
        #    node_features_all[ii, y[0], :] = x0[y[0], :]
        #    for jj in range(len(y)-1):
        #        ss, tt = y[jj], y[jj+1]
        #        node_features_all[ii, tt, :] = x0[tt, :]
        #        edge_features_all[ii, ss, tt, :] = x1[ss, tt, :]
        
        #node_max = np.max(node_features_all, axis=0)
        #node_min = np.min(node_features_all, axis=0)
        #edge_max = np.max(edge_features_all, axis=0)
        #edge_min = np.min(edge_features_all, axis=0)
        #assert(node_max.shape == (self.n_states, self.n_features))
        #assert(node_min.shape == (self.n_states, self.n_features))
        #assert(edge_max.shape == (self.n_states, self.n_states, self.n_edge_features))
        #assert(edge_min.shape == (self.n_states, self.n_states, self.n_edge_features))
        
        #node_mean = np.mean(node_features_all, axis=0)
        #edge_mean = np.mean(edge_features_all, axis=0)
        #node_std = np.std(node_features_all, axis=0)
        #edge_std = np.std(edge_features_all, axis=0)
        #assert(node_mean.shape == (self.n_states, self.n_features))
        #assert(node_std.shape  == (self.n_states, self.n_features))
        #assert(edge_mean.shape == (self.n_states, self.n_states, self.n_edge_features))
        #assert(edge_std.shape  == (self.n_states, self.n_states, self.n_edge_features))
        
        # save for scaling test data
        #self.node_max = node_max; self.node_min = node_min
        #self.edge_max = edge_max; self.edge_min = edge_min
        #self.node_mean = node_mean; self.node_std = node_std
        #self.edge_mean = edge_mean; self.edge_std = edge_std
        
        # scaling features
        #for ii in range(n_samples):
        #    unaries, pw = scale_features_linear(X[ii][0], X[ii][1],
        #                                        node_max=self.node_max, node_min=self.node_min,
        #                                        edge_max=self.edge_max, edge_min=self.edge_min)
        #    X[ii] = (unaries, pw, X[ii][2])
        

    def __repr__(self):
        return ("%s(n_states: %d, inference_method: %s, n_features: %d, n_edge_features: %d)"
                % (type(self).__name__, self.n_states, self.inference_method, self.n_features, self.n_edge_features))
    
    
    def joint_feature(self, x, y):
        assert(not isinstance(y, tuple))
        unary_features = x[0] # unary features of all POIs: n_POIs x n_features
        pw_features = x[1]    # pairwise features of all transitions: n_POIs x n_POIs x n_edge_features
        query = x[2]          # query = (startPOI, length)
        n_nodes = query[1]
        
        #print('y:', y)
        
        #assert(unary_features.ndim == 2)
        #assert(pw_features.ndim == 3)
        #assert(len(query) == 3)
        assert(n_nodes == len(y))
        assert(unary_features.shape == (self.n_states, self.n_features))
        assert(pw_features.shape == (self.n_states, self.n_states, self.n_edge_features))
        
        node_features = np.zeros((self.n_states, self.n_features), dtype=float)
        edge_features = np.zeros((self.n_states, self.n_states, self.n_edge_features), dtype=float)
        
        node_features[y[0], :] = unary_features[y[0], :]
        for j in range(len(y)-1):
            ss, tt = y[j], y[j+1]
            node_features[tt, :] = unary_features[tt, :]
            edge_features[ss, tt, :] = pw_features[ss, tt, :]
        
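        # concatenate the per-POI node features and the per-transition edge features:
        # each POI and each transition keeps its own block of parameters
        # (summing over POIs/transitions instead would share parameters between them)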
        joint_feature_vector = np.hstack([node_features.ravel(), edge_features.ravel()])
        
        return joint_feature_vector
            
    
    def loss_augmented_inference(self, x, y, w, relaxed=None):
        #print('loss_augmented_inference:', y)
        # inference procedure for training: (x, y) from the training set (with features already scaled)
        #
        # argmax_y_hat np.dot(w, joint_feature(x, y_hat)) + loss(y, y_hat)
        # 
        # the loss function should be decomposable in order to use Viterbi decoding, here we use Hamming loss
        #
        # x[0]: unary features of all POIs (scaled in SSVM.train): n_POIs x n_features
        # x[1]: pairwise features of all transitions (not rescaled): n_POIs x n_POIs x n_edge_features
        # x[2]: query = (startPOI, length)
        unary_features = x[0]
        pw_features = x[1]
        query = x[2]
        
        assert(unary_features.ndim == 2)
        assert(pw_features.ndim == 3)
        assert(len(query) == 2)
        
        ps = query[0]
        L = query[1]
        M = unary_features.shape[0]  # total number of POIs
        
        self._check_size_w(w)
        unary_params = w[:self.n_states * self.n_features].reshape((self.n_states, self.n_features))
        pw_params = w[self.n_states * self.n_features:].reshape((self.n_states, self.n_states, self.n_edge_features))
        
        y_hat = do_inference_viterbi(ps, L, M, unary_params, pw_params, unary_features, pw_features, y_true=y)
        
        #y_hat = do_inference_ILP(ps, L, M, unary_params, pw_params, unary_features, pw_features, y_true=y)
        #assert(len(y_hat) == len(set(y_hat)))
        
        return y_hat

    
    def inference(self, x, w, relaxed=False, return_energy=False):
        #print('inference')
        # inference procedure for testing: x from the test set (features must already be scaled by the caller)
        #
        # argmax_y np.dot(w, joint_feature(x, y))
        #
        # x[0]: unary features of all POIs (scaled in SSVM.predict): n_POIs x n_features
        # x[1]: pairwise features of all transitions (not rescaled): n_POIs x n_POIs x n_edge_features
        # x[2]: query = (startPOI, length)
        unary_features = x[0]
        pw_features = x[1]
        query = x[2]
        
        assert(unary_features.ndim == 2)
        assert(pw_features.ndim == 3)
        assert(len(query) == 2)
        
        ps = query[0]
        L = query[1]
        M = unary_features.shape[0]  # total number of POIs
        
        self._check_size_w(w)
        unary_params = w[:self.n_states * self.n_features].reshape((self.n_states, self.n_features))
        pw_params = w[self.n_states * self.n_features:].reshape((self.n_states, self.n_states, self.n_edge_features))
        
        #y_pred = do_inference_viterbi(ps, L, M, unary_params, pw_params, unary_features, pw_features)
        #y_pred = do_inference_listViterbi(ps, L, M, unary_params, pw_params, unary_features, pw_features)
        #y_pred = do_inference_ILP(ps, L, M, unary_params, pw_params, unary_features, pw_features)
        #assert(len(y_pred) == len(set(y_pred)))
        y_pred = self.inference_fun(ps, L, M, unary_params, pw_params, unary_features, pw_features)
        
        return y_pred
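
To make the layout of the weight vector concrete, here is a small pure-Python sketch of the joint feature map used by `MyModel.joint_feature` and of the way `w` is split into unary and pairwise blocks in the two inference methods. It uses nested lists instead of NumPy arrays, and the function names and toy numbers are assumptions for illustration. The point is that `np.dot(w, joint_feature(x, y))` decomposes into one term per visited POI plus one term per used transition.

```python
def joint_feature(unary, pairwise, y):
    """Flattened joint feature vector for the path y (a list of state indices)."""
    n_states, n_feat = len(unary), len(unary[0])
    n_edge = len(pairwise[0][0])
    node = [[0.0] * n_feat for _ in range(n_states)]
    edge = [[[0.0] * n_edge for _ in range(n_states)] for _ in range(n_states)]
    node[y[0]] = list(unary[y[0]])
    for s, t in zip(y[:-1], y[1:]):
        node[t] = list(unary[t])             # assignment, so repeated visits count once
        edge[s][t] = list(pairwise[s][t])
    flat = [v for row in node for v in row]
    flat += [v for mat in edge for row in mat for v in row]
    return flat


def path_score(w, unary, pairwise, y):
    """Same score, computed from the per-POI and per-transition blocks of w."""
    n_states, n_feat = len(unary), len(unary[0])
    n_edge = len(pairwise[0][0])
    offset = n_states * n_feat               # pairwise parameters start here
    total = 0.0
    for p in set(y):
        w_p = w[p * n_feat:(p + 1) * n_feat]
        total += sum(a * b for a, b in zip(w_p, unary[p]))
    for s, t in set(zip(y[:-1], y[1:])):
        start = offset + (s * n_states + t) * n_edge
        w_st = w[start:start + n_edge]
        total += sum(a * b for a, b in zip(w_st, pairwise[s][t]))
    return total


# toy problem: 3 POIs, 2 node features, 1 edge feature (hypothetical numbers)
unary = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pairwise = [[[float(10 * s + t)] for t in range(3)] for s in range(3)]
y = [0, 2, 1]
w = [float(i) for i in range(3 * 2 + 3 * 3 * 1)]

psi = joint_feature(unary, pairwise, y)
dot = sum(a * b for a, b in zip(w, psi))
print(len(psi), dot, path_score(w, unary, pairwise, y))
```

The vector length `n_states * n_features + n_states * n_states * n_edge_features` matches `_set_size_joint_feature`.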

Compute node features (singleton).

In [None]:
def calc_node_features(startPOI, nPOI, poi_ix, poi_info, poi_clusters, cats, clusters):
    """
    Generate feature vectors for all POIs given query (startPOI, nPOI)
    """
    assert(isinstance(cats, list))
    assert(isinstance(clusters, list))
    
    columns = DF_COLUMNS[3:]
    poi_distmat = POI_DISTMAT
    query_id_dict = QUERY_ID_DICT
    key = (p0, trajLen) = (startPOI, nPOI)
    assert(key in query_id_dict)
    assert(p0 in poi_info.index)
    
    # DEBUG: use uniform node features
    nrows = len(poi_ix)
    ncols = len(columns) + len(cats) + len(clusters) - 2
    #return np.ones((nrows, ncols), dtype=np.float)
    #return np.zeros((nrows, ncols), dtype=np.float)
    
    poi_list = poi_ix
    df_ = pd.DataFrame(index=np.arange(len(poi_list)), columns=columns)
        
    for i in range(df_.index.shape[0]):
        poi = poi_list[i]
        lon, lat = poi_info.loc[poi, 'poiLon'], poi_info.loc[poi, 'poiLat']
        pop, nvisit = poi_info.loc[poi, 'popularity'], poi_info.loc[poi, 'nVisit']
        cat, cluster = poi_info.loc[poi, 'poiCat'], poi_clusters.loc[poi, 'clusterID']
        duration = poi_info.loc[poi, 'avgDuration']
        idx = df_.index[i]
        # DataFrame.set_value was removed in pandas 1.0; .at stores the tuple in a single cell
        df_.at[idx, 'category'] = tuple((cat == np.array(cats)).astype(int) * 2 - 1)
        df_.at[idx, 'neighbourhood'] = tuple((cluster == np.array(clusters)).astype(int) * 2 - 1)
        df_.loc[idx, 'popularity'] = LOG_SMALL if pop < 1 else np.log10(pop)
        df_.loc[idx, 'nVisit'] = LOG_SMALL if nvisit < 1 else np.log10(nvisit)
        df_.loc[idx, 'avgDuration'] = LOG_SMALL if duration < 1 else np.log10(duration)
        df_.loc[idx, 'trajLen'] = trajLen
        df_.loc[idx, 'sameCatStart'] = 1 if cat == poi_info.loc[p0, 'poiCat'] else -1
        df_.loc[idx, 'distStart'] = poi_distmat.loc[poi, p0]
        df_.loc[idx, 'diffPopStart'] = pop - poi_info.loc[p0, 'popularity']
        df_.loc[idx, 'diffNVisitStart'] = nvisit - poi_info.loc[p0, 'nVisit']
        df_.loc[idx, 'diffDurationStart'] = duration - poi_info.loc[p0, 'avgDuration']
        df_.loc[idx, 'sameNeighbourhoodStart'] = 1 if cluster == poi_clusters.loc[p0, 'clusterID'] else -1
    
    # features other than category and neighbourhood
    X = df_[list(set(df_.columns) - {'category', 'neighbourhood'})].values  
    
    # boolean features: category (+1, -1)
    cat_features = np.vstack([list(df_.loc[x, 'category']) for x in df_.index])
    
    # boolean features: neighbourhood (+1, -1)
    neigh_features = np.vstack([list(df_.loc[x, 'neighbourhood']) for x in df_.index])
    
    return np.hstack([X, cat_features, neigh_features]).astype(float)
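
Two encodings recur in `calc_node_features`: categorical values become vectors of +1/-1, and count-like values are log-transformed with a floor for values below 1. The sketch below isolates both. The helper names, the category names and the value of `LOG_SMALL` are assumptions for illustration; the real constant is defined in `shared.ipynb`.

```python
import math

LOG_SMALL = -10.0  # assumed placeholder; the notebook takes this from shared.ipynb


def signed_one_hot(value, vocabulary):
    """+1 at the position of `value`, -1 everywhere else."""
    return [1 if value == v else -1 for v in vocabulary]


def safe_log10(x, floor=LOG_SMALL):
    """log10(x), with a fixed floor for x < 1 to avoid log10(0)."""
    return floor if x < 1 else math.log10(x)


cats = ['Museum', 'Park', 'Shopping']   # hypothetical category list
encoded = signed_one_hot('Park', cats)
print(encoded)
print(safe_log10(100), safe_log10(0))
```

Using -1 rather than 0 for the "off" positions keeps these columns on the same [-1, 1] range that the scalers in `SSVM` produce for the other features.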

Compute edge features (transition / pairwise).

In [None]:
def calc_edge_features(trajid_list, poi_ix, traj_dict, poi_info):    
    feature_names = ['poiCat', 'popularity', 'nVisit', 'avgDuration', 'clusterID']
    n_features = len(feature_names)
    
    # DEBUG: use uniform edge features
    #return np.ones((len(poi_ix), len(poi_ix), n_features), dtype=np.float)
    #return np.zeros((len(poi_ix), len(poi_ix), n_features), dtype=np.float)
    
    transmat_cat                        = gen_transmat_cat(trajid_list, traj_dict, poi_info)
    transmat_pop,      logbins_pop      = gen_transmat_pop(trajid_list, traj_dict, poi_info)
    transmat_visit,    logbins_visit    = gen_transmat_visit(trajid_list, traj_dict, poi_info)
    transmat_duration, logbins_duration = gen_transmat_duration(trajid_list, traj_dict, poi_info)
    transmat_neighbor, poi_clusters     = gen_transmat_neighbor(trajid_list, traj_dict, poi_info)
    
    poi_features = pd.DataFrame(data=np.zeros((len(poi_ix), len(feature_names))), \
                                columns=feature_names, index=poi_ix)
    poi_features.index.name = 'poiID'
    poi_features['poiCat'] = poi_info.loc[poi_ix, 'poiCat']
    poi_features['popularity'] = np.digitize(poi_info.loc[poi_ix, 'popularity'], logbins_pop)
    poi_features['nVisit'] = np.digitize(poi_info.loc[poi_ix, 'nVisit'], logbins_visit)
    poi_features['avgDuration'] = np.digitize(poi_info.loc[poi_ix, 'avgDuration'], logbins_duration)
    poi_features['clusterID'] = poi_clusters.loc[poi_ix, 'clusterID']
    
    edge_features = np.zeros((len(poi_ix), len(poi_ix), n_features), dtype=np.float64)
    
    for j in range(len(poi_ix)): # NOTE: POI order
        pj = poi_ix[j]
        cat, pop = poi_features.loc[pj, 'poiCat'], poi_features.loc[pj, 'popularity']
        visit, cluster = poi_features.loc[pj, 'nVisit'], poi_features.loc[pj, 'clusterID']
        duration = poi_features.loc[pj, 'avgDuration']
        
        for k in range(len(poi_ix)): # NOTE: POI order
            pk = poi_ix[k]
            edge_features[j, k, :] = np.log10( np.array(
                    [transmat_cat.loc[cat, poi_features.loc[pk, 'poiCat']], \
                     transmat_pop.loc[pop, poi_features.loc[pk, 'popularity']], \
                     transmat_visit.loc[visit, poi_features.loc[pk, 'nVisit']], \
                     transmat_duration.loc[duration, poi_features.loc[pk, 'avgDuration']], \
                     transmat_neighbor.loc[cluster, poi_features.loc[pk, 'clusterID']]] ) )
    return edge_features
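
Each edge feature above is the base-10 log of a transition probability between a property of the source POI and the same property of the target POI. The sketch below shows the idea for a single property (category). It is not the notebook's `gen_transmat_cat`, which lives in `shared.ipynb`; the additive smoothing, the function names and the toy data are assumptions, chosen so that no probability is zero and `log10` stays finite.

```python
import math


def transition_log_probs(trajectories, poi_cat, cats, smoothing=1.0):
    """log10 P(next category | current category), estimated from trajectories."""
    counts = {a: {b: smoothing for b in cats} for a in cats}
    for traj in trajectories:
        for p, q in zip(traj[:-1], traj[1:]):
            counts[poi_cat[p]][poi_cat[q]] += 1
    logp = {}
    for a in cats:
        total = sum(counts[a].values())
        logp[a] = {b: math.log10(counts[a][b] / total) for b in cats}
    return logp


def edge_feature(logp, poi_cat, pois):
    """Matrix whose entry [j][k] is the log-probability of moving from pois[j] to pois[k]."""
    return [[logp[poi_cat[p]][poi_cat[q]] for q in pois] for p in pois]


pois = [1, 2, 3]                         # hypothetical POI IDs, in index order
poi_cat = {1: 'A', 2: 'B', 3: 'A'}
trajectories = [[1, 2], [3, 2], [2, 1]]

logp = transition_log_probs(trajectories, poi_cat, ['A', 'B'])
feat = edge_feature(logp, poi_cat, pois)
print(feat[0][1], feat[1][0])
```

POIs 1 and 3 share a category, so their rows and columns in `feat` are identical: the feature depends only on the properties of the two POIs, not on their identities.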

In [None]:
class SSVM:
    def __init__(self, C=1.0, inference_fun=do_inference_listViterbi, debug=False):
        assert(C > 0)
        self.C = C
        self.inference_fun = inference_fun
        self.debug = debug
        self.trained = False
        
        if ABS_SCALER == True:
            self.scaler = MaxAbsScaler(copy=False)
        else:
            self.scaler = MinMaxScaler(feature_range=(-1,1), copy=False)
            #self.scaler = StandardScaler(copy=False)
        

        
    def train(self, trajid_set_train):
        self.poi_info = calc_poi_info(list(trajid_set_train), traj_all, poi_all)

        # build POI_ID <--> POI_INDEX mapping for POIs used to train CRF
        # which means only POIs in traj such that len(traj) >= 2 are included
        poi_set = set()
        for x in trajid_set_train:
            if len(traj_dict[x]) >= 2:
                poi_set = poi_set | set(traj_dict[x])
        self.poi_ix = sorted(poi_set)
        self.poi_id_dict, self.poi_id_rdict = dict(), dict()
        for idx, poi in enumerate(self.poi_ix):
            self.poi_id_dict[poi] = idx
            self.poi_id_rdict[idx] = poi

        # generate training data
        train_traj_list = [traj_dict[k] for k in trajid_set_train if len(traj_dict[k]) >= 2]
        node_features_list = Parallel(n_jobs=N_JOBS)\
                             (delayed(calc_node_features)\
                              (tr[0], len(tr), self.poi_ix, self.poi_info, poi_clusters=POI_CLUSTERS, \
                               cats=POI_CAT_LIST, clusters=POI_CLUSTER_LIST) for tr in train_traj_list)
        self.edge_features = calc_edge_features(list(trajid_set_train), self.poi_ix, traj_dict, self.poi_info)

        # feature scaling
        # should each example be flattened to one vector before scaling?
        self.fdim = node_features_list[0].shape
        X_node_all = np.vstack(node_features_list)
        #print(self.fdim)
        #print(X_node_all.shape)
        #X_node_all = X_node_all.reshape(len(node_features_list), -1) # flatten every example to a vector
        X_node_all = self.scaler.fit_transform(X_node_all)
        X_node_all = X_node_all.reshape(-1, self.fdim[0], self.fdim[1])

        assert(len(train_traj_list) == X_node_all.shape[0])
        X_train = [(X_node_all[k, :, :], \
                    self.edge_features.copy(), \
                    (self.poi_id_dict[train_traj_list[k][0]], len(train_traj_list[k]))) \
                   for k in range(len(train_traj_list))]
        y_train = [np.array([self.poi_id_dict[k] for k in tr]) for tr in train_traj_list]
        assert(len(X_train) == len(y_train))

        # train
        sm = MyModel(inference_fun=self.inference_fun)
        verbose = 5 if self.debug == True else 0
        self.osssvm = OneSlackSSVM(model=sm, C=self.C, n_jobs=N_JOBS, verbose=verbose)
        self.osssvm.fit(X_train, y_train, initialize=True)
        self.trained = True
        print('SSVM training finished.')
        


    def predict(self, startPOI, nPOI):
        assert(self.trained == True)
        if startPOI not in self.poi_ix: return None
        X_node_test = calc_node_features(startPOI, nPOI, self.poi_ix, self.poi_info, poi_clusters=POI_CLUSTERS, \
                                         cats=POI_CAT_LIST, clusters=POI_CLUSTER_LIST)

        # feature scaling
        # should each example be flattened to one vector before scaling?
        #X_node_test = X_node_test.reshape(1, -1) # flatten test example to a vector
        X_node_test = self.scaler.transform(X_node_test)
        #X_node_test = X_node_test.reshape(self.fdim)

        X_test = [(X_node_test, self.edge_features, (self.poi_id_dict[startPOI], nPOI))]
        y_hat = self.osssvm.predict(X_test)

        return np.array([self.poi_id_rdict[x] for x in y_hat[0]])
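
The scaling step in `SSVM.train` stacks the node feature matrices of all training queries row-wise, scales each column, and then splits the rows back into one matrix per query; `SSVM.predict` reuses the fitted scale. The following pure-Python sketch reproduces that flow for max-abs scaling. It is a simplified stand-in for `MaxAbsScaler`, with assumed function names and toy numbers, and it leaves all-zero columns unchanged.

```python
def fit_max_abs(rows):
    """Per-column maximum absolute value; zero columns get scale 1."""
    n_cols = len(rows[0])
    scale = [max(abs(r[c]) for r in rows) for c in range(n_cols)]
    return [s if s > 0 else 1.0 for s in scale]


def transform(rows, scale):
    """Divide every column by its fitted scale."""
    return [[v / s for v, s in zip(r, scale)] for r in rows]


# two training queries, each with 2 POIs x 2 features (hypothetical numbers)
query_1 = [[1.0, -4.0], [2.0, 0.5]]
query_2 = [[-5.0, 2.0], [0.0, 1.0]]
n_pois = len(query_1)

stacked = query_1 + query_2                      # like np.vstack
scale = fit_max_abs(stacked)
scaled = transform(stacked, scale)
# split back into one block per query, like reshape(-1, n_pois, n_features)
blocks = [scaled[i:i + n_pois] for i in range(0, len(scaled), n_pois)]

# a test query is scaled with the training scale, so values may leave [-1, 1]
test_scaled = transform([[10.0, 2.0]], scale)
print(scale, blocks[1], test_scaled)
```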

Nested cross-validation with Monte-Carlo cross-validation as inner loop.

In [None]:
inference_methods = [do_inference_greedy, do_inference_viterbi, do_inference_listViterbi, do_inference_ILP]
methods_suffix = ['greedy', 'viterbi', 'listViterbi', 'ILP']

In [None]:
method_ix = 0

In [None]:
recdict_ssvm = dict()
cnt = 1
keys = sorted(TRAJ_GROUP_DICT.keys())
mc_portion = 0.1

# outer loop to evaluate the test performance by cross validation
for i in range(len(keys)):
    ps, L = keys[i]
    
    best_C = 1
    #best_F1 = 0; best_pF1 = 0
    best_Tau = 0
    keys_cv = keys[:i] + keys[i+1:]
    
    # tune regularisation constant C
    for ssvm_C in C_SET:
        print('\n--------------- try_C: %f ---------------\n' % ssvm_C); sys.stdout.flush() 
        F1_ssvm = []; pF1_ssvm = []; Tau_ssvm = []        
        
        # inner loop to evaluate the performance of a model with a specified C by Monte-Carlo cross validation
        for _ in range(MC_NITER):  # the index is unused; the loops below reuse `j` for test queries
            while True: # resample until every start POI in the test set also appears in the training set
                rand_ix = np.arange(len(keys_cv)); np.random.shuffle(rand_ix)
                test_ix = rand_ix[:int(MC_PORTION*len(rand_ix))]
                assert(len(test_ix) > 0)
                trajid_set_train = set(trajid_set_all) - TRAJ_GROUP_DICT[keys[i]]
                for j in test_ix: 
                    trajid_set_train = trajid_set_train - TRAJ_GROUP_DICT[keys_cv[j]]
                poi_set = set()
                for tid in trajid_set_train: poi_set = poi_set | set(traj_dict[tid])
                good_partition = True
                for j in test_ix: 
                    if keys_cv[j][0] not in poi_set: good_partition = False; break
                if good_partition == True: break

            # train
            ssvm = SSVM(C=ssvm_C, inference_fun=inference_methods[method_ix])#, debug=True)
            ssvm.train(trajid_set_train)
            
            # test
            for j in test_ix:
                ps_cv, L_cv = keys_cv[j]
                y_hat = ssvm.predict(ps_cv, L_cv)
                if y_hat is not None:
                    F1, pF1, tau = evaluate(y_hat, TRAJ_GROUP_DICT[keys_cv[j]])
                    F1_ssvm.append(F1); pF1_ssvm.append(pF1); Tau_ssvm.append(tau)
        
        #mean_F1 = np.mean(F1_ssvm); mean_pF1 = np.mean(pF1_ssvm)
        mean_Tau = np.mean(Tau_ssvm)
        print('mean_Tau: %.3f' % mean_Tau)
        if mean_Tau > best_Tau:
            best_Tau = mean_Tau
            best_C = ssvm_C
    print('\n--------------- %d/%d, Query: (%d, %d), Best_C: %f ---------------\n' % (cnt, len(keys), ps, L, best_C))
    sys.stdout.flush()
    
    # train model using all examples in training set and measure performance on test set
    trajid_set_train = set(trajid_set_all) - TRAJ_GROUP_DICT[keys[i]]
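    ssvm = SSVM(C=best_C, inference_fun=inference_methods[method_ix])
    ssvm.train(trajid_set_train)  # train() returns None, so check the `trained` flag instead
    if ssvm.trained:
        y_hat = ssvm.predict(ps, L)
        if y_hat is not None:
            recdict_ssvm[(ps, L)] = {'PRED': y_hat, 'W': ssvm.osssvm.w, 'C': ssvm.C}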
        
    cnt += 1; #print_progress(cnt, len(keys)); sys.stdout.flush()
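
The inner loop draws a random hold-out set of queries and rejects any draw in which a held-out start POI never occurs in the remaining training trajectories. A minimal sketch of that rejection sampling follows. The function name, the `max_tries` safeguard and the toy data are assumptions for illustration; the notebook's loop resamples without an upper bound.

```python
import random


def mc_partition(queries, trajs_by_query, portion, rng, max_tries=1000):
    """Hold out a portion of the queries; every held-out start POI must
    still occur in at least one training trajectory."""
    n_test = int(portion * len(queries))
    assert n_test > 0
    for _ in range(max_tries):
        shuffled = queries[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        train_pois = {p for q in train for traj in trajs_by_query[q] for p in traj}
        if all(start in train_pois for start, length in test):
            return train, test
    raise RuntimeError('no valid partition found')


# queries are (startPOI, length); POI 4 occurs only in its own trajectory
trajs_by_query = {(1, 2): [[1, 2]],
                  (2, 2): [[2, 3]],
                  (3, 3): [[3, 1, 2]],
                  (1, 3): [[1, 3, 2]],
                  (4, 2): [[4, 1]]}
queries = sorted(trajs_by_query)

train, test = mc_partition(queries, trajs_by_query, 0.2, random.Random(0))
print(train, test)
```

Query `(4, 2)` can never be held out here, because removing it would leave POI 4 without any training trajectory.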

In [None]:
F1_ssvm = []; pF1_ssvm = []; tau_ssvm = []
for key in sorted(recdict_ssvm.keys()):
    F1, pF1, tau = evaluate(recdict_ssvm[key]['PRED'], TRAJ_GROUP_DICT[key])
    F1_ssvm.append(F1); pF1_ssvm.append(pF1); tau_ssvm.append(tau)
print('SSVM: F1 (%.3f, %.3f), pairsF1 (%.3f, %.3f), Tau (%.3f, %.3f)' % \
      (np.mean(F1_ssvm), np.std(F1_ssvm)/np.sqrt(len(F1_ssvm)), \
       np.mean(pF1_ssvm), np.std(pF1_ssvm)/np.sqrt(len(pF1_ssvm)), \
       np.mean(tau_ssvm), np.std(tau_ssvm)/np.sqrt(len(tau_ssvm))))
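
Each pair printed above is a mean followed by its standard error, computed as the standard deviation divided by the square root of the number of queries. The sketch below restates that with the standard library on made-up scores; the function name and the numbers are assumptions. It uses the population standard deviation, which corresponds to the default `ddof=0` of `np.std`.

```python
import math
import statistics


def mean_and_stderr(values):
    """Mean and standard error of the mean (population std, i.e. ddof=0)."""
    mean = statistics.mean(values)
    stderr = statistics.pstdev(values) / math.sqrt(len(values))
    return mean, stderr


scores = [0.2, 0.4, 0.6, 0.8]   # hypothetical per-query scores
mean, stderr = mean_and_stderr(scores)
print('%.3f, %.3f' % (mean, stderr))
```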

In [None]:
fssvm = os.path.join(data_dir, 'ssvm-' + methods_suffix[method_ix] + '-' + dat_suffix[dat_ix] + '.pkl')
fssvm

In [None]:
with open(fssvm, 'wb') as f:
    pickle.dump(recdict_ssvm, f)