# Trajectory Recommendation - Multi-label Structured SVM

Table of contents:
1. [Description of multi-label SSVM](#1.-Description-of-multi-label-SSVM)
1. [Inference](#2.-Inference)
 1. [Brute force search](#2.1-Brute-force-search)
 1. [The list Viterbi algorithm](#2.2-The-list-Viterbi-algorithm)
1. [Structured SVM](#3.-Structured-SVM)

In [None]:
#% matplotlib inline

import os, sys, time, pickle, tempfile
import math, random, itertools
import pandas as pd
import numpy as np
import heapq as hq
from scipy.optimize import minimize

from sklearn.preprocessing import MinMaxScaler, StandardScaler, MaxAbsScaler

from pystruct.models import StructuredModel
from pystruct.learners import OneSlackSSVM

from joblib import Parallel, delayed
import cython
import pulp
import cvxopt

In [None]:
random.seed(1234554321)
np.random.seed(123456789)
cvxopt.base.setseed(123456789)

```dat_ix``` must be defined before running the notebook ```shared.ipynb```, which reads it.

In [None]:
dat_ix = 0

Run notebook ```shared.ipynb```.

In [None]:
%run 'shared.ipynb'

Hyperparameters.

In [None]:
N_JOBS = 6         # number of parallel jobs
USE_GUROBI = False # whether to use GUROBI as ILP solver
ABS_SCALER = False  # feature scaling: True uses MaxAbsScaler, False uses MinMaxScaler (StandardScaler is a commented-out alternative)
C_SET = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100, 300, 1000, 3000]  # regularisation parameter
MC_PORTION = 0.1   # the portion of data that is sampled by Monte-Carlo cross-validation
MC_NITER = 5       # number of iterations for Monte-Carlo cross-validation
SSVM_SHARE_PARAMS = False  # share params among POIs/transitions in SSVM
LVITERBI_MAXITER = 1e6  # maximum number of iterations in the list Viterbi algorithm

# 1. Description of multi-label SSVM

The n-slack formulation of multi-label structured SVM:

\begin{align}
\min_{\mathbf{w}, \xi_{ij} \ge 0} ~& \frac{1}{2} \mathbf{w}^T \mathbf{w} + \frac{C}{n} \sum_{i=1}^n \sum_{j=1}^{m_i} \xi_{ij} \\
s.t. ~& \langle \mathbf{w}, \Psi(x_i, y_{ij}) \rangle - \langle \mathbf{w}, \Psi(x_i, \bar{y}) \rangle \ge 
\Delta(y_{ij}, \bar{y}) - \xi_{ij},~ \bar{y} \in \mathcal{Y}_i,~ j=1,\dots,m_i
\end{align}

Where 
- $\mathbf{w}$ is the parameter vector
- $m_i$ is the number of ground-truth labels in the training set that conform to query $x_i$
- $\xi_{ij}$ is the slack variable for the $j$-th ground truth of query $x_i$ 
- $\Psi(x_i, y_{ij})$ is the joint feature (vector) related to example $x_i$ and its label $y_{ij}$
- $\mathcal{Y}_i = \mathcal{Y} \setminus \{y_{ij}\}_{j=1}^{m_i}$ where $\mathcal{Y}$ is the set of all possible labels that conform to query $x_i$
- $\Delta(\cdot, \cdot)$ is the loss function; here we use the Hamming loss, i.e., the per-variable 0-1 loss, as indicated by the functions [loss()](https://github.com/pystruct/pystruct/blob/master/pystruct/models/base.py) and [fit()](https://github.com/pystruct/pystruct/blob/master/pystruct/learners/one_slack_ssvm.py)
- $n$ is the total number of training examples, $C$ is the regularisation parameter
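
To make a single constraint concrete, the sketch below computes the unnormalised Hamming loss between two labels and the smallest slack that satisfies the constraint for a given pair of scores. The helper names `hamming_loss` and `required_slack` and the score values are illustrative assumptions, not part of pystruct or of this notebook.

```python
import numpy as np

def hamming_loss(y, y_bar):
    """Unnormalised Hamming loss: number of positions where two labels differ."""
    return int(np.sum(np.asarray(y) != np.asarray(y_bar)))

def required_slack(score_true, score_bar, loss):
    """Smallest xi >= 0 such that score_true - score_bar >= loss - xi."""
    return max(0.0, loss - (score_true - score_bar))

y_ij, y_bar = [1, 0, 2], [1, 3, 2]   # a ground truth and a competing label
delta = hamming_loss(y_ij, y_bar)    # the labels differ at one position
# hypothetical scores <w, Psi(x, y_ij)> = 4.0 and <w, Psi(x, y_bar)> = 3.5
xi = required_slack(score_true=4.0, score_bar=3.5, loss=delta)
```

The margin of 0.5 falls short of the required loss of 1, so the constraint is only satisfied with a positive slack.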

# 2. Inference

Inference for SSVM: loss-augmented inference for cutting-plane training and inference for prediction.
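
The two inference problems differ only in the objective being maximised. The sketch below scores a trajectory as the sum of transition and unary scores along it (the start POI's unary score is not counted, mirroring the brute-force search in Section 2.1) and adds the Hamming loss for the loss-augmented variant. The function names and the toy `Cu`/`Cp` values are assumptions made for illustration.

```python
import numpy as np

def trajectory_score(y, Cu, Cp):
    """Sum of pairwise and unary scores along y, skipping the start POI's unary score."""
    return sum(Cp[y[j - 1], y[j]] + Cu[y[j]] for j in range(1, len(y)))

def loss_augmented_score(y, y_true, Cu, Cp):
    """Trajectory score plus the unnormalised Hamming loss w.r.t. y_true."""
    return trajectory_score(y, Cu, Cp) + np.sum(np.asarray(y) != np.asarray(y_true))

Cu = np.array([0.0, 1.0, 2.0])   # toy unary scores for 3 POIs
Cp = np.ones((3, 3))             # toy transition scores
s_pred = trajectory_score([0, 1, 2], Cu, Cp)
s_aug = loss_augmented_score([0, 1, 2], [0, 2, 1], Cu, Cp)
```

Prediction maximises the first quantity; cutting-plane training maximises the second to find the most violated constraint.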

Examples for sanity check.

In [None]:
M0, L0 = 5, 3
w_u = np.array([1, 2, 3, 2, 3]).reshape((M0, 1))
f_u = np.array([2, 1, 1, 3, 1]).reshape((M0, 1))
w_p = np.array([1,1,1,1,3, 1,1,1,2,1, 1,3,1,1,1, 2,1,1,1,1, 1,1,3,1,1]).reshape((M0, M0, 1))
f_p = np.array([1,2,1,1,1, 1,1,1,1,3, 2,1,1,1,1, 1,1,3,1,1, 1,1,1,2,1]).reshape((M0, M0, 1))
ps0, y_true0 = 1, [1, 0, 2]
y_true_list0 = [[1, 0, 2], [1, 3, 2]]

In [None]:
M0, L0 = 6, 4
w_u = np.array([1, 1, 1, 2, 1, 2]).reshape((M0, 1))
f_u = np.array([2, 1, 1, 2, 1, 1]).reshape((M0, 1))
w_p = np.array([1,1,1,1,3,2, 1,1,1,2,1,1, 1,3,1,1,1,2, 2,1,1,1,1,1, 1,1,3,1,1,1, 1,2,1,1,2,1]).reshape((M0, M0, 1))
f_p = np.array([1,2,1,1,1,1, 1,1,1,1,3,2, 2,1,1,1,1,2, 1,1,3,1,1,1, 1,1,1,2,1,1, 2,1,1,2,1,1]).reshape((M0, M0, 1))
ps0, y_true0 = 1, [1, 2, 0, 5]
y_true_list0 = [[1, 2, 0, 5], [1, 3, 2, 5], [1, 3, 2, 0], [1, 4, 3, 0], [1, 4, 3, 2], [1, 5, 3, 0], [1, 5, 3, 2]]

## 2.1 Brute force search

Inference using **brute force search** (for sanity check).

In [None]:
def do_inference_bruteForce(ps, L, M, unary_params, pw_params, unary_features, pw_features, 
                            y_true=None, y_true_list=None, debug=False):
    assert(L > 1)
    assert(L <= M)
    assert(ps >= 0)
    assert(ps < M)
    if y_true is not None: assert(y_true_list is not None and type(y_true_list) == list)
    
    Cu = np.zeros(M, dtype=float)      # unary_param[p] x unary_features[p]
    Cp = np.zeros((M, M), dtype=float) # pw_param[pi, pj] x pw_features[pi, pj]
    # an intermediate POI should NOT be the start POI, NO self-loops
    for pi in range(M):
        Cu[pi] = np.dot(unary_params[pi, :], unary_features[pi, :]) # if pi != ps else -np.inf
        for pj in range(M):
            Cp[pi, pj] = -np.inf if (pj == ps or pi == pj) else np.dot(pw_params[pi, pj, :], pw_features[pi, pj, :])
    
    max_score = -np.inf  # scores can be negative, so do not start from 0
    y_best = None
    for x in itertools.permutations([p for p in range(M) if p != ps], int(L-1)):
        y = [ps] + list(x)
        score = 0
        
        if y_true is not None and np.any([np.all(np.array(y) == np.asarray(yj)) for yj in y_true_list]) == True: continue
        
        for j in range(1, L): score += Cp[y[j-1], y[j]] + Cu[y[j]]
        if y_true is not None: score += np.sum(np.asarray(y) != np.asarray(y_true))
        
        if score > max_score:
            max_score = score
            y_best = y
    if debug == True: print(max_score)
    return y_best

In [None]:
do_inference_bruteForce(ps0, L0, M0, w_u, w_p, f_u, f_p)

In [None]:
do_inference_bruteForce(ps0, L0, M0, w_u, w_p, f_u, f_p, y_true=y_true0, y_true_list=y_true_list0)

## 2.2 The list Viterbi algorithm

Inference using **the list Viterbi algorithm**, which *sequentially* finds the (k+1)-th best path/walk given the 1st, 2nd, ..., k-th best paths/walks.

Reference papers:
- [*Sequentially finding the N-Best List in Hidden Markov Models*](http://www.eng.biu.ac.il/~goldbej/papers/ijcai01.pdf), Dennis Nilsson and Jacob Goldberger, IJCAI 2001.
- [*A tutorial on hidden Markov models and selected applications in speech recognition*](http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf), L.R. Rabiner, Proceedings of the IEEE, 1989.

Implementation is adapted from the above references.
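
What the list Viterbi search is used for here can be shown with a brute-force stand-in: rank every walk from the start POI by score and return the highest-ranked one that visits no POI twice. The sketch below is not the list Viterbi algorithm itself; it enumerates all walks, which is only feasible for tiny inputs. The function name `first_simple_walk` and the toy scores are assumptions, and unlike the notebook's implementation it does not forbid self-loops or returns to the start POI.

```python
import itertools
import numpy as np

def first_simple_walk(ps, L, Cu, Cp):
    """Rank all length-L walks from ps by score; return the best one without repeated POIs."""
    M = len(Cu)

    def score(y):
        return sum(Cp[y[j - 1], y[j]] + Cu[y[j]] for j in range(1, L))

    walks = [(ps,) + rest for rest in itertools.product(range(M), repeat=L - 1)]
    walks.sort(key=score, reverse=True)      # best walk first
    for rank, y in enumerate(walks, start=1):
        if len(set(y)) == L:                 # a path: no POI is repeated
            return list(y), rank
    return None, len(walks)

Cu = np.array([0.0, 5.0, 1.0])
Cp = np.zeros((3, 3))
Cp[1, 2] = 0.5
y_best, rank = first_simple_walk(ps=0, L=3, Cu=Cu, Cp=Cp)
```

In this toy case the best walk is `0 -> 1 -> 1`, which repeats POI 1, so the second-ranked walk is the one returned. The list Viterbi algorithm reaches the same answer without enumerating all walks up front.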

In [None]:
class HeapItem:  # an item in heapq (min-heap)
    def __init__(self, priority, task):
        self.priority = priority
        self.task = task
        self.string = str(priority) + ': ' + str(task)
        
    def __lt__(self, other):
        return self.priority < other.priority
    
    def __repr__(self):
        return self.string
    
    def __str__(self):
        return self.string

In [None]:
def do_inference_listViterbi(ps, L, M, unary_params, pw_params, unary_features, pw_features, 
                             y_true=None, y_true_list=None, debug=False):
    assert(L > 1)
    assert(M >= L)
    assert(ps >= 0)
    assert(ps < M)
    if y_true is not None: assert(y_true_list is not None and type(y_true_list) == list)
    
    Cu = np.zeros(M, dtype=float)      # unary_param[p] x unary_features[p]
    Cp = np.zeros((M, M), dtype=float) # pw_param[pi, pj] x pw_features[pi, pj]
    
    # an intermediate POI should NOT be the start POI, NO self-loops
    for pi in range(M):
        Cu[pi] = np.dot(unary_params[pi, :], unary_features[pi, :]) # if pi != ps else -np.inf
        for pj in range(M):
            Cp[pi, pj] = -np.inf if (pj == ps or pi == pj) else np.dot(pw_params[pi, pj, :], pw_features[pi, pj, :])
            
    # forward-backward procedure: adapted from the Rabiner paper
    Alpha = np.zeros((L, M), dtype=float)  # alpha_t(p_i)
    Beta  = np.zeros((L, M), dtype=float)  # beta_t(p_i)
    
    for pj in range(M): Alpha[1, pj] = Cp[ps, pj] + Cu[pj] + (0 if y_true is None else float(pj != y_true[1]))
    for t in range(2, L):
        for pj in range(M): # ps~~pi--pj
            loss = 0 if y_true is None else float(pj != y_true[t])  # pi varies, pj fixed
            Alpha[t, pj] = loss + np.max([Alpha[t-1, pi] + Cp[pi, pj] + Cu[pj] for pi in range(M)])
    
    for pi in range(M): Beta[L-1, pi] = 0 if y_true is None else float(pi != y_true[L-1])
    for t in range(L-1, 1, -1):
        for pi in range(M): # ps~~pi--pj
            loss = 0 if y_true is None else float(pi != y_true[t-1])  # pi fixed, pj varies
            Beta[t-1, pi] = loss + np.max([Cp[pi, pj] + Cu[pj] + Beta[t, pj] for pj in range(M)])
    Beta[0, ps] = np.max([Cp[ps, pj] + Cu[pj] + Beta[1, pj] for pj in range(M)])
    
    Fp = np.zeros((L-1, M, M), dtype=float)  # f_{t, t+1}(p, p')
    
    for t in range(L-1):
        for pi in range(M):
            for pj in range(M):
                Fp[t, pi, pj] = Alpha[t, pi] + Cp[pi, pj] + Cu[pj] + Beta[t+1, pj]
                
    # identify the best path/walk: adapted from the IJCAI01 paper
    y_best = np.ones(L, dtype=int) * (-1)
    y_best[0] = ps
    y_best[1] = np.argmax(Fp[0, ps, :])  # the start POI is specified
    for t in range(2, L): 
        y_best[t] = np.argmax(Fp[t-1, y_best[t-1], :])
        
    Q = []  # priority queue (min-heap)
    maxIter = LVITERBI_MAXITER
    with np.errstate(invalid='raise'):  # deal with overflow
        try: maxIter = np.power(M, L-1) - np.prod([M-kx for kx in range(1,L)]) + 1 + \
                       (0 if y_true is None else len(y_true_list))
        except: maxIter = LVITERBI_MAXITER
    if debug == True: maxIter = np.min([maxIter, 200]); print('#iterations:', maxIter)
    else: maxIter = np.min([maxIter, LVITERBI_MAXITER])
        
    # heap item for the best path/walk
    priority, partition_index, exclude_set = -np.max(Alpha[L-1, :]), None, set()  # -1 * score as priority
    hq.heappush(Q, HeapItem(priority, (y_best, partition_index, exclude_set)))
    
    if debug == True: histories = set()
        
    k = 0; y_last = None
    while len(Q) > 0 and k < maxIter:
        #print('------------------\n', Q, '\n------------------')
        hitem = hq.heappop(Q)
        k_priority, (k_best, k_partition_index, k_exclude_set) = hitem.priority, hitem.task
        k += 1; y_last = k_best
        
        if debug == True: 
            histories.add(''.join([str(x) + ',' for x in k_best]))
            #print(k, len(histories))
            #print('pop:', k_priority, k_best, k_partition_index, k_exclude_set)
            print(k_best, -k_priority)
        else:
            if len(set(k_best)) == L:
                if y_true is None:
                    if debug == True: print(-k_priority)
                    return k_best
                else: # return k_best if it is NOT one of the ground truth labels
                    if not np.any([np.all(np.asarray(k_best) == np.asarray(yj)) for yj in y_true_list]):
                        if debug == True: print(-k_priority)
                        return k_best
            
        
        # identify the (k+1)-th best path/walk given the 1st, 2nd, ..., k-th best: adapted from the IJCAI01 paper
        partition_index_start = 1
        if k_partition_index is not None:
            assert(k_partition_index > 0)
            assert(k_partition_index < L)
            partition_index_start = k_partition_index
            
        for parix in range(partition_index_start, L):    
            new_exclude_set = set({k_best[parix]})
            if parix == partition_index_start:
                new_exclude_set = new_exclude_set | k_exclude_set
            
            new_best = np.ones(L, dtype=int) * (-1)
            for pk in range(parix):
                new_best[pk] = k_best[pk]
            
            candidate_points = [p for p in range(M) if p not in new_exclude_set]
            if len(candidate_points) == 0: continue
            candidate_maxix = np.argmax([Fp[parix-1, k_best[parix-1], p] for p in candidate_points])
            new_best[parix] = candidate_points[candidate_maxix]
            
            for pk in range(parix+1, L):
                new_best[pk] = np.argmax([Fp[pk-1, new_best[pk-1], p] for p in range(M)])
            
            new_priority = Fp[parix-1, k_best[parix-1], new_best[parix]]
            if k_partition_index is not None:
                new_priority += (-k_priority) - Fp[parix-1, k_best[parix-1], k_best[parix]]
            new_priority *= -1.0  # NOTE: -np.inf - np.inf + np.inf = nan
            
            #if debug == True and np.isnan(new_priority):
            #    print(Fp[parix-1,k_best[parix-1],new_best[parix]], (-k_priority), \
            #          Fp[parix-1,k_best[parix-1],k_best[parix]])
            #    print(Fp[parix-1,k_best[parix-1],new_best[parix]] - k_priority - \
            #          Fp[parix-1,k_best[parix-1],k_best[parix]])   
            #    print(' '*3, 'push:', new_priority, new_best, parix, new_exclude_set)
            
            hq.heappush(Q, HeapItem(new_priority, (new_best, parix, new_exclude_set)))
            #print('------------------\n', Q, '\n------------------')
    if debug == True: print('#iterations: %d, #distinct_trajectories: %d' % (k, len(histories)))
    if k >= maxIter: 
        sys.stderr.write('WARN: reaching max number of iterations, NO optimal solution found, return the last one.\n')
    if len(Q) == 0:
        sys.stderr.write('WARN: empty queue, return the last one\n')
    return y_last

In [None]:
#do_inference_listViterbi(ps0, L0, M0, w_u, w_p, f_u, f_p, debug=True)

In [None]:
#do_inference_listViterbi(ps0, L0, M0, w_u, w_p, f_u, f_p)

In [None]:
do_inference_listViterbi(ps0, L0, M0, w_u, w_p, f_u, f_p, y_true=y_true0, y_true_list=y_true_list0)

**sanity check using random weights**

In [None]:
M0 = 90
w_u = np.random.rand(M0).reshape(M0, 1)
f_u = np.random.rand(M0).reshape(M0, 1)
w_p = np.random.rand(M0*M0).reshape(M0, M0, 1)
f_p = np.random.rand(M0*M0).reshape(M0, M0, 1)
ps0 = np.random.choice(np.arange(M0))
L0 = np.random.choice(np.arange(2, 6))
indices0 = [x for x in range(M0) if x != ps0]; np.random.shuffle(indices0)
y_true0 = [ps0] + indices0[:L0-1]
y_true_list0 = [y_true0]
for j in range(6):
    np.random.shuffle(indices0); y_true_list0.append([ps0] + indices0[:L0-1])

In [None]:
#print('M: %d\nQuery: (%d, %d)' % (M0, ps0, L0))
#print('w_u:', w_u)
#print('f_u:', f_u)
#print('w_p:', w_p)
#print('f_p:', f_p)

In [None]:
#do_inference_bruteForce(ps0, L0, M0, w_u, w_p, f_u, f_p)

In [None]:
#do_inference_listViterbi(ps0, L0, M0, w_u, w_p, f_u, f_p)

In [None]:
#do_inference_bruteForce(ps0, L0, M0, w_u, w_p, f_u, f_p, y_true=y_true0, y_true_list=y_true_list0)

In [None]:
#do_inference_listViterbi(ps0, L0, M0, w_u, w_p, f_u, f_p, y_true=y_true0, y_true_list=y_true_list0)

# 3. Structured SVM

In [None]:
class MyModel(StructuredModel):
    
    def __init__(self, n_states=None, n_features=None, n_edge_features=None, 
                 inference_fun=do_inference_listViterbi, share_params=SSVM_SHARE_PARAMS):
        self.inference_method = 'customized'
        self.inference_fun = inference_fun
        self.class_weight = None
        self.inference_calls = 0
        self.n_states = n_states
        self.n_features = n_features
        self.n_edge_features = n_edge_features
        self.share_params = share_params
        self._set_size_joint_feature()
        self._set_class_weight()

        
    def _set_size_joint_feature(self):
        if None not in [self.n_states, self.n_features, self.n_edge_features]:
            if self.share_params == True: # share params among POIs/transitions
                self.size_joint_feature = self.n_features + self.n_edge_features
            else:
                self.size_joint_feature = self.n_states * self.n_features + \
                                          self.n_states * self.n_states * self.n_edge_features
   

    def loss(self, y, y_hat):
        #return np.mean(np.asarray(y) != np.asarray(y_hat))     # hamming loss (normalised)
        return np.sum(np.asarray(y) != np.asarray(y_hat))     # hamming loss
        #return loss_F1(y, y_hat)      # F1 loss
        #return loss_pairsF1(y, y_hat) # pairsF1 loss
        #return loss_pairsF1(np.array(y), np.array(y_hat)) # pairsF1 loss

    
    def initialize(self, X, Y):
        assert(len(X) == len(Y))
        n_features = X[0][0].shape[1]
        if self.n_features is None: 
            self.n_features = n_features
        else:
            assert(self.n_features == n_features)

        n_states = len(np.unique(np.hstack([y.ravel() for y in Y])))
        if self.n_states is None: 
            self.n_states = n_states
        else:
            assert(self.n_states == n_states)
            
        n_edge_features = X[0][1].shape[2]
        if self.n_edge_features is None:
            self.n_edge_features = n_edge_features
        else:
            assert(self.n_edge_features == n_edge_features)
            
        self._set_size_joint_feature()
        self._set_class_weight()
        
        self.traj_group_dict = dict()
        for i in range(len(X)):
            query = X[i][2]
            if query in self.traj_group_dict: self.traj_group_dict[query].append(Y[i])
            else: self.traj_group_dict[query] = [Y[i]]
        
        

    def __repr__(self):
        return ("%s(n_states: %d, inference_method: %s, n_features: %d, n_edge_features: %d)"
                % (type(self).__name__, self.n_states, self.inference_method, self.n_features, self.n_edge_features))
    
    
    def joint_feature(self, x, y):
        assert(not isinstance(y, tuple))
        unary_features = x[0] # unary features of all POIs: n_POIs x n_features
        pw_features = x[1]    # pairwise features of all transitions: n_POIs x n_POIs x n_edge_features
        query = x[2]          # query = (startPOI, length)
        n_nodes = query[1]
        
        #print('y:', y)
        
        #assert(unary_features.ndim == 2)
        #assert(pw_features.ndim == 3)
        #assert(len(query) == 3)
        assert(n_nodes == len(y))
        assert(unary_features.shape == (self.n_states, self.n_features))
        assert(pw_features.shape == (self.n_states, self.n_states, self.n_edge_features))
        
        if self.share_params == True:
            node_features = np.zeros((self.n_features), dtype=float)
            edge_features = np.zeros((self.n_edge_features), dtype=float)
            node_features = unary_features[y[0], :]
            for j in range(len(y)-1):
                ss, tt = y[j], y[j+1]
                node_features = node_features + unary_features[tt, :]
                edge_features = edge_features + pw_features[ss, tt, :]
        else: 
            node_features = np.zeros((self.n_states, self.n_features), dtype=float)
            edge_features = np.zeros((self.n_states, self.n_states, self.n_edge_features), dtype=float)
            node_features[y[0], :] = unary_features[y[0], :]
            for j in range(len(y)-1):
                ss, tt = y[j], y[j+1]
                node_features[tt, :] = unary_features[tt, :]
                edge_features[ss, tt, :] = pw_features[ss, tt, :]

        # sum node/edge features after scaling: 
        # equivalent to share parameters between features of different POIs/transitions
        joint_feature_vector = np.hstack([node_features.ravel(), edge_features.ravel()])
        
        return joint_feature_vector
            
    
    def loss_augmented_inference(self, x, y, w, relaxed=None):
        #print('loss_augmented_inference:', y)
        # inference procedure for training: (x, y) from training set (with features already scaled)
        #
        # argmax_y_hat np.dot(w, joint_feature(x, y_hat)) + loss(y, y_hat)
        # 
        # the loss function should be decomposable in order to use Viterbi decoding, here we use Hamming loss
        #
        # x[0]: (unscaled) unary features of all POIs: n_POIs x n_features
        # x[1]: (unscaled) pairwise features of all transitions: n_POIs x n_POIs x n_edge_features
        # x[2]: query = (startPOI, length)
        unary_features = x[0]
        pw_features = x[1]
        query = x[2]
        
        assert(unary_features.ndim == 2)
        assert(pw_features.ndim == 3)
        assert(len(query) == 2)
        
        ps = query[0]
        L = query[1]
        M = unary_features.shape[0]  # total number of POIs
        
        self._check_size_w(w)
        if self.share_params == True:
            unary_params = w[:self.n_features]
            pw_params = w[self.n_features:].reshape(self.n_edge_features)
            # duplicate params so that inference procedures work the same way no matter params shared or not
            unary_params = np.tile(unary_params, (self.n_states, 1))
            pw_params = np.tile(pw_params, (self.n_states, self.n_states, 1))
        else:
            unary_params = w[:self.n_states * self.n_features].reshape((self.n_states, self.n_features))
            pw_params = w[self.n_states * self.n_features:].reshape((self.n_states, self.n_states, self.n_edge_features))
        
        #y_hat = do_inference_bruteForce(ps, L, M, unary_params, pw_params, unary_features, pw_features, 
        #                                y_true=y, y_true_list=self.traj_group_dict[query])
        y_hat = do_inference_listViterbi(ps, L, M, unary_params, pw_params, unary_features, pw_features, 
                                         y_true=y, y_true_list=self.traj_group_dict[query])
        return y_hat

    
    def inference(self, x, w, relaxed=False, return_energy=False):
        #print('inference')
        # inference procedure for testing: x from test set (features need to be scaled)
        #
        # argmax_y np.dot(w, joint_feature(x, y))
        #
        # x[0]: (unscaled) unary features of all POIs: n_POIs x n_features
        # x[1]: (unscaled) pairwise features of all transitions: n_POIs x n_POIs x n_edge_features
        # x[2]: query = (startPOI, length)
        unary_features = x[0]
        pw_features = x[1]
        query = x[2]
        
        assert(unary_features.ndim == 2)
        assert(pw_features.ndim == 3)
        assert(len(query) == 2)
        
        ps = query[0]
        L = query[1]
        M = unary_features.shape[0]  # total number of POIs
        
        self._check_size_w(w)
        if self.share_params == True:
            unary_params = w[:self.n_features]
            pw_params = w[self.n_features:].reshape(self.n_edge_features)
            # duplicate params so that inference procedures work the same way no matter params shared or not
            unary_params = np.tile(unary_params, (self.n_states, 1))
            pw_params = np.tile(pw_params, (self.n_states, self.n_states, 1))
        else:
            unary_params = w[:self.n_states * self.n_features].reshape((self.n_states, self.n_features))
            pw_params = w[self.n_states * self.n_features:].reshape((self.n_states, self.n_states, self.n_edge_features))
        
        #y_pred = do_inference_listViterbi(ps, L, M, unary_params, pw_params, unary_features, pw_features)
        #assert(len(y_pred) == len(set(y_pred)))
        y_pred = self.inference_fun(ps, L, M, unary_params, pw_params, unary_features, pw_features)
        
        return y_pred

Compute node features (singleton).
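
Two encodings recur in the node features: categorical attributes become vectors of +1/-1, and count-like attributes are log-transformed with a floor for values below 1. The sketch below isolates both. The helper names and the floor value `-10.0` are assumptions; the notebook takes its floor from `LOG_SMALL`, which is defined in `shared.ipynb`.

```python
import numpy as np

def signed_one_hot(value, vocabulary):
    """+1 at the position of `value` in `vocabulary`, -1 everywhere else."""
    return (value == np.array(vocabulary)).astype(int) * 2 - 1

def safe_log10(x, log_small=-10.0):
    """log10(x) for x >= 1, otherwise a fixed floor (assumed value, see lead-in)."""
    return log_small if x < 1 else float(np.log10(x))

cat_vec = signed_one_hot('park', ['museum', 'park', 'beach'])
pop_feature = safe_log10(100)
zero_feature = safe_log10(0)
```

The +1/-1 encoding keeps the boolean features in the same range as the other features after scaling to [-1, 1].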

In [None]:
def calc_node_features(startPOI, nPOI, poi_ix, poi_info, poi_clusters, cats, clusters):
    """
    Generate feature vectors for all POIs given query (startPOI, nPOI)
    """
    assert(isinstance(cats, list))
    assert(isinstance(clusters, list))
    
    columns = DF_COLUMNS[3:]
    poi_distmat = POI_DISTMAT
    p0, trajLen = startPOI, nPOI
    assert(p0 in poi_info.index)
    
    # DEBUG: use uniform node features
    nrows = len(poi_ix)
    ncols = len(columns) + len(cats) + len(clusters) - 2
    #return np.ones((nrows, ncols), dtype=np.float)
    #return np.zeros((nrows, ncols), dtype=np.float)
    
    poi_list = poi_ix
    df_ = pd.DataFrame(index=poi_list, columns=columns)
        
    for poi in poi_list:
        lon, lat = poi_info.loc[poi, 'poiLon'], poi_info.loc[poi, 'poiLat']
        pop, nvisit = poi_info.loc[poi, 'popularity'], poi_info.loc[poi, 'nVisit']
        cat, cluster = poi_info.loc[poi, 'poiCat'], poi_clusters.loc[poi, 'clusterID']
        duration = poi_info.loc[poi, 'avgDuration']
        idx = poi
        # DataFrame.set_value was removed in pandas 1.0; .at stores the tuple in a single cell
        df_.at[idx, 'category'] = tuple((cat == np.array(cats)).astype(int) * 2 - 1)
        df_.at[idx, 'neighbourhood'] = tuple((cluster == np.array(clusters)).astype(int) * 2 - 1)
        df_.loc[idx, 'popularity'] = LOG_SMALL if pop < 1 else np.log10(pop)
        df_.loc[idx, 'nVisit'] = LOG_SMALL if nvisit < 1 else np.log10(nvisit)
        df_.loc[idx, 'avgDuration'] = LOG_SMALL if duration < 1 else np.log10(duration)
        df_.loc[idx, 'trajLen'] = trajLen
        df_.loc[idx, 'sameCatStart'] = 1 if cat == poi_all.loc[p0, 'poiCat'] else -1
        df_.loc[idx, 'distStart'] = poi_distmat.loc[poi, p0]
        df_.loc[idx, 'diffPopStart'] = pop - poi_info.loc[p0, 'popularity']
        df_.loc[idx, 'diffNVisitStart'] = nvisit - poi_info.loc[p0, 'nVisit']
        df_.loc[idx, 'diffDurationStart'] = duration - poi_info.loc[p0, 'avgDuration']
        df_.loc[idx, 'sameNeighbourhoodStart'] = 1 if cluster == poi_clusters.loc[p0, 'clusterID'] else -1
        
    # features other than category and neighbourhood
    feature_name = ['popularity', 'nVisit', 'avgDuration', 'trajLen', 'sameCatStart', 'distStart', 
                    'diffPopStart', 'diffNVisitStart', 'diffDurationStart', 'sameNeighbourhoodStart']
    #X = df_[sorted(set(df_.columns) - {'category', 'neighbourhood'})].values
    X = df_[feature_name].values
    
    # boolean features: category (+1, -1)
    cat_features = np.vstack([list(df_.loc[x, 'category']) for x in df_.index])
    
    # boolean features: neighbourhood (+1, -1)
    neigh_features = np.vstack([list(df_.loc[x, 'neighbourhood']) for x in df_.index])
    
    return np.hstack([cat_features, neigh_features, X]).astype(float)

Compute edge features (transition / pairwise).
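
Each edge feature is the log10 of a transition probability between discretised attributes of the two POIs. The sketch below shows the idea for a single attribute: values are binned with `np.digitize`, then a bin-to-bin transition matrix is looked up for every POI pair. The bin edges, popularity values, and transition matrix are made-up numbers; in the notebook they come from the `gen_transmat_*` helpers in `shared.ipynb`, which index by label rather than by zero-based position.

```python
import numpy as np

logbins = np.array([1, 10, 100, 1000])    # hypothetical bin edges
popularity = np.array([5, 50, 500])       # one value per POI
bins = np.digitize(popularity, logbins)   # bin index of each POI

# hypothetical transition probabilities between bins 1..3 (rows: from, columns: to)
transmat = np.array([[0.5, 0.3, 0.2],
                     [0.1, 0.6, 0.3],
                     [0.2, 0.2, 0.6]])

# edge_feature[j, k] = log10 P(bin of POI k | bin of POI j)
edge_feature = np.log10(transmat[bins[:, None] - 1, bins[None, :] - 1])
```

Stacking one such matrix per attribute along a third axis gives an array of shape `(n_POIs, n_POIs, n_edge_features)`.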

In [None]:
def calc_edge_features(trajid_list, poi_ix, traj_dict, poi_info):    
    feature_names = ['poiCat', 'popularity', 'nVisit', 'avgDuration', 'clusterID']
    n_features = len(feature_names)
    
    # DEBUG: use uniform edge features
    #return np.ones((len(poi_ix), len(poi_ix), n_features), dtype=np.float)
    #return np.zeros((len(poi_ix), len(poi_ix), n_features), dtype=np.float)
    
    transmat_cat                        = gen_transmat_cat(trajid_list, traj_dict, poi_info)
    transmat_pop,      logbins_pop      = gen_transmat_pop(trajid_list, traj_dict, poi_info)
    transmat_visit,    logbins_visit    = gen_transmat_visit(trajid_list, traj_dict, poi_info)
    transmat_duration, logbins_duration = gen_transmat_duration(trajid_list, traj_dict, poi_info)
    transmat_neighbor, poi_clusters     = gen_transmat_neighbor(trajid_list, traj_dict, poi_info)
    
    poi_features = pd.DataFrame(data=np.zeros((len(poi_ix), len(feature_names))), \
                                columns=feature_names, index=poi_ix)
    poi_features.index.name = 'poiID'
    poi_features['poiCat'] = poi_info.loc[poi_ix, 'poiCat']
    poi_features['popularity'] = np.digitize(poi_info.loc[poi_ix, 'popularity'], logbins_pop)
    poi_features['nVisit'] = np.digitize(poi_info.loc[poi_ix, 'nVisit'], logbins_visit)
    poi_features['avgDuration'] = np.digitize(poi_info.loc[poi_ix, 'avgDuration'], logbins_duration)
    poi_features['clusterID'] = poi_clusters.loc[poi_ix, 'clusterID']
    
    edge_features = np.zeros((len(poi_ix), len(poi_ix), n_features), dtype=np.float64)
    
    for j in range(len(poi_ix)): # NOTE: POI order
        pj = poi_ix[j]
        cat, pop = poi_features.loc[pj, 'poiCat'], poi_features.loc[pj, 'popularity']
        visit, cluster = poi_features.loc[pj, 'nVisit'], poi_features.loc[pj, 'clusterID']
        duration = poi_features.loc[pj, 'avgDuration']
        
        for k in range(len(poi_ix)): # NOTE: POI order
            pk = poi_ix[k]
            edge_features[j, k, :] = np.log10( np.array(
            #edge_features[j, k, :] = np.array(
                    [transmat_cat.loc[cat, poi_features.loc[pk, 'poiCat']], \
                     transmat_pop.loc[pop, poi_features.loc[pk, 'popularity']], \
                     transmat_visit.loc[visit, poi_features.loc[pk, 'nVisit']], \
                     transmat_duration.loc[duration, poi_features.loc[pk, 'avgDuration']], \
                     transmat_neighbor.loc[cluster, poi_features.loc[pk, 'clusterID']]] ) )
    return edge_features

In [None]:
class SSVM:
    def __init__(self, inference_fun=do_inference_listViterbi, C=1.0, poi_info=None, debug=False):
        assert(C > 0)
        self.C = C
        self.inference_fun = inference_fun
        self.debug = debug
        self.trained = False
        
        if poi_info is None:
            self.poi_info = None
        else:
            self.poi_info = poi_info
        
        if ABS_SCALER == True:
            self.scaler_node = MaxAbsScaler(copy=False)
            self.scaler_edge = MaxAbsScaler(copy=False)
        else:
            self.scaler_node = MinMaxScaler(feature_range=(-1,1), copy=False)
            self.scaler_edge = MinMaxScaler(feature_range=(-1,1), copy=False)
            #self.scaler = StandardScaler(copy=False)  

        
    def train(self, trajid_set_train):
        if self.poi_info is None:
            self.poi_info = calc_poi_info(list(trajid_set_train), traj_all, poi_all)

        # build POI_ID <--> POI_INDEX mapping for POIs used to train CRF
        # which means only POIs in traj such that len(traj) >= 2 are included
        poi_set = {p for tid in trajid_set_train for p in traj_dict[tid] if len(traj_dict[tid]) >= 2}
        #poi_set = set()
        #for x in trajid_set_train:
        #    if len(traj_dict[x]) >= 2:
        #        poi_set = poi_set | set(traj_dict[x])
        self.poi_ix = sorted(poi_set)
        self.poi_id_dict, self.poi_id_rdict = dict(), dict()
        for idx, poi in enumerate(self.poi_ix):
            self.poi_id_dict[poi] = idx
            self.poi_id_rdict[idx] = poi

        # generate training data
        train_traj_list = [traj_dict[k] for k in trajid_set_train if len(traj_dict[k]) >= 2]
        node_features_list = Parallel(n_jobs=N_JOBS)\
                             (delayed(calc_node_features)\
                              (tr[0], len(tr), self.poi_ix, self.poi_info, poi_clusters=POI_CLUSTERS, \
                               cats=POI_CAT_LIST, clusters=POI_CLUSTER_LIST) for tr in train_traj_list)
        edge_features = calc_edge_features(list(trajid_set_train), self.poi_ix, traj_dict, self.poi_info)
        #print(edge_features)

        # feature scaling: node features
        # should each example be flattened to one vector before scaling?
        self.fdim_node = node_features_list[0].shape
        X_node_all = np.vstack(node_features_list)
        #print(self.fdim)
        #print(X_node_all.shape)
        #X_node_all = X_node_all.reshape(len(node_features_list), -1) # flatten every example to a vector
        X_node_all = self.scaler_node.fit_transform(X_node_all)
        X_node_all = X_node_all.reshape(-1, self.fdim_node[0], self.fdim_node[1])
        
        # feature scaling: edge features
        fdim_edge = edge_features.shape
        edge_features = self.scaler_edge.fit_transform(edge_features.reshape(fdim_edge[0]*fdim_edge[1], -1))
        self.edge_features = edge_features.reshape(fdim_edge)
        #print('---------------------------------------\n\n'); print(edge_features)

        assert(len(train_traj_list) == X_node_all.shape[0])
        X_train = [(X_node_all[k, :, :], \
                    self.edge_features.copy(), \
                    (self.poi_id_dict[train_traj_list[k][0]], len(train_traj_list[k]))) \
                   for k in range(len(train_traj_list))]
        y_train = [np.array([self.poi_id_dict[k] for k in tr]) for tr in train_traj_list]
        assert(len(X_train) == len(y_train))

        # train
        sm = MyModel(inference_fun=self.inference_fun)
        if self.debug: print('C:', self.C)
        verbose = 1 if self.debug else 0
        self.osssvm = OneSlackSSVM(model=sm, C=self.C, n_jobs=N_JOBS, verbose=verbose)
        try:
            self.osssvm.fit(X_train, y_train, initialize=True)
            self.trained = True 
            print('SSVM training finished.')
        except Exception as err:  # not a bare except, so KeyboardInterrupt still propagates
            self.trained = False
            sys.stderr.write('SSVM training FAILED: %s\n' % err)
        return self.trained


    def predict(self, startPOI, nPOI):
        assert self.trained, 'train() must succeed before predict() is called'
        if startPOI not in self.poi_ix: return None
        X_node_test = calc_node_features(startPOI, nPOI, self.poi_ix, self.poi_info, poi_clusters=POI_CLUSTERS, \
                                         cats=POI_CAT_LIST, clusters=POI_CLUSTER_LIST)

        # feature scaling
        # should each example be flattened to one vector before scaling?
        #X_node_test = X_node_test.reshape(1, -1) # flatten test example to a vector
        X_node_test = self.scaler_node.transform(X_node_test)
        #X_node_test = X_node_test.reshape(self.fdim)

        X_test = [(X_node_test, self.edge_features, (self.poi_id_dict[startPOI], nPOI))]
        y_hat = self.osssvm.predict(X_test)

        return np.array([self.poi_id_rdict[x] for x in y_hat[0]])
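
The feature scaling in `train` stacks the node-feature matrices of all trajectories row-wise, scales each feature column over the stacked rows, and then restores the per-trajectory shape. The following is only a minimal sketch of that stack, scale, regroup pattern in plain Python, assuming min-max scaling; the function name `minmax_scale_stacked` and the toy matrices are hypothetical and are not part of the notebook.

```python
def minmax_scale_stacked(examples):
    """Scale every feature column to [0, 1] over the rows of all examples.

    `examples` is a list of equally shaped matrices (lists of rows),
    standing in for one node-feature matrix per trajectory.
    """
    n_rows = len(examples[0])
    # stack all examples row-wise (the role np.vstack plays in the notebook)
    stacked = [row for ex in examples for row in ex]
    n_features = len(stacked[0])

    # per-column minimum and maximum over the stacked rows
    col_min = [min(r[c] for r in stacked) for c in range(n_features)]
    col_max = [max(r[c] for r in stacked) for c in range(n_features)]

    def scale(value, c):
        span = col_max[c] - col_min[c]
        # a constant column is mapped to 0.0 to avoid division by zero
        return 0.0 if span == 0 else (value - col_min[c]) / span

    scaled = [[scale(r[c], c) for c in range(n_features)] for r in stacked]
    # regroup the rows into examples of the original shape
    return [scaled[k:k + n_rows] for k in range(0, len(scaled), n_rows)]


# two toy "trajectories", each with 2 rows and 2 features (made-up values)
examples = [
    [[0.0, 10.0], [2.0, 20.0]],
    [[4.0, 30.0], [8.0, 50.0]],
]
scaled_examples = minmax_scale_stacked(examples)
```

Because the statistics are computed per column over all stacked rows, a feature is scaled identically wherever it appears in the matrix, which is the alternative to flattening each example into a single vector before scaling.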

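Nested cross-validation: the outer loop holds out one query (start POI, number of POIs) at a time for testing, and the inner loop selects the regularisation constant C by Monte-Carlo cross-validation over the remaining queries.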

In [None]:
inference_methods = [do_inference_bruteForce, do_inference_listViterbi]
methods_suffix = ['bruteForce', 'listViterbi']

In [None]:
method_ix = 1

In [None]:
recdict_ssvm = dict()
cnt = 1
keys = sorted(TRAJ_GROUP_DICT.keys())

# outer loop to evaluate the test performance by cross validation
for i in range(len(keys)):
    ps, L = keys[i]
    best_C = 1
    #best_F1 = 0; best_pF1 = 0
    best_Tau = 0
    keys_cv = keys[:i] + keys[i+1:]
    
    # use all training+validation set to compute POI features, 
    # make sure features do NOT change for training and validation
    trajid_set_i = set(trajid_set_all) - TRAJ_GROUP_DICT[keys[i]]
    poi_info_i = calc_poi_info(list(trajid_set_i), traj_all, poi_all)
    
    # tune regularisation constant C
    for ssvm_C in C_SET:
        print('\n--------------- try_C: %f ---------------\n' % ssvm_C); sys.stdout.flush() 
        F1_ssvm = []; pF1_ssvm = []; Tau_ssvm = []        
        
        # inner loop to evaluate the performance of a model with a specified C by Monte-Carlo cross validation
        for _ in range(MC_NITER):  # iteration index is unused; `j` is reserved for the test_ix loops below
            poi_list = []
            while True: # resample until every start POI in the test set also appears in the training set
                rand_ix = np.arange(len(keys_cv)); np.random.shuffle(rand_ix)
                test_ix = rand_ix[:int(MC_PORTION*len(rand_ix))]
                assert(len(test_ix) > 0)
                trajid_set_train = set(trajid_set_all) - TRAJ_GROUP_DICT[keys[i]]
                for j in test_ix: 
                    trajid_set_train = trajid_set_train - TRAJ_GROUP_DICT[keys_cv[j]]
                poi_set = set()
                for tid in trajid_set_train: poi_set = poi_set | set(traj_dict[tid])
                good_partition = all(keys_cv[j][0] in poi_set for j in test_ix)
                if good_partition:
                    poi_list = sorted(poi_set)
                    break

            # train
            ssvm = SSVM(inference_fun=inference_methods[method_ix], C=ssvm_C, poi_info=poi_info_i.loc[poi_list].copy())
            if ssvm.train(trajid_set_train) == True:            
                for j in test_ix: # test
                    ps_cv, L_cv = keys_cv[j]
                    y_hat = ssvm.predict(ps_cv, L_cv)
                    if y_hat is not None:
                        F1, pF1, tau = evaluate(y_hat, TRAJ_GROUP_DICT[keys_cv[j]])
                        F1_ssvm.append(F1); pF1_ssvm.append(pF1); Tau_ssvm.append(tau)
            else: 
                for j in test_ix:
                    F1_ssvm.append(0); pF1_ssvm.append(0); Tau_ssvm.append(0)
        
        #mean_F1 = np.mean(F1_ssvm); mean_pF1 = np.mean(pF1_ssvm)
        mean_Tau = np.mean(Tau_ssvm)
        print('mean_Tau: %.3f' % mean_Tau)
        if mean_Tau > best_Tau:
            best_Tau = mean_Tau
            best_C = ssvm_C
    print('\n--------------- %d/%d, Query: (%d, %d), Best_C: %f ---------------\n' % (cnt, len(keys), ps, L, best_C))
    sys.stdout.flush()
    
    # train model using all examples in training set and measure performance on test set
    ssvm = SSVM(inference_fun=inference_methods[method_ix], C=best_C, poi_info=poi_info_i)#, debug=True)
    if ssvm.train(trajid_set_i) == True:
        y_hat = ssvm.predict(ps, L)
        print(cnt, y_hat)
        if y_hat is not None:
            recdict_ssvm[(ps, L)] = {'PRED': y_hat, 'W': ssvm.osssvm.w, 'C': ssvm.C}
        
    cnt += 1; #print_progress(cnt, len(keys)); sys.stdout.flush()
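
The inner loop above repeatedly draws a random held-out subset of queries and rejects any draw in which a held-out start POI never occurs in the remaining training trajectories. A minimal, self-contained sketch of one such draw is given below. The function `mc_split`, its `max_tries` bound, and the toy `groups` dictionary are assumptions made for illustration; the notebook itself loops with `while True` and works on `TRAJ_GROUP_DICT`.

```python
import random


def mc_split(groups, portion, rng, max_tries=1000):
    """Draw one Monte-Carlo split of query keys into (train_keys, test_keys).

    `groups` maps a query key (start_poi, length) to a list of trajectories,
    each a list of POI ids. A draw is accepted only if every held-out start
    POI also occurs in at least one training trajectory.
    """
    keys = sorted(groups)
    n_test = int(portion * len(keys))
    if n_test < 1:
        raise ValueError("portion is too small to hold out any query")
    for _ in range(max_tries):
        shuffled = keys[:]
        rng.shuffle(shuffled)
        test_keys, train_keys = shuffled[:n_test], shuffled[n_test:]
        # all POIs visited by the training trajectories
        train_pois = {p for k in train_keys for traj in groups[k] for p in traj}
        if all(k[0] in train_pois for k in test_keys):
            return train_keys, test_keys
    raise RuntimeError("no valid split found within max_tries draws")


# toy data: POI 4 only occurs in the trajectories of query (4, 2),
# so holding out (4, 2) is always rejected
groups = {
    (1, 3): [[1, 2, 3]],
    (2, 2): [[2, 1]],
    (3, 3): [[3, 1, 2]],
    (4, 2): [[4, 3]],
}
rng = random.Random(0)
train_keys, test_keys = mc_split(groups, 0.25, rng)
```

Bounding the number of draws is a design choice of this sketch: it turns a dataset for which no valid partition exists into an explicit error instead of an endless loop.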

In [None]:
F1_ssvm = []; pF1_ssvm = []; tau_ssvm = []
for key in sorted(recdict_ssvm.keys()):
    F1, pF1, tau = evaluate(recdict_ssvm[key]['PRED'], TRAJ_GROUP_DICT[key])
    F1_ssvm.append(F1); pF1_ssvm.append(pF1); tau_ssvm.append(tau)
print('SSVM: F1 (%.3f, %.3f), pairsF1 (%.3f, %.3f), Tau (%.3f, %.3f)' % \
      (np.mean(F1_ssvm), np.std(F1_ssvm)/np.sqrt(len(F1_ssvm)), \
       np.mean(pF1_ssvm), np.std(pF1_ssvm)/np.sqrt(len(pF1_ssvm)), \
       np.mean(tau_ssvm), np.std(tau_ssvm)/np.sqrt(len(tau_ssvm))))
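
Each pair printed above is a mean followed by its standard error, computed as `np.std(x) / np.sqrt(len(x))`. Since `np.std` uses the population standard deviation by default (`ddof=0`), the same quantity can be sketched with the standard library as follows; the helper name `mean_and_sem` and the scores are made-up illustrations, not results from the notebook.

```python
import math
import statistics


def mean_and_sem(scores):
    """Return (mean, standard error of the mean) for a list of scores.

    Uses the population standard deviation (ddof=0), matching the default
    behaviour of np.std.
    """
    mean = statistics.fmean(scores)
    sem = statistics.pstdev(scores) / math.sqrt(len(scores))
    return mean, sem


scores = [0.2, 0.4, 0.6, 0.8]  # toy values for illustration only
mean, sem = mean_and_sem(scores)
print('toy score: (%.3f, %.3f)' % (mean, sem))
```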

In [None]:
fssvm = os.path.join(data_dir, 'ssvm-' + methods_suffix[method_ix] + '-' + dat_suffix[dat_ix] + '.pkl')
with open(fssvm, 'wb') as f:  # the context manager closes the file even if dump() fails
    pickle.dump(recdict_ssvm, f)