# Lab Book for Trajectory Recommendation Experiments

---------------------

## Multi-label SSVM

**Experiment**
- Implement multi-label SSVM, and compare the performance with vanilla SSVM
- Use the list Viterbi algorithm for both `loss_augmented_inference` and prediction inference
- Leave-one-out evaluation on Osaka dataset
- This experiment is performed in notebook `ssvm_ml.ipynb`

**Conclusion**
- Sample results on Osaka dataset
 - `Tie-Log`: share parameters among POIs and transitions, and take the log of (factorised) transition probabilities
 - `Tie-NoLog`: share parameters among POIs and transitions, but do NOT take the log of (factorised) transition probabilities
 - `NoTie-Log`: do NOT share parameters among POIs and transitions, and take the log of (factorised) transition probabilities
 - `NoTie-NoLog`: do NOT share parameters among POIs and transitions, and do NOT take the log of (factorised) transition probabilities
 ```
 SSVM-MultiLabel-Tie-Log:     F1 (0.624, 0.038), pairsF1 (0.363, 0.059), Tau (0.591, 0.042)
 SSVM-MultiLabel-Tie-NoLog:   F1 (0.659, 0.040), pairsF1 (0.423, 0.060), Tau (0.626, 0.044)
 SSVM-MultiLabel-NoTie-NoLog: F1 (0.649, 0.035), pairsF1 (0.373, 0.057), Tau (0.619, 0.038)
 SSVM-MultiLabel-NoTie-Log:   F1 (0.655, 0.035), pairsF1 (0.377, 0.057), Tau (0.624, 0.037)
 SSVM-Tie-Log:                F1 (0.611, 0.037), pairsF1 (0.339, 0.058), Tau (0.580, 0.040)
 SSVM-Tie-NoLog:              F1 (0.613, 0.037), pairsF1 (0.330, 0.057), Tau (0.579, 0.041) 
 SSVM-NoTie-NoLog:            F1 (0.637, 0.039), pairsF1 (0.378, 0.061), Tau (0.605, 0.043)
 SSVM-NoTie-Log:              F1 (0.626, 0.039), pairsF1 (0.361, 0.060), Tau (0.593, 0.042)
 RankSVM:                     F1 (0.678, 0.037), pairsF1 (0.433, 0.059), Tau (0.647, 0.040)
 ```
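
For reference when reading the numbers above, here is a minimal sketch of the two set-based metrics. It assumes the usual definitions (F1 over the POIs of the actual and predicted trajectories, and pairs-F1 over ordered POI pairs) and assumes no POI is visited twice; the function names are illustrative and are not taken from `ssvm_ml.ipynb`.

```python
from itertools import combinations


def _f1(overlap, n_actual, n_predicted):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / n_predicted
    recall = overlap / n_actual
    return 2 * precision * recall / (precision + recall)


def f1_points(actual, predicted):
    """F1 over the sets of POIs (visiting order is ignored)."""
    a, p = set(actual), set(predicted)
    return _f1(len(a & p), len(a), len(p))


def ordered_pairs(trajectory):
    """All (earlier, later) POI pairs of a trajectory without repeated POIs."""
    return set(combinations(trajectory, 2))


def f1_pairs(actual, predicted):
    """F1 over ordered POI pairs, so it is sensitive to visiting order."""
    a, p = ordered_pairs(actual), ordered_pairs(predicted)
    return _f1(len(a & p), len(a), len(p))
```

For example, `f1_points([1, 2, 3, 4], [1, 3, 2, 4])` is 1.0 because the POI sets match, while `f1_pairs` on the same input is 5/6 because one ordered pair is swapped.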

## Pruning real dataset to create a single label dataset

**Experiment**
- Prune the Glasgow dataset $\mathcal{M}_0$ so that only one trajectory is kept for each query. Among the trajectories that conform to a query, a deterministic rule chooses which one to keep (e.g., the one with the maximum total POI popularity). Let $\mathcal{M}_1$ be the resulting dataset.
- Train SSVM and RankSVM on $\mathcal{M}_1$, with features computed from either $\mathcal{M}_0$ or $\mathcal{M}_1$
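
The pruning step can be sketched as follows. This is only an illustration under stated assumptions: a query is taken to be `(start POI, trajectory length)`, `popularity` is a hypothetical dict mapping each POI to its popularity, and ties are broken lexicographically so the choice stays deterministic.

```python
from collections import defaultdict


def prune_to_single_label(trajectories, popularity):
    """Keep one trajectory per query: the one with the largest total popularity.

    Assumption: a query is (start POI, trajectory length).
    Ties are broken by the lexicographically smallest trajectory.
    """
    by_query = defaultdict(list)
    for traj in trajectories:
        by_query[(traj[0], len(traj))].append(tuple(traj))

    pruned = {}
    for query, candidates in by_query.items():
        # Sort key: highest total popularity first, then smallest tuple.
        pruned[query] = min(
            candidates,
            key=lambda t: (-sum(popularity[p] for p in t), t),
        )
    return pruned
```

Any other deterministic rule can be swapped in by changing the `key` function.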

**Conclusion**
- Sample results:
 - Features are computed on $\mathcal{M}_0$
 ```
 RankSVM:           F1 (0.536, 0.017), pairsF1 (0.145, 0.020), Tau (0.484, 0.018)
 SSVM:              F1 (0.612, 0.029), pairsF1 (0.317, 0.047), Tau (0.573, 0.032)
 SSVM-NoTransition: F1 (0.610, 0.029), pairsF1 (0.309, 0.044), Tau (0.571, 0.032)
 ```
 - Features are computed on $\mathcal{M}_1$
 ```
 RankSVM:           F1 (0.705, 0.030), pairsF1 (0.414, 0.047), Tau (0.667, 0.033)
 SSVM:              F1 (0.629, 0.029), pairsF1 (0.334, 0.046), Tau (0.588, 0.032)
 SSVM-NoTransition: F1 (0.588, 0.026), pairsF1 (0.265, 0.040), Tau (0.546, 0.029)
 ```

## Generating Trajectories using SSVM with random weights

**Experiment**
- Features are computed from Glasgow dataset $\mathcal{M}_0$
- SSVM weights are random samples from a standard univariate Gaussian distribution (`numpy.random.randn()`)
- Prediction inference of SSVM uses the list Viterbi algorithm; the generated dataset is denoted $\mathcal{M}_1$
- Use leave-one-out evaluation to evaluate performance (both features and labels come from $\mathcal{M}_1$)
- This experiment is performed in notebook ```generated_data.ipynb```
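
A toy sketch of the generation step is given below. It is not the code in `generated_data.ipynb`: `random.gauss` is a standard-library stand-in for `numpy.random.randn()`, the weights are tied across POIs and across transitions, a query is assumed to be `(start POI, length)`, and a brute-force search over paths without repeated POIs stands in for the list Viterbi algorithm (feasible only for a handful of POIs). All function names and feature dicts are hypothetical.

```python
import random
from itertools import permutations


def sample_weights(n, rng):
    """Draw n weights from a standard Gaussian."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def score(traj, w_poi, w_trans, poi_feats, trans_feats):
    """Linear score: POI terms plus terms for consecutive transitions."""
    s = sum(dot(w_poi, poi_feats[p]) for p in traj)
    s += sum(dot(w_trans, trans_feats[(a, b)]) for a, b in zip(traj, traj[1:]))
    return s


def generate(start, length, pois, w_poi, w_trans, poi_feats, trans_feats):
    """Best-scoring path from `start` with `length` POIs and no repeats.

    Brute force over all candidate paths; a stand-in for list Viterbi.
    """
    others = [p for p in pois if p != start]
    candidates = ((start,) + rest for rest in permutations(others, length - 1))
    return max(
        candidates,
        key=lambda t: score(t, w_poi, w_trans, poi_feats, trans_feats),
    )
```

Calling `sample_weights` with a differently seeded `random.Random` for each run gives a different weight vector, and hence a different generated dataset.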

**Conclusion**
- POI visits are concentrated on a small subset of POIs
- The transition matrix over the (discretized) number of visits computed from $\mathcal{M}_1$ is similar to that computed from $\mathcal{M}_0$
- The transition features are correlated with the POI features (as the transition matrix is factorised according to several POI features)
- It could be that both the POI features and transition features are good, but transition features don't provide new information (Cheng)
- Sample results from $3$ runs (each run uses different random weights, and thus a different $\mathcal{M}_1$)
```
SSVM: F1 (0.812, 0.016), pairsF1 (0.574, 0.028), Tau (0.757, 0.019)
SSVM: F1 (0.745, 0.018), pairsF1 (0.500, 0.029), Tau (0.697, 0.020)
SSVM: F1 (0.772, 0.019), pairsF1 (0.544, 0.030), Tau (0.726, 0.021)
```

## Multi-label effects on SSVM and RankSVM

Some thoughts:
- Multi-label hurts SSVM as we have multiple labels given an example (a feature vector).
- On the other hand, the label of an example for RankSVM (a POI and query specific feature vector) is computed by counting the number of occurrences of the POI in trajectories that conform to the query, so it seems there is no multi-label problem in RankSVM training.
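
The second point can be illustrated with a small sketch. It assumes a query is `(start POI, trajectory length)` and that a trajectory contributes at most once per POI; the function name is illustrative only.

```python
from collections import Counter


def ranksvm_labels(trajectories):
    """Map (query, POI) to the number of conforming trajectories visiting it.

    Assumption: a query is (start POI, trajectory length).
    """
    counts = Counter()
    for traj in trajectories:
        query = (traj[0], len(traj))
        for poi in set(traj):  # each trajectory counts once per POI
            counts[(query, poi)] += 1
    return counts
```

Several trajectories for the same query collapse into a single count per `(query, POI)` pair, so each RankSVM example ends up with exactly one label, whereas SSVM sees the same query paired with several different trajectories.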

**Experiment**
- Experiment on generated dataset (single label for an example/query)
- Both POI and transition features are scaled linearly to $(-1,1)$ using MinMaxScaler
- $C$ is tuned using Monte-Carlo CV at the beginning and kept the same for all leave-one-out CV rounds
- $C$ is tuned separately for SSVM, SSVM without transition features, and RankSVM
- Prediction inference in SSVM is the list Viterbi algorithm
- This experiment is performed in notebook ```generated_data.ipynb```
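
A minimal sketch of the tuning procedure (Monte-Carlo CV, i.e. repeated random hold-out splits) is shown below. The number of rounds, the test fraction, and the `evaluate` callback are assumptions for illustration; `evaluate(c, train_idx, test_idx)` is a hypothetical function that trains a model with regularisation parameter `c` and returns a validation score.

```python
import random


def monte_carlo_splits(n_examples, n_rounds, test_fraction, seed=0):
    """Yield (train_indices, test_indices) from repeated random hold-out splits."""
    rng = random.Random(seed)
    n_test = max(1, int(round(n_examples * test_fraction)))
    indices = list(range(n_examples))
    for _ in range(n_rounds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


def tune_c(candidates, n_examples, evaluate, n_rounds=5, test_fraction=0.2):
    """Return the candidate C with the best mean score over the same splits."""
    splits = list(monte_carlo_splits(n_examples, n_rounds, test_fraction))

    def mean_score(c):
        return sum(evaluate(c, tr, te) for tr, te in splits) / len(splits)

    return max(candidates, key=mean_score)
```

Running `tune_c` once per model (SSVM, SSVM without transition features, RankSVM) gives the separately tuned values, which are then held fixed during leave-one-out evaluation.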

**Conclusion**
- When features are computed from the original dataset (with duration related features), NOT the generated dataset, SSVM performs better than RankSVM, sample results:
```
RankSVM:           F1 (0.597, 0.011), pairsF1 (0.248, 0.012), Tau (0.507, 0.011)
SSVM:              F1 (0.883, 0.014), pairsF1 (0.667, 0.027), Tau (0.843, 0.017)
SSVM-NoTransition: F1 (0.827, 0.015), pairsF1 (0.510, 0.021), Tau (0.775, 0.015)
```
- When features are recomputed from the generated dataset (so NO duration related features), RankSVM performs better than SSVM, sample results:
```
RankSVM:           F1 (0.903, 0.012), pairsF1 (0.612, 0.023), Tau (0.858, 0.013)
SSVM:              F1 (0.865, 0.016), pairsF1 (0.607, 0.026), Tau (0.822, 0.017)
SSVM-NoTransition: F1 (0.869, 0.017), pairsF1 (0.589, 0.026), Tau (0.819, 0.018) 
```

## Transition feature scaling

**Experiment**
- Scale transition features, i.e. the log probabilities factorised according to a number of features, to range $(-1,1)$ linearly.
- The Osaka dataset is used and the performance is evaluated using leave-one-out cross validation.
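
The linear scaling can be sketched without any dependencies as below. It is meant to mirror what scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` does per feature column; mapping constant columns to the lower bound is a choice made for this sketch.

```python
def minmax_scale_columns(rows, low=-1.0, high=1.0):
    """Linearly rescale each column of `rows` to the range [low, high].

    Constant columns are mapped to `low` to avoid division by zero.
    """
    scaled_columns = []
    for col in zip(*rows):
        cmin, cmax = min(col), max(col)
        span = cmax - cmin
        if span == 0:
            scaled_columns.append([low] * len(col))
        else:
            scaled_columns.append(
                [low + (high - low) * (v - cmin) / span for v in col]
            )
    # Transpose back to one list per example.
    return [list(row) for row in zip(*scaled_columns)]
```

Log transition probabilities are non-positive, so after scaling the most likely transition in a column maps to $1$ and the least likely maps to $-1$.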

**Conclusion**

## SSVM with RankSVM weights

**Experiment** 
- This experiment plugs the weights of a trained RankSVM, i.e. a vector $w$ where $|w|$ is the number of POI (and query) features, into SSVM and uses them for prediction. This means we ignore transition features and tie weights among different POIs.
- A dedicated notebook ```ssvm_ranksvm_weights.ipynb``` is created for this experiment.

**Conclusion** 
- We found that SSVM with RankSVM weights performs very similarly to RankSVM. In fact, when using the list Viterbi algorithm, the predicted trajectory for any query (in leave-one-out cross validation) contains exactly the same set of POIs as the prediction by RankSVM; only the visiting order of these POIs differs.
- Features are scaled linearly to the range $(-1,1)$ using **MinMaxScaler** (scikit-learn), which is the scaling method used in RankSVM (the libsvm implementation).
- Sample results:
```
As the POI features for RankSVM and SSVM are the same, we can use the inference procedure of SSVM with RankSVM weights. (All transition features are set to zero.)
The RankSVM weights are computed in leave-one-out fashion, i.e., one weight vector for each query in the training set.
The dataset used here is Glasgow.
The performance of RankSVM:
F1 (0.767, 0.026), pairsF1 (0.553, 0.047), Tau (0.739, 0.030)
The performance of SSVM using the weights of the above RankSVM:
SSVM-Viterbi: F1 (0.713, 0.032), pairsF1 (0.575, 0.046), Tau (0.740, 0.031)
SSVM-listViterbi: F1 (0.765, 0.026), pairsF1 (0.550, 0.046), Tau (0.739, 0.030)
```
- When the **MaxAbsScaler** was used, SSVM performed much worse.

## SSVM vs RankSVM on generated dataset (Check if multi-label hurts)

**Experiment**
- POI features and transition features are computed on Glasgow dataset, trajectories (i.e., labels) are from generated dataset (#trajectories=125).
- POI and transition features are computed from generated data (NO duration related features)
- Performance is evaluated using leave-one-out cross validation.
- The regularisation parameter $C$ is tuned using Monte-Carlo cross validation at the beginning, then the same $C$ is used for all leave-one-out cross validations.
- Prediction inference in SSVM is the list Viterbi algorithm
- A dedicated notebook ```generated_data.ipynb``` is created for this experiment.

**Conclusion**
- Neither SSVM nor RankSVM achieves nearly perfect performance, which means the multi-label problem hurts both models.
- RankSVM performs better than SSVM.
- Sample results:
```
SSVM:    F1 (0.699, 0.015), pairsF1 (0.416, 0.021), Tau (0.643, 0.016)
RankSVM: F1 (0.881, 0.011), pairsF1 (0.572, 0.019), Tau (0.829, 0.012)
```