Support Vector Machine (SVM)

SVM is used for classification and regression analysis.

1. Introduction

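SVM solves the following optimization problem:

$$\min_{w}\ \frac{\lambda}{2}\lVert w\rVert_2^2 + \frac{1}{N}\sum_{i=1}^{N}\max\left(0,\ 1 - y_i\, w^\top x_i\right)$$

where $\frac{\lambda}{2}\lVert w\rVert_2^2$ is the regularization term; $\lambda$ is the regularization coefficient; and $\max(0,\ 1 - y_i\, w^\top x_i)$ is the hinge loss, with labels $y_i \in \{-1, +1\}$. The hinge loss is zero when a sample is classified correctly with margin $y_i\, w^\top x_i \ge 1$, and it grows linearly as the margin falls below 1.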
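
To make the objective concrete, here is a minimal plain-Python sketch of the regularized hinge-loss objective and one of its subgradients. It assumes dense feature lists, labels in {-1, +1}, and the averaged form of the loss; the function names are hypothetical and not part of Angel.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two dense vectors of equal length."""
    return sum(ai * bi for ai, bi in zip(a, b))


def hinge_loss(w: Sequence[float], x: Sequence[float], y: float) -> float:
    """max(0, 1 - y * w^T x) for one sample; y is expected to be -1 or +1."""
    return max(0.0, 1.0 - y * dot(w, x))


def svm_objective(w, X, Y, lam: float) -> float:
    """(lam / 2) * ||w||^2 plus the average hinge loss over the N samples."""
    regularization = 0.5 * lam * dot(w, w)
    data_term = sum(hinge_loss(w, x, y) for x, y in zip(X, Y)) / len(X)
    return regularization + data_term


def svm_subgradient(w, X, Y, lam: float) -> List[float]:
    """One subgradient of svm_objective with respect to w.

    Each sample with margin y * w^T x < 1 contributes -y * x / N;
    samples at or beyond the margin contribute nothing.
    """
    n = len(X)
    g = [lam * wj for wj in w]  # gradient of the L2 term
    for x, y in zip(X, Y):
        if y * dot(w, x) < 1.0:  # hinge is active for this sample
            for j, xj in enumerate(x):
                g[j] -= y * xj / n
    return g
```

For example, with X = [[1.0, 0.0], [0.0, 1.0]], Y = [1.0, -1.0] and w = [0.0, 0.0], both samples sit inside the margin, so the objective is 1.0 and the subgradient is [-0.5, 0.5].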

2. Distributed Implementation on Angel

Angel MLLib solves the SVM objective with mini-batch gradient descent. Because the hinge loss is not differentiable where the margin equals 1, each step uses a subgradient computed on one mini-batch.
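
The loop below is a single-machine sketch of mini-batch subgradient descent on the objective above; it is not Angel's distributed implementation. The learning-rate schedule lr0 / sqrt(1 + decay * epoch) is an assumption standing in for ml.learn.decay, and the function and argument names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def train_svm_minibatch(X, Y, lam, lr0, decay, epochs, updates_per_epoch, seed=0):
    """Mini-batch subgradient descent on (lam/2)||w||^2 + mean hinge loss.

    Hypothetical stand-ins: lr0 ~ ml.learn.rate, decay ~ ml.learn.decay,
    epochs ~ ml.epoch.num, updates_per_epoch ~ ml.num.update.per.epoch.
    The decay schedule below is an assumption, not necessarily Angel's.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w: List[float] = [0.0] * d
    batch_size = max(1, math.ceil(n / updates_per_epoch))
    order = list(range(n))
    for epoch in range(epochs):
        lr = lr0 / math.sqrt(1.0 + decay * epoch)  # assumed schedule
        rng.shuffle(order)  # new mini-batch split every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            grad = [lam * wj for wj in w]  # gradient of the L2 term
            for i in batch:
                if Y[i] * dot(w, X[i]) < 1.0:  # hinge active: margin below 1
                    for j, xij in enumerate(X[i]):
                        grad[j] -= Y[i] * xij / len(batch)
            w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w
```

Shuffling before each epoch gives every epoch a different mini-batch split; with ceil(n / updates_per_epoch) samples per batch, an epoch performs at most updates_per_epoch updates.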

3. Execution

Input Format

The data format is set in "ml.data.type", which supports the "libsvm" and "dummy" types. For details, see Angel Data Format.
The feature vector's dimension is set in "ml.feature.num".
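
As a rough illustration of the two input types, the sketch below parses one line of each into a label and a sparse feature map. It assumes the common conventions (libsvm: "label index:value ..."; dummy: "label index index ..." where every listed index has value 1, whitespace-separated) and 0-based indices bounded by ml.feature.num; Angel Data Format remains the authoritative definition, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

Sample = Tuple[float, Dict[int, float]]


def parse_libsvm(line: str) -> Sample:
    """Parse 'label index:value index:value ...' into (label, {index: value})."""
    label, *tokens = line.split()
    features: Dict[int, float] = {}
    for token in tokens:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return float(label), features


def parse_dummy(line: str) -> Sample:
    """Parse 'label index index ...'; each listed index is assumed to have value 1.0."""
    label, *tokens = line.split()
    return float(label), {int(token): 1.0 for token in tokens}


def fits_feature_num(features: Dict[int, float], feature_num: int) -> bool:
    """Check indices against ml.feature.num, assuming they run from 0 to feature_num - 1."""
    return all(0 <= index < feature_num for index in features)
```

For instance, parse_dummy("0 4 6 12") yields the label 0.0 and features 4, 6 and 12, each with value 1.0.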

Parameters

Algorithm Parameters
- ml.epoch.num: number of epochs
- ml.batch.sample.ratio: sampling rate for each epoch
- ml.num.update.per.epoch: number of mini-batches in each epoch
- ml.data.validate.ratio: proportion of data used for validation, no validation when set to 0
- ml.learn.rate: initial learning rate
- ml.learn.decay: decay rate of the learning rate
- ml.svm.reg.l2: coefficient of the L2 penalty
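
To see how the algorithm parameters interact, the hypothetical helper below derives the training/validation split, the mini-batch size and a per-epoch learning rate. It assumes the non-validation samples are split evenly into ml.num.update.per.epoch mini-batches and that the rate follows lr0 / sqrt(1 + decay * epoch); Angel's actual decay rule may differ, and ml.batch.sample.ratio is not modeled.

```python
import math


def plan_training(num_samples, validate_ratio, epochs, updates_per_epoch, lr0, decay):
    """Hypothetical helper: quantities implied by the algorithm parameters.

    Assumes the non-validation samples are split evenly into
    updates_per_epoch mini-batches and that the learning rate follows
    lr0 / sqrt(1 + decay * epoch); ml.batch.sample.ratio is not modeled.
    """
    num_validate = int(num_samples * validate_ratio)  # ratio 0 -> no validation set
    num_train = num_samples - num_validate
    return {
        "num_train": num_train,
        "num_validate": num_validate,
        "batch_size": math.ceil(num_train / updates_per_epoch),
        "total_updates": epochs * updates_per_epoch,
        "learning_rates": [lr0 / math.sqrt(1.0 + decay * e) for e in range(epochs)],
    }
```

For example, with 1,000 samples, ml.data.validate.ratio = 0.25 and ml.num.update.per.epoch = 10, 750 samples are used for training in mini-batches of 75.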

I/O Parameters
- angel.train.data.path: input path of the training data
- angel.predict.data.path: input path of the data to predict
- ml.feature.num: number of features
- ml.data.type: data format (see Angel Data Format); supports "dummy" and "libsvm"
- angel.save.model.path: save path for the trained model
- angel.predict.out.path: output path for prediction results
- angel.log.path: save path for the log

Resource Parameters
- angel.workergroup.number: number of workers
- angel.worker.memory.gb: memory requested for each worker, in GB
- angel.worker.task.number: number of tasks on each worker (default: 1)
- angel.ps.number: number of parameter servers (PS)
- angel.ps.memory.gb: memory requested for each PS, in GB

Submit Command

You can submit a job either by setting the parameters above one by one in the submit script, or by defining the network in a JSON file as follows (see Json description for the full format of the JSON configuration file).

If you set both script parameters and a JSON file, the parameters in the script take precedence.
If you use only script parameters, you must set --ml.model.class.name com.tencent.angel.ml.classification.SupportVectorMachine and must not set angel.ml.conf, which holds the path to the JSON file. The following example submits a job with a JSON file (see data):

{
  "data": {
    "format": "dummy",
    "indexrange": 148,
    "numfield": 13,
    "validateratio": 0.1
  },
  "model": {
    "modeltype": "T_DOUBLE_SPARSE_LONGKEY",
    "modelsize": 148
  },
  "train": {
    "epoch": 10,
    "numupdateperepoch": 10,
    "lr": 0.1,
    "decay": 0.8
  },
  "default_optimizer": {
    "type": "momentum",
    "momentum": 0.9,
    "reg2": 0.01
  },
  "layers": [
    {
      "name": "wide",
      "type": "simpleinputlayer",
      "outputdim": 1,
      "transfunc": "identity"
    },
    {
      "name": "simplelosslayer",
      "type": "simplelosslayer",
      "lossfunc": "hingeloss",
      "inputlayer": "wide"
    }
  ]
}
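
The layers in this file are wired by name: the loss layer reads from the layer named in its "inputlayer" field. The sketch below loads such a config with Python's standard json module and checks that wiring and the hinge-loss setting. It is a hypothetical helper for illustration only; Angel performs its own validation when the job starts.

```python
import json


def check_svm_conf(conf_text: str) -> dict:
    """Parse a JSON config like the one above and run basic wiring checks.

    Hypothetical helper: it only verifies that every "inputlayer" names an
    existing layer and that there is one hinge-loss simplelosslayer.
    """
    conf = json.loads(conf_text)
    layers = conf["layers"]
    names = {layer["name"] for layer in layers}
    for layer in layers:
        src = layer.get("inputlayer")
        if src is not None and src not in names:
            raise ValueError(f"layer {layer['name']!r} reads from unknown layer {src!r}")
    losses = [layer for layer in layers if layer["type"] == "simplelosslayer"]
    if len(losses) != 1 or losses[0].get("lossfunc") != "hingeloss":
        raise ValueError("expected one simplelosslayer with lossfunc 'hingeloss'")
    return conf
```

Running it on the example above returns the parsed dictionary, from which values such as conf["train"]["epoch"] can be read.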

Training Job

 runner="com.tencent.angel.ml.core.graphsubmit.GraphRunner"
 modelClass="com.tencent.angel.ml.core.graphsubmit.AngelModel"
 
 $ANGEL_HOME/bin/angel-submit \
     --angel.job.name svm \
     --action.type train \
     --angel.app.submit.class $runner \
     --ml.model.class.name $modelClass \
     --angel.train.data.path $input_path \
     --angel.save.model.path $model_path \
     --angel.log.path $log_path \
     --angel.workergroup.number $workerNumber \
     --angel.worker.memory.gb $workerMemory  \
     --angel.worker.task.number $taskNumber \
     --angel.ps.number $PSNumber \
     --angel.ps.memory.gb $PSMemory \
     --angel.output.path.deleteonexist true \
     --angel.task.data.storage.level $storageLevel \
     --angel.task.memorystorage.max.gb $taskMemory \
     --angel.worker.env "LD_PRELOAD=./libopenblas.so" \
     --angel.ml.conf $svm_json_path \
     --ml.optimizer.json.provider com.tencent.angel.ml.core.PSOptimizerProvider

Prediction Job

 runner="com.tencent.angel.ml.core.graphsubmit.GraphRunner"
 modelClass="com.tencent.angel.ml.core.graphsubmit.AngelModel"
 
 $ANGEL_HOME/bin/angel-submit \
     --angel.job.name svm \
     --action.type predict \
     --angel.app.submit.class $runner \
     --ml.model.class.name $modelClass \
     --angel.predict.data.path $input_path \
     --angel.load.model.path $model_path \
     --angel.predict.out.path $predictout \
     --angel.log.path $log_path \
     --angel.workergroup.number $workerNumber \
     --angel.worker.memory.gb $workerMemory  \
     --angel.worker.task.number $taskNumber \
     --angel.ps.number $PSNumber \
     --angel.ps.memory.gb $PSMemory \
     --angel.output.path.deleteonexist true \
     --angel.task.data.storage.level $storageLevel \
     --angel.task.memorystorage.max.gb $taskMemory \
     --angel.worker.env "LD_PRELOAD=./libopenblas.so" \
     --angel.ml.conf $svm_json_path \
     --ml.optimizer.json.provider com.tencent.angel.ml.core.PSOptimizerProvider
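
The prediction job applies the trained model to new samples. For a linear SVM, the predicted class is the sign of $w^\top x$. The sketch below shows that rule on sparse samples; the dict-based model and sample representation are assumptions for illustration, and the layout Angel writes to angel.predict.out.path is not described here.

```python
from typing import Dict, List, Tuple


def decision_value(weights: Dict[int, float], features: Dict[int, float]) -> float:
    """w^T x over sparse features; indices absent from the model contribute 0."""
    return sum(weights.get(index, 0.0) * value for index, value in features.items())


def predict_label(weights: Dict[int, float], features: Dict[int, float]) -> int:
    """Linear SVM decision rule: +1 if w^T x >= 0, else -1 (ties go to +1 by choice)."""
    return 1 if decision_value(weights, features) >= 0.0 else -1


def accuracy(weights: Dict[int, float], samples: List[Tuple[int, Dict[int, float]]]) -> float:
    """Fraction of (label, features) pairs whose label matches the prediction."""
    correct = sum(1 for label, features in samples if predict_label(weights, features) == label)
    return correct / len(samples)
```

With weights {0: 1.0, 2: -2.0}, a sample with features {0: 1.0, 2: 1.0} has decision value -1.0 and is assigned to class -1.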
