CSCW2013 Pipeline

To use the CSCW2013 pipeline, pass "CSCW2013" as the first argument to ALOE:

java -jar aloe.jar CSCW2013 MODE OPTIONS...

This pipeline includes the functionality described in the following paper:

Brooks, M., Kuksenok, K., Torkildson, M. K., Perry, D., Robinson, J. J.,
  Scott, T. J., Anicello, O., Zukowski, A., Harris, P., Aragon, C. R. 2013.
  Statistical Affect Detection in Collaborative Chat. Proceedings of CSCW 2013. ACM.

The pipeline groups related messages into segments, extracts a wide variety of features from each segment, and labels the segments with a linear Support Vector Machine classifier.

The options specific to the CSCW2013 pipeline are described below.

Train Mode

In "train" mode, the CSCW2013 pipeline accepts the following options in addition to the basic options.

Note that when the --roc option is enabled, logistic models are fit to the SVM outputs so that probability estimates can be extracted. The ROC curve requires these estimates.
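
Fitting logistic models to SVM outputs is commonly known as Platt scaling. The following is a minimal Python sketch of that idea, not ALOE's or Weka's actual implementation: it fits a sigmoid to raw SVM decision values by plain gradient descent. The function names, learning rate, epoch count, and sample data are all assumptions made for illustration.

```python
import math


def to_probability(score, a, b):
    """Map a raw SVM decision value to a probability with a fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def fit_sigmoid(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss.

    scores: raw SVM outputs; labels: 1 for the positive class, 0 otherwise.
    Hypothetical helper for illustration only.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = to_probability(s, a, b)
            grad_a += (p - y) * s  # derivative of the log loss w.r.t. a
            grad_b += (p - y)      # derivative of the log loss w.r.t. b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Made-up decision values and labels (not separable, so the fit stays finite).
scores = [-2.0, -1.0, -0.5, 0.3, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1, 1]
a, b = fit_sigmoid(scores, labels)
probabilities = [to_probability(s, a, b) for s in scores]
```

Sweeping a decision threshold over probabilities like these is what produces the points of an ROC curve.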

Adjust the class frequencies:

  • --downsample, -ds: Downsample the majority class in the training data to suit the cost ratio.
  • --upsample, -us: Upsample the minority class in the training data to suit the cost ratio.
  • --balance-test-set: Apply the selected balancing algorithm to the test set as well as the training set.
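
As a rough illustration of downsampling, the Python sketch below randomly drops majority-class examples until the majority-to-minority ratio reaches a target. How ALOE derives that target from the cost ratio is not specified here, so `target_ratio` is a hypothetical parameter, and the function name and data are likewise assumptions.

```python
import random


def downsample(examples, labels, target_ratio=1.0, seed=0):
    """Drop majority-class examples until majority/minority <= target_ratio.

    Assumes exactly two classes. target_ratio is a hypothetical stand-in
    for whatever ratio the cost settings imply.
    """
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    # Sort the two classes by size: smaller (minority) first.
    (minority, min_items), (majority, maj_items) = sorted(
        by_label.items(), key=lambda kv: len(kv[1])
    )
    keep = min(len(maj_items), int(len(min_items) * target_ratio))
    kept = rng.sample(maj_items, keep)
    return [(x, minority) for x in min_items] + [(x, majority) for x in kept]


# Ten made-up segments: two positive, eight negative.
examples = list(range(10))
labels = ["pos", "pos"] + ["neg"] * 8
balanced = downsample(examples, labels)             # 2 pos + 2 neg
relaxed = downsample(examples, labels, target_ratio=2.0)  # 2 pos + 4 neg
```

Upsampling would be the mirror image: sampling minority examples with replacement instead of discarding majority ones.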

Use Weka's CostSensitiveClassifier:

  • --reweight, -rw: Reweight the training data to suit the cost ratio.
  • --min-cost: Train a classifier that uses the min-cost criterion.

Set custom misclassification costs:

  • --fn-cost COST: Set a cost for false negatives (default 1).
  • --fp-cost COST: Set a cost for false positives (default 1).
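
To show how the two costs can influence a prediction, here is a small Python sketch of a minimum expected cost decision, the general idea behind the min-cost criterion. It assumes the classifier supplies a probability for the positive class; the function name and the example values are hypothetical, and this is not ALOE's or Weka's code.

```python
def min_cost_label(p_positive, fn_cost=1.0, fp_cost=1.0):
    """Pick the label with the lower expected misclassification cost."""
    # Predicting "negative" is wrong with probability p_positive (a false negative).
    cost_if_negative = p_positive * fn_cost
    # Predicting "positive" is wrong with probability 1 - p_positive (a false positive).
    cost_if_positive = (1.0 - p_positive) * fp_cost
    return "positive" if cost_if_positive < cost_if_negative else "negative"


# With equal costs, a 30% chance of being positive is labeled negative.
equal_costs = min_cost_label(0.3)
# Making false negatives three times as costly flips the decision.
costly_misses = min_cost_label(0.3, fn_cost=3.0)
```

With the default costs of 1, this reduces to labeling a segment positive only when its positive-class probability exceeds 0.5.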

Segmentation options:

  • --ignore-participants: Ignore participants during segmentation. By default, messages from different participants are teased apart into different segments.
  • --threshold SECONDS, -t SECONDS: Segmentation threshold in seconds (default 30). A gap between messages of more than this threshold starts a new segment.
  • --no-segmentation: Disable segmentation (each message is in its own segment).
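
The segmentation options above can be sketched in Python as follows. This is an illustrative reading of the option descriptions, not ALOE's implementation: it assumes each participant's messages form a separate stream by default, and the function name, parameter names, and message tuples are hypothetical.

```python
from collections import defaultdict


def segment(messages, threshold=30, ignore_participants=False,
            no_segmentation=False):
    """Group messages into segments.

    messages: list of (timestamp_seconds, participant, text), sorted by time.
    """
    if no_segmentation:
        # Each message is its own segment.
        return [[m] for m in messages]

    # By default, keep one stream per participant; otherwise one shared stream.
    streams = defaultdict(list)
    for m in messages:
        key = None if ignore_participants else m[1]
        streams[key].append(m)

    segments = []
    for stream in streams.values():
        current = [stream[0]]
        for prev, msg in zip(stream, stream[1:]):
            # A gap of more than the threshold starts a new segment.
            if msg[0] - prev[0] > threshold:
                segments.append(current)
                current = []
            current.append(msg)
        segments.append(current)
    return segments


# Made-up chat log for illustration.
chat = [
    (0, "alice", "hi"),
    (10, "bob", "hello"),
    (20, "alice", "ready?"),
    (100, "alice", "still there?"),
]
print(len(segment(chat)))                            # 3
print(len(segment(chat, ignore_participants=True)))  # 2
print(len(segment(chat, no_segmentation=True)))      # 4
```

In this example, the 80-second gap before the last message exceeds the default 30-second threshold, so it always starts a new segment unless segmentation is disabled entirely.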

Other options:

  • --emoticons FILE, -e FILE: Custom emoticon dictionary file (default emoticons.txt).
  • --folds FOLDS, -k FOLDS: Set the number of cross-validation folds (default 10; use 0 to disable cross-validation).

Label Mode

In "label" mode, the CSCW2013 pipeline accepts the following options in addition to the basic options.

Set custom misclassification costs:

  • --fn-cost COST: Set a cost for false negatives (default 1).
  • --fp-cost COST: Set a cost for false positives (default 1).

Segmentation options:

  • --ignore-participants: Ignore participants during segmentation. By default, messages from different participants are teased apart into different segments.
  • --threshold SECONDS, -t SECONDS: Segmentation threshold in seconds (default 30). A gap between messages of more than this threshold starts a new segment.
  • --no-segmentation: Disable segmentation (each message is in its own segment).