* setup with the latest vw; online and offline ALOI results can be reproduced here
* wiki online script
* wiki offline few-shot script
* readme
* scripts updated
* separated multilabel and multiclass
* updated XML part
* multilabel classification scripts
* fixed load bug in multilabel setting
* fixed NaN prediction: initialized ec.l.simple
* updated readme; scripts added to demo; updates on scripts; fixed some comments
* removed the unique-feature function and added sort_features to the wikipara scripts
* sort namespace indices and then walk through the two sorted indices to avoid a double for loop
* avoided a double loop in computing the Hamming loss
* random seed; renamed descent and insert-example routines
* added memory_tree.cc to CMakeLists
* moved the write-it define out of the memory tree file into the io_buf header
* allocated space in the memory tree for the kprod example, freed at the end of learning
* updates to vowpalwabbit/memory_tree.cc for the Windows build (Co-Authored-By: Jacob Alber <jalber@fernir.com>)
* supplied a default value for the memory_tree option
* fixed an off-by-epsilon issue in the Windows unit tests
* lower-cased alpha in memory_tree.cc and the demo scripts
* added two tests (online and offline) for CMT
* test/RunTests updates: extra line, stderr files, upper-case test names (Co-Authored-By: Jacob Alber <jalber@fernir.com>)
* staged stderr files in the train-set ref folder and deleted timing output in memory_tree.cc
* decreased problem size (smaller RCV1) and solution size (15 bits)
* updates on stderr files; ignore cache file
* dealt with some initialization
* fixed memory leaks
Parent: 407673f. Commit: a4475d5. 17 changed files with 1,769 additions and 2 deletions.
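The commit log above mentions replacing a double for loop in the Hamming-loss computation with a single walk over two sorted index lists. A minimal sketch of that idea follows; the function name and the sorted-list representation are illustrative, not taken from the VW source.

```python
def hamming_loss(pred, truth):
    """Count label disagreements between two sorted lists of label indices
    with one merge-style walk instead of a nested loop."""
    i = j = shared = 0
    while i < len(pred) and j < len(truth):
        if pred[i] == truth[j]:
            shared += 1
            i += 1
            j += 1
        elif pred[i] < truth[j]:
            i += 1
        else:
            j += 1
    # labels present in exactly one of the two sets
    return (len(pred) - shared) + (len(truth) - shared)
```

Because both lists are sorted, the walk is linear in the total number of labels rather than quadratic.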
Contextual Memory Tree (CMT)
===============================

This demo exercises CMT for logarithmic-time multiclass classification
(online and offline) and logarithmic-time multilabel classification.

The multiclass datasets are [ALOI](http://aloi.science.uva.nl/) and WikiPara. ALOI
has 1000 classes, each with on average 100 training examples. WikiPara
contains 10000 classes. We consider two versions of WikiPara here: a 1-shot version that
contains 1 training example per class, and a 2-shot version that contains 2 training examples per class.

The multilabel datasets are RCV1-2K, AmazonCat-13K, and Wiki10-31K from the XML [repository](http://manikvarma.org/downloads/XC/XMLRepository.html).

We refer users to the [manuscript](https://arxiv.org/pdf/1807.06473.pdf) for the detailed data structures and algorithms in CMT.

## Dependency:
python 3

## Training Online Contextual Memory Tree on ALOI and WikiPara:
```bash
python aloi_script_progerror.py
python wikipara10000_script_progerror.py
```

## Training Offline Contextual Memory Tree on ALOI, WikiPara, RCV1-2K, AmazonCat-13K and Wiki10-31K:
```bash
python aloi_script.py
python wikipara10000_script.py
python xml_rcv1x.script.py
python xml_amazoncat_13K_script.py
python xml_wiki10.script.py
```
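The scripts below all size the tree from the dataset: with N training examples, they budget roughly `leaf_example_multiplier * log2(N)` examples per leaf, and the multiclass scripts add a factor of 2 per pass. A sketch of that sizing rule (the helper name is illustrative; the formula is copied from the scripts):

```python
import numpy as np

def tree_nodes(num_examples, leaf_example_multiplier, passes=1):
    # Mirrors the sizing rule in the demo scripts: roughly
    # leaf_example_multiplier * log2(num_examples) examples per leaf,
    # with a 2x-per-pass factor in the multiclass scripts.
    log2_n = np.log(num_examples) / np.log(2)
    return int(2 * passes * num_examples / (log2_n * leaf_example_multiplier))

# e.g. the offline ALOI script: 1000 classes x 100 shots, multiplier 4, 3 passes
print(tree_nodes(1000 * 100, 4, passes=3))
```

The multilabel scripts use the same expression without the `2 * passes` factor.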
import os
import time
import numpy as np

print("## perform experiments on aloi ##")
num_of_classes = 1000
leaf_example_multiplier = 4  # 8
shots = 100
lr = 0.001
bits = 29
alpha = 0.1  # 0.3
passes = 3  # 5
use_oas = 0
dream_at_update = 0
learn_at_leaf = 1  # turning on learn_at_leaf works better in practice
num_queries = 5  # int(np.log(passes*num_of_classes*shots))
loss = "squared"
dream_repeats = 3
online = 0

tree_node = int(2 * passes * (num_of_classes * shots / (np.log(num_of_classes * shots) / np.log(2) * leaf_example_multiplier)))

train_data = "aloi_train.vw"
test_data = "aloi_test.vw"
if not os.path.exists(train_data):
    os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(train_data))
if not os.path.exists(test_data):
    os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(test_data))

saved_model = "{}.vw".format(train_data)

print("## Training...")
start = time.time()
command_train = "../../build/vowpalwabbit/vw {} --memory_tree {} --learn_at_leaf {} --max_number_of_labels {} --dream_at_update {} --dream_repeats {} --oas {} --online {} --leaf_example_multiplier {} --alpha {} -l {} -b {} -c --passes {} --loss_function {} --holdout_off -f {}".format(
    train_data, tree_node, learn_at_leaf, num_of_classes, dream_at_update,
    dream_repeats, use_oas, online, leaf_example_multiplier, alpha, lr, bits, passes, loss, saved_model)
print(command_train)
os.system(command_train)
train_time = time.time() - start

print("## Testing...")
start = time.time()
os.system("../../build/vowpalwabbit/vw {} -i {}".format(test_data, saved_model))
test_time = time.time() - start

print("## train time {}, and test time {}".format(train_time, test_time))
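The scripts shell out to `wget` to fetch missing datasets, which fails silently where `wget` is not installed. A portable stdlib alternative could look like this; the helper name is hypothetical, while the host URL and the exists-check pattern are taken from the scripts:

```python
import os
import urllib.request

BASE_URL = "http://kalman.ml.cmu.edu/wen_datasets"  # dataset host used by the demo scripts

def fetch_if_missing(filename, base_url=BASE_URL):
    # Download only when the file is not already present, mirroring the
    # os.path.exists(...) checks in the demo scripts.
    if not os.path.exists(filename):
        urllib.request.urlretrieve("{}/{}".format(base_url, filename), filename)
    return filename
```

Unlike `os.system("wget ...")`, a failed download raises an exception instead of leaving a missing file for vw to trip over later.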
import os
import time
import numpy as np

print("## perform experiments on aloi ##")
num_of_classes = 1000
leaf_example_multiplier = 10
shots = 100
lr = 0.001
bits = 29
alpha = 0.1  # 0.3
passes = 1  # 3, 5
use_oas = 0
dream_at_update = 0
learn_at_leaf = 1  # turning on learn_at_leaf works better in practice
loss = "squared"
dream_repeats = 20  # 3
online = 1
# random_seed = 4000

tree_node = int(2 * passes * (num_of_classes * shots / (np.log(num_of_classes * shots) / np.log(2) * leaf_example_multiplier)))

train_data = "aloi_train.vw"
test_data = "aloi_test.vw"
if not os.path.exists(train_data):
    os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(train_data))
if not os.path.exists(test_data):
    os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(test_data))

saved_model = "{}.vw".format(train_data)

print("## Training...")
start = time.time()
os.system("../../build/vowpalwabbit/vw {} --memory_tree {} --learn_at_leaf {} --max_number_of_labels {} --dream_at_update {} --dream_repeats {} --oas {} --online {} --leaf_example_multiplier {} --alpha {} -l {} -b {} -c --passes {} --loss_function {} --holdout_off -f {}".format(
    train_data, tree_node, learn_at_leaf, num_of_classes, dream_at_update,
    dream_repeats, use_oas, online, leaf_example_multiplier, alpha, lr, bits, passes, loss, saved_model))
train_time = time.time() - start

# In the online setting, training reports progressive error, so no separate
# test pass is run here:
# start = time.time()
# os.system("../../build/vowpalwabbit/vw {} -i {}".format(test_data, saved_model))
# test_time = time.time() - start

print("## train time {}".format(train_time))
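All of these scripts assemble one long shell string for `os.system`. A hedged sketch of an alternative that builds the same invocation as an argument list for `subprocess` (flag names are the ones used above; the helper name and the `opts` dict are illustrative):

```python
import subprocess

def memory_tree_command(train_data, saved_model, tree_node, opts,
                        vw_bin="../../build/vowpalwabbit/vw"):
    # opts maps vw flag names (without dashes) to values, e.g.
    # {"learn_at_leaf": 1, "alpha": 0.1, "passes": 3}
    cmd = [vw_bin, train_data, "--memory_tree", str(tree_node)]
    for flag, value in opts.items():
        cmd += ["--{}".format(flag), str(value)]
    cmd += ["-c", "--holdout_off", "-f", saved_model]
    return cmd  # pass to subprocess.run(cmd, check=True) to execute

# cmd = memory_tree_command("aloi_train.vw", "aloi_train.vw.vw", 9030,
#                           {"learn_at_leaf": 1, "alpha": 0.1})
```

An argument list avoids shell quoting issues, and `check=True` surfaces a nonzero vw exit code as an exception instead of being ignored like the `os.system` return value.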
import os
import time
import numpy as np

available_shots = {'three': 3, 'one': 1}

for shot, shots in available_shots.items():
    print("## perform experiments on {}-shot wikipara-10K ##".format(shot))
    num_of_classes = 10000
    leaf_example_multiplier = 4  # 2
    lr = 0.1
    bits = 29  # 30
    passes = 2  # 1
    alpha = 0.1
    learn_at_leaf = 1
    use_oas = 0
    dream_at_update = 1
    dream_repeats = 5
    loss = "squared"
    online = 0
    sort_feature = 1

    tree_node = int(2 * passes * (num_of_classes * shots / (np.log(num_of_classes * shots) / np.log(2) * leaf_example_multiplier)))

    train_data = "paradata10000_{}_shot.vw.train".format(shot)
    test_data = "paradata10000_{}_shot.vw.test".format(shot)
    if not os.path.exists(train_data):
        os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(train_data))
    if not os.path.exists(test_data):
        os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(test_data))

    saved_model = "{}.vw".format(train_data)

    print("## Training...")
    start = time.time()
    os.system("../../build/vowpalwabbit/vw {} --memory_tree {} --learn_at_leaf {} --max_number_of_labels {} --oas {} --online {} --dream_at_update {} --leaf_example_multiplier {} --dream_repeats {} --sort_features {} --alpha {} -l {} -b {} -c --passes {} --loss_function {} --holdout_off -f {}".format(
        train_data, tree_node, learn_at_leaf, num_of_classes, use_oas, online, dream_at_update,
        leaf_example_multiplier, dream_repeats, sort_feature, alpha, lr, bits, passes, loss, saved_model))
    train_time = time.time() - start

    print("## Testing...")
    start = time.time()
    os.system("../../build/vowpalwabbit/vw {} -i {}".format(test_data, saved_model))
    test_time = time.time() - start

    print("## train time {}, and test time {}".format(train_time, test_time))
import os
import time
import numpy as np

# available_shots = {'three': 3, 'one': 1}
available_shots = {'three': 3}

for shot, shots in available_shots.items():
    print("## perform experiments on {}-shot wikipara-10K ##".format(shot))
    num_of_classes = 10000
    leaf_example_multiplier = 10  # 2
    lr = 0.1
    bits = 29  # 30
    passes = 1  # 2
    alpha = 0.1
    learn_at_leaf = 0
    use_oas = 0
    dream_at_update = 1
    dream_repeats = 15
    loss = "squared"
    online = 1
    sort_feature = 1

    tree_node = int(2 * passes * (num_of_classes * shots / (np.log(num_of_classes * shots) / np.log(2) * leaf_example_multiplier)))

    train_data = "paradata10000_{}_shot.vw.train".format(shot)
    test_data = "paradata10000_{}_shot.vw.test".format(shot)
    if not os.path.exists(train_data):
        os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(train_data))
    if not os.path.exists(test_data):
        os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(test_data))

    saved_model = "{}.vw".format(train_data)

    print("## Training...")
    start = time.time()
    os.system("../../build/vowpalwabbit/vw {} --memory_tree {} --learn_at_leaf {} --max_number_of_labels {} --oas {} --online {} --dream_at_update {} --leaf_example_multiplier {} --dream_repeats {} --sort_features {} --alpha {} -l {} -b {} -c --passes {} --loss_function {} --holdout_off -f {}".format(
        train_data, tree_node, learn_at_leaf, num_of_classes, use_oas, online, dream_at_update,
        leaf_example_multiplier, dream_repeats, sort_feature, alpha, lr, bits, passes, loss, saved_model))
    train_time = time.time() - start

    # In the online setting, training reports progressive error, so no separate
    # test pass is run here.
    print("## train time {}".format(train_time))
import os
import time
import numpy as np

print("perform experiments on amazoncat 13K (multilabel)")
leaf_example_multiplier = 2
lr = 1
bits = 30
alpha = 0.1  # 0.3
passes = 4
learn_at_leaf = 1
use_oas = 1
dream_at_update = 1
loss = "squared"
dream_repeats = 3

num_examples = 1186239
max_num_labels = 13330

tree_node = int(num_examples / (np.log(num_examples) / np.log(2) * leaf_example_multiplier))
train_data = "amazoncat_train.mat.mult_label.vw.txt"
test_data = "amazoncat_test.mat.mult_label.vw.txt"

if not os.path.exists(train_data):
    os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(train_data))
if not os.path.exists(test_data):
    os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(test_data))

saved_model = "{}.vw".format(train_data)

print("## Training...")
start = time.time()
os.system("../../build/vowpalwabbit/vw {} --memory_tree {} --learn_at_leaf {} --dream_at_update {} --max_number_of_labels {} --dream_repeats {} --oas {} --leaf_example_multiplier {} --alpha {} -l {} -b {} -c --passes {} --loss_function {} --holdout_off -f {}".format(
    train_data, tree_node, learn_at_leaf, dream_at_update,
    max_num_labels, dream_repeats, use_oas,
    leaf_example_multiplier, alpha, lr, bits, passes, loss, saved_model))
train_time = time.time() - start

print("## Testing...")
start = time.time()
os.system("../../build/vowpalwabbit/vw {} --oas {} -i {}".format(test_data, use_oas, saved_model))
test_time = time.time() - start
print("## train time {}, and test time {}".format(train_time, test_time))
import os
import time
import numpy as np

print("perform experiments on rcv1x (multilabel)")
leaf_example_multiplier = 2
lr = 0.1
bits = 30
alpha = 0.1
passes = 6  # 4
learn_at_leaf = 1
use_oas = 1
dream_at_update = 0  # 1
loss = "squared"
dream_repeats = 3

num_examples = 630000
max_num_labels = 2456

tree_node = int(num_examples / (np.log(num_examples) / np.log(2) * leaf_example_multiplier))
train_data = "rcv1x_train.mat.mult_label.vw.txt"
test_data = "rcv1x_test.mat.mult_label.vw.txt"
if not os.path.exists(train_data):
    os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(train_data))
if not os.path.exists(test_data):
    os.system("wget http://kalman.ml.cmu.edu/wen_datasets/{}".format(test_data))

saved_model = "{}.vw".format(train_data)

print("## Training...")
start = time.time()
os.system("../../build/vowpalwabbit/vw {} --memory_tree {} --learn_at_leaf {} --dream_at_update {} --max_number_of_labels {} --dream_repeats {} --oas {} --leaf_example_multiplier {} --alpha {} -l {} -b {} -c --passes {} --loss_function {} -f {}".format(
    train_data, tree_node, learn_at_leaf, dream_at_update,
    max_num_labels, dream_repeats, use_oas,
    leaf_example_multiplier, alpha, lr, bits, passes, loss, saved_model))
train_time = time.time() - start

print("## Testing...")
start = time.time()
os.system("../../build/vowpalwabbit/vw {} --oas {} -i {}".format(test_data, use_oas, saved_model))
test_time = time.time() - start
print("## train time {}, and test time {}".format(train_time, test_time))