Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
No description, website, or topics provided.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
C/C++ implementation for PCDN, SCDN and CDN metioned in the paper: Parallelized Coordinate Descent Newton Method for Efficient L1-Regularized Minimization. https://arxiv.org/abs/1306.4080 Contents ****************************************** Installation ============ `train' Usage ============= `infer' Usage ============= Datasets Download ================= Set #bundle_size, #threads ========================== The Log Files ============= Example ======== ***************************************** Installation ============ On Unix systems, type $ make to build the `train' and `infer' programs. type $ make clean to clean the built files. Run them without arguments to show the usages. The software has been tested on Ubuntu 12.04 x86_64. `train' Usage ============= Usage: train [options] training_file test_file [model_file_name] options: -a algorithm: set algorithm type (default 0) 0 -- CDN 1 -- Shotgun CDN (SCDN) 2 -- Parallel CDN (PCDN) -s solver type : set type of solver (default 0) 0 -- L1-regularized logistic regression with bias term 1 -- L1-regularized L2-loss support vector classification -c cost : set the parameter C (default 1) -e epsilon : set tolerance of termination criterion |f^S(w)|_1 <= eps*min(pos,neg)/l*|f^S(w0)|_1, where f^S(w) is the minimum-norm subgradient at w -g g -n n : to generate the experimental results of CDN using a decreasing epsilon values = eps/g^i, for i = 0,1,...,n-1 (default g=1.0 n=1) -q : quiet mode (no screen outputs) training_file: training set file test_file: test set file model_file_name: model file name If you do not set model_file_name, it will be set as the result file nam e following ".model" `infer' Usage ============= Usage: infer test_file model_file output_file test_file: test set file model_file_name: model file name output_file: output file name Datasets Download ================= Type $ python ./gen_data.py The script will defaultly download 1 data set (real-sim) from LIBSVM Data page. If you want to download more datasets, edit the "data_dict" in 'gen_data.py' to indicate data sets for generation. For those datasets, we do a 80/20 split for training and testing. It then stores *.train and *.test in the 'data' directory. Note that you need bunzip2, which is called by gen_data.py Set #bundle_size, #threads ========================== Edit line 121-123 of src/train.cpp : int g_pcdn_thread_num = 0; //#threads for pcdn. default (set as 0): num_procs -1; otherwise, set as other positive integer int g_bundle_size = 1250; // bundle size for pcdn int g_scdn_thread_num = 8; // #threads for scdn then type $ make The Log Files ============= With each run, two log files will be stored in 'log/' directory, with the name indicating configuration of the specific experiment. For example, 'pcdn_threads_3_bundle_1250_s_0_c_4.0_eps_1e-3_real-sim' 'pcdn_threads_3_bundle_1250_s_0_c_4.0_eps_1e-3_real-sim_verbosity' indicate: algorithm: pcdn, threads: 3, bundle size: 1250, slover: 0, C: 4.0, epsilon: 1e-3, dataset: real-sim. The first log file stores the contents printed on the terminal, the second log file stores outputs of each iteration, which could be used to generate the experimental results. Example ======== real-sim.train and real-sim.test are put as example dataset on the project webpage: real-sim bundle size: 1250 L1-regularized logistic regression with bias term: $./train -a 2 -s 0 -c 4.0 -e 1e-3 ./data/real-sim.train ./data/real-sim.test model_lrb $./infer ./data/real-sim.test model_lrb out_lrb L1-regularized L2-loss support vector classification: $ ./train -a 2 -s 1 -c 1.0 -e 1e-3 ./data/real-sim.train ./data/real-sim.test model_svc $ ./infer ./data/real-sim.test model_svc out_svc