Annual Data Science Bowl Model Submission

Wei Dong wdong@wdong.org Yuanfang Guan yuanfang.guan@gmail.com

Quick Start

Download the binary release from http://a2genomics.com/static/aaalgo-adbs2.tar.bz2. The binary release runs on any x86_64 Linux machine. It depends on the ImageMagick package to produce the GIF animation (if you have the "convert" command, this dependency is already satisfied). There are no other software or hardware dependencies.

./study test/10/study output --gif

Then use a browser to open output/index.html to see the visualization.

Our binary release is a compressed version of our final submission, and it reproduces the rank 9 score of 0.011645 bit for bit.

How to Run the Model on the Test Set

Our model (scientific approach) consists of a set of binary programs (study, touchup) and a set of offline pre-trained model files (models/, pre/). These pre-trained models are frozen at model submission and are not to be retrained for the final test set. The bash scripts included in the submission specify the running order and parameters of study and touchup. They might need to be slightly altered to accommodate the test data layout.

System Dependency

Hardware: any x86_64 machine with more than 4 GB of memory.

Software: 64-bit Linux with a modern kernel (CentOS > 2.6, Ubuntu > 12.04).

The package doesn't depend on other software to produce the submission files. The study program needs ImageMagick to produce the GIF visualization, but this is not needed for producing the submission files.
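A quick pre-flight check of these requirements might look like the sketch below. The helper names (enough_memory, has_convert, preflight) are hypothetical and not part of the release, and reading /proc/meminfo assumes a standard Linux system.

```shell
# Hypothetical pre-flight helpers; none of these ship with the release.

# Succeeds if the given memory size (in kB) is more than 4 GB.
enough_memory() {
    [ "$1" -gt $((4 * 1024 * 1024)) ]
}

# Succeeds if ImageMagick's "convert" command is on the PATH.
has_convert() {
    command -v convert >/dev/null 2>&1
}

# Prints a warning for each requirement that does not appear to be met.
preflight() {
    arch=$(uname -m)
    [ "$arch" = "x86_64" ] || echo "warning: architecture is $arch, not x86_64"
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null)
    enough_memory "${mem_kb:-0}" || echo "warning: 4 GB of memory or less detected"
    has_convert || echo "note: convert not found; GIF visualization is unavailable"
    return 0
}
```

Calling preflight before ./study only prints warnings; it does not stop anything, since the submission files can be produced without ImageMagick.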

Data Preparation

Training and validation studies are numbered from 1 to 700. Test studies are assumed to be numbered in the same way, using numbers outside the range 1-700, which is already taken.

Prepare a file named TEST that lists the test study numbers, in the same format as the provided TRAIN file.

Prepare a directory (or a symbolic link to a directory) named "raw" containing all the training, validation and testing data, with the following structure:
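As a sanity check on the study numbering, a sketch like the following could be run on the TEST file. It assumes one study number per line, which is our guess at the TRAIN file format; check_test_ids is a hypothetical helper, not part of the release.

```shell
# Reports lines that are not plain numbers, that fall in the already-used
# range 1-700, or that are repeated. Exit status is non-zero if any are found.
# Assumption: the file holds one study number per line.
check_test_ids() {
    awk '
        /^[ \t]*$/            { next }   # ignore blank lines
        !/^[0-9]+$/           { print "not a number: " $0; bad = 1; next }
        $1 >= 1 && $1 <= 700  { print "already used: " $0; bad = 1 }
        seen[$1]++            { print "duplicate: " $0; bad = 1 }
        END                   { exit bad }
    ' "$1"
}

# Usage:
#   check_test_ids TEST
```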

raw/1/study/2ch_21/...
raw/1/study/sax_10/...
...
raw/700/study/2ch_15/...
raw/700/study/sax_10/...
...
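To confirm that every listed study is present in this layout, a sketch along these lines may help. It assumes the list file holds one study number per line and checks only for the raw/N/study directory; missing_studies is a hypothetical helper, not part of the release.

```shell
# Prints each study number from the list file ($2) that has no
# "<raw dir>/<number>/study" directory under the raw directory ($1).
missing_studies() {
    while read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue                 # skip blank lines
        [ -d "$1/$id/study" ] || echo "$id"
    done < "$2"
}

# Usage:
#   missing_studies raw TEST
```

An empty output means every listed study has a directory under raw.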

There is a train.csv file in the directory, which contains the ground-truth data. Once the ground truth for the validation set is released, it should be merged into this file.
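The merge could be done as in the sketch below. It assumes that both files begin with the same one-line header and share the same columns; the file name validate.csv and the helper merge_truth are placeholders, not names taken from the release.

```shell
# Prints the rows of both CSV files under a single header line.
# Assumption: each file starts with the same one-line header.
merge_truth() {
    awk 'FNR == 1 && NR != 1 { next } { print }' "$1" "$2"
}

# Hypothetical usage ("validate.csv" is a placeholder name):
#   merge_truth train.csv validate.csv > merged.csv && mv merged.csv train.csv
```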

Running the Programs

./run-study.sh
./run-submit1.sh
./run-submit2.sh

The first script performs the computation shared by the two submissions; the other two scripts each generate one version of the final submission. Each of the run-submit scripts generates a series of submit files, in the following priority order:

{pre/}ws_full1/submit
{pre/}ws_full0/submit
{pre/}ws_one/submit
{pre/}ws_cli/submit

(The output paths of run-submit1.sh do not have the "pre/" prefix; those of run-submit2.sh do.)
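As a sketch of how this priority list might be walked, the helper below prints the first submit file that exists and is non-empty. It assumes that the list runs from highest to lowest priority and that "exists and is non-empty" is an adequate test for a file having been produced; pick_submit is a hypothetical helper, not part of the release.

```shell
# Prints the highest-priority submit file that exists and is non-empty.
# $1: path prefix, "" for run-submit1.sh output or "pre/" for run-submit2.sh output.
pick_submit() {
    for dir in ws_full1 ws_full0 ws_one ws_cli; do
        f="${1:-}${dir}/submit"
        if [ -s "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1    # no submit file was found
}

# Usage:
#   pick_submit ""      # first version
#   pick_submit pre/    # second version
```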

The highest-priority submit file that is successfully produced should be used as the final submission. In the rare, unexpected event that a high-priority submit file has NaN/Inf entries due to unforeseen failure modes, the lines containing such entries should be manually replaced with the corresponding lines from the next-highest-priority submit file whose corresponding lines are free of NaN/Inf. This manual check and possible replacement is considered one step in our scientific approach.
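The replacement rule can be illustrated with the sketch below, which automates what is described above as a manual step; its output should still be inspected by hand. It rests on assumptions that are ours, not the release's: lines correspond by line number across submit files, fields are comma-separated, and bad entries are spelled nan, inf or infinity (any case, optional sign). patch_submit is a hypothetical helper.

```shell
# Prints the first file with every NaN/Inf line replaced by the same-numbered
# line from the first later file in which that line is clean.
# Usage: patch_submit HIGHEST_PRIORITY [NEXT_PRIORITY ...]
patch_submit() {
    awk '
        # True if any comma-separated field is a NaN or Inf token.
        function bad(s,    a, k, i) {
            k = split(tolower(s), a, ",")
            for (i = 1; i <= k; i++)
                if (a[i] ~ /^[-+]?(nan|inf|infinity)$/) return 1
            return 0
        }
        FNR == 1  { file++ }                          # a new input file starts
        file == 1 { line[FNR] = $0; n = FNR; next }   # remember the first file
        (FNR in line) && bad(line[FNR]) && !bad($0) { line[FNR] = $0 }
        END {
            for (i = 1; i <= n; i++) {
                print line[i]
                if (bad(line[i])) {
                    print "line " i " still has NaN/Inf" > "/dev/stderr"
                    left = 1
                }
            }
            exit left
        }
    ' "$@"
}

# Usage (first version of the submission):
#   patch_submit ws_full1/submit ws_full0/submit ws_one/submit ws_cli/submit > final_submit
```

A non-zero exit status means at least one line could not be repaired from any of the lower-priority files.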

Training Pre-computed Models

Training Caffe Models

The Caffe models included in the submission are trained on the annotated training set only. These models are considered frozen with the submission and are not to be retrained after the release of the test set.

The following recipe is for reference only.

Build the code, then run:

caffe/bound/import.sh
caffe/bound/train.sh
caffe/contour/import.sh
caffe/contour/train.sh

We pick the bound model parameters from iteration 562000 and the contour model parameters from iteration 450000.
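Caffe names its snapshots <snapshot_prefix>_iter_<N>.caffemodel, so the chosen iterations can be located with a small helper. This is only a sketch: snapshot_file is a hypothetical helper, the prefixes in the usage lines are placeholders, and the actual prefix is whatever the solver configuration used by the train.sh scripts sets.

```shell
# Prints the Caffe snapshot path for a given snapshot prefix and iteration.
snapshot_file() {
    printf '%s_iter_%d.caffemodel\n' "$1" "$2"
}

# Hypothetical usage; the prefixes are placeholders, not paths from the repository:
#   snapshot_file bound/snapshot 562000
#   snapshot_file contour/snapshot 450000
```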

We haven't tested the binary reproducibility of this process. Our submitted models should be used to produce the final submissions as they are.
