# Setting up occCompare for experiments

The package **occCompaReExp** provides the specifications required for setting up **occCompaRe**. If you want to use **occCompaRe** for your own experiments, you need to create the settings list and the functions yourself.
When a function from **occCompaReExp** is used, its namespace is stated explicitly.

## The ``parcc`` list

The parameter list ``parcc`` contains the information that defines the whole set of experiments (here it is derived from **occCompaReExp**).
The list must contain the elements expected by ``run_experiments()`` (see its help page for more information):

```r
# require(occCompare)
# ?run_experiments
devtools::load_all("../occCompare")
devtools::load_all(".")
```

```
Loading occCompare
Loading occCompareExp
```


```r
parcc <- occCompareExp::get_parcc()
parcc
```

| name | id | nSmpls | nFlds |
|---|---|---|---|
| unlabeled | 0 | 4763538 | 1 |
| corn | 1 | 147634 | 1336 |
| oilseeds | 2 | 39816 | 315 |
| pgrass | 3 | 469643 | 5685 |
| root | 4 | 8722 | 87 |
| scereals | 5 | 48701 | 535 |
| wcereals | 6 | 201480 | 1934 |
| bforest | 7 | 7721 | 100 |
| cforest | 8 | 7750 | 100 |
| others | 9 | 172363 | 2024 |


## The ``get_refset()`` function

The function ``run_experiments()`` has the argument ``get_refset``, a user-defined function that has to return the training, validation and test data for a particular classification problem (CP). A CP is defined by a seed, the number of positive and unlabeled training samples, and the feature set. Here we show what such a function can look like, assuming that the basic data has been set up as in 001_data. However, as long as ``get_refset()`` returns a data frame in the format described below, it does not matter where the data come from.

All further information required for creating the reference set (in addition to ``seed``, ``nP``, …) can be passed to ``get_refset()`` via ``args.rs``.
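To illustrate the contract, here is a minimal sketch of a user-defined reference-set function. It is *not* the **occCompaReExp** implementation: the function name ``get_refset_sketch()``, the ``args.rs`` elements ``pool`` and ``nTe``, and the convention that class 0 marks unlabeled samples are assumptions chosen to mirror the class counts of the reference set used in this notebook. The data in the usage example are simulated.

```r
# Hypothetical sketch of a user-defined get_refset(); NOT the occCompareExp code.
# Assumptions (illustrative only):
#   args.rs$pool : data frame with a class column "y" (0 = unlabeled) and
#                  feature columns whose names start with "<fset>_"
#   args.rs$nTe  : number of test samples drawn per positive class
get_refset_sketch <- function(seed, fset, nP, nU, args.rs) {
  set.seed(seed)                       # the seed makes the CP reproducible
  pool  <- args.rs$pool
  nTe   <- args.rs$nTe
  feats <- grep(paste0("^", fset, "_"), names(pool), value = TRUE)
  # draw n distinct indices from idx (safe even if length(idx) == 1)
  draw <- function(idx, n) idx[sample.int(length(idx), n)]

  idx.tr <- idx.va <- idx.te <- integer(0)
  for (cl in setdiff(sort(unique(pool$y)), 0)) {
    # disjoint samples of each positive class for tr, va and te
    i <- draw(which(pool$y == cl), 2 * nP + nTe)
    idx.tr <- c(idx.tr, i[seq_len(nP)])
    idx.va <- c(idx.va, i[nP + seq_len(nP)])
    idx.te <- c(idx.te, i[2 * nP + seq_len(nTe)])
  }
  # unlabeled samples (class 0) only go into tr and va
  u <- draw(which(pool$y == 0), 2 * nU)
  idx.tr <- c(idx.tr, u[seq_len(nU)])
  idx.va <- c(idx.va, u[nU + seq_len(nU)])

  idx <- c(idx.tr, idx.va, idx.te)
  rs <- data.frame(
    set = factor(rep(c("tr", "va", "te"),
                     c(length(idx.tr), length(idx.va), length(idx.te))),
                 levels = c("tr", "va", "te")),
    y = pool$y[idx]
  )
  rs <- cbind(rs, pool[idx, feats, drop = FALSE])
  rownames(rs) <- NULL
  rs
}

# Usage on simulated data: classes 1-3 are positive, class 0 is unlabeled
set.seed(42)
pool <- data.frame(y = rep(0:3, each = 50),
                   re_B = rnorm(200), re_IR = rnorm(200), xx_B = rnorm(200))
rs <- get_refset_sketch(seed = 1, fset = "re", nP = 5, nU = 20,
                        args.rs = list(pool = pool, nTe = 10))
table(rs$set)  # tr: 35, va: 35, te: 30
```

Drawing all samples of one class in a single call keeps the training, validation and test sets disjoint without any bookkeeping.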

```r
args.rs <- get_argsRs(dirData=paste0(parcc$bdir, "/data/rdata_agri6clUforest"))
args.rs
```

**occCompaRe** requires that the data frame returned by ``get_refset()`` contains the column "set", a factor with the levels "tr", "va" and "te", and the column "y" with the class IDs.
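A quick way to catch format errors before starting ``run_experiments()`` is to validate the returned data frame. The helper ``check_refset()`` below is hypothetical and not part of **occCompaRe**; the level names "tr", "va" and "te" are taken from the ``str(rs)`` output in this notebook, and the order of the levels is deliberately not enforced because the requirement does not state one.

```r
# Hypothetical helper (not part of occCompaRe): checks the two columns that
# occCompaRe expects in the data frame returned by get_refset().
check_refset <- function(rs, set.levels = c("tr", "va", "te")) {
  stopifnot(is.data.frame(rs))
  if (!all(c("set", "y") %in% names(rs)))
    stop("columns 'set' and 'y' are required")
  if (!is.factor(rs$set) || !setequal(levels(rs$set), set.levels))
    stop("'set' must be a factor with levels ",
         paste(set.levels, collapse = ", "))
  invisible(TRUE)
}

# A minimal data frame that passes the check
ok <- data.frame(set = factor(c("tr", "va", "te"), levels = c("tr", "va", "te")),
                 y   = c(1, 1, 1))
check_refset(ok)
```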

```r
rs <- get_refset(seed=1, fset="re", nP=60, nU=600,
                 args.rs=args.rs)
str(rs)
```

```
'data.frame':	3180 obs. of  22 variables:
 $ set           : Factor w/ 3 levels "tr","va","te": 1 1 1 1 1 1 1 1 1 1 ...
 $ y             : num  1 1 1 1 1 1 1 1 1 1 ...
 $ re_062_0303_B : int  4753 4447 4600 4492 4853 4642 4739 4285 4540 4853 ...
 $ re_062_0303_G : int  3708 3752 3659 3622 4192 3947 4016 3299 3693 3930 ...
 $ re_062_0303_R : int  3602 2908 3296 3034 4000 3231 3252 2971 3076 3352 ...
 $ re_062_0303_RE: int  3220 3281 3203 2843 3463 3280 3959 2756 3543 3102 ...
 $ re_062_0303_IR: int  3089 3988 3103 2765 3132 3365 5054 2375 4286 2939 ...
 $ re_092_0402_B : int  5612 5228 5279 5385 6009 5225 4387 4583 5115 5696 ...
 $ re_092_0402_G : int  5156 4658 4821 4787 5762 4686 3860 3734 4341 5206 ...
 $ re_092_0402_R : int  5334 4260 4371 4579 5515 3936 2196 3268 3390 4687 ...
 $ re_092_0402_RE: int  4762 4135 4467 4180 4926 4124 4082 2954 4825 4197 ...
 $ re_092_0402_IR: int  4662 4512 4654 4372 4716 5191 10452 3042 7516 4258 ...
 $ re_178_0627_B : int  6084 5365 5880 5485 6491 74
```

```r
table(rs$set)
```

```
  tr   va   te
1140 1140  900
```

```r
table(rs$y[rs$set=="tr"])
table(rs$y[rs$set=="va"])
table(rs$y[rs$set=="te"])
```

```
  0   1   2   3   4   5   6   7   8   9
600  60  60  60  60  60  60  60  60  60

  0   1   2   3   4   5   6   7   8   9
600  60  60  60  60  60  60  60  60  60

  1   2   3   4   5   6   7   8   9
100 100 100 100 100 100 100 100 100
```
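The three calls above can be combined into a single contingency table with the sets as rows and the class IDs as columns; with the real reference set this is ``table(rs$set, rs$y)``. The sketch below runs it on a small simulated data frame ``rs.toy`` (hypothetical, not the reference set of this notebook). For the real reference set, the training and validation counts agree with nU + 9 × nP = 600 + 9 × 60 = 1140 samples.

```r
# Toy reference set mimicking the layout of rs (simulated, illustration only)
rs.toy <- data.frame(
  set = factor(rep(c("tr", "va", "te"), c(7, 7, 4)),
               levels = c("tr", "va", "te")),
  y   = c(0, 0, 0, 1, 1, 2, 2,   # tr
          0, 0, 0, 1, 1, 2, 2,   # va
          1, 1, 2, 2)            # te (no unlabeled samples)
)
# One call instead of three: rows are sets, columns are class IDs
tab <- table(rs.toy$set, rs.toy$y)
tab
# Per-set totals, equivalent to table(rs.toy$set)
rowSums(tab)
```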

```r
head(rs)
```

| set | y | re_062_0303_B | re_062_0303_G | re_062_0303_R | re_062_0303_RE | re_062_0303_IR | re_092_0402_B | re_092_0402_G | re_092_0402_R | ... | re_178_0627_B | re_178_0627_G | re_178_0627_R | re_178_0627_RE | re_178_0627_IR | re_268_0925_B | re_268_0925_G | re_268_0925_R | re_268_0925_RE | re_268_0925_IR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tr | 1 | 4753 | 3708 | 3602 | 3220 | 3089 | 5612 | 5156 | 5334 | ... | 6084 | 5873 | 5396 | 5764 | 8870 | 4433 | 3395 | 2047 | 2707 | 7479 |
| tr | 1 | 4447 | 3752 | 2908 | 3281 | 3988 | 5228 | 4658 | 4260 | ... | 5365 | 4543 | 3217 | 3998 | 7412 | 4513 | 3545 | 2096 | 2664 | 7280 |
| tr | 1 | 4600 | 3659 | 3296 | 3203 | 3103 | 5279 | 4821 | 4371 | ... | 5880 | 5414 | 4722 | 4966 | 7119 | 4499 | 3362 | 1947 | 2583 | 6745 |
| tr | 1 | 4492 | 3622 | 3034 | 2843 | 2765 | 5385 | 4787 | 4579 | ... | 5485 | 5288 | 3780 | 4917 | 9384 | 4715 | 3518 | 2066 | 2791 | 6732 |
| tr | 1 | 4853 | 4192 | 4000 | 3463 | 3132 | 6009 | 5762 | 5515 | ... | 6491 | 6599 | 5743 | 5990 | 7763 | 4652 | 3413 | 2022 | 2695 | 7959 |
| tr | 1 | 4642 | 3947 | 3231 | 3280 | 3365 | 5225 | 4686 | 3936 | ... | 7400 | 7951 | 7320 | 7407 | 9259 | 4562 | 3597 | 2186 | 2860 | 7451 |
