# Caret - R

1. Define Model
2. Define tuning parameter components
3. Define resampling method
4. Choose optimal based on performance

## Data Split

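# `y` stands in for the outcome column of `df` (placeholder name).
# `in` is a reserved word in R, so the index needs another name.
in_train <- createDataPartition(df$y, p = .75, list = FALSE)

training <- df[in_train, ]
test <- df[-in_train, ]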
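
As a runnable illustration of the split, here is a minimal sketch that assumes the built-in `iris` data with `Species` as the outcome (both are stand-ins, not part of these notes). `createDataPartition` samples within each class, so the class balance is roughly preserved in both sets.

```r
library(caret)

set.seed(42)  # make the random split reproducible

# list = FALSE returns a one-column matrix of row indices instead of a list
in_train <- createDataPartition(iris$Species, p = 0.75, list = FALSE)

training <- iris[in_train, ]
test     <- iris[-in_train, ]

# Class proportions in each set
table(training$Species)
table(test$Species)
```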

## Train Control

The trainControl function sets the resampling method used to train models: "boot", "cv", "LOOCV", "LGOCV", "repeatedcv", "timeslice", "none" or "oob". "repeatedcv" performs k-fold cv multiple times and averages the results.

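A custom performance summary can be supplied via __summaryFunction__.

summaryFunction = twoClassSummary reports sensitivity, specificity and ROC (area under the ROC curve). It needs class probabilities, so also set classProbs = TRUE.

__selectionFunction__: Function used to choose the final model parameters ("best", "oneSE", "tolerance" or a custom function).

__tolerance__: Selection function that picks the simplest model whose performance is within a chosen percentage of the best.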
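
A minimal sketch of a control object combining the options above. The fold count, repeat count and the choice of "oneSE" are arbitrary values picked for illustration.

```r
library(caret)

ctrl <- trainControl(
  method            = "repeatedcv",     # k-fold cv, repeated
  number            = 10,               # k = 10 folds
  repeats           = 3,                # repeat the 10-fold cv 3 times
  classProbs        = TRUE,             # needed by twoClassSummary
  summaryFunction   = twoClassSummary,  # reports ROC, Sens, Spec
  selectionFunction = "oneSE"           # simplest model within one SE of the best
)

ctrl$method
```

The object is then passed to train through its trControl argument, typically together with metric = "ROC" when twoClassSummary is used.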


## Pre process

The preProcess function applies operations to predictors, including centering and scaling. It can also be used through the preProcess argument of train.

Centering and scaling gives each variable mean 0 and standard deviation 1. This ensures that the criterion for finding linear combinations of the predictors is based on how much variation they explain, and it improves numerical stability.
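
A small sketch of centering and scaling, assuming the four numeric columns of the built-in `iris` data as stand-in predictors. The transformation is estimated once and then applied with predict, so the same parameters can be reused on a test set.

```r
library(caret)

predictors <- iris[, 1:4]

# Estimate the means and standard deviations from the data
pp <- preProcess(predictors, method = c("center", "scale"))

# Apply the stored transformation
scaled <- predict(pp, predictors)

round(colMeans(scaled), 10)  # each column mean is 0 (up to rounding)
apply(scaled, 2, sd)         # each column sd is 1
```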

## TuneGrid

tuneGrid (param of train) takes a data frame with one column per tuning parameter of the method and one row per combination to try. Build it with expand.grid(par = c(...), par2 = val).

__tuneLength__ (another train param): an integer giving the number of levels to generate for each tuning parameter when no tuneGrid is supplied (i.e. with 2 parameters, a tuneLength of 20 gives up to 20 x 20 = 400 models tested). (NOTE: If wanted, this argument must be named.)
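
A sketch of both ideas. The first grid only shows how expand.grid multiplies out combinations; the parameter names `C` and `sigma` are the ones used later in the SVM example. The second grid is actually passed to train, using k-nearest neighbours on `iris` purely as a lightweight stand-in model.

```r
library(caret)

# Every combination of the supplied values: 3 x 2 = 6 rows
svm_grid <- expand.grid(C = c(0.5, 1, 2), sigma = c(0.01, 0.1))
nrow(svm_grid)

# Column names must match the tuning parameters of the method (k for "knn")
knn_grid <- expand.grid(k = c(3, 5, 7))

set.seed(1)
fit <- train(Species ~ ., data = iris,
             method    = "knn",
             tuneGrid  = knn_grid,
             trControl = trainControl(method = "cv", number = 5))

fit$results   # one row of resampled performance per grid row
fit$bestTune  # the chosen value of k
```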

## Plotting

plot(fit) shows the resampled performance against the different tuning parameter values.

bwplot() # box and whisker
dotplot()
xyplot()
splom() # scatter plot matrix

## Predict

The predict function for a train object has two types: "raw" (the predicted class or value, the default) and "prob" (class probabilities).
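
A short sketch of the two prediction types, assuming a k-nearest-neighbours model fitted to `iris` as a stand-in. Predicting on the training data here is only to keep the example short.

```r
library(caret)

set.seed(1)
fit <- train(Species ~ ., data = iris,
             method    = "knn",
             trControl = trainControl(method = "cv", number = 5))

# Predicted classes (a factor)
pred_class <- predict(fit, newdata = iris, type = "raw")
head(pred_class)

# Class probabilities (a data frame with one column per class)
pred_prob <- predict(fit, newdata = iris, type = "prob")
head(pred_prob)
```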

## Model Comparison/ Performance

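confusionMatrix()
postResample() # Default method (RMSE, R squared and MAE for regression; accuracy and kappa for classification)
twoClassSummary() # ROC AUC, sensitivity, specificity
prSummary() # Precision-recall AUC, precision, recall, F
mnLogLoss() # Multi-class multinomial log likelihood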
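
As a small worked example of the default method, postResample can be called directly on predictions and observations. The numbers below are made up for illustration.

```r
library(caret)

obs  <- c(3, 5, 7, 9)
pred <- c(2, 5, 8, 9)

# Errors are -1, 0, 1, 0, so:
#   RMSE = sqrt((1 + 0 + 1 + 0) / 4) = sqrt(0.5)
#   MAE  = (1 + 0 + 1 + 0) / 4       = 0.5
res <- postResample(pred = pred, obs = obs)
res
```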


Lift Curve - TBD
Calibration curve - TBD

### Confusion Matrix

Will produce a confusion matrix with accuracy, kappa and F-measure.

In [None]:
confusionMatrix(pred,truth,mode='everything') #Binary class use.
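
A self-contained sketch with made-up labels, so the numbers can be checked by hand. The first factor level ("yes") is treated as the positive class by default.

```r
library(caret)

truth <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "yes"),
                levels = c("yes", "no"))
pred  <- factor(c("yes", "yes", "no", "no", "no", "yes", "no", "yes"),
                levels = c("yes", "no"))

# TP = 3, FN = 1, FP = 1, TN = 3
cm <- confusionMatrix(data = pred, reference = truth, mode = "everything")

cm$table                                     # the 2 x 2 counts
cm$overall["Accuracy"]                       # 6 / 8 = 0.75
cm$byClass[c("Precision", "Recall", "F1")]   # all 0.75 for these labels
```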

## Feature Filter / Regularisation

### Univariate Filters

Use statistical tests to screen each predictor individually before modelling.

sbf(predictors, outcome, sbfControl = sbfControl(),...)

Uses an __ANOVA score__ for each predictor (a multivariate method is needed to consider predictors in combination). The null hypothesis is that the mean predictor values are equal across the two classes, so predictors with small p-values are kept.

ctrl <- sbfControl(functions = rfSBF, method = "repeatedcv", repeats =5)
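
To make the ANOVA score concrete, the sketch below computes a per-predictor p-value with base R only. It illustrates the idea rather than reproducing caret's internal code; the two-class subset of `iris` and the 0.05 cutoff are assumptions made for the example.

```r
# Two-class problem: drop one species and its unused factor level
two_class <- droplevels(subset(iris, Species != "setosa"))

# For each numeric predictor, test whether its mean differs between classes
p_values <- sapply(two_class[, 1:4], function(x) {
  anova(lm(x ~ two_class$Species))[1, "Pr(>F)"]
})

p_values

# Keep predictors whose class means differ at the chosen cutoff
keep <- names(p_values)[p_values < 0.05]
keep
```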

## Recursive Feature Elimination

Default: the predictors are ranked, then the model is refit on progressively smaller subsets, dropping the lowest-ranked predictors each time. Optionally the predictor ranking can be recalculated at every stage.

Be aware of the pitfalls of overfitting when a single training set is used.

Two algorithms are available. Algo 1: no resampling, __rfeIter()__. Algo 2: with resampling, __rfe()__.

__sizes__: Param for the subset sizes of predictors to be tested.

rfeControl$functions needs to be defined based on the type of model.

predictors(rfeResult) gives the list of predictors selected in the final model.

## Arbitrary Model

If no predefined function list for rfeControl fits the model, one has to be created:

In [None]:
# A simple list example for random forests:

rfRFE <-  list(summary = defaultSummary,
               fit = function(x, y, first, last, ...){
                 library(randomForest)
                 randomForest(x, y, importance = first, ...)
                 },
               pred = function(object, x)  predict(object, x),
               rank = function(object, x, y) {
                 vimp <- varImp(object)
                 vimp <- vimp[order(vimp$Overall,decreasing = TRUE),,drop = FALSE]
                 vimp$var <- rownames(vimp)                  
                 vimp
                 },
               selectSize = pickSizeBest,
               selectVar = pickVars)

# Recipes

In place of the x, y or formula interface, recipes can be used for a wider, more customisable set of pre-processing and validation tools.

List of uses for recipes:

- Convert qualitative predictor to indicator variable
- Transform data to diff scale
- Extract key features from raw vars

Possible step areas:

- Impute (ie sub missing values with mean)
- Individual transform (ie log)
- Discretisation (Convert cont to discrete)
- Dummy variables and encoding (ie create dummy var)
- Predictor interactions
- Normalisation (ie center)
- Multivariate Transformation (ie PCA)
- Filters (ie highly correlated)
- Row operation (ie sort)
- Others (ie rename)
- Check (ie check for missing)

Use prep() to estimate the steps on the training data and bake() to apply them. tidy() summarises the steps of a recipe.
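
A minimal sketch of a recipe with a normalisation step and a dummy-variable step, assuming `iris` with `Sepal.Length` as the outcome (an arbitrary choice for illustration).

```r
library(recipes)

rec <- recipe(Sepal.Length ~ ., data = iris) %>%
  step_normalize(all_numeric_predictors()) %>%  # center and scale numeric predictors
  step_dummy(all_nominal_predictors())          # Species -> indicator columns

# Estimate the step parameters (means, sds, factor levels) on the training data
prepped <- prep(rec, training = iris)

# Apply the trained steps; new_data = NULL returns the processed training set
baked <- bake(prepped, new_data = NULL)
names(baked)

# One row per step, showing whether it has been trained
tidy(prepped)
```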

## GLM Example

In [17]:
library(tidymodels)      # for the recipes package, along with the rest of tidymodels

# Helper packages
library(nycflights13)    # for flight data
library(skimr)           # for variable summaries

── Attaching packages ── tidymodels 1.1.0 ──

✔ broom        1.0.4     ✔ tibble       3.2.1
✔ dials        1.2.0     ✔ tidyr        1.3.0
✔ infer        1.0.4     ✔ tune         1.1.1
✔ modeldata    1.1.0     ✔ workflows    1.1.3
✔ parsnip      1.1.0     ✔ workflowsets 1.0.1
✔ purrr        1.0.1     ✔ yardstick    1.2.0
✔ rsample      1.1.1

── Conflicts ── tidymodels_conflicts() ──
✖ purrr::discard()         mas

In [18]:
set.seed(123)

flight_data <- 
  flights %>% 
 mutate(
    # Convert the arrival delay to a factor
    arr_delay = ifelse(arr_delay >= 30, "late", "on_time"), #change vals to strings
    arr_delay = factor(arr_delay),
    # We will use the date (not date-time) in the recipe below
    date = lubridate::as_date(time_hour) #lubridate is R package for dates
  ) %>% 
  # Only retain the specific columns we will use
  select(dep_time, flight, origin, dest, air_time, distance, 
         carrier, date, arr_delay, time_hour) %>% 
  # Exclude missing data
  na.omit() %>% 
  # For creating models, it is better to have qualitative columns
  # encoded as factors (instead of character strings)
  mutate_if(is.character, as.factor)

flight_data

dep_time,flight,origin,dest,air_time,distance,carrier,date,arr_delay,time_hour
<int>,<int>,<fct>,<fct>,<dbl>,<dbl>,<fct>,<date>,<fct>,<dttm>
517,1545,EWR,IAH,227,1400,UA,2013-01-01,on_time,2013-01-01 05:00:00
533,1714,LGA,IAH,227,1416,UA,2013-01-01,on_time,2013-01-01 05:00:00
542,1141,JFK,MIA,160,1089,AA,2013-01-01,late,2013-01-01 05:00:00
544,725,JFK,BQN,183,1576,B6,2013-01-01,on_time,2013-01-01 05:00:00
554,461,LGA,ATL,116,762,DL,2013-01-01,on_time,2013-01-01 06:00:00
554,1696,EWR,ORD,150,719,UA,2013-01-01,on_time,2013-01-01 05:00:00
555,507,EWR,FLL,158,1065,B6,2013-01-01,on_time,2013-01-01 06:00:00
557,5708,LGA,IAD,53,229,EV,2013-01-01,on_time,2013-01-01 06:00:00
557,79,JFK,MCO,140,944,B6,2013-01-01,on_time,2013-01-01 06:00:00
558,301,LGA,ORD,138,733,AA,2013-01-01,on_time,2013-01-01 06:00:00


In [19]:
flight_data %>% 
  count(arr_delay) %>% 
  mutate(prop = n/sum(n)) #Creates table based on prev result 

arr_delay,n,prop
<fct>,<int>,<dbl>
late,52802,0.1613033
on_time,274544,0.8386967


Flight number and time_hour are identifiers and not wanted in model training.

In [20]:
flight_data %>% 
  skimr::skim(dest, carrier) 

── Data Summary ────────────────────────
                           Values    
Name                       Piped data
Number of rows             327346    
Number of columns          10        
_______________________              
Column type frequency:               
  factor                   2         
________________________             
Group variables            None      

── Variable type: factor ──
  skim_variable n_missing complete_rate ordered n_unique top_counts
1 dest                  0             1 FALSE        104 ATL: 16837, ORD: 16566, LAX: 16026, BOS: 15022
2 carrier               0             1 FALSE         16 UA: 57782, B6: 54049, EV: 51108, DL: 47658


“'length(x) = 7 > 1' in coercion to 'logical(1)'”




In [21]:
# Fix the random numbers by setting the seed 
# This enables the analysis to be reproducible when random numbers are used 
set.seed(222)
# Put .75 of the data into the training set 
data_split <- initial_split(flight_data, prop = .75)

# Create data frames for the two sets:
train_data <- training(data_split)
test_data  <- testing(data_split)

In [22]:
# Instead of the standard model formula with ~, use a recipe:

flights_rec <- 
  recipe(arr_delay ~ ., data = train_data) %>%
  # Keep flight and time_hour for identification only; they are not used as predictors
  update_role(flight, time_hour, new_role = "ID") %>%
  # Derive day-of-week and month predictors from date
  step_date(date, features = c("dow", "month")) %>%
  # Add holiday indicator columns, then drop the original date column
  step_holiday(date, 
               holidays = timeDate::listHolidays("US"), 
               keep_original_cols = FALSE) %>%
  # Create dummy variables for all nominal predictors
  step_dummy(all_nominal_predictors()) %>%
  # Remove zero-variance (constant) columns
  step_zv(all_predictors())


In [23]:
prep(flights_rec)

── Recipe ──

── Inputs 

Number of variables by role

outcome:   1
predictor: 7
ID:        2

── Training information 

Training data contained 245509 data points and no incomplete rows.

── Operations 

• Date features from: date | Trained
• Holiday features from: date | Trained
• Dummy variables from: origin, dest, carrier, date_dow, date_month | Trained
• Zero variance filter removed: <none> | Trained



Recipes do not automatically convert factors into numeric dummy variables; this has to be done explicitly with step_dummy(). Because dest and carrier have some rarely occurring levels, a level can appear only in the test set and not in the training set, in which case its dummy variable is constant (all zeros) during training. step_zv() removes such constant predictor columns so the model is not given a useless predictor.

In [None]:
#Use a model workflow to combine a model and recipe. (With caret can simply replace formula with recipe)

lr_mod <- 
  logistic_reg() %>% 
  set_engine("glm")

flights_wflow <- 
  workflow() %>% #Create workflow
  add_model(lr_mod) %>% 
  add_recipe(flights_rec)

flights_wflow

In [None]:
flights_fit <- 
  flights_wflow %>% 
  fit(data = train_data)

#predict(flights_fit, test_data)

# RFE Example

In [None]:
library(caret)
library(mlbench)
library(Hmisc)
library(randomForest)

In [None]:
n <- 100
p <- 40
sigma <- 1
set.seed(1)
sim <- mlbench.friedman1(n, sd = sigma) #Produces a X matrix and Y Vector
colnames(sim$x) <- c(paste("real", 1:5, sep = ""), #Create col names real1,real2 etc
                     paste("bogus", 1:5, sep = ""))
bogus <- matrix(rnorm(n * p), nrow = n)

colnames(bogus) <- paste("bogus", 5+(1:ncol(bogus)), sep = "") #Name the 40 extra noise columns bogus6..bogus45

x <- cbind(sim$x, bogus)
y <- sim$y

In [None]:
normalization <- preProcess(x) # Default pre-processing is to center and scale

x <- predict(normalization, x)
x <- as.data.frame(x)

subsets <- c(1:5, 10, 15, 20, 25)

In [None]:
set.seed(10)

# Use the linear-model helper functions (lmFuncs) with 10-fold cv (number), repeated 10 times (repeats)
ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   number = 10,
                   repeats = 10,
                   verbose = FALSE)

lmProfile <- rfe(x, y,
                 sizes = subsets,
                 rfeControl = ctrl)

lmProfile

In [None]:
#Final model
lmProfile$fit

In [None]:
trellis.par.set(caretTheme())
plot(lmProfile, type = c("g", "o"))

In [None]:
lmProfile$resample

In [None]:
densityplot(lmProfile$resample)

# SVM Example

In [5]:
library(caret)
library(recipes)
library(dplyr)
library(QSARdata)

data(AquaticTox)
tox <- AquaticTox_moe2D

typeof(AquaticTox_moe2D)

tox$Activity <- AquaticTox_Outcome$Activity
nrow(tox)

tox

Molecule,moeGao_Abra_L,moeGao_Abra_R,moeGao_Abra_acidity,moeGao_Abra_basicity,moeGao_Abra_pi,moe2D_BCUT_PEOE_0,moe2D_BCUT_PEOE_1,moe2D_BCUT_PEOE_2,moe2D_BCUT_PEOE_3,⋯,moe2D_vdw_vol,moe2D_vsa_acc,moe2D_vsa_don,moe2D_vsa_hyd,moe2D_vsa_other,moe2D_vsa_pol,moe2D_weinerPath,moe2D_weinerPol,moe2D_zagreb,Activity
<fct>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<int>,<int>,<int>,<dbl>
(d)-limonene,4.729,0.512,0.030,0.126,0.330,-2.696,-0.009128,0.04606,2.639,⋯,232.50,0.000,0,158.10,0.000,0.000,120,11,46,5.29
111-trichloro-2-methyl-2-propanolol(chlorobytanol),4.226,0.527,0.375,0.452,0.712,-2.556,-0.080280,0.05082,2.615,⋯,168.00,0.000,0,147.60,0.000,13.570,58,9,38,3.12
111-trichloroethane,2.790,0.383,0.030,0.101,0.469,-2.262,-0.083670,0.02702,2.350,⋯,110.60,0.000,0,118.20,0.000,0.000,16,0,20,3.40
1122-tetrachloroethane,3.636,0.496,0.168,0.078,0.577,-2.235,-0.101800,0.01105,2.377,⋯,126.60,0.000,0,118.30,13.660,0.000,29,4,22,3.92
112-trichloroethane,3.078,0.401,0.099,0.081,0.509,-2.260,-0.091020,0.03334,2.350,⋯,110.60,0.000,0,106.90,6.831,0.000,18,2,16,3.21
11-dichloroethylene(vinylidene,2.225,0.412,0.030,0.076,0.407,-1.806,-0.415800,0.40630,1.820,⋯,88.76,0.000,0,88.57,6.831,0.000,9,0,12,2.84
1234-tetrachlorobenzene,5.945,1.196,0.030,0.093,0.904,-2.089,-0.686300,0.65950,2.130,⋯,192.70,0.000,0,168.00,0.000,0.000,109,14,48,5.29
123-trichlorobenzene,5.228,1.061,0.030,0.131,0.852,-2.104,-0.680300,0.67320,2.116,⋯,176.70,0.000,0,149.70,0.000,0.000,82,11,42,4.89
123-trichloropropane,3.742,0.430,0.049,0.108,0.559,-2.422,-0.067810,0.04442,2.479,⋯,135.10,0.000,0,129.50,0.000,0.000,31,4,20,3.41
1245-tetrachlorobenzene,6.026,1.196,0.030,0.039,0.947,-2.087,-0.678900,0.65340,2.129,⋯,192.70,0.000,0,168.00,0.000,0.000,111,13,48,5.83


In [2]:
tox <- tox %>%
  select(-Molecule) %>%
  ## Suppose the ease of manufacturability is 
  ## related to the molecular weight of the compound
  mutate(manufacturability  = 1/moe2D_Weight) %>%
  mutate(manufacturability = manufacturability/sum(manufacturability))

In [3]:
# Summary function that adds a weighted RMSE (wRMSE) to the default statistics

model_stats <- function(data, lev = NULL, model = NULL) {
  
  stats <- defaultSummary(data, lev = lev, model = model)
  
  wt_rmse <- function (pred, obs, wts, na.rm = TRUE) 
  sqrt(weighted.mean((pred - obs)^2, wts, na.rm = na.rm))
  
  res <- wt_rmse(pred = data$pred,
                 obs = data$obs, 
                 wts = data$manufacturability)
  c(wRMSE = res, stats)
}
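
To see what the weighting does, the inner helper can be tried on its own with made-up numbers. This is a small sketch of the arithmetic only; the vectors below are not from the AquaticTox data.

```r
# Same definition as the helper inside model_stats
wt_rmse <- function(pred, obs, wts, na.rm = TRUE)
  sqrt(weighted.mean((pred - obs)^2, wts, na.rm = na.rm))

pred <- c(1, 2, 3)
obs  <- c(1, 2, 5)   # only the third prediction is wrong (error of 2)

# Equal weights reduce to the ordinary RMSE: sqrt((0 + 0 + 4) / 3)
equal <- wt_rmse(pred, obs, wts = c(1, 1, 1))
equal

# Giving the badly predicted point zero weight removes its error entirely
down_weighted <- wt_rmse(pred, obs, wts = c(1, 1, 0))
down_weighted
```

In the notebook's summary function the weights come from the manufacturability column, so compounds that are easier to manufacture count more towards the metric.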

In [11]:
tox_recipe <- recipe(Activity ~ ., data = tox) %>%
#   add_role(manufacturability, new_role = "performance var") %>%
  step_nzv(all_predictors()) %>% #Remove sparse and unbalanced preds
  #Reduce dimension of vsa cols into surf_area.. cols
  step_pca(contains("VSA"), prefix = "surf_area_",  threshold = .95) %>%
  #Remove closely corr pred except from surf_area
  step_corr(all_predictors(), -starts_with("surf_area_"), threshold = .90) %>%
  # Center and scale
  step_center(all_predictors()) %>%
  step_scale(all_predictors())

In [16]:
tox_ctrl <- trainControl(method = "cv")
set.seed(888)
# tox_svm <- train(x, y,
#                  method = "svmRadial", 
#                  metric = "wRMSE",
#                  maximize = FALSE,
#                  tuneLength = 10,
#                  trControl = tox_ctrl,
#                 tuneGrid = expand.grid(C=seq(10,20,length=10),sigma=0.01150696))

# tox_svm

caretFuncs$fit <- function(x,y,first,last,...) train(x, y,
                 method = "svmRadial", 
                 maximize = FALSE,
                 trControl = tox_ctrl,
                tuneGrid = expand.grid(C=12,sigma=0.01150696))


ctrl <- rfeControl(functions = caretFuncs,
                   method = "repeatedcv",
                   repeats = 5,
                   verbose = FALSE)

svmProfile <- rfe(tox_recipe,
                  data = tox,
                 sizes = 2:30,
                 rfeControl = ctrl)

svmProfile

“zero-width neighborhood. make span bigger”
“zero-width neighborhood. make span bigger”
“pseudoinverse used at -0.33036”
“neighborhood radius 0.16727”
“reciprocal condition number  3.2209e-18”



Recursive feature selection

Outer resampling method: Cross-Validated (10 fold, repeated 5 times) 

Resampling performance over subset size:

 Variables   RMSE Rsquared    MAE RMSESD RsquaredSD   MAESD Num_Resamples
         2 0.7439   0.6774 0.5564 0.1311    0.09716 0.09472            50
         3 0.7456   0.6778 0.5571 0.1306    0.09399 0.09159            50
         4 0.7385   0.6822 0.5397 0.1317    0.09453 0.09676            50
         5 0.7324   0.6887 0.5354 0.1300    0.09185 0.09186            50
         6 0.7267   0.6929 0.5305 0.1289    0.09072 0.09038            50
         7 0.7257   0.6934 0.5293 0.1287    0.09140 0.09002            50
         8 0.7225   0.6969 0.5298 0.1227    0.08749 0.08759            50
         9 0.7180   0.7012 0.5236 0.1239    0.08653 0.08731            50
        10 0.7218   0.6986 0.5225 0.1259    0.08777 0.08811            50
        11 0.7224   0.6988 0.5213 0.1268    0.08827 0.08798            50
        12 0.7177   0.7032 0.5151 0.1243   