In [1]:
df = read.csv('http://cssbook.net/d/mediause.csv')
model = lm(formula = 'newspaper ~ age + gender', data = df)
# summary(model) would give a lot more info, but we only care about the coefficients:
model


Call:
lm(formula = "newspaper ~ age + gender", data = df)

Coefficients:
(Intercept)          age       gender  
   -0.08956      0.06762      0.17666  


In [2]:
gender = c(1,0)
age = c(20,40)
newdata = data.frame(age, gender)
predict(model, newdata)
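
The prediction above can be checked by hand, because a linear model's prediction is the intercept plus each coefficient times its feature value. The sketch below copies the rounded coefficients from the printed model, so it may differ from `predict()` in the last decimals. The name `manual_pred` is introduced here for illustration only.

```r
# Coefficients copied (rounded) from the printed lm output
coefs = c(intercept = -0.08956, age = 0.06762, gender = 0.17666)

# The same two hypothetical cases as in the cell above
newdata = data.frame(age = c(20, 40), gender = c(1, 0))

# Prediction = intercept + b_age * age + b_gender * gender
manual_pred = unname(coefs["intercept"] +
                     coefs["age"] * newdata$age +
                     coefs["gender"] * newdata$gender)
print(manual_pred)
```

With these rounded coefficients, the first case (age 20, gender 1) comes out at about 1.44 and the second (age 40, gender 0) at about 2.62.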

In [3]:
library(tidyverse)
library(rsample)
library(glue)

df = read.csv('http://cssbook.net/d/mediause.csv')
df = na.omit(df %>% mutate(usesinternet=recode(internet, .default=TRUE, `0`=FALSE)))

set.seed(42)
df$usesinternet = as.factor(df$usesinternet)
print("How many people used online news at all?")
print(table(df$usesinternet))


split = initial_split(df, prop = .8)
traindata = training(split)
testdata  = testing(split)

X_train = select(traindata, c('age', 'gender', 'education'))
y_train = traindata$usesinternet
X_test = select(testdata, c('age', 'gender', 'education'))
y_test = testdata$usesinternet

print(glue("We have {nrow(X_train)} training and {nrow(X_test)} test cases."))

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   1.0.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘glue’

The following object is masked from ‘package:dplyr’:

    collapse



[1] "How many people used online news at all?"

FALSE  TRUE 
  803  1262 
We have 1653 training and 412 test cases.
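
To make the idea of a train/test split concrete, here is a minimal base R sketch that draws a random 80% of the row indices for training and keeps the rest for testing. It uses a small synthetic data frame (`toy`, a hypothetical stand-in for the survey data) and is not meant to reproduce how `rsample::initial_split` works internally.

```r
set.seed(42)

# Synthetic stand-in for the survey data (hypothetical values)
toy = data.frame(id = 1:100, age = sample(18:90, 100, replace = TRUE))

# Draw 80% of the row indices at random for training
n_train = floor(0.8 * nrow(toy))
train_idx = sample(nrow(toy), n_train)

toy_train = toy[train_idx, ]
toy_test  = toy[-train_idx, ]   # all rows that were not drawn

cat("Training cases:", nrow(toy_train), "Test cases:", nrow(toy_test), "\n")
```

The key property is that no case appears in both sets, so performance on the test set is measured on data the classifier has not seen.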


In [4]:
library(caret)
library(naivebayes)

myclassifier = train(x = X_train, y = y_train, method = "naive_bayes")
y_pred = predict(myclassifier, newdata = X_test)

Loading required package: lattice

Attaching package: ‘caret’

The following object is masked from ‘package:purrr’:

    lift

naivebayes 0.9.6 loaded


In [5]:
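print(confusionMatrix(y_pred, y_test))

print("Confusion matrix (rows = actual, columns = predicted):")
confmat = table(actual = testdata$usesinternet, predicted = y_pred)
print(confmat)

# Precision: of all cases predicted as a class, which share really belongs to it?
# Predictions are in the columns, so we divide by the column sums.
print('Precision for predicting non-internet users (FALSE) and internet users (TRUE), respectively:')
precision = diag(confmat) / colSums(confmat)
print(precision)

# Recall: of all cases that really belong to a class, which share did we find?
# Actual values are in the rows, so we divide by the row sums.
print('Recall for predicting non-internet users (FALSE) and internet users (TRUE), respectively:')
recall = diag(confmat) / rowSums(confmat)
print(recall)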

Confusion Matrix and Statistics

          Reference
Prediction FALSE TRUE
     FALSE    39   50
     TRUE     98  225
                                          
               Accuracy : 0.6408          
                 95% CI : (0.5924, 0.6872)
    No Information Rate : 0.6675          
    P-Value [Acc > NIR] : 0.8849694       
                                          
                  Kappa : 0.1128          
                                          
 Mcnemar's Test P-Value : 0.0001118       
                                          
            Sensitivity : 0.28467         
            Specificity : 0.81818         
         Pos Pred Value : 0.43820         
         Neg Pred Value : 0.69659         
             Prevalence : 0.33252         
         Detection Rate : 0.09466         
   Detection Prevalence : 0.21602         
      Balanced Accuracy : 0.55143         
                                          
       'Positive' Class : FALSE           
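
The statistics reported by `confusionMatrix` can be recomputed by hand from the four cell counts, which may help in reading the output. The sketch below retypes the counts printed above (rows are predictions, columns are the reference values, as in the caret output). The variable names are chosen here for illustration.

```r
# Counts copied from the printed confusion matrix (filled column by column)
cm = matrix(c(39, 98, 50, 225), nrow = 2,
            dimnames = list(Prediction = c("FALSE", "TRUE"),
                            Reference  = c("FALSE", "TRUE")))

accuracy  = sum(diag(cm)) / sum(cm)      # correct predictions / all cases
precision = diag(cm) / rowSums(cm)       # per class: correct / all predicted as that class
recall    = diag(cm) / colSums(cm)       # per class: correct / all actually in that class
nir       = max(colSums(cm)) / sum(cm)   # share of the largest actual class

print(round(c(accuracy = accuracy, no_information_rate = nir), 4))
print(round(precision, 4))
print(round(recall, 4))
```

Because the positive class is `FALSE`, caret's "Sensitivity" corresponds to the recall for `FALSE` and "Pos Pred Value" to the precision for `FALSE`. Note that the accuracy is below the no information rate: always predicting `TRUE` would have scored slightly better on this test set.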
                     

In [6]:
library(tidyverse)
library(caret)

# Note: caret's method = "logreg" is Logic Regression (package LogicReg), which only
# accepts binary predictors and therefore fails on features such as age and education.
# For logistic regression, use method = "glm" with a binomial family instead.
myclassifier = train(x = X_train, y = y_train, method = "glm", family = "binomial")
y_pred = predict(myclassifier, newdata = X_test)


In [9]:
library(tidyverse)
library(caret)
library(LiblineaR)

# !!! We standardize our features to have M = 0 and SD = 1, which we do with the preProcess argument
# This matters because our features are not measured on the same scale, and SVMs are sensitive to feature scale
# It may also be OK to rescale to a range of [0, 1] or [-1, 1]

myclassifier = train(x = X_train, y = y_train,  preProcess = c("center", "scale"), method = "svmLinear3")
y_pred = predict(myclassifier, newdata = X_test)
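
To illustrate what centering and scaling does, here is a minimal base R sketch on synthetic values (the `toy_train` and `toy_test` data frames are hypothetical). It assumes the usual practice of estimating the mean and standard deviation on the training data only and reusing them for the test data. It is not a description of caret's internals.

```r
# Synthetic features on very different scales (hypothetical values)
toy_train = data.frame(age = c(20, 35, 50, 65), education = c(1, 3, 4, 6))
toy_test  = data.frame(age = c(30, 60),         education = c(2, 5))

# Standardize the training features: subtract the mean, divide by the SD
train_scaled = scale(toy_train)
train_means = attr(train_scaled, "scaled:center")
train_sds   = attr(train_scaled, "scaled:scale")

# Apply the *training* means and SDs to the test features
test_scaled = scale(toy_test, center = train_means, scale = train_sds)

print(round(colMeans(train_scaled), 10))   # means of the scaled training features
print(apply(train_scaled, 2, sd))          # SDs of the scaled training features
print(test_scaled)
```

After scaling, each training feature has mean 0 and SD 1, so a large-valued feature such as age no longer dominates a small-valued one such as education.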

In [None]:
library(tidyverse)
library(caret)
library(randomForest)

myclassifier = train(x = X_train, y = y_train, method = "rf")
y_pred = predict(myclassifier, newdata = X_test)
