% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/outliertree.R
\name{predict.outliertree}
\alias{predict.outliertree}
\title{Predict method for Outlier Tree}
\usage{
\method{predict}{outliertree}(
  object,
  newdata,
  outliers_print = 15L,
  min_decimals = 2L,
  return_outliers = TRUE,
  nthreads = object$nthreads,
  ...
)
}
\arguments{
\item{object}{An Outlier Tree object as returned by `outlier.tree`.}
\item{newdata}{A data frame in which to look for outliers according to the fitted model.}
\item{outliers_print}{How many outliers to print. Pass zero or `NULL` to avoid printing them. At least
one of `outliers_print` and `return_outliers` must be enabled.}
\item{min_decimals}{Minimum number of decimals to use when printing numeric values for the flagged
outliers. The number of decimals will be dynamically increased according to the relative magnitudes of the
values being reported. Ignored when passing `outliers_print=0` or `outliers_print=FALSE`.}
\item{return_outliers}{Whether to return the outliers in an R object (otherwise they are only printed).}
\item{nthreads}{Number of parallel threads to use. Parallelization is done by rows.}
\item{...}{Not used.}
}
\value{
If passing `return_outliers = TRUE`, returns a list of lists with the outliers and their
information: each row of the input data frame is an entry in the outer list, named after the
corresponding row. The result can be printed in a human-readable format after the fact through
`print` or `summary` (they do the same thing).
Otherwise, does not return anything, but prints the outliers if any are detected.
Note that, although printing the returned object in the console displays a short summary of only
some observations, the object contains information for all rows and can be subsetted
to obtain the information for a specific row.
}
\description{
Predict method for Outlier Tree
}
\details{
After loading a serialized `outlier.tree` object through `readRDS` or `load`, the underlying C++ object
is only de-serialized upon the first call to `predict` or `print`, so that first call will be slower,
while subsequent calls will be faster as the C++ object will already be in memory.
}
\examples{
library(outliertree)
### random data frame with an obvious outlier
nrows = 100
set.seed(1)
df = data.frame(
    numeric_col1 = c(rnorm(nrows - 1), 1e6),
    numeric_col2 = rgamma(nrows, 1),
    categ_col = sample(c('categA', 'categB', 'categC'),
                       size = nrows, replace = TRUE)
)
### test data frame with another obvious outlier
nrows_test = 50
df_test = data.frame(
    numeric_col1 = rnorm(nrows_test),
    numeric_col2 = c(-1e6, rgamma(nrows_test - 1, 1)),
    categ_col = sample(c('categA', 'categB', 'categC'),
                       size = nrows_test, replace = TRUE)
)
### fit model on training data
outliers_model = outlier.tree(df, outliers_print=FALSE, nthreads=1)
### find the test outlier
test_outliers = predict(outliers_model, df_test,
                        outliers_print = 1, return_outliers = TRUE,
                        nthreads = 1)
### retrieve the outlier info (for row 1) as an R list
test_outliers[[1]]
### to turn it into a 6-column table:
# dt = t(data.table::as.data.table(test_outliers))
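
### a minimal sketch (not part of the original example) of the note
### under Details: the model is saved and reloaded, and the first call
### to 'predict' on the reloaded object is the one that de-serializes
### the underlying C++ object. The names 'tmp_file', 'model_reloaded'
### and 'reloaded_outliers' are illustrative only.
tmp_file = tempfile(fileext = ".Rds")
saveRDS(outliers_model, tmp_file)
model_reloaded = readRDS(tmp_file)
reloaded_outliers = predict(model_reloaded, df_test,
                            outliers_print = 0, return_outliers = TRUE,
                            nthreads = 1)
unlink(tmp_file)
### the same row can also be looked up by name, assuming 'df_test'
### kept its default row names ("1", "2", ...)
reloaded_outliers[["1"]]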
}
\seealso{
\link{outlier.tree} \link{print.outlieroutputs}
}