# auditor - introduction to model audit

Diagnostic analysis is well researched and commonly used to validate linear models, but it is often neglected for complex black-box models.

This notebook is a gentle introduction to the `auditor` package, which provides a uniform interface to statistics and visualizations that facilitate assessing and comparing the goodness of fit, performance, diagnostics, and similarity of models of any kind.

For more detailed descriptions of the methodology and functionality, see the [auditor webpage](https://mi2-warsaw.github.io/auditor/index.html).

## Regression use case - apartments data

To illustrate how auditor applies to regression problems, we use the artificial dataset `apartments`, available in the `DALEX` package. Our goal is to predict the price per square meter of an apartment from five features: construction year, surface, floor, number of rooms, and district. Four of these variables are numeric, while the fifth (district) is categorical. Prices are given in euros.

In [None]:
library(DALEX)

In [None]:
data(apartments)
data(apartmentsTest)
head(apartments)

## Models

We fit two models: a linear model and a random forest.

In [None]:
lm_model <- lm(m2.price ~ construction.year + surface + floor + no.rooms + district, data = apartments)

In [None]:
library("randomForest")
set.seed(59)
rf_model <- randomForest(m2.price ~ construction.year + surface + floor +  no.rooms + district, data = apartments)

## Preparation for error analysis

Each analysis begins with creating a `modelAudit` object, which contains the metadata required for further analysis.

In [None]:
library("auditor")

lm_audit <- audit(lm_model, label = "lm", data = apartmentsTest, y = apartmentsTest$m2.price)
rf_audit <- audit(rf_model, label = "rf", data = apartmentsTest, y = apartmentsTest$m2.price)
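Both calls above pass a fitted model together with held-out data and the observed response, which is the information needed to compute residuals. The following is a minimal base R sketch of that idea, not the internals of `audit`. It uses a small simulated data frame (hypothetical values, not the DALEX dataset) and assumes the usual convention that a residual is the observed value minus the predicted value.

```r
# Simulated stand-in for the apartments data (hypothetical values)
set.seed(1)
n <- 300
sim <- data.frame(
  surface  = runif(n, 20, 150),
  no.rooms = sample(1:6, n, replace = TRUE),
  district = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
sim$m2.price <- 5000 - 10 * sim$surface +
  300 * (sim$district == "A") + rnorm(n, sd = 200)

# Split into training and held-out parts
train <- sim[1:200, ]
test  <- sim[201:300, ]

fit <- lm(m2.price ~ surface + no.rooms + district, data = train)

# Residuals on held-out data: observed minus predicted
pred <- predict(fit, newdata = test)
res  <- test$m2.price - pred

# Root mean squared error as a single-number summary of the residuals
rmse <- sqrt(mean(res^2))
rmse
```

Evaluating on data the model has not seen, as the notebook does with `apartmentsTest`, keeps the residuals from being optimistically small for flexible models.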

## Model audit

We now give a short overview of the functionality of `auditor`.

### Plotting residuals
Calling `plot` on a `modelAudit` object with `type = "Residual"` produces a plot of residuals against fitted values.

In [None]:
plot(rf_audit, type = "Residual")

Plots can also be obtained with dedicated functions named `plot[Type]`. For the plot above, this is `plotResidual`.

In [None]:
plotResidual(rf_audit)

It is also possible to compare different models.

In [None]:
plotResidual(rf_audit, lm_audit)

You can always find more details in the function documentation.

In [None]:
?plotResidual
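To see what a residuals-versus-fitted plot can reveal, here is a small base R sketch that does not use `auditor`. The data are simulated and the two models are hypothetical: one is deliberately misspecified, so its residuals show a curved pattern, while the other matches the data-generating process. Unlike the audit objects above, this sketch uses in-sample residuals for brevity.

```r
# Simulated data with a quadratic relationship (hypothetical example)
set.seed(2)
x <- runif(200, 0, 10)
y <- 2 + 0.5 * x^2 + rnorm(200)

fit_lin  <- lm(y ~ x)            # misspecified: misses the curvature
fit_quad <- lm(y ~ x + I(x^2))   # matches the data-generating process

# Side-by-side residuals vs fitted values
op <- par(mfrow = c(1, 2))
plot(fitted(fit_lin), resid(fit_lin),
     xlab = "Fitted values", ylab = "Residuals", main = "Linear")
abline(h = 0, lty = 2)
plot(fitted(fit_quad), resid(fit_quad),
     xlab = "Fitted values", ylab = "Residuals", main = "Quadratic")
abline(h = 0, lty = 2)
par(op)
```

A systematic shape around the zero line, as in the left panel, suggests structure the model has not captured; a patternless cloud is what a well-specified model should give.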

### Observed vs Predicted
This plot shows the predicted response against the observed response or against the values of a selected variable.
The black line corresponds to y = x.

In [None]:
plot(rf_audit, type = "Prediction")
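The same kind of plot can be sketched in base R to make the role of the y = x line explicit. This is an illustration on simulated data (hypothetical, not the apartments data) and not how `auditor` draws its plot: points on the line are predicted exactly, and vertical distance from the line is the prediction error.

```r
# Simulated data, split into training and held-out parts (hypothetical example)
set.seed(3)
x <- runif(150, 0, 10)
y <- 1 + 2 * x + rnorm(150, sd = 2)
train <- data.frame(x = x[1:100],   y = y[1:100])
test  <- data.frame(x = x[101:150], y = y[101:150])

fit  <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = test)

# Observed on the horizontal axis, predicted on the vertical axis
plot(test$y, pred, xlab = "Observed", ylab = "Predicted")
abline(a = 0, b = 1)  # reference line y = x
```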

### Density of residuals
You can also plot the density of residuals.

In [None]:
plotResidualDensity(rf_audit, lm_audit)

In [None]:
plotResidualDensity(lm_audit, rf_audit, variable = "district")
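The second call above passes `variable = "district"`. Assuming this groups the residuals by the levels of that variable, the idea can be sketched in base R with `density()`. The data, the group names `North` and `South`, and the model below are all hypothetical; the model deliberately ignores the grouping variable so that the per-group densities separate.

```r
# Simulated data with a group effect the model ignores (hypothetical example)
set.seed(4)
n <- 400
district <- factor(sample(c("North", "South"), n, replace = TRUE))
x <- runif(n, 0, 10)
y <- 3 + x + ifelse(district == "South", 2, 0) + rnorm(n)

fit <- lm(y ~ x)   # district is left out on purpose
res <- resid(fit)

# Kernel density estimates: all residuals, then one per group
d_all <- density(res)
d_by  <- lapply(split(res, district), density)

plot(d_all, main = "Residual density", xlab = "Residual")
lines(d_by$North, lty = 2)
lines(d_by$South, lty = 3)
legend("topright", legend = c("all", "North", "South"), lty = 1:3)
```

Group densities that are shifted relative to each other hint that the model is missing information carried by the grouping variable.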

### Other types of plots
auditor provides many different types of plots.
Besides the examples above, you can set `type` to 'ACF', 'Autocorrelation', 'CumulativeGain', 'CooksDistance', 'HalfNormal', 'LIFT', 'ModelPCA', 'ModelRanking', 'ModelCorrelation', 'REC', 'Residual', 'ROC', or 'RROC'.

In [None]:
plot(lm_audit, type = "ACF")  # replace "ACF" with any of the types listed above
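Two of the diagnostics named in the list, the autocorrelation function of residuals and Cook's distance, also have counterparts in base R's `stats` package. The sketch below uses those base functions on simulated data with one artificial outlier. It illustrates the quantities involved and is not a description of how `auditor` computes them.

```r
# Simulated linear data with one artificial outlier (hypothetical example)
set.seed(5)
x <- runif(100, 0, 10)
y <- 1 + 2 * x + rnorm(100)
y[100] <- y[100] + 20   # shift the last observation far from the trend

fit <- lm(y ~ x)

# Autocorrelation of residuals, taken in the order of the data
res_acf <- acf(resid(fit), plot = FALSE)
res_acf$acf[2]          # lag-1 autocorrelation (element 1 is lag 0)

# Cook's distance: influence of each observation on the fitted model
cd <- cooks.distance(fit)
which.max(cd)           # index of the most influential observation
```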