diff --git a/DESCRIPTION b/DESCRIPTION new file mode 100644 index 0000000..3b08d5f --- /dev/null +++ b/DESCRIPTION @@ -0,0 +1,16 @@ +Package: IPWboxplot +Type: Package +Title: Adapted Boxplot to Missing Observations +Version: 0.1.0 +Author: Ana Maria Bianco [aut], Graciela Boente [aut], Ana Perez-Gonzalez [aut] +Maintainer: Ana Perez-Gonzalez +Description: Boxplots adapted to the happenstance of missing observations where drop-out probabilities can be given by the practitioner or modelled using auxiliary covariates. The paper of "Zhang, Z., Chen, Z., Troendle, J. F. and Zhang, J.(2012) ", proposes estimators of marginal quantiles based on the Inverse Probability Weighting method. +Imports: isotone +Suggests: mice, knitr, rmarkdown +License: GPL (>= 2) +NeedsCompilation: no +Encoding: UTF-8 +Repository: CRAN +VignetteBuilder: knitr +Packaged: 2019-01-02 10:22:53 UTC; anapg +Date/Publication: 2019-01-02 13:30:10 UTC diff --git a/MD5 b/MD5 new file mode 100644 index 0000000..5e3662d --- /dev/null +++ b/MD5 @@ -0,0 +1,13 @@ +2a50f16a05b483c7e7b93177c6bcf060 *DESCRIPTION +da40518d85ad7191075c56cb91556d5b *NAMESPACE +411584c0fa2f2e8b268661198764711b *R/IPW_ASYM_Boxplot.R +aefc8c2b35aab90d53e7461422efcd06 *R/IPW_boxplot.R +b86c1d252cc3549353fa23b93c4bf0a4 *R/IPW_quantile.R +83a3d493af55fce379fb605bdcad958f *build/vignette.rds +bbad84c1d78db08d7e77b8ce31fd759b *inst/doc/my-vignette.R +e6f5503498bf8994c0df926c19040190 *inst/doc/my-vignette.Rmd +f7e15eee051677129e6f7a9792f0c76c *inst/doc/my-vignette.pdf +44a388c10e4ffa63c6580f0e5215e520 *man/IPW_ASYM_Boxplot.Rd +97840792501b47416339b6adea3d7813 *man/IPW_boxplot.Rd +2367edb2015f89943668b791e250807a *man/IPW_quantile.Rd +e6f5503498bf8994c0df926c19040190 *vignettes/my-vignette.Rmd diff --git a/NAMESPACE b/NAMESPACE new file mode 100644 index 0000000..4432ac8 --- /dev/null +++ b/NAMESPACE @@ -0,0 +1,5 @@ +exportPattern("^[[:alpha:]]+") + + importFrom("graphics", "axis", "lines", "plot", "points") + importFrom("stats", "glm") + 
importFrom(isotone,weighted.fractile) diff --git a/R/IPW_ASYM_Boxplot.R b/R/IPW_ASYM_Boxplot.R new file mode 100644 index 0000000..bb55509 --- /dev/null +++ b/R/IPW_ASYM_Boxplot.R @@ -0,0 +1,318 @@ +library(isotone) + +################################################################################################ +# FUNCTION USED TO DRAW THE BOXPLOTS ADAPTED TO MISSING VALUES AND SKEWED DISTRIBUTIONS # +################################################################################################ +IPW.ASYM.boxplot=function(y,px=NULL,x=NULL,graph=c("IPW","both"),names=c("IPW Asymmetric Boxplot", "NAIVE Asymmetric Boxplot"), size.letter=1.2, + method=c("quartile","octile"), ctea=-4, cteb=3, lim.inf=NULL,lim.sup=NULL,main=" ",xlab = " ", ylab =" ",color="black") +{ + ############################################################################################################################################ + # Arguments + # + # y Required. Numerical vector of values with possible missing values codified NA or NAN with length n. + # px Optional. Numerical vector of probabilities. If not provided a logistic fit is performed using x + # x Optional. The matrix of fully observed variables used to estimate the missing model with dimension nrows=n and ncol=dimension. + # graph Optional. Character string indicating if the plot contains two boxplots ("both") or + # only the boxplot computed with the inversely probability weighted quantiles("IPW"). + # The default is "IPW". + # names Optional. Character string to name the boxplots. + # The default is "IPW Asymmetric Boxplot" when graph="IPW" and + # c("IPW Asymmetric Boxplot", "NAIVE Asymmetric Boxplot") when graph= "both" + # size.letter Optional. The font size of names. + # method Optional. Character string indicating if the measure of asymmetry is based on the quartiles ("quartile") or the octiles ("octile"). + # The default is "quartile". + # ctea, cteb Optional. 
Scaling factors multiplied by the asymmetry measure to determine the outlier boundaries. + # When ctea=cteb=0 the IPW boxplot is obtained. ctea is a negative value while cteb is positive. + # The default ones correspond to the choices in Hubert and Vandervieren (2008) when using the medcouple. + # lim.inf Optional. The lower limit of the plot if supplied by the user. + # lim.sup Optional. The upper limit of the plot if supplied by the user. + # main Optional. Character string to title the plot. By default no main title is given. + # xlab Optional. Character string to indicate the label of the horizontal axis. + # ylab Optional. Character string to indicate the label of the vertical axis. + # color Optional. Color for the IPW Boxplot. + ############################################################################################################################################ + + ############################################################################################################################################ + # Value + # + # px Numerical vector of probabilities. + # IPW.Quartiles Numerical vector of inverse probability weighted quartiles. + # IPW.whisker Numerical vector of lower and upper whiskers calculated from IPW quantiles. + # out.IPW Numerical vector of data points detected as atypical by the IPW boxplot. + # SKEW.IPW Skewness measure based on the IPW quartiles (method="quartile") or IPW octiles (method="octile"). + # NAIVE.Quartiles Numerical vector of naive quartiles computed from the subset of non-missing values of y. Returned only when graph="both". + # NAIVE.whisker Numerical vector of lower and upper whiskers obtained from the naive quantiles. Returned only when graph="both". + # out.NAIVE Numerical vector of data points detected as atypical by the naive boxplot. Returned only when graph="both". 
+ # SKEW.NAIVE Skewness measure based on the naive quartiles (method="quartile") + # or naive octiles (method="octile"), computed from the subset of non-missing values of y. + # Returned only when graph="both". + ######################################################################## + + ######################################################################### + #---Preliminary checks--- + ######################################################################### + + dimension=NCOL(x) + + if (is.null(x) && is.null(px)) + stop("ERROR: It is necessary to supply the vector of dropout probabilities for each observation or a covariate to estimate them") + + if (!is.null(px)) + { + # Check for NA/NaN first: min(px) would itself be NA if px had missing values + if (anyNA(px)) + stop(" ERROR: px has missing values") + if (min(px)<=0) + stop("ERROR: px should take positive values") + if (NROW(y)!=NROW(px)) + stop("ERROR: 'y' and 'px' have different lengths") + } + + + if (is.null(px)) + { + if (NROW(y)!=NROW(x)) + stop("ERROR: 'y' and 'x' have different lengths") + if (anyNA(x)) + stop(" ERROR: The covariates matrix has missing observations") + } + + if (dimension==1){x=as.vector(x)} + + # is.na() is TRUE for both NA and NaN, so each missing value is counted once + if (all(is.na(y))) + stop(" ERROR: All values are missing") + + GRAPH=graph[1] + COLOR=color[1] + nsamp=length(y) + METHOD=method[1] + + # delta[i]=1 if y[i] is observed and 0 if it is missing (NA or NaN) + delta=rep(1,nsamp) + for (i in 1: nsamp){ + if (is.na(y[i])) + { + delta[i]<-0 + y[i]=NA + } + } + + + if(!is.null(px)){PROBS="The dropout probability is given"} + + if(is.null(px)){ + ############################################################### + # Estimation of the dropout probability for each observation # + ############################################################### + a=glm(delta~x,family="binomial") + px=a$fitted.values + PROBS="LOGISTIC" + } + + ############################################################### + # Estimation of the IPW QUANTILES AND OUTLIERS DETECTION # + 
############################################################### + + for (i in 1:nsamp) px[i]= replace(px[i],which(px[i]<=10^(-50)),10^(-50)) + + peso=delta/px + tau= peso/sum(peso) + yp <- y[ tmp <- (!is.na(y))] + + mediana.IPW=weighted.fractile(y, tau, p=0.5) + cuantil025.IPW= weighted.fractile(y, tau, 0.25) + cuantil075.IPW= weighted.fractile(y, tau, 0.75) + theta.IPW=c(cuantil025.IPW,mediana.IPW,cuantil075.IPW) + IQR.IPW=theta.IPW[3]-theta.IPW[1] + + if(METHOD=="quartile"){ + deno=cuantil075.IPW-cuantil025.IPW + if(deno==0) + print("WARNING: Too many ties. The quartiles are equal.") + deno= replace(deno,which(deno<=10^(-50)),10^(-50)) + num=(cuantil075.IPW-mediana.IPW)-(mediana.IPW-cuantil025.IPW) + SKEW.IPW=num/deno + }else + { + cuantil0125.IPW= weighted.fractile(y, tau, 0.125) + cuantil0875.IPW= weighted.fractile(y, tau, 0.875) + deno=cuantil0875.IPW-cuantil0125.IPW + if(deno==0) + print("WARNING: Too many ties. The octiles are equal.") + deno= replace(deno,which(deno<=10^(-50)),10^(-50)) + num=(cuantil0875.IPW-mediana.IPW)-(mediana.IPW-cuantil0125.IPW) + SKEW.IPW=num/deno + } + + cteinf.IPW=ctea*(SKEW.IPW>0) - cteb*(SKEW.IPW<0) + ctesup.IPW= cteb*(SKEW.IPW>0) - ctea*(SKEW.IPW<0) + + bigote.IPW=c(theta.IPW[1]-1.5*exp(cteinf.IPW*SKEW.IPW)*IQR.IPW,theta.IPW[3]+1.5*exp(ctesup.IPW*SKEW.IPW)*IQR.IPW) + + bigo.IPW=bigote.IPW + bigo.IPW[1]=min(yp[yp>=bigote.IPW[1]]) + bigo.IPW[2]=max(yp[yp<=bigote.IPW[2]]) + + outsup.IPW=yp[yp>bigote.IPW[2]] + outinf.IPW=yp[yp0) - cteb*(SKEW.NAIVE<0) + ctesup.NAIVE=cteb*(SKEW.NAIVE>0) - ctea*(SKEW.NAIVE<0) + + bigote.NAIVE=c(theta.NAIVE[1]-1.5*exp(cteinf.NAIVE*SKEW.NAIVE)*IQR.NAIVE,theta.NAIVE[3]+1.5*exp(ctesup.NAIVE*SKEW.NAIVE)*IQR.NAIVE) + + bigo.NAIVE=bigote.NAIVE + bigo.NAIVE[1]=min(yp[yp>=bigote.NAIVE[1]]) + bigo.NAIVE[2]=max(yp[yp<=bigote.NAIVE[2]]) + + outsup.NAIVE=yp[yp>bigote.NAIVE[2]] + outinf.NAIVE=yp[yp0) + stop(" ERROR: px has missing values") + } + + + if (is.null(px)=="TRUE") + { + if (NROW(y)!=NROW(x)) + stop("ERROR: 
'y' and 'x' have different lengths") + if (sum(is.na(x))+sum(is.nan(x))>0) + stop(" ERROR: The covariates matrix has missing observations") + } + + if (dimension==1){x=as.vector(x)} + + if (sum(is.na(y))+sum(is.nan(y))==length(y)) + stop(" ERROR: All values are missing") + + GRAPH=graph[1] + COLOR=color[1] + nsamp=length(y) + + delta=rep(1,nsamp) + for (i in 1: nsamp){ + if (is.na(y[i])=="TRUE"|is.nan(y[i])=="TRUE") + { + delta[i]<-0 + y[i]=NA + } + } + + + if(is.null(px)=="FALSE"){PROBS="The dropout probability is given"} + + if(is.null(px)){ + ############################################################### + # Estimation of the dropout probability for each observation # + ############################################################### + a=glm(delta~x,family="binomial") + px=a$fitted.values + PROBS="LOGISTIC" + } + + ############################################################### + # Estimation of the IPW QUANTILES AND OUTLIERS DETECTION # + ############################################################### + + for (i in 1:nsamp) px[i]= replace(px[i],which(px[i]<=10^(-50)),10^(-50)) + + peso=delta/px + tau= peso/sum(peso) + yp <- y[ tmp <- (!is.na(y))] + + mediana.IPW=weighted.fractile(y, tau, p=0.5) + cuantil025.IPW= weighted.fractile(y, tau, 0.25) + cuantil075.IPW= weighted.fractile(y, tau, 0.75) + theta.IPW=c(cuantil025.IPW,mediana.IPW,cuantil075.IPW) + + IQR.IPW=theta.IPW[3]-theta.IPW[1] + bigote.IPW=c(theta.IPW[1]-1.5*IQR.IPW,theta.IPW[3]+1.5*IQR.IPW) + bigo.IPW=bigote.IPW + bigo.IPW[1]=min(yp[yp>=bigote.IPW[1]]) + bigo.IPW[2]=max(yp[yp<=bigote.IPW[2]]) + + outsup.IPW=yp[yp>bigote.IPW[2]] + outinf.IPW=yp[yp=bigote.NAIVE[1]]) + bigo.NAIVE[2]=max(yp[yp<=bigote.NAIVE[2]]) + + outsup.NAIVE=yp[yp>bigote.NAIVE[2]] + outinf.NAIVE=yp[yp0) + stop(" ERROR: px has missing values") + } + + + if (is.null(px)=="TRUE") + { + if (NROW(y)!=NROW(x)) + stop("ERROR: 'y' and 'x' have different lengths") + if (sum(is.na(x))+sum(is.nan(x))>0) + stop(" ERROR: The covariates matrix has 
missing observations") + } + + if (dimension==1){x=as.vector(x)} + + if (sum(is.na(y))+sum(is.nan(y))==length(y)) + stop(" ERROR: All values are missing") + + nsamp=length(y) + + delta=rep(1,nsamp) + for (i in 1: nsamp){ + if (is.na(y[i])=="TRUE"|is.nan(y[i])=="TRUE") + { + delta[i]<-0 + y[i]=NA + } + } + + + if(is.null(px)=="FALSE"){METHOD="The dropout probability is given"} + + if(is.null(px)){ + ############################################################### + # Estimation of the dropout probability for each observation # + ############################################################### + a=glm(delta~x,family="binomial") + px=a$fitted.values + METHOD="LOGISTIC" + } + + ############################################################### + # Estimation of the IPW QUANTILES # + ############################################################### + + for (i in 1:nsamp) px[i]= replace(px[i],which(px[i]<=10^(-50)),10^(-50)) + + peso=delta/px + tau= peso/sum(peso) + + lprobs=length(probs) + + IPW.quantile=rep(NA,length=lprobs) + + for (i in 1:lprobs){ + IPW.quantile[i]=weighted.fractile(y, tau, p=probs[i]) + } + + res=list(px=px, IPW.quantile=IPW.quantile) + return(res) +} + + + diff --git a/build/vignette.rds b/build/vignette.rds new file mode 100644 index 0000000..7721bd4 Binary files /dev/null and b/build/vignette.rds differ diff --git a/inst/doc/my-vignette.R b/inst/doc/my-vignette.R new file mode 100644 index 0000000..760b019 --- /dev/null +++ b/inst/doc/my-vignette.R @@ -0,0 +1,60 @@ +## ----setup, include = FALSE---------------------------------------------- +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) + +## ------------------------------------------------------------------------ +library(IPWboxplot) + +## ---- par=TRUE,message=FALSE--------------------------------------------- +library(mice) +data(boys) +attach(boys) +dim(boys) +res=IPW.quantile(tv,x=age,probs=c(0.25,0.5,0.75,0.9)) +ls(res) +#res$px is the vector of estimated drop-out probabilities 
+#res$IPW.quantile is the vector of estimated IPW quantiles +res$IPW.quantile + +## ---- par=TRUE, fig.cap="Inverse probability weighted boxplot for testicular volume"---- +res=IPW.boxplot(tv,x=age,main=" ") + + +## ------------------------------------------------------------------------ +ls(res) + +## ---- par=TRUE----------------------------------------------------------- +res$out.IPW + +## ---- par=TRUE,fig.cap="Inverse probability weighted and naive boxplots for testicular volume"---- +res1=IPW.boxplot(tv,x=age,graph="both",color="blue",size.letter=0.7,main=" ") + + +## ------------------------------------------------------------------------ +ls(res1) + +## ---- fig.cap="Inverse probability weighted boxplot adapted to skewness for head circumference.",fig.show='hold'---- +res2=IPW.ASYM.boxplot(hc,x=age,size.letter=0.85,main=" ") + +## ------------------------------------------------------------------------ +ls(res2) + +## ---- par=TRUE----------------------------------------------------------- +res2$out.IPW + +## ------------------------------------------------------------------------ +res2$SKEW.IPW + +## ---- par=TRUE,fig.cap="Inverse probability weighted and naive boxplots adjusted for skewness of head circumference.",fig.show='hold'---- +res3=IPW.ASYM.boxplot(hc,x=age,graph="both",main=" ",color="blue",size.letter=0.75) + +## ------------------------------------------------------------------------ +res3$out.IPW +res3$out.NAIVE + +## ------------------------------------------------------------------------ +res3$SKEW.IPW +res3$SKEW.NAIVE + diff --git a/inst/doc/my-vignette.Rmd b/inst/doc/my-vignette.Rmd new file mode 100644 index 0000000..0239d88 --- /dev/null +++ b/inst/doc/my-vignette.Rmd @@ -0,0 +1,162 @@ +--- +title: "IPWboxplot" +author: 'Ana Maria Bianco, Graciela Boente, and Ana Perez-Gonzalez' +date: "`r Sys.Date()`" +output: + pdf_document: + number_sections: yes + toc: yes + html_document: + df_print: paged + rmarkdown::html_vignette: default 
+vignette: > + %\VignetteIndexEntry{Vignette Title} + %\VignetteEncoding{UTF-8}{inputenc} + %\VignetteEngine{knitr::rmarkdown} +--- + +```{r setup, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` +# Introduction + +**IPWboxplot** is a contributed R package for drawing boxplots adapted to the happenstance of missing observations when drop-out probabilities are given by the practitioner or modelled using auxiliary covariates. It also provides a function to estimate asymptotically unbiased quantiles based on inverse probability weighting (IPW) as in Zhang et al. (2012). For that purpose, a missing at random model is assumed. These IPW quantiles are used to compute the measures needed to construct the boxplot and hence, to calculate the outlier cut--off values. + + + +This document gives a quick tour of **IPWboxplot** (version `r packageVersion("IPWboxplot")`) functionalities. It was written in R Markdown, using the [knitr](https://cran.r-project.org/package=knitr) package for production. +See `help (package="IPWboxplot")` for further details and references provided by `citation ("IPWboxplot")`. + +```{r} +library(IPWboxplot) +``` + +# Inverse Probability Weighted Quantiles + +The function `IPW.quantile` computes the IPW quantiles of a vector _y_ containing missing observations when auxiliary information from a vector of drop-out probabilities supplied by the user or from a set of covariates is available. +The dataset `boys` of the R package **mice** allows us to illustrate the use of this function. + +The dataset contains 748 observations and the variable _y=tv_ has 522 missing observations. For illustrative purposes, we consider the variable _age_, which is completely observed, as covariate with predictive capability for the propensity. By default, a logistic model is used to fit the happenstance probabilities. 
The following code returns the +$\alpha-$quantiles corresponding to $\alpha=$ 0.25, 0.5, 0.75 and 0.9 of the variable _"Testicular volume (tv)"_ using inverse probability weighting. + +```{r, par=TRUE,message=FALSE} +library(mice) +data(boys) +attach(boys) +dim(boys) +res=IPW.quantile(tv,x=age,probs=c(0.25,0.5,0.75,0.9)) +ls(res) +#res$px is the vector of estimated drop-out probabilities +#res$IPW.quantile is the vector of estimated IPW quantiles +res$IPW.quantile +``` + +# Inverse Probability Weighted Boxplot +The function `IPW.boxplot` draws the modified boxplot adapted to missing data using the IPW quantiles. +The function also returns a list of statistical summaries. By default, the function draws only the adapted boxplot and returns only the statistics computed by inverse probability weighting. +```{r, par=TRUE, fig.cap="Inverse probability weighted boxplot for testicular volume"} +res=IPW.boxplot(tv,x=age,main=" ") + +``` +The function returns a list containing the quartiles, the lower and upper whiskers of the IPW boxplot, the observations considered as outliers and the vector of estimated or given drop-out probabilities. +```{r} +ls(res) +``` + +As shown in Figure 1, the IPW boxplot does not detect outliers for this data set. +```{r, par=TRUE} +res$out.IPW +``` +When the argument "graph" is set to *both*, the function draws the adapted boxplot next to the naive boxplot obtained by simply dropping the missing observations, so that the two can be compared. In this situation, besides the measures related to the IPW boxplot, the function also returns the quartiles, whiskers and detected outliers computed from the observations at hand, which are those associated with the naive boxplot. + +```{r, par=TRUE,fig.cap="Inverse probability weighted and naive boxplots for testicular volume"} +res1=IPW.boxplot(tv,x=age,graph="both",color="blue",size.letter=0.7,main=" ") + +``` +From Figure 2, the differences between both boxplots become evident. 
In particular, the box of the naive boxplot is enlarged with respect to that of the IPW boxplot. + +As mentioned above, when the argument "graph" equals *both*, the function returns a list with the naive and IPW statistical summaries. +```{r} +ls(res1) +``` + +Other options, such as the color of the boxes, the main title, the letter size or the axis labels, can also be passed as arguments to this function. + + +# Inverse Probability Weighted Boxplot Adapted to Skewed Data + +The function `IPW.ASYM.boxplot` draws the modified boxplot adapted to missing data and skewness. In addition to the values returned by the function `IPW.boxplot`, this function also computes a skewness measure calculated as in Hinkley (1975), see also Brys et al. (2003). + +The argument "method" selects the quartiles (method="quartile" as default) or the octiles (method="octile") as a procedure to compute the skewness measure denoted SKEW and defined, respectively, as + +\begin{align*} +SKEW &=\frac{(Q_{0.75}-Q_{0.5})-(Q_{0.5}-Q_{0.25})}{(Q_{0.75}-Q_{0.25})}, +\\ +SKEW &=\frac{(Q_{0.875}-Q_{0.5})-(Q_{0.5}-Q_{0.125})}{(Q_{0.875}-Q_{0.125})}, +\end{align*} + +where $Q_{\alpha}$ denotes the $\alpha-$quantile. + +The whiskers and the outlier cut--off values are computed by means of an exponential model in the fashion of Hubert and Vandervieren (2008) taking into account the interval: + +\begin{equation*}\label{interval} +(Q_{0.25}-1.5*\exp{(c_i*SKEW)}*IQR,Q_{0.75}+1.5*\exp{(c_s*SKEW)}*IQR), +\end{equation*} + +where $IQR=Q_{0.75}-Q_{0.25}$ and $c_i$=`ctea` and $c_s$=`cteb` if SKEW is positive, otherwise, $c_i$=`-cteb` and $c_s$=`-ctea`. + + +The default values for `ctea` and `cteb` are $-4$ and $3$; however, the user may choose other values for these constants. + +As an example, Figure 3 displays the boxplot adapted to skewness and missing values for the variable head circumference, hc, which has 46 missing values. 
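Before turning to the figure, the cut--off interval defined above can be checked by hand. The following is a minimal sketch rather than the package's own code: the helper `skew_fences` is a hypothetical name introduced only for this illustration, the three quartiles passed to it are made-up numbers, and a strictly positive interquartile range is assumed.

```r
# Hypothetical helper (not part of IPWboxplot): evaluates the quartile-based
# SKEW measure and the skewness-adjusted cut-off interval defined above.
# Assumes q25 < q75, so that the interquartile range is strictly positive.
skew_fences <- function(q25, q50, q75, ctea = -4, cteb = 3) {
  iqr  <- q75 - q25
  skew <- ((q75 - q50) - (q50 - q25)) / iqr
  # c_i = ctea and c_s = cteb when SKEW is positive;
  # c_i = -cteb and c_s = -ctea otherwise (irrelevant when SKEW = 0).
  if (skew >= 0) {
    ci <- ctea
    cs <- cteb
  } else {
    ci <- -cteb
    cs <- -ctea
  }
  list(skew  = skew,
       lower = q25 - 1.5 * exp(ci * skew) * iqr,
       upper = q75 + 1.5 * exp(cs * skew) * iqr)
}

# Made-up quartiles with a longer upper half, i.e. a right-skewed sample
skew_fences(q25 = 10, q50 = 12, q75 = 18)
```

With these illustrative quartiles SKEW is positive, so the lower cut--off is pulled towards the box while the upper one is pushed outwards; with `ctea = cteb = 0` the interval reduces to the usual $1.5*IQR$ rule.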
+ +```{r, fig.cap="Inverse probability weighted boxplot adapted to skewness for head circumference.",fig.show='hold'} +res2=IPW.ASYM.boxplot(hc,x=age,size.letter=0.85,main=" ") +``` +The elements returned in the list are the following: +```{r} +ls(res2) +``` +The detected outliers are: +```{r, par=TRUE} +res2$out.IPW +``` +The skewness measure computed using the quartiles equals: +```{r} +res2$SKEW.IPW +``` + +By specifying "graph" equal to *both*, the function displays two parallel modified boxplots as in Figure 4, where the plot on the left corresponds to the IPW version and that on the right, to the naive one. +```{r, par=TRUE,fig.cap="Inverse probability weighted and naive boxplots adjusted for skewness of head circumference.",fig.show='hold'} +res3=IPW.ASYM.boxplot(hc,x=age,graph="both",main=" ",color="blue",size.letter=0.75) +``` +The elements res3\$out.IPW and res3\$out.NAIVE provide the outliers detected by each method. +```{r} +res3$out.IPW +res3$out.NAIVE +``` + +The values of res3\$SKEW.IPW and res3\$SKEW.NAIVE are the skewness measures calculated from the IPW quantiles or from the naive ones, respectively. + +```{r} +res3$SKEW.IPW +res3$SKEW.NAIVE +``` + +It is worth noticing that the naive boxplot detects only one observation as outlier, while the IPW version identifies five observations as atypical. + +# References + +Brys, G., Hubert, M. and Struyf, A. (2003). A comparison of some new measures of skewness. In Developments in Robust Statistics, ICORS 2001, eds. R. Dutter, P. Filzmoser, U. Gather, and P.J. Rousseeuw, Heidelberg: Springer-Verlag, pp. 98-113. + +Hinkley, D. V. (1975). On power transformations to symmetry. Biometrika, 62, 101-111. + +Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions. Computational Statistics & Data Analysis, 52, 5186-5201. + +Zhang, Z., Chen, Z., Troendle, J. F. and Zhang, J. (2012). Causal inference on quantiles with an +obstetric application. Biometrics, 68, 697-706. 
+ + diff --git a/inst/doc/my-vignette.pdf b/inst/doc/my-vignette.pdf new file mode 100644 index 0000000..61d2f3e Binary files /dev/null and b/inst/doc/my-vignette.pdf differ diff --git a/man/IPW_ASYM_Boxplot.Rd b/man/IPW_ASYM_Boxplot.Rd new file mode 100644 index 0000000..1c4f2c6 --- /dev/null +++ b/man/IPW_ASYM_Boxplot.Rd @@ -0,0 +1,136 @@ +\name{IPW.ASYM.boxplot} +\alias{IPW.ASYM.boxplot} + +\title{ +Boxplot adapted to skewness and missing values +} +\description{The function draws a modified boxplot adapted to missing data and skewness. The drop-out probabilities can be given by the practitioner or fitted through a logistic model using auxiliary covariates. The plots are adapted to asymmetric distributions by correcting the whiskers through a measure of the skewness. +} +\usage{ +IPW.ASYM.boxplot(y,px=NULL,x=NULL,graph=c("IPW","both"),names=c("IPW Asymmetric Boxplot", + "NAIVE Asymmetric Boxplot"), size.letter=1.2, + method=c("quartile","octile"), ctea=-4, cteb=3, lim.inf=NULL,lim.sup=NULL, + main=" ",xlab = " ", ylab =" ",color="black") +} + +\arguments{ + \item{y}{ Numerical vector of length n with possible missing values codified by NA or NAN.} + \item{px}{Optional. Numerical vector of drop-out probabilities. If not provided a logistic fit is performed using \code{x} as predictive variable. Missing values are not admitted.} + +\item{x}{ Optional. The matrix of fully observed variables used to estimate the missing model with dimension nrows=n and ncol=p. Missing values are not admitted. One of the vectors px or x must be supplied.} +\item{graph}{ Optional. Character string indicating if the plot contains two boxplots ("both") or only the boxplot computed with the inverse probability weighted quantiles ("IPW"). The default is "IPW".} +\item{names}{ Optional. Character string to name the boxplots. 
The default is "IPW Asymmetric Boxplot", when \code{graph="IPW"} and + c("IPW Asymmetric Boxplot", "NAIVE Asymmetric Boxplot") when \code{graph="both"}.} +\item{size.letter}{ Optional. The font size of names. Default value is 1.2} +\item{method}{ Optional. Character string indicating if the skewness measure is based on the quartiles ("quartile") or the octiles ("octile"). + The default is "quartile". } + \item{ctea}{ Optional. Scaling factors to compute the outlier boundary. The default is -4.} + + \item{cteb}{ Optional. Scaling factors to compute the outlier boundary. The default is 3. When ctea=cteb=0 the IPW boxplot for symmetric data is obtained.} + +\item{lim.inf}{ Optional. The lower limit of the plot if supplied by the user.} + \item{lim.sup}{ Optional. The upper limit of the plot if supplied by the user.} + \item{main}{ Optional. Character string to title the plot. By default no main title is given.} +\item{xlab}{Optional. Character string to indicate the label of the horizontal axis.} + \item{ylab}{ Optional. Character string to indicate the label of the vertical axis.} +\item{color}{ Optional. Color for the IPW Boxplot.} + +} +\details{ + +The function draws boxplots designed to adjust both for skewness and missingness. The drop-out probabilities can be supplied by the user or estimated through a logistic model from given covariates. + +The function plots as default a modified boxplot based on the inverse probability weighted (IPW) quantiles adapting for missing observations as in Zhang et al.(2012), but using a correction factor to adjust for skewness. For that purpose, the function incorporates a skewness measure to compute the whiskers and the outlier cut--off values in a similar way to that considered in Hubert and Vandervieren (2008). 
+The argument \code{method} selects quartiles (\code{method="quartile"}) or octiles (\code{method="octile"}) to calculate the skewness measure SKEW, respectively, as +\deqn{ SKEW=\frac{(Q_{0.75}-Q_{0.5})-(Q_{0.5}-Q_{0.25})}{(Q_{0.75}-Q_{0.25})},} + +\deqn{ SKEW=\frac{(Q_{0.875}-Q_{0.5})-(Q_{0.5}-Q_{0.125})}{(Q_{0.875}-Q_{0.125})},} +where \eqn{Q_{\alpha}} denotes the \eqn{\alpha-}quantile. + +The whiskers and the outlier cut--off values are computed by means of an exponential model in the fashion of Hubert and Vandervieren (2008) taking into account the interval: + +\deqn{(Q_{0.25}-1.5*\exp{(c_i*SKEW)}*IQR,Q_{0.75}+1.5*\exp{(c_s*SKEW)}*IQR),} + +where \eqn{IQR=Q_{0.75}-Q_{0.25}} and \eqn{c_i}=\code{ctea} and \eqn{c_s}=\code{cteb} if SKEW is positive, otherwise, \eqn{c_i}=-\code{cteb} and \eqn{c_s}=-\code{ctea}. + +The default values for \code{ctea} and \code{cteb} are \eqn{-4} and \eqn{3}; however, the user may choose other values for these constants. + +By specifying \code{graph = "both"}, the function displays two parallel modified boxplots. The boxplot on the left is the IPW boxplot adapted for missingness and skewness, while the one on the right is its naive counterpart, which is simply based on the observations \code{y} at hand without any correction for missingness. + + +The user can supply a vector of drop-out probabilities \code{px} or a set of covariates \code{x} to estimate the propensity. +When both \code{px} and \code{x} are supplied, \code{IPW.ASYM.boxplot} uses \code{px}. When \code{px} is not given, it is estimated assuming a logistic model depending on the covariates \code{x}. +For more details, see Bianco et al. (2018). 
+ + + +} +\value{ +The output of the function is a list with components: + \item{px }{Numerical vector of drop-out probabilities.} +\item{IPW.Quartiles}{Numerical vector of inverse probability weighted quartiles.} +\item{IPW.whisker}{Numerical vector of lower and upper whiskers calculated from IPW quantiles.} + \item{out.IPW}{Numerical vector of data points detected as atypical by the IPW boxplot adapted to skewness.} + \item{SKEW.IPW}{Skewness measure based on the IPW quartiles (method="quartile") or IPW octiles (method="octile"). } +\item{NAIVE.Quartiles}{Numerical vector of naive quartiles computed from the subset of non-missing values of \code{y}. Returned only when graph="both".} + \item{NAIVE.whisker}{Numerical vector of lower and upper whiskers obtained from the naive quantiles. Returned only when graph="both".} +\item{out.NAIVE}{Numerical vector of data points detected as atypical by the naive boxplot. Returned only when graph="both".} +\item{SKEW.NAIVE}{ Skewness measure based on the naive quartiles (method="quartile") or Naive octiles (method="octile"), computed from the subset of non-missing values of \code{y}. Returned only when graph="both".} + + +} +\references{ +Bianco, A. M., Boente, G., and Perez-Gonzalez, A. (2018). A boxplot adapted to missing values: an R function when predictive covariates are available. Submitted.\cr\cr + +Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions. Computational Statistics & Data Analysis, 52, 5186-5201. \cr\cr + +Zhang, Z., Chen, Z., Troendle, J. F. and Zhang, J. (2012). Causal inference on quantiles with an obstetric application. Biometrics, 68, 697-706. \cr\cr + +} +\author{ +Ana Maria Bianco , Graciela Boente and Ana Perez-Gonzalez .} +\note{ +The missing values of \code{y} must be codified as NA or NAN. + +The numerical vector \code{px} and the matrix of covariates \code{x} must be fully observed. \code{px} or \code{x} must be supplied by the user. 
+ +The length of \code{y}, the length of \code{px} and \code{nrow(x)} must be equal. + + + +} +\seealso{\code{\link{IPW.quantile}}, +\code{\link{IPW.boxplot}} +} +\examples{ + +## A real data example + + +library(mice) +data(boys) +attach(boys) + +# The function plots the IPW boxplot adapted to skewness. +# Some statistical summaries computed using the inverse probability weighting approach +# are also returned. +res1=IPW.ASYM.boxplot(hc,x=age,main="IPW boxplot adjusted for skewness of the head circumference") + +# We can compare the naive and IPW approaches. Here the skewness measure is computed +# using the quartiles (the default). +res2=IPW.ASYM.boxplot(hc,x=age,method="quartile",graph="both",main=" ") + +# The results obtained if the skewness measure is computed with the octiles (method="octile") are: + +res3=IPW.ASYM.boxplot(hc,x=age,method="octile",graph="both",main=" ") + +} + + +\keyword{ quantile } +\keyword{ boxplot } +\keyword{ missing } +\keyword{ inverse probability weighted} diff --git a/man/IPW_boxplot.Rd b/man/IPW_boxplot.Rd new file mode 100644 index 0000000..2eeedae --- /dev/null +++ b/man/IPW_boxplot.Rd @@ -0,0 +1,102 @@ +\name{IPW.boxplot} +\alias{IPW.boxplot} + +\title{ + +Boxplot adapted to missing values + +} +\description{ +The function draws a modified boxplot adapted to missing values. The drop-out probabilities can be given by the practitioner or fitted through a logistic model using auxiliary covariates. +The function returns the usual boxplot of the available data as well as a modified plot which takes into account the missing data model and weights the observations using the estimated/given propensity. 
+}
+\usage{
+IPW.boxplot(y,px=NULL,x=NULL,graph=c("IPW","both"),
+names=c("IPW Boxplot", "NAIVE Boxplot"), size.letter=1.2,
+lim.inf=NULL,lim.sup=NULL,main=" ",xlab = " ", ylab =" ",color="black")
+}
+\arguments{
+ \item{y}{ Numerical vector of length n with possible missing values coded as NA or NaN.}
+ \item{px}{Optional. Numerical vector of drop-out probabilities. If not provided, a logistic fit is performed using \code{x} as predictive variable. Missing values are not allowed.}
+
+\item{x}{ Optional. The matrix of fully observed variables used to estimate the missingness model, with dimension nrows=n and ncol=p. Missing values are not allowed. At least one of \code{px} or \code{x} must be supplied.}
+
+\item{graph}{ Optional. Character string indicating if the plot contains two boxplots ("both") or only the boxplot computed with the inverse probability weighted quantiles ("IPW"). The default is "IPW".}
+\item{names}{ Optional. Character string to name the boxplots. The default is "IPW Boxplot" when \code{graph="IPW"} and
+ c("IPW Boxplot", "NAIVE Boxplot") when \code{graph="both"}.}
+\item{size.letter}{ Optional. The font size of names. Default value is 1.2.}
+\item{lim.inf}{ Optional. The lower limit of the plot if supplied by the user.}
+ \item{lim.sup}{ Optional. The upper limit of the plot if supplied by the user.}
+ \item{main}{ Optional. Character string to title the plot. By default no main title is given.}
+ \item{xlab}{Optional. Character string to indicate the label of the horizontal axis.}
+ \item{ylab}{ Optional. Character string to indicate the label of the vertical axis.}
+\item{color}{ Optional. Color for the IPW Boxplot.}
+
+}
+\details{
+The function draws boxplots designed to adjust for missing values. The propensity can be supplied by the user or estimated through a logistic model from given covariates.
+
+By default, the function plots a modified boxplot based on the inverse probability weighted (IPW) quantiles adapting for missing observations as in Zhang et al. (2012).
+
+By specifying \code{graph = "both"}, the function displays two parallel boxplots. The boxplot on the left corresponds to the IPW boxplot adapted for missingness, while on the right, the naive boxplot, i.e., the usual boxplot simply computed with the observations \code{y} at hand, is displayed.
+
+
+The user can supply a vector of probabilities \code{px} or a set of covariates \code{x} to estimate it.
+When both \code{px} and \code{x} are supplied, the IPW.boxplot is executed using \code{px}. When \code{px} is not supplied, it is estimated assuming a logistic model depending on the covariates \code{x}.
+For more details, see Bianco et al. (2018).
+}
+\value{
+The output of the function is a list with components:
+ \item{px }{Numerical vector of probabilities.}
+\item{IPW.Quartiles}{Numerical vector of inverse probability weighted quartiles.}
+\item{IPW.whisker}{Numerical vector of lower and upper whiskers calculated from IPW quartiles.}
+ \item{out.IPW }{Numerical vector of data points detected as atypical by the IPW boxplot.}
+ \item{ NAIVE.Quartiles }{Numerical vector of naive quartiles computed from the subset of non-missing values of \code{y}. Returned only when graph="both".}
+\item{NAIVE.whisker }{Numerical vector of lower and upper whiskers obtained from the naive quantiles. Returned only when graph="both".}
+\item{out.NAIVE }{Numerical vector of data points detected as atypical by the naive boxplot. Returned only when graph="both".}
+}
+
+\references{
+Bianco, A. M., Boente, G., and Perez-Gonzalez, A. (2018). A boxplot adapted to missing values: an R function when predictive covariates are available. Submitted.\cr\cr
+
+Zhang, Z., Chen, Z., Troendle, J. F. and Zhang, J. (2012). Causal inference on quantiles with an obstetric application. Biometrics, 68, 697-706. \cr\cr
+
+}
+\author{
+Ana Maria Bianco, Graciela Boente and Ana Perez-Gonzalez.}
+\note{
+The missing values of \code{y} must be coded as NA or NaN.
+
+The numerical vector \code{px} and the matrix of covariates \code{x} must be fully observed. At least one of \code{px} or \code{x} must be supplied by the user.
+
+The length of \code{y}, the length of \code{px}, and \code{nrow(x)} must be equal.
+
+}
+
+%% ~Make other sections like Warning with \section{Warning }{....} ~
+
+\seealso{\code{\link{IPW.quantile}}, \code{\link{IPW.ASYM.boxplot}}
+%% ~~objects to See Also as \code{\link{help}}, ~~~
+}
+\examples{
+
+
+## A real data example
+
+library(mice)
+data(boys)
+attach(boys)
+
+res1=IPW.boxplot(tv,x=age,main="IPW boxplot of the testicular volume")
+
+
+# We can compare the naive and IPW boxplots
+res2=IPW.boxplot(tv,x=age,graph="both",main=" ")
+
+}
+% Add one or more standard keywords, see file 'KEYWORDS' in the
+% R documentation directory.
+\keyword{ quantile }
+\keyword{ boxplot }
+\keyword{ missing }
+\keyword{ inverse probability weighted}
diff --git a/man/IPW_quantile.Rd b/man/IPW_quantile.Rd
new file mode 100644
index 0000000..122c001
--- /dev/null
+++ b/man/IPW_quantile.Rd
@@ -0,0 +1,83 @@
+\name{IPW.quantile}
+\alias{IPW.quantile}
+
+\title{
+Computes the IPW quantiles
+}
+\description{
+The function calculates the inverse probability weighted quantiles of a numeric vector.
+}
+\usage{
+IPW.quantile(y, px=NULL,x=NULL,probs = seq(0, 1, 0.25))
+}
+
+\arguments{
+ \item{y}{ Numerical vector of length n with possible missing values coded as NA or NaN.}
+ \item{px}{Optional. Numerical vector of drop-out probabilities. If not provided, a logistic fit is performed using \code{x} as predictive variable. Missing values are not allowed.}
+
+\item{x}{ Optional. The matrix of fully observed variables used to estimate the missingness model, with dimension nrows=n and ncol=p. Missing values are not allowed. At least one of \code{px} or \code{x} must be supplied.}
+
+\item{probs}{ Required. Numeric vector of probabilities with values in (0,1).}
+}
+\details{
+The function computes inverse probability weighted (IPW) quantiles of a numeric vector \code{y} adapting for missing observations as in Zhang et al. (2012).
+
+The user can supply a vector of drop-out probabilities \code{px} or a set of covariates \code{x} to estimate the propensity.
+When both \code{px} and \code{x} are supplied, the IPW.quantile is executed using \code{px}. When \code{px} is not supplied, the happenstance probabilities are estimated assuming a logistic model depending on the covariates \code{x}.
+For more details, see Bianco et al. (2018).
+
+We adapted the function \code{weighted.fractile} from the \pkg{isotone} package to missing values in variable \code{y}. See \pkg{isotone} for more details.
+}
+\value{
+The output of the function is a list with components:\describe{
+ \item{ipw.quantile }{ Numerical vector of length \code{length(probs)} containing the estimated quantiles.}
+ \item{px }{Numerical vector of drop-out probabilities.}
+}
+}
+\references{Bianco, A. M., Boente, G. and Perez-Gonzalez, A. (2018). A boxplot adapted to missing values: an R function when predictive covariates are available. Submitted.\cr\cr
+
+Zhang, Z., Chen, Z., Troendle, J. F. and Zhang, J. (2012). Causal inference on quantiles with an obstetric application. Biometrics, 68, 697-706. \cr\cr
+}
+\author{
+Ana Maria Bianco, Graciela Boente and Ana Perez-Gonzalez.
+}
+\note{
+The missing values of \code{y} must be coded as NA or NaN.
+
+The numerical vector \code{px} and the matrix of covariates \code{x} must be fully observed. At least one of \code{px} or \code{x} must be supplied by the user.
+
+The length of \code{y}, the length of \code{px}, and \code{nrow(x)} must be equal.
+
+}
+
+%% ~Make other sections like Warning with \section{Warning }{....} ~
+
+
+\examples{
+
+
+
+## A real data example
+library(mice)
+data(boys)
+attach(boys)
+# As an illustration, we consider the variable testicular volume, tv.
+# To compute the inverse probability weighted (IPW) quartiles,
+# age is used as a covariate with predictive capability
+# to estimate the vector of drop-out probabilities.
+
+res=IPW.quantile(tv,x=age,probs=c(0.25,0.5,0.75))
+res$IPW.quantile
+
+# Compute the inverse probability weighted (IPW) quantiles
+# corresponding to the fractiles 0.3, 0.8 and 0.9
+# using the covariate age to estimate the propensity.
+
+
+res1=IPW.quantile(tv,x=age,probs=c(0.3,0.8,0.9))
+res1$IPW.quantile
+
+}
+\keyword{ quantile }
+\keyword{ missing }
+\keyword{ inverse probability weighted}
diff --git a/vignettes/my-vignette.Rmd b/vignettes/my-vignette.Rmd
new file mode 100644
index 0000000..0239d88
--- /dev/null
+++ b/vignettes/my-vignette.Rmd
@@ -0,0 +1,162 @@
+---
+title: "IPWboxplot"
+author: 'Ana Maria Bianco, Graciela Boente, and Ana Perez-Gonzalez'
+date: "`r Sys.Date()`"
+output:
+  pdf_document:
+    number_sections: yes
+    toc: yes
+  html_document:
+    df_print: paged
+  rmarkdown::html_vignette: default
+vignette: >
+  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteEncoding{UTF-8}{inputenc}
+  %\VignetteEngine{knitr::rmarkdown}
+---
+
+```{r setup, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+# Introduction
+
+**IPWboxplot** is a contributed R package for drawing boxplots adapted to the happenstance of missing observations when drop-out probabilities are given by the practitioner or modelled using auxiliary covariates. It also provides a function to estimate asymptotically unbiased quantiles based on inverse probability weighting (IPW) as in Zhang et al. (2012). For that purpose, a missing-at-random model is assumed. These IPW quantiles are used to compute the measures needed to construct the boxplot and, hence, to calculate the outlier cut-off values.
+
+
+
+This document gives a quick tour of the functionality of **IPWboxplot** (version `r packageVersion("IPWboxplot")`). It was written in R Markdown, using the [knitr](https://cran.r-project.org/package=knitr) package for production.
+See `help(package="IPWboxplot")` for further details and references provided by `citation("IPWboxplot")`.
+
+```{r}
+library(IPWboxplot)
+```
+
+# Inverse Probability Weighted Quantiles
+
+The function `IPW.quantile` computes the IPW quantiles of a vector _y_ containing missing observations when auxiliary information is available, either from a vector of drop-out probabilities supplied by the user or from a set of covariates.
+The dataset `boys` of the R package **mice** allows us to illustrate the use of this function.
+
+The dataset contains 748 observations and the variable _y=tv_ has 522 missing observations. For illustrative purposes, we consider the variable _age_, which is completely observed, as a covariate with predictive capability for the propensity. By default, a logistic model is used to fit the happenstance probabilities. The following code returns the
+$\alpha$-quantiles corresponding to $\alpha=$ 0.25, 0.5, 0.75 and 0.9 of the variable _"Testicular volume (tv)"_ using inverse probability weighting.
+
+```{r, par=TRUE,message=FALSE}
+library(mice)
+data(boys)
+attach(boys)
+dim(boys)
+res=IPW.quantile(tv,x=age,probs=c(0.25,0.5,0.75,0.9))
+ls(res)
+#res$px is the vector of estimated drop-out probabilities
+#res$IPW.quantile is the vector of estimated IPW quantiles
+res$IPW.quantile
+```
+
+# Inverse Probability Weighted Boxplot
+The function `IPW.boxplot` draws the modified boxplot adapted to missing data using the IPW quantiles.
+The function also returns a list of statistical summaries. By default, the function returns only the adapted boxplot and the statistics computed by inverse probability weighting.
+```{r, par=TRUE, fig.cap="Inverse probability weighted boxplot for testicular volume"}
+res=IPW.boxplot(tv,x=age,main=" ")
+
+```
+The function returns a list containing the quartiles, the lower and upper whiskers of the IPW boxplot, the observations considered as outliers and the vector of estimated or given drop-out probabilities.
+```{r}
+ls(res)
+```
+
+As shown in Figure 1, the IPW boxplot does not detect outliers for this data set.
+```{r, par=TRUE}
+res$out.IPW
+```
+When the argument "graph" is set to *both*, the function allows the adapted boxplot to be compared with the naive boxplot obtained by simply dropping the missing observations. In this situation, besides the measures related to the IPW boxplot, the function also returns the quartiles, whiskers and detected outliers of the naive boxplot, computed from the observations at hand.
+
+```{r, par=TRUE,fig.cap="Inverse probability weighted and naive boxplots for testicular volume"}
+res1=IPW.boxplot(tv,x=age,graph="both",color="blue",size.letter=0.7,main=" ")
+
+```
+Figure 2 makes the differences between the two boxplots evident. In particular, the box of the naive boxplot is larger than that of the IPW boxplot.
+
+As mentioned above, when the argument "graph" equals *both*, the function returns a list with the naive and IPW statistical summaries.
+```{r}
+ls(res1)
+```
+
+Other options, such as the color of the boxes, the main title, the letter size or the axis labels, can also be passed as arguments to this function.
+
+
+# Inverse Probability Weighted Boxplot adapted to skewed data
+
+The function `IPW.ASYM.boxplot` draws the modified boxplot adapted to missing data and skewness. In addition to the quantities returned by the function `IPW.boxplot`, this function also computes a skewness measure calculated as in Hinkley (1975), see also Brys et al. (2003).
+
+The argument "method" selects whether the skewness measure, denoted SKEW, is computed from the quartiles (method="quartile", the default) or from the octiles (method="octile"); it is defined, respectively, as
+
+\begin{align*}
+SKEW &=\frac{(Q_{0.75}-Q_{0.5})-(Q_{0.5}-Q_{0.25})}{(Q_{0.75}-Q_{0.25})},
+\\
+SKEW &=\frac{(Q_{0.875}-Q_{0.5})-(Q_{0.5}-Q_{0.125})}{(Q_{0.875}-Q_{0.125})},
+\end{align*}
+
+where $Q_{\alpha}$ denotes the $\alpha$-quantile.
+
+The whiskers and the outlier cut-off values are computed by means of an exponential model in the fashion of Hubert and Vandervieren (2008) taking into account the interval:
+
+\begin{equation*}\label{interval}
+(Q_{0.25}-1.5*\exp{(c_i*SKEW)}*IQR,Q_{0.75}+1.5*\exp{(c_s*SKEW)}*IQR),
+\end{equation*}
+
+where $IQR=Q_{0.75}-Q_{0.25}$, with $c_i$=`ctea` and $c_s$=`cteb` if SKEW is positive, and $c_i$=`-cteb` and $c_s$=`-ctea` otherwise.
+
+
+The default values for `ctea` and `cteb` are $-4$ and $3$, respectively; however, the user may choose other values for these constants.
+
+As an example, Figure 3 displays the boxplot adapted to skewness and missing values for the variable head circumference, hc, which has 46 missing values.
+
+```{r, fig.cap="Inverse probability weighted boxplot adapted to skewness for head circumference.",fig.show='hold'}
+res2=IPW.ASYM.boxplot(hc,x=age,size.letter=0.85,main=" ")
+```
+The elements returned in the list are the following:
+```{r}
+ls(res2)
+```
+The detected outliers are:
+```{r, par=TRUE}
+res2$out.IPW
+```
+The skewness measure computed using the quartiles equals:
+```{r}
+res2$SKEW.IPW
+```
+
+By specifying "graph" equal to *both*, the function displays two parallel modified boxplots as in Figure 4, where the plot on the left corresponds to the IPW version and that on the right, to the naive one.
+```{r, par=TRUE,fig.cap="Inverse probability weighted and naive boxplots adjusted for skewness of head circumference.",fig.show='hold'}
+res3=IPW.ASYM.boxplot(hc,x=age,graph="both",main=" ",color="blue",size.letter=0.75)
+```
+The elements res3\$out.IPW and res3\$out.NAIVE provide the outliers detected by each method.
+```{r}
+res3$out.IPW
+res3$out.NAIVE
+```
+
+The values of res3\$SKEW.IPW and res3\$SKEW.NAIVE are the skewness measures calculated from the IPW quantiles and from the naive ones, respectively.
+
+```{r}
+res3$SKEW.IPW
+res3$SKEW.NAIVE
+```
+
+It is worth noting that the naive boxplot flags only one observation as an outlier, while the IPW version identifies five observations as atypical.
+
+# References
+
+Brys, G., Hubert, M. and Struyf, A. (2003). A comparison of some new measures of skewness. In Developments in Robust Statistics, ICORS 2001, eds. R. Dutter, P. Filzmoser, U. Gather, and P.J. Rousseeuw, Heidelberg: Springer-Verlag, pp. 98-113.
+
+Hinkley, D. V. (1975). On power transformations to symmetry. Biometrika, 62, 101-111.
+
+Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions. Computational Statistics & Data Analysis, 52, 5186-5201.
+
+Zhang, Z., Chen, Z., Troendle, J. F. and Zhang, J. (2012). Causal inference on quantiles with an
+obstetric application. Biometrics, 68, 697-706.
+
+