author: Soroor Hediyehzadeh - Walter and Eliza Hall Institute of Medical Research
date: December 2018
autosize: true
- 45-50 minute online sessions via Zoom
- Aims to enhance the statistical knowledge and expertise of the R-Ladies community
- Topics include statistical models and statistical learning models
Congratulations!
- To Earo Wang for the 2018 ASA Statistical Graphics Student Paper Award
- And to our very own Alexandra Garnham for the 2018 ABACBS Professional Bioinformatician Award
- Strong distributional assumptions behind parametric models >> less flexibility
- No assumptions about distribution of the data in non-parametric modelling. Models are more flexible, and more complex.
- Parametric models tend to generalize better beyond the observed data, while non-parametric models tend not to
- Bias-variance trade-off: parametric estimates have smaller variance but larger bias, while non-parametric estimates have smaller bias but larger variance
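The trade-off can be illustrated with a small simulation. This is only a sketch and not part of the original analysis: the true curve `sin(2 * pi * x)`, the noise level, the bandwidth and the evaluation point `x0` are all assumptions chosen for illustration. A straight-line fit (parametric, misspecified here) is compared with a kernel smoother at a single point over repeated samples.

```r
set.seed(1)
f_true <- function(x) sin(2 * pi * x)   # assumed "true" regression function
x0     <- 0.25                          # point at which the two fits are compared
n      <- 100                           # sample size per replicate
n_rep  <- 500                           # number of simulated data sets

est <- replicate(n_rep, {
  x <- sort(runif(n))
  y <- f_true(x) + rnorm(n, sd = 0.3)
  # Parametric: straight-line fit (misspecified for a sine curve)
  lin <- predict(lm(y ~ x), newdata = data.frame(x = x0))
  # Non-parametric: kernel regression evaluated at x0
  ker <- ksmooth(x, y, kernel = "normal", bandwidth = 0.05, x.points = x0)$y
  c(linear = unname(lin), kernel = ker)
})

bias     <- rowMeans(est) - f_true(x0)  # average error at x0
variance <- apply(est, 1, var)          # spread of the estimate across replicates
round(rbind(bias = bias, variance = variance), 4)
```

In this setup the linear fit should show the larger bias and the kernel fit the larger variance; a different true curve or bandwidth would change the balance.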
- Non-parametric Density Estimation:
  - Histogram
  - Kernel Density Estimation
- Non-parametric Regression:
  - Nadaraya-Watson estimator
  - Local Polynomial estimators
  - Splines
- Histogram
- Kernel Density Estimation (KDE)
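A minimal sketch of both density estimators in base R, using a simulated bimodal sample rather than the Parkinson's data. The sample, the number of histogram breaks and the choice of the Sheather-Jones bandwidth selector are assumptions made for illustration.

```r
set.seed(42)
# Simulated bimodal sample (an assumption; not the data used in the talk)
x <- c(rnorm(150, mean = 0), rnorm(100, mean = 4))

# Histogram on the density scale, so it is comparable with the KDE
h <- hist(x, breaks = 20, freq = FALSE, main = "Histogram and KDE", xlab = "x")

# Gaussian kernel density estimate; "SJ" is one common bandwidth selector
kde <- density(x, kernel = "gaussian", bw = "SJ")
lines(kde, lwd = 2)
rug(x)  # mark the individual observations along the axis
```

The histogram depends on the bin width and bin origin, while the KDE depends on the kernel and, more importantly, the bandwidth `bw`.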
Non-parametric regression is typically used for trend estimation. It models the conditional expectation of the response given the covariates.
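As an illustration of estimating a conditional expectation non-parametrically, the sketch below hand-codes a Nadaraya-Watson estimator, which is a kernel-weighted local average of the response. The function name `nw_estimate`, the Gaussian kernel, the bandwidth `h` and the toy data are assumptions for this example only.

```r
# Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel.
# `h` is the kernel standard deviation (the bandwidth).
nw_estimate <- function(x0, x, y, h) {
  sapply(x0, function(pt) {
    w <- dnorm((x - pt) / h)   # kernel weights: large for x close to pt
    sum(w * y) / sum(w)        # locally weighted average of the response
  })
}

set.seed(7)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)        # assumed toy data
grid  <- seq(0, 10, length.out = 101)     # points at which to estimate the trend
trend <- nw_estimate(grid, x, y, h = 0.4)

plot(x, y, col = "grey50")
lines(grid, trend, lwd = 2)
```

A smaller `h` follows the data more closely (less bias, more variance), and a larger `h` gives a smoother curve.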
- Splines: fix a number of data points; these points are called knots. Then fit piecewise polynomials between the knots. Typically polynomials of order 4 (degree 3) are used, which gives cubic splines.
In the above expressions, epsilons are knots
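A cubic regression spline with a few fixed knots can be sketched with the `splines` package that ships with R. The knot positions and the toy data below are hypothetical; they are not taken from the talk.

```r
library(splines)

set.seed(3)
x <- sort(runif(150, 0, 10))
y <- sin(x) + rnorm(150, sd = 0.3)   # assumed toy data

knots <- c(2.5, 5, 7.5)              # hypothetical interior knots
# degree = 3 gives piecewise cubics (order 4) joined smoothly at the knots
spline_fit <- lm(y ~ bs(x, knots = knots, degree = 3))

grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
plot(x, y, col = "grey50")
lines(grid$x, predict(spline_fit, newdata = grid), lwd = 2)
abline(v = knots, lty = 2)           # show where the knots sit
```

With three interior knots and cubic pieces, the basis has six columns, so the model estimates seven coefficients including the intercept.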
- Avoid selection of knots
- All distinct data points are taken as knots
- Have control on the curvature (i.e. smoothness) of the fit
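These properties match what is usually called a smoothing spline. One way to try this in base R is `smooth.spline()`; the sketch below uses toy data, and the `spar` value is an arbitrary assumption picked to show a heavily penalised fit.

```r
set.seed(11)
x <- sort(runif(100, 0, 10))
y <- sin(x) + rnorm(100, sd = 0.3)   # assumed toy data

# all.knots = TRUE places a knot at every distinct x value;
# cv = FALSE selects the penalty by generalised cross-validation
fit_gcv <- smooth.spline(x, y, all.knots = TRUE, cv = FALSE)

# A larger smoothing parameter penalises curvature more heavily
fit_heavy <- smooth.spline(x, y, all.knots = TRUE, spar = 1.2)

grid <- seq(0, 10, length.out = 200)
plot(x, y, col = "grey50")
lines(predict(fit_gcv, grid), lwd = 2)
lines(predict(fit_heavy, grid), lty = 2)

# Effective degrees of freedom: lower means a smoother (less curved) fit
c(gcv = fit_gcv$df, heavy = fit_heavy$df)
```

Note that `smooth.spline()` does not use all distinct points as knots by default for larger samples, which is why `all.knots = TRUE` is set explicitly here.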
- Non-parametric models require estimation of a large number of parameters, resulting in very complex models >> curse of dimensionality
- Use a combination of parametric and non-parametric models to avoid estimating a large number of parameters >> semi-parametric models
- Additive Models
- Generalized Partial Linear models
- Generalized additive model
- Single Index model - link function unknown
- Response variables:
  - motor_UPDRS
  - total_UPDRS
- Covariates:
  - NHR, HNR
  - RPDE - A nonlinear dynamical complexity measure
  - DFA - Signal fractal scaling exponent
  - PPE - A nonlinear measure of fundamental frequency variation
  - Jitter(Abs)
  - Shimmer
- 2-D KDE
library(MASS)  # provides kde2d()
# Generate 2-d density plots of one covariate (named by param) against total_UPDRS
contour(kde2d(Upd_avg[[param]], Upd_avg$total_UPDRS, n = 100),
        xlab = param, ylab = "total_UPDRS")
- Kernel regression
# Scatter plot of the data with the Nadaraya-Watson kernel regression fit overlaid
plot(x, y)
lines(ksmooth(x, y, kernel = "normal", bandwidth = bw))
Fitting GAM (Generalized Additive Models) with the mgcv CRAN package
- Recall that the covariates in the dataset are a combination of covariates with linear and non-linear effects on the response
- We use the gam() function to fit a Generalized Additive Model with a Gaussian family and identity link (the gam() default). The model consists of linear terms for covariates with linear effects, and regression splines, denoted by s(), for non-linear terms. This is an extension of the Generalized Partial Linear model.
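library(mgcv)
# Linear terms enter directly; s() marks smooth (non-linear) terms.
# subject must be a factor for the random-effect smooth s(subject, bs = "re").
fit <- gam(motor_UPDRS ~ age + sex + Jitter +
             JitterAbs + JitterRAP +
             JitterDDP + Shimmer +
             ShimmerAPQ3 +
             NHR + HNR + DFA + s(PPE) + s(RPDE) +
             s(subject, bs = "re"),
           data = dat, method = "REML")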
s(, bs="re") is a way to incorporate random-effect terms into GAMs.
summary(fit)
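Because the telemonitoring data frame `dat` is not included here, the same kind of model can be tried on simulated data. The sketch below is an assumption-laden stand-in: the variable names `x_lin`, `x_nl`, `subject` and `y`, the effect sizes and the sample sizes are all made up for illustration, and only `gam()`, `s()`, `summary()`, `gam.check()` and `plot()` from mgcv are used.

```r
library(mgcv)

set.seed(2018)
n_subj <- 20   # hypothetical number of subjects
n_obs  <- 30   # hypothetical observations per subject
sim <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_obs)),
  x_lin   = runif(n_subj * n_obs),   # covariate with a linear effect
  x_nl    = runif(n_subj * n_obs)    # covariate with a non-linear effect
)
b <- rnorm(n_subj, sd = 1)           # subject-level random intercepts
sim$y <- 2 * sim$x_lin + sin(2 * pi * sim$x_nl) +
  b[as.integer(sim$subject)] + rnorm(nrow(sim), sd = 0.5)

# Linear term, smooth term and random intercept in one model
fit_sim <- gam(y ~ x_lin + s(x_nl) + s(subject, bs = "re"),
               data = sim, method = "REML")

summary(fit_sim)    # parametric coefficients, then approximate significance of smooths
gam.check(fit_sim)  # residual diagnostics and basis-dimension check
plot(fit_sim, select = 1, shade = TRUE)  # estimated smooth of x_nl
```

In the summary, an effective degrees of freedom (edf) value close to 1 for a smooth suggests the term is roughly linear, while larger values indicate more curvature.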
library(np)
# Non-parametric conditional density estimate of motor_UPDRS given DFA
fhat <- npcdens(motor_UPDRS ~ DFA, data = dat)
plot(fhat, view = "fixed", theta = 310, phi = 15, main = "")
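Interactions between non-linear terms are tested using te(), the tensor product smooth. Because te(PPE, RPDE) already contains the main effects of both covariates, mgcv also provides ti(), which isolates the interaction when s(PPE) and s(RPDE) are in the model.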
fit <- gam(motor_UPDRS ~ age + sex + Jitter +
JitterAbs + JitterRAP +
JitterDDP +
NHR + HNR + DFA + s(PPE) + s(RPDE)+
te(PPE, RPDE), data = dat, method = "REML")
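A self-contained way to experiment with testing a smooth interaction is sketched below on simulated data. It deliberately uses mgcv's `ti()` rather than the `te()` call above, since `ti()` excludes the main effects that `s(x1)` and `s(x2)` already capture. The variables `x1`, `x2`, `y` and the simulated interaction are assumptions, not the Parkinson's covariates.

```r
library(mgcv)

set.seed(5)
n <- 500
d <- data.frame(x1 = runif(n), x2 = runif(n))
# Two non-linear main effects plus a genuine interaction (2 * x1 * x2)
d$y <- sin(2 * pi * d$x1) + (d$x2 - 0.5)^2 + 2 * d$x1 * d$x2 +
  rnorm(n, sd = 0.3)

m_main <- gam(y ~ s(x1) + s(x2), data = d, method = "REML")
m_int  <- gam(y ~ s(x1) + s(x2) + ti(x1, x2), data = d, method = "REML")

# Approximate test for each smooth; the ti() row addresses the interaction
int_table <- summary(m_int)$s.table
int_table

# Model comparison as a complementary check
AIC(m_main, m_int)
```

A small p-value in the `ti()` row, together with a lower AIC for `m_int`, would point towards keeping the interaction term.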
Definitions and figures in this presentation are taken from Liuhua Peng's material. The Parkinson's Telemonitoring data analysis is joint work by Gavriel Olshansky and myself.
Suggested readings
- Generalized Additive Models, Second Edition. Simon Wood.
- Nonparametric and Semiparametric Methods in R. J. Racine.
- mgcv package vignette
R-Ladies Melbourne will soon undergo substantial structural changes. Your feedback helps the committee to decide if #rstats lunch seminars should have a place in the upcoming changes.
Link to the Google Survey :