
Example 17: Simulation with Varying Sample Size (Version 0.2)

psunthud edited this page Dec 30, 2012 · 2 revisions

Model Description

This example shows how to set up a simulation study in which the sample size differs across replications. Here, the sample size increases from 50 to 1000 in steps of 1. We then find the sample size at which the power for a given parameter equals 0.8, together with the fit index cutoffs at that sample size. We return to the confirmatory factor analysis (CFA) model with two factors and three indicators each. Factor loadings are 0.7. Error variances are set so that the indicator variances equal 1 (that is, 1 - 0.7^2 = 0.51). The factor correlation is 0.5. We will find the sample size that provides power of 0.8 for detecting the factor correlation.
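The population values described above can be checked with a few lines of base R. This is only a sketch of the arithmetic, not simsem code; the object names (`factor_cor`, `theta`, `sigma`) are hypothetical.

```r
# Population parameters taken from the model description
loading <- matrix(0, 6, 2)
loading[1:3, 1] <- 0.7            # indicators 1-3 load on factor 1
loading[4:6, 2] <- 0.7            # indicators 4-6 load on factor 2

factor_cor <- matrix(c(1, 0.5, 0.5, 1), 2, 2)  # factor correlation of 0.5

error_var <- 1 - 0.7^2            # 0.51, so that each indicator variance is 1
theta <- diag(error_var, 6)       # uncorrelated errors

# Model-implied covariance matrix of the six indicators
sigma <- loading %*% factor_cor %*% t(loading) + theta

diag(sigma)    # all indicator variances equal 1
sigma[1, 4]    # covariance across factors: 0.7 * 0.5 * 0.7 = 0.245
```

Because every indicator variance is 1, the implied covariance matrix is also a correlation matrix.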

Example 17 Model

Syntax

The syntax is similar to that of Example 1:

loading <- matrix(0, 6, 2)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
LX <- simMatrix(loading, 0.7)
  
latent.cor <- matrix(NA, 2, 2)
diag(latent.cor) <- 1
RPH <- symMatrix(latent.cor, 0.5)
  
error.cor <- matrix(0, 6, 6)
diag(error.cor) <- 1
RTD <- symMatrix(error.cor)
  
CFA.Model <- simSetCFA(LX = LX, RPH = RPH, RTD = RTD)
SimData <- simData(CFA.Model, 200)
SimModel <- simModel(CFA.Model)

Note that the sample size specified in the data object (200 here) is arbitrary: it is overwritten when the result object is built. A result object with varying sample size can be specified by

Output <- simResult(NULL, SimData, SimModel, n=50:1000)
summary(Output)

The figure below shows the screen provided by the summary function:

Example 17 summary

The first argument, the number of replications, is specified as NULL because the number of replications is determined by the length of the vector given to the n argument. The n argument is the varying sample size, specified here as the sequence from 50 to 1000 in steps of 1. If you call the summary function on the Output object, you will see a note that the sample size is varying. Note that the fit index cutoffs in the summary depend on the value of sample size.

We can plot the fit index cutoffs with the plotCutoff function as usual:

plotCutoff(Output, 0.05)

The figure below shows the graph provided by the plotCutoff function:

Example 17 SSD

Notice that the graph shows the cutoffs (red line) at each value of sample size for the specified alpha level (the second argument). We can find the cutoffs at a particular sample size with the getCutoff function:

getCutoff(Output, 0.05, nVal = 200)	

The first argument is the result object. The second argument is the alpha level. The third argument, nVal, is the value of sample size. If the percentage of data missing completely at random varies, the pmMCARval argument specifies the value of the missing percentage to use. If the percentage of data missing at random varies, the pmMARval argument serves the same purpose.
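As a rough illustration of what a cutoff at a given alpha level means, the sketch below takes the (1 - alpha) quantile of a simulated chi-square distribution. It is not how simsem computes its cutoffs: it assumes the chi-square statistic follows its asymptotic distribution with 8 degrees of freedom (21 unique variances and covariances minus 13 free parameters in this model), and it ignores how the package conditions on sample size. The names `chisq_sim` and `cutoff` are hypothetical.

```r
set.seed(1)

alpha <- 0.05
df <- 8  # 6 * 7 / 2 = 21 moments minus 6 loadings, 6 error variances, 1 correlation

# Stand-in for the chi-square values collected across replications
chisq_sim <- rchisq(10000, df = df)

# The cutoff is the value exceeded by only a proportion alpha of replications
cutoff <- quantile(chisq_sim, probs = 1 - alpha)
cutoff

# Theoretical counterpart for comparison
qchisq(1 - alpha, df = df)
```

A model fitted to real data would be rejected at this alpha level if its chi-square value exceeded the cutoff.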

The power of each parameter given the values of sample size can be obtained by the getPower function:

Cpow <- getPower(Output)

The figure below shows the first twenty rows of the Cpow object:

Example 17 Cpow

The result of the getPower function is a data frame in which the first column is the sample size and the other columns are the power of each parameter. We can find the power at a given sample size by adding the nVal argument to the getPower call:

Cpow2 <- getPower(Output, nVal = 200)

The figure below shows the Cpow2 object:

Example 17 Cpow2

We can find the sample size that provides power just over 0.80 with the findPower function:

findPower(Cpow, "N", 0.80)

The figure below shows the screen provided by the findPower function:

Example 17 findpower

The resulting values of the findPower function can be classified into five types. Please see the help file of the findPower function for further details:

?findPower

In this example, the sample size that provides power of 0.8 for detecting the factor correlation is 62.
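The idea of reading a target sample size off a power table can be illustrated without simsem. The base R sketch below is not the package's implementation and will not reproduce the value above: it assumes a simpler setting (a test of a single correlation with a hypothetical true value of 0.3, one replication per sample size) and assumes that power is smoothed over sample size with a logistic regression. All object names are hypothetical.

```r
set.seed(123)

rho <- 0.3           # hypothetical true correlation
n_values <- 20:300   # one replication at each sample size

# Simulate one data set of size n and report whether the test is significant
is_significant <- function(n, rho, alpha = 0.05) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  cor.test(x, y)$p.value < alpha
}

sig <- vapply(n_values, is_significant, logical(1), rho = rho)

# Smooth the 0/1 significance outcomes over sample size
fit <- glm(as.integer(sig) ~ n_values, family = binomial)

# Power table: predicted probability of significance at each sample size
power_table <- data.frame(N = n_values,
                          power = predict(fit, type = "response"))

# First sample size at which the predicted power reaches 0.80
n_target <- power_table$N[which(power_table$power >= 0.80)[1]]
n_target
```

With a single replication per sample size, each outcome is just significant or not, so some smoothing across sample sizes is needed before a power value can be read off.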

In the findPower call, the first argument is the power table obtained from the getPower function. The second argument is the target varying parameter, which is sample size here. Besides the name "N", this argument can be given as 1 (the index of the sample size column) or as "iv.N" (the full name of the first column). The third argument is the desired power.

We can plot the power of target parameters against the varying sample size with the plotPower function:

plotPower(Output, powerParam=c("LY1_1", "PS2_1"))

The figure below shows the graph provided by the plotPower function:

Example 17 plotpower

The powerParam argument specifies the names of the parameters to plot.

Here is the summary of the whole script in this example.

Remarks

Increase the Number of Replications for Each Sample Size

In the example, the number of replications for each sample size is 1 because the vector contains each sample size only once. We can increase the number of replications for each sample size by repeating the sample size vector. To do so, change Line 19 (the simResult call) to

Output <- simResult(NULL, SimData, SimModel, n=rep(50:1000,3))

In this code, the number of replications for each sample size is 3.
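The replication counts implied by the two sample size vectors can be checked in base R. This sketch only inspects the vectors and does not call simsem; `n_once` and `n_thrice` are hypothetical names.

```r
n_once <- 50:1000            # vector used in the main example
n_thrice <- rep(50:1000, 3)  # vector used in this remark

length(n_once)    # 951 replications, one for each sample size
length(n_thrice)  # 2853 replications in total

# Every sample size from 50 to 1000 appears exactly three times
all(table(n_thrice) == 3)
```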

Specify Sample Size as a Distribution Object

The sample size can also be specified as a distribution object. In that case, the number of replications argument (the first argument) is the number of values drawn from the distribution object. For example, Line 19 (the simResult call) can be changed to

Output <- simResult(1000, SimData, SimModel, n=simUnif(50, 1000))
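For intuition, drawing 1000 sample sizes from a uniform distribution on 50 to 1000 can be mimicked in base R as follows. This is an assumption-laden sketch: it uses runif rather than a simsem distribution object, and rounding the draws to whole numbers is an assumption about how non-integer draws would be handled, not documented package behavior. The name `n_draws` is hypothetical.

```r
set.seed(2012)

# 1000 sample sizes drawn uniformly between 50 and 1000, rounded to integers
n_draws <- round(runif(1000, min = 50, max = 1000))

length(n_draws)  # one sample size per replication
range(n_draws)   # all draws stay within the requested bounds
```

Unlike the fixed sequence, random draws may skip some sample sizes and repeat others.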

Function Review

  • getPower: Get the power of each parameter
  • findPower: Find the value of varying parameters that provides a given power
  • plotPower: Plot the power of specified parameters against varying parameters