GitHub - bsathyamur/SimulationProject_R: Simulation project using rMarkDown

SIMULATION PROJECT - r programming

Simulation Study 1, Significance of Regression

In this simulation study we investigate the significance of regression test. We simulate from two different models, using the predictor data found in the study_1.csv file:

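The “significant” model $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ and $\beta_0 = 3$, $\beta_1 = 1$, $\beta_2 = 1$, $\beta_3 = 1$.

The “non-significant” model $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ and $\beta_0 = 3$, $\beta_1 = 0$, $\beta_2 = 0$, $\beta_3 = 0$.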

For both, we consider a sample size of n = 25 and three possible levels of noise, that is, three values of σ: σ ∈ (1, 5, 10).

Then, for each of the three values of σ, we calculate the following for both models:

- A. The F statistic for the significance of regression test
- B. The p-value for the significance of regression test
- C. R²

For each model-σ combination we will use 2500 simulations. Then for each simulation, we will fit a regression model.
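The procedure above can be sketched in R for a single model-σ combination. This is a minimal sketch under stated assumptions, not the project's actual code: the predictors are generated at random as a stand-in for study_1.csv (whose contents are not shown here), and the helper name `sim_once` and the seed are hypothetical.

```r
set.seed(420)  # arbitrary seed, assumed for reproducibility

n <- 25
# Stand-in predictors; the real study reads x1, x2, x3 from study_1.csv
preds <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))

# Hypothetical helper: simulate one response vector, fit the model,
# and return the F statistic, its p-value, and R-squared
sim_once <- function(beta, sigma, preds) {
  y <- beta[1] + beta[2] * preds$x1 + beta[3] * preds$x2 +
    beta[4] * preds$x3 + rnorm(nrow(preds), mean = 0, sd = sigma)
  fit <- summary(lm(y ~ x1 + x2 + x3, data = preds))
  f <- fit$fstatistic  # named vector: value, numdf, dendf
  c(f_stat  = unname(f["value"]),
    p_value = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)),
    r2      = fit$r.squared)
}

# 2500 simulations of the "significant" model with sigma = 1;
# the result has one row per simulation and one column per quantity
sig_sigma1 <- t(replicate(2500, sim_once(beta = c(3, 1, 1, 1), sigma = 1, preds = preds)))
head(sig_sigma1)
```

Repeating the last call with `beta = c(3, 0, 0, 0)` and with `sigma` set to 5 and 10 would cover the remaining model-σ combinations.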

Discussion

f-Statistic

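In the plots, the simulated F statistics do not seem to align with the true distribution curve. For reference, under H0 the statistic follows an F distribution with 3 and n − 4 = 21 degrees of freedom, so close agreement is expected only for the non-significant model.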
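One way to examine the alignment numerically is to compare simulated F statistics with the quantiles of the theoretical F distribution. The sketch below assumes the non-significant model, for which the reference distribution has 3 and 21 degrees of freedom (3 predictors, n − 4 = 21 residual degrees of freedom). The predictors are randomly generated stand-ins for study_1.csv, and σ = 5 is just one of the three noise levels.

```r
set.seed(1)  # arbitrary seed

n <- 25
# Stand-in predictors (assumption; the real ones come from study_1.csv)
preds <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))

# F statistics from 2500 fits of the non-significant model
f_null <- replicate(2500, {
  y <- 3 + rnorm(n, mean = 0, sd = 5)  # beta_1 = beta_2 = beta_3 = 0
  summary(lm(y ~ x1 + x2 + x3, data = preds))$fstatistic[["value"]]
})

# Empirical quantiles next to the theoretical F(3, 21) quantiles
probs <- c(0.50, 0.90, 0.95)
rbind(simulated   = quantile(f_null, probs),
      theoretical = qf(probs, df1 = 3, df2 = n - 4))

# Share of simulations beyond the 5% critical value
mean(f_null > qf(0.95, df1 = 3, df2 = n - 4))
```

The same comparison can be drawn by plotting `hist(f_null, probability = TRUE)` and overlaying the `df()` density with `curve(..., add = TRUE)`.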

p-Value

When the null hypothesis, H0, is true, all p-values between 0 and 1 are equally likely. In other words, the p-value has a rectangular (uniform) distribution between 0 and 1. On the other hand, if H1 is true, then the p-values have a distribution for which values near zero are more likely than values near 1. The precise distribution under the alternative hypothesis depends on the specific hypotheses being tested and the true value of the parameter, but it always favours values near 0.
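The contrast between the two hypotheses can be illustrated with a short simulation. This is a sketch under assumptions: the predictors are random stand-ins for study_1.csv, and `sim_p` is a hypothetical helper name, not a function from the project.

```r
set.seed(2)  # arbitrary seed

n <- 25
# Stand-in predictors (assumption; the real ones come from study_1.csv)
preds <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))

# Hypothetical helper: p-values of the overall F test across simulations
sim_p <- function(beta, sigma, n_sim = 2500) {
  replicate(n_sim, {
    y <- beta[1] + beta[2] * preds$x1 + beta[3] * preds$x2 +
      beta[4] * preds$x3 + rnorm(n, mean = 0, sd = sigma)
    f <- summary(lm(y ~ x1 + x2 + x3, data = preds))$fstatistic
    pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
  })
}

p_null <- sim_p(beta = c(3, 0, 0, 0), sigma = 1)  # H0 true
p_alt  <- sim_p(beta = c(3, 1, 1, 1), sigma = 1)  # H1 true

# Under H0 each tenth of [0, 1] should hold roughly 10% of the p-values
bins <- seq(0, 1, by = 0.1)
round(table(cut(p_null, breaks = bins, include.lowest = TRUE)) / length(p_null), 3)

# Share of p-values below 0.05 under each hypothesis
c(null = mean(p_null < 0.05), alternative = mean(p_alt < 0.05))
```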

R-squared

I am not able to tell what type of distribution R-squared follows under the null and alternative hypotheses. For the significant model, lower values of sigma seem to produce a denser concentration of high R-squared values. For the non-significant model, since the response is mostly noise, the distribution of R-squared doesn’t seem to vary much across the values of sigma.
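A standard result for linear models with normal errors may help here; it is not part of the original analysis. With p = 4 estimated coefficients and n = 25 observations, R² is a monotone function of the F statistic, and under H0 it follows a Beta distribution that does not involve σ, which would be consistent with the non-significant model looking similar across noise levels:

```latex
R^2 = \frac{(p-1)\,F}{(p-1)\,F + (n-p)} = \frac{3F}{3F + 21},
\qquad
F \sim F_{3,\,21} \ \text{under } H_0
\;\Longrightarrow\;
R^2 \sim \mathrm{Beta}\!\left(\tfrac{3}{2}, \tfrac{21}{2}\right),
\quad
\mathbb{E}\left[R^2\right] = \frac{3}{24} = 0.125 .
```

Under the alternative, F follows a non-central F distribution whose non-centrality grows as σ shrinks, so R² tends to concentrate at higher values for lower σ.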

Simulation Study 2, Using RMSE for Selection

Using the data found in study_2.csv, we simulate from the model below:

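$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \beta_6 x_{i6} + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$ and $\beta_0 = 0$, $\beta_1 = 5$, $\beta_2 = -4$, $\beta_3 = 1.6$, $\beta_4 = -1.1$, $\beta_5 = 0.7$, $\beta_6 = 0.3$.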

We consider a sample size of n = 500 and three possible levels of noise, σ ∈ (1, 2, 4). In each simulation we randomly split the data into train and test sets of equal size (250 observations for training, 250 observations for testing) and fit nine models of the following forms:

- y ~ x1
- y ~ x1 + x2
- y ~ x1 + x2 + x3
- y ~ x1 + x2 + x3 + x4
- y ~ x1 + x2 + x3 + x4 + x5
- y ~ x1 + x2 + x3 + x4 + x5 + x6
- y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7
- y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8
- y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9

For each model and each level of noise, we simulate 1000 times and calculate the train and test RMSE to identify the best model.
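The split-and-fit loop described above can be sketched as follows. This is an illustration under assumptions rather than the project's code: the nine predictors are random stand-ins for study_2.csv, `rmse` and `sim_rmse` are hypothetical helper names, and only 200 repetitions are run here to keep the example quick (the study uses 1000).

```r
set.seed(3)  # arbitrary seed

n <- 500
# Stand-in for study_2.csv: nine random predictors named x1..x9
dat <- as.data.frame(matrix(rnorm(n * 9), ncol = 9,
                            dimnames = list(NULL, paste0("x", 1:9))))
beta <- c(5, -4, 1.6, -1.1, 0.7, 0.3)  # beta_1..beta_6; beta_0 = 0

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# One simulation: new response, new 250/250 split, nine nested fits.
# Returns a 9 x 2 matrix of train and test RMSE (row k = model with x1..xk).
sim_rmse <- function(sigma) {
  dat$y <- as.vector(as.matrix(dat[, 1:6]) %*% beta) + rnorm(n, mean = 0, sd = sigma)
  trn_idx <- sample(n, 250)
  trn <- dat[trn_idx, ]
  tst <- dat[-trn_idx, ]
  t(sapply(1:9, function(k) {
    fit <- lm(reformulate(paste0("x", 1:k), response = "y"), data = trn)
    c(train = rmse(trn$y, predict(fit, trn)),
      test  = rmse(tst$y, predict(fit, tst)))
  }))
}

res <- sim_rmse(sigma = 1)
res                        # train RMSE can only go down as predictors are added
which.min(res[, "test"])   # model size with the lowest test RMSE in this run

# How often each model size wins on test RMSE over repeated simulations
best <- replicate(200, which.min(sim_rmse(sigma = 1)[, "test"]))
table(best)
```

Selecting on test RMSE rather than train RMSE matters here because the train RMSE of nested least-squares fits never increases when a predictor is added, so it would always favour the largest model.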

Discussion

Based on the plots in the results section, looking at both the mean train and test RMSE and the frequency with which each model attains the lowest RMSE, model 6 is consistently picked as the best model. Since model 6 contains exactly the six predictors used to generate the data, the method seems to pick the right model consistently.

When sigma is low, the frequency with which model 6 attains the lowest RMSE is highest, which suggests that as noise increases, the chance of some other model being picked as the best model increases.

Simulation Study 3: Power

We will investigate the power of the significance of regression test for simple linear regression.

$H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$

Power is the probability of rejecting the null hypothesis when the null is not true, that is, when the alternative is true and β1 is non-zero. Many things affect the power of a test. In this case, some of those are:

- Sample Size, n
- Signal Strength, β1
- Noise Level, σ
- Significance Level, α

In this study, we’ll investigate the first three. To do so we will simulate from the model

$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

We will then consider different signals, noises, and sample sizes:

- β1 ∈ (−2, −1.9, −1.8, …, −0.1, 0, 0.1, 0.2, 0.3, …, 1.9, 2)
- σ ∈ (1, 2, 4)
- n ∈ (10, 20, 30)

We will hold the significance level constant at α = 0.05.
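Power for a single (β1, σ, n) combination can be estimated as the share of simulations in which H0 is rejected. The sketch below makes assumptions that the text above does not specify: the predictor values are taken as an evenly spaced grid on [0, 5], β0 is set to 0, the errors are drawn as N(0, σ²), and `est_power` is a hypothetical helper name.

```r
set.seed(4)  # arbitrary seed
alpha <- 0.05

# Hypothetical helper: estimated power = share of simulations rejecting H0
est_power <- function(beta_1, sigma, n, n_sim = 1000, beta_0 = 0) {
  x <- seq(0, 5, length.out = n)  # assumed design; not given in the text
  rejected <- replicate(n_sim, {
    y <- beta_0 + beta_1 * x + rnorm(n, mean = 0, sd = sigma)
    p <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
    p < alpha
  })
  mean(rejected)
}

power_null    <- est_power(beta_1 = 0,   sigma = 1, n = 20)  # H0 true: rejection rate near alpha
power_small_n <- est_power(beta_1 = 0.5, sigma = 1, n = 10)
power_large_n <- est_power(beta_1 = 0.5, sigma = 1, n = 30)
power_noisy   <- est_power(beta_1 = 0.5, sigma = 4, n = 30)

c(null = power_null, n10 = power_small_n, n30 = power_large_n, sigma4 = power_noisy)
```

Looping `est_power` over the full grids of β1, σ, and n would give the values needed to draw one power curve per σ and n.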

Discussion

Based on the plots in the results section, there does not seem to be much difference in power between sigma = 1 and sigma = 2, but at sigma = 4 the power drops drastically, which suggests that power decreases as sigma increases.

Also, the power curve is at its lowest when beta1 is close to 0 and rises as beta1 moves farther from zero, which suggests that power decreases as beta1 approaches zero.

Within each sigma, the number of observations does not seem to have much impact on the power curve in these plots; there is only a negligible difference between the values of n. Overall, sigma and beta1 seem to have a greater impact on power than the number of observations.

I also tried 2000 simulations and the results still seem to be the same, so running the simulation 1000 times seems to be sufficient for this case study.
