Repeated use restricting to e(sample) leads to different results #203

jorpppp · 2024-05-14T21:17:27Z

Consider this example

clear all
webuse nlswork
by idcode (year): gen time=_n
xtset idcode time
by idcode (time): gen union2 = sum(union)
replace union2 = 1 if union2 > 1
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure ///
, pol(union2) w(3) cluster(idcode) impute(nuchange)

gen s = e(sample)

xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure if s ///
, pol(union2) w(3) cluster(idcode) impute(nuchange)

The second run over the estimation sample from the first run leads to different results. This probably has to do with the impute option.

I am going to label this as a bug for now because it looks like it.

The text was updated successfully, but these errors were encountered:

jorpppp · 2024-05-14T21:20:00Z

Note that repeated saving of imputed variables with different suffixes is not working:

clear all
webuse nlswork
by idcode (year): gen time=_n
xtset idcode time
by idcode (time): gen union2 = sum(union)
replace union2 = 1 if union2 > 1
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure ///
, pol(union2) w(3) cluster(idcode) impute(nuchange, saveimp)

ren union2_imputed union2_imputed1
gen s = e(sample)

xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure if s ///
, pol(union2) w(3) cluster(idcode) impute(nuchange, saveimp)

returns an error message.

This is because we are confirming vars with unab, so I'm going to fix that first.

jorpppp · 2024-05-14T21:31:05Z

@Constantino-Carreto-Romero note the issue above #203 (comment) and the fix in 548b1a6. I think this is also an issue for #177 that we just closed, because it means that when xtevent drops variables to replace them, it may be dropping more variables that it should.

Can you please open a new issue to check all confirms to make sure this doesn't happen? Note that for the fix in #177 you may need to move the check for replacing to a later place in the code, when the code already knows the new kvars and can confirm them individually.

jorpppp · 2024-05-14T21:34:01Z

After 548b1a6 the above code works: #203 (comment)

and it shows that the imputed variable is different across runs, so the issue has to do with the imputation.

jorpppp · 2024-05-14T21:37:34Z

Actually the issue may be deeper since it shows up without any imputation:

clear all
webuse nlswork
by idcode (year): gen time=_n
xtset idcode time
by idcode (time): gen union2 = sum(union)
replace union2 = 1 if union2 > 1
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure ///
, pol(union2) w(3) cluster(idcode)
est sto m1

xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure if s ///
, pol(union2) w(3) cluster(idcode) 
est sto m2


est tab m1 m2

jorpppp · 2024-05-14T22:12:12Z

This is not an imputation issue, and I don't think this is a bug. This code checks the estimation samples for the two runs:

clear all
webuse nlswork
by idcode (year): gen time=_n
xtset idcode time
by idcode (time): gen union2 = sum(union)
replace union2 = 1 if union2 > 1
xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure ///
, pol(union2) w(3) cluster(idcode) savek(a)
est sto m1


* ren union2_imputed union2_imputed1
gen s1 = e(sample)

xtevent ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure if s1 ///
, pol(union2) w(3) cluster(idcode) 
est sto m2

gen s2 = e(sample)

* ren union2_imputed union2_imputed2

* assert union2_imputed1 == union2_imputed2

est tab m1 m2

tab s1 s2
tab idcode if s1 != s2

This shows that the samples differ between runs and it lists some ids for where this happens. These ids are the same that are excluded in the second run because of ambiguous event times.

After that if we check the sample for missing values for one of the ids:

li idcode time union2 tenure age ttl_exp s1 s2 if idcode == 13

we get

Note that tenure is missing precisely when the policy variable, union2, changes from zero to one. So in the first run, this observation is excluded because of missing tenure. Now, the second run that is restricted to the estimation sample of the first run. Since that observation is not in the sample for the second run, the observations for id=13 have ambiguous event times so the entire unit is excluded. The same happens for the other ids in the sample.

The same happens with the other ids with the sample.

I'll now check the imputation case.

jorpppp · 2024-05-14T22:31:13Z

It's a good idea to return the list of excluded units due to ambiguous event times so they can be checked. I added this in 3a84461 and 7afc04c

After doing this I checked if the excluded units due to ambiguous event time when using impute(nuchange) (a larger set than without imputation) had the same issue of missing tenure when the policy activates, and they do. So the issue is the same overall.

In summary I think this is not a bug.

@Constantino-Carreto-Romero could you read the posts on this issue and let me know if you agree? If so we can start a PR to add these changes.

Constantino-Carreto-Romero · 2024-05-14T23:38:16Z

It's a good idea to return the list of excluded units due to ambiguous event times so they can be checked. I added this in 3a84461 and 7afc04c

After doing this I checked if the excluded units due to ambiguous event time when using impute(nuchange) (a larger set than without imputation) had the same issue of missing tenure when the policy activates, and they do. So the issue is the same overall.

In summary I think this is not a bug.

@Constantino-Carreto-Romero could you read the posts on this issue and let me know if you agree? If so we can start a PR to add these changes.

@jorpppp
I replicated the examples throughout the issue and I agree it should not be considered as a bug. In the first run, to create the event-time dummies, the program focuses on a larger sample, but then estimates restricting to non-missing observations in the varlist. If the user wants to reestimate using e(sample) he/she might not be able to run the same estimation because the program won’t be able to define event time for some units that lost key observations in the previous run. Maybe we could add a second sample function, one that focuses on the sample used when creating the event-time dummies.

jorpppp · 2024-05-15T00:30:19Z

@Constantino-Carreto-Romero Yes, this makes me think that we just need to set the sample to the one used when creating the event-time dummies. That way the results will replicate using e(sample), and at the end of the day, that is the sample that xtevent is using as a whole, even if _eventols or _eventiv use smaller samples. I'll implement.

jorpppp · 2024-05-15T00:37:36Z

Summary: In this issue we addressed an issue with xtevent giving different results when ran twice, using the e(sample) from the first run in the second run.
Thread continues in #205

…#205) * #203 confirm imputed variable with confirm, exact instead of unab * #203 Return list of excluded units due to ambiguous event time * #203 Modify help file * #203 Set sample to larger sample used to create event-time dummies * #203 #205 Add seed to test file

#207) * #203 confirm imputed variable with confirm, exact instead of unab * #203 Return list of excluded units due to ambiguous event time * #203 Modify help file * #203 Set sample to larger sample used to create event-time dummies * #203 #205 Add seed to test file

…… ( (#208) * #203 repeated use restricting to e(sample) leads to different results… (#207) * #203 confirm imputed variable with confirm, exact instead of unab * #203 Return list of excluded units due to ambiguous event time * #203 Modify help file * #203 Set sample to larger sample used to create event-time dummies * #203 #205 Add seed to test file * #206 Change endpoint dummy var labels

jorpppp added the bug Something isn't working label May 14, 2024

jorpppp self-assigned this May 14, 2024

jorpppp added a commit that referenced this issue May 14, 2024

#203 confirm imputed variable with confirm, exact instead of unab

548b1a6

jorpppp added a commit that referenced this issue May 14, 2024

#203 Return list of excluded units due to ambiguous event time

3a84461

jorpppp added a commit that referenced this issue May 14, 2024

#203 Modify help file

7afc04c

jorpppp assigned Constantino-Carreto-Romero and unassigned jorpppp May 14, 2024

jorpppp removed the bug Something isn't working label May 14, 2024

Constantino-Carreto-Romero mentioned this issue May 14, 2024

Erroneously dropping variables owing to inexact variable name match #204

Closed

jorpppp added a commit that referenced this issue May 15, 2024

#203 Set sample to larger sample used to create event-time dummies

13c9fef

jorpppp linked a pull request May 15, 2024 that will close this issue

#203 repeated use restricting to e(sample) leads to different results #205

Merged

jorpppp mentioned this issue May 15, 2024

#203 repeated use restricting to e(sample) leads to different results #205

Merged

jorpppp added a commit that referenced this issue May 15, 2024

#203 #205 Add seed to test file

34a8577

jorpppp closed this as completed in #205 May 15, 2024

jorpppp mentioned this issue May 15, 2024

#203 repeated use restricting to e(sample) leads to different results… #207

Merged

jorpppp mentioned this issue May 15, 2024

#203 repeated use restricting to e(sample) leads to different results… ( #208

Merged

Constantino-Carreto-Romero mentioned this issue May 21, 2024

210 review helpfile #213

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repeated use restricting to e(sample) leads to different results #203

Repeated use restricting to e(sample) leads to different results #203

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024 •

edited

Loading

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024

Constantino-Carreto-Romero commented May 14, 2024 •

edited

Loading

jorpppp commented May 15, 2024

jorpppp commented May 15, 2024 •

edited

Loading

Repeated use restricting to e(sample) leads to different results #203

Repeated use restricting to e(sample) leads to different results #203

Comments

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024 • edited Loading

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024

jorpppp commented May 14, 2024

Constantino-Carreto-Romero commented May 14, 2024 • edited Loading

jorpppp commented May 15, 2024

jorpppp commented May 15, 2024 • edited Loading

jorpppp commented May 14, 2024 •

edited

Loading

Constantino-Carreto-Romero commented May 14, 2024 •

edited

Loading

jorpppp commented May 15, 2024 •

edited

Loading