```python
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm

from pyfixest.estimation import feols
from pyfixest.utils import get_data
from pyfixest import etable
```




## **Exercise: Work from Home and Performance**

### Experiment Description
Ctrip, which as of 2016 had 16,000 employees and a NASDAQ valuation of nearly $10 billion, was interested in the potential of working from home (WFH) to reduce its high office rental costs and its annual staff turnover of 50%. At the same time, managers worried that allowing employees to work away from the direct oversight of their supervisors would lead to a large increase in shirking.
 
Given the uncertainty about the effects of WFH, in the research literature as well as in practice, Ctrip decided to run a randomized controlled trial. The authors of the study helped design the experiment, and management followed their recommendations whenever feasible. The authors had complete access to the resulting data and to data from surveys conducted by the firm; they also ran their own surveys and numerous interviews with employees, line supervisors and senior management.

In this nine-month experiment, Ctrip asked the 996 employees in the airfare and hotel departments of its Shanghai call center whether they would be interested in WFH four days a week, with the fifth day in the office. Approximately half of them (503) were interested, particularly those who were married, had children and faced long commutes. Of these, 249 qualified for the experiment by having at least six months’ tenure, broadband access and a private room at home in which they could work. After a lottery draw, employees with even-numbered birthdates were assigned to WFH, while those with odd-numbered birthdates stayed in the office as the control group.

Office and home workers used the same IT equipment, faced the same work-order flow from a common central server, carried out the same tasks and were paid under the same system, which included an element of individual performance pay. The only difference between the two groups was the location of work. This lets the researchers isolate the impact of WFH from other practices that are often bundled with it in attempts to improve work-life balance, such as flexible working hours. Importantly, employees were not allowed to work overtime outside their team shift. In particular, the treatment group could not use the time saved on commuting to work overtime, so overtime does not directly drive the results.
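
The even/odd birthdate rule described above can be illustrated with a short sketch. The real birthdates are not in the dataset, so the roster below is hypothetical (random days of the month); it only shows that `expgroup` follows mechanically from birthdate parity, which is plausibly unrelated to workers' characteristics and so behaves like a coin flip.

```python
import numpy as np
import pandas as pd

# Hypothetical roster: the real birthdates are not in wfh_small_heterog.csv,
# so each of the 249 eligible workers gets a made-up day of the month (1 to 31).
rng = np.random.default_rng(seed=0)
roster = pd.DataFrame({
    "personid": np.arange(1, 250),
    "birth_day": rng.integers(1, 32, size=249),  # upper bound is exclusive
})

# Assignment rule from the text: even-numbered birthdate -> WFH (1), odd -> office (0).
roster["expgroup"] = (roster["birth_day"] % 2 == 0).astype(int)

# Group sizes: roughly, but not exactly, half and half.
print(roster["expgroup"].value_counts().sort_index())
```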


### Variable definitions:

| Variable | Description |
|----------|-------------|
| `experiment_treatment` | Is the treatment on for this person in this week? (`1` = on, `0` = not on) |
| `expgroup` | Treatment-group indicator: `1` = randomly assigned to work from home; `0` = randomly assigned to come to the office. |
| `personid` | The worker’s ID. |
| `year_week` | The week of the measurement. |
| `perform1` | Overall performance metric devised by the company. |
| `phonecall` | Number of phone calls in that week. (Not available for all employees, since some jobs do not involve phone calls.) |
| `logcalllength` | Minutes on the phone in that week. (Not available for all employees, since some jobs do not involve phone calls.) |
| `children` | Does the worker have children? (`1` or `0`) |
| `married` | Is the worker married? (`1` or `0`) |
| `commute` | Cost of commute (yuan). |
| `bedroom` | Does the worker have their own bedroom? (`1` or `0`) |
| `high_educ` | Does the worker have a high school education or above? (`1` or `0`) |
| `tenure` | The worker’s tenure in months. |
| `grosswage` | Monthly wage in yuan. |
| `age` | The worker’s age. |
| `men` | Is the worker a man? (`1` or `0`) |
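
The first two rows of the table are related. Assuming the experiment switches on for every WFH worker in the same week, `experiment_treatment` is the interaction of `expgroup` with an indicator for weeks after the start, which is the usual difference-in-differences regressor. The toy panel below is hypothetical, and `post` is an illustrative column that does not appear in the table.

```python
import pandas as pd

# Hypothetical toy panel: two workers observed for four weeks.
# Worker 1 is in the WFH group (expgroup = 1), worker 2 in the office group (expgroup = 0).
panel = pd.DataFrame({
    "personid":  [1, 1, 1, 1, 2, 2, 2, 2],
    "year_week": [1, 2, 3, 4, 1, 2, 3, 4],
    "expgroup":  [1, 1, 1, 1, 0, 0, 0, 0],
})

# `post` is an illustrative column (NOT in the dataset): 1 from the assumed start week (3) onwards.
panel["post"] = (panel["year_week"] >= 3).astype(int)

# Under the single-start-week assumption, the "treatment on" dummy is group x period.
panel["experiment_treatment"] = panel["expgroup"] * panel["post"]
print(panel)
```

In the actual data, `data_wfh.groupby("expgroup")["experiment_treatment"].max()` should return `0` for the office group if this reading is right.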

```python
data_wfh = pd.read_csv("wfh_small_heterog.csv")
```

Your task is to use difference-in-differences to answer the following questions:
1) What is the effect of WFH on performance?
2) What is the effect of WFH on phone calls?
3) What is the effect of WFH on call length?

How do you interpret the results?
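
One common way to frame these questions is the two-way fixed effects (TWFE) difference-in-differences model; it is shown here as a standard specification, not necessarily the one the exercise expects. Let $Y_{it}$ be the outcome (`perform1`, `phonecall` or `logcalllength`) for worker $i$ in week $t$, $D_{it}$ the dummy `experiment_treatment`, $\alpha_i$ a person fixed effect (`personid`) and $\lambda_t$ a week fixed effect (`year_week`):

$$
Y_{it} = \alpha_i + \lambda_t + \beta D_{it} + \varepsilon_{it}
$$

With a single start week and a balanced panel, $\hat\beta$ equals the 2x2 comparison of before/after changes in the WFH group ($T$) and the office group ($C$):

$$
\hat\beta = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right) - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right)
$$

Reading $\beta$ as the causal effect of WFH relies on parallel trends (absent WFH, both groups would have moved together), which the random assignment makes plausible but does not guarantee. If outcomes are missing for some person-weeks, the panel is unbalanced and the two expressions need not coincide exactly.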


```python
# Basic model without fixed effects. Standard errors are clustered by person (CRV3),
# since the person is the unit of randomization.
model = feols("perform1 ~ experiment_treatment", data=data_wfh).vcov({"CRV3": "personid"})
etable(model)
```

```
                                est1
--------------------  --------------
depvar                      perform1
------------------------------------
Intercept             -0.044 (0.036)
experiment_treatment   0.027 (0.058)
------------------------------------
------------------------------------
R2                             0.000
S.E. type               by: personid
Observations                   2E+04
------------------------------------
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001
Format of coefficient cell:
Coefficient (Std. Error)
```
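
The model above pools all person-weeks and has no fixed effects. A common DiD version adds person and week fixed effects; in pyfixest these go after a `|` in the formula, e.g. `feols("perform1 ~ experiment_treatment | personid + year_week", data=data_wfh, vcov={"CRV1": "personid"})`. The sketch below does not call pyfixest. It simulates a hypothetical balanced panel (made-up numbers, not the Ctrip data) and computes, by hand, the TWFE coefficient via two-way demeaning, a person-clustered standard error, the pooled no-FE estimate, and the 2x2 difference of group means. The simple G/(G-1) factor is an assumption: pyfixest applies its own small-sample corrections, and the CRV3 option used above is a different, jackknife-style estimator, so exact standard errors would differ.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced panel (simulated, NOT the Ctrip data): 200 workers x 20 weeks.
# Even personids form the WFH group; treatment switches on in week 10 for that group.
rng = np.random.default_rng(42)
n_workers, n_weeks, start_week, true_effect = 200, 20, 10, 0.2

df = pd.DataFrame(
    [(i, t) for i in range(n_workers) for t in range(n_weeks)],
    columns=["personid", "year_week"],
)
df["expgroup"] = (df["personid"] % 2 == 0).astype(int)
df["post"] = (df["year_week"] >= start_week).astype(int)
df["experiment_treatment"] = df["expgroup"] * df["post"]

# Outcome = person effect + week effect + treatment effect + noise.
person_fe = rng.normal(0.0, 1.0, n_workers)
week_fe = rng.normal(0.0, 0.5, n_weeks)
df["perform1"] = (
    person_fe[df["personid"].to_numpy()]
    + week_fe[df["year_week"].to_numpy()]
    + true_effect * df["experiment_treatment"].to_numpy()
    + rng.normal(0.0, 1.0, len(df))
)


def twoway_demean(data: pd.DataFrame, col: str) -> np.ndarray:
    """Subtract person and week means (an exact TWFE within transform only for a balanced panel)."""
    x = data[col].astype(float)
    x_person = x.groupby(data["personid"]).transform("mean")
    x_week = x.groupby(data["year_week"]).transform("mean")
    return (x - x_person - x_week + x.mean()).to_numpy()


# TWFE coefficient: OLS of the demeaned outcome on the demeaned treatment dummy.
y = twoway_demean(df, "perform1")
d = twoway_demean(df, "experiment_treatment")
beta_twfe = (d @ y) / (d @ d)

# Cluster-robust (CRV1-style) standard error by personid, with a simple G/(G-1) factor.
resid = y - beta_twfe * d
scores = pd.Series(d * resid).groupby(df["personid"].to_numpy()).sum().to_numpy()
n_clusters = df["personid"].nunique()
se_crv1 = np.sqrt(n_clusters / (n_clusters - 1) * (scores @ scores) / (d @ d) ** 2)

# Pooled OLS without fixed effects, analogous to the basic model above.
X = np.column_stack([np.ones(len(df)), df["experiment_treatment"].to_numpy(dtype=float)])
beta_pooled = np.linalg.lstsq(X, df["perform1"].to_numpy(), rcond=None)[0][1]

# 2x2 difference of group-by-period means.
cell = df.groupby(["expgroup", "post"])["perform1"].mean()
did_means = (cell.loc[(1, 1)] - cell.loc[(1, 0)]) - (cell.loc[(0, 1)] - cell.loc[(0, 0)])

print(f"pooled, no FE: {beta_pooled:.3f}")
print(f"TWFE DiD:      {beta_twfe:.3f} (clustered SE {se_crv1:.3f})")
print(f"2x2 means DiD: {did_means:.3f}")
```

In this balanced simulated panel the TWFE coefficient and the 2x2 difference of means coincide; with missing outcomes (as the variable table warns for `phonecall` and `logcalllength`), one-shot demeaning is no longer exact, which is why packages such as pyfixest solve the fixed effects iteratively.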
