
---
title: "Longitudinal Survey Designs"
mathjax: true
toc: true
toc_sticky: true
categories: [data science, statistics]
---

Notes for Chapter 4 of [Causal Inference with Survey Data](https://www.linkedin.com/learning/causal-inference-with-survey-data/surveys-with-longitudinal-data?autoSkip=true&resume=false&u=185169545) on LinkedIn Learning, given by Franz Buscha. I'm using this series of posts to take some notes.

In [1]:
import graphviz as gr

In [2]:
def draw_causal_graph(
    edge_list, node_props=None, edge_props=None, graph_direction="TB"
):
    """Utility to draw a causal (directed) graph
    Taken from: https://github.com/dustinstansbury/statistical-rethinking-2023/blob/a0f4f2d15a06b33355cf3065597dcb43ef829991/utils.py#L52-L66

    """
    g = gr.Digraph(graph_attr={"rankdir": graph_direction})

    edge_props = {} if edge_props is None else edge_props
    for e in edge_list:
        props = edge_props[e] if e in edge_props else {}
        g.edge(e[0], e[1], **props)

    if node_props is not None:
        for name, props in node_props.items():
            g.node(name=name, **props)
    return g

# Surveys with longitudinal data

- A series of snapshots.
- Captures information from the same subjects across multiple points in time.
- Useful in understanding how relationships evolve and spotting trends.

**Example: A training program**

Cross-sectional
- snapshot
- static
- limited causality
- quick and cheap

Longitudinal
- time series
- dynamic (can follow someone's productivity over time)
- better causality
- slow and expensive

**Types of longitudinal data**

1. Panel survey
- Collect data on individuals, households, or companies over short time periods. Example: studies of demographic dynamics of families.
2. Cohort survey
- Follow a group of people who share a common characteristic or experience within a defined period.
3. Repeated cross-section
- Collect data from different samples over time but from the same population.

**Statistical Framework**
- Key to working with time is the t subscript

$$Y_{it} = \beta_0 + \beta_1X1_{it} + ... + \beta_nXn_{it} + \epsilon_{it} $$

- Time subscripts are manipulated by methods in different ways
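
To make the $i$ and $t$ subscripts concrete, here is a minimal sketch of a long-format panel in plain Python. The records, the field names (`i`, `t`, `y`, `x1`), and all values are invented for illustration and are not from the course.

```python
# Hypothetical long-format panel: one row per subject i and survey wave t.
rows = [
    {"i": 1, "t": 1, "y": 10.0, "x1": 0},
    {"i": 1, "t": 2, "y": 11.5, "x1": 1},
    {"i": 1, "t": 3, "y": 12.0, "x1": 1},
    {"i": 2, "t": 1, "y": 8.0, "x1": 0},
    {"i": 2, "t": 2, "y": 8.5, "x1": 0},
    {"i": 2, "t": 3, "y": 9.5, "x1": 1},
]

# Index observations by the (i, t) pair, mirroring the subscripts in Y_it.
panel = {(r["i"], r["t"]): r for r in rows}

subjects = sorted({i for i, _ in panel})
waves = sorted({t for _, t in panel})

# A panel is balanced when every subject is observed in every wave.
is_balanced = all((i, t) in panel for i in subjects for t in waves)

# Within-subject change in Y between waves 1 and 3.
# This needs repeated measures on the same subject, so a single
# cross-section could not provide it.
change_subject_1 = panel[(1, 3)]["y"] - panel[(1, 1)]["y"]
```

Keying each observation by `(i, t)` is one possible layout. Methods that manipulate time (trends, lags, differences) then reduce to looking up the same `i` at a different `t`.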


**Conclusion**
- Allow for a deeper level of analysis, especially for cause-and-effect relationships
- Remember to consider challenges such as data attrition, time-varying confounders, and the complexity of such data
- They often provide a richer and more nuanced view of the world.

# Regression models with time effects

- Adding time to a regression model can significantly improve causal inference
- Time flows in one direction
- Time trends and lagged values are common ways to include time


**OLS with longitudinal data**

- The key to working with time is the t subscript
- Static model makes no specific use of time from a methods perspective
- Time can be added to this model

**Time manipulation: trends**
- Time can be included as a variable (linear or otherwise)

$$Y_{it} = \beta_0 + \beta_1X1_{it} + \beta_2X2_{it} + \beta_3X3_{it} + \beta_4T_{t} + \epsilon_{it} $$

- T is simply the survey time variable
- Many processes trend, so it makes sense to add time as a control
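
As a rough illustration of why a trend term can matter, the sketch below simulates a panel in which both the regressor and the outcome drift upward over time, then fits OLS with and without $T_t$. The data-generating values (an effect of 2.0 for `x1`, a trend of 0.5 per wave), the sample sizes, and the variable names are assumptions chosen for the example, not figures from the course. It uses NumPy's `np.linalg.lstsq` for the fit.

```python
import numpy as np

rng = np.random.default_rng(0)

n_units, n_waves = 200, 5
n_obs = n_units * n_waves

# Long format: each unit appears once per survey wave t = 0, ..., 4.
t = np.tile(np.arange(n_waves), n_units).astype(float)

# Assumed setup: x1 itself trends upward over time.
x1 = 0.3 * t + rng.normal(size=n_obs)

# Simulated outcome: true effect of x1 is 2.0, true linear trend is 0.5 per wave.
y = 1.0 + 2.0 * x1 + 0.5 * t + rng.normal(scale=0.5, size=n_obs)

# Static model: constant and x1 only (time is ignored).
X_static = np.column_stack([np.ones(n_obs), x1])
coef_static, *_ = np.linalg.lstsq(X_static, y, rcond=None)
naive_beta_1 = coef_static[1]

# Trend model: constant, x1, and the survey time variable T.
X_trend = np.column_stack([np.ones(n_obs), x1, t])
coef_trend, *_ = np.linalg.lstsq(X_trend, y, rcond=None)
beta_0, beta_1, beta_trend = coef_trend

print(f"x1 effect without trend: {naive_beta_1:.3f}")
print(f"x1 effect with trend:    {beta_1:.3f}")
print(f"estimated trend:         {beta_trend:.3f}")
```

In this simulated setup the static model attributes part of the common trend to `x1`, so its coefficient is inflated, while the model that controls for `t` recovers a value close to the assumed 2.0.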

**Time manipulation: lags**
- Lags help explain how past values of X are related to present values of Y
- Help trace how past events affect today's outcome
- Termed finite distributed lag models of order N

- Model of order 2:

$$Y_{it} = \beta_0 + \beta_1X1_{it} + \beta_2X2_{it} + \beta_3X3_{it} + \beta_4X3_{it-1} + \beta_5X3_{it-2} + \epsilon_{it} $$

X1 and X2 are measured in the present. X3 is measured at three time points: the present, a lag of 1, and a lag of 2.

- $\beta_3, \beta_4, \beta_5$ each capture the effect of X3 at a different lag (0, 1, and 2 periods); they are often summed to estimate the long-run effect of X3 on Y

- Powerful model for estimating cause and effect of a variable
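
The order-2 lag model can be sketched on simulated data as follows. For brevity the example keeps only the lagged regressor (X3) and drops X1 and X2. The true coefficients (1.0, 0.6, 0.3), the sample sizes, and all variable names are assumptions made for illustration. Lags are built within each unit, so the first two waves are lost because they have no complete lag history.

```python
import numpy as np

rng = np.random.default_rng(1)

n_units, n_waves = 300, 6

# x3[i, t]: hypothetical regressor for unit i at wave t
# (wide layout, which makes within-unit lagging a simple column shift).
x3 = rng.normal(size=(n_units, n_waves))

# Assumed true effects: contemporaneous, lag 1, lag 2.
b3, b4, b5 = 1.0, 0.6, 0.3

# Align current and lagged values; waves 0 and 1 are dropped from the outcome.
x3_now = x3[:, 2:]     # X3_{it}
x3_lag1 = x3[:, 1:-1]  # X3_{it-1}
x3_lag2 = x3[:, :-2]   # X3_{it-2}

noise = rng.normal(scale=0.5, size=x3_now.shape)
y = 0.5 + b3 * x3_now + b4 * x3_lag1 + b5 * x3_lag2 + noise

# Stack to long format and fit by OLS.
X = np.column_stack([
    np.ones(x3_now.size),
    x3_now.ravel(),
    x3_lag1.ravel(),
    x3_lag2.ravel(),
])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)

short_run_effect = coef[1]        # beta_3: immediate effect
long_run_effect = coef[1:].sum()  # beta_3 + beta_4 + beta_5

print(f"short-run effect: {short_run_effect:.3f}")
print(f"long-run effect:  {long_run_effect:.3f}")
```

Under these assumptions the short-run estimate should land near 1.0 and the long-run estimate near 1.9. The regressor here is drawn independently across waves; with real survey data, successive lags are often highly correlated, which is where the multicollinearity concern comes from.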

**Advantages**
- Capture dynamic effects
- Temporal causality
- Flexibility

**Disadvantages**
- Require lots of data
- Autocorrelation/multi-collinearity
- Reverse causality

**Conclusion**
- Using time in a regression can be a real game changer
- You can uncover short- and long-run effects, which cannot be done using static models

In [6]:
%load_ext watermark
%watermark -n -u -v -iv -w

Last updated: Fri May 24 2024

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.21.0

graphviz: 0.20.1

Watermark: 2.4.3

