# Double machine learning (DML)

Double machine learning (DML) is a framework for estimating causal effects in the presence of many observed confounding variables. It combines machine learning techniques with econometric methods: flexible models control for the confounders, so that the treatment effect can be estimated with little of the bias those confounders would otherwise introduce. Consider a practical example in which we want to estimate the effect of wind power production ($w$) on electricity prices ($y$), while **accounting for other influencing factors** such as weather conditions, demand, and other market variables ($x$).

We can model the relationship between these variables as follows. First, we assume that an unknown function $g$ relates the response variable $y$ to both the endogenous explanatory variable of interest, $w$, and the other variables $x$.

\begin{equation}
    y = g(w, x) + \epsilon
\end{equation}

where $\epsilon$ is an error term. This equation simply corresponds to saying:
\begin{equation}
    \text{Electricity price} = g(\text{wind power production}, \text{weather conditions}, \text{demand}, \text{market variables}) + \epsilon
\end{equation}

Second, the explanatory variable of interest $w$ itself depends on the other observed variables through an unknown function $m$ (e.g., wind power production depends on weather conditions).

\begin{equation}
    w = m(x) + \nu
\end{equation}

where $\nu$ is an error term.
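To make the two equations concrete, here is a minimal simulation sketch. Everything specific in it is an assumption made for illustration and does not come from the example above: the functional forms chosen for $g$ and $m$, the three columns of `x`, the noise scales, and the true effect `theta = -0.5`. It shows why a naive regression of $y$ on $w$ can be misleading when $x$ drives both variables.

```python
import numpy as np

def simulate(n=2000, theta=-0.5, seed=0):
    """Draw (y, w, x) from a toy version of y = g(w, x) + eps and w = m(x) + nu.

    The functional forms below are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))          # stand-ins for weather, demand, market variables
    nu = rng.normal(scale=0.5, size=n)   # error term of the w equation
    w = x[:, 0] + 0.5 * np.tanh(x[:, 1]) + nu            # w = m(x) + nu
    eps = rng.normal(scale=0.5, size=n)  # error term of the y equation
    y = theta * w + 2.0 * x[:, 0] + x[:, 1] ** 2 + eps   # y = g(w, x) + eps
    return y, w, x

y, w, x = simulate()

# Naive approach: regress y on w alone, ignoring x entirely.
naive_slope = np.polyfit(w, y, deg=1)[0]
print(f"naive slope of y on w: {naive_slope:.2f} (simulated effect: -0.50)")
```

In this toy setup the first column of `x` raises both `w` and `y`, so the naive slope is pulled away from the simulated effect. This is the confounding that DML is meant to remove.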

The DML framework involves two main stages:
- **Nuisance parameter estimation**: use machine learning models to estimate the unknown functions, giving $\hat{g}(w, x)$ and $\hat{m}(x)$.
- **Orthogonalization and estimation**: use the estimated functions to partial out the influence of $x$ from the variables, and then estimate the causal effect with a second-stage regression on the adjusted variables.
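The two stages above can be sketched in a few lines. This is a minimal illustration under several assumptions that are not part of the text: it uses the partially linear special case $y = \theta w + g_0(x) + \epsilon$ (narrower than the general $g(w, x)$), simulated data with a hypothetical true effect `theta_true = -0.5`, scikit-learn random forests as the learners, and 5-fold cross-fitting (out-of-fold predictions) via `cross_val_predict`. In this special case the first stage predicts $y$ and $w$ from $x$, and the second stage regresses the residuals on each other.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Simulated data (hypothetical functional forms, for illustration only).
rng = np.random.default_rng(0)
n = 2000
theta_true = -0.5
x = rng.normal(size=(n, 3))
w = x[:, 0] + 0.5 * np.tanh(x[:, 1]) + rng.normal(scale=0.5, size=n)
y = theta_true * w + 2.0 * x[:, 0] + x[:, 1] ** 2 + rng.normal(scale=0.5, size=n)

def make_learner():
    # Any flexible regressor could be used here; a random forest is one choice.
    return RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)

# Stage 1: nuisance estimation with cross-fitting.
# Each observation is predicted by a model that was not trained on it.
y_hat = cross_val_predict(make_learner(), x, y, cv=5)  # estimate of E[y | x]
w_hat = cross_val_predict(make_learner(), x, w, cv=5)  # estimate of m(x) = E[w | x]

# Stage 2: orthogonalization, then a regression of residual on residual.
y_res = y - y_hat
w_res = w - w_hat
theta_hat = np.sum(w_res * y_res) / np.sum(w_res ** 2)

# Naive comparison: slope of y on w without controlling for x.
theta_naive = np.polyfit(w, y, deg=1)[0]

print(f"naive estimate: {theta_naive:.2f}")
print(f"DML estimate:   {theta_hat:.2f} (simulated effect: {theta_true:.2f})")
```

Predicting out of fold matters because a flexible learner can overfit its own training sample, and residuals computed in-sample would then understate the variation left in $y$ and $w$. Dedicated packages such as DoubleML or EconML implement this procedure with proper standard errors; the sketch only shows the mechanics.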

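**Difference with IV and 2SLS**: while this approach might appear very similar to instrumental variables (IV) and two-stage least squares (2SLS), there is a fundamental difference. IV methods are used when confounders are unknown or unobserved, and they rely on an instrument to work around them. DML instead relies on the confounders being observed: we have at our disposal a (potentially very large) set of variables $x$ that affect the explanatory variable of interest, and we use machine learning to control for them.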