# Other Omega Estimation Methods
- In this report, we estimate `omega` with methods other than the one used in the original paper.
- `omega` is the vector of weights placed on the features (the donor-pool units) in the Synthetic Control Method.
- The classical Synthetic Control Method (ADH) imposes the following restrictions:
    - non-negativity of the weights
    - weights summing to one
    - no intercept
- The original paper allows an intercept. (It also adds an L2 regularization term to the loss function.)
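
To make the difference between the two sets of restrictions concrete, here is a minimal sketch using `scipy.optimize.minimize`. It is not the PySynthDID implementation: the function name `estimate_omega`, the toy data, and the penalty form `zeta**2 * T_pre * ||omega||^2` are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def estimate_omega(Y_donors, y_treated, zeta=0.0, intercept=True):
    """Weights on the simplex, with an optional intercept and L2 penalty.

    Y_donors: (T_pre, N) donor outcomes; y_treated: (T_pre,) treated outcomes.
    Hypothetical helper for illustration only.
    """
    T, N = Y_donors.shape

    def loss(params):
        w0 = params[0] if intercept else 0.0  # ADH-style: no intercept
        w = params[1:]
        resid = w0 + Y_donors @ w - y_treated
        # Assumed penalty form: zeta^2 * T_pre * ||w||^2
        return resid @ resid + (zeta ** 2) * T * (w @ w)

    x0 = np.concatenate([[0.0], np.full(N, 1.0 / N)])  # start from uniform weights
    bounds = [(None, None)] + [(0.0, 1.0)] * N         # non-negativity of weights
    constraints = [{"type": "eq", "fun": lambda p: p[1:].sum() - 1.0}]  # sum to one
    res = minimize(loss, x0, method="SLSQP", bounds=bounds, constraints=constraints)
    w0 = res.x[0] if intercept else 0.0
    return w0, res.x[1:]


# Toy data: the treated unit is a mix of two donors plus a level shift.
rng = np.random.default_rng(0)
Y = rng.normal(size=(10, 5))
y = 0.6 * Y[:, 0] + 0.4 * Y[:, 1] + 2.0

w0, w = estimate_omega(Y, y, zeta=0.1, intercept=True)   # intercept + L2 term
_, w_adh = estimate_omega(Y, y, zeta=0.0, intercept=False)  # classical ADH
```

With `intercept=False` and `zeta=0.0` the problem reduces to the three ADH restrictions listed above; the level shift in the toy data can then only be absorbed by distorting the weights.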

## Additional methods for PySynthDID
### (1) Search zeta by cross validation
- `zeta` is a hyperparameter in the estimation of `omega`.
- The original paper uses a theoretical value for `zeta`.
- In this module, we instead search for a better `zeta` by cross-validation within the pre-intervention period, independently of the theoretical value, using:
    - Grid Search
    - Bayesian Optimization
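
The grid-search variant can be sketched as follows. This is an illustration under stated assumptions, not the module's own code: `ridge_fit` is a closed-form, unconstrained ridge used as a stand-in for the actual `omega` solver, and the validation scheme (holding out the last few pre-intervention periods) is only one possible choice.

```python
import numpy as np


def ridge_fit(X, y, zeta):
    """Closed-form ridge with an unpenalized intercept (stand-in solver)."""
    T, N = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Assumed penalty form: zeta^2 * T * ||w||^2
    w = np.linalg.solve(Xc.T @ Xc + (zeta ** 2) * T * np.eye(N), Xc.T @ yc)
    return y_mean - x_mean @ w, w


def grid_search_zeta(X_pre, y_pre, zeta_grid, n_holdout=3):
    """Pick the zeta with the lowest MSE on the last pre-period observations."""
    X_train, y_train = X_pre[:-n_holdout], y_pre[:-n_holdout]
    X_valid, y_valid = X_pre[-n_holdout:], y_pre[-n_holdout:]
    scores = {}
    for zeta in zeta_grid:
        w0, w = ridge_fit(X_train, y_train, zeta)
        scores[float(zeta)] = float(np.mean((w0 + X_valid @ w - y_valid) ** 2))
    best = min(scores, key=scores.get)
    return best, scores


# Toy pre-intervention panel: 10 periods, 8 donors.
rng = np.random.default_rng(1)
X_pre = rng.normal(size=(10, 8))
y_pre = 0.5 * X_pre[:, 0] + 0.5 * X_pre[:, 1] + rng.normal(scale=0.1, size=10)

zeta_grid = np.logspace(-2, 2, 9)  # strictly positive, so the system is solvable
best_zeta, scores = grid_search_zeta(X_pre, y_pre, zeta_grid)
```

Bayesian optimization would replace the loop over `zeta_grid` with a sequential search over the same validation score.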

### (2) Significant relaxation of ADH conditions
- While the ADH conditions are very good for interpretability, they do not appear to be mathematically necessary.
- Here, we relax the `sum(w)=1` condition and the non-negativity constraint.
- Specifically, we fit Lasso, Ridge, and ElasticNet models, tune them by CV, and adopt the coefficients of the fitted regression as `omega`.
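
As a rough illustration of this idea, the sketch below fits the three models with scikit-learn's built-in CV estimators on toy data and takes their coefficients as candidate `omega` vectors. The data, grid values, and fold count are assumptions; how PySynthDID configures these models internally is not shown in this report.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV

# Toy pre-intervention panel: rows are periods, columns are donor units.
rng = np.random.default_rng(0)
T_pre, n_donors = 10, 8
X_pre = rng.normal(size=(T_pre, n_donors))
y_pre = 0.7 * X_pre[:, 0] + 0.3 * X_pre[:, 1] + rng.normal(scale=0.05, size=T_pre)

models = {
    "Lasso": LassoCV(cv=3),
    "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 13)),
    "ElasticNet": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=3),
}

omega = {}
for name, model in models.items():
    model.fit(X_pre, y_pre)
    # Coefficients are unconstrained: they may be negative and need not sum to one.
    omega[name] = model.coef_

for name, w in omega.items():
    print(name, "sum of weights:", w.sum(), "non-zero:", int((w != 0).sum()))
```

Lasso and ElasticNet can set coefficients exactly to zero, whereas Ridge only shrinks them, so only the first two give sparse weights.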

In [1]:
import warnings

warnings.filterwarnings("ignore")

import sys
import os

sys.path.append(os.path.abspath("../"))

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from scipy.stats import spearmanr
plt.style.use('ggplot')

from tqdm import tqdm

from synthdid.model import SynthDID
from synthdid.sample_data import fetch_CaliforniaSmoking

In [2]:
df = fetch_CaliforniaSmoking()

PRE_TEREM = [1970, 1979]
POST_TEREM = [1980, 1988]

TREATMENT = ["California"]

df.head()

|  | Alabama | Arkansas | Colorado | Connecticut | Delaware | Georgia | Idaho | Illinois | Indiana | Iowa | ... | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | West Virginia | Wisconsin | Wyoming | California |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1970 | 89.800003 | 100.300003 | 124.800003 | 120.0 | 155.0 | 109.900002 | 102.400002 | 124.800003 | 134.600006 | 108.5 | ... | 92.699997 | 99.800003 | 106.400002 | 65.5 | 122.599998 | 124.300003 | 114.5 | 106.400002 | 132.199997 | 123.0 |
| 1971 | 95.400002 | 104.099998 | 125.5 | 117.599998 | 161.100006 | 115.699997 | 108.5 | 125.599998 | 139.300003 | 108.400002 | ... | 96.699997 | 106.300003 | 108.900002 | 67.699997 | 124.400002 | 128.399994 | 111.5 | 105.400002 | 131.699997 | 121.0 |
| 1972 | 101.099998 | 103.900002 | 134.300003 | 110.800003 | 156.300003 | 117.0 | 126.099998 | 126.599998 | 149.199997 | 109.400002 | ... | 103.0 | 111.5 | 108.599998 | 71.300003 | 138.0 | 137.0 | 117.5 | 108.800003 | 140.0 | 123.5 |
| 1973 | 102.900002 | 108.0 | 137.899994 | 109.300003 | 154.699997 | 119.800003 | 121.800003 | 124.400002 | 156.0 | 110.599998 | ... | 103.5 | 109.699997 | 110.400002 | 72.699997 | 146.800003 | 143.100006 | 116.599998 | 109.5 | 141.199997 | 124.400002 |
| 1974 | 108.199997 | 109.699997 | 132.800003 | 112.400002 | 151.300003 | 123.699997 | 125.599998 | 131.899994 | 159.600006 | 116.099998 | ... | 108.400002 | 114.800003 | 114.699997 | 75.599998 | 151.800003 | 149.600006 | 119.900002 | 111.800003 | 145.800003 | 126.699997 |


In [3]:
sdid = SynthDID(df, PRE_TEREM, POST_TEREM, TREATMENT)
sdid.fit(zeta_type="base", sparce_estimation=True)

"""
TODO: Explanation to be added later.
"""

In [4]:
sdid.hat_omega_Ridge

array([-1.28802629e-03, -3.47034871e-02, -7.60256313e-02,  3.01931192e-02,
       -9.16670958e-04, -2.87991859e-02,  1.53710385e-02,  6.86255147e-02,
       -6.92353381e-02,  5.64509761e-02,  8.06362473e-02,  8.30709615e-02,
        3.74172391e-03,  4.45110626e-02, -2.21364549e-02, -7.65686008e-02,
       -1.65048278e-02,  3.30605863e-02,  3.93666731e-02,  1.08693274e-01,
        2.02180346e-02,  1.16792258e-03,  1.03934341e-02, -2.31001465e-02,
       -1.87129206e-02, -6.65485079e-03,  1.39675972e-02,  2.43310311e-02,
       -9.71081855e-03, -1.02950954e-02, -3.73858795e-02, -4.68365807e-03,
        4.22547025e-03, -2.78909549e-03,  4.05963584e-02,  9.97363154e-02,
       -2.36082463e-03, -8.55780934e-02,  7.82033294e+01])
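
The array above already shows the relaxed conditions at work. The sketch below copies the printed values and counts negative entries and the sum of the weights. It assumes the trailing entry (about 78.2, far larger than the rest) is an intercept and the other 38 entries are donor weights; the output itself does not confirm this reading.

```python
import numpy as np

# Values copied from the `sdid.hat_omega_Ridge` output above.
hat_omega_ridge = np.array([
    -1.28802629e-03, -3.47034871e-02, -7.60256313e-02,  3.01931192e-02,
    -9.16670958e-04, -2.87991859e-02,  1.53710385e-02,  6.86255147e-02,
    -6.92353381e-02,  5.64509761e-02,  8.06362473e-02,  8.30709615e-02,
     3.74172391e-03,  4.45110626e-02, -2.21364549e-02, -7.65686008e-02,
    -1.65048278e-02,  3.30605863e-02,  3.93666731e-02,  1.08693274e-01,
     2.02180346e-02,  1.16792258e-03,  1.03934341e-02, -2.31001465e-02,
    -1.87129206e-02, -6.65485079e-03,  1.39675972e-02,  2.43310311e-02,
    -9.71081855e-03, -1.02950954e-02, -3.73858795e-02, -4.68365807e-03,
     4.22547025e-03, -2.78909549e-03,  4.05963584e-02,  9.97363154e-02,
    -2.36082463e-03, -8.55780934e-02,  7.82033294e+01,
])

# Assumption: the last entry is an intercept; the rest are donor weights.
donor_weights, intercept = hat_omega_ridge[:-1], hat_omega_ridge[-1]

print("number of donor weights:", donor_weights.size)
print("negative weights:", int((donor_weights < 0).sum()))
print("sum of donor weights:", donor_weights.sum())
print("assumed intercept:", intercept)
```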

In [5]:
sdid.sparce_sdid_potentical_outcome(model="ElasticNet")

1970    122.054395
1971    122.708246
1972    123.945441
1973    125.115995
1974    125.546985
1975    125.933362
1976    126.340801
1977    125.976751
1978    125.526283
1979    124.951740
1980    121.696124
1981    121.279645
1982    121.308067
1983    120.722194
1984    119.806323
1985    119.675498
1986    119.529437
1987    119.101432
1988    119.075984
dtype: float64

In [6]:
sdid.sparceReg_potentical_outcome()

1970    122.054395
1971    122.708246
1972    123.945441
1973    125.115995
1974    125.546985
1975    125.933362
1976    126.340801
1977    125.976751
1978    125.526283
1979    124.951740
1980    124.747859
1981    124.331380
1982    124.359803
1983    123.773930
1984    122.858058
1985    122.727234
1986    122.581172
1987    122.153167
1988    122.127719
dtype: float64
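
The two outputs above are identical for 1970 to 1979 and differ only from 1980 onward. The sketch below copies the printed post-intervention values and takes their difference, which shows the gap is essentially constant across years. One possible reading, which this report does not confirm, is that `sparce_sdid_potentical_outcome` applies a constant level adjustment to the post-intervention period on top of the sparse regression prediction.

```python
import pandas as pd

years = list(range(1980, 1989))

# Post-intervention values copied from the two outputs above.
sparse_sdid = pd.Series(
    [121.696124, 121.279645, 121.308067, 120.722194, 119.806323,
     119.675498, 119.529437, 119.101432, 119.075984],
    index=years,
)
sparse_reg = pd.Series(
    [124.747859, 124.331380, 124.359803, 123.773930, 122.858058,
     122.727234, 122.581172, 122.153167, 122.127719],
    index=years,
)

# Year-by-year gap between the two potential-outcome series.
gap = sparse_reg - sparse_sdid
print(gap.round(4))
print("spread of the gap:", gap.max() - gap.min())
```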

In [7]:
sdid.estimated_params(model="ElasticNet")

|  | features | ElasticNet_weight |
|---|---|---|
| 0 | Alabama | 0.0 |
| 1 | Arkansas | -0.0 |
| 2 | Colorado | 0.0 |
| 3 | Connecticut | -0.0 |
| 4 | Delaware | -0.0 |
| 5 | Georgia | -0.0 |
| 6 | Idaho | 0.0 |
| 7 | Illinois | 0.0 |
| 8 | Indiana | 0.0 |
| 9 | Iowa | 0.0 |
