## [DoWhy](https://github.com/Microsoft/dowhy) | An end-to-end library for causal inference
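- Microsoft
    - According to the [Microsoft blog](https://www.microsoft.com/en-us/research/blog/dowhy-a-library-for-causal-inference/), DoWhy grew out of decades of research on estimating the impact of recommender systems and on predicting outcomes given an event. It was gradually consolidated into a library over the course of that work.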
        - [Causal inference in RecSys](https://www2.slideshare.net/AmitSharma315/causal-inference-in-recommender-systems)
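- As computing has advanced, A/B tests and traditional machine learning methods built on pattern recognition and correlation analysis are no longer quite sufficient for decision-making.
- DoWhy was created in response and targets causal analysis specifically. It provides a four-step interface for causal inference problems, so that users can focus on stating the model's causal assumptions and on validating those assumptions.
- DoWhy supports
    - Estimation of the average causal effect for backdoor, frontdoor, instrumental variable, and other identification methods, as well as estimation of the conditional average treatment effect (CATE) through integration with the EconML library.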
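- [KDD 2018 tutorial](https://causalinference.gitlab.io/kdd-tutorial/)
- Personal notes
    - There is a lot to learn step by step; this opens the door to a new world!
    - Without understanding the underlying theory, you are basically fumbling in the dark.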
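To make the point about correlation versus causation concrete, here is a minimal sketch in plain Python (no DoWhy involved). It assumes a toy data-generating process with one binary common cause `w`; the variable names, the probabilities, and the true effect of 10 are all made up for illustration. It compares a naive difference in means with a simple backdoor adjustment done by stratifying on `w`.

```python
import random

random.seed(0)

TRUE_EFFECT = 10.0  # assumed ground truth for this toy simulation

# Toy data: w is a common cause of both the treatment t and the outcome y.
rows = []
for _ in range(20000):
    w = random.random() < 0.5                      # binary confounder
    t = random.random() < (0.8 if w else 0.2)      # w makes treatment more likely
    y = TRUE_EFFECT * t + 5.0 * w + random.gauss(0, 1)  # w also raises the outcome
    rows.append((w, t, y))


def mean(values):
    return sum(values) / len(values)


def diff_in_means(sample):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = [y for _, t, y in sample if t]
    control = [y for _, t, y in sample if not t]
    return mean(treated) - mean(control)


# Naive estimate: ignores w, so it mixes the causal effect with confounding.
naive = diff_in_means(rows)

# Backdoor adjustment by stratification: estimate within each level of w,
# then average the per-stratum estimates weighted by stratum size.
adjusted = 0.0
for level in (False, True):
    stratum = [r for r in rows if r[0] == level]
    adjusted += diff_in_means(stratum) * len(stratum) / len(rows)

print(f"naive={naive:.2f}  adjusted={adjusted:.2f}  true={TRUE_EFFECT}")
```

Under these assumptions the naive estimate overshoots the true effect, because treated units are more likely to have `w = True`, while the stratified estimate lands close to 10. This is only the simplest form of backdoor adjustment; DoWhy's estimators are more general.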

In [1]:
# Official example

from dowhy import CausalModel
import dowhy.datasets

# Load sample data (a synthetic linear dataset)
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=5,
    num_instruments=2,
    num_samples=10000,
    treatment_is_binary=True
)
data

{'df':        Z0        Z1        W0        W1        W2        W3        W4    v0  \
 0     1.0  0.780821 -0.449474 -0.609384  1.092468 -1.748543  1.373165  True   
 1     0.0  0.811563  1.297437  0.235059  0.747337 -0.135757  1.153010  True   
 2     1.0  0.831755  1.064492 -0.207847 -1.098899 -0.396288 -0.005911  True   
 3     1.0  0.255333  1.589069  0.072279  1.784720  0.212283  0.698475  True   
 4     1.0  0.992790 -0.379035 -0.051387 -0.463359 -1.295947 -1.448540  True   
 ...   ...       ...       ...       ...       ...       ...       ...   ...   
 9995  1.0  0.811566 -0.774730 -0.671071  1.310581 -2.089931  0.646137  True   
 9996  1.0  0.618498 -0.607884  0.173740 -1.287235 -0.136820 -0.472738  True   
 9997  1.0  0.331059  1.786628 -1.861276  0.538140  1.123069  0.119399  True   
 9998  1.0  0.815829  0.328043  1.182187  0.458626  0.292257 -0.722289  True   
 9999  1.0  0.808265  0.974303 -0.508138  3.178290 -0.830409  1.189410  True   
 
               y  
 0     10.189

In [3]:
data['df']

|      | Z0  | Z1       | W0        | W1        | W2        | W3        | W4        | v0   | y         |
|------|-----|----------|-----------|-----------|-----------|-----------|-----------|------|-----------|
| 0    | 1.0 | 0.780821 | -0.449474 | -0.609384 | 1.092468  | -1.748543 | 1.373165  | True | 10.189618 |
| 1    | 0.0 | 0.811563 | 1.297437  | 0.235059  | 0.747337  | -0.135757 | 1.153010  | True | 20.608345 |
| 2    | 1.0 | 0.831755 | 1.064492  | -0.207847 | -1.098899 | -0.396288 | -0.005911 | True | 10.595442 |
| 3    | 1.0 | 0.255333 | 1.589069  | 0.072279  | 1.784720  | 0.212283  | 0.698475  | True | 22.086096 |
| 4    | 1.0 | 0.992790 | -0.379035 | -0.051387 | -0.463359 | -1.295947 | -1.448540 | True | -0.126907 |
| ...  | ... | ...      | ...       | ...       | ...       | ...       | ...       | ...  | ...       |
| 9995 | 1.0 | 0.811566 | -0.774730 | -0.671071 | 1.310581  | -2.089931 | 0.646137  | True | 5.905484  |
| 9996 | 1.0 | 0.618498 | -0.607884 | 0.173740  | -1.287235 | -0.136820 | -0.472738 | True | 4.016330  |
| 9997 | 1.0 | 0.331059 | 1.786628  | -1.861276 | 0.538140  | 1.123069  | 0.119399  | True | 15.006084 |
| 9998 | 1.0 | 0.815829 | 0.328043  | 1.182187  | 0.458626  | 0.292257  | -0.722289 | True | 13.180636 |
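The column names line up with the arguments passed to `linear_dataset`: `Z0` and `Z1` are the two instruments, `W0` to `W4` are the five common causes, `v0` is the treatment, and `y` is the outcome. To illustrate why instruments are useful, the sketch below hand-rolls a much smaller dataset with the same roles and applies the Wald (instrumental variable) estimator. It is not DoWhy's generator; the coefficients, the threshold, and the single unobserved confounder `u` are assumptions chosen only for the example.

```python
import random

random.seed(1)

BETA = 10.0  # assumed true treatment effect, mirroring beta=10 in the notebook

z_vals, v_vals, y_vals = [], [], []
for _ in range(50000):
    z = random.random() < 0.5                        # instrument: only affects treatment
    u = random.gauss(0, 1)                           # confounder we pretend not to observe
    v = (1.5 * z + u + random.gauss(0, 1)) > 0.75    # binary treatment
    y = BETA * v + 4.0 * u + random.gauss(0, 1)      # outcome depends on v and u, not on z
    z_vals.append(z)
    v_vals.append(v)
    y_vals.append(y)


def mean(values):
    return sum(values) / len(values)


def mean_where(values, flags, wanted):
    """Mean of `values` over the rows whose flag equals `wanted`."""
    return mean([x for x, f in zip(values, flags) if f == wanted])


# Naive estimate: biased, because u drives both the treatment and the outcome.
naive = mean_where(y_vals, v_vals, True) - mean_where(y_vals, v_vals, False)

# Wald estimator: effect of z on y divided by effect of z on v.
reduced_form = mean_where(y_vals, z_vals, True) - mean_where(y_vals, z_vals, False)
first_stage = mean_where(v_vals, z_vals, True) - mean_where(v_vals, z_vals, False)
wald = reduced_form / first_stage

print(f"naive={naive:.2f}  wald={wald:.2f}  true={BETA}")
```

Because `z` is independent of `u` and influences `y` only through `v`, the ratio recovers the effect even though `u` is never used in the estimate. The naive comparison, by contrast, is inflated by the confounder.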


In [4]:
# Four steps: model, identify, estimate, and refute

# 1. Build the causal model from the data and the graph
model = CausalModel(
    data=data['df'],
    treatment=data['treatment_name'], # v0
    outcome=data['outcome_name'], # y
    graph=data['gml_graph']
)

# 2. Identify the causal effect and return the target estimand
identified_estimand = model.identify_effect()

# 3. Estimate the target estimand with a statistical method
estimate = model.estimate_effect(identified_estimand, method_name='backdoor.propensity_score_matching')

# 4. Try to refute the obtained estimate with a robustness check
refute_results = model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")

WARN: Do you want to continue by ignoring any unobserved confounders? (use proceed_when_unidentifiable=True to disable this prompt) [y/n] y
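The idea behind the `random_common_cause` check in step 4 is that adding an independent random variable as an extra common cause should leave a sound estimate essentially unchanged. The following is a minimal plain-Python sketch of that idea, not DoWhy's implementation: the toy data, the stratification estimator, and the names `stratified_effect` and `w_random` are hypothetical and exist only for illustration.

```python
import random

random.seed(2)

TRUE_EFFECT = 10.0  # assumed ground truth for this toy simulation


def mean(values):
    return sum(values) / len(values)


def stratified_effect(rows, key):
    """Difference in means within each stratum of key(row), weighted by stratum size."""
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    total = 0.0
    for group in strata.values():
        treated = [r["y"] for r in group if r["t"]]
        control = [r["y"] for r in group if not r["t"]]
        total += (mean(treated) - mean(control)) * len(group) / len(rows)
    return total


# Toy data with one real binary common cause w.
rows = []
for _ in range(20000):
    w = random.random() < 0.5
    t = random.random() < (0.7 if w else 0.3)
    y = TRUE_EFFECT * t + 3.0 * w + random.gauss(0, 1)
    rows.append({"w": w, "t": t, "y": y})

# Original estimate: adjust for the real common cause only.
original = stratified_effect(rows, key=lambda r: r["w"])

# Refutation: add a common cause that is pure noise and adjust for it as well.
for r in rows:
    r["w_random"] = random.random() < 0.5
refuted = stratified_effect(rows, key=lambda r: (r["w"], r["w_random"]))

print(f"original={original:.3f}  with random common cause={refuted:.3f}")
```

If the two numbers differed noticeably, that would suggest the estimator is unstable rather than that the random variable matters. Passing this check does not prove the estimate is correct; it only fails to refute it.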


