### A Pluto.jl notebook ###
# v0.17.5
using Markdown
using InteractiveUtils
# ╔═╡ cea125d8-7303-11ec-3f43-6b79e00bca6a
# hideall
let
docs_dir = dirname(dirname(@__DIR__))
pkg_dir = dirname(docs_dir)
using Pkg: Pkg
Pkg.activate(docs_dir)
Pkg.develop(; path=pkg_dir)
Pkg.instantiate()
# Putting the include here to avoid Pluto getting confused about cell order.
include(joinpath(docs_dir, "src", "tutorials_utils.jl"))
end;
# ╔═╡ e0b79bde-f1c1-4256-866e-eb0649e77cb7
using CSV
# ╔═╡ 5a7e2bbe-ded0-4ce4-b24c-6b4387a7e80d
using DataFrames
# ╔═╡ 6002455d-45ba-42f4-9601-03754c2906d1
using TuringGLM
# ╔═╡ cd2dc745-d2bf-4960-854d-bad17cf767ab
md"""
For **robust regression** with a Student-$t$ likelihood, we'll use the well-known `kidiq` dataset (Gelman & Hill, 2007), which comes from a survey of adult American women and their children.
Dating from 2007, it has 434 observations and 4 variables:
* `kid_score`: child's IQ
* `mom_hs`: binary dummy variable (0 or 1) indicating whether the child's mother has a high school diploma
* `mom_iq`: mother's IQ
* `mom_age`: mother's age
"""
# ╔═╡ 2fda1cb5-ccc0-41e2-82a6-7549ffa082aa
url = "https://github.com/TuringLang/TuringGLM.jl/raw/main/data/kidiq.csv";
# ╔═╡ b2fa0a26-177d-4cb1-a8e6-73d6becf07e1
kidiq = CSV.read(download(url), DataFrame)
# ╔═╡ dcfc7263-5a89-4632-951d-591ddae5f447
md"""
We use `kid_score` as the dependent variable and `mom_hs` and `mom_iq` as independent variables, with a moderation (interaction) effect between them:
"""
# ╔═╡ 22b266c4-a92b-4f4c-b23c-2fa3cf3a2afb
fm = @formula(kid_score ~ mom_hs * mom_iq)
# ╔═╡ 4f9ff3fd-3f04-49a5-924e-aa23702e75a0
md"""
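In the formula above, `mom_hs * mom_iq` is StatsModels shorthand for `mom_hs + mom_iq + mom_hs & mom_iq`, that is, both main effects plus their product. The following is a minimal sketch of the predictor columns this implies. It uses three hypothetical toy rows rather than values from `kidiq`, and the name `X` is only illustrative, not something TuringGLM exposes:

```julia
# Hypothetical toy rows, for illustration only (not taken from kidiq).
mom_hs = [1, 0, 1]
mom_iq = [100.0, 90.0, 110.0]

# Predictor columns: mom_hs, mom_iq, and the interaction mom_hs & mom_iq.
X = hcat(mom_hs, mom_iq, mom_hs .* mom_iq)

# The interaction column is zero wherever mom_hs == 0, so the mom_iq slope
# is allowed to differ between the two mom_hs groups.
interaction = X[:, 3]
```
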
We instantiate our model with `turing_model`, passing the keyword argument `model=TDist` to indicate that the model is a robust regression with a Student's t-distribution likelihood:
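
As a rough sketch in notation assumed here (not taken from the TuringGLM source), write ``y_i`` for `kid_score`, ``h_i`` for `mom_hs`, and ``q_i`` for `mom_iq`. The likelihood and linear predictor then have the following form, with the priors on ``\alpha``, ``\beta``, ``\sigma``, and ``\nu`` left to TuringGLM and not shown. A small degrees-of-freedom parameter ``\nu`` gives heavier tails than a normal likelihood, so outlying observations pull less on the fit.

```math
y_i \sim \text{Student-}t(\nu, \mu_i, \sigma), \qquad
\mu_i = \alpha + \beta_1 h_i + \beta_2 q_i + \beta_3 h_i q_i
```
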
"""
# ╔═╡ 3f4241d4-1c76-4d7c-99c1-aaa2111385f9
model = turing_model(fm, kidiq; model=TDist);
# ╔═╡ 7bbe4fe4-bcaf-4699-88e4-0dff92250d30
# Draw 2,000 posterior samples with the No-U-Turn Sampler (NUTS).
chn = sample(model, NUTS(), 2_000);
# ╔═╡ c2c17f3b-8728-4b0c-ab06-639862ca31f9
# hide
plot_chains(chn)
# ╔═╡ a90f1d84-b39c-47b5-a251-3a6f0dfae4b3
md"""
## References
Gelman, A., & Hill, J. (2007). *Data Analysis Using Regression and Multilevel/Hierarchical Models*. Cambridge University Press.
"""
# ╔═╡ Cell order:
# ╠═cea125d8-7303-11ec-3f43-6b79e00bca6a
# ╠═cd2dc745-d2bf-4960-854d-bad17cf767ab
# ╠═e0b79bde-f1c1-4256-866e-eb0649e77cb7
# ╠═5a7e2bbe-ded0-4ce4-b24c-6b4387a7e80d
# ╠═2fda1cb5-ccc0-41e2-82a6-7549ffa082aa
# ╠═b2fa0a26-177d-4cb1-a8e6-73d6becf07e1
# ╠═6002455d-45ba-42f4-9601-03754c2906d1
# ╠═dcfc7263-5a89-4632-951d-591ddae5f447
# ╠═22b266c4-a92b-4f4c-b23c-2fa3cf3a2afb
# ╠═4f9ff3fd-3f04-49a5-924e-aa23702e75a0
# ╠═3f4241d4-1c76-4d7c-99c1-aaa2111385f9
# ╠═7bbe4fe4-bcaf-4699-88e4-0dff92250d30
# ╠═c2c17f3b-8728-4b0c-ab06-639862ca31f9
# ╠═a90f1d84-b39c-47b5-a251-3a6f0dfae4b3