One of the things I like most about Julia is that it propagates missing values, encouraging me to think critically about how I handle them in my data. For instance, sum([1,2,missing]) evaluates to missing, not 3, which tells me I need to be careful and think about why there are missing values and how I should handle them. I might want to drop them, or impute values, or realize that my data cleaning functions are broken and I need to fix them before modeling.
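To make the options above concrete, here is a minimal sketch using only Base Julia. The variable names are illustrative, and imputing `0` is an arbitrary placeholder rather than a recommendation.

```julia
xs = [1, 2, missing]

# The reduction propagates missing, which flags the problem.
total = sum(xs)

# Explicitly drop the missing values before summing.
dropped = sum(skipmissing(xs))

# Explicitly impute a value (0 is an arbitrary placeholder here).
imputed = sum(coalesce.(xs, 0))

# Count the missing entries to judge how serious the problem is.
n_missing = count(ismissing, xs)
```

In each case the handling of missing values is a visible choice in the code rather than something that happens implicitly.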
GLM, by contrast, drops rows with missing values. I would rather the result be missing, since a fitted model is a summary of the data just like sum. That way I won't get the false impression that I'm using complete data, and I'll think more about the meaning of my operations.
julia> lm(@formula(y~x), (;x=[1,2,3,missing], y=[10,20,31, 41]))
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}
y ~ 1 + x
Coefficients:
─────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept)  -0.666667    0.62361   -1.07    0.4788   -8.59038    7.25704
x            10.5         0.288675  36.37    0.0175    6.83203   14.168
─────────────────────────────────────────────────────────────────────────
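The behaviour I have in mind could be sketched roughly as follows. `fit_or_missing` is a hypothetical helper, not part of GLM or StatsModels, and the mean of `y` is only a stand-in for a real model fit so that the example runs without any packages.

```julia
# Hypothetical helper, not part of GLM or StatsModels.
# `fit` is any function that accepts a NamedTuple of columns.
function fit_or_missing(fit, data::NamedTuple)
    # Propagate missing if any column contains a missing entry.
    any(col -> any(ismissing, col), values(data)) && return missing
    return fit(data)
end

# Stand-in for a model fit: the mean of y.
mean_y(d) = sum(d.y) / length(d.y)

incomplete = (; x = [1, 2, 3, missing], y = [10, 20, 31, 41])
complete = (; x = [1, 2, 3], y = [10, 20, 31])

result_incomplete = fit_or_missing(mean_y, incomplete)
result_complete = fit_or_missing(mean_y, complete)
```

With GLM loaded, the same wrapper could presumably be called with `d -> lm(@formula(y ~ x), d)` as the `fit` argument, so that incomplete data yields missing instead of a model fitted on the remaining rows.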