Propagation of missing values #496

@jariji

One of the things I like most about Julia is that it propagates missing values, encouraging me to think critically about how I handle them in my data. For instance, sum([1,2,missing]) evaluates to missing, not 3, which tells me I need to be careful and think about why there are missing values and how I should handle them. I might want to drop them, or impute values, or realize that my data cleaning functions are broken and I need to fix them before modeling.
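To make the three options concrete, here is a minimal sketch using only Base Julia. The variable names are illustrative, and imputing with `0` is just a placeholder choice, not a recommendation.

```julia
xs = [1, 2, missing]

# Default behaviour: the missing value propagates through the reduction.
total = sum(xs)

# Explicit opt-in to dropping missing values.
dropped = sum(skipmissing(xs))

# Explicit imputation: replace each missing with a placeholder (0 here).
imputed = sum(coalesce.(xs, 0))

println(total)    # missing
println(dropped)  # 3
println(imputed)  # 3
```

In each case the handling of missing values is visible at the call site, which is the property I'd like to keep.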

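In GLM, however, rows with missing values are silently dropped. I would rather the result be missing, since a fitted model is a summary of the data just like sum is. Then I wouldn't get the false impression that I'm using complete data, and I'd think more about the meaning of my operations. In the example below, the fourth row (where x is missing) is dropped and the model is fit on the remaining three observations: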

julia> lm(@formula(y~x), (;x=[1,2,3,missing], y=[10,20,31, 41]))
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

y ~ 1 + x

Coefficients:
─────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────
(Intercept)  -0.666667    0.62361   -1.07    0.4788   -8.59038    7.25704
x            10.5         0.288675  36.37    0.0175    6.83203   14.168
─────────────────────────────────────────────────────────────────────────
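For illustration, the behaviour I have in mind can be sketched in user code with a small guard. `has_missing` and `fit_or_missing` are hypothetical helpers, not part of GLM or StatsModels, and the sketch assumes the data is a NamedTuple of vectors as in the call above. A stand-in function takes the place of the model fit so the snippet runs with Base Julia alone.

```julia
# Hypothetical helpers, not part of GLM or StatsModels.

# True if any column of a NamedTuple-of-vectors table contains a missing value.
has_missing(tbl::NamedTuple) = any(col -> any(ismissing, col), values(tbl))

# Return `missing` instead of fitting when the table has missing values.
fit_or_missing(fit, tbl::NamedTuple) = has_missing(tbl) ? missing : fit(tbl)

data = (; x = [1, 2, 3, missing], y = [10, 20, 31, 41])

# Stand-in for the model fit; it just counts rows.
result = fit_or_missing(tbl -> length(tbl.x), data)

println(result)  # missing
```

With GLM loaded, the stand-in could be replaced by something like `tbl -> lm(@formula(y ~ x), tbl)`, so that dropping rows becomes something the caller has to ask for explicitly.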
