Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modelcols converts x matrix of Float32 to Float64 #293

Open
xgdgsc opened this issue May 26, 2023 · 4 comments
Open

modelcols converts x matrix of Float32 to Float64 #293

xgdgsc opened this issue May 26, 2023 · 4 comments

Comments

@xgdgsc
Copy link

xgdgsc commented May 26, 2023

I' m attempting JuliaStats/GLM.jl#260 . After some line running of code, it seems modelcols converts x matrix of Float32 to Float64, while keeping y Float32, which might be the cause. Any suggestions on fixing this behavior?

@kleinschmidt
Copy link
Member

kleinschmidt commented May 26, 2023

Without a reproducer it's a bit hard to tell exactly what you're trying to do, but here's my understanding of the current modelcols situation:

  • ContinuousTerms are just copied, which should not on its own change eltype (this why a continuous left-hand side of a formula is un-changed, I think)
  • CategoricalTerms look up rows in the associated contrasts matrix. As of Allow arbitrary storage for ContrastsMatrix and other contrasts breakage #273 , the contrasts matrix can be any AbstractMatrix, so if you have categorical variables you can provide your own contrasts matrix; I'm not sure off teh top of my head whether/how you'd actually thread through a non-Matrix{Float64} contrasts matrix but it should be possible, using StatsModels.ConstrastsCoding as a fallback.
  • InterceptTerm, ah now here is one issue: this creates either ones(nrows) (for InterceptTerm{true}, or an empty Matrix{Float64} (for InterceptTerm{false}). and that, combined with..
  • MatrixTerm (which is what the collection of terms on the RHS of the formula becomes) does hcat(modelcols.(terms)...) essentially, which is gonna trigger julia's normal promotion rules and convert everything to Float64.

So one pretty straightforward fix would be to make the columns generated by InterceptTerm be some "weak" type (like Bool) that will never cause promotion of other columns. That may be enough to make this all Just Work™...

@kleinschmidt
Copy link
Member

Long term, I think we'd probably want to support a third T argument to modelcols that converts things as needed, but that's a bit trickier to get right (have to thread through lots of different code paths, including external libraries that extend the syntax...)

@kleinschmidt
Copy link
Member

Another possible fix would be to introduce modelcols!(storage, term, data) methods which would allow uesrs to pass in their own typed containers (using StatsModels.width to figure out the appropriate size).

@xgdgsc
Copy link
Author

xgdgsc commented May 26, 2023

Thanks. I tested the branch dfk/float32s with

using GLM
using Random
using DataFrames
x32 = randn(Float32, (100, 1))
y32 = randn(Float32, 100)
regdata = DataFrame(:x => x32[:, 1], :y => y32)
ols = lm(@formula(y ~ x + 1), regdata)

It seem to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants