
[Feature request] Conformal Predictive Distributions #2

Open
azev77 opened this issue Sep 29, 2022 · 5 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@azev77
Collaborator

azev77 commented Sep 29, 2022

Hi & thanks for this package.
I've been waiting for a package for conformal prediction...

Here is some sample code from my test drive, which may or may not be useful for the docs:

```julia
using Pkg
Pkg.add(["MLJ", "EvoTrees", "Plots"])
Pkg.add(url="https://github.com/pat-alt/ConformalPrediction.jl")
using MLJ, EvoTrees, ConformalPrediction, Plots, Random, Statistics
########################################
# Simulate a linear model with Gaussian noise
rng = MersenneTwister(49)  # or: rng = Random.GLOBAL_RNG
n = 100_000; p = 7; σ = 0.1
X = [ones(n) randn(rng, n, p-1)]
θ = randn(rng, p)
y = X * θ .+ σ .* randn(rng, n)
# 40% train, 40% calibration, 20% test
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
########################################
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees
model = EvoTreeRegressor()
mach = machine(model, X, y)
fit!(mach, rows=train)
pr_y = predict(mach, rows=test)
########################################
# Wrap the fitted machine, calibrate on held-out rows, predict 95% intervals
conf_mach = conformal_machine(mach)
calibrate!(conf_mach, selectrows(X, calibration), y[calibration])
pr = predict(conf_mach, X[test,:]; coverage=0.95)

pr_lower = [pr[j][1][2][] for j in 1:length(test)]
pr_upper = [pr[j][2][2][] for j in 1:length(test)]

###########################################
plot()
plot!(y[test], lab="y test")
plot!(pr_y, lab="y prediction")
plot!(pr_lower, lab="y 95% prediction lower bound")
plot!(pr_upper, lab="y 95% prediction upper bound")

# Empirical coverage on the test set (target: 0.95)
mean(pr_lower .<= y[test] .<= pr_upper)
```
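For readers who want to see the mechanics behind `calibrate!` and `predict(...; coverage=...)`, here is a minimal from-scratch sketch of split conformal regression. It assumes the wrapped machine follows the standard split (inductive) recipe with absolute residuals as nonconformity scores; that is an assumption about the package, not its actual code. Ordinary least squares via `\` stands in for EvoTrees, and the helpers `conformal_quantile` and `split_conformal_demo` are hypothetical names.

```julia
using Random, Statistics

# Hypothetical helper: conformal quantile of the calibration scores,
# using the finite-sample corrected rank ceil((n + 1) * (1 - α)).
function conformal_quantile(scores::AbstractVector{<:Real}, α::Real)
    n = length(scores)
    k = ceil(Int, (n + 1) * (1 - α))
    k > n && return Inf              # too few calibration points: unbounded interval
    return partialsort(scores, k)    # k-th smallest score
end

# Hypothetical end-to-end demo on the same kind of simulated data as above.
# Least squares (`\`) stands in for the EvoTrees model (assumption).
function split_conformal_demo(; n = 10_000, p = 7, σ = 0.1, α = 0.05, seed = 49)
    rng = MersenneTwister(seed)
    X = [ones(n) randn(rng, n, p - 1)]
    θ = randn(rng, p)
    y = X * θ .+ σ .* randn(rng, n)

    # 40% train, 40% calibration, 20% test
    ntrain = round(Int, 0.4 * n)
    ncal = round(Int, 0.4 * n)
    train = 1:ntrain
    cal = (ntrain + 1):(ntrain + ncal)
    test = (ntrain + ncal + 1):n

    θhat = X[train, :] \ y[train]                 # fit on the training split only
    scores = abs.(y[cal] .- X[cal, :] * θhat)     # absolute calibration residuals
    qhat = conformal_quantile(scores, α)

    yhat = X[test, :] * θhat
    lower, upper = yhat .- qhat, yhat .+ qhat     # symmetric interval yhat ± qhat
    coverage = mean(lower .<= y[test] .<= upper)
    return (qhat = qhat, coverage = coverage)
end

res = split_conformal_demo()
println("half-width = ", res.qhat, ", empirical test coverage = ", res.coverage)
```

Under this recipe every test point gets the same half-width `qhat`, so the interval bounds simply shift up and down with the point prediction.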
@azev77
Collaborator Author

azev77 commented Sep 29, 2022

Question: how do I plot the predicted distribution of y at a given x?

```julia
# AZ: recover the predicted distribution?
xt = X[test[1:1], :]   # first test point as a 1×p matrix
c_grid = 0.01:0.01:0.99
LB = Float64[]; UB = Float64[]
for c in c_grid
    pr = predict(conf_mach, xt; coverage=c)
    push!(LB, pr[1][1][2][])
    push!(UB, pr[1][2][2][])
end
plot(legend=:topleft)
plot!(LB, 1.0 .- c_grid, lab="LB at %-ile")
plot!(UB, 1.0 .- c_grid, lab="UB at %-ile")
vline!([pr_y[1]], lab="y prediction point estimate", color="red")
```

[Plot: LB and UB against 1 - coverage, with the point prediction as a red vertical line]
To be clear, I'm fairly confident that what I plotted above is not the predicted density of y given x.
My question is how to recover it...

@pat-alt
Member

pat-alt commented Sep 30, 2022

Hi @azev77! Great to see you've already played around with the package. I understand what you have in mind, and that would certainly be a nice feature to add. It can apparently be done as demonstrated in this paper by @valeman and co-authors, but the package does not support this yet. For now, all you can really produce are prediction intervals. Adding support for this in the future would be nice, but it looks too involved for me to do any time soon. Here's a corresponding tutorial if you want to have a go at it yourself. Or perhaps I'm overthinking this and others know of a straightforward way to do what you have in mind.

As far as I can tell, what you have plotted is the user-chosen error rate $\alpha = 1 - \text{coverage}$ as a function of the interval bounds around $\hat{y}$. I'm not quite sure what to make of this right now, but it is definitely not the predictive posterior $\hat{f}(y \mid x)$.
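This reading of the plot can be checked with a small sketch. Assuming the intervals are symmetric split-conformal intervals $\hat{y} \pm \hat{q}$ built from absolute calibration residuals (an assumption, not something the package docs above confirm), sweeping the coverage level only changes one number, the half-width $\hat{q}$, which is an empirical quantile of the absolute residuals. The residuals and point prediction below are simulated stand-ins, and `halfwidth` is a hypothetical helper.

```julia
using Random

# Hypothetical helper: half-width of a split-conformal interval at miscoverage α,
# i.e. the ceil((n + 1)(1 - α))-th smallest absolute calibration residual.
function halfwidth(abs_resid::AbstractVector{<:Real}, α::Real)
    n = length(abs_resid)
    k = ceil(Int, (n + 1) * (1 - α))
    return k > n ? Inf : partialsort(abs_resid, k)
end

rng = MersenneTwister(49)
abs_resid = abs.(randn(rng, 1_000))   # stand-in for |y_i - ŷ_i| on the calibration set
yhat = 0.3                            # stand-in point prediction at the test x
c_grid = 0.01:0.01:0.99

# Coverage c means α = 1 - c, and the interval is yhat ± halfwidth(α)
LB = [yhat - halfwidth(abs_resid, 1 - c) for c in c_grid]
UB = [yhat + halfwidth(abs_resid, 1 - c) for c in c_grid]

# Plotting LB (or UB) against α = 1 - c traces the quantile function of the
# absolute residuals, mirrored around yhat: a statement about interval width,
# not an estimate of the density of y given x.
```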

@pat-alt added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Sep 30, 2022
@azev77
Collaborator Author

azev77 commented Oct 1, 2022

I had a look at the slides:

```julia
using MLJ, EvoTrees, ConformalPrediction, Plots, Random, MLJLinearModels, Tables, Statistics
########################################
# Use a single RNG stream: re-seeding MersenneTwister(49) for X, θ and the noise
# restarts the same stream each time, so they would share the same draws.
rng = MersenneTwister(49)
n = 100_000; p = 70; σ = 100.10
X = [ones(n) randn(rng, n, p-1)]
θ = randn(rng, p)
CEF   = X * θ              # conditional expectation of y given X
Noise = σ * randn(rng, n)
y = CEF + Noise
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
########################################
LinearRegressor = @load LinearRegressor pkg=MLJLinearModels
model = LinearRegressor(fit_intercept = false)   # X already has an intercept column
mach = machine(model, Tables.table(X), y)
fit!(mach, rows=train)
pr_y = predict(mach, rows=test)
########################################
conf_mach = conformal_machine(mach)
calibrate!(conf_mach, selectrows(X, calibration), y[calibration])
pr = predict(conf_mach, X[test,:]; coverage=0.95)
pr_lower = [pr[j][1][2][] for j in 1:length(test)]
pr_upper = [pr[j][2][2][] for j in 1:length(test)]
mean(pr_lower .<= y[test] .<= pr_upper)   # should be close to 0.95
###########################################
# recover the predicted distribution
xt = X[test[1:1], :]   # first test point as a 1×p matrix
c_grid = 0.01:0.001:0.99
LB = Float64[]; UB = Float64[]
for c in c_grid
    pr = predict(conf_mach, xt; coverage=c)
    push!(LB, pr[1][1][2][])
    push!(UB, pr[1][2][2][])
end
# The interval at coverage c spans the (1 - c)/2 and (1 + c)/2 quantiles
plot(legend=:topleft)
plot!(LB, (1.0 .- c_grid) ./ 2, lab="LB, quantile")
plot!(UB, 0.5 .+ c_grid ./ 2, lab="UB, quantile")
vline!([pr_y[1]], lab="y prediction, median", color="red")
```

This gives the ECDF (centered at the median):
[Plot: the resulting ECDF, with the point prediction marked by a red vertical line]

Shouldn't the "density" be the derivative of the ECDF?
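In principle, yes: a density is the derivative of a CDF. An ECDF, however, is a step function, so its derivative is zero almost everywhere; in practice one takes finite differences over bins (which is exactly a histogram) or smooths first. The sketch below assumes symmetric intervals $\hat{y} \pm \hat{q}$ from absolute residuals, in which case mapping the lower bound to level $(1-c)/2$ and the upper bound to $(1+c)/2$ corresponds (up to the finite-sample rank correction) to the ECDF of the symmetrised sample $\hat{y} \pm |r_i|$. The residuals are simulated stand-ins with scale 100 to roughly match σ above, and `ecdf_at` and `density_from_ecdf` are hypothetical helpers.

```julia
using Random, Statistics

# ECDF of a sample, evaluated at y.
ecdf_at(pts, y) = mean(pts .<= y)

# Density as the finite-difference derivative of the ECDF over bins:
# (F(b) - F(a)) / (b - a) per bin is exactly a histogram density estimate.
function density_from_ecdf(pts, edges)
    F = [ecdf_at(pts, e) for e in edges]
    return diff(F) ./ diff(edges)
end

rng = MersenneTwister(49)
abs_resid = abs.(100.0 .* randn(rng, 5_000))     # stand-in |calibration residuals|
yhat = 0.0                                        # stand-in point prediction
pts = vcat(yhat .+ abs_resid, yhat .- abs_resid)  # symmetrised sample yhat ± |r_i|

edges = range(-400.0, 400.0; length = 81)         # 80 bins of width 10
dens = density_from_ecdf(pts, edges)
mids = (edges[1:end-1] .+ edges[2:end]) ./ 2      # bin centres, e.g. for plot(mids, dens)
```

Note that symmetrising the absolute residuals forces the implied density to be symmetric around the point prediction, whatever the true conditional distribution looks like.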

@pat-alt
Member

pat-alt commented Oct 5, 2022

Thanks @azev77, just linking the related thread on Discourse here for reference.

@valeman

valeman commented Oct 6, 2022

Here is a more relevant paper that works with any underlying regressor: https://proceedings.mlr.press/v91/vovk18a.html
And a toy Python package that implements it: https://pypi.org/project/pysloth/
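For comparison, here is a minimal sketch of a split-conformal predictive distribution in the spirit of the conformal predictive distributions in the paper linked above. It assumes the conformity score $\alpha_i = y_i - \hat{y}_i$ (signed calibration residuals) and the usual randomised form $Q(y, \tau) = \big(\#\{i : \alpha_i < y - \hat{y}\} + \tau\,(\#\{i : \alpha_i = y - \hat{y}\} + 1)\big)/(n + 1)$ with $\tau \in [0, 1]$. The function name `cpd`, the simulated residuals and the point prediction are hypothetical; this is not the code of pysloth or ConformalPrediction.jl.

```julia
using Random

# Split-conformal predictive distribution (CPD) at one test point.
# alphas: signed calibration residuals y_i - ŷ_i (conformity scores, assumed)
# yhat:   point prediction at the test x;  τ in [0, 1] randomises ties.
function cpd(y::Real, yhat::Real, alphas::AbstractVector{<:Real}; τ::Real = 0.5)
    a = y - yhat                     # conformity score of the candidate label y
    n = length(alphas)
    below = count(<(a), alphas)      # calibration scores strictly smaller than a
    ties = count(==(a), alphas)      # calibration scores equal to a
    return (below + τ * (ties + 1)) / (n + 1)
end

rng = MersenneTwister(49)
alphas = 100.0 .* randn(rng, 2_000)   # stand-in signed calibration residuals
yhat = 1.5                             # stand-in point prediction at some test x
y_grid = range(yhat - 400, yhat + 400; length = 401)
Q = [cpd(y, yhat, alphas) for y in y_grid]
# plot(y_grid, Q) shows the predictive CDF at this x. Because it uses signed
# residuals it need not be symmetric around yhat, and finite differences of Q
# over bins give a density estimate, as for any ECDF.
```

The key design difference from sweeping `coverage` is the use of signed rather than absolute residuals: each calibration residual shifts the point prediction to one candidate outcome, so the whole residual distribution, including any skew, carries over to the predictive distribution.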

@pat-alt changed the title from "Examples" to "[Feature request] Conformal Predictive Distributions" on Oct 15, 2022

3 participants