# Non-linear Demand with misspecification

$$
D = f(X) + \epsilon
$$
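where $f$ is a nonlinear function and $X = (X_1, X_2)$. The observable variables are $X_{\text{observed}} = (X_1, X_3)$. Given the available data, the goal is a conditional quantile model $g(X_{\text{observed}})$ for $D \mid X_{\text{observed}}$.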
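
A conditional quantile model is usually scored with the pinball (quantile) loss. The following is a minimal sketch under that assumption; the name `pinball_loss` is illustrative and is not taken from `ConformaQuantile`.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantile):
    """Average pinball loss of predictions y_pred at the given quantile level."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by quantile,
    # over-prediction (diff < 0) by (1 - quantile).
    return float(np.mean(np.maximum(quantile * diff, (quantile - 1) * diff)))
```

With `quantile = 0.8`, under-predicting by one unit costs 0.8 and over-predicting by one unit costs 0.2, so the minimizer is pushed toward the upper part of the conditional distribution.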

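## Data-generating function $f$
1. Use a shallow neural network to construct the nonlinear relationship, with network parameters either randomly generated or fixed in advance.
2. Use a GLM to construct the nonlinear relationship, including Poisson-type and exponential-type forms.
3. Try polynomial and similar methods.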
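
The three options in this list could be sketched as follows. The function names, layer width, weight distributions, and polynomial degree are all illustrative assumptions rather than choices made in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_shallow_net(X, n_hidden=16, rng=rng):
    """One hidden tanh layer with randomly drawn weights (assumed architecture)."""
    W1 = rng.normal(0.0, 1.0, (X.shape[1], n_hidden))
    b1 = rng.normal(0.0, 1.0, n_hidden)
    w2 = rng.normal(0.0, 1.0, n_hidden)
    return np.tanh(X @ W1 + b1) @ w2

def f_poisson_glm(X, beta):
    """Poisson-style GLM mean: exponential of a linear predictor (log link)."""
    return np.exp(X @ beta)

def f_polynomial(X, beta1, beta2):
    """Degree-2 polynomial without cross terms."""
    return X @ beta1 + (X ** 2) @ beta2

# Hypothetical example inputs: 1000 samples, 4 features.
X = rng.normal(0.0, 1.0, (1000, 4))
beta = rng.normal(0.0, 0.3, 4)
noise = rng.normal(0.0, 1.0, 1000)

D_net = f_shallow_net(X) + noise
D_glm = f_poisson_glm(X, beta) + noise
D_poly = f_polynomial(X, beta, beta) + noise
```

Keeping the linear predictor small (here a coefficient scale of 0.3) keeps the exponential mean in a range where the additive noise still matters.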

## Choice of model $g$
1. Linear models: lasso, ridge, and elastic net
2. Random forest
3. Kernel-based regression
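
For the kernel-based option, one simple candidate is a Nadaraya-Watson estimator with a Gaussian kernel. This is only a sketch: it estimates a conditional mean rather than a quantile, and the function name and `bandwidth` parameter are assumptions, not part of `ConformaQuantile`.

```python
import numpy as np

def nadaraya_watson(X_train, y_train, X_query, bandwidth=1.0):
    """Gaussian-kernel weighted average of y_train at each query point."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    sq_dist = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    # Normalize the weights per query point and average the training targets.
    return (weights @ y_train) / weights.sum(axis=1)
```

The pairwise distance matrix needs memory proportional to the number of query points times the number of training points, so large samples would have to be processed in batches.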


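## Benchmark
## Misspecification
1. Generate the data with a nonlinear method, then estimate with a linear function.
2. Discrete: $X_1$ and $X_2$ each take positive and negative values, which gives four regions; the relationship $Y \mid X$ is a different function in each region.
3. Continuous: similar to kernel-based methods. Do nearby values of $X$ have similar misspecification?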
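
The discrete case in this list could be simulated as follows. The sketch treats $X_1$ and $X_2$ as scalars, and the four region functions are arbitrary placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X1 = rng.normal(0.0, 1.0, n)
X2 = rng.normal(0.0, 1.0, n)

# Region index 0..3 from the sign pattern of (X1, X2).
region = 2 * (X1 > 0).astype(int) + (X2 > 0).astype(int)

# A different (hypothetical) function in each region.
region_functions = [
    lambda a, b: a + b,            # region 0: X1 <= 0, X2 <= 0
    lambda a, b: a * b,            # region 1: X1 <= 0, X2 > 0
    lambda a, b: np.sin(a) - b,    # region 2: X1 > 0,  X2 <= 0
    lambda a, b: a ** 2 + b ** 2,  # region 3: X1 > 0,  X2 > 0
]

signal = np.empty(n)
for k, func in enumerate(region_functions):
    mask = region == k
    signal[mask] = func(X1[mask], X2[mask])

Y = signal + rng.normal(0.0, 1.0, n)
```

A single global linear model fitted to `(X1, X2, Y)` is then misspecified by construction, since no single set of coefficients matches all four regions.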

In [1]:
import numpy as np
import ConformaQuantile as CQ

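quantile = 0.8        # target quantile level
n_samples = 10000
n_X1 = 20             # X1: in the true model and observed
n_X2 = 2              # X2: in the true model but not observed
n_X3 = 2              # X3: observed but not in the true model
interval_length = 100
np.random.seed(0)

# Features are absolute values of Gaussian draws, so all entries are non-negative.
X1 = abs(np.random.normal(6.4, 10, (n_samples, n_X1)))
X2 = abs(np.random.normal(0.4, 1, (n_samples, n_X2)))
X3 = abs(np.random.normal(0.9, 1, (n_samples, n_X3)))

# One non-negative coefficient per true feature (X1 and X2).
coefficients = abs(np.random.normal(10, 400, n_X1 + n_X2))
X = np.hstack((X1, X2, X3))
noise = np.random.normal(0, 1, n_samples)

# The true model uses (X1, X2); the analyst only sees (X1, X3).
X_true = X[:, :(n_X1 + n_X2)]
X_observed = np.hstack((X1, X3))
# Noise-free linear signal; noise and transforms are applied in later cells.
Y = np.dot(X_true, coefficients)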


train_ratio = 0.6
validation_ratio = 0.2
test_ratio = 0.2
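
The internals of `CQ.perform_regression_analysis` are not shown in this notebook. As a guess at the kind of procedure the split ratios and the "unadjusted" versus "adjusted" losses suggest, the sketch below fits a base model on the training part, then shifts its predictions by an empirical quantile of the validation residuals. This is a simplified calibration in the spirit of split conformal prediction (the finite-sample correction is omitted), and every name in it is hypothetical.

```python
import numpy as np

def split_indices(n, train_ratio, validation_ratio, rng):
    """Shuffle 0..n-1 and cut into train / validation / test index arrays."""
    order = rng.permutation(n)
    n_train = int(round(train_ratio * n))
    n_val = int(round(validation_ratio * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

def conformal_quantile_fit(X, y, quantile, train_ratio=0.6, validation_ratio=0.2, seed=0):
    """Return (raw predictions, shifted predictions, targets) on the test part."""
    rng = np.random.default_rng(seed)
    train, val, test = split_indices(len(y), train_ratio, validation_ratio, rng)
    # Base model: ordinary least squares with an intercept (predicts the mean).
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    pred = A @ beta
    # Shift by the empirical quantile of the validation residuals, so that about
    # a `quantile` share of validation targets lie below the shifted prediction.
    shift = np.quantile(y[val] - pred[val], quantile)
    return pred[test], pred[test] + shift, y[test]

# Hypothetical demo data: a well-specified linear model with unit Gaussian noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(0.0, 1.0, (20000, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 1.0, 20000)

raw, adjusted, y_test = conformal_quantile_fit(X_demo, y_demo, quantile=0.8)
coverage_raw = np.mean(y_test <= raw)            # share of targets below the mean fit
coverage_adjusted = np.mean(y_test <= adjusted)  # share below the shifted fit
```

Comparing `coverage_raw` with `coverage_adjusted` shows how far the calibration step moves the predictions toward the target quantile level.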


In [5]:
Y0 = Y + noise
loss_unadjusted, loss_adjusted = CQ.perform_regression_analysis(X_observed, Y0, train_ratio, 
                                                                test_ratio, validation_ratio,
                                                                  quantile, model_type='ko')


KeyboardInterrupt: 

In [4]:

loss = {}
models = ['linear', 'quantile', 'lasso', 'ridge', 'random_forest', 'glm', 'neural_network']

Y0 = Y + noise
for model in models:
    loss_unadjusted, loss_adjusted = CQ.perform_regression_analysis(X_observed, Y0, train_ratio, test_ratio, validation_ratio, quantile, model_type=model)
    # Store the results in the dictionary
    loss[model] = {'loss_unadjusted': loss_unadjusted, 'loss_adjusted': loss_adjusted}

min_loss_model = min(loss, key=lambda x: loss[x]['loss_adjusted'])
max_loss_model = max(loss, key=lambda x: loss[x]['loss_adjusted'])

print(f"\nModel with the smallest adjusted loss: {min_loss_model}")
print(f"\nModel with the largest adjusted loss: {max_loss_model}")

linear loss unadjusted 6972.893055621992 loss_adjusted 125.59176725989416
quantile loss unadjusted 6883.976768322941 loss_adjusted 126.35026095161547
lasso loss unadjusted 6972.873227794408 loss_adjusted 125.58936046320991
ridge loss unadjusted 6972.8803913867605 loss_adjusted 125.59206856755274
random_forest loss unadjusted 6125.013594244923 loss_adjusted 1645.7233999817074
glm loss unadjusted 6972.893055621939 loss_adjusted 125.59176725989492
neural_network loss unadjusted 48039.76846665287 loss_adjusted 3569.170119478801

Model with the smallest adjusted loss: lasso

Model with the largest adjusted loss: neural_network


In [5]:
Y2 = -Y**(-1) + noise

loss = {}
models = ['linear', 'quantile', 'lasso', 'ridge', 'random_forest', 'glm', 'neural_network']
for model in models:
    loss_unadjusted, loss_adjusted = CQ.perform_regression_analysis(X_observed, Y2, train_ratio, test_ratio, validation_ratio, quantile, model_type=model)
    # Store the results in the dictionary
    loss[model] = {'loss_unadjusted': loss_unadjusted, 'loss_adjusted': loss_adjusted}

min_loss_model = min(loss, key=lambda x: loss[x]['loss_adjusted'])
max_loss_model = max(loss, key=lambda x: loss[x]['loss_adjusted'])

print(f"\nModel with the smallest adjusted loss: {min_loss_model}")
print(f"\nModel with the largest adjusted loss: {max_loss_model}")

linear loss unadjusted 0.3957079988701452 loss_adjusted 0.2816022719961405
quantile loss unadjusted 0.28178398847513564 loss_adjusted 0.28188539424619086
lasso loss unadjusted 0.39480938155817386 loss_adjusted 0.28117219634896695
ridge loss unadjusted 0.3957079696762227 loss_adjusted 0.28160218570900497
random_forest loss unadjusted 0.3982459967639985 loss_adjusted 0.2848042631490605
glm loss unadjusted 0.3957079988701452 loss_adjusted 0.2816022719961405
neural_network loss unadjusted 0.39462823210200826 loss_adjusted 0.2810275779209757

Model with the smallest adjusted loss: neural_network

Model with the largest adjusted loss: random_forest


In [6]:
Y3 = 1 / ( 1 + np.exp(-Y)) + noise
loss = {}
models = ['linear', 'quantile', 'lasso', 'ridge', 'random_forest', 'glm', 'neural_network']
for model in models:
    loss_unadjusted, loss_adjusted = CQ.perform_regression_analysis(X_observed, Y3, train_ratio, test_ratio, validation_ratio, quantile, model_type=model)
    # Store the results in the dictionary
    loss[model] = {'loss_unadjusted': loss_unadjusted, 'loss_adjusted': loss_adjusted}

min_loss_model = min(loss, key=lambda x: loss[x]['loss_adjusted'])
max_loss_model = max(loss, key=lambda x: loss[x]['loss_adjusted'])

print(f"\nModel with the smallest adjusted loss: {min_loss_model}")
print(f"\nModel with the largest adjusted loss: {max_loss_model}")

linear loss unadjusted 0.3957080497667438 loss_adjusted 0.28160227079963746
quantile loss unadjusted 0.2817841192360466 loss_adjusted 0.2818855075370898
lasso loss unadjusted 0.39480943289008175 loss_adjusted 0.28117226368876835
ridge loss unadjusted 0.39570802057299337 loss_adjusted 0.28160218451279706
random_forest loss unadjusted 0.39817931893739117 loss_adjusted 0.2847167323341282
glm loss unadjusted 0.39570804976674345 loss_adjusted 0.28160227079963746
neural_network loss unadjusted 0.40951038463396383 loss_adjusted 0.28698669672736765

Model with the smallest adjusted loss: lasso

Model with the largest adjusted loss: neural_network
