# Non-linear Demand with misspecification

$$
D = f(X) + \epsilon
$$
其中$f$为一个非线性函数$X = (X_1, X_2)$. 可以观测到的变量为$X_{\text observed} = (X_1, X_3)$. 基于手中的数据希望得到一个$D|X_{\text observed}$的条件分位数模型$g(X_{\text observed})$. 

## 生成数据的方式f
1. 考虑使用一个浅层神经网络构造非线性关系, 网络的参数随机生成或是先给定
2. 考虑使用GLM来构造非线性关系，包括Poisson式，指数式等；
3. 尝试polynomial等方法

## 模型的选择g
1. 线性模型 lasso, ridge and elastic net
2. random
3. Kernal-based regression


## benchmark
## misspecification
1. non-linear 方法产生data - 用linear function去estimate
2. 离散 - X_1, X_2, 均有正负，分为四个region, 在这四个region中Y|X的关系是不同的函数. 
3. 连续 - 类似于kernal based方法. 相近的X misspecification也相近？

In [12]:
import numpy as np
import ConformaQuantile as CQ
import importlib
importlib.reload(CQ)

<module 'ConformaQuantile' from '/Users/wangyanbo/conformal/ConformaQuantile.py'>

In [13]:


quantile = 0.8
n_samples = 3000
n_X1 = 20
n_X2 = 2
n_X3 = 2
np.random.seed(0)

X1 = abs(np.random.normal(6.4, 10, (n_samples, n_X1)))
X2 = abs(np.random.normal(0.4, 1, (n_samples, n_X2)))
X3 = abs(np.random.normal(0.9, 1, (n_samples, n_X3)))

coefficients = abs(np.random.normal(10, 400, n_X1 + n_X2))
X = np.hstack((X1, X2, X3))
noise = np.random.normal(0, 1, n_samples)

X_true = X[:, :(n_X1 + n_X2)]
X_observed = np.hstack((X1, X3))
Y = np.dot(X_true, coefficients)


train_ratio = 0.6
validation_ratio = 0.2
test_ratio = 0.2


In [14]:

loss = {}
models = ['linear', 'quantile', 'lasso', 'ridge', 'random_forest', 'glm', 'neural_network', 'ko', 'quantile_net']

Y0 = Y + noise
for model in models:
    loss_unadjusted, loss_adjusted = CQ.perform_regression_analysis(X_observed, Y0, train_ratio, test_ratio, validation_ratio, quantile, model_type=model)
    # 将结果存储在字典中
    loss[model] = {'loss_unadjusted': loss_unadjusted, 'loss_adjusted': loss_adjusted}

min_loss_model = min(loss, key=lambda x: loss[x]['loss_adjusted'])
max_loss_model = max(loss, key=lambda x: loss[x]['loss_adjusted'])

print(f"\n拥有最小调整损失的模型：{min_loss_model}")
print(f"\n拥有最大调整损失的模型：{max_loss_model}")

linear loss unadjusted 7510.276168567102 loss_adjusted 41.190040121939454
quantile loss unadjusted 7480.8045953555 loss_adjusted 41.48447739941951
lasso loss unadjusted 7510.253193216587 loss_adjusted 41.189845381765615
ridge loss unadjusted 7510.235219215372 loss_adjusted 41.19066625814608
random_forest loss unadjusted 6582.727482334661 loss_adjusted 1851.404524634326
glm loss unadjusted 7510.276168567127 loss_adjusted 41.19004012193957
neural_network loss unadjusted 47455.543670205894 loss_adjusted 3756.664247563416
ko loss unadjusted 8347.963091666206 loss_adjusted 3006.794005559555
quantile_net loss unadjusted 38811.58022525961 loss_adjusted 3425.290646418068

拥有最小调整损失的模型：lasso

拥有最大调整损失的模型：neural_network


In [15]:
Y2 = -Y**(-1) + noise

loss = {}
models = ['linear', 'quantile', 'lasso', 'ridge', 'random_forest', 'glm', 'neural_network','ko', 'quantile_net']
for model in models:
    loss_unadjusted, loss_adjusted = CQ.perform_regression_analysis(X_observed, Y2, train_ratio, test_ratio, validation_ratio, quantile, model_type=model)
    # 将结果存储在字典中
    loss[model] = {'loss_unadjusted': loss_unadjusted, 'loss_adjusted': loss_adjusted}

min_loss_model = min(loss, key=lambda x: loss[x]['loss_adjusted'])
max_loss_model = max(loss, key=lambda x: loss[x]['loss_adjusted'])

print(f"\n拥有最小调整损失的模型：{min_loss_model}")
print(f"\n拥有最大调整损失的模型：{max_loss_model}")

linear loss unadjusted 0.4145035394378913 loss_adjusted 0.279560227431004
quantile loss unadjusted 0.2765467409789086 loss_adjusted 0.27980502484435393
lasso loss unadjusted 0.41177451759140077 loss_adjusted 0.2783519617409534
ridge loss unadjusted 0.41450239185862886 loss_adjusted 0.2795593001298907
random_forest loss unadjusted 0.4082795240532486 loss_adjusted 0.28752075740815325
glm loss unadjusted 0.41450353943789137 loss_adjusted 0.279560227431004
neural_network loss unadjusted 0.4166836654919274 loss_adjusted 0.28666943223619146
ko loss unadjusted 0.5271099652615702 loss_adjusted 0.3946069870359323
quantile_net loss unadjusted 0.32433279829164924 loss_adjusted 0.30845991116715593

拥有最小调整损失的模型：lasso

拥有最大调整损失的模型：ko


In [17]:
Y3 = 1 / ( 1 + np.exp(-Y)) + noise
loss = {}
models = ['linear', 'quantile', 'lasso', 'ridge', 'random_forest', 'glm', 'neural_network', 'ko', 'quantile_net']
for model in models:
    loss_unadjusted, loss_adjusted = CQ.perform_regression_analysis(X_observed, Y3, train_ratio, test_ratio, validation_ratio, quantile, model_type=model)
    # 将结果存储在字典中
    loss[model] = {'loss_unadjusted': loss_unadjusted, 'loss_adjusted': loss_adjusted}

min_loss_model = min(loss, key=lambda x: loss[x]['loss_adjusted'])
max_loss_model = max(loss, key=lambda x: loss[x]['loss_adjusted'])

print(f"\n拥有最小调整损失的模型：{min_loss_model}")
print(f"\n拥有最大调整损失的模型：{max_loss_model}")

linear loss unadjusted 0.41450349225862654 loss_adjusted 0.2795602567742438
quantile loss unadjusted 0.2764887423260208 loss_adjusted 0.2797950755162141
lasso loss unadjusted 0.4117743393610347 loss_adjusted 0.27835191015571753
ridge loss unadjusted 0.4145023446774338 loss_adjusted 0.27955932947106826
random_forest loss unadjusted 0.4070673681557272 loss_adjusted 0.28742415519308073
glm loss unadjusted 0.41450349225862676 loss_adjusted 0.2795602567742438
neural_network loss unadjusted 0.42893453052729513 loss_adjusted 0.2897991591158495
ko loss unadjusted 0.5271094687099268 loss_adjusted 0.39460699593644116
quantile_net loss unadjusted 0.3204266008377796 loss_adjusted 0.32738352285975547

拥有最小调整损失的模型：lasso

拥有最大调整损失的模型：ko


In [19]:
Y4 = 1 / ( 1 + np.exp(-Y)) + Y**2 + Y**(-1) + noise
loss = {}
models = ['linear', 'quantile', 'lasso', 'ridge', 'random_forest', 'glm', 'neural_network', 'ko', 'quantile_net']
for model in models:
    loss_unadjusted, loss_adjusted = CQ.perform_regression_analysis(X_observed, Y3, train_ratio, test_ratio, validation_ratio, quantile, model_type=model)
    # 将结果存储在字典中
    loss[model] = {'loss_unadjusted': loss_unadjusted, 'loss_adjusted': loss_adjusted}

min_loss_model = min(loss, key=lambda x: loss[x]['loss_adjusted'])
max_loss_model = max(loss, key=lambda x: loss[x]['loss_adjusted'])

print(f"\n拥有最小调整损失的模型：{min_loss_model}")
print(f"\n拥有最大调整损失的模型：{max_loss_model}")

linear loss unadjusted 0.41450349225862654 loss_adjusted 0.2795602567742438
quantile loss unadjusted 0.2764887423260208 loss_adjusted 0.2797950755162141
lasso loss unadjusted 0.4117743393610347 loss_adjusted 0.27835191015571753
ridge loss unadjusted 0.4145023446774338 loss_adjusted 0.27955932947106826
random_forest loss unadjusted 0.4070673681557272 loss_adjusted 0.28742415519308073
glm loss unadjusted 0.41450349225862676 loss_adjusted 0.2795602567742438
neural_network loss unadjusted 0.42551157913603094 loss_adjusted 0.2873128512633201
ko loss unadjusted 0.5271094687099268 loss_adjusted 0.39460699593644116
quantile_net loss unadjusted 0.31572729666664606 loss_adjusted 0.31669501551966794

拥有最小调整损失的模型：lasso

拥有最大调整损失的模型：ko
