<a href="https://colab.research.google.com/github/alexisakov/RTPI/blob/master/Hard_numbers_Case_study_3_Proto_hedonic_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Case 3. Simple hedonic regression

In [1]:
import requests
import numpy as np
import pandas as pd
from statsmodels.regression.linear_model import OLS
from statsmodels.tools.tools import add_constant 

Set up the connection to the Hard Numbers API. You will need an API key; contact us at https://t.me/xiskv to get one.

In [2]:
base_url = 'http://rtpiapi.hrdn.io/'

token = 'YOUR API KEY'  # replace with the key you received


request_headers = {'Authorization': f'Bearer {token}',
                    'Content-Type': 'application/json',
                    'Range-Unit': 'items'
                  }
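The `Range-Unit: items` header suggests that the API accepts PostgREST-style range requests. Assuming that holds (it is not confirmed here), a large table could be fetched page by page rather than in one response. The helper `page_headers` and the page size are hypothetical; this is only a sketch:

```python
def page_headers(base_headers, page, page_size=1000):
    """Return a copy of base_headers asking for one page of rows.

    Assumption: the server honours a PostgREST-style 'Range: first-last'
    header (inclusive, zero-based) together with 'Range-Unit: items'.
    """
    first = page * page_size
    last = first + page_size - 1
    headers = dict(base_headers)  # do not mutate the caller's dict
    headers['Range'] = f'{first}-{last}'
    return headers


# Illustrative headers only; the token is a placeholder
demo_headers = {'Authorization': 'Bearer YOUR API KEY', 'Range-Unit': 'items'}
second_page = page_headers(demo_headers, page=1, page_size=500)
print(second_page['Range'])
```

Each page would then be requested with `requests.get(request_url, headers=page_headers(request_headers, page))` until a response comes back with fewer rows than the page size.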

Hedonic regression needs all the price data we collect, plus some extra data on the characteristics of the mobile phones.

The algorithm is as follows:

* get all the positions marked as mobile phones
* get their latest prices
* collect data on the phones' characteristics from a third-party source (in this simple case, we collected it by hand from Yandex.Market)
* estimate a linear regression
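Written out, the last step fits a linear model of price on characteristics. The form below is a generic sketch; the symbols are introduced here for illustration only, and the actual regressors are chosen further down:

```latex
p_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i
```

Here $p_i$ is the latest price of phone $i$, $x_{ik}$ is its $k$-th characteristic, and $\beta_k$ is the implicit price of one extra unit of that characteristic.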

First, let's collect the data on the mobile phones:

In [7]:
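ITEMCODE = 7104  # rosstat_id used to select mobile phones below

# Fetch all product names, with the rosstat_id of their price page embedded
request_url = base_url + 'rtpi_product_name?select=*, rtpi_price_page(rosstat_id)'
namedgood = requests.get(request_url, headers=request_headers).json()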

In [11]:
named = [x for x in namedgood if x['rtpi_price_page']['rosstat_id'] == ITEMCODE]

Now let's get prices for these phones: 

In [15]:
namedid = [x['web_price_id'] for x in named]

In [16]:
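# Build the filter as in.(id1,id2,...); str(tuple) would break for a single id
id_list = ','.join(str(i) for i in namedid)
request_url = base_url + f"rtpi_price?select=*&web_price_id=in.({id_list})"
prices = requests.get(request_url, headers=request_headers).json()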
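With many ids, a single URL can grow beyond what a server accepts. One option is to split the ids into batches and issue one request per batch. The sketch below only builds the query strings; the helpers `in_filter` and `chunks` and the batch size are hypothetical:

```python
def in_filter(ids):
    """Format ids as an `in.(a,b,c)` filter value."""
    return 'in.(' + ','.join(str(i) for i in ids) + ')'


def chunks(ids, size):
    """Yield consecutive slices of ids with at most `size` elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Ids taken from the sample of phone characteristics used in this notebook
demo_ids = [325825, 413103, 325851, 325921, 325786]
demo_urls = [
    'rtpi_price?select=*&web_price_id=' + in_filter(part)
    for part in chunks(demo_ids, 2)
]
for url in demo_urls:
    print(url)
```

The responses for each batch could then be concatenated into one `prices` list.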

Let's select the latest prices:

In [27]:
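for x in named:
    wid = x['web_price_id']
    # Keep only this item's observations that actually have a price
    pid = [p for p in prices if p['web_price_id'] == wid and p['current_price'] is not None]
    if not pid:
        continue  # no priced observation for this item
    latest = max(pid, key=lambda k: k['date_observe'])  # most recent observation
    x.update({'price': latest['current_price'], 'date': latest['date_observe']})
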
dfprice = pd.DataFrame(named)
dfprice.set_index('web_price_id',inplace=True)
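The loop above rescans `prices` once per product. For larger samples, a single pass that remembers the latest priced observation per id does the same job. This is a sketch on made-up observations; the function `latest_prices` is hypothetical:

```python
def latest_prices(observations):
    """Map web_price_id -> latest observation that has a price.

    Assumes date_observe is an ISO-8601 string in one consistent format,
    so plain string comparison orders observations chronologically.
    """
    latest = {}
    for obs in observations:
        if obs['current_price'] is None:
            continue  # skip observations without a price
        key = obs['web_price_id']
        if key not in latest or obs['date_observe'] > latest[key]['date_observe']:
            latest[key] = obs
    return latest


# Made-up observations, for illustration only
demo = [
    {'web_price_id': 1, 'current_price': 100, 'date_observe': '2020-11-01T10:00:00'},
    {'web_price_id': 1, 'current_price': 90, 'date_observe': '2020-12-01T10:00:00'},
    {'web_price_id': 1, 'current_price': None, 'date_observe': '2020-12-05T10:00:00'},
    {'web_price_id': 2, 'current_price': 50, 'date_observe': '2020-10-15T10:00:00'},
]
result = latest_prices(demo)
print(result[1]['current_price'], result[2]['current_price'])
```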

We have prepared a small sample of phone characteristics - let's import it:

In [110]:
dfsm = pd.read_excel('https://github.com/alexisakov/RTPI/raw/master/smchar.xlsx',sheet_name='X',index_col=0)
dfsm.head()

| web_price_id | product_name | brand | screen_size | screen_type | memory | camera_resolution | url |
|---|---|---|---|---|---|---|---|
| 325825 | Doogee S68 Pro Mineral Black | Doogee | 5.9 | IPS | 128 | 21.0 | https://market.yandex.ru/product--smartfon-doo... |
| 413103 | ZTE Blade A3 2020 NFC Dark Grey | ZTE | 5.45 | IPS | 32 | 8.0 | https://market.yandex.ru/product--smartfon-zte... |
| 325851 | ZTE Blade A5 2020 Aquamarine | ZTE | 6.09 | IPS | 32 | 13.0 | https://market.yandex.ru/product--smartfon-zte... |
| 325921 | Смартфон Highscreen Max 3 Black | Highscreen | 5.93 | IPS | 64 | 16.0 | https://market.yandex.ru/product--smartfon-hig... |
| 325786 | Смартфон Huawei Y6 2019 (MRD-LX1F) Amber Brown | Huawei | 5.93 | IPS | 64 | 16.0 | https://market.yandex.ru/product--smartfon-hig... |


Join the prices sample with the phone description data:

In [47]:
df = dfprice.join(dfsm, how='inner',rsuffix='_t2')
df.head()

| web_price_id | product_name | contributor_id | rtpi_price_page | price | date | product_name_t2 | brand | screen_size | screen_type | memory | camera_resolution | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 325921 | Смартфон Highscreen Max 3 Black | 1 | {'rosstat_id': 7104} | 12990 | 2020-11-07T12:07:33.963 | Смартфон Highscreen Max 3 Black | Highscreen | 5.93 | IPS | 64 | 16.0 | https://market.yandex.ru/product--smartfon-hig... |
| 325825 | Смартфон Doogee S68 Pro Mineral Black | 1 | {'rosstat_id': 7104} | 21490 | 2020-12-04T04:10:35.713 | Doogee S68 Pro Mineral Black | Doogee | 5.9 | IPS | 128 | 21.0 | https://market.yandex.ru/product--smartfon-doo... |
| 413103 | Смартфон ZTE Blade A3 2020 NFC Dark Grey | 1 | {'rosstat_id': 7104} | 5990 | 2020-11-05T20:08:25.997 | ZTE Blade A3 2020 NFC Dark Grey | ZTE | 5.45 | IPS | 32 | 8.0 | https://market.yandex.ru/product--smartfon-zte... |
| 325851 | Смартфон ZTE Blade A5 2020 Aquamarine | 1 | {'rosstat_id': 7104} | 7990 | 2020-12-04T04:10:01.277 | ZTE Blade A5 2020 Aquamarine | ZTE | 6.09 | IPS | 32 | 13.0 | https://market.yandex.ru/product--smartfon-zte... |
| 325471 | Смартфон Samsung Galaxy S10E Аквамарин | 1 | {'rosstat_id': 7104} | 44990 | 2020-10-19T06:26:01.167 | Смартфон Samsung Galaxy S10E Аквамарин | Samsung | 5.8 | AMOLED | 128 | 16.0 | https://market.yandex.ru/product--smartfon-sam... |
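An inner join silently drops every phone that is missing from either table, so it can be worth checking what is lost. The sketch below uses made-up ids; in the notebook the inputs would be `dfprice.index` and `dfsm.index`. The function `join_coverage` is hypothetical:

```python
def join_coverage(price_ids, char_ids):
    """Report which ids survive an inner join and which are dropped."""
    price_ids, char_ids = set(price_ids), set(char_ids)
    return {
        'matched': sorted(price_ids & char_ids),
        'prices_only': sorted(price_ids - char_ids),
        'characteristics_only': sorted(char_ids - price_ids),
    }


# Made-up ids, for illustration only
report = join_coverage(price_ids=[1, 2, 3, 4], char_ids=[3, 4, 5])
print(report)
```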


Prepare the X variables:

In [102]:
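# .copy() avoids pandas' SettingWithCopyWarning when a column is added below
X = df[['brand', 'screen_size', 'screen_type', 'memory', 'camera_resolution']].copy()
# Dummy variable: 1 for Samsung phones, 0 otherwise
X['ssg_q'] = (X.brand == 'Samsung').astype(int)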

In [104]:
# One-hot encode the screen type; dtype=int keeps the design matrix numeric
X = X.join(pd.get_dummies(X.screen_type, drop_first=True, dtype=int), how='inner')
X.drop(['brand', 'screen_type'], inplace=True, axis=1)
X = add_constant(X, prepend=False)
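`drop_first=True` matters because the model also contains a constant: a full set of dummy columns sums to 1 in every row, which would make it perfectly collinear with the constant. Dropping one category turns it into the baseline. A plain-Python sketch of the idea, on made-up screen types (the function `dummies_drop_first` is hypothetical):

```python
def dummies_drop_first(values):
    """One-hot encode values, dropping the first category (sorted order).

    Returns (column_names, rows). The dropped category becomes the
    baseline: a row of all zeros.
    """
    categories = sorted(set(values))
    kept = categories[1:]  # the first category is the baseline
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Made-up screen types, for illustration only
cols, rows = dummies_drop_first(['IPS', 'AMOLED', 'IPS', 'OLED'])
print(cols)
print(rows)
```

The coefficient on each remaining dummy is then read as a price difference relative to the baseline screen type.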

Y is the price:

In [65]:
Y = df[['price']]

Estimate the linear regression and view the results:

In [107]:
mo = OLS(Y,X)
res = mo.fit()
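`OLS(Y, X).fit()` estimates the coefficients that minimise the sum of squared residuals. To illustrate what is being computed (not how statsmodels computes it), the sketch below solves the normal equations in plain Python on made-up, noise-free data, so the known coefficients should be recovered up to rounding. The function `ols` and all numbers are hypothetical:

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y.

    X is a list of rows (include a column of 1s for the constant),
    y a list of targets. Solved by Gauss-Jordan elimination with
    partial pivoting; suitable for small, well-conditioned examples only.
    """
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    for col in range(k):
        # Pivot on the largest remaining entry in this column
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up, noise-free data: price = 1000 * screen_size + 50 * memory + 2000
sizes = [5.0, 5.5, 6.0, 6.5, 6.1]
memory = [32, 64, 32, 128, 64]
X_demo = [[s, m, 1.0] for s, m in zip(sizes, memory)]  # constant last, as with prepend=False
y_demo = [1000 * s + 50 * m + 2000 for s, m in zip(sizes, memory)]
coefs = ols(X_demo, y_demo)
print([round(c, 4) for c in coefs])
```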

In [108]:
print(res.summary())

                            OLS Regression Results                            
Dep. Variable:                  price   R-squared:                       0.980
Model:                            OLS   Adj. R-squared:                  0.966
Method:                 Least Squares   F-statistic:                     70.12
Date:                Wed, 13 Jan 2021   Prob (F-statistic):           7.96e-06
Time:                        09:30:31   Log-Likelihood:                -119.27
No. Observations:                  13   AIC:                             250.5
Df Residuals:                       7   BIC:                             253.9
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
screen_size          21.6020    389.11
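The adjusted R-squared in the summary header can be checked by hand from the reported R-squared, the number of observations $n = 13$ and the model degrees of freedom $k = 5$. The input 0.980 is itself rounded, so the result is approximate:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
          = 1 - (1 - 0.980) \cdot \frac{12}{7}
          \approx 0.966
```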
