# Random Forest

In this practical assignment, you will solve a regression problem on real data using a linear model and a random forest.

## Goals of the assignment

* Learn to apply a random forest to a regression problem.
* Learn to compare the quality of a random forest with that of a linear model.
* Learn to tune the forest's hyperparameters.

## What the assignment includes

* Preprocess the data.
* Train a linear regression on the task data.
* Train a random forest on the task data.
* Tune the forest's hyperparameters.
* Visualize the forest's feature importances.

## What is graded

*  All cells are filled in; Python reports no errors when the cells are run.
*  The final model's $R^2$ exceeds 0.95.


## Что нужно сделать

Build models that predict the price of a car from its characteristics.

The cars have many features, but this assignment uses only the numeric ones:
*  year: the year the car was manufactured;
*  km_driven: the distance driven;
*  seats: the number of seats;
*  mileage: fuel economy (given in kmpl in the raw data);
*  engine: engine displacement (given in CC in the raw data);
*  max_power: maximum engine power (given in bhp in the raw data).

The target variable is selling_price.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np


from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV



import warnings
warnings.filterwarnings('ignore')

In [2]:
train = pd.read_csv('cars_train.csv')
test = pd.read_csv('cars_test.csv')

In [3]:
train

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner,mileage,engine,max_power,torque,seats
0,Maruti Swift Dzire VDI,2014.0,450000.0,145500.0,Diesel,Individual,Manual,First Owner,23.4 kmpl,1248 CC,74 bhp,190Nm@ 2000rpm,5.0
1,Skoda Rapid 1.5 TDI Ambition,2014.0,370000.0,120000.0,Diesel,Individual,Manual,Second Owner,21.14 kmpl,1498 CC,103.52 bhp,250Nm@ 1500-2500rpm,5.0
2,Hyundai i20 Sportz Diesel,2010.0,225000.0,127000.0,Diesel,Individual,Manual,First Owner,23.0 kmpl,1396 CC,90 bhp,22.4 kgm at 1750-2750rpm,5.0
3,"Maruti Swift VXI BSIII,2007,130000,120000,Petr...",,,,,,,,,,,,
4,Hyundai Xcent 1.2 VTVT E Plus,2017.0,440000.0,45000.0,Petrol,Individual,Manual,First Owner,20.14 kmpl,1197 CC,81.86 bhp,113.75nm@ 4000rpm,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
6993,Hyundai i20 Magna,2013.0,320000.0,110000.0,Petrol,Individual,Manual,First Owner,18.5 kmpl,1197 CC,82.85 bhp,113.7Nm@ 4000rpm,5.0
6994,"Hyundai Verna CRDi SX,2007,135000,119000,Diese...",,,,,,,,,,,,
6995,Maruti Swift Dzire ZDi,2009.0,382000.0,120000.0,Diesel,Individual,Manual,First Owner,19.3 kmpl,1248 CC,73.9 bhp,190Nm@ 2000rpm,5.0
6996,Tata Indigo CR4,2013.0,290000.0,25000.0,Diesel,Individual,Manual,First Owner,23.57 kmpl,1396 CC,70 bhp,140Nm@ 1800-3000rpm,5.0


In [4]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6998 entries, 0 to 6997
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   name           6998 non-null   object 
 1   year           6699 non-null   float64
 2   selling_price  6699 non-null   float64
 3   km_driven      6699 non-null   float64
 4   fuel           6699 non-null   object 
 5   seller_type    6699 non-null   object 
 6   transmission   6699 non-null   object 
 7   owner          6699 non-null   object 
 8   mileage        6497 non-null   object 
 9   engine         6497 non-null   object 
 10  max_power      6503 non-null   object 
 11  torque         6497 non-null   object 
 12  seats          6497 non-null   float64
dtypes: float64(4), object(9)
memory usage: 710.9+ KB


In [5]:
train.isnull().sum()

name               0
year             299
selling_price    299
km_driven        299
fuel             299
seller_type      299
transmission     299
owner            299
mileage          501
engine           501
max_power        495
torque           501
seats            501
dtype: int64

The training data contains cars whose price is unknown. Remove these cars from the training set.

In [6]:
# Your code here
train_new = train.dropna(subset=['selling_price'])

In [7]:
train_new.isnull().sum()

name               0
year               0
selling_price      0
km_driven          0
fuel               0
seller_type        0
transmission       0
owner              0
mileage          202
engine           202
max_power        196
torque           202
seats            202
dtype: int64

In [8]:
test

Unnamed: 0,name,year,selling_price,km_driven,fuel,seller_type,transmission,owner,mileage,engine,max_power,torque,seats
0,Mahindra Xylo E4 BS IV,2010,229999,168000,Diesel,Individual,Manual,First Owner,14.0 kmpl,2498 CC,112 bhp,260 Nm at 1800-2200 rpm,7.0
1,Tata Nexon 1.5 Revotorq XE,2017,665000,25000,Diesel,Individual,Manual,First Owner,21.5 kmpl,1497 CC,108.5 bhp,260Nm@ 1500-2750rpm,5.0
2,Honda Civic 1.8 S AT,2007,175000,218463,Petrol,Individual,Automatic,First Owner,12.9 kmpl,1799 CC,130 bhp,172Nm@ 4300rpm,5.0
3,Honda City i DTEC VX,2015,635000,173000,Diesel,Individual,Manual,First Owner,25.1 kmpl,1498 CC,98.6 bhp,200Nm@ 1750rpm,5.0
4,Tata Indica Vista Aura 1.2 Safire BSIV,2011,130000,70000,Petrol,Individual,Manual,Second Owner,16.5 kmpl,1172 CC,65 bhp,96 Nm at 3000 rpm,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,Hyundai i10 Magna 1.1L,2008,250000,100000,Petrol,Individual,Manual,Second Owner,19.81 kmpl,1086 CC,68.05 bhp,99.04Nm@ 4500rpm,5.0
996,Hyundai i20 2015-2017 Sportz 1.2,2017,440000,50000,Petrol,Individual,Manual,Second Owner,18.6 kmpl,1197 CC,81.83 bhp,114.7Nm@ 4000rpm,5.0
997,Hyundai i20 Era Diesel,2009,340000,40000,Diesel,Individual,Manual,First Owner,23.0 kmpl,1396 CC,90 bhp,22.4 kgm at 1750-2750rpm,5.0
998,Hyundai i10 Asta,2012,350000,25000,Petrol,Individual,Manual,First Owner,20.36 kmpl,1197 CC,78.9 bhp,111.8Nm@ 4000rpm,5.0


In [9]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   name           1000 non-null   object 
 1   year           1000 non-null   int64  
 2   selling_price  1000 non-null   int64  
 3   km_driven      1000 non-null   int64  
 4   fuel           1000 non-null   object 
 5   seller_type    1000 non-null   object 
 6   transmission   1000 non-null   object 
 7   owner          1000 non-null   object 
 8   mileage        981 non-null    object 
 9   engine         981 non-null    object 
 10  max_power      981 non-null    object 
 11  torque         981 non-null    object 
 12  seats          981 non-null    float64
dtypes: float64(1), int64(3), object(9)
memory usage: 101.7+ KB


In [10]:
test.isnull().sum()

name              0
year              0
selling_price     0
km_driven         0
fuel              0
seller_type       0
transmission      0
owner             0
mileage          19
engine           19
max_power        19
torque           19
seats            19
dtype: int64

Create the objects Xtrain, ytrain, Xtest, and ytest. Store in them the object-feature matrices and the target vectors for the training and test datasets.

### Train

In [11]:
# Feature matrix: every column except the target
Xtrain = train_new.drop('selling_price', axis=1)
# Target vector
ytrain = train_new['selling_price']

In [12]:
ytrain.astype(int)

0       450000
1       370000
2       225000
4       440000
6        45000
         ...  
6992    260000
6993    320000
6995    382000
6996    290000
6997    290000
Name: selling_price, Length: 6699, dtype: int32

Before training the models, process the data.

Note that the columns mileage, engine, and max_power are numeric in meaning. To turn them into numeric columns, drop the units of measurement and keep only the numbers. The columns contain missing values; leave them untouched at this stage.
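One way to drop the units while keeping fractional values is to extract the first decimal number from each string. The sketch below is only an illustration on a made-up Series; the helper name `extract_number` and the sample values are assumptions, not part of the assignment.

```python
import pandas as pd


def extract_number(column: pd.Series) -> pd.Series:
    """Return the first decimal number in each string; NaN where there is none."""
    # expand=False makes str.extract return a Series for a single capture group
    digits = column.str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    # Missing or unmatched entries stay NaN after conversion
    return pd.to_numeric(digits, errors="coerce")


# Hypothetical sample in the same style as the raw columns
raw = pd.Series(["23.4 kmpl", "1248 CC", "103.52 bhp", None, " bhp"])
parsed = extract_number(raw)
print(parsed.tolist())
```

Because the decimal point is part of the captured pattern, a value such as `23.4 kmpl` stays `23.4`, and missing entries remain missing.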

In [13]:
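# Your code here
# Caution: '[^0-9]' also removes the decimal point, so '23.4 kmpl' becomes 234.0
mileage_seri = Xtrain['mileage']
mileage_seri = mileage_seri.astype(str).str.replace('[^0-9]', '', regex=True)
# Empty strings (from missing values) are coerced back to NaN
mileage_seri = pd.to_numeric(mileage_seri, errors='coerce')
print(mileage_seri)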

0        234.0
1       2114.0
2        230.0
4       2014.0
6        161.0
         ...  
6992     189.0
6993     185.0
6995     193.0
6996    2357.0
6997    2357.0
Name: mileage, Length: 6699, dtype: float64


In [14]:
Xtrain['mileage'] = mileage_seri

In [15]:
engine_seri = Xtrain['engine']
engine_seri = engine_seri.astype(str).str.replace('[^0-9]', '', regex=True)
engine_seri = pd.to_numeric(engine_seri, errors='coerce')
print(engine_seri)

0       1248.0
1       1498.0
2       1396.0
4       1197.0
6        796.0
         ...  
6992     998.0
6993    1197.0
6995    1248.0
6996    1396.0
6997    1396.0
Name: engine, Length: 6699, dtype: float64


In [16]:
Xtrain['engine'] = engine_seri

In [17]:
# Caution: '[^0-9]' also removes the decimal point, so '103.52 bhp' becomes 10352.0
max_power_seri = Xtrain['max_power']
max_power_seri = max_power_seri.astype(str).str.replace('[^0-9]', '', regex=True)
max_power_seri = pd.to_numeric(max_power_seri, errors='coerce')
print(max_power_seri)

0          74.0
1       10352.0
2          90.0
4        8186.0
6          37.0
         ...   
6992      671.0
6993     8285.0
6995      739.0
6996       70.0
6997       70.0
Name: max_power, Length: 6699, dtype: float64


In [18]:
Xtrain['max_power'] = max_power_seri

In [19]:
Xtrain

Unnamed: 0,name,year,km_driven,fuel,seller_type,transmission,owner,mileage,engine,max_power,torque,seats
0,Maruti Swift Dzire VDI,2014.0,145500.0,Diesel,Individual,Manual,First Owner,234.0,1248.0,74.0,190Nm@ 2000rpm,5.0
1,Skoda Rapid 1.5 TDI Ambition,2014.0,120000.0,Diesel,Individual,Manual,Second Owner,2114.0,1498.0,10352.0,250Nm@ 1500-2500rpm,5.0
2,Hyundai i20 Sportz Diesel,2010.0,127000.0,Diesel,Individual,Manual,First Owner,230.0,1396.0,90.0,22.4 kgm at 1750-2750rpm,5.0
4,Hyundai Xcent 1.2 VTVT E Plus,2017.0,45000.0,Petrol,Individual,Manual,First Owner,2014.0,1197.0,8186.0,113.75nm@ 4000rpm,5.0
6,Maruti 800 DX BSII,2001.0,5000.0,Petrol,Individual,Manual,Second Owner,161.0,796.0,37.0,59Nm@ 2500rpm,4.0
...,...,...,...,...,...,...,...,...,...,...,...,...
6992,Maruti Wagon R VXI BS IV with ABS,2013.0,50000.0,Petrol,Individual,Manual,Second Owner,189.0,998.0,671.0,90Nm@ 3500rpm,5.0
6993,Hyundai i20 Magna,2013.0,110000.0,Petrol,Individual,Manual,First Owner,185.0,1197.0,8285.0,113.7Nm@ 4000rpm,5.0
6995,Maruti Swift Dzire ZDi,2009.0,120000.0,Diesel,Individual,Manual,First Owner,193.0,1248.0,739.0,190Nm@ 2000rpm,5.0
6996,Tata Indigo CR4,2013.0,25000.0,Diesel,Individual,Manual,First Owner,2357.0,1396.0,70.0,140Nm@ 1800-3000rpm,5.0


In [20]:
test_new = test.copy()

In [21]:
Xtest = test_new.drop('selling_price', axis=1)
ytest = test_new['selling_price']

In [22]:
ytest.astype(int)

0      229999
1      665000
2      175000
3      635000
4      130000
        ...  
995    250000
996    440000
997    340000
998    350000
999    700000
Name: selling_price, Length: 1000, dtype: int32

In [23]:
ytest.describe()

count    1.000000e+03
mean     6.179010e+05
std      7.585539e+05
min      3.100000e+04
25%      2.500000e+05
50%      4.349990e+05
75%      6.700000e+05
max      6.000000e+06
Name: selling_price, dtype: float64

In [24]:
ytrain.describe()

count    6.699000e+03
mean     6.583887e+05
std      8.203222e+05
min      2.999900e+04
25%      2.750000e+05
50%      4.599990e+05
75%      7.000000e+05
max      1.000000e+07
Name: selling_price, dtype: float64

### Test

In [25]:
mileage_seri_test = Xtest['mileage']
mileage_seri_test = mileage_seri_test.astype(str).str.replace('[^0-9]', '', regex=True)
mileage_seri_test = pd.to_numeric(mileage_seri_test, errors='coerce')
print(mileage_seri_test)

0       140.0
1       215.0
2       129.0
3       251.0
4       165.0
        ...  
995    1981.0
996     186.0
997     230.0
998    2036.0
999     260.0
Name: mileage, Length: 1000, dtype: float64


In [26]:
Xtest['mileage'] = mileage_seri_test

In [27]:
engine_seri_test = Xtest['engine']
engine_seri_test = engine_seri_test.astype(str).str.replace('[^0-9]', '', regex=True)
engine_seri_test = pd.to_numeric(engine_seri_test, errors='coerce')
print(engine_seri_test)

0      2498.0
1      1497.0
2      1799.0
3      1498.0
4      1172.0
        ...  
995    1086.0
996    1197.0
997    1396.0
998    1197.0
999    1498.0
Name: engine, Length: 1000, dtype: float64


In [28]:
Xtest['engine'] = engine_seri_test

In [29]:
max_power_seri_test = Xtest['max_power']
max_power_seri_test = max_power_seri_test.astype(str).str.replace('[^0-9]', '', regex=True)
max_power_seri_test = pd.to_numeric(max_power_seri_test, errors='coerce')
print(max_power_seri_test)

0       112.0
1      1085.0
2       130.0
3       986.0
4        65.0
        ...  
995    6805.0
996    8183.0
997      90.0
998     789.0
999     986.0
Name: max_power, Length: 1000, dtype: float64


In [30]:
Xtest['max_power'] = max_power_seri_test

In [31]:
Xtest

Unnamed: 0,name,year,km_driven,fuel,seller_type,transmission,owner,mileage,engine,max_power,torque,seats
0,Mahindra Xylo E4 BS IV,2010,168000,Diesel,Individual,Manual,First Owner,140.0,2498.0,112.0,260 Nm at 1800-2200 rpm,7.0
1,Tata Nexon 1.5 Revotorq XE,2017,25000,Diesel,Individual,Manual,First Owner,215.0,1497.0,1085.0,260Nm@ 1500-2750rpm,5.0
2,Honda Civic 1.8 S AT,2007,218463,Petrol,Individual,Automatic,First Owner,129.0,1799.0,130.0,172Nm@ 4300rpm,5.0
3,Honda City i DTEC VX,2015,173000,Diesel,Individual,Manual,First Owner,251.0,1498.0,986.0,200Nm@ 1750rpm,5.0
4,Tata Indica Vista Aura 1.2 Safire BSIV,2011,70000,Petrol,Individual,Manual,Second Owner,165.0,1172.0,65.0,96 Nm at 3000 rpm,5.0
...,...,...,...,...,...,...,...,...,...,...,...,...
995,Hyundai i10 Magna 1.1L,2008,100000,Petrol,Individual,Manual,Second Owner,1981.0,1086.0,6805.0,99.04Nm@ 4500rpm,5.0
996,Hyundai i20 2015-2017 Sportz 1.2,2017,50000,Petrol,Individual,Manual,Second Owner,186.0,1197.0,8183.0,114.7Nm@ 4000rpm,5.0
997,Hyundai i20 Era Diesel,2009,40000,Diesel,Individual,Manual,First Owner,230.0,1396.0,90.0,22.4 kgm at 1750-2750rpm,5.0
998,Hyundai i10 Asta,2012,25000,Petrol,Individual,Manual,First Owner,2036.0,1197.0,789.0,111.8Nm@ 4000rpm,5.0


Keep only the six numeric columns in the data:

year, km_driven, seats, engine, mileage, max_power

In [32]:
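# Your code here
# .copy() gives an independent frame, so later assignments do not write to a view of Xtrain
X_train = Xtrain[['year', 'km_driven', 'seats', 'engine', 'mileage', 'max_power']].copy()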

In [33]:
X_train

Unnamed: 0,year,km_driven,seats,engine,mileage,max_power
0,2014.0,145500.0,5.0,1248.0,234.0,74.0
1,2014.0,120000.0,5.0,1498.0,2114.0,10352.0
2,2010.0,127000.0,5.0,1396.0,230.0,90.0
4,2017.0,45000.0,5.0,1197.0,2014.0,8186.0
6,2001.0,5000.0,4.0,796.0,161.0,37.0
...,...,...,...,...,...,...
6992,2013.0,50000.0,5.0,998.0,189.0,671.0
6993,2013.0,110000.0,5.0,1197.0,185.0,8285.0
6995,2009.0,120000.0,5.0,1248.0,193.0,739.0
6996,2013.0,25000.0,5.0,1396.0,2357.0,70.0


In [34]:
X_train.isnull().sum()

year           0
km_driven      0
seats        202
engine       202
mileage      202
max_power    196
dtype: int64

In [35]:
X_test = Xtest[['year', 'km_driven', 'seats', 'engine', 'mileage', 'max_power']].copy()

In [36]:
X_test

Unnamed: 0,year,km_driven,seats,engine,mileage,max_power
0,2010,168000,7.0,2498.0,140.0,112.0
1,2017,25000,5.0,1497.0,215.0,1085.0
2,2007,218463,5.0,1799.0,129.0,130.0
3,2015,173000,5.0,1498.0,251.0,986.0
4,2011,70000,5.0,1172.0,165.0,65.0
...,...,...,...,...,...,...
995,2008,100000,5.0,1086.0,1981.0,6805.0
996,2017,50000,5.0,1197.0,186.0,8183.0
997,2009,40000,5.0,1396.0,230.0,90.0
998,2012,25000,5.0,1197.0,2036.0,789.0


In [37]:
X_test.isnull().sum()

year          0
km_driven     0
seats        19
engine       19
mileage      19
max_power    19
dtype: int64

Now fill in the missing values as follows:

*    compute the column means on the training set;

*    fill the gaps in both the training and test data with those means.
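The two steps above can be sketched on a toy example. The frames `train_part` and `test_part` and their values are made up for illustration; the point is only that the means come from the training frame and are reused for the test frame.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature train/test frames with gaps
train_part = pd.DataFrame({"engine": [1000.0, np.nan, 2000.0],
                           "seats": [5.0, 5.0, np.nan]})
test_part = pd.DataFrame({"engine": [np.nan, 1500.0],
                          "seats": [7.0, np.nan]})

# Column means are computed on the training data only (NaN is skipped by default)
train_means = train_part.mean()

# fillna with a Series fills each column with the value under the matching label
train_filled = train_part.fillna(train_means)
test_filled = test_part.fillna(train_means)

print(train_means.to_dict())
print(test_filled)
```

Reusing the training means for the test set keeps the test data from influencing preprocessing.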

In [38]:
# Fill the gaps with the column means of the training set
X_train[['engine', 'seats', 'mileage', 'max_power']] = X_train[['engine', 'seats', 'mileage', 'max_power']].fillna(X_train[['engine', 'seats', 'mileage', 'max_power']].mean())

In [39]:
X_train.astype(int)

Unnamed: 0,year,km_driven,seats,engine,mileage,max_power
0,2014,145500,5,1248,234,74
1,2014,120000,5,1498,2114,10352
2,2010,127000,5,1396,230,90
4,2017,45000,5,1197,2014,8186
6,2001,5000,4,796,161,37
...,...,...,...,...,...,...
6992,2013,50000,5,998,189,671
6993,2013,110000,5,1197,185,8285
6995,2009,120000,5,1248,193,739
6996,2013,25000,5,1396,2357,70


In [40]:
X_train.isnull().sum()

year         0
km_driven    0
seats        0
engine       0
mileage      0
max_power    0
dtype: int64

In [41]:
X_train.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
year,6699.0,2014.074937,3.914516,1983.0,2012.0,2015.0,2017.0,2020.0
km_driven,6699.0,68071.735632,57801.387411,1.0,32250.0,60000.0,90000.0,2360457.0
seats,6699.0,5.423118,0.944239,2.0,5.0,5.0,5.0,14.0
engine,6699.0,1455.612744,498.356091,624.0,1197.0,1248.0,1582.0,3604.0
mileage,6699.0,991.148376,917.477937,0.0,189.0,252.0,1967.0,3344.0
max_power,6699.0,2887.864832,5317.066792,0.0,115.0,831.0,5816.0,108495.0


In [42]:
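# Note: this fills the test gaps with the test set's own column means;
# the task statement asks for the training-set means instead
X_test[['engine', 'seats', 'mileage', 'max_power']] = X_test[['engine', 'seats', 'mileage', 'max_power']].fillna(X_test[['engine', 'seats', 'mileage', 'max_power']].mean())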

In [43]:
X_test.astype(int)

Unnamed: 0,year,km_driven,seats,engine,mileage,max_power
0,2010,168000,7,2498,140,112
1,2017,25000,5,1497,215,1085
2,2007,218463,5,1799,129,130
3,2015,173000,5,1498,251,986
4,2011,70000,5,1172,165,65
...,...,...,...,...,...,...
995,2008,100000,5,1086,1981,6805
996,2017,50000,5,1197,186,8183
997,2009,40000,5,1396,230,90
998,2012,25000,5,1197,2036,789


In [44]:
X_test.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
year,1000.0,2013.681,4.012149,1995.0,2011.0,2014.0,2017.0,2020.0
km_driven,1000.0,71393.341,48486.218662,1303.0,37000.0,61500.0,100000.0,375000.0
seats,1000.0,5.410805,0.911195,4.0,5.0,5.0,5.0,9.0
engine,1000.0,1458.882773,521.200362,624.0,1197.0,1248.0,1582.0,3604.0
mileage,1000.0,907.270133,906.549558,0.0,182.0,234.0,1910.75,3226.0
max_power,1000.0,2575.874618,3957.854433,35.0,102.0,739.0,2575.874618,25479.0


In [45]:
X_test.isnull().sum()

year         0
km_driven    0
seats        0
engine       0
mileage      0
max_power    0
dtype: int64

Now, on the processed training data, train:

*  a linear regression,
*  a random forest with default parameters.

On the processed test data, make predictions and compute the $R^2$ metric.
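As a reminder of what the metric measures, $R^2 = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared prediction errors and $SS_{tot}$ is the sum of squared deviations of the target from its mean. A small sketch with made-up numbers compares a manual computation with `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical targets and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # squared errors of the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared errors of the mean baseline
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))

# Always predicting the mean of y_true gives R^2 = 0
r2_baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))
```

A value near 1 means the model explains most of the variance in the target; a value near 0 means it does no better than predicting the mean.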

In [46]:
# Your code here
model_lin = LinearRegression()
model_lin.fit(X_train, ytrain)

pred_test = model_lin.predict(X_test)

In [47]:
r2_score(ytest, pred_test)

0.48683362277657005

In [48]:
model_rf = RandomForestRegressor(random_state=42)
model_rf.fit(X_train, ytrain)

pred_test_rf = model_rf.predict(X_test)

In [49]:
r2_score(ytest, pred_test_rf)

0.9441055470301495

Which model turned out better?

In [50]:
# Your answer here
# The random forest is better: its R^2 on the test set is 0.944, versus 0.487 for linear regression

Improve the random forest's predictions by tuning its hyperparameters:

*   n_estimators,
*   max_depth,
*   max_features,
*   min_samples_leaf,
*   min_samples_split.

Use GridSearchCV to tune the hyperparameters. Fit GridSearchCV on the training data with three folds and the $R^2$ metric.
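The cost of a grid search grows as the product of the number of values per hyperparameter, multiplied by the number of folds. The sketch below uses synthetic data and a deliberately small grid (`X_demo`, `y_demo`, and `demo_grid` are made up for illustration) to show how to count the candidates before launching the search.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Synthetic regression data: the target depends mostly on the first feature
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=120)

# A small illustrative grid: 2 * 2 * 2 = 8 candidate settings
demo_grid = {"n_estimators": [10, 30],
             "max_depth": [2, 6],
             "min_samples_leaf": [1, 2]}
n_candidates = len(ParameterGrid(demo_grid))
n_fits = n_candidates * 3  # three folds per candidate (the final refit is extra)

search = GridSearchCV(RandomForestRegressor(random_state=0),
                      demo_grid, cv=3, scoring="r2")
search.fit(X_demo, y_demo)

print(n_candidates, n_fits)
print(search.best_params_, search.best_score_)
```

`best_score_` is the mean cross-validated $R^2$ of the best candidate, so it is measured on held-out folds of the training data, not on the test set.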

In [None]:
# Your code here
params = {'n_estimators' : np.arange(10, 120, 20),
          'max_depth' : np.arange(2, 25, 4),
          'max_features' : ['sqrt', 'log2', None],
          'min_samples_split' : np.arange(2, 12, 2),
          'min_samples_leaf' : [1, 2]}

gs = GridSearchCV(RandomForestRegressor(random_state=42), params, cv=3, scoring='r2')

gs.fit(X_train, ytrain)

In [None]:
gs.best_estimator_

Now train a random forest with the hyperparameters you found on the training data. Make predictions on the test data and evaluate their quality ($R^2$).
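With the default `refit=True`, GridSearchCV already refits the best configuration on the whole training set and exposes it as `best_estimator_`. The sketch below, on synthetic data with made-up names, checks that a fresh forest built from `best_params_` with the same `random_state` gives the same predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data for illustration only
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(90, 3))
y_demo = X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=90)

search = GridSearchCV(RandomForestRegressor(random_state=0),
                      {"max_depth": [2, 4], "n_estimators": [10, 20]},
                      cv=3, scoring="r2")
search.fit(X_demo, y_demo)

# Build the same model by hand from the best hyperparameters
manual = RandomForestRegressor(random_state=0, **search.best_params_)
manual.fit(X_demo, y_demo)

same = np.allclose(manual.predict(X_demo),
                   search.best_estimator_.predict(X_demo))
print(search.best_params_, same)
```

Either route yields a model trained on all of the training data; constructing it by hand simply makes the chosen hyperparameters explicit.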

In [None]:
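# Your code here
# With the default refit=True, best_estimator_ has already been fitted on the full training set;
# the explicit fit below repeats that step
model_final = gs.best_estimator_

model_final.fit(X_train, ytrain)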

In [None]:
pred_test_final = model_final.predict(X_test)

In [None]:
r2_score(ytest, pred_test_final)

The model is ready. What remains is to interpret it.

Using the model.feature_importances_ attribute, plot a bar chart of the feature importances of the random forest with tuned hyperparameters.
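For a fitted forest, `feature_importances_` is an array with one non-negative value per feature, and the values sum to 1. A minimal sketch on synthetic data, where the column names and the data are made up, pairs the values with column names and sorts them before plotting:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only the "signal" column drives the target
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(300, 3)),
                      columns=["signal", "noise_a", "noise_b"])
y_demo = 5.0 * X_demo["signal"] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# Attach feature names to the importances and sort from most to least important
importances = pd.Series(forest.feature_importances_,
                        index=X_demo.columns).sort_values(ascending=False)
print(importances)
```

These impurity-based importances describe how much each feature contributed to the splits in the trees; they are relative to each other rather than absolute effect sizes.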

In [None]:
# Your code here
features = X_train.columns
importances = model_final.feature_importances_
indices = np.argsort(importances)

plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), [features[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()