## Mercedes-Benz Greener Manufacturing
Course-end Project 1

### DESCRIPTION

#### Reduce the time a Mercedes-Benz spends on the test bench.

**Problem Statement Scenario:**

Since the first automobile, the Benz Patent Motor Car in 1886, Mercedes-Benz has stood for important automotive innovations. These include the passenger safety cell with a crumple zone, the airbag, and intelligent assistance systems. Mercedes-Benz applies for nearly 2000 patents per year, making the brand the European leader among premium carmakers. Mercedes-Benz is the leader in the premium car industry. With a huge selection of features and options, customers can choose the customized Mercedes-Benz of their dreams.

To ensure the safety and reliability of every unique car configuration before they hit the road, the company’s engineers have developed a robust testing system. As one of the world’s biggest manufacturers of premium cars, safety and efficiency are paramount on Mercedes-Benz’s production lines. However, optimizing the speed of their testing system for many possible feature combinations is complex and time-consuming without a powerful algorithmic approach.

Your task is to reduce the time that cars spend on the test bench. You will work with a dataset representing different permutations of features in a Mercedes-Benz car to predict the time it takes to pass testing. Optimal algorithms will contribute to faster testing, resulting in lower carbon dioxide emissions without reducing Mercedes-Benz’s standards.

**The following actions should be performed:**

- If for any column(s), the variance is equal to zero, then you need to remove those variable(s).
- Check for null and unique values for test and train sets.
- Apply label encoder.
- Perform dimensionality reduction.
- Predict your test_df values using XGBoost.


In [1]:
#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

In [2]:
#import dataset
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

#### Shape of data

In [3]:
#check the shape of data
print(f'Shape of Train data : {train.shape}')
print(f'Shape of Test data : {test.shape}')

Shape of Train data : (4209, 378)
Shape of Test data : (4209, 377)


#### See the first five rows

In [4]:
#See the first five rows of train data
train.head()

Unnamed: 0,ID,y,X0,X1,X2,X3,X4,X5,X6,X8,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,0,130.81,k,v,at,a,d,u,j,o,...,0,0,1,0,0,0,0,0,0,0
1,6,88.53,k,t,av,e,d,y,l,o,...,1,0,0,0,0,0,0,0,0,0
2,7,76.26,az,w,n,c,d,x,j,x,...,0,0,0,0,0,0,1,0,0,0
3,9,80.62,az,t,n,f,d,x,l,e,...,0,0,0,0,0,0,0,0,0,0
4,13,78.02,az,v,n,f,d,h,d,n,...,0,0,0,0,0,0,0,0,0,0


In [5]:
#See the first five rows for test data
test.head()

Unnamed: 0,ID,X0,X1,X2,X3,X4,X5,X6,X8,X10,...,X375,X376,X377,X378,X379,X380,X382,X383,X384,X385
0,1,az,v,n,f,d,t,a,w,0,...,0,0,0,1,0,0,0,0,0,0
1,2,t,b,ai,a,d,b,g,y,0,...,0,0,1,0,0,0,0,0,0,0
2,3,az,v,as,f,d,a,j,j,0,...,0,0,0,1,0,0,0,0,0,0
3,4,az,l,n,f,d,z,l,n,0,...,0,0,0,1,0,0,0,0,0,0
4,5,w,s,as,c,d,y,i,m,0,...,1,0,0,0,0,0,0,0,0,0


## If for any column(s), the variance is equal to zero, then you need to remove those variable(s)

### Remove from train dataset

#### Check for train dataset

In [6]:
pd.options.display.float_format = '{:,.4f}'.format #display the number in float format

var_train = train.var(numeric_only=True) #variance of the numeric columns only (recent pandas versions raise on object columns otherwise)
var_train = var_train.reset_index() #set new index
var_train

Unnamed: 0,index,0
0,ID,5941936.1180
1,y,160.7667
2,X10,0.0131
3,X11,0.0000
4,X12,0.0695
...,...,...
365,X380,0.0080
366,X382,0.0075
367,X383,0.0017
368,X384,0.0005


In [7]:
#set the columns
var_train.columns = ['ID', 'Variance']
var_train

Unnamed: 0,ID,Variance
0,ID,5941936.1180
1,y,160.7667
2,X10,0.0131
3,X11,0.0000
4,X12,0.0695
...,...,...
365,X380,0.0080
366,X382,0.0075
367,X383,0.0017
368,X384,0.0005


In [8]:
#sort the dataframe in ascending order of variance
var_train = var_train.sort_values("Variance", ascending=True)
var_train


Unnamed: 0,ID,Variance
275,X289,0.0000
315,X330,0.0000
254,X268,0.0000
332,X347,0.0000
97,X107,0.0000
...,...,...
347,X362,0.2496
322,X337,0.2498
116,X127,0.2500
1,y,160.7667


In [9]:
#drop the columns whose variance is below 0.2 (this includes the zero-variance columns)
var_t = var_train.loc[var_train['Variance']<0.2, 'ID']
train1 = train.drop(var_t, axis=1)

In [10]:
#also drop ID: it is a row identifier, not a feature of the car
train1.drop('ID', axis=1, inplace=True)
train1.head()

Unnamed: 0,y,X0,X1,X2,X3,X4,X5,X6,X8,X14,...,X329,X334,X337,X350,X351,X355,X358,X362,X375,X377
0,130.81,k,v,at,a,d,u,j,o,0,...,1,1,0,0,0,0,0,0,0,1
1,88.53,k,t,av,e,d,y,l,o,0,...,1,0,1,0,0,0,0,0,1,0
2,76.26,az,w,n,c,d,x,j,x,0,...,0,1,0,1,0,0,1,0,0,0
3,80.62,az,t,n,f,d,x,l,e,0,...,0,0,0,1,0,0,1,0,0,0
4,78.02,az,v,n,f,d,h,d,n,0,...,0,1,0,1,0,0,1,0,0,0


In [11]:
# check whether all zero-variance columns have been removed
var_t=np.where(train1.var(numeric_only=True)==0)
var_t


(array([], dtype=int64),)

#### Comment: So, from the above result we see that all columns with variance zero in train1 dataset have been successfully removed
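
The task statement asks only for zero-variance columns to be removed, whereas the cell above uses a stricter threshold of 0.2. For comparison, here is a minimal sketch of the zero-variance-only variant. The frame `toy` and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the training data (illustrative values only).
toy = pd.DataFrame({
    "X0": ["k", "az", "k", "t"],   # categorical column, skipped by the variance check
    "X10": [0, 1, 0, 1],
    "X11": [0, 0, 0, 0],           # constant column: variance is exactly zero
    "X12": [1, 1, 1, 0],
})

# numeric_only=True restricts the computation to numeric columns.
variances = toy.var(numeric_only=True)

# Names of the columns whose variance is exactly zero.
zero_var_cols = variances[variances == 0].index.tolist()

toy_reduced = toy.drop(columns=zero_var_cols)
print(zero_var_cols)
print(toy_reduced.columns.tolist())
```

Whether to keep the 0.2 cut-off or fall back to the strict zero-variance rule is a modelling choice; the stricter threshold removes many more binary columns.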

### Remove columns with variance 0 from test data set

In [12]:
var_test = test.var(numeric_only=True) #variance of the numeric columns only
var_test = var_test.reset_index() #set new index
var_test.columns = ['ID', 'Variance'] #set the columns name
var_test.sort_values('Variance', ascending= True)

Unnamed: 0,ID,Variance
280,X295,0.0000
353,X369,0.0000
281,X296,0.0000
242,X257,0.0000
243,X258,0.0000
...,...,...
321,X337,0.2493
177,X191,0.2495
346,X362,0.2497
115,X127,0.2499


In [13]:
var_t = var_test.loc[var_test['Variance']<0.2, 'ID']
test1 = test.drop(var_t, axis=1)
test1.drop('ID', axis=1, inplace=True)
test1.head()

Unnamed: 0,X0,X1,X2,X3,X4,X5,X6,X8,X14,X27,...,X329,X334,X337,X350,X351,X355,X358,X362,X375,X377
0,az,v,n,f,d,t,a,w,0,1,...,0,1,0,1,0,0,1,0,0,0
1,t,b,ai,a,d,b,g,y,0,1,...,0,1,0,0,0,0,0,1,0,1
2,az,v,as,f,d,a,j,j,1,1,...,1,1,0,1,0,0,0,0,0,0
3,az,l,n,f,d,z,l,n,0,1,...,0,1,0,1,0,0,1,0,0,0
4,w,s,as,c,d,y,i,m,1,1,...,1,0,1,1,1,0,1,0,1,0


In [14]:
# check whether all zero-variance columns have been removed
var_t=np.where(test1.var(numeric_only=True)==0)
var_t


(array([], dtype=int64),)

#### Comment: So, from the above result we see that all columns with variance zero in test1 dataset have been successfully removed
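
Because the variance is computed separately on each file, the set of dropped columns can differ between train and test. One way to keep both frames aligned is to derive the drop list from the training data only and apply it to both. The sketch below assumes that approach; `toy_train` and `toy_test` are hypothetical stand-ins with made-up values.

```python
import pandas as pd

# Hypothetical miniature versions of the two files.
toy_train = pd.DataFrame({
    "ID": [0, 6, 7],
    "y": [130.81, 88.53, 76.26],
    "X10": [0, 1, 0],
    "X11": [0, 0, 0],   # constant in train
})
toy_test = pd.DataFrame({
    "ID": [1, 2, 3],
    "X10": [0, 0, 1],
    "X11": [0, 1, 0],   # not constant in test, but dropped anyway for consistency
})

# Decide which features to drop by looking at the training features only.
feature_var = toy_train.drop(columns=["ID", "y"]).var(numeric_only=True)
drop_cols = feature_var[feature_var == 0].index.tolist()

# Apply the same list to both frames so they keep identical feature columns.
train_reduced = toy_train.drop(columns=drop_cols)
test_reduced = toy_test.drop(columns=drop_cols)
print(drop_cols)
```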

## Check for null and unique values for test and train sets

### Null values for train dataset

In [15]:
#Check the null values
np.where(train1.isna().sum())

(array([], dtype=int64),)

#### Comment: So there are no null values in the train data set

### Null values for test dataset

In [16]:
#check the null values
np.where(test1.isna().sum())

(array([], dtype=int64),)

#### Comment: So there are no null values in the test data set

### Unique values for train data set

In [17]:
train1.head()

Unnamed: 0,y,X0,X1,X2,X3,X4,X5,X6,X8,X14,...,X329,X334,X337,X350,X351,X355,X358,X362,X375,X377
0,130.81,k,v,at,a,d,u,j,o,0,...,1,1,0,0,0,0,0,0,0,1
1,88.53,k,t,av,e,d,y,l,o,0,...,1,0,1,0,0,0,0,0,1,0
2,76.26,az,w,n,c,d,x,j,x,0,...,0,1,0,1,0,0,1,0,0,0
3,80.62,az,t,n,f,d,x,l,e,0,...,0,0,0,1,0,0,1,0,0,0
4,78.02,az,v,n,f,d,h,d,n,0,...,0,1,0,1,0,0,1,0,0,0


In [18]:
#unique values
np.unique(train1.loc[:,'X0':'X8'].values)

array(['a', 'aa', 'ab', 'ac', 'ad', 'ae', 'af', 'ag', 'ah', 'ai', 'aj',
       'ak', 'al', 'am', 'an', 'ao', 'ap', 'aq', 'ar', 'as', 'at', 'au',
       'av', 'aw', 'ax', 'ay', 'az', 'b', 'ba', 'bc', 'c', 'd', 'e', 'f',
       'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's',
       't', 'u', 'v', 'w', 'x', 'y', 'z'], dtype=object)

### Unique values for test data set

In [19]:
test1.head()

Unnamed: 0,X0,X1,X2,X3,X4,X5,X6,X8,X14,X27,...,X329,X334,X337,X350,X351,X355,X358,X362,X375,X377
0,az,v,n,f,d,t,a,w,0,1,...,0,1,0,1,0,0,1,0,0,0
1,t,b,ai,a,d,b,g,y,0,1,...,0,1,0,0,0,0,0,1,0,1
2,az,v,as,f,d,a,j,j,1,1,...,1,1,0,1,0,0,0,0,0,0
3,az,l,n,f,d,z,l,n,0,1,...,0,1,0,1,0,0,1,0,0,0
4,w,s,as,c,d,y,i,m,1,1,...,1,0,1,1,1,0,1,0,1,0


In [20]:
#unique values
np.unique(test1.loc[:,'X0':'X8'].values)

array(['a', 'aa', 'ab', 'ac', 'ad', 'ae', 'af', 'ag', 'ah', 'ai', 'aj',
       'ak', 'al', 'am', 'an', 'ao', 'ap', 'aq', 'as', 'at', 'au', 'av',
       'aw', 'ax', 'ay', 'az', 'b', 'ba', 'bb', 'bc', 'c', 'd', 'e', 'f',
       'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's',
       't', 'u', 'v', 'w', 'x', 'y', 'z'], dtype=object)
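
The two arrays above are not identical: `'ar'` appears only in the train set and `'bb'` only in the test set. Pooling all categorical columns into one array hides which column a value comes from, so a per-column comparison can be more informative. A minimal sketch, using made-up toy frames and a hypothetical `report` dictionary:

```python
import pandas as pd

# Toy categorical columns (illustrative values, not the real data).
toy_train = pd.DataFrame({"X0": ["k", "az", "ar"], "X1": ["v", "t", "w"]})
toy_test = pd.DataFrame({"X0": ["az", "bb", "k"], "X1": ["v", "v", "t"]})

report = {}
for col in ["X0", "X1"]:
    train_vals = set(toy_train[col].unique())
    test_vals = set(toy_test[col].unique())
    report[col] = {
        "n_unique_train": len(train_vals),
        "n_unique_test": len(test_vals),
        # Categories seen in only one of the two sets.
        "only_in_train": sorted(train_vals - test_vals),
        "only_in_test": sorted(test_vals - train_vals),
    }

for col, info in report.items():
    print(col, info)
```

Categories that occur only in the test set matter later, because an encoder fitted on the training data has never seen them.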

## Reduce dimension

#### find the correlation between columns

In [21]:
train1.corr(numeric_only=True)

Unnamed: 0,y,X14,X27,X46,X51,X58,X64,X85,X100,X115,...,X329,X334,X337,X350,X351,X355,X358,X362,X375,X377
y,1.0,0.1936,-0.0535,-0.136,0.23,0.0226,0.0863,0.1105,0.0444,-0.1193,...,0.0515,0.037,-0.041,-0.1031,0.0732,0.1242,-0.021,-0.0396,0.0291,0.0614
X14,0.1936,1.0,0.1269,-0.4712,0.415,-0.1603,0.4344,0.1392,0.3108,0.1354,...,0.142,-0.1373,0.103,0.108,0.3408,0.0569,0.1618,-0.1082,0.119,-0.0975
X27,-0.0535,0.1269,1.0,-0.0791,0.0243,0.0456,0.171,-0.008,0.2352,-0.0457,...,0.2394,0.0072,0.0052,0.0503,0.2112,-0.0597,0.0778,0.0666,0.1391,0.0435
X46,-0.136,-0.4712,-0.0791,1.0,-0.2936,0.3494,-0.1592,0.0235,-0.1039,-0.1855,...,-0.1097,0.1069,-0.078,0.03,-0.173,-0.0851,0.0173,-0.0204,-0.012,0.0653
X51,0.23,0.415,0.0243,-0.2936,1.0,-0.0008,0.2106,0.2038,0.2479,-0.0688,...,0.1066,-0.0003,-0.0171,0.1756,0.159,0.1457,0.1211,-0.1493,0.08,-0.0643
X58,0.0226,-0.1603,0.0456,0.3494,-0.0008,1.0,0.0647,0.141,0.0652,-0.1054,...,0.1179,0.0343,-0.0307,0.0793,0.0628,-0.0777,0.0591,-0.0213,0.1716,0.1088
X64,0.0863,0.4344,0.171,-0.1592,0.2106,0.0647,1.0,0.1942,0.308,0.0207,...,0.0552,-0.1815,0.1948,0.1162,0.4023,-0.0835,0.2399,-0.1558,0.1154,-0.018
X85,0.1105,0.1392,-0.008,0.0235,0.2038,0.141,0.1942,1.0,0.1268,-0.1698,...,0.1387,-0.0812,0.106,0.2849,0.231,-0.5093,0.7628,-0.863,0.0376,-0.1247
X100,0.0444,0.3108,0.2352,-0.1039,0.2479,0.0652,0.308,0.1268,1.0,0.0221,...,0.1913,-0.1533,0.1414,0.0736,0.1637,-0.0857,0.1171,0.0046,0.1221,0.0702
X115,-0.1193,0.1354,-0.0457,-0.1855,-0.0688,-0.1054,0.0207,-0.1698,0.0221,1.0,...,-0.0044,-0.1614,0.1591,-0.0703,0.0826,0.1244,-0.1608,0.1996,0.0674,-0.0367


In [22]:
cor = train1.corr(numeric_only=True).abs() #absolute correlation between the numeric columns
se_cor=cor.unstack() #series wise
df_cor = pd.DataFrame(se_cor) #dataframe
df_cor.reset_index(inplace=True)
df_cor.head()

Unnamed: 0,level_0,level_1,0
0,y,y,1.0
1,y,X14,0.1936
2,y,X27,0.0535
3,y,X46,0.136
4,y,X51,0.23


In [23]:
df_cor['flag'] = np.where(df_cor["level_0"] == df_cor["level_1"], "same", "not same")
df_cor = df_cor.rename(columns={0: "corr"}) #give the correlation column a readable name
df_cor.head()

Unnamed: 0,level_0,level_1,corr,flag
0,y,y,1.0,same
1,y,X14,0.1936,not same
2,y,X27,0.0535,not same
3,y,X46,0.136,not same
4,y,X51,0.23,not same


In [24]:
df_cor.sort_values(["flag","corr"],ascending=[True,False])

Unnamed: 0,level_0,level_1,corr,flag
271,X58,X324,1.0000,not same
481,X118,X119,1.0000,not same
527,X119,X118,1.0000,not same
963,X186,X194,1.0000,not same
1101,X194,X186,1.0000,not same
...,...,...,...,...
2016,X355,X355,1.0000,same
2064,X358,X358,1.0000,same
2112,X362,X362,1.0000,same
2160,X375,X375,1.0000,same


In [25]:
#Collect every variable whose absolute correlation with another variable exceeds 0.9
#(both members of each highly correlated pair end up in the list)

name = df_cor.loc[(df_cor["corr"] > .9) & (df_cor["flag"] != "same") ,"level_1"]
final_name = name.unique()
final_name

array(['X251', 'X137', 'X324', 'X119', 'X311', 'X118', 'X58', 'X157',
       'X156', 'X250', 'X187', 'X194', 'X362', 'X186', 'X358', 'X178',
       'X14', 'X314', 'X261', 'X337', 'X334', 'X246'], dtype=object)

In [26]:
train2 = train1.drop(final_name,axis=1)
train2.head()

Unnamed: 0,y,X0,X1,X2,X3,X4,X5,X6,X8,X27,...,X223,X224,X273,X313,X329,X350,X351,X355,X375,X377
0,130.81,k,v,at,a,d,u,j,o,0,...,0,0,1,0,1,0,0,0,0,1
1,88.53,k,t,av,e,d,y,l,o,1,...,0,0,1,0,1,0,0,0,1,0
2,76.26,az,w,n,c,d,x,j,x,1,...,1,1,1,0,0,1,0,0,0,0
3,80.62,az,t,n,f,d,x,l,e,1,...,1,0,1,0,0,1,0,0,0,0
4,78.02,az,v,n,f,d,h,d,n,1,...,1,0,1,0,0,1,0,0,0,0
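
The filter above collects every column that takes part in a highly correlated pair, so both members are dropped (for example, both `X118` and `X119` appear in `final_name`). A common alternative keeps one column from each pair by inspecting only the upper triangle of the correlation matrix. The sketch below assumes that alternative; the frame `toy` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy binary features (illustrative values only).
toy = pd.DataFrame({
    "a": [0, 1, 0, 1, 1, 0],
    "b": [0, 1, 0, 1, 1, 0],   # identical to "a", so their correlation is 1
    "c": [1, 0, 0, 1, 0, 1],
})

corr = toy.corr().abs()

# Keep the strict upper triangle so that each pair is looked at only once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

# A column is dropped if it is highly correlated with any earlier column.
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = toy.drop(columns=to_drop)
print(to_drop)
print(reduced.columns.tolist())
```

With this variant the first column of each correlated pair survives, so the information the pair carries is not lost entirely.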


## Apply Label encoding

In [27]:
train2.head()

Unnamed: 0,y,X0,X1,X2,X3,X4,X5,X6,X8,X27,...,X223,X224,X273,X313,X329,X350,X351,X355,X375,X377
0,130.81,k,v,at,a,d,u,j,o,0,...,0,0,1,0,1,0,0,0,0,1
1,88.53,k,t,av,e,d,y,l,o,1,...,0,0,1,0,1,0,0,0,1,0
2,76.26,az,w,n,c,d,x,j,x,1,...,1,1,1,0,0,1,0,0,0,0
3,80.62,az,t,n,f,d,x,l,e,1,...,1,0,1,0,0,1,0,0,0,0
4,78.02,az,v,n,f,d,h,d,n,1,...,1,0,1,0,0,1,0,0,0,0


In [28]:
char = train2.select_dtypes(exclude='number')
char

Unnamed: 0,X0,X1,X2,X3,X4,X5,X6,X8
0,k,v,at,a,d,u,j,o
1,k,t,av,e,d,y,l,o
2,az,w,n,c,d,x,j,x
3,az,t,n,f,d,x,l,e
4,az,v,n,f,d,h,d,n
...,...,...,...,...,...,...,...,...
4204,ak,s,as,c,d,aa,d,q
4205,j,o,t,d,d,aa,h,h
4206,ak,v,r,a,d,aa,g,e
4207,al,r,e,f,d,aa,l,u


In [29]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

categ = ['X0','X1','X2','X3','X4','X5','X6','X8']

encode_df = train2[categ]
encode_df = encode_df.astype('str')
encode_df = encode_df.apply(le.fit_transform)
encode_drop = train2.drop(categ, axis = 1)
train3 = pd.concat([encode_drop, encode_df], axis = 1)

In [30]:
train3.head()

Unnamed: 0,y,X27,X46,X51,X64,X85,X100,X115,X127,X132,...,X375,X377,X0,X1,X2,X3,X4,X5,X6,X8
0,130.81,0,1,0,0,1,0,0,0,0,...,0,1,32,23,17,0,3,24,9,14
1,88.53,1,0,1,0,1,1,0,1,1,...,1,0,32,21,19,4,3,28,11,14
2,76.26,1,1,1,0,1,0,0,0,1,...,0,0,20,24,34,2,3,27,9,23
3,80.62,1,1,0,0,0,0,0,0,1,...,0,0,20,21,34,5,3,27,11,4
4,78.02,1,1,1,0,0,0,0,0,1,...,0,0,20,23,34,5,3,12,3,13


In [31]:
#segregating independent and dependent variable in train data set
X = train3.drop("y",axis=1)
y = train3.loc[:,"y"]

In [32]:
# Splitting data into train and test set
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X, y,test_size = 0.3, random_state = 123)

In [33]:
# import xgboost
import xgboost as xg
xgb_r = xg.XGBRegressor(objective ='reg:squarederror',n_estimators = 10, seed = 123)

# Fitting the model
xgb_r.fit(train_X, train_y)

# Predict on the hold-out set
y_pred = xgb_r.predict(test_X)

In [34]:
#Calculation of mape
def mean_absolute_percentage_error(true, pred):
    abs_error = (np.abs(true - pred)) / true
    sum_abs_error = np.sum(abs_error)
    mape_loss = (sum_abs_error / true.size) * 100
    return mape_loss

mean_absolute_percentage_error(test_y, y_pred)

4.911542331245876

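#### We get a MAPE of about 4.91%, meaning the predictions on the hold-out set deviate from the true value by roughly 5% on average (about 95% accuracy in that sense)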
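
The metric computed above is $\text{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$. As a small worked example, here is an equivalent vectorised version with NumPy; the function name `mape` and the toy values are illustrative only.

```python
import numpy as np

def mape(true, pred):
    """Mean absolute percentage error, in percent."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Relative error per observation, averaged and scaled to percent.
    return np.mean(np.abs((true - pred) / true)) * 100

# Relative errors are 10%, 5% and 0%, so the mean is about 5%.
toy_true = [100.0, 80.0, 120.0]
toy_pred = [90.0, 84.0, 120.0]
score = mape(toy_true, toy_pred)
print(round(score, 2))
```

The metric is undefined when a true value is zero, which is not a concern here as long as the target (testing time) is strictly positive.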

In [35]:
#MSE and RMSE
from sklearn.metrics import mean_squared_error
#MSE
print(f'MSE : {mean_squared_error(test_y, y_pred)}')

#RMSE
print(f'RMSE : {np.sqrt(mean_squared_error(test_y, y_pred))}')


MSE : 76.95810136913627
RMSE : 8.772576666472416
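
An RMSE is easier to judge against a baseline. The variance of `y` on the full training set was 160.7667 (standard deviation about 12.68), so an RMSE of about 8.77 is better than always predicting the mean, although that variance was not measured on the hold-out split itself. The sketch below shows how RMSE and the coefficient of determination relate; all values are made up for illustration.

```python
import numpy as np

# Illustrative values only.
toy_true = np.array([100.0, 80.0, 120.0, 90.0])
toy_pred = np.array([95.0, 85.0, 115.0, 95.0])

# Mean squared error and its square root.
mse = np.mean((toy_true - toy_pred) ** 2)
rmse = np.sqrt(mse)

# MSE of a baseline that always predicts the mean of the true values.
baseline_mse = np.mean((toy_true - toy_true.mean()) ** 2)

# R^2: share of the baseline error that the model removes.
r2 = 1 - mse / baseline_mse
print(rmse, r2)
```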


## Working on given test data to predict Output

In [36]:
test1.head()

Unnamed: 0,X0,X1,X2,X3,X4,X5,X6,X8,X14,X27,...,X329,X334,X337,X350,X351,X355,X358,X362,X375,X377
0,az,v,n,f,d,t,a,w,0,1,...,0,1,0,1,0,0,1,0,0,0
1,t,b,ai,a,d,b,g,y,0,1,...,0,1,0,0,0,0,0,1,0,1
2,az,v,as,f,d,a,j,j,1,1,...,1,1,0,1,0,0,0,0,0,0
3,az,l,n,f,d,z,l,n,0,1,...,0,1,0,1,0,0,1,0,0,0
4,w,s,as,c,d,y,i,m,1,1,...,1,0,1,1,1,0,1,0,1,0


#### Apply label encoding

In [37]:
char = test1.select_dtypes(exclude='number')
char

Unnamed: 0,X0,X1,X2,X3,X4,X5,X6,X8
0,az,v,n,f,d,t,a,w
1,t,b,ai,a,d,b,g,y
2,az,v,as,f,d,a,j,j
3,az,l,n,f,d,z,l,n
4,w,s,as,c,d,y,i,m
...,...,...,...,...,...,...,...,...
4204,aj,h,as,f,d,aa,j,e
4205,t,aa,ai,d,d,aa,j,y
4206,y,v,as,f,d,aa,d,w
4207,ak,v,as,a,d,aa,c,q


In [38]:
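from sklearn.preprocessing import LabelEncoder
# NOTE: a new encoder fitted on the test set is not guaranteed to assign
# the same integer codes as the encoder fitted on the training set
le = LabelEncoder()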

categ = ['X0','X1','X2','X3','X4','X5','X6','X8']

encode_df = test1[categ]
encode_df = encode_df.astype('str')
encode_df = encode_df.apply(le.fit_transform)
encode_drop = test1.drop(categ, axis = 1)
test2 = pd.concat([encode_drop, encode_df], axis = 1)

In [39]:
test2

Unnamed: 0,X14,X27,X46,X51,X58,X64,X85,X100,X115,X118,...,X375,X377,X0,X1,X2,X3,X4,X5,X6,X8
0,0,1,1,0,0,0,0,0,0,0,...,0,0,21,23,34,5,3,26,0,22
1,0,1,1,1,1,0,0,0,0,1,...,0,1,42,3,8,0,3,9,6,24
2,1,1,0,1,1,0,1,1,0,0,...,0,0,21,23,17,5,3,0,9,9
3,0,1,1,0,0,0,0,0,0,0,...,0,0,21,13,34,5,3,31,11,13
4,1,1,0,1,1,1,1,1,0,1,...,1,0,45,20,17,2,3,30,8,12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4204,1,1,0,1,1,1,1,1,1,0,...,0,0,6,9,17,5,3,1,9,4
4205,0,1,1,0,1,0,0,1,0,1,...,0,0,42,1,8,3,3,1,9,24
4206,1,1,0,1,0,1,0,1,1,0,...,0,0,47,23,17,5,3,1,3,22
4207,1,1,0,1,1,1,1,1,0,1,...,0,1,7,23,17,0,3,1,2,16
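
Fitting a separate encoder on the test set can assign different integers to the same category. This is visible in the outputs: `az` in `X0` is encoded as 20 in `train3` but as 21 in `test2`. One way to keep the codes consistent is to learn the category list from the training data and reuse it for the test data. The sketch below assumes that approach and uses `pd.Categorical` rather than `LabelEncoder`; the helper `encode`, the dictionary `encoders` and the toy values are hypothetical.

```python
import pandas as pd

# Toy categorical data (illustrative values only).
toy_train = pd.DataFrame({"X0": ["k", "az", "ar", "k"]})
toy_test = pd.DataFrame({"X0": ["az", "bb", "k"]})

# Learn the sorted category list per column from the training data only.
encoders = {
    col: sorted(toy_train[col].astype(str).unique())
    for col in ["X0"]
}

def encode(frame, encoders):
    """Map each category to its position in the training category list.

    Categories that were not seen during training are encoded as -1.
    """
    out = frame.copy()
    for col, categories in encoders.items():
        out[col] = pd.Categorical(out[col].astype(str), categories=categories).codes
    return out

train_encoded = encode(toy_train, encoders)
test_encoded = encode(toy_test, encoders)
print(train_encoded["X0"].tolist())
print(test_encoded["X0"].tolist())
```

Here `az` receives the same code in both frames, and the unseen category `bb` is flagged with -1 instead of silently shifting the other codes.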


#### Dropping variables from the test2 dataset that are not present in the training data (train3)

In [40]:
X.columns

Index(['X27', 'X46', 'X51', 'X64', 'X85', 'X100', 'X115', 'X127', 'X132',
       'X163', 'X171', 'X191', 'X218', 'X220', 'X223', 'X224', 'X273', 'X313',
       'X329', 'X350', 'X351', 'X355', 'X375', 'X377', 'X0', 'X1', 'X2', 'X3',
       'X4', 'X5', 'X6', 'X8'],
      dtype='object')

In [42]:
test3= test2[X.columns]
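
The model was fitted on `X`, so the frame used for prediction needs the same columns in the same order. Selecting with `X.columns` does this, and it raises a `KeyError` if a training column is missing from the test frame. A minimal sketch with an explicit check first, using hypothetical names (`train_cols`, `toy_test`) and made-up values:

```python
import pandas as pd

# Column order used during training (illustrative subset).
train_cols = pd.Index(["X27", "X46", "X0"])

# Test frame with one extra column and a different column order.
toy_test = pd.DataFrame({
    "X14": [0, 1],
    "X27": [1, 1],
    "X0": [21, 42],
    "X46": [1, 0],
})

# Report missing training columns explicitly before selecting.
missing = train_cols.difference(toy_test.columns)
if len(missing) > 0:
    raise KeyError(f"Columns missing from test data: {list(missing)}")

# Selecting by the training index drops extras and restores the order.
aligned = toy_test[train_cols]
print(aligned.columns.tolist())
```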

In [43]:
test3.head()

Unnamed: 0,X27,X46,X51,X64,X85,X100,X115,X127,X132,X163,...,X375,X377,X0,X1,X2,X3,X4,X5,X6,X8
0,1,1,0,0,0,0,0,0,1,0,...,0,0,21,23,34,5,3,26,0,22
1,1,1,1,0,0,0,0,1,0,1,...,0,1,42,3,8,0,3,9,6,24
2,1,0,1,0,1,1,0,0,1,0,...,0,0,21,23,17,5,3,0,9,9
3,1,1,0,0,0,0,0,0,1,0,...,0,0,21,13,34,5,3,31,11,13
4,1,0,1,1,1,1,0,0,1,1,...,1,0,45,20,17,2,3,30,8,12


#### Predict the output with the model

In [44]:
predict_data = xgb_r.predict(test3)
predict_data

array([ 74.62732,  92.24867,  72.93798, ...,  89.3035 , 103.32909,
        91.57656], dtype=float32)
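
The predictions are returned in the same row order as `test3`, so they can be paired with the `ID` column of the original test file. The sketch below shows one way to assemble such a table; the two-column layout (`ID`, `y`) is an assumption, and only the first three IDs and predictions shown above are used.

```python
import io

import numpy as np
import pandas as pd

# First three IDs from the test file and the first three predictions above.
toy_ids = pd.Series([1, 2, 3], name="ID")
toy_pred = np.array([74.62732, 92.24867, 72.93798], dtype=np.float32)

# Pair each ID with its prediction (assumed layout: ID, y).
submission = pd.DataFrame({"ID": toy_ids, "y": toy_pred})

# Write to an in-memory buffer here; pass a file path to to_csv to save to disk.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```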