##### Let’s calculate some evaluation metrics on our basketball dataset, this time predicting the players’ salaries. How well does our model do?

#### Tasks:

##### Import the MSE, RMSE, R^2 and MAPE metric functions from sklearn.metrics.
##### Calculate the MSE, RMSE, R^2 and MAPE by comparing the true values to what the model predicted on the validation set. Name the objects mse_calc, rmse_calc, r2_calc and mape_calc respectively.

In [8]:
import numpy as np
import pandas as pd
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.preprocessing import OneHotEncoder, StandardScaler, OrdinalEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score, root_mean_squared_error, mean_absolute_percentage_error

##### Loading in the data

In [3]:
bball = pd.read_csv('bball.csv')
# Keep only drafted players: rows marked 'Undrafted' in any draft column are dropped
bball = bball[
    (bball['draft_year'] != 'Undrafted')
    & (bball['draft_round'] != 'Undrafted')
    & (bball['draft_peak'] != 'Undrafted')
]

train_df, test_df = train_test_split(bball, test_size=0.2, random_state=1)

X_train_big = train_df.drop(columns=['full_name', 'jersey', 'b_day', 'college','salary'])
y_train_big = train_df['salary']
X_test = test_df.drop(columns=['full_name', 'jersey', 'b_day', 'college', 'salary'])
y_test = test_df['salary']

X_train, X_valid, y_train, y_valid = train_test_split(X_train_big, 
                                                      y_train_big, 
                                                      test_size=0.3, 
                                                      random_state=123)
numeric_features = [
    "height",
    "weight",
    "draft_year",
    "draft_round",
    "draft_peak"]

categorical_features = [
    "team",
    "country",
    "position"]

numeric_transformer = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"),
)

preprocessor = make_column_transformer(
    (numeric_transformer, numeric_features), 
    (categorical_transformer, categorical_features)
)

##### Build a pipeline containing the column transformer and an SVR model

In [4]:
pipe_bb = make_pipeline(preprocessor, SVR())

##### Fit your pipeline on the training data

In [5]:
pipe_bb.fit(X_train, y_train)

##### Using your model, find the predicted values of the validation set
##### Save them in an object named predict_valid

In [6]:
predict_valid = pipe_bb.predict(X_valid)

##### Calculate the MSE and save the result in an object named mse_calc

In [7]:
mse_calc = mean_squared_error(y_valid, predict_valid)
print("MSE:", mse_calc)

MSE: 78343233817724.3


##### Calculate the RMSE and save the result in an object named rmse_calc

In [9]:
rmse_calc = root_mean_squared_error(y_valid, predict_valid)
print("RMSE:", rmse_calc)

RMSE: 8851171.324617116
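
As a quick sanity check, RMSE should be the square root of MSE, which also puts the error back into salary units rather than squared salary units. The sketch below uses the two values printed above, copied by hand; the variable names are illustrative and not part of the exercise.

```python
import math

# Values copied by hand from the MSE and RMSE outputs above.
mse_value = 78343233817724.3
rmse_value = 8851171.324617116

# RMSE is defined as the square root of MSE, so the two should agree
# up to floating-point rounding.
rmse_from_mse = math.sqrt(mse_value)
values_agree = math.isclose(rmse_from_mse, rmse_value, rel_tol=1e-9)

print("RMSE recomputed from MSE:", rmse_from_mse)
print("Matches reported RMSE:", values_agree)
```

Read this way, the model's typical prediction error on the validation set is on the order of 8.85 million in salary units.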


##### Calculate the R^2 and save the result in an object named r2_calc

In [10]:
r2_calc = r2_score(y_valid, predict_valid)
print("R2:", r2_calc)

R2: -0.1452175827767168
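
A negative R^2 means the model's squared error on the validation set is larger than that of a baseline that always predicts the mean of the true values. The sketch below illustrates this with a hand-written R^2 and made-up toy numbers (not the basketball data); `r2_manual` is a hypothetical helper for illustration, not a scikit-learn function.

```python
def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, written out by hand for illustration."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up toy values, chosen only to show the sign of R^2.
toy_true = [1.0, 2.0, 3.0, 4.0]
mean_baseline = [2.5, 2.5, 2.5, 2.5]   # always predict the mean of toy_true
poor_pred = [4.0, 3.0, 2.0, 1.0]       # predictions in reversed order

r2_baseline = r2_manual(toy_true, mean_baseline)
r2_poor = r2_manual(toy_true, poor_pred)

print("R2 of mean baseline:", r2_baseline)
print("R2 of poor predictions:", r2_poor)
```

The mean baseline scores exactly 0, and anything worse than it scores below 0, so an R^2 of about -0.15 suggests this SVR with default settings explains none of the salary variance on the validation set.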


##### Calculate the MAPE and save the result in an object named mape_calc

In [11]:
mape_calc = mean_absolute_percentage_error(y_valid, predict_valid)
print("MAPE:", mape_calc)

MAPE: 2.0518666389399667
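
Note that scikit-learn's mean_absolute_percentage_error returns a fraction rather than a percentage, so a value of about 2.05 corresponds to an average absolute error of roughly 205% of the true salary. The sketch below shows the calculation by hand on made-up toy numbers; `mape_manual` is a hypothetical helper for illustration only.

```python
def mape_manual(y_true, y_pred):
    """Mean of |true - predicted| / |true|, returned as a fraction."""
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(ratios) / len(ratios)

# Made-up toy values: each prediction is three times the true value,
# so each absolute error is twice the true value.
toy_true = [100.0, 200.0]
toy_pred = [300.0, 600.0]

toy_mape = mape_manual(toy_true, toy_pred)
toy_mape_percent = toy_mape * 100

print("MAPE as a fraction:", toy_mape)
print("MAPE as a percentage:", toy_mape_percent)
```

Because each error is divided by the true value, large misses on low-salary players can push MAPE well above 1 even when RMSE looks moderate relative to the highest salaries.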
