Let’s calculate some classification metrics from our basketball dataset.

Tasks:

Import the precision_score, recall_score, f1_score and classification_report functions from sklearn.metrics.

Predict the values on X_valid using pipe_bb and its .predict() method, and save the result in an object named predicted_y.

Using sklearn tools, calculate the precision, recall and f1 scores and save them in objects named precision, recall and f1, respectively. Make sure you compare the true y_valid labels with the predicted labels. You will need to set the “Forward” (F) position as the positive label, which you can specify with the pos_label argument of each function. Round each score to 3 decimal places.

Print a classification report of all the metrics, comparing y_valid and predicted_y and setting the target_names argument to ["F", "G"]. Use the digits argument to round all the values to 3 decimal places.
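This is a small sketch showing what pos_label changes. It counts true positives, false positives and false negatives by hand for a chosen positive class, then checks the result against sklearn. The label lists are made-up toy values, not the basketball data, and manual_scores is a hypothetical helper written only for this illustration. sklearn's default pos_label is 1, which does not match string labels such as "F" and "G", so it has to be set explicitly.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical toy labels, for illustration only (not the basketball data)
y_true = ["F", "F", "F", "G", "G", "F", "G", "G"]
y_pred = ["F", "G", "F", "G", "F", "F", "G", "F"]


def manual_scores(y_true, y_pred, pos_label):
    """Precision, recall and F1 from raw counts, treating pos_label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)  # positives predicted as positive
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)  # negatives predicted as positive
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)  # positives predicted as negative
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return precision, recall, f1


for label in ["F", "G"]:
    p, r, f = manual_scores(y_true, y_pred, pos_label=label)
    print(label, round(p, 3), round(r, 3), round(f, 3))
    # sklearn's binary scores agree with the hand counts for the same positive class
    assert abs(p - precision_score(y_true, y_pred, pos_label=label)) < 1e-12
    assert abs(r - recall_score(y_true, y_pred, pos_label=label)) < 1e-12
    assert abs(f - f1_score(y_true, y_pred, pos_label=label)) < 1e-12
```

This prints F 0.6 0.75 0.667 and G 0.667 0.5 0.571. Switching the positive class from F to G changes all three scores, which is why the exercise fixes pos_label='F'.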


In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.preprocessing import OneHotEncoder, StandardScaler, OrdinalEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report


##### Loading in the data

In [2]:
bball = pd.read_csv('bball_cm.csv')

train_df, test_df = train_test_split(bball, test_size=0.2, random_state=1)

X_train_big = train_df.drop(columns=['full_name', 'jersey',
                                     'b_day', 'college', 'position'])
y_train_big = train_df['position']
X_test = test_df.drop(columns=['full_name', 'jersey',
                               'b_day', 'college', 'position'])
y_test = test_df['position']

X_train, X_valid, y_train, y_valid = train_test_split(X_train_big, 
                                                      y_train_big, 
                                                      test_size=0.3, 
                                                      random_state=123)

numeric_features = [
    "rating",
    "height",
    "weight",
    "salary",
    "draft_year",
    "draft_round",
    "draft_peak"]

categorical_features = ["team", "country"]

numeric_transformer = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"))

preprocessor = make_column_transformer(
    (numeric_transformer, numeric_features), 
    (categorical_transformer, categorical_features))
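This sketch shows what a preprocessor like this one produces by applying the same kind of column transformer to a tiny hand-made DataFrame. The column values and the toy_preprocessor name are hypothetical. sparse_threshold=0 is added only so the result is a dense array that is easy to inspect; it is not part of the notebook's setup.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows: one missing height and one missing team
toy = pd.DataFrame({
    "height": [1.98, 2.06, np.nan, 1.91],
    "weight": [98.0, 113.0, 104.0, 88.0],
    "team": ["Lakers", "Celtics", "Lakers", np.nan],
})

toy_preprocessor = make_column_transformer(
    # numeric: fill NaN with the column median, then standardize
    (make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), ["height", "weight"]),
    # categorical: fill NaN with the most frequent value, then one-hot encode
    (make_pipeline(SimpleImputer(strategy="most_frequent"),
                   OneHotEncoder(handle_unknown="ignore")), ["team"]),
    sparse_threshold=0,  # dense output, only to make inspection easy
)

transformed = toy_preprocessor.fit_transform(toy)
print(transformed.shape)  # 2 scaled numeric columns + one column per team seen during fit

# A team never seen during fit is encoded as all zeros instead of raising an error
new_row = pd.DataFrame({"height": [2.01], "weight": [100.0], "team": ["Bulls"]})
print(toy_preprocessor.transform(new_row))
```

handle_unknown="ignore" matters for the real data too. A team or country that appears in X_valid but not in X_train would otherwise make .predict() fail.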

##### Build a pipeline containing the column transformer and an SVC model

In [3]:
pipe_bb = make_pipeline(preprocessor, SVC())

##### Fit your pipeline on the training data

In [4]:
pipe_bb.fit(X_train, y_train)

##### Predict your values on the validation set
##### Save them in an object named predicted_y


In [5]:
predicted_y = pipe_bb.predict(X_valid)
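Before computing the scores, it can help to look at the raw counts behind them. Below is a minimal sketch using sklearn's confusion_matrix. In the notebook the inputs would be y_valid and predicted_y; the lists here are hypothetical toy labels so the snippet runs on its own. Passing labels=["G", "F"] puts the positive class F last, so ravel() returns the counts in the order tn, fp, fn, tp.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy labels standing in for y_valid and predicted_y
y_true = ["F", "F", "F", "G", "G", "F", "G", "G"]
y_pred = ["F", "G", "F", "G", "F", "F", "G", "F"]

# Rows are true classes and columns are predicted classes, both in the order G, F
cm = confusion_matrix(y_true, y_pred, labels=["G", "F"])
print(cm)

# With F (the positive class) listed last, ravel() yields tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()
print("precision:", tp / (tp + fp), "recall:", tp / (tp + fn))
```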


##### Using sklearn tools, calculate precision
##### Save it in an object named precision

In [13]:
precision = round(precision_score(y_valid, predicted_y, pos_label='F'), 3)
print("precision:", precision)

precision: 0.92


##### Using sklearn tools, calculate recall
##### Save it in an object named recall

In [12]:
recall = round(recall_score(y_valid, predicted_y, pos_label='F'), 3)
print("recall:", recall)

recall: 0.742


##### Using sklearn tools, calculate f1
##### Save it in an object named f1

In [11]:
f1 = round(f1_score(y_valid, predicted_y, pos_label='F'), 3)
print("f1:", f1)

f1: 0.821
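The three separate calls can also be made as one. In this sketch, sklearn's precision_recall_fscore_support returns precision, recall, F-score and support together. With average="binary" it reports only the pos_label class, and the support entry comes back as None. The toy labels are hypothetical; in the notebook you would pass y_valid and predicted_y.

```python
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical toy labels standing in for y_valid and predicted_y
y_true = ["F", "F", "F", "G", "G", "F", "G", "G"]
y_pred = ["F", "G", "F", "G", "F", "F", "G", "F"]

# average="binary" scores only the pos_label class; support is None in this mode
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, pos_label="F", average="binary"
)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.6 0.75 0.667
```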



##### Using sklearn tools, print a classification_report

In [9]:
print(classification_report(y_valid, predicted_y, target_names=['F', 'G'], digits=3))

              precision    recall  f1-score   support

           F      0.920     0.742     0.821        31
           G      0.784     0.935     0.853        31

    accuracy                          0.839        62
   macro avg      0.852     0.839     0.837        62
weighted avg      0.852     0.839     0.837        62
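To check the question below programmatically rather than by eye, you can ask classification_report for a dictionary (output_dict=True) and compare it with the individual scores. This sketch does not use the real predictions. Instead, it rebuilds label lists from the counts the report implies: 31 forwards, 23 of them predicted F, and 31 guards, 29 of them predicted G. Those lists reproduce the same numbers.

A few details of the API are worth noting:
- classification_report orders classes by their sorted labels, which is why target_names=["F", "G"] lines up with the labels 'F' and 'G'.
- With output_dict=True, the digits argument is ignored and the values come back unrounded.

```python
from math import isclose

from sklearn.metrics import classification_report, f1_score, precision_score, recall_score

# Labels rebuilt from the report's counts (not the notebook's actual predictions):
# 31 true forwards (23 predicted F, 8 predicted G), 31 true guards (2 predicted F, 29 predicted G)
y_true = ["F"] * 31 + ["G"] * 31
y_pred = ["F"] * 23 + ["G"] * 8 + ["F"] * 2 + ["G"] * 29

report = classification_report(y_true, y_pred, target_names=["F", "G"], output_dict=True)

# The "F" row of the report should equal the binary scores computed with pos_label="F"
checks = {
    "precision": precision_score(y_true, y_pred, pos_label="F"),
    "recall": recall_score(y_true, y_pred, pos_label="F"),
    "f1-score": f1_score(y_true, y_pred, pos_label="F"),
}
for metric, value in checks.items():
    print(metric, round(report["F"][metric], 3), round(value, 3))
    assert isclose(report["F"][metric], value)
```

The printed pairs are 0.92, 0.742 and 0.821, the same values shown in the F row of the report above.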



Question

Do the numbers in your classification report match the scores you calculated with precision_score, recall_score and f1_score?

(A) Sure did!

(B) No

Answer: (A) Sure did!