# Threshold Adjustment

👇 Load the `player_performances.csv` dataset to see what you will be working with.

In [1]:
import pandas as pd

data = pd.read_csv('data/player_performances.csv')

data.head()

| | games played | minutes played | points per game | field goals made | field goal attempts | field goal percent | 3 point made | 3 point attempt | 3 point % | free throw made | free throw attempts | free throw % | offensive rebounds | defensive rebounds | rebounds | assists | steals | blocks | turnovers | target_5y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 36 | 27.4 | 7.4 | 2.6 | 7.6 | 34.7 | 0.5 | 2.1 | 25.0 | 1.6 | 2.3 | 69.9 | 0.7 | 3.4 | 4.1 | 1.9 | 0.4 | 0.4 | 1.3 | 0 |
| 1 | 35 | 26.9 | 7.2 | 2.0 | 6.7 | 29.6 | 0.7 | 2.8 | 23.5 | 2.6 | 3.4 | 76.5 | 0.5 | 2.0 | 2.4 | 3.7 | 1.1 | 0.5 | 1.6 | 0 |
| 2 | 74 | 15.3 | 5.2 | 2.0 | 4.7 | 42.2 | 0.4 | 1.7 | 24.4 | 0.9 | 1.3 | 67.0 | 0.5 | 1.7 | 2.2 | 1.0 | 0.5 | 0.3 | 1.0 | 0 |
| 3 | 58 | 11.6 | 5.7 | 2.3 | 5.5 | 42.6 | 0.1 | 0.5 | 22.6 | 0.9 | 1.3 | 68.9 | 1.0 | 0.9 | 1.9 | 0.8 | 0.6 | 0.1 | 1.0 | 1 |
| 4 | 48 | 11.5 | 4.5 | 1.6 | 3.0 | 52.4 | 0.0 | 0.1 | 0.0 | 1.3 | 1.9 | 67.4 | 1.0 | 1.5 | 2.5 | 0.3 | 0.3 | 0.4 | 0.8 | 1 |


ℹ️ Each observation represents a player, and each column describes one aspect of that player's performance. The target `target_5y` indicates whether the player's professional career lasted less than 5 years (`0`) or 5 years or more (`1`).

# Preprocessing

👇 To avoid spending too much time on preprocessing, scale the entire feature set with a `RobustScaler`. Applying one scaler to every feature is not optimal, but it is acceptable for preliminary preprocessing or for getting a model up and running quickly.

Save the scaled feature set as `X_scaled`.
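
As a rough illustration of what robust scaling does, the sketch below reproduces the default `RobustScaler` transformation by hand: each column has its median subtracted and is divided by its interquartile range. The toy matrix `X_toy` is made up for this example and is not taken from the player dataset.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy feature matrix (made-up values): the last row is an outlier.
X_toy = np.array([
    [10.0, 1.0],
    [12.0, 2.0],
    [14.0, 3.0],
    [16.0, 4.0],
    [100.0, 50.0],
])

# Manual robust scaling: subtract the median, divide by the interquartile range.
median = np.median(X_toy, axis=0)
q1, q3 = np.percentile(X_toy, [25, 75], axis=0)
X_manual = (X_toy - median) / (q3 - q1)

# RobustScaler with default settings should produce the same values.
X_sklearn = RobustScaler().fit_transform(X_toy)

print(np.allclose(X_manual, X_sklearn))
```

Because the median and the interquartile range barely move when an extreme value is added, the non-outlier rows stay on a comparable scale, which is the usual reason for choosing this scaler over standardization.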

In [2]:
from sklearn.preprocessing import RobustScaler

X = data.drop(columns='target_5y')
y = data['target_5y']

scaler = RobustScaler()

X_scaled = scaler.fit_transform(X)

X_scaled[:5]

array([[-0.9       ,  0.9338843 ,  0.35294118,  0.25      ,  0.66666667,
        -1.20655738,  1.        ,  1.5       ,  0.08307692,  0.58536585,
         0.57142857, -0.109375  , -0.1       ,  1.0625    ,  0.65979381,
         0.57142857, -0.2       ,  0.5       ,  0.375     ],
       [-0.93333333,  0.89256198,  0.31372549, -0.05      ,  0.45238095,
        -1.87540984,  1.5       ,  2.08333333,  0.03692308,  1.56097561,
         1.35714286,  0.40625   , -0.3       ,  0.1875    , -0.04123711,
         1.85714286,  1.2       ,  0.75      ,  0.75      ],
       [ 0.36666667, -0.0661157 , -0.07843137, -0.05      , -0.02380952,
        -0.22295082,  0.75      ,  1.16666667,  0.06461538, -0.09756098,
        -0.14285714, -0.3359375 , -0.3       ,  0.        , -0.12371134,
        -0.07142857,  0.        ,  0.25      ,  0.        ],
       [-0.16666667, -0.37190083,  0.01960784,  0.1       ,  0.16666667,
        -0.1704918 ,  0.        ,  0.16666667,  0.00923077, -0.09756098,
        -0.142

### ☑️ Check your code

In [3]:
from nbresult import ChallengeResult

result = ChallengeResult('scaled_features',
                         scaled_features = X_scaled
)

result.write()
print(result.check())


platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/aheggs/.pyenv/versions/3.10.6/envs/lewagon/bin/python
cachedir: .pytest_cache
rootdir: /home/aheggs/code/andyheggs/05-ML/03-Performance-metrics/data-threshold-adjustments/tests
plugins: anyio-3.6.2, asyncio-0.19.0, typeguard-2.13.3
asyncio: mode=strict
collecting ... collected 1 item

test_scaled_features.py::TestScaled_features::test_scaled_features PASSED [100%]



💯 You can commit your code:

git add tests/scaled_features.pickle

git commit -m 'Completed scaled_features step'

git push origin master



In [4]:
!git add tests/scaled_features.pickle

!git commit -m 'Completed scaled_features step'

!git push origin master

[master 1259b79] Completed scaled_features step
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 tests/scaled_features.pickle
Enumerating objects: 17, done.
Counting objects: 100% (17/17), done.
Delta compression using up to 8 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (17/17), 44.43 KiB | 1.35 MiB/s, done.
Total 17 (delta 3), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (3/3), done.
To github.com:andyheggs/data-threshold-adjustments.git
 * [new branch]      master -> master


# Base modeling

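🎯 The task is to detect players who will last at least 5 years as professionals, with a 90% guarantee: at least 90% of the players predicted to last must actually do so, which means a precision of 0.9 or higher.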

👇 Is a default Logistic Regression model going to satisfy the coach's requirements? Use cross-validation and save the score that supports your answer under variable name `base_score`.
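
To make the metric concrete, here is a minimal sketch of how precision is computed, using made-up labels rather than the player data. Precision is the share of positive predictions that are truly positive, TP / (TP + FP), which is the quantity the 90% requirement refers to.

```python
from sklearn.metrics import precision_score

# Made-up labels for illustration only (1 = lasts 5 years or more).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]

# Count true positives and false positives by hand.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
manual_precision = tp / (tp + fp)

# scikit-learn computes the same ratio.
sk_precision = precision_score(y_true, y_pred)

print(manual_precision, sk_precision)  # 0.75 0.75
```

Here 3 of the 4 predicted positives are correct, so this toy classifier would fall short of a 0.9 precision target.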

In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

logreg = LogisticRegression()

precision_scores = cross_val_score(logreg, X_scaled, y, cv=5, scoring='precision')

base_score = precision_scores.mean()

base_score

0.7379036747632812

### ☑️ Check your code

In [8]:
from nbresult import ChallengeResult

result = ChallengeResult('base_precision',
                         score = base_score
)

result.write()
print(result.check())


platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /home/aheggs/.pyenv/versions/3.10.6/envs/lewagon/bin/python
cachedir: .pytest_cache
rootdir: /home/aheggs/code/andyheggs/05-ML/03-Performance-metrics/data-threshold-adjustments/tests
plugins: anyio-3.6.2, asyncio-0.19.0, typeguard-2.13.3
asyncio: mode=strict
collecting ... collected 1 item

test_base_precision.py::TestBase_precision::test_precision_score PASSED [100%]



💯 You can commit your code:

git add tests/base_precision.pickle

git commit -m 'Completed base_precision step'

git push origin master



In [9]:
!git add tests/base_precision.pickle

!git commit -m 'Completed base_precision step'

!git push origin master


# Threshold adjustment

👇 Find the decision threshold that yields a precision of at least 90% when predicting that a player will last 5 years or more as a professional. Save the threshold under variable name `new_threshold`.

<details>
<summary>💡 Hint</summary>

- Make cross-validated probability predictions with [`cross_val_predict`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html)
    
- Plug the probabilities into [`precision_recall_curve`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html) to generate precision scores at different thresholds

- Find out which threshold guarantees a precision of 0.9
      
</details>
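
The steps in the hint can be sketched roughly as follows. This is one possible approach and not the official solution. It runs on synthetic stand-in data so that it is self-contained; `X_demo`, `y_demo`, and `demo_threshold` are hypothetical names, and in the notebook you would use `X_scaled`, `y`, and `new_threshold` instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in data (NOT the player dataset): one informative feature.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
p_true = 1 / (1 + np.exp(-3 * X_demo[:, 0]))
y_demo = (rng.random(1000) < p_true).astype(int)

# Out-of-fold predicted probabilities for the positive class.
proba = cross_val_predict(
    LogisticRegression(), X_demo, y_demo, cv=5, method="predict_proba"
)[:, 1]

# precision has one more entry than thresholds; the extra last entry is
# a fixed 1.0 with no matching threshold, so it is dropped here.
precision, recall, thresholds = precision_recall_curve(y_demo, proba)
meets_target = precision[:-1] >= 0.9

if not meets_target.any():
    raise ValueError("No threshold reaches the target precision on this data.")

# Lowest qualifying threshold: keeps as much recall as possible.
demo_threshold = thresholds[meets_target].min()
demo_precision = precision[:-1][meets_target][0]

print(demo_threshold, demo_precision)
```

Picking the lowest threshold that reaches the target is a design choice: any higher qualifying threshold would sacrifice recall without being required by the 0.9 precision constraint. Note that precision is not strictly monotonic in the threshold, so it can be worth plotting the curve before settling on a value.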



In [None]:
# YOUR CODE HERE

### ☑️ Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('decision_threshold',
                         threshold = new_threshold
)

result.write()
print(result.check())

# Using the new threshold

🎯 The coach has spotted a potentially interesting player, but wants your 90% guarantee that he would last 5 years minimum as a pro. Download the player's data [here](https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_New_player.csv).

In [None]:
new_player = pd.read_csv("data/ML_New_player.csv")

new_player

❓ Would you risk recommending the player to the coach? Save your answer as a string under variable name `recommendation`, set to either "recommend" or "not recommend".
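
A minimal sketch of how a tuned threshold might be applied to a new observation is shown below. It is not the expected answer. Everything in it is a stand-in: the training data is synthetic, `X_new` is a made-up observation, and `demo_threshold = 0.8` is a placeholder rather than the threshold you should find. The point it illustrates is that the new observation has to go through the already fitted scaler (`transform`, not `fit_transform`) before its probability is compared with the threshold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in data (NOT the player dataset).
rng = np.random.default_rng(1)
X_train = rng.normal(loc=5.0, scale=2.0, size=(500, 3))
y_train = (X_train[:, 0] + rng.normal(size=500) > 5.0).astype(int)

# Fit the scaler and the model on the training data only.
demo_scaler = RobustScaler().fit(X_train)
demo_model = LogisticRegression().fit(demo_scaler.transform(X_train), y_train)

# Hypothetical values: a new observation and a previously tuned threshold.
X_new = np.array([[9.0, 5.0, 5.0]])
demo_threshold = 0.8  # placeholder, not the notebook's actual threshold

# Reuse the fitted scaler before predicting the positive-class probability.
proba_new = demo_model.predict_proba(demo_scaler.transform(X_new))[0, 1]

# Recommend only if the probability clears the custom threshold.
demo_recommendation = (
    "recommend" if proba_new >= demo_threshold else "not recommend"
)

print(round(proba_new, 3), demo_recommendation)
```

Calling `predict` directly would apply the default 0.5 cutoff, so comparing `predict_proba` with the custom threshold is what carries the precision requirement over to the new player.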

In [None]:
# YOUR CODE HERE

### ☑️ Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('recommendation',
                         recommendation = recommendation
)

result.write()
print(result.check())

# 🏁