# Threshold Adjustment

👇 Load the `player_performances.csv` dataset to see what you will be working with.

In [None]:
import pandas as pd

data = pd.read_csv('data/player_performances.csv')

data.head()

ℹ️ Each observation represents a player and each column is a performance characteristic. The target `target_5y` indicates whether the player's professional career lasted less than 5 years (`0`) or 5 years or more (`1`).

# Preprocessing

👇 To avoid spending too much time on preprocessing, apply robust scaling (`RobustScaler`) to the entire feature set. Scaling every feature the same way is not optimal, but it works for preliminary preprocessing and for getting models up and running quickly.

Save the scaled feature set as `X_scaled`.

In [None]:
from sklearn.preprocessing import RobustScaler

# Instantiate scaler
scaler = RobustScaler()

# Transform features
X_scaled = scaler.fit_transform(data.drop(columns = 'target_5y'))
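For intuition about what the scaler does, here is a minimal sketch on a made-up feature column (the `toy` array is hypothetical, not taken from the dataset). With its default settings, `RobustScaler` subtracts the median and divides by the interquartile range, so a single extreme value has little effect on how the other values are scaled.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical feature column with one extreme value
toy = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Scale with the default settings (centering on the median, scaling by the IQR)
toy_scaled = RobustScaler().fit_transform(toy)

# Manual equivalent: (x - median) / (Q3 - Q1)
median = np.median(toy, axis=0)
q1, q3 = np.percentile(toy, [25, 75], axis=0)
manual = (toy - median) / (q3 - q1)

# Both computations should agree
print(np.allclose(toy_scaled, manual))
```

The median value maps to 0, and the outlier stays large instead of squashing the other values together.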

### ☑️ Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('scaled_features',
                         scaled_features = X_scaled
)

result.write()
print(result.check())

# Base modeling

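🎯 The coach wants to detect players who will last at least 5 years as professionals, with a 90% guarantee. In other words, at least 90% of the players predicted to last should actually do so (precision ≥ 0.9).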

👇 Is a default Logistic Regression model going to satisfy the coach's requirements? Use cross validation and save the score that supports your answer under variable name `base_score`.

In [None]:
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LogisticRegression

# 10-fold cross-validation of the model, scored on precision
log_cv_results = cross_validate(LogisticRegression(max_iter=1000), X_scaled, data['target_5y'],
                                cv=10, scoring=['precision'])

# Mean Precision score
base_score = log_cv_results['test_precision'].mean()

base_score
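Precision is the metric that matches a "90% guarantee": it is the share of predicted positives that are truly positive. A small sketch with made-up labels (the `y_true` and `y_pred` lists are hypothetical) shows the calculation:

```python
from sklearn.metrics import precision_score

# Hypothetical true labels and model predictions
y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

# 3 players are predicted positive, 2 of them are truly positive
toy_precision = precision_score(y_true, y_pred)

print(toy_precision)  # 2 / 3
```

A precision below 0.9 at the default threshold means the model predicts "will last" too easily for the coach's requirement.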

### ☑️ Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('base_precision',
                         score = base_score
)

result.write()
print(result.check())

# Threshold adjustment

👇 Find the decision threshold that guarantees a 90% precision for a player to last 5 years or more as a professional. Save the threshold under variable name `new_threshold`.

<details>
<summary>💡 Hint</summary>

- Make cross validated probability predictions with [`cross_val_predict`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_predict.html)
    
- Plug the probabilities into [`precision_recall_curve`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html) to generate precision scores at different thresholds

- Find out which threshold guarantees a precision of 0.9
      
</details>



In [None]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# Predict probabilities
y_pred_probas_0, y_pred_probas_1 = cross_val_predict(LogisticRegression(),
                                                     X_scaled, data['target_5y'],
                                                     method = "predict_proba").T

# Generate precision and thresholds (and recalls) using probabilities for class 1
precision, recall, thresholds = precision_recall_curve(data['target_5y'], y_pred_probas_1)

# Populate dataframe with precision and threshold
df_precision = pd.DataFrame({"precision" : precision[:-1], "threshold" : thresholds})

# Find out which threshold guarantees a precision of 0.9
new_threshold = df_precision[df_precision['precision'] >= 0.9]['threshold'].min()

new_threshold
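The same threshold search can be followed by hand on a tiny example. The labels and scores below are hypothetical. Note that `precision_recall_curve` returns one more precision value than thresholds (the last precision is 1.0 and has no matching threshold), which is why the last precision value is dropped before pairing the two arrays.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and predicted probabilities for class 1
y_true = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.3, 0.4, 0.6, 0.7, 0.9])

toy_precision, toy_recall, toy_thresholds = precision_recall_curve(y_true, scores)

# Pair each threshold with the precision obtained when predicting 1 for score >= threshold
toy_df = pd.DataFrame({"precision": toy_precision[:-1], "threshold": toy_thresholds})

# Smallest threshold that reaches a precision of at least 0.9
toy_threshold = toy_df[toy_df["precision"] >= 0.9]["threshold"].min()

print(toy_threshold)
```

Here a threshold of 0.6 still lets one negative through (precision 2/3), so the search settles on 0.7, where both remaining predictions are correct.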

### ☑️ Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('decision_threshold',
                         threshold = new_threshold
)

result.write()
print(result.check())

# Using the new threshold

🎯 The coach has spotted a potentially interesting player, but wants your 90% guarantee that he would last 5 years minimum as a pro. Download the player's data [here](https://wagon-public-datasets.s3.amazonaws.com/Machine%20Learning%20Datasets/ML_New_player.csv).

In [None]:
new_player = pd.read_csv("data/ML_New_player.csv")

new_player

❓ Would you risk recommending the player to the coach? Save your answer as a string under variable name `recommendation`, either "recommend" or "not recommend".

In [None]:
# Scale the new player's data with the scaler already fitted on the feature set
new_player_scaled = scaler.transform(new_player)

# Instantiate and train the model on the full scaled feature set
model = LogisticRegression()
model.fit(X_scaled, data['target_5y'])

# Define custom predict function
def custom_predict(X, custom_threshold):
    probs = model.predict_proba(X)  # Probabilities of each sample belonging to class 0 and class 1
    class_1_probs = probs[:, 1]  # Only keep probabilities of class 1
    return class_1_probs >= custom_threshold  # Same ">=" convention as precision_recall_curve


# Predict for the new player with the adjusted threshold
custom_prediction = custom_predict(X=new_player_scaled, custom_threshold=new_threshold)[0]
print(custom_prediction)

# Derive the recommendation from the prediction instead of hardcoding it
recommendation = "recommend" if custom_prediction else "not recommend"
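To see the effect of raising the threshold in isolation, here is a small sketch on synthetic data. The dataset from `make_classification` and the 0.9 cutoff are assumptions for illustration, not the challenge data or its computed threshold. A stricter threshold can only keep or reduce the number of samples predicted as class 1, which is how precision is traded against recall.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (hypothetical, for illustration only)
X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=0)

toy_model = LogisticRegression().fit(X_toy, y_toy)

# Probability of class 1 for every sample
proba_1 = toy_model.predict_proba(X_toy)[:, 1]

# Count positive predictions at the usual 0.5 cutoff and at a stricter 0.9 cutoff
n_default = int((proba_1 >= 0.5).sum())
n_strict = int((proba_1 >= 0.9).sum())

print(n_default, n_strict)
```

Fewer players pass the stricter cutoff, but those who do are the ones the model is most confident about.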

### ☑️ Check your code

In [None]:
from nbresult import ChallengeResult

result = ChallengeResult('recommendation',
                         recommendation = recommendation
)

result.write()
print(result.check())

# 🏁