# Threshold-Adjustment

In this exercise, you will adjust the **threshold** of a **Logistic Regression** model to improve its **Precision**.

In [12]:
import pandas as pd

data = pd.read_csv('data.csv')

data.head()

Unnamed: 0,games played,minutes played,points per game,field goals made,field goal attempts,field goal percent,3 point made,3 point attempt,3 point %,free throw made,free throw attempts,free throw %,offensive rebounds,defensive rebounds,rebounds,assists,steals,blocks,turnovers,target_5y
0,36,27.4,7.4,2.6,7.6,34.7,0.5,2.1,25.0,1.6,2.3,69.9,0.7,3.4,4.1,1.9,0.4,0.4,1.3,0
1,35,26.9,7.2,2.0,6.7,29.6,0.7,2.8,23.5,2.6,3.4,76.5,0.5,2.0,2.4,3.7,1.1,0.5,1.6,0
2,74,15.3,5.2,2.0,4.7,42.2,0.4,1.7,24.4,0.9,1.3,67.0,0.5,1.7,2.2,1.0,0.5,0.3,1.0,0
3,58,11.6,5.7,2.3,5.5,42.6,0.1,0.5,22.6,0.9,1.3,68.9,1.0,0.9,1.9,0.8,0.6,0.1,1.0,1
4,48,11.5,4.5,1.6,3.0,52.4,0.0,0.1,0.0,1.3,1.9,67.4,1.0,1.5,2.5,0.3,0.3,0.4,0.8,1


Each observation represents a player and each column a performance characteristic. The target `target_5y` indicates whether the player had a professional career of less than 5 years [0] or 5 years or more [1].

The task is to build a model that is correct at least 90% of the time when it predicts that a player will last 5 or more years as a professional. In Machine Learning terms, the model needs a **precision** of at least 90% on class 1.
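To make the definition concrete, here is a minimal sketch of how precision is computed: true positives divided by all positive predictions. The labels below are made up for illustration and do not come from the dataset.

```python
# Hypothetical labels: 1 = career of 5+ years, 0 = shorter career
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

# Predicted 1 and actually 1
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
# Predicted 1 but actually 0
false_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

# Precision: share of positive predictions that are correct
precision = true_positives / (true_positives + false_positives)
print(f"precision = {precision:.2f}")
```

With 5 true positives and 1 false positive, precision is 5/6, which falls short of the 90% target. Missed positives (false negatives) do not affect precision; they lower recall instead.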

## 1. Prepare the dataset

👇 Drop the rows that contain missing data

In [13]:
# Drop rows with missing values


Unfortunately, some of the tools you will use in this exercise (such as `plot_precision_recall_curve`) do not support cross-validation.

👇 Back to the Holdout Method!
- Split the data into train and test sets with `train_test_split` (set `random_state=1`)
- Prepare `X` and `y` for both sets. Use all features!

In [14]:

# Train/test split

# Ready train data

# Ready test data
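The preparation steps above (dropping incomplete rows, then a holdout split) could look roughly like the sketch below. It runs on a small synthetic DataFrame rather than `data.csv`; the feature column names and values are invented, and separating `X` and `y` before splitting is just one possible ordering.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the player data (feature names are made up)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "games_played": rng.integers(10, 82, size=100),
    "points_per_game": rng.uniform(1, 25, size=100).round(1),
    "target_5y": rng.integers(0, 2, size=100),
})
df.loc[[3, 17], "points_per_game"] = np.nan  # inject two missing values

# Drop rows that contain any missing value
df = df.dropna()

# Separate features and target, then hold out a test set
X = df.drop(columns="target_5y")
y = df["target_5y"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible, so the thresholds found later do not change between runs.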


👇 Scale `X_train` and `X_test`. Fit the scaler on the training data only, then apply that same fitted scaler to both sets so the scaling is identical.

In [15]:

# Fit scaler to train data

# Transform train data

# Transform test data
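As an illustration of the fit-on-train, transform-both pattern, here is a minimal sketch. It assumes `StandardScaler` (the exercise does not say which scaler to use) and works on tiny made-up arrays.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny made-up feature matrices
X_train = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])
X_test = np.array([[20.0, 400.0], [40.0, 800.0]])

# Learn mean and standard deviation from the training data only
scaler = StandardScaler()
scaler.fit(X_train)

# Apply the same learned parameters to both sets
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_test_scaled)
```

The first test row equals the training mean, so it maps to zeros. The second lies outside the training range and maps to values above 2, which is expected: test data is scaled with the training statistics, not its own.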


## 2. Visualize the precision-recall curve

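👇 Using `plot_precision_recall_curve`, plot the precision-recall curve of a Logistic Regression model. Note that `plot_precision_recall_curve` was deprecated in scikit-learn 1.0 and removed in 1.2; on recent versions, `PrecisionRecallDisplay.from_estimator` accepts the same trained model and test data.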

[Sklearn's `plot_precision_recall_curve` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.plot_precision_recall_curve.html#sklearn.metrics.plot_precision_recall_curve)

<details>
<summary>💡 Hint</summary>

- Train a Logistic Regression model with the training data

- Pass the trained model and the test data to `plot_precision_recall_curve`
      
</details>



In [1]:

# fit the model

# Visualize precision recall curve


Notice the section of the curve where **precision** is above 0.9? That is the region of interest. Let's find out which probability threshold corresponds to it.

## 3. Adjusting the threshold

👇 Generate the precision scores and their corresponding thresholds. Store the two elements in a dataframe.

<details>
<summary>💡 Hint</summary>

- Use the model trained above to predict the probabilities of the test data
    
- Plug the probabilities into `precision_recall_curve` to generate precisions and thresholds

- Place precisions and thresholds in a dataframe, each as a column
      
</details>



In [2]:

# Predict probabilities

# Generate precision and thresholds (and recalls) using probabilities for class 1

# Remove the last value of precision (always 1, with no matching threshold)

# Populate dataframe with precision and threshold
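The steps in this cell can be sketched as follows, again on synthetic data. The variable names (`proba_class_1`, `scores`) are illustrative choices, not names required by the exercise.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic data and a fitted model (stand-ins for the exercise objects)
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Column 1 of predict_proba holds the probability of class 1
proba_class_1 = model.predict_proba(X_test)[:, 1]

# precision and recall have one more element than thresholds
precision, recall, thresholds = precision_recall_curve(y_test, proba_class_1)

# Drop the final precision value so both columns have the same length
scores = pd.DataFrame({"threshold": thresholds, "precision": precision[:-1]})
print(scores.head())
```

The extra final precision value is a fixed endpoint of the curve (precision 1, recall 0) that has no threshold attached, which is why it is dropped before building the dataframe.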


👇 Find the lowest threshold that yields a precision of at least 0.9 on the test set

In [3]:
# YOUR CODE HERE
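One possible way to look up the threshold is to filter the dataframe and take the smallest qualifying value. The table below is hand-written for illustration, and the column names are assumptions rather than part of the exercise.

```python
import pandas as pd

# Hand-written precision/threshold table (illustrative values only)
scores = pd.DataFrame({
    "threshold": [0.30, 0.45, 0.60, 0.72, 0.85],
    "precision": [0.70, 0.82, 0.91, 0.89, 0.95],
})

# Keep only the rows that reach the target precision
candidates = scores[scores["precision"] >= 0.9]

# The lowest qualifying threshold sacrifices the least recall
new_threshold = candidates["threshold"].min()
print(new_threshold)  # 0.6
```

Precision does not necessarily increase monotonically with the threshold (the row at 0.72 dips below 0.9 here), so it is worth inspecting the neighbouring rows before settling on a value.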

## 4. Using the new threshold

In [19]:
new_player = pd.read_csv("new_player.csv")

new_player

Unnamed: 0,games played,minutes played,points per game,field goals made,field goal attempts,field goal percent,3 point made,3 point attempt,3 point %,free throw made,free throw attempts,free throw %,offensive rebounds,defensive rebounds,rebounds,assists,steals,blocks,turnovers
0,76.0,25.4,12.0,4.5,10.2,44.1,0.1,0.4,13.3,3.0,3.9,78.2,1.5,2.0,3.4,1.4,0.9,0.3,1.9


👇 Given the new threshold, can you say with 90% precision that the player above will last at least 5 years as a pro? Compute an answer.

In [4]:
# Scale using original scaler

# Predict probabilities of new player belonging to each class
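Applying a custom threshold means comparing the predicted probability of class 1 against it, instead of calling `predict` (which uses 0.5). A minimal sketch on synthetic data, with a hypothetical threshold of 0.8 and a made-up sample standing in for the new player:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data, scaler, and model (stand-ins for the exercise objects)
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
scaler = StandardScaler().fit(X)
model = LogisticRegression(max_iter=1000).fit(scaler.transform(X), y)

new_threshold = 0.8  # hypothetical value, not the exercise's answer
new_sample = X[:1]   # one row with the same columns, standing in for the new player

# Reuse the scaler fitted earlier; do not refit it on the new sample
new_sample_scaled = scaler.transform(new_sample)

# Probability that the sample belongs to class 1
proba_class_1 = model.predict_proba(new_sample_scaled)[0, 1]

# Positive prediction only if the probability clears the custom threshold
meets_threshold = proba_class_1 >= new_threshold
print(f"P(class 1) = {proba_class_1:.3f}, meets threshold: {meets_threshold}")
```

In the exercise, any non-feature column (such as the `Unnamed: 0` index) would need the same handling for the new player as it received in the training data, so the scaler sees the same columns.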


# 🏁