## NBA Player Performance and Salary Efficiency Analysis


## Overview
When asked whether data science could help improve the roster this season, I approached the problem by separating performance from salary and looking for inefficiencies. Instead of starting with contracts, I started with production. The goal was simple: group players based only on how they perform on the court, and then compare their salaries within those performance tiers. That way, we can clearly see who is being paid fairly, who might be underpaid and worth targeting, and who may be overpaid relative to similar players.

To build this analysis, I merged two datasets: one containing player salaries and another containing performance statistics. After combining them, I ensured that each player appeared only once and that the data was formatted correctly for analysis. Clean data is essential because clustering algorithms are sensitive to duplicates and inconsistent formatting.
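The merge and de-duplication step can be sketched as follows. This is a minimal illustration with pandas, not the exact code used for the report: the toy rows, the column names (`Player`, `Salary`, `PTS`, `TRB`), and the salary string format are all assumptions.

```python
import pandas as pd

# Hypothetical toy tables; the real files and column names are not given in the report.
salaries = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player B", "Player C"],
    "Salary": ["$2,000,000", "$50,000,000", "$50,000,000", "$1,000,000"],
})
stats = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player D"],
    "PTS": [700, 120, 300],
    "TRB": [300, 40, 120],
})

# Convert salary strings such as "$2,000,000" to numbers.
salaries["Salary"] = (
    salaries["Salary"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Keep one salary row per player, then keep only players found in both tables.
# validate="one_to_one" raises an error if either side still has duplicate keys.
salaries = salaries.drop_duplicates(subset="Player", keep="first")
merged = salaries.merge(stats, on="Player", how="inner", validate="one_to_one")

print(merged)
```

An inner join drops players who appear in only one table, which may or may not match what was done here. A left join would keep them with missing statistics instead.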

Next, I selected performance variables that reflect overall basketball impact. I intentionally included total points, assists, rebounds, steals, blocks, minutes played, effective field goal percentage, and free throw percentage. Points and assists measure offensive production and playmaking. Rebounds, steals, and blocks capture defensive and physical contribution. Minutes played reflects role and usage. Shooting efficiency helps distinguish productive scorers from inefficient ones. Together, these metrics provide a balanced view of player contribution.

Importantly, salary was not included when forming performance groups. If salary were included, the model would partially group players by contract size, which would defeat the purpose of identifying inefficiencies. Performance had to be isolated first.
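A minimal sketch of isolating the performance features is shown below. The column names and toy values are hypothetical. The z-score standardization is an assumption rather than something the analysis states. It is a common step before distance-based clustering because points and minutes are on much larger scales than shooting percentages.

```python
import pandas as pd

# Hypothetical column names for the eight performance metrics listed above.
FEATURES = ["PTS", "AST", "TRB", "STL", "BLK", "MP", "eFG%", "FT%"]

players = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Salary": [2.0, 50.0, 12.0, 5.0],  # toy values in $M, deliberately left out of X
    "PTS": [700, 120, 450, 90],
    "AST": [140, 30, 95, 10],
    "TRB": [300, 40, 150, 35],
    "STL": [56, 8, 30, 5],
    "BLK": [25, 3, 12, 4],
    "MP": [1800, 330, 1000, 250],
    "eFG%": [0.55, 0.50, 0.53, 0.48],
    "FT%": [0.75, 0.80, 0.78, 0.70],
})

# Select performance columns only; salary never enters the feature matrix.
X = players[FEATURES]

# Z-score each column so that no single metric dominates the distance calculation.
X_scaled = (X - X.mean()) / X.std(ddof=0)

print(X_scaled.round(2))
```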

To form performance tiers, I used a method called clustering. In simple terms, clustering automatically groups players who have similar statistical profiles. Instead of manually labeling players as “stars” or “bench players,” the algorithm looks at patterns in the data and sorts similar players together.

I tested multiple grouping sizes and selected a five-cluster solution because it produced clearer separation and captured about 60% of the meaningful variation in player performance across the league. In practical terms, this means the model grouped players into meaningful tiers that align with basketball intuition: elite producers, high-level starters, strong contributors, rotation players, and lower-usage bench players.
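The comparison of grouping sizes can be sketched as below. Several things here are assumptions: the report does not name the algorithm, so scikit-learn's `KMeans` is used as a stand-in; the data is synthetic; and the "variation captured" figure is read as the between-cluster share of total variance, which is one common interpretation but not necessarily how the 60% was computed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the player feature matrix: 5 groups, 8 metrics each.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(5, 8))
X = np.vstack([c + rng.normal(size=(40, 8)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

# Total sum of squares around the overall mean.
total_ss = ((X_scaled - X_scaled.mean(axis=0)) ** 2).sum()

# For each candidate k, compute the share of variance lying between clusters:
# 1 - (within-cluster sum of squares / total sum of squares).
explained = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    explained[k] = 1 - model.inertia_ / total_ss

for k, share in explained.items():
    print(f"k={k}: {share:.1%} of variance between clusters")

# Fit the chosen five-cluster solution and keep one label per player.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_scaled)
```

This share tends to rise as k grows, so it cannot pick k on its own. The choice usually balances it against how interpretable the resulting tiers are.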

## What Each Cluster Represents

The five clusters reflect distinct performance levels.

Cluster 2 represents the highest production tier. Players in this group average approximately 895 points, 229 assists, and 255 rebounds while playing over 1,500 minutes. The average salary in this cluster is about $23.47 million. These are high-impact, high-usage players producing at a superstar level.

Cluster 0 represents strong, well-rounded contributors. These players average about 540 points and 325 rebounds with solid minutes. Their average salary is roughly $13.8 million. These are high-level starters or two-way contributors.

Cluster 1 represents solid starters and reliable rotation players. This group averages around 444 points and 94 assists with approximately 1,000 minutes played. Their average salary is about $12 million.

Cluster 3 represents lower-production players who, surprisingly, earn an average of about $7.39 million despite limited output (approximately 127 points and 326 minutes played).

Cluster 4 represents low-usage bench contributors with an average salary of about $4.85 million.

This tier structure allows for fair comparisons since a star player is only compared to other star players, and a bench player only to other bench players.
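Per-cluster averages like the ones quoted above can be produced with a grouped aggregation. The sketch below uses hypothetical toy rows and column names. It only illustrates the shape of the calculation and does not reproduce the report's figures.

```python
import pandas as pd

# Hypothetical toy rows: one row per player with an assigned cluster label.
players = pd.DataFrame({
    "Cluster": [2, 2, 0, 0, 3],
    "PTS": [900, 890, 540, 560, 127],
    "MP": [1600, 1500, 1200, 1300, 326],
    "Salary": [40.0, 2.0, 14.0, 13.0, 7.0],  # $M
})

# Average production and salary per cluster, ordered from most to least scoring.
profile = (
    players.groupby("Cluster")
    .agg(
        players=("PTS", "size"),
        avg_pts=("PTS", "mean"),
        avg_mp=("MP", "mean"),
        avg_salary=("Salary", "mean"),
    )
    .sort_values("avg_pts", ascending=False)
)

print(profile)
```

Cluster numbers produced by a clustering algorithm are arbitrary labels, which is why the top tier here is Cluster 2 rather than Cluster 0 or Cluster 4. Sorting the profile by production makes the tier order explicit.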

## Visualization 1: Points vs. Rebounds by Performance Tier

![Points vs Rebounds](points_vs_rebounds.png)

*Figure 1: Players separate into clear performance tiers. Higher-tier clusters show stronger overall production.*

## Visualization 1 Analysis: Points vs. Rebounds

This graph shows how players separate naturally based on production. The upper-right region contains Cluster 2 players, who are high-scoring, high-impact contributors that play heavy minutes and produce across multiple statistical categories. This is the performance tier with the highest average production and the highest average salary ($23.47M).

Clusters 0 and 1 occupy the middle band of the graph. These players contribute meaningfully but are not their teams' superstars. Their production levels are consistent with strong starters and rotation players.

Clusters 3 and 4 sit in the lower-left region, reflecting limited production and fewer minutes. These players typically have reduced usage or smaller roles.

We therefore want to focus on Cluster 0 players whose point totals approach those of the superstar tier (Cluster 2) and who also post high total rebounds. These players are visible on the graph where the Cluster 0 points border the Cluster 2 region.

## Visualization 2: Salary vs. Rebounds by Performance Tier

![Salary vs Rebounds](salary_vs_rebounds.png)

*Figure 2: This graph overlays salary onto performance tiers, highlighting contract inefficiencies within each production group.*

## Visualization 2 Analysis: Salary vs. Rebounds

This graph overlays salary onto the performance tiers.

Within Cluster 2, we see massive salary variation. Some players in this group earn over $20M, while others earn under $3M. That gap represents inefficiency.

For example:

Toumani Camara, Russell Westbrook, and Collin Gillespie appear in the same performance tier as $23M players but earn only about $2.2–2.3M each. This is clearly visible in the lower portion of the Cluster 2 salary band.

Conversely, Bradley Beal appears in a lower production cluster but sits dramatically higher in salary than his peers in that group. He earns $59M while the cluster average is roughly $7.39M. The visual shows him far above others in his tier.

## Methodology for Player Selection

After the performance clusters were created and visualized, the next step was determining how to select players strategically. The selection followed a two-step process.

First, players were grouped strictly based on performance statistics. Salary was intentionally excluded during this stage so that financial information would not influence how players were categorized. This ensured that the clusters reflected only on-court production. Players in the same cluster therefore have statistically similar production profiles.

Second, salary was reintroduced and compared within each performance cluster. For every cluster, I calculated the average salary. Then, for each player, I measured the difference between their salary and the average salary of their cluster. This created a metric called “Salary Difference.”

**Salary Difference = Player Salary − Cluster Average Salary**

A negative salary difference indicates that a player earns less than others who perform at a similar level. A positive salary difference indicates that a player earns more than peers in the same production tier.
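The Salary Difference calculation can be sketched with a grouped transform. The rows below are hypothetical toy values in $M, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical toy rows: cluster label and salary ($M) per player.
players = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E"],
    "Cluster": [2, 2, 2, 3, 3],
    "Salary": [40.0, 28.0, 2.0, 59.0, 3.0],
})

# Broadcast each cluster's mean salary back onto every row in that cluster.
players["ClusterAvgSalary"] = players.groupby("Cluster")["Salary"].transform("mean")

# Salary Difference = Player Salary - Cluster Average Salary.
players["SalaryDifference"] = players["Salary"] - players["ClusterAvgSalary"]

# Most underpaid players (most negative difference) come first.
print(players.sort_values("SalaryDifference"))
```

Because each difference is taken against the cluster's own mean, the differences within a cluster always sum to zero. Every underpaid player is balanced by overpaid peers in the same tier, so a single very large contract can make everyone else in the cluster look underpaid.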

Players were selected using the following criteria:

Primary Targets: Players in high-performance clusters (Clusters 0, 1, or 2) with large negative salary differences.

Backup Options: Players in mid-tier clusters with moderate production but significant negative salary differences.

Contracts to Avoid: Players in lower-production clusters with large positive salary differences.

Importantly, players were not selected because of one individual statistic such as rebounding or scoring. While visualizations use rebounds to help illustrate performance separation, the clustering algorithm considered all selected performance metrics simultaneously. A player’s selection was based on overall production relative to cost, not on a single category.
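The selection criteria above can be expressed as simple rules. The sketch below rests on two assumptions. First, the criteria do not define "large", so the $5M threshold is hypothetical. Second, the primary and backup tiers overlap as written, so the sketch follows the split used in the Results section: Cluster 2 for primary targets and Clusters 0 and 1 for backups.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with a precomputed salary difference ($M).
players = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E"],
    "Cluster": [2, 1, 3, 2, 4],
    "SalaryDifference": [-21.0, -10.0, 51.0, 3.0, -1.0],
})

TOP_TIER = [2]      # highest production tier
MID_TIERS = [0, 1]  # strong starters and rotation players
LOW_TIERS = [3, 4]  # lower-production tiers
THRESHOLD = 5.0     # hypothetical cutoff for a "large" gap, in $M

underpaid = players["SalaryDifference"] <= -THRESHOLD
overpaid = players["SalaryDifference"] >= THRESHOLD

# Conditions are checked in order; the first match determines the label.
conditions = [
    players["Cluster"].isin(TOP_TIER) & underpaid,
    players["Cluster"].isin(MID_TIERS) & underpaid,
    players["Cluster"].isin(LOW_TIERS) & overpaid,
]
choices = ["primary target", "backup option", "contract to avoid"]
players["Label"] = np.select(conditions, choices, default="no action")

print(players)
```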

## Results

### High-Value Targets

Within Cluster 2, where the average salary is approximately $23.47 million, three players stand out:

**Toumani Camara** earns $2.22 million while producing 734 points, 300 rebounds, and 144 assists in 56 games. He is underpaid by roughly $21.2 million relative to his performance tier.

**Russell Westbrook** earns $2.29 million while producing 796 points, 342 assists, and 296 rebounds. He is underpaid by about $21.1 million.

**Collin Gillespie** earns $2.29 million while contributing 718 points and 253 assists, and is also underpaid by approximately $21.1 million.

These players were identified because their overall statistical profiles place them in a high-production tier, yet their salaries are dramatically below that tier’s average. This represents significant surplus value.

Strong backup options in Cluster 1 include Maxime Raynaud, Jaylen Wells, and Quinten Post, each earning roughly $1–2 million while producing at levels consistent with $12 million players. These players provide strong on-court production at low cost.

### Contracts to Avoid

Cluster 3 reveals several inefficient contracts. The average salary in this tier is about $7.39 million, yet:

**Bradley Beal** earns $59.0 million while producing at a level comparable to others in this lower tier, an overpayment of roughly $51.6 million relative to peers.

**Trae Young** earns $45.9 million, overpaid by about $38.6 million.

**Ja Morant, Jalen Green, and Jordan Poole** show similar mismatches between salary and current production.

While injuries and context must be considered, these contracts do not align with the performance tier identified by the model.

## Final Recommendation

### Primary Targets (High Performance, Deeply Underpaid)

The following three players are recommended as primary acquisition targets because they belong to Cluster 2, the highest production tier in the model. The average salary in this cluster is approximately $23.47 million, yet each of these players earns just over $2 million.

**Toumani Camara**

- Age: 25

- Position: PF

- Salary: $2.22M

- 56 games played (all starts)

- 1,873 minutes

- 734 points

- 300 rebounds

- 144 assists

- 56 steals

- 25 blocks

Camara is producing at the level of a high-impact starter. His 734 points and 300 rebounds place him firmly in the elite production tier. He plays heavy minutes and contributes across scoring, rebounding, and defense. Despite this, he earns roughly $21.2 million less than the average salary of his performance tier. That gap represents massive surplus value.

**Russell Westbrook**

- Age: 37

- Position: SF

- Salary: $2.29M

- 52 games (46 starts)

- 1,521 minutes

- 796 points

- 342 assists

- 296 rebounds

- 68 steals

Westbrook’s production profile clearly places him in Cluster 2. He leads this group in assists and contributes nearly 800 points while playing over 1,500 minutes. Regardless of reputation, the data shows that he is producing at a level consistent with $23M players while earning just $2.29M, a $21.1M value gap.

**Collin Gillespie**

- Age: 26

- Position: PG

- Salary: $2.29M

- 54 games (35 starts)

- 1,527 minutes

- 718 points

- 253 assists

- 225 rebounds

- 75 steals

Gillespie provides balanced scoring, playmaking, and efficiency. His shooting percentages and assist numbers support his placement in the high-production tier. Yet he is compensated at just over $2M, again approximately $21.1M below his cluster’s average salary.

As such, these three players should be considered for acquisition by our team.

### Strong Backup Options

Cluster 1 represents solid contributors with an average salary of approximately $12.05M. The following players produce at that level while earning only around $1–2M.

**Maxime Raynaud**

- Salary: $1.27M

- 48 games (30 starts)

- 1,127 minutes

- 472 points

- 318 rebounds

- 23 blocks

Raynaud provides strong rebounding and interior presence. His 318 rebounds and defensive contributions place him firmly in a mid-tier production group, yet he earns roughly $10.7M less than his cluster average.

**Jaylen Wells**

- Salary: $1.96M

- 52 games (all starts)

- 1,394 minutes

- 629 points

- 178 rebounds

- 86 assists

Wells provides steady production and heavy minutes for a minimal contract cost. His salary sits approximately $10M below the cluster average, making him an efficient rotation option.

**Quinten Post**

- Salary: $1.96M

- 55 games (32 starts)

- 936 minutes

- 422 points

- 209 rebounds

- 26 blocks

Post contributes scoring and interior defense at a cost dramatically below his tier’s average salary.

These players provide strong cost-efficient depth and would strengthen roster stability without financial strain.


## Final Remarks

This analysis demonstrates that performance tiers and salary tiers are not always aligned across the league. By first grouping players strictly based on production and then comparing salaries within those groups, we created a fair and structured framework for identifying financial inefficiencies. This approach prevents misleading comparisons between players with different roles and ensures that value is measured relative to similar contributors.

The findings clearly show that Toumani Camara, Russell Westbrook, and Collin Gillespie are producing at levels consistent with high-tier contributors while earning a fraction of the average salary for that tier. These players represent significant surplus value and should be prioritized as acquisition targets. If those players are unavailable, Maxime Raynaud, Jaylen Wells, and Quinten Post provide strong cost-efficient alternatives that would improve roster depth without limiting financial flexibility.

Conversely, the contracts of Bradley Beal, Trae Young, Ja Morant, Jalen Green, and Jordan Poole represent substantial salary-to-production mismatches within this dataset. While individual circumstances such as injuries must always be considered, the current performance data suggests these contracts would reduce flexibility without providing proportional impact.
