# Introduction | Analyzing the NBA Combine Data for Dominant Draft Factors

My motivation for this analysis is to understand which factor in the NBA Combine data contributes most to a player being drafted into the NBA. I am interested in this because I have a friend who may enter the draft, and I want this research to improve his chances. It is also interesting from a human-centered perspective because it may shed light on possible discrimination against players who have the potential on paper but do not get drafted because of superficial factors. The most prominent example I can give is Jeremy Lin, who went undrafted even though, on paper, he was an NBA-caliber player coming out of Harvard. I hope to learn what general managers and coaches value in a player and whether it goes beyond the physical. I want talented players who deserve a fair shot at the league to shine like the rest, and perhaps this research could motivate the league to rethink how it assesses future success. Many promising draft prospects do not turn out to be the next Michael Jordan or LeBron James. There could be a more accurate way to predict future success for these players.

#

# Background & Related Works

According to estimates of the odds of making the NBA, an average player has a 1 in 3,333 chance of reaching the league, or about 0.03%. This is because basketball is constantly evolving and the next generation has access to better training and coaching. Around 16,000 high schoolers will go on to play basketball at the collegiate level, and only 110 will go on to play at least one game in the NBA. Getting into the NBA is no simple walk in the park. Considering that only 60 players are drafted into the NBA each year, two for each team, determining the most important factor for success is crucial. The most obvious factors tend to be height and weight. The average NBA player is around 6ft 6in tall and weighs around 220lbs. Depending on the position you play, this varies drastically. However, no matter what position you play, you are going to have to play against these players, which is why it is logical to believe that physical stature is a good predictor. https://dunkorthree.com/odds-of-making-it-to-nba/

In a more recent study from 2020, the NCAA researched the odds of an NCAA athlete getting drafted into the NBA, as more slots have opened up for teams and salary caps have increased. The research showed that an NCAA athlete has a 1.2% chance of being drafted into the NBA. https://www.ncaa.org/sports/2015/3/6/estimated-probability-of-competing-in-professional-athletics.aspx

#

# Research Question
#### Which NBA Combine statistic contributes the most to a player being drafted into the NBA?

#

# Methodology

The data used to support this research comes from data.world, uploaded by Andrew Chou. https://data.world/achou/nba-draft-combine-measurements The data was scraped from DraftExpress, and some player information is missing in places, but for the most part it is complete.

The specific data I was interested in was the combined CSV file covering all the years available through scraping, 2012 to 2016. Once I downloaded the CSV file, I realized I needed to organize the data by draft pick order rather than the arbitrary order of player names it came in.

Using the Python library pandas, I sorted the CSV file by draft pick, with the first picks at the top and the last (60th) picks at the bottom. Apart from the sorting, most of the data was already organized and ready for visualization. The only problem I found was that the players had too many near-duplicate statistics, such as height with shoes and height without shoes. Thus, I deleted the columns that were essentially duplicates to save space in the visualization.

Once the CSV file was ready to be turned into visualizations, I computed three sets of averages: the top 5 draft picks across all years, all draft picks, and the bottom 5 draft picks. This way I am able to see the discrepancies between the two extremes and the overall average.

#



#### The code below sorts the raw CSV file by draft pick, with the first draft pick at the top and later picks below it.

In [16]:
# import the pandas package
import pandas as pd

# load the raw combine data
data = pd.read_csv("NBA_Combine_All.csv")

# sort by draft pick, first pick at the top
# (rows with a missing draft pick are placed last by default)
sorted_data = data.sort_values(by=["Draft pick"], ascending=True)

# write the sorted data frame to a new CSV file
sorted_data.to_csv("Sorted_NBA.csv", index=False)

#



#### Next, I deleted trivial columns which contained duplicate information. 

In [17]:
# import the csv module
import csv

# open the sorted CSV as the source and a new CSV as the result;
# newline="" prevents blank rows between records on some platforms
with open("Sorted_NBA.csv", "r", newline="") as source, \
        open("Cleaned_NBA.csv", "w", newline="") as result:
    reader = csv.reader(source)
    writer = csv.writer(result)
    for r in reader:
        # keep only the columns at these indices; all other columns are dropped
        writer.writerow((r[1], r[3], r[4], r[6], r[7], r[8], r[12], r[13], r[15], r[16], r[17], r[18]))

#
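Selecting columns by index works, but it is hard to tell from the numbers which measurements survive. As a minimal sketch of the same cleaning step, the columns could instead be selected by name with `csv.DictReader` and `csv.DictWriter`. The header and rows below are hypothetical stand-ins for `Sorted_NBA.csv`; only `Draft pick` is a column name confirmed by the sorting step, and the real names in the dataset may differ.

```python
import csv
import io

# Hypothetical header and rows standing in for Sorted_NBA.csv;
# the real column names in the dataset may differ.
sample = io.StringIO(
    "Player,Draft pick,Height (No Shoes),Height (With Shoes),Weight\n"
    "Player A,1,80.5,81.75,222\n"
    "Player B,2,77.0,78.25,205\n"
)

# Columns to keep, listed by name instead of by position.
keep = ["Player", "Draft pick", "Height (No Shoes)", "Weight"]

reader = csv.DictReader(sample)
cleaned = io.StringIO()

# extrasaction="ignore" silently drops every column not listed in `keep`.
writer = csv.DictWriter(
    cleaned, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
)
writer.writeheader()
for row in reader:
    writer.writerow(row)

print(cleaned.getvalue())
```

To run this against the real files, the two `io.StringIO` objects would be replaced with `open("Sorted_NBA.csv", newline="")` and `open("Cleaned_NBA.csv", "w", newline="")`.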



After the CSV file had been cleaned, I imported it into Google Sheets for data visualization. Before applying visualizations, I averaged the top 5, overall, and bottom 5 draft pick groups for a cleaner visualization. I did this in Google Sheets because I had limited time and did not have the Python knowledge to do it in code. https://docs.google.com/spreadsheets/d/1_qSPzQ68UbEgjSfElEJn3J2S8JFcuqXMt7IDZUpFdc0/edit?usp=sharing


Once I finished that, I created three graphs out of the averages and analyzed the discrepancies.
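For reference, the averaging step could also be done in Python. The sketch below is not the method used in this analysis (that was done in Google Sheets); it only illustrates the idea with the standard library. The rows are made-up numbers, `Weight` is an assumed column name, and it assumes "top 5" means picks 1 through 5 and "bottom 5" means picks 56 through 60.

```python
from statistics import mean

# Illustrative rows only (made-up numbers, not taken from the dataset).
# "Draft pick" is the column used in the sorting step; "Weight" is assumed.
players = [
    {"Draft pick": 1, "Weight": 230.0},
    {"Draft pick": 3, "Weight": 220.0},
    {"Draft pick": 30, "Weight": 215.0},
    {"Draft pick": 57, "Weight": 210.0},
    {"Draft pick": 60, "Weight": 200.0},
]


def average(rows, column):
    """Return the mean of `column` over the given rows."""
    return mean(row[column] for row in rows)


# Assumption: "top 5" means picks 1-5 and "bottom 5" means picks 56-60.
top5 = [p for p in players if p["Draft pick"] <= 5]
bottom5 = [p for p in players if p["Draft pick"] >= 56]

summary = {
    "Top 5": average(top5, "Weight"),
    "All picks": average(players, "Weight"),
    "Bottom 5": average(bottom5, "Weight"),
}
print(summary)
```

The same `average` helper could be called once per measurement column to build the full table of three averages per category.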

#


#### The graph below visualizes the categories of the three averages that I took. The graph could use rescaling to show the smaller discrepancies better, so I applied the log() function for the next graph.

![First_Chart.png](attachment:56ef85b0-c874-48cd-a203-7039f4ac6bc1.png)

#



#### Below is the updated visualization after applying the log() function, which compresses the larger values so that all categories can be compared on a similar scale.

![Final_Chart.png](attachment:b03dc883-c64b-4863-8ec9-bf0289186740.png)
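To make the log step concrete, here is a small sketch of the transform applied to two categories on very different scales. The weight averages are the ones reported in the Findings section; the hand width values are hypothetical and only stand in for a small-valued category. It assumes a base-10 logarithm, which may not match the exact spreadsheet formula used for the chart.

```python
import math

# Weight averages are the figures reported in the Findings section;
# the hand width numbers are hypothetical placeholders.
averages = {
    "Weight (lbs)": {"Top 5": 224.4, "Bottom 5": 214.8},
    "Hand width (in)": {"Top 5": 9.5, "Bottom 5": 9.0},
}

# Assumption: base-10 logarithm.
logged = {
    category: {group: math.log10(value) for group, value in groups.items()}
    for category, groups in averages.items()
}

for category, groups in logged.items():
    print(category, {group: round(v, 3) for group, v in groups.items()})
```

After the transform, a weight in the hundreds and a hand width under ten land within a couple of units of each other, which is why the second chart is easier to read across categories.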

#



# Findings

In the final visualization above, only a few categories showed significant discrepancies. The biggest of all was weight: the Top 5 average exceeded the Bottom 5 average by 224.4lbs - 214.8lbs = 9.6lbs. Broadly, it makes sense that teams picking in the top 5 would prefer heavier-set players, as basketball is a very physical sport. On a similar note, the next largest discrepancies were in height and wingspan, which typically go hand in hand because a person's wingspan tends to track their overall height. However, some players' hands are proportionally bigger, which often gives them a longer wingspan relative to their height. This trend in the data also makes sense because height is heavily favored in basketball: blocking, scoring, stealing, and rebounding all benefit from these attributes.

In conclusion, there is no single clear factor that gets a player into the NBA, but this data seems to align with the intuition that height, weight, and wingspan are highly favored.
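Because the categories are measured in different units (pounds versus inches), a raw gap is not directly comparable across categories. One possible way to put them on equal footing, which was not part of the original analysis, is to express each gap as a percentage of the Bottom 5 average. A minimal sketch using the weight figures from the text:

```python
top5_weight = 224.4     # lbs, Top 5 average from the text
bottom5_weight = 214.8  # lbs, Bottom 5 average from the text

# Raw difference in pounds.
absolute_gap = top5_weight - bottom5_weight

# The same difference as a percentage of the Bottom 5 average.
relative_gap = absolute_gap / bottom5_weight * 100

print(f"Absolute gap: {absolute_gap:.1f} lbs")
print(f"Relative gap: {relative_gap:.1f}%")
```

For weight this works out to roughly 4.5%, and the same calculation could be repeated for height and wingspan to compare categories on one scale.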

#



# Discussion (Including Limitations & Implications)

I realize that there are several limitations to my findings and visualizations. First, certain positions clearly favor certain builds. For example, a point guard would most likely not want to be 275lbs and 7ft tall, and conversely a center would not want to be 6ft and 180lbs. Unfortunately, the data that was provided to me did not include each player's position at the time of their draft, so I would have had to fill those in manually or scrape them, which would have taken more time than I had. Secondly, the NBA Combine as a whole does not determine whether a player gets drafted or not. I understand that it is more of a benchmark to see where players are physically. Thirdly, I did not include college statistics or other playing history for any of the players. This data is probably the most important factor in a player getting drafted to the NBA, because coaches and general managers need to see beyond paper statistics to know whether someone can really play basketball. Typically, a player will shine if they have the highest average points per game (PPG), or the most blocks if they play a position closer to center. However, I did not include this data because I wanted to see whether I could find a trend in physical stature alone. There is value in the NBA Combine, but I wanted to assess how much, and whether there was a key factor that could lead a player to the NBA even if they did not necessarily have the greatest skill.

Nonetheless, I found a stronger trend than I was expecting, and my intuition was correct that height and weight would be the biggest factors, at least within the NBA Combine.

#



# Conclusion

Although I did not get to divide the statistics by player position, which I think would have had a tremendous impact on the visualization, I still believe the position-based trend I described in my limitations would hold. It is likely that height and weight play a greater role for centers and power forwards, whereas PPG and other in-game statistics matter more for scoring positions.

Overall, I think my friend would be pleased to hear that my findings favor him getting into the NBA, as he plays the center position at a height of 7ft 4in and a weight of 275lbs. Height is not something that can be controlled going forward, but weight can, and through training I think his in-game statistics can improve as well. More of what it takes to get drafted into the NBA is in his control than not. To be frank, I think this is a great system, because hard work weighs more than talent. Even the greatest players today did not get to where they are by remaining the same throughout the years. In fact, some even defied expectations, such as Stephen Curry, who is the greatest shooter the world has ever seen and revolutionized the sport.

If I had more time, I would really like to visualize the data by player position and look into radar graphs as a better form of visualization. The only problem with radar graphs is that they need a reference point. Perhaps I could take the most extreme player data in each category and mark those as references for the radar graph. As for including player positions, I think this would have to be done through data scraping, since doing it manually would take too much time.
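As a rough sketch of the reference-point idea for radar graphs, each category could be scaled by the largest value observed in that category, so every axis runs from 0 to 1. The players and measurements below are hypothetical and are not taken from the dataset; this only shows the scaling step, not the plotting.

```python
# Hypothetical measurements for illustration; not taken from the dataset.
players = {
    "Player A": {"Height": 84.0, "Weight": 250.0, "Wingspan": 88.0},
    "Player B": {"Height": 78.0, "Weight": 200.0, "Wingspan": 80.0},
}

categories = ["Height", "Weight", "Wingspan"]

# Reference point: the most extreme (largest) value seen in each category.
reference = {c: max(stats[c] for stats in players.values()) for c in categories}

# Scale every value to the 0-1 range relative to that reference.
scaled = {
    name: {c: stats[c] / reference[c] for c in categories}
    for name, stats in players.items()
}

for name, values in scaled.items():
    print(name, {c: round(v, 2) for c, v in values.items()})
```

The scaled values could then be passed to any plotting tool that supports polar axes, with the reference player tracing the outer edge of the chart.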