# COGS 108 - Project Proposal

## Authors

Jason Oh: Conceptualization, Background Research, Writing - original draft

Luca Georgescu: Methodology, Data curation, Background Research, Writing - original draft

Brendan Keane: Project administration, Writing - review & editing, Conceptualization

## Research Question

Among players eligible for the NBA Draft, how are college basketball performance statistics (traditional box-score metrics and advanced efficiency metrics) and player attributes related to whether a player is drafted and, if drafted, to their draft position?


## Background and Prior Work


The NBA Draft is one of the most important mechanisms through which teams acquire young talent, yet predicting which college basketball players will succeed at the professional level remains extremely difficult. Despite teams having access to extensive scouting resources and increasingly detailed performance data, draft outcomes often fail to align with future NBA productivity. This uncertainty has motivated researchers to examine whether measurable college basketball statistics can meaningfully explain draft position and downstream success, and whether decision makers systematically rely on the “right” information when making draft selections.

Prior research by Sailofsky (2018) examined NCAA Division I basketball statistics and pre-draft player characteristics to determine which factors predict NBA draft position and future NBA performance.<sup><a href="#ref1">1</a></sup> Using regression models on players drafted between 2006 and 2013, the study found that NBA teams tend to emphasize variables such as scoring totals, size, and college conference affiliation, even though these factors do not strongly predict NBA success. Instead, the paper highlights offensive efficiency, ball control, and rebounding efficiency as more reliable indicators of professional performance. This work suggests that draft decisions are often influenced by cognitive biases and overconfidence, leading teams to prioritize visible or traditional metrics rather than those most closely tied to winning.

Similarly, Liner (2020) explored the disconnect between draft position and on-court productivity by analyzing historical draft data and player performance metrics.<sup><a href="#ref2">2</a></sup> Their work reinforces the idea that draft order is an imperfect proxy for player value and that teams frequently overestimate their ability to evaluate talent. The study emphasizes that while statistical modeling can identify patterns in draft outcomes, uncertainty and volatility remain central features of the draft process. These findings motivate further analysis into which college-level indicators are consistently associated with draft outcomes, rather than assuming that high draft picks necessarily reflect superior performance potential.

Beyond performance uncertainty, another reason we are interested in analyzing draft position is the financial structure of the NBA draft. Rookie contracts for first-round picks are set by the league’s rookie salary scale, so players selected earlier in the draft are guaranteed significantly higher earnings regardless of how they later perform. As outlined by Sports Illustrated, top draft picks receive multimillion-dollar contracts that can shape both team payrolls and player careers for years.<sup><a href="#ref3">3</a></sup> This financial context is why it matters to understand the relationship between college performance metrics and draft position: draft evaluations directly affect both competitive balance and economic outcomes for players and teams.

Building on this prior work, our project focuses specifically on examining the statistical relationships between college basketball performance metrics and NBA draft outcomes for players eligible for the NBA Draft. Our analysis aims to identify patterns and associations in the data, contributing to existing research on draft decision-making and highlighting which types of college-level statistics appear most closely linked to draft selection and draft position.

References

1. <a name="ref1"></a>
Sailofsky, D. (2018). Drafting Errors and Decision Making Theory in the NBA Draft. Brock University Master’s Thesis. https://brocku.scholaris.ca/server/api/core/bitstreams/9940fb5f-cdb8-4b32-afd9-09810ea41dbd/content
<a href="#top">^</a>

2. <a name="ref2"></a>
Liner, J. (2020). Determining the Value of NBA Draft Picks using Advanced Statistics. The University of Arizona Thesis. https://repository.arizona.edu/bitstream/handle/10150/651330/azu_etd_hr_2020_0130_sip1_m.pdf
<a href="#top">^</a>

3. <a name="ref3"></a>
Mckeone, L. (2025). How Much Do NBA Rookies Make? Breaking Down NBA Draft Salary. Sports Illustrated. https://www.si.com/nba/nba-draft-pick-rookie-salary-breakdown
<a href="#top">^</a>


## Hypothesis


We expect that college basketball players with stronger advanced efficiency metrics, such as true shooting percentage (TS%), Player Efficiency Rating (PER), and Box Plus/Minus (BPM), will be more likely to be drafted and to be picked earlier in the NBA Draft. These metrics capture how efficiently and consistently a player contributes on the court. We expect NBA teams to weigh this more heavily than raw box-score numbers alone, especially when projecting how a player’s skills will translate to the professional level.
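True shooting percentage is a concrete example of what "efficiency" means here. It is commonly defined as TS% = PTS / (2 × (FGA + 0.44 × FTA)), where the 0.44 factor approximates the share of free-throw attempts that end a possession. The sketch below assumes Python as our notebook language and uses made-up season totals. PER and BPM involve many more league-adjusted terms, so we would expect to take those precomputed from a data source rather than reimplement them.

```python
def true_shooting_pct(points: float, fga: float, fta: float) -> float:
    """True shooting percentage: points divided by 2 * true shooting attempts.

    The 0.44 multiplier on free-throw attempts is the common approximation
    for how many free throws end a possession.
    """
    tsa = fga + 0.44 * fta  # true shooting attempts
    if tsa == 0:
        raise ValueError("player has no shooting attempts")
    return points / (2 * tsa)


# Hypothetical season line: 600 points on 450 FGA and 150 FTA.
print(round(true_shooting_pct(600, 450, 150), 3))  # 0.581
```

Unlike raw field goal percentage, TS% credits three-pointers and free throws, which is why it is often preferred as a single scoring-efficiency number.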


## Data

The ideal dataset for this project would include college basketball players who are eligible for the NBA Draft, along with both their college performance statistics and draft outcomes. The main variables would include traditional box-score stats such as points, rebounds, assists, steals, and blocks, as well as advanced efficiency metrics like true shooting percentage, PER, and BPM. Player attributes such as height, weight, age, position, conference, and team performance would also be included to provide additional context. To have enough data for meaningful analysis, the dataset would ideally include several hundred player observations across multiple college seasons. These data would be collected from publicly available college basketball statistics websites and combined with official NBA draft records. The data would be organized in a tabular format, with each row representing a player and columns representing their statistics and draft outcomes.
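The row-per-player layout described above requires joining college statistics to draft records while keeping undrafted players rather than dropping them. Below is a minimal sketch in standard-library Python. The field names (`name`, `school`, `ppg`, `bpm`) and the toy records are hypothetical. Joining on (name, school) is also our assumption: real sources may spell names differently, so a shared player ID would be safer if one is available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerSeason:
    """One row of the combined table: college stats plus draft outcome."""
    name: str
    school: str
    ppg: float
    bpm: float
    drafted: bool = False
    pick: Optional[int] = None  # overall pick number; None if undrafted


def attach_draft_outcomes(college_rows, draft_picks):
    """Left-join draft records onto college rows, keyed on (name, school).

    college_rows: list of dicts with keys name, school, ppg, bpm
    draft_picks:  dict mapping (name, school) -> overall pick number
    Players with no matching draft record are kept and marked undrafted.
    """
    table = []
    for row in college_rows:
        pick = draft_picks.get((row["name"], row["school"]))
        table.append(
            PlayerSeason(
                name=row["name"],
                school=row["school"],
                ppg=row["ppg"],
                bpm=row["bpm"],
                drafted=pick is not None,
                pick=pick,
            )
        )
    return table


# Toy records with hypothetical players, not real data.
college = [
    {"name": "Player A", "school": "School X", "ppg": 18.2, "bpm": 9.1},
    {"name": "Player B", "school": "School Y", "ppg": 21.5, "bpm": 4.3},
]
draft = {("Player A", "School X"): 7}

for p in attach_draft_outcomes(college, draft):
    print(p.name, p.drafted, p.pick)
# Player A True 7
# Player B False None
```

In pandas, the same step would typically be a left merge (`DataFrame.merge` with `how="left"`). The design choice that matters is keeping undrafted players, since the drafted/undrafted comparison depends on them.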

Potential Real Datasets

1. https://www.kaggle.com/code/charlsslrahc/nbadraft (Draft)

This source provides a structured dataset containing player attributes, college performance statistics, and NBA career outcome measures, making it highly relevant for analyzing relationships between college performance and draft success. The inclusion of variables such as points, rebounds, shooting percentages, draft position, and career win shares supports both descriptive and predictive analysis aligned with the research question. The dataset is especially useful because it links college metrics directly to professional performance indicators, allowing for evaluation of which statistics best predict draft outcomes. To further strengthen its use, it would be helpful to confirm the dataset’s size, time coverage, and whether it includes advanced efficiency metrics such as PER or BPM needed to fully test the project hypothesis.

2. https://www.kaggle.com/code/yyue11/college-basketball-data-analysis (College) 

This source is useful background because it clearly defines a set of advanced team-level efficiency metrics and demonstrates how they can be analyzed to identify performance differences across outcomes. The metrics include ADJOE and ADJDE (adjusted offensive and defensive efficiency), EFG% (effective field goal percentage), and TOR (turnover rate). Its key takeaway, that ADJOE is strongly associated with “championship-caliber” teams, supports our hypothesis that efficiency-based metrics can be more informative than raw box-score totals. It also provides a concrete analysis template (correlations, boxplots, and significance testing across groups). Our team could adapt this template when comparing drafted vs. non-drafted players or earlier vs. later draft picks. To align it more directly with our research question, we would need to translate these team-level metrics into player-level efficiency measures (or justify using team context). We would also need to clarify how these features will be incorporated into our draft prediction models.
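The template above (correlations and significance tests across groups) could translate to our player-level question roughly as shown below. This is a minimal sketch in standard-library Python, and all numbers are made up. Spearman correlation and Welch's t-test are our own choices of techniques rather than ones taken from the source notebook. A smaller pick number means an earlier selection, so a negative Spearman correlation between a statistic and pick number would mean that higher values go with earlier picks.

```python
from statistics import mean, variance


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the run
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, this does not assume equal group variances.
    """
    na, nb = len(group_a), len(group_b)
    se2_a = variance(group_a) / na  # squared standard error of each mean
    se2_b = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical values for illustration only, not real players.
bpm = [11.0, 9.5, 8.0, 6.2, 4.1]      # college BPM of drafted players
pick = [2, 5, 14, 9, 40]              # their overall pick numbers
print(round(spearman(bpm, pick), 2))  # -0.9: higher BPM goes with earlier picks

ts_drafted = [0.61, 0.58, 0.63, 0.57, 0.60]
ts_undrafted = [0.54, 0.56, 0.52, 0.55, 0.57, 0.53]
t, df = welch_t(ts_drafted, ts_undrafted)
print(f"t = {t:.2f}, df = {df:.1f}")
```

In the project notebook, `scipy.stats.spearmanr` and `scipy.stats.ttest_ind(..., equal_var=False)` would likely be preferable, since they compute the same statistics and also return p-values. A rank correlation suits draft position because pick number is an ordinal ranking, and it does not assume a linear relationship.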




## Ethics 

### A. Data Collection
 - [ ] **A.1 Informed consent**:
 - [X] **A.2 Collection bias**:
 The data used in this project may reflect bias introduced during data collection because college basketball statistics and NBA draft outcomes are shaped by unequal media exposure, conference visibility, and scouting attention. Players from major conferences or high-profile programs may receive more attention than equally skilled players from smaller schools. This project acknowledges that draft outcomes are not a purely objective measure of ability and may reflect these existing biases.
 - [X] **A.3 Limit PII exposure**: 
 This project uses only publicly available data and does not include sensitive or private personal information. Player attributes such as height, age, and position are already public and directly relevant to the analysis. No unnecessary personally identifiable information is collected or displayed.
 - [ ] **A.4 Downstream bias mitigation**: 

### B. Data Storage
 - [X] **B.1 Data security**:
  All datasets will be stored locally or in private course repositories and used only for this class project. Since the data is public and non-sensitive, security risks are minimal, but basic precautions such as restricted access and avoiding unnecessary data sharing will still be followed. 
 - [ ] **B.2 Right to be forgotten**:
 - [ ] **B.3 Data retention plan**:

### C. Analysis
 - [ ] **C.1 Missing perspectives**: 
 - [ ] **C.2 Dataset bias**: 
 - [X] **C.3 Honest representation**:
 Visualizations and summary statistics will be designed to accurately reflect the data without exaggerating trends or implying causation. Any limitations or weak relationships observed will be clearly reported rather than omitted.
 - [ ] **C.4 Privacy in analysis**: 
 - [X] **C.5 Auditability**:
  The analysis will be documented in a Jupyter notebook with clear explanations of data sources, cleaning steps, and analysis decisions so that results can be reproduced or reviewed if questions arise later.

### D. Modeling
 - [ ] **D.1 Proxy discrimination**: 
 - [ ] **D.2 Fairness across groups**: 
 - [ ] **D.3 Metric selection**: 
 - [X] **D.4 Explainability**:
  All modeling approaches used in this project will be simple and interpretable, making it clear how variables relate to draft outcomes.
 - [X] **D.5 Communicate limitations**:
  The limitations and assumptions of the analysis will be clearly stated so results are not misunderstood or overgeneralized.

### E. Deployment
 - [ ] **E.1 Monitoring and evaluation**:
 - [ ] **E.2 Redress**:
 - [ ] **E.3 Roll back**:
 - [ ] **E.4 Unintended use**:


## Team Expectations 

* *Team Expectation 1: Communication and Meetings*

Our team will primarily communicate through messages for quick updates and questions, and we will use email for more formal communication if needed. We expect responses within 24 hours during the week unless someone communicates otherwise. The team will meet at least once per week, either in person or virtually, to check in on progress, discuss difficulties, and plan next steps.

* *Team Expectation 2: Tone and Respectful Collaboration*

 We agree to communicate in a respectful, direct, and constructive manner. Feedback should be honest but polite, with the goal of improving the project rather than criticizing individuals.

* *Team Expectation 3: Task Distribution and Accountability*

 Work will be divided so that responsibilities are shared evenly across the project, with each member contributing to multiple aspects such as data collection, analysis, coding, and writing. Tasks and deadlines will be clearly assigned and tracked in a shared document so everyone can see progress. If someone is struggling to complete a task, they are expected to communicate this early so the team can adjust or provide support.

* *Team Expectation 4: Decision-Making Process*

 Most decisions will be made through group discussion and consensus when possible. If a decision needs to be made quickly and not all members are available, the members present will move forward and update the rest of the team afterward. For technical or section-specific decisions, responsibility may be delegated to the member leading that part of the project.

* *Team Expectation 5: Handling Conflict and Team Issues*

 If conflicts or concerns arise, we will address them directly and respectfully as a group rather than letting issues build up. If a team member consistently does not meet agreed-upon expectations, the group will first communicate concerns clearly and give the member an opportunity to improve. If issues continue, we will follow the course policy and reach out to the instructor as needed.

## Project Timeline Proposal


| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| Week 2  |  TBD | Read through COGS 108 project expectations and team policies; brainstorm possible project ideas and research questions | Decide on final project topic and research question; agree on communication platform and team expectations | 
| Week 3  |  TBD |  Conduct background research on college basketball analytics and NBA draft evaluation | Discuss background findings; refine research question and hypothesis; outline ideal dataset and ethical considerations | 
| Week 4  | TBD  | Search for and explore potential datasets  | Discuss potential datasets and their limitations. Start working on project proposal   |
| Week 5  | TBD  | Complete project proposal | Decide final dataset(s); assign roles for data wrangling, EDA, and writing   |
| Week 6  | TBD  | Begin data cleaning and wrangling; create initial summaries of key variables | Review data wrangling progress; identify missing values or issues; plan EDA visualizations |
| Week 7  | TBD  | Complete data wrangling; begin exploratory data analysis| Discuss EDA results; identify interesting patterns; decide which relationships to analyze further |
| Week 8  | TBD  | Continue analysis (correlations, regressions); draft results and discussion sections | Review analysis and interpretations; discuss limitations and ethical considerations |
| Week 9  | TBD  | Finalize analysis, visuals, and written sections; edit for clarity and consistency | Final review of full project; assign final edits and prepare for submission |
| Week 10  | TBD  | NA | Submit final notebook and complete group evaluations |