# COGS 108 - Project Proposal

## Authors

Example team list and credits:
- Erika Tong: Writing, Conceptualization
- Jason Wilkens: Writing, Conceptualization
- Elaine Sun: Writing, Background research
- Mohamed Adem: Writing, Data Curation
- Timothy Kim: Analysis, Review and Editing 

## Research Question

Does playing back-to-back NBA games affect team performance? Using data from 2000 to the present, how do key performance metrics—such as points scored, field goal percentage, turnovers, and win/loss outcome—differ between games played on consecutive days and games with at least one day of rest?

## Background and Prior Work

Basketball is a high-intensity, physically demanding sport played professionally in the National Basketball Association (NBA), where teams compete in an 82-game regular season typically running from October to April. Teams play several games per week, often traveling long distances between cities. Over a season, this produces periods of “schedule congestion”, including back-to-back games, in which a team plays on consecutive days, as well as games separated by varying amounts of rest and recovery. Performance in these games is usually measured using team-level statistics such as points scored, field goal percentage (FG%), turnovers, and the win/loss outcome, as well as more nuanced metrics like net efficiency, effective field goal percentage (an adjusted shooting measure accounting for three-point shots), and pace of play. These metrics provide insight into both offensive and defensive performance and allow comparison across games and conditions.<a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1), <a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2)
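To make the adjusted shooting measure mentioned above concrete, the sketch below computes plain FG% and effective field goal percentage using the conventional definition eFG% = (FGM + 0.5 × 3PM) / FGA. The function names, parameter names, and the box-score numbers are our own illustrative assumptions, not values from any dataset.

```python
def field_goal_pct(fgm: int, fga: int) -> float:
    """Plain field goal percentage: made shots divided by attempts."""
    if fga <= 0:
        raise ValueError("fga must be positive")
    return fgm / fga


def effective_field_goal_pct(fgm: int, fg3m: int, fga: int) -> float:
    """eFG% = (FGM + 0.5 * 3PM) / FGA, giving made threes extra weight."""
    if fga <= 0:
        raise ValueError("fga must be positive")
    return (fgm + 0.5 * fg3m) / fga


# Illustrative box-score line (made-up numbers, not real game data).
fg = field_goal_pct(fgm=40, fga=80)
efg = effective_field_goal_pct(fgm=40, fg3m=10, fga=80)
```

With the same number of made shots, a team that makes more three-pointers has a higher eFG% than FG%, which is why the two measures can rank games differently.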

One specific scheduling feature of interest is the occurrence of back-to-back games, defined as games played by the same team on consecutive days without a full day of rest in between. Back-to-back games are often viewed as particularly challenging because players have limited time for physical recovery and travel between contests. Sports science and basketball analytics suggest that short rest intervals may lead to accumulated fatigue, which can negatively affect shooting efficiency, decision-making, and defensive performance.<a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3) Moreover, closely related prior research on NBA performance metrics has examined which game-related statistics are most strongly associated with winning outcomes in NBA competition.<a name="cite_ref-4"></a>[<sup>4</sup>](#cite_note-4) That study analyzed nearly 4,000 NBA games across multiple seasons and found that field goal percentage and overall shooting efficiency were among the most important variables distinguishing winning and losing teams. The study also observed that teams tend to adopt more conservative styles of play under higher-pressure conditions, resulting in changes to points scored and turnover rates. These findings suggest that commonly used performance metrics such as FG%, points scored, and turnovers are meaningful indicators of team success and provide a strong foundation for analyzing how external factors, such as rest and scheduling, may influence performance.

In addition to academic research, prior student-led data science projects have explored NBA performance statistics using approaches that are closely related to the project proposed here. A recent university capstone project<a name="cite_ref-5"></a>[<sup>5</sup>](#cite_note-5) analyzed an NBA Player Performance Statistics dataset sourced from Kaggle, which contains player-level and team-level performance metrics such as points scored, shooting percentages, rebounds, assists and turnovers across multiple seasons. Similar to our project, this work relied on publicly available NBA data and used exploratory data analysis and visualizations to understand how basketball performance metrics relate to one another.

The project investigated relationships between different offensive and performance variables, including patterns of playing style, the association between two-point field goal production and player experience, and the relationship between age and three-point scoring. Their findings indicated that player experience is positively associated with two-point scoring output, while the relationship between age and three-point production is not strictly linear. Importantly, the authors emphasized that NBA performance is influenced by multiple interacting variables, highlighting the value of examining several performance metrics together rather than relying on a single statistic. While this prior project focused on player-level characteristics, its methodology and use of NBA performance statistics are relevant to our work. Our project will adopt a similar data-driven, exploratory approach but shift the unit of analysis to the team and game level, examining how performance metrics differ under different rest conditions, specifically back-to-back games versus games with at least one day of rest. In this way, our analysis builds directly on similar analytical techniques while addressing a distinct and complementary research question within NBA analytics.

Prior research and applied analytics have established that several game-level performance metrics (points scored, field goal percentage, turnovers, and rebounding) are strongly associated with winning outcomes in NBA competition. Studies have also demonstrated that contextual factors, including player workload, travel demands, and schedule density, influence both physical fatigue and on-court decision-making. Similarly, demographic and experience-related factors (age, minutes played, and games played) have been shown to correlate with certain aspects of performance, particularly shooting efficiency and scoring output. Together, this body of work suggests that NBA performance is shaped by a combination of tactical choices, player characteristics, and external constraints related to scheduling and recovery.

However, despite this existing knowledge, explicit comparisons of team-level performance across different rest conditions remain relatively underexplored, particularly using a consistent set of game-level statistics. Much of the prior work focuses either on identifying which performance metrics predict winning or on player-level characteristics such as age and experience, rather than isolating the role of rest availability itself. Fewer studies systematically compare how the same core performance metrics differ between games played on consecutive days and games played with additional rest, even though back-to-back games are a common and structurally important feature of the NBA schedule. This gap motivates our project. In this study, we treat rest condition, specifically back-to-back games versus games with at least one full day of rest, as the primary explanatory variable, while also exploring how other team-level performance metrics relate to game outcomes. This approach builds on existing research by focusing directly on scheduling-related factors, while still allowing for exploratory analysis of other relevant performance metrics. Overall, this project represents a logical next step in NBA performance analysis and helps connect prior work on game statistics and scheduling factors to a data-driven research question.




1. <a name="cite_note-1"></a> [^](#cite_ref-1) The Hidden Metrics That Matter Most for Predicting NBA Outcomes - NBAstuffer. (2025, September 19). NBAstuffer. https://www.nbastuffer.com/nba-metrics-for-outcome-predictions/
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Basketball Reference. (2019). Glossary | Basketball-Reference.com. Basketball-Reference.com. https://www.basketball-reference.com/about/glossary.html
3. <a name="cite_note-3"></a> [^](#cite_ref-3) Back to Backs in the NBA | blog maverick. (2005, December 21). Blogmaverick.com. https://blogmaverick.com/2005/12/21/back-to-backs-in-the-nba/
4. <a name="cite_note-4"></a> [^](#cite_ref-4) Cabarkapa, D., Deane, M. A., Fry, A. C., Jones, G. T., Cabarkapa, D. V., Philipp, N. M., & Yu, D. (2022). Game statistics that discriminate winning and losing at the NBA level of basketball competition. PLOS ONE, 17(8), e0273427. https://doi.org/10.1371/journal.pone.0273427
5. <a name="cite_note-5"></a> [^](#cite_ref-5) Yi, B., Balusu, N., Gogineni, R., & Gorji, Z. (2023). NBA Performance Stats Data Visualization. Cmu.edu. https://www.stat.cmu.edu/capstoneresearch/spring2023/315files_s23/team8.html

## Hypothesis


We predict that NBA teams playing back-to-back games will show decreased performance compared to games played with at least one day of rest, as a result of natural fatigue and minimal time to recover. Specifically, we expect lower points scored and field goal percentages, higher turnovers, and a lower probability of winning in back-to-back games. However, observed differences may also be influenced by opponent strength or player absences due to injuries, rest, or suspensions.
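One way the hypothesis above could be checked for a single metric is a permutation test on the difference in group means. The sketch below is only an illustration under stated assumptions: the function names are ours, the point totals are made-up placeholder values rather than real game data, and the final analysis may use a different test.

```python
import random
from statistics import mean


def mean_difference(rested, back_to_back):
    """Mean of the rested group minus mean of the back-to-back group."""
    return mean(rested) - mean(back_to_back)


def permutation_p_value(rested, back_to_back, n_permutations=5000, seed=0):
    """One-sided p-value for H1: rested games have the higher mean.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean_difference(rested, back_to_back)
    pooled = list(rested) + list(back_to_back)
    n_rested = len(rested)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_rested]) - mean(pooled[n_rested:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


# Made-up points totals for illustration only (not real NBA results).
rested_points = [112, 108, 115, 104, 110, 118]
b2b_points = [101, 99, 107, 103, 98, 105]

observed_diff = mean_difference(rested_points, b2b_points)
p_value = permutation_p_value(rested_points, b2b_points)
```

A permutation test makes no normality assumption, which suits metrics such as turnovers that may be skewed. It does not adjust for the confounders named in the hypothesis, such as opponent strength or player absences.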

## Data

1. Ideal Dataset

The ideal dataset for this project would be an NBA game dataset where each observation represents one team in one game. The dataset would include variables such as game date, team and opponent identifiers, points scored, field goal percentage, total turnovers, home/away status, and win/loss outcome. Additional variables could include the number of days since a team’s previous game and an indicator of whether the game was played on the second night of a back-to-back. To answer this question effectively, the dataset should encompass multiple NBA seasons, yielding tens of thousands of observations of team performances across games. The data would be collected from official NBA game logs and league schedules and stored in a structured tabular format, such as a CSV file, where each row represents a single team's performance in a game and columns represent performance metrics and rest-related variables.

2. Real Dataset

A potential real dataset for this project is the “NBA Dataset – Box Scores & Stats, 1947–Today” available on Kaggle (https://www.kaggle.com/datasets/eoinamoore/historical-nba-data-and-player-box-scores). The dataset includes team-level statistics for each NBA game, such as game date, team identifiers, points scored, field goal percentage, and turnovers, which are relevant for analyzing performance in back-to-back games.
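If the real dataset provides only game dates and team identifiers, the rest-related variables described for the ideal dataset would need to be derived. The sketch below shows one possible derivation using only the Python standard library. The field names (`team`, `game_date`, `days_since_last_game`, `back_to_back`) are hypothetical and are not taken from the Kaggle files, and the sample rows are invented for illustration.

```python
from datetime import date


def add_rest_features(games):
    """Return a copy of each game row with rest-related fields added.

    `games` is a list of dicts, each with a 'team' string and a 'game_date'
    (datetime.date). A team's first game in the data has no previous game,
    so its day count is None and it is not flagged as a back-to-back.
    """
    last_played = {}
    enriched = []
    for game in sorted(games, key=lambda g: (g["team"], g["game_date"])):
        previous = last_played.get(game["team"])
        if previous is None:
            days_since = None
        else:
            days_since = (game["game_date"] - previous).days
        enriched.append({
            **game,
            "days_since_last_game": days_since,
            "back_to_back": days_since == 1,  # played on consecutive days
        })
        last_played[game["team"]] = game["game_date"]
    return enriched


# Invented schedule rows for illustration only.
sample_games = [
    {"team": "LAL", "game_date": date(2024, 1, 5)},
    {"team": "LAL", "game_date": date(2024, 1, 6)},
    {"team": "BOS", "game_date": date(2024, 1, 5)},
    {"team": "LAL", "game_date": date(2024, 1, 9)},
]
with_rest = add_rest_features(sample_games)
```

Because the calculation runs across the whole date range, the first game of each season shows a long gap since the previous season's last game. Such games fall in the rested group but could be excluded or handled separately.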

## Ethics 
### A. Data Collection
 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

> The data used in this project comes from publicly available records of professional NBA games. Professional players expect their performance statistics and related information to be recorded, published, and analyzed. Therefore, individual informed consent was not obtained from each player for this analysis.

 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?

> The data used in this project should contain little collection bias because it covers all NBA teams and players and the statistics recorded for every game they played.

 - [X] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?

> Player names and basic information are publicly reported as part of professional sports. The dataset includes these details but no sensitive information. Our analysis is limited to player and team statistics and does not expose anything about players beyond what is already publicly available.

 - [X] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

### B. Data Storage
 - [X] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

> All data will be collected and uploaded to this GitHub repository, which can only be edited by those with access: the graders and the members of our group.

 - [X] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?

> Professional player information is publicly available, and our analysis of player statistics includes no personal information beyond each player's name and team.

 - [X] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

> Any plan to remove data can be discussed and adopted with the group's consent, but by default we plan to leave the data as it is.

### C. Analysis
 - [X] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?
 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?

 > Dataset bias may be present because the data spans games from 2000 to 2025, and older seasons may have missing or inconsistent records. Comparisons across eras may also be distorted by changes in the league and its rules, differences in skill, and changes in recording methods. Additionally, player conditions such as injuries are not reflected in the dataset. We aim to mitigate these issues by checking that the data for every year was collected in a consistent manner and that outliers in performance were not caused by injuries.

 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?
 - [X] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?
 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?
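As a possible first step toward the consistency check described under C.2, the share of rows with missing values could be compared across seasons. The sketch below is a minimal illustration: the field names (`season`, `points`, `turnovers`) and the sample rows are hypothetical and do not come from the real dataset.

```python
from collections import Counter


def missing_rate_by_season(rows, fields):
    """Share of rows per season where any of `fields` is missing (None)."""
    totals = Counter()
    incomplete = Counter()
    for row in rows:
        season = row["season"]
        totals[season] += 1
        if any(row.get(field) is None for field in fields):
            incomplete[season] += 1
    return {season: incomplete[season] / totals[season]
            for season in sorted(totals)}


# Invented rows for illustration only.
sample_rows = [
    {"season": 2001, "points": 95, "turnovers": None},
    {"season": 2001, "points": 101, "turnovers": 14},
    {"season": 2020, "points": 118, "turnovers": 12},
]
rates = missing_rate_by_season(sample_rows, fields=["points", "turnovers"])
```

Seasons with a noticeably higher missing rate than the rest could then be inspected by hand or excluded, with the decision documented for auditability.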

### D. Modeling
 - [X] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?
 - [X] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?
 - [X] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?
 - [X] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?
 - [X] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

### E. Deployment
 - [X] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
 - [X] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?
 - [X] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?
 - [X] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?


## Team Expectations 

* Communication platform: messages via phone number
* Response time: respond to messages same day unless messages were sent late at night
* Open communication: let the group know if you’ll be busy and slow to respond or if you need help that week with your task
* Majority vote decisions: if someone doesn't respond within half the day we'll make the decision without them unless it's an urgent decision
* Distribution of tasks: tasks would mostly be self-assigned to allow members to select whatever they're most comfortable/specialized in while also ensuring that all members are contributing relatively equally in terms of effort
* Assignment deadlines: aim to complete assignments a day before the deadline to allow time to review each other's work
* Conflict prevention/resolution: define and clarify goals/tasks to be accomplished to ensure everyone is on the same page, monitor group progress to eliminate blockers ahead of time, and maintain open and effective communication throughout

## Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 1/31  |  10 AM | Read & Think about COGS 108 expectations; brainstorm topics/questions  | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | 
| 2/1  |  10 PM |  Do background research on topic | Discuss ideal dataset(s) and ethics; draft project proposal | 
| 2/4  | 6 PM  | Edit, finalize, and submit proposal; Search for datasets  | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part   |
| 2/15  | Before 11:59 PM  | Import & Wrangle Data; EDA | Review/Edit wrangling/EDA; Discuss Analysis Plan   |
| 2/22  | Before 11:59 PM  | Finalize wrangling/EDA; Begin Analysis | Discuss/edit Analysis; Complete project check-in |
| 3/13  | Before 11:59 PM  | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |
| 3/16  | Before 11:59 PM  | NA | Turn in Final Project & Group Project Surveys |