# COGS 108 - Project Proposal

## Authors

- Alexander Huang Liu: Project administration, Conceptualization, Software, Writing – review & editing

- Brody Vandiver: Data curation, Software, Methodology, Writing – original draft

- Jay Ma: Formal analysis, Investigation, Visualization, Writing – original draft

- Justin Wu: Background research (Investigation), Data curation, Methodology, Writing – review & editing

- Srinivasa Perisetla: Formal analysis, Software, Validation, Writing – original draft


## Research Question

Does cumulative travel stress, quantified by the interaction between total distance traveled (miles), time-zone shifts, and rest intervals (e.g., back-to-back games), predict a statistically significant decline in a team's Offensive Efficiency (OE) relative to its season average?

Specifically, can we build a multiple linear regression model that quantifies the "fatigue tax" on shooting percentage (team FG%) and turnover rate for visiting teams?

This project will test whether geographic factors, such as eastward travel and high-mileage road trips, correlate with specific performance drops. By isolating these variables, we aim to determine whether a predictive threshold exists beyond which travel fatigue becomes a dominant predictor of offensive variance.
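One way to write the regression in the research question, offered as a sketch rather than a fixed specification: the variable names and the choice of a single East × B2B interaction are our assumptions.

$$
\Delta OE_{g} = \beta_0 + \beta_1\,\mathrm{East}_{g} + \beta_2\,\mathrm{West}_{g} + \beta_3\,\mathrm{Miles}_{g} + \beta_4\,\mathrm{B2B}_{g} + \beta_5\,(\mathrm{East}_{g} \times \mathrm{B2B}_{g}) + \varepsilon_{g}
$$

Here $\Delta OE_g = OE_g - \overline{OE}_{t,s}$ is the visiting team $t$'s Offensive Efficiency in game $g$ minus its season-$s$ average, $\mathrm{East}_g$ and $\mathrm{West}_g$ count time zones crossed in each direction, $\mathrm{Miles}_g$ is the distance traveled since the previous game, and $\mathrm{B2B}_g$ is 1 on the second night of a back-to-back. Under the directional-asymmetry prediction we would expect $\beta_1 < \beta_2 \le 0$. The same form applies with FG% or TOV% as the outcome (for TOV%, a fatigue effect would appear as positive coefficients).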



## Background and Prior Work

The modern landscape of professional basketball is defined by a rigorous competitive structure that frequently pushes the boundaries of human physiological and cognitive limits.

Within the National Basketball Association (NBA), the standard 82-game regular season involves a high density of competition interspersed with frequent transmeridian travel, creating a unique environment where athletic performance is perpetually modulated by recovery status and circadian alignment.

Travel stress in this context is characterized by two distinct but overlapping phenomena: travel fatigue, the acute exhaustion from the logistics of movement, and jet lag, a circadian rhythm disorder caused by crossing multiple time zones. Jet lag is also directionally asymmetric: the human circadian system adapts more readily to westward travel, which lengthens the day (phase delay), than to eastward travel, which shortens it (phase advance) <a name="cite_note-1"></a><sup>1</sup>.

Research indicates that sleep disruption is the primary mechanism through which travel stress erodes performance. On the first night after travel, athletes experience a predictable reduction in sleep duration that depends on the number of time zones crossed, and eastward travel is significantly more disruptive (an average reduction of 24.5 minutes) than westward travel <a name="cite_note-2"></a><sup>2</sup>.

These disruptions can manifest as significant decrements in Offensive Efficiency (OE), a holistic metric that measures scoring effectiveness per possession. Specifically, studies have shown that eastward jet lag is associated with a 1.2% decrease in Effective Field Goal Percentage ($eFG\%$) differential <a name="cite_note-3"></a><sup>3</sup>.

Furthermore, mental fatigue impairs psychomotor vigilance, which is essential for shooting accuracy and ball security, and often leads to a rise in turnover rate (TOV%) at the end of long road trips.
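For reference, the outcome metrics above are usually computed from box-score totals. The sketch below assumes the widely used Dean Oliver-style formulas and hypothetical field names; Basketball-Reference publishes a more involved possession estimate, so values computed this way may differ slightly from theirs.

```python
from dataclasses import dataclass


@dataclass
class TeamBox:
    """One team's box-score totals for a single game (hypothetical field names)."""
    pts: int   # points scored
    fgm: int   # field goals made
    fga: int   # field goals attempted
    fg3m: int  # three-pointers made
    fta: int   # free throws attempted
    orb: int   # offensive rebounds
    tov: int   # turnovers


def possessions(b: TeamBox) -> float:
    # Common single-team estimate: FGA - ORB + TOV + 0.44 * FTA
    return b.fga - b.orb + b.tov + 0.44 * b.fta


def offensive_efficiency(b: TeamBox) -> float:
    # Points scored per 100 estimated possessions
    return 100.0 * b.pts / possessions(b)


def fg_pct(b: TeamBox) -> float:
    # Plain field goal percentage
    return b.fgm / b.fga


def efg_pct(b: TeamBox) -> float:
    # A made three is worth 1.5 made twos, so each one adds a 0.5 bonus
    return (b.fgm + 0.5 * b.fg3m) / b.fga


def tov_pct(b: TeamBox) -> float:
    # Turnovers per "play": field goal attempts + free-throw trips + turnovers
    return b.tov / (b.fga + 0.44 * b.fta + b.tov)


g = TeamBox(pts=110, fgm=40, fga=88, fg3m=12, fta=20, orb=10, tov=14)
print(round(offensive_efficiency(g), 1), round(efg_pct(g), 3), round(tov_pct(g), 3))
```

With these made-up totals the game has about 100.8 estimated possessions, so OE is roughly 109.1 points per 100 possessions.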

Existing literature has also identified specific "tipping points" where travel fatigue becomes the dominant predictor of performance variance. One of the most robust findings is that cumulative time zone changes of three or more within a three-day period are significantly detrimental to performance <a name="cite_note-4"></a><sup>4</sup>.
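To test the three-zones-in-three-days threshold, each game needs a flag for whether that window has been exceeded. A minimal sketch, assuming a team's schedule is available as (date, signed time-zone shift) pairs and that "within a three-day period" means the game day plus the two preceding calendar days; the function name and data layout are hypothetical.

```python
from datetime import date, timedelta


def flag_tz_overload(games, window_days=3, threshold=3):
    """Flag games where time zones crossed within a trailing window reach a threshold.

    games: one team's games as (game_date, tz_shift) tuples sorted by date, where
    tz_shift is the signed number of zones crossed to reach that game
    (hypothetical representation; direction is ignored when summing).
    """
    flags = []
    for i, (game_day, _) in enumerate(games):
        window_start = game_day - timedelta(days=window_days - 1)
        # Sum absolute shifts for this game and earlier games inside the window
        crossed = sum(abs(shift) for day, shift in games[: i + 1] if day >= window_start)
        flags.append(crossed >= threshold)
    return flags


schedule = [
    (date(2024, 1, 1), 0),   # home game, no travel
    (date(2024, 1, 2), 2),   # two zones east
    (date(2024, 1, 3), -1),  # one zone west
    (date(2024, 1, 5), 0),   # no zone change
]
print(flag_tz_overload(schedule))  # [False, False, True, False]
```

Only the January 3 game reaches three zones (2 + 1) inside its window; by January 5 the earlier trip has aged out.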

Additionally, the back-to-back game scenario remains the most consistent predictor of decline; visiting teams playing on the second night of a back-to-back win only about 36% of the time.

While total mileage is a factor, its impact is most severe when combined with the duration of a multi-city tour, as the cumulative stress of changing environments leads to a measurable decline in $eFG\%$.

To capture these complex, non-linear interactions, modern sports analytics projects have increasingly moved from simple linear models toward advanced techniques like Random Forests and Gradient Boosting (LightGBM) to accurately identify and predict the "fatigue tax" <a name="cite_note-5"></a><sup>5</sup>.

Footnotes

<a name="cite_note-1"></a> ^ Charest, J., et al. (2021). Eastward Jet Lag is Associated with Impaired Performance and Game Outcome in the National Basketball Association. Journal of Clinical Sleep Medicine. https://pmc.ncbi.nlm.nih.gov/articles/PMC9245584/

<a name="cite_note-2"></a> ^ Vitale, K. C., et al. (2017). Sleep Hygiene for Optimizing Recovery in Athletes: Review and Recommendations. International Journal of Sports Medicine. https://pmc.ncbi.nlm.nih.gov/articles/PMC10520441/

<a name="cite_note-3"></a> ^ NBAstuffer. (2026). Team Stats at Home and Away: How to Find Value in NBA Games. https://www.nbastuffer.com/team-stats-at-home-and-away-how-to-find-value-in-nba-games/

<a name="cite_note-4"></a> ^ Nutting, A. W. (2022). Hiding in plain sight: schedule density and travel influence on NBA game outcomes. ResearchGate. https://www.researchgate.net/publication/357856729

<a name="cite_note-5"></a> ^ Mitra, S. (2026). Predicting-Travel-duration. GitHub. https://github.com/sowmenMITRA/Predicting-Travel-duration


## Hypothesis


We hypothesize that **cumulative travel stress**, specifically the interaction of eastward travel (time-zone shifts), total distance covered, and limited rest (back-to-back games), will be a **statistically significant predictor** of a decline in a visiting team’s Offensive Efficiency (OE).
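Testing this hypothesis amounts to estimating coefficients in a regression that includes an interaction term. A minimal sketch with NumPy, assuming hypothetical feature names and purely synthetic data (not real NBA results): it builds a design matrix with an `east_tz * b2b` interaction and checks that ordinary least squares recovers known coefficients. For significance testing we would likely use a library such as statsmodels, which also reports standard errors and p-values.

```python
import numpy as np


def design_matrix(east_tz, miles, b2b):
    """Intercept, main effects, and the east-travel x back-to-back interaction."""
    east_tz, miles, b2b = (np.asarray(v, dtype=float) for v in (east_tz, miles, b2b))
    return np.column_stack([np.ones_like(miles), east_tz, miles, b2b, east_tz * b2b])


def fit_ols(X, y):
    """Ordinary least-squares coefficients via NumPy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Synthetic, made-up data used only to check that the fit recovers known coefficients
rng = np.random.default_rng(0)
n = 2000
east_tz = rng.integers(0, 4, n)   # 0-3 zones crossed eastward
miles = rng.uniform(0, 2500, n)   # miles traveled since the last game
b2b = rng.integers(0, 2, n)       # 1 = second night of a back-to-back
true_beta = np.array([0.0, -0.5, -0.001, -1.5, -1.0])
X = design_matrix(east_tz, miles, b2b)
delta_oe = X @ true_beta + rng.normal(0.0, 1.0, n)
beta_hat = fit_ols(X, delta_oe)
print(np.round(beta_hat, 3))
```

The last coefficient plays the role of the interaction: a negative value would mean eastward travel hurts more on the second night of a back-to-back.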

### Specific Predictions
* **Directional Asymmetry:** Eastward travel across two or more time zones will produce a larger decline in Effective Field Goal Percentage ($eFG\%$) than westward travel over an equal distance.
* **The "Fatigue Tax" Tipping Point:** The decline will follow a non-linear trend; the combination of high-mileage road trips and back-to-back scenarios creates a "tipping point" that significantly increases Turnover Rates ($TOV\%$).
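One simple way to look for the "tipping point" in the second prediction is a single-split search that uses the same squared-error criterion a regression tree applies at each node. This is a swapped-in technique, not a method from our sources. The sketch assumes NumPy; `x` could be a fatigue measure such as cumulative road-trip miles and `y` each game's change in TOV%, and the function name and `min_leaf` default are hypothetical.

```python
import numpy as np


def best_threshold(x, y, min_leaf=20):
    """Return the cut point on x whose two-group split of y has the lowest squared error.

    This is a single regression-tree split (a "stump"); min_leaf keeps each
    side large enough to be meaningful. Returns None if no valid split exists.
    """
    min_leaf = max(1, min_leaf)
    order = np.argsort(x, kind="stable")
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    n = len(ys)
    if n < 2 * min_leaf:
        return None
    cs, cs2 = np.cumsum(ys), np.cumsum(ys ** 2)
    k = np.arange(min_leaf, n - min_leaf + 1)  # size of the left group
    left_sum, left_sq = cs[k - 1], cs2[k - 1]
    right_sum, right_sq = cs[-1] - left_sum, cs2[-1] - left_sq
    # Within-group SSE per side: sum(y^2) - (sum y)^2 / count
    sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
    # Only allow cuts between distinct x values
    sse = np.where(xs[k] != xs[k - 1], sse, np.inf)
    if not np.isfinite(sse).any():
        return None
    best = int(np.argmin(sse))
    return (xs[k[best] - 1] + xs[k[best]]) / 2


# Toy check: y drops sharply once x reaches 120
x = np.arange(200, dtype=float)
y = np.where(x < 120, 0.0, -2.0)
print(best_threshold(x, y))
```

Random Forests and gradient boosting, mentioned in our background research, repeat splits like this many times, so a single stump is a transparent first check before fitting those models.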

### Reasoning
This prediction is based on the biological reality of **circadian asymmetry**, where the human body struggles more with "phase advance" (losing time traveling east) than "phase delay" (traveling west). 



As highlighted in our background research, this sleep disruption impairs both the fine motor skills required for shooting accuracy and the cognitive awareness necessary for ball security. We expect the data to show that while distance matters, the timing and direction of travel are the primary drivers of impact on offensive effectiveness.

## Data

To answer our research question, the ideal dataset would combine NBA game-level performance data with detailed scheduling and geographic travel information for each team. Each observation would correspond to a single NBA game played by a visiting team, allowing us to compare performance relative to that team's season baseline.
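
Comparing each game with the team's season baseline can be sketched as a group-mean subtraction. This assumes the baseline is the mean over all of that team's games in the season (home and away, including the game itself); a road-only or leave-one-out baseline would be a reasonable alternative. The row keys and team labels are hypothetical.

```python
from collections import defaultdict


def deviation_from_season_mean(rows, metric):
    """Return each row's metric minus its team's mean for that season.

    rows: dicts with hypothetical keys "team", "season", and `metric`.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r["team"], r["season"])
        sums[key] += r[metric]
        counts[key] += 1
    return [r[metric] - sums[(r["team"], r["season"])] / counts[(r["team"], r["season"])]
            for r in rows]


games = [
    {"team": "A", "season": 2023, "oe": 110.0},
    {"team": "A", "season": 2023, "oe": 100.0},
    {"team": "B", "season": 2023, "oe": 105.0},
]
print(deviation_from_season_mean(games, "oe"))  # [5.0, -5.0, 0.0]
```
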
Key variables would include:

1. Outcome / Performance Variables (Dependent Variables):
    1. Offensive Efficiency (OE)
    2. Field Goal Percentage (FG%)
    3. Effective Field Goal Percentage (eFG%)
    4. Turnover Rate (TOV%)
2. Primary Predictors (Independent Variables):
    1. Total travel distance since last game (miles)
    2. Number of time zones crossed
    3. Rest days since last game
    4. Back-to-back indicator (binary)
    5. Cumulative travel distance over recent games (e.g., last 3–5 games)
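
The rest and distance predictors can be derived from a team's schedule. A minimal sketch in plain Python, assuming "rest days" means full days off between games (so games on consecutive days give 0 rest and count as a back-to-back) and that miles to reach each game are already computed; the function name, keys, and `window` default are hypothetical. The same logic maps onto pandas `groupby`, `diff`, and `rolling` for the full dataset.

```python
from datetime import date


def schedule_features(dates, miles, window=3):
    """Rest days, back-to-back flag, and trailing travel distance for one team.

    dates: the team's game dates in chronological order.
    miles: distance traveled to reach each game (same order as dates).
    window: number of most recent games (including this one) to sum miles over.
    """
    rows = []
    for i, day in enumerate(dates):
        # Full days off since the previous game; None for the first game
        rest = None if i == 0 else (day - dates[i - 1]).days - 1
        rows.append({
            "rest_days": rest,
            "back_to_back": rest == 0,
            "cum_miles": sum(miles[max(0, i - window + 1): i + 1]),
        })
    return rows


feats = schedule_features([date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)], [0, 1000, 500])
print(feats[1])  # {'rest_days': 0, 'back_to_back': True, 'cum_miles': 1000}
```
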

To achieve sufficient statistical power, we would ideally collect multiple full NBA seasons (e.g., 5–10 seasons). A standard 82-game season contains 1,230 games, each with exactly one visiting team, so this range yields roughly 6,000–12,000 visiting-team game observations.

**Data Collection & Organization:**

- Game-level statistics would be collected from publicly available NBA statistics websites.
- Schedule data (dates, opponents, locations) would be merged with arena latitude/longitude data.
- Travel distance and time-zone changes would be computed programmatically using arena coordinates and game dates.
- Data would be stored in structured tabular format (CSV files), with one row per game per team.
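
Travel distance and time-zone shifts can be computed from arena coordinates and time-zone offsets. A minimal sketch, assuming straight-line great-circle (haversine) distance as a stand-in for actual flight paths and each arena's standard UTC offset in hours (e.g., -8 for Pacific, -5 for Eastern); function names are hypothetical. Daylight saving time shifts most arenas together, but Arizona does not observe it, so Phoenix would need special handling.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def tz_shift(from_utc_offset, to_utc_offset):
    """Signed hours of clock change between arenas; positive = eastward (phase advance)."""
    return to_utc_offset - from_utc_offset


print(tz_shift(-8, -5))  # 3 (Pacific to Eastern: three zones east)
print(tz_shift(-5, -8))  # -3 (Eastern to Pacific: three zones west)
```

The sign convention lets eastward and westward trips be separated later, which the directional-asymmetry hypothesis requires.
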

**Potential Real-World Datasets:**

1. NBA Official Statistics: https://www.nba.com/stats

   This dataset provides comprehensive team-level and game-level statistics, including offensive efficiency, shooting percentages, and turnover rates. The data is publicly accessible but requires web scraping or API usage; care must be taken to follow the site's usage policies.

2. Basketball-Reference Game Logs: https://www.basketball-reference.com

   Basketball-Reference offers historical NBA game logs, team statistics, and advanced metrics going back multiple decades. The data is freely available for academic use and can be downloaded manually or scraped responsibly.

3. NBA Schedule & Arena Location Data: https://github.com/rlabausa/nba-schedule-data

   These datasets provide NBA schedules and arena geographic coordinates, which are needed to compute travel distances and time-zone shifts.

## Ethics

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section. You don't have to solve these problems; you just have to acknowledge any potential harm, no matter how unlikely.

Here is a [list of real world examples](https://deon.drivendata.org/examples/) for each item in the checklist that you can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

### A. Data Collection
 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?


 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
 - [ ] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?
 - [ ] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

### B. Data Storage
 - [ ] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?
 - [ ] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?
 - [ ] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

### C. Analysis
 - [ ] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?
 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?
 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?
 - [ ] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?
 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

### D. Modeling
 - [X] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?
 - [X] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?
 - [ ] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?
 - [ ] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?
 - [ ] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

### E. Deployment
 - [ ] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
 - [ ] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?
 - [ ] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?
 - [ ] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?


## Team Expectations

* *Team Expectation 1*: The team will communicate and discuss mainly through Discord.
* *Team Expectation 2*: The team will meet twice a week. At the first meeting of the week (planning), all team members must participate in setting the week's goals. At the second meeting (review), all team members must share their progress; members who cannot attend must post their update within 24 hours.
* *Team Expectation 3*: Each decision will be made by majority vote.
* *Team Expectation 4*: In the event of a conflict, each involved party will present their point of view. Then, the rest of the team members will collectively agree on the best approach to solve the conflict.
* *Team Expectation 5*: Be cool!

## Project Timeline Proposal

| Meeting Date | Meeting Time | Completed Before Meeting                                                                                                                    | Discuss at Meeting                                                                                                |
| ------------ | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- |
| **2/4**      | 6 PM         | Review project expectations; finalize research question and hypothesis; identify candidate datasets (game logs, schedules, arena locations) | Confirm final project scope; agree on dependent/independent variables; decide on datasets and analytical approach |
| **2/11**     | 6 PM         | Acquire and import datasets; compute preliminary travel variables (distance, time-zone shifts, rest days); begin data cleaning              | Review dataset structure and quality; discuss feature engineering strategy; identify any data gaps                |
| **2/18**     | 6 PM         | Complete data wrangling; engineer fatigue metrics (eastward travel, cumulative mileage, back-to-backs); conduct initial EDA                 | Review and interpret EDA; confirm trends related to travel direction and rest; finalize modeling plan             |
| **2/25**     | 6 PM         | Finalize EDA; run baseline regression models; begin testing interaction effects                                                             | Discuss regression results; assess statistical significance; refine model specifications                          |
| **3/4**      | 6 PM         | Complete full modeling (including non-linear or threshold effects); generate diagnostics and robustness checks                              | Interpret findings; identify “fatigue tax” tipping point; outline results section                                 |
| **3/11**     | 6 PM         | Draft results, discussion, and limitations; create final figures and tables                                                                 | Edit and polish full project; ensure coherence between hypothesis, analysis, and conclusions                      |
| **3/13**     | 6 PM         | Finalize report; clean and document code                                                                                                    | Submit final project                                                                                              |
