# Notebook 3: What Leads to Success? - Advanced Golf Stats

![Infographic](assets/infographic.png)

## Description

This project explores the relationship between advanced golf statistics and player success on the PGA Tour. By analyzing data from multiple seasons, I aimed to identify the key performance indicators that contribute to a golfer's success. The data came from the [PGA Tour's database](https://www.pgatour.com/stats.html) and includes several advanced metrics: strokes gained (summed and individual), driving accuracy, driving distance, and greens in regulation. Through data cleaning and exploratory data analysis, I reshaped the dataset to show how strongly each of these advanced stats correlates with two success metrics: money earned and scoring average.
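
As a rough illustration of what "strength of correlation" means here, the sketch below computes a Pearson correlation coefficient between each stat column and a success metric. It is a minimal sketch, not the notebooks' actual code: the choice of Pearson's r, the helper names `pearson_r` and `correlations_with`, the column names, and the toy values are all assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlations_with(rows, stat_columns, target):
    """Correlate each stat column with one success metric (hypothetical helper)."""
    target_values = [row[target] for row in rows]
    return {
        col: pearson_r([row[col] for row in rows], target_values)
        for col in stat_columns
    }


# Toy values invented for illustration only; these are NOT real PGA TOUR data.
# Column names are hypothetical and may not match the processed CSVs.
toy_rows = [
    {"sg_total": 1.5, "driving_distance": 305.0, "money": 6.0, "scoring_avg": 69.5},
    {"sg_total": 0.8, "driving_distance": 298.0, "money": 3.5, "scoring_avg": 70.2},
    {"sg_total": 0.1, "driving_distance": 301.0, "money": 1.8, "scoring_avg": 70.9},
    {"sg_total": -0.6, "driving_distance": 292.0, "money": 0.9, "scoring_avg": 71.6},
]

stats = ["sg_total", "driving_distance"]
money_corr = correlations_with(toy_rows, stats, "money")
scoring_corr = correlations_with(toy_rows, stats, "scoring_avg")
```

Note that a lower scoring average is better, so a stat that helps performance would be expected to correlate positively with money earned and negatively with scoring average.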

## Project Manifest

| Name | Brief description | Link / location | Type |
| --- | --- | --- | --- |
| final_project GitHub | GitHub repo containing all files, notebooks, and assets related to this project. | https://github.com/MaxGuryan9/final_project | External resource |
| PGA TOUR Stats Frontend | Frontend website with all PGA TOUR stats displayed. This is how I accessed the download API endpoint. | https://www.pgatour.com/stats | External resource |
| PGA TOUR stats download endpoint | Source for PGA TOUR statistics that populate the raw CSVs. You must supply valid stat_id and year values in the URL for the .csv file to be pulled. | https://www.pgatour.com/api/stats-download?timePeriod=THROUGH_EVENT&tourCode=R&statsId={stat_id}&year={year} | External resource |
| Raw stat CSVs (2007-2025) | Unprocessed downloads for strokes gained categories, driving, GIR, scoring, earnings, and FedEx ranking | [data/raw/](data/raw) | Data files |
| Intermediate cleaned stats | Normalized per-stat tables with consistent player/value columns | [data/intermediate/](data/intermediate/) | Data files |
| Processed master season dataset | Combined season-level player metrics used for modeling/analysis | [data/processed/master_player_seasons.csv](data/processed/master_player_seasons.csv) | Data files |
| Per-season master CSVs | One file per season with joined metrics (2007-2025) | [data/processed/](data/processed/) | Data files |
| Data download notebook (NOTEBOOK 1) | Notebook used to fetch and stage source stats prior to cleaning | [data_download_and_read.ipynb](data_download_and_read.ipynb) | Notebook |
| Exploratory analysis notebook (NOTEBOOK 2) | Initial EDA/visualization of cleaned data | [data_exploration.ipynb](data_exploration.ipynb) | Notebook |
| Final analysis notebook (NOTEBOOK 3) | Polished analysis and results (this file) | [final_notebook.ipynb](final_notebook.ipynb) | Notebook |
| Data acquisition script | Downloads PGA TOUR CSVs via the stats endpoint for each selected stat/year | [src/download_stats.py](src/download_stats.py) | Script |
| Parsing/cleaning script | Identifies player/value columns and normalizes each stat table | [src/parse_stats.py](src/parse_stats.py) | Script |
| Master dataset builder | Joins cleaned stat tables into master season-level datasets | [src/build_master.py](src/build_master.py) | Script |
| Visualization assets | Generated figures referenced in analysis/reporting | [assets/](assets/) | Assets |
| boxplot_after_normalization.png | Boxplot of distributions after normalization. Shows easier comparison. Used during EDA. | [assets/boxplot_after_normalization.png](assets/boxplot_after_normalization.png) | Asset |
| boxplot_before_normalization.png | Boxplot of distributions before normalization. Shows that they must be normalized for easier comparison. Used during EDA. | [assets/boxplot_before_normalization.png](assets/boxplot_before_normalization.png) | Asset |
| dual_heatmap.png | Heatmap showing the strength of correlation between each stat metric and the two success metrics. Used in infographic. | [assets/dual_heatmap.png](assets/dual_heatmap.png) | Asset |
| golf_hole_graphic.png | Freeform figure. AI generated through ChatGPT. Helps represent where on the golf hole each statistical metric comes from. Used in infographic. | [assets/golf_hole_graphic.png](assets/golf_hole_graphic.png) | Asset |
| grouped_barchart.png | Grouped bar chart showing the strength of correlation between each stat metric and the two success metrics. Makes magnitudes easier to compare visually. Used in infographic. | [assets/grouped_barchart.png](assets/grouped_barchart.png) | Asset |
| heatmap_over_time.png | Heatmap showing how strongly each stat metric correlates with money earned in each season. Shows trends in strength over time, per stat metric category. Used in infographic. | [assets/heatmap_over_time.png](assets/heatmap_over_time.png) | Asset |
| histogram_for_normality.png | Histograms confirming normal distribution of each stat metric. Confirms that we can normalize the data. Used during EDA. | [assets/histogram_for_normality.png](assets/histogram_for_normality.png) | Asset |
| infographic.png | File of infographic created using Canva. Used in Notebook 3. | [assets/infographic.png](assets/infographic.png) | Asset |
| linear_confirmation.png | Scatterplots showing each category's relationship to money earned. Confirms that a regression model can be used to confirm the strength of correlation. Used during EDA. | [assets/linear_confirmation.png](assets/linear_confirmation.png) | Asset |
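
The download endpoint listed in the manifest takes a `statsId` and a `year` in its query string. The sketch below shows one way the URL could be assembled for each season from 2007 through 2025. It is a sketch under assumptions, not the contents of `src/download_stats.py`: the function names are hypothetical, and `"STAT_ID_PLACEHOLDER"` stands in for a real stat ID, which you would need to look up on the PGA TOUR stats site.

```python
from urllib.parse import urlencode

# Base of the endpoint shown in the manifest table above.
BASE_URL = "https://www.pgatour.com/api/stats-download"


def build_stats_url(stat_id, year):
    """Fill the stat_id and year slots of the stats-download URL template."""
    params = {
        "timePeriod": "THROUGH_EVENT",
        "tourCode": "R",
        "statsId": stat_id,
        "year": year,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def season_urls(stat_id, first_year=2007, last_year=2025):
    """One URL per season, inclusive of both endpoints (hypothetical helper)."""
    return {
        year: build_stats_url(stat_id, year)
        for year in range(first_year, last_year + 1)
    }


# "STAT_ID_PLACEHOLDER" is not a real stat ID; substitute a valid one.
urls = season_urls("STAT_ID_PLACEHOLDER")
print(urls[2007])
```

Each URL could then be fetched with an HTTP client of your choice and saved under `data/raw/`. No request is made in this sketch, so it does not depend on the endpoint's availability or response format.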